Wednesday, June 18, 2025

Azure Data Factory Metadata-Driven Pipelines - 1

I've had a bit of free time at work lately, so I figured I'd finally get around to revamping one of our Azure Data Factory (ADF) extract processes.

This particular one pulls data from our Dynamics CRM and exports it to the landing stage of our data warehouse. It was stood up in a hurry: Microsoft was retiring the previous, CRM-driven export process, and we'd missed the notifications internally, so by the time we realised what was happening the drop-dead date was almost upon us.

It's a pretty simple process that just dumps data from selected Dynamics entities into SQL tables - no transforms and very little filtering. It has been pretty stable, though we have had to filter a couple of the entities due to size, and also reduce the number of attributes/columns extracted, as our customisations have left two particular entities with more than 500 attributes.

ADF Metadata-Driven copy task 

What I've decided to do is update this to use the Metadata-driven copy task, as well as clean up the extracted attributes and implement some better filtering to reduce the daily extract volume.

There's already a lot of info out there about the metadata-driven task, so I'm not going to get too into it here. Basically, it allows a single parameterised process to cycle through all the entities to be exported, rather than requiring a separate dataset and copy task for each one.

I always forget how to get to it though, so here's how. Once you've launched ADF Studio from the portal, you create a new task by creating a new Factory Resource and choosing the Copy Data tool.

 

This will open a window/blade that lets you select Metadata-driven copy task as the task type.

A few notes about this - some of my decisions during the setup will be explained later:

  • It needs an existing database to create the SQL objects needed for storing the task metadata. The dataset can be created from the blade, but the SQL Server and database need to already exist and be accessible to the data factory.
  • If you want the pipelines to run on a specific integration runtime, that needs to already exist.
  • Source and destination connections can be created during the wizard steps, or you can use existing ones.
  • For this setup I was intending to customise it afterwards, so I just selected 2 tables to use as examples/templates.
  • I chose Configure for each table separately as the loading behaviour, then Delta load for one table and left the other as Full load. For delta loading you need to select a high-water (watermark) column which will be used to track whether changes have occurred - see the sketch after this list.
  • Allow the wizard to generate column mappings.
  • Also set the destination properties. We want to use Upsert for our Write behaviour, so set this with the appropriate key. You can also add a Pre-copy script for example purposes (there's an illustrative one below) and any other settings that you think might be useful later on.
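For the delta option above, it helps to see what the watermark pattern boils down to. This is just a sketch - the table and column names (account, modifiedon) are illustrative, not what the wizard actually generates:

    -- Sketch of the high-water mark pattern behind Delta load.
    DECLARE @LastWatermark datetime2 = '2025-06-17T00:00:00'; -- persisted from the previous run
    DECLARE @CurrentWatermark datetime2 = SYSUTCDATETIME();   -- captured at the start of this run

    -- Only rows whose watermark column has moved past the last stored value get copied.
    SELECT *
    FROM dbo.account                    -- illustrative source entity
    WHERE modifiedon >  @LastWatermark  -- the high-water column selected in the wizard
      AND modifiedon <= @CurrentWatermark;

    -- After a successful copy, @CurrentWatermark is stored as the new last watermark.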
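Similarly, a Pre-copy script is just a T-SQL statement run against the destination before the copy starts. Something like this hypothetical cleanup (landing.account is a made-up table) is a typical use alongside Upsert:

    -- Hypothetical pre-copy script: clear the slice about to be re-loaded so the
    -- upsert starts from a known state. Table and column names are illustrative.
    DELETE FROM landing.account
    WHERE modifiedon > DATEADD(DAY, -7, SYSUTCDATETIME());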

Once all the selections have been made, the wizard will create the ADF resources such as pipelines and datasets, as well as the SQL database tables and stored procs.

On the ADF side this consists of 3 pipelines, named xxx_TopLevel, xxx_MiddleLevel, and xxx_BottomLevel. The top level pipeline is the main orchestrator and the one which will be scheduled to run. It calls the middle level, which then calls the bottom level one to do the actual export and destination population.

The heart of the SQL side is a control table which contains a row for each entity being extracted, with columns holding the JSON metadata that ADF interprets to run the pipelines.
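If you haven't poked around in it yet, the control table looks roughly like the following. Treat this as a from-memory sketch rather than the exact generated DDL - the real table name and column set vary with the wizard version:

    -- Rough sketch of the generated control table (not the exact DDL).
    CREATE TABLE dbo.MainControlTable (
        Id                          int IDENTITY(1,1) PRIMARY KEY,
        SourceObjectSettings        nvarchar(max), -- JSON: source entity/table details
        SinkObjectSettings          nvarchar(max), -- JSON: destination table details
        CopyActivitySettings        nvarchar(max), -- JSON: column mappings, pre-copy script, etc.
        DataLoadingBehaviorSettings nvarchar(max), -- JSON: full vs delta, watermark column and value
        TopLevelPipelineName        nvarchar(200),
        TriggerName                 nvarchar(200),
        CopyEnabled                 bit
    );

As I understand it, the pipelines read these rows with a Lookup activity and pass the JSON values down through the pipeline levels, which is how a single parameterised copy activity can service every entity.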

Limitations and Next Steps

I might be missing something, but updating the copy task for changes to existing items, or adding new ones, looks to be a bit of a chore. Seeing as I have time on my hands, I'm going to try altering the generated objects to allow using some custom tables - sort of a metadata-driven metadata-driven task. More on this in the next post.

 

 
