Today, I will walk through an end-to-end process for setting up CI/CD for Azure Data Factory using Azure DevOps pipelines. Continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment to another in an automated manner. Setting up a CI/CD workflow for Azure Data Factory can be a complex and challenging task, especially in larger enterprises that involve many different stages and environments. In this blog, I will outline the key focus areas and steps needed to set up CI/CD for Azure Data Factory using Azure DevOps pipelines.

Pre-requisites:

1. Azure Portal and Azure DevOps accounts

2. Azure Data Factory instances created for all environments, e.g. Development, Test, Staging, and Production.

3. Linked services, integration runtimes, and any other required components configured in Azure Data Factory for all environments. If you need to create a new Azure Data Factory, you can refer to Microsoft’s official documentation here.

Step-1: Connect Azure Data Factory with Git repository

The Azure Data Factory source code should be stored in an Azure Repos Git repository. To set up CI/CD for ADF, we will connect the development environment (it can be a pre-development environment as well) to the Git repository. After that, we will publish the changes to other environments automatically using Git and release pipelines.

Source Control Repository and Branching Workflow

The Git repo will consist of two branches, one for collaboration and the other for publish. We will use the dev branch for collaboration and master for publish. The default publish branch name is adf_publish, and it is created automatically if a different branch is not specified in publish_config.json (refer to step 2.3 below). You can choose different branch names as well, based upon your branching strategy.
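If the dev collaboration branch does not exist yet, you can create it from master before connecting the factory. A minimal sketch using plain git commands (assuming master already exists as the default branch):

git checkout master
git pull
git checkout -b dev
git push -u origin dev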

There are multiple ways to connect your Azure Data Factory to a Git repository. Please refer to this guide, which outlines how to configure and work in a Git repository with best practices – Source Control in Azure Data Factory.

Git repository

After you connect to a Git repository, you can view and manage the configuration in the management hub for the sandbox environment, under Git configuration in the Source control section:

Repository type: Azure DevOps Git
Azure DevOps account: coderise-azure
Project name: coderise-azure-adf-integration
Repository name: coderise-adf-int
Collaboration branch: dev
Publish branch: master
Root folder: /

Any commits to the development environment will now be synced to the collaboration branch (dev). Once development is complete, publish the Azure Data Factory using the Azure portal or programmatically, and the generated templates will automatically be committed to the publish branch (master). The source control repository setup for storing the Azure Data Factory templates and code is now complete.

Step-2: Create Configuration files to setup CICD for Azure Data Factory

Step-2.1: Custom Parameters with the Resource Manager template

For automated CI/CD deployments, we want to change some properties as we promote code from development to higher environments; e.g. the Azure Key Vault or storage account name may vary per environment. We can change these properties during Resource Manager deployment, but the properties aren’t parameterized by default. Therefore, we will have to create a custom Resource Manager parameter configuration file, arm-template-parameters-definition.json, in the Git repo as outlined in this document. The details on how to create the file are outlined here: Use Custom Parameters with the Resource Manager Template.

When publishing from the collaboration branch (dev), Azure Data Factory will read this file and use its configuration to determine which properties get parameterized. If no file is found, the default parameterization template is used.

In addition, here’s a sample arm-template-parameters-definition.json file for reference – LINK
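As an illustration only, a trimmed-down version of such a file might look like the snippet below. This is a sketch, not our full configuration; it parameterizes linked service connection strings (as secure strings) and Key Vault base URLs, which are common per-environment differences, and exposes integration runtime properties:

{
    "Microsoft.DataFactory/factories/linkedServices": {
        "*": {
            "properties": {
                "typeProperties": {
                    "connectionString": "|:-connectionString:secureString",
                    "baseUrl": "="
                }
            }
        }
    },
    "Microsoft.DataFactory/factories/integrationRuntimes": {
        "*": "="
    }
}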

Step-2.2: Create pre- and post-deployment script for Azure DevOps Pipelines

Next, we will need a pre- and post-deployment script that will be used later in the Azure DevOps pipelines. Microsoft provides a script for the pre- and post-deployment steps; it stops and starts triggers around the deployment and accounts for deleted resources and resource references. Save the script as ci-cd.ps1 in the Git repo.

The script can be found here – LINK. We have also checked the file into our repo here.
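To get a feel for what the script does before wiring it into a pipeline, you can run it locally against a factory. Below is a minimal sketch, assuming the Az PowerShell module is installed, you are signed in to the subscription that contains the staging factory, and the exported ARM template sits next to the script (the resource names are the staging ones used throughout this post):

# Sign in and select the subscription that hosts the staging data factory
Connect-AzAccount
Set-AzContext -Subscription "<your-subscription-name-or-id>"

# Pre-deployment run: stops the triggers so the ARM deployment can update them
.\ci-cd.ps1 -armTemplate ".\ARMTemplateForFactory.json" `
    -ResourceGroupName "cr-stg-001-rgp" `
    -DataFactoryName "cr-adf-stg-001-adf" `
    -predeployment $true `
    -deleteDeployment $false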

Step-2.3: Create publish script to configure publishing settings

By default, data factory generates the Resource Manager templates of the published factory and saves them into a branch called adf_publish. To configure a custom publish branch, add a publish_config.json file to the root folder in the collaboration branch. When publishing, ADF reads this file, looks for the field publishBranch, and saves all Resource Manager templates to the specified location. If the branch doesn’t exist, data factory will automatically create it.

Create a publish_config.json file in the root folder of the collaboration branch of the Git repository, i.e.

{"publishBranch": "master", "includeFactoryTemplate": true}

Step-3: Configure CICD using Azure DevOps Pipelines

For continuous integration and delivery, we will create a new Azure DevOps release pipeline to deploy the changes from development to the other environments. The steps on how to create a release pipeline can be found here.

Step-3.1: Add artifacts to setup CICD for ADF

After that, we will add two artifacts to the release pipeline, one for the collaboration branch (dev) and the other for the publish branch (master).

Publish artifact for ADF CICD

Source alias: _adf_publish_source
Default branch: master
Default version: Latest from the default branch

Collaboration artifact for ADF CICD

Source alias: _adf_collab_source
Default branch: dev
Default version: Latest from the default branch

Step-3.2: Create environment stages for CICD ADF pipelines

After that, we will create an environment stage for each environment, e.g. Test, Staging, and Production. For example:

Stage name: Staging
Agent: vs2017-win2016
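For reference, if you use YAML pipelines instead of the classic release pipeline editor, the equivalent stage and agent selection would look roughly like this (a sketch only; the stage name and hosted agent image simply mirror the values above):

stages:
  - stage: Staging
    jobs:
      - job: deploy_adf
        pool:
          vmImage: 'vs2017-win2016'  # hosted agent image used in this walkthrough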

Step-3.3: Create tasks for each environment stage to setup CICD for Azure Data Factory

Finally, create the following tasks for each environment stage, using the values for your own environments. Within a stage, the tasks should run in this order: Stop Trigger (pre-deployment script), ARM Template deployment, and then Clean Resources and Start Trigger (post-deployment script).

Step 3.3.1: Create Stop Trigger Task for Azure DevOps pipeline

- task: AzurePowerShell@5
  displayName: 'Stop Trigger'
  inputs:
    azureSubscription: 'CodeRise Staging Subscription'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_adf_collab_source/ci-cd.ps1'
    ScriptArguments: '-armTemplate "$(System.DefaultWorkingDirectory)/_adf_publish_source/cr-adf-stg-001-adf/ARMTemplateForFactory.json" -ResourceGroupName "cr-stg-001-rgp" -DataFactoryName "cr-adf-stg-001-adf" -predeployment $true -deleteDeployment $false'
    azurePowerShellVersion: LatestVersion  # run with the latest Az module on the agent

Step 3.3.2: Clean Resources and Start Trigger Task for Azure DevOps pipeline

- task: AzurePowerShell@5
  displayName: 'Clean Resources and Start Trigger'
  inputs:
    azureSubscription: 'CodeRise Staging Subscription'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_adf_collab_source/ci-cd.ps1'
    ScriptArguments: '-armTemplate "$(System.DefaultWorkingDirectory)/_adf_publish_source/cr-adf-stg-001-adf/ARMTemplateForFactory.json" -ResourceGroupName "cr-stg-001-rgp" -DataFactoryName "cr-adf-stg-001-adf" -predeployment $false -deleteDeployment $true'
    azurePowerShellVersion: LatestVersion  # run with the latest Az module on the agent

Step 3.3.3: ARM Template Deployment Task for Azure DevOps pipeline

- task: AzureResourceManagerTemplateDeployment@3
  displayName: 'ARM Template deployment'
  inputs:
    azureResourceManagerConnection: 'CodeRise Staging Subscription'
    resourceGroupName: 'cr-stg-001-rgp'
    location: 'East US'  # assumption: set this to the region of your target resource group
    csmFile: '$(System.DefaultWorkingDirectory)/_adf_publish_source/cr-adf-stg-001-adf/ARMTemplateForFactory.json'
    csmParametersFile: '$(System.DefaultWorkingDirectory)/_adf_publish_source/cr-adf-stg-001-adf/ARMTemplateParametersForFactory.json'
    overrideParameters: '-factoryName "cr-adf-stg-001-adf"'
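In practice you will usually override more than just the factory name here, namely the properties you parameterized in step 2.1. The generated parameter names can be looked up in ARMTemplateParametersForFactory.json on the publish branch; the Key Vault parameter below is purely illustrative and assumes a linked service named AzureKeyVault:

    overrideParameters: >-
      -factoryName "cr-adf-stg-001-adf"
      -AzureKeyVault_properties_typeProperties_baseUrl "https://cr-stg-001-kv.vault.azure.net/"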

Similarly, you can repeat the configuration for other environments and also add approvals if needed.

Step-3.4: Automated CICD for Azure Data Factory deployment pattern

In conclusion, all commits to the development environment will now be synced with the collaboration branch (dev) in Git. To publish changes, you can publish the Azure Data Factory using the Azure portal or programmatically, and the changes will automatically be committed to the publish branch (master).

Similarly, we can now deploy changes to other environments by creating a release for our Azure Data Factory release pipeline and triggering a deployment to the destination environment.

Finally, if you liked the blog, please feel free to add a comment below, and also make sure to check out our other blogs here.

Reference

Continuous Integration and Delivery in Azure Data Factory

Release Pipelines
