Today, I will demonstrate an end-to-end process for setting up CICD for Azure Data Factory using Azure DevOps pipelines. Continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment to another in an automated manner. Setting up a CICD workflow for Azure Data Factory can be a complex and challenging task, especially in larger enterprises that may involve many different stages and environments. In this blog, I will outline the key focus areas and steps we need to undertake to set up CICD for Azure Data Factory using Azure DevOps pipelines.
Pre-requisites:
1. Azure Portal and Azure DevOps accounts.
2. Azure Data Factories created for all environments, e.g. Development, Test, Staging, and Production.
3. Linked services, integration runtimes, and any other required components configured in Azure Data Factory for all environments. If you need to create a new Azure Data Factory, you can refer to Microsoft's official documentation here
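If you prefer to script the factory creation rather than use the portal, a minimal ARM resource definition is sketched below. The factory name is hypothetical, following the naming convention used later in this post:

```json
{
  "type": "Microsoft.DataFactory/factories",
  "apiVersion": "2018-06-01",
  "name": "cr-adf-dev-001-adf",
  "location": "[resourceGroup().location]",
  "identity": {
    "type": "SystemAssigned"
  },
  "properties": {}
}
```

Deploying this per environment (with the name varied accordingly) gives you the empty factories the rest of the walkthrough assumes.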
Step-1: Connect Azure Data Factory with GIT repository
The Azure Data Factory source code should be stored in an Azure Repos repository. To set up CICD for ADF, we will connect the development environment (it can be a pre-development environment as well) with the Git repository. After that, we will publish changes to other environments automatically using Git and release pipelines.
Source Control Repository and Branching Workflow
The Git repo will consist of two branches: one for collaboration and the other for publishing. We will use the dev branch for collaboration and master for publishing. The default publish branch name is adf_publish and is created automatically if not specified in publish_config.json (refer to step 2.3 below). You can choose different branch names as well, based upon your branching strategy.
There are multiple ways to connect your Azure Data Factory to a Git repository. Please refer to this guide, which outlines how to configure and work in a Git repository with best practices – Source Control in Azure Data Factory
Git repository
After you connect to a Git repository, you can view and manage your configuration in the management hub for the sandbox environment, under Git configuration in the Source control section, i.e.
Repository type: Azure DevOps Git
Azure DevOps Account: coderise-azure
Project name: coderise-azure-adf-integration
Repository name: coderise-adf-int
Collaboration branch: dev
Publish branch: master
Root folder: /
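The same Git configuration can also be applied through the factory's ARM properties. Here is a sketch of the repoConfiguration block, assuming the values listed above:

```json
{
  "properties": {
    "repoConfiguration": {
      "type": "FactoryVSTSConfiguration",
      "accountName": "coderise-azure",
      "projectName": "coderise-azure-adf-integration",
      "repositoryName": "coderise-adf-int",
      "collaborationBranch": "dev",
      "rootFolder": "/"
    }
  }
}
```

This is useful if you want the Git association itself to be reproducible across factory deployments rather than configured by hand.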
Any commits to the development environment will now be synced to the collaboration branch (dev). Once development is complete, publish the Azure Data Factory using the Azure Portal or programmatically, and the changes will automatically be committed to the publish branch (master). The setup of the source control repository for storing the Azure Data Factory template and code is now complete.
Step-2: Create Configuration files to setup CICD for Azure Data Factory
Step-2.1: Custom Parameters with the Resource Manager template
For automated CI/CD deployments, we want to change some properties as we promote code from development to higher environments, e.g. the Azure Key Vault or storage account name may vary per environment. We can change these properties during Resource Manager deployment, but the properties aren't parameterized by default. Therefore, we will have to create a custom Resource Manager parameter configuration file, arm-template-parameters-definition.json, in the Git repo as outlined in this document. The details on how to create the file are outlined here: Use Custom Parameters with the Resource Manager Template
When publishing from the master branch, Azure Data Factory will read this file and use its configuration to determine which properties get parameterized. If no file is found, the default template is used.
In addition, here's a sample arm-template-parameters-definition.json file for reference – LINK
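As an illustration, a minimal arm-template-parameters-definition.json that parameterizes linked service connection strings and base URLs could look like the following. Which properties you surface depends entirely on your own linked services:

```json
{
  "Microsoft.DataFactory/factories/linkedServices": {
    "*": {
      "properties": {
        "typeProperties": {
          "connectionString": "=",
          "baseUrl": "="
        }
      }
    }
  }
}
```

Here "*" applies the rule to all linked services, and "=" tells ADF to parameterize the property while keeping its current value as the default.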
Step-2.2: Create pre- and post-deployment script for Azure DevOps Pipelines
Next, we will need a pre- and post-deployment script that will be used later in the Azure DevOps pipelines. Microsoft provides a script for pre- and post-deployment; it accounts for deleted resources and resource references. Save the script as ci-cd.ps1 in the Git repo.
The script can be found here – LINK. We have also checked in the file in our repo here
Step-2.3: Create publish script to configure publishing settings
By default, data factory generates the Resource Manager templates of the published factory and saves them into a branch called adf_publish. To configure a custom publish branch, add a publish_config.json file to the root folder in the collaboration branch. When publishing, ADF reads this file, looks for the field publishBranch, and saves all Resource Manager templates to the specified location. If the branch doesn’t exist, data factory will automatically create it.
Create a publish_config.json file in the root folder of the Git repository, i.e.
{ "publishBranch": "master", "includeFactoryTemplate": true }
Step-3: Configure CICD using Azure DevOps Pipelines
For continuous integration and delivery, we will create a new Azure DevOps release pipeline to deploy the changes from development to other environments. The steps on how to create a release pipeline can be found here
Step-3.1: Add artifacts to setup CICD for ADF
After that, we will add two artifacts to the release pipeline: one for the collaboration branch (dev) and the other for the publish branch (master).
Publish artifact for ADF CICD
Source alias: _adf_publish_source
Default branch: master
Default version: Latest from the default branch
Collaboration artifact for ADF CICD
Source alias: _adf_collab_source
Default branch: dev
Default version: Latest from the default branch
Step-3.2: Create environment stages for CICD ADF pipelines
After that, we will create environment stages for each environment, e.g. Test, Staging, and Production. For example:
Stage name: Staging
Agent: vs2017-win2016
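If you model this in a YAML pipeline instead of a classic release pipeline, the equivalent stage skeleton would be roughly as follows (a sketch; the pool image matches the agent above, and the tasks from step 3.3 go under steps):

```yaml
stages:
- stage: Staging
  jobs:
  - job: DeployADF
    pool:
      vmImage: 'vs2017-win2016'
    steps: []
    # Add the Stop Trigger, ARM deployment, and Clean/Start Trigger tasks here
```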
Step-3.3: Create tasks for each environment stage to setup CICD for Azure Data Factory
Finally, create the following tasks for each environment stage using updated values for your environments.
Step 3.3.1: Create Stop Trigger Task for Azure DevOps pipeline
- task: AzurePowerShell@5
  displayName: 'Stop Trigger'
  inputs:
    azureSubscription: 'CodeRise Staging Subscription'
    azurePowerShellVersion: 'LatestVersion'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_adf_collab_source/ci-cd.ps1'
    ScriptArguments: '-armTemplate "$(System.DefaultWorkingDirectory)/__adf_publish_source/cr-adf-stg-001-adf/ARMTemplateForFactory.json" -ResourceGroupName "cr-stg-001-rgp" -DataFactoryName "cr-adf-stg-001-adf" -predeployment $true -deleteDeployment $false'
Step 3.3.2: Clean Resources and Start Trigger Task for Azure DevOps pipeline
- task: AzurePowerShell@5
  displayName: 'Clean Resources and Start Trigger'
  inputs:
    azureSubscription: 'CodeRise Staging Subscription'
    azurePowerShellVersion: 'LatestVersion'
    ScriptPath: '$(System.DefaultWorkingDirectory)/_adf_collab_source/ci-cd.ps1'
    ScriptArguments: '-armTemplate "$(System.DefaultWorkingDirectory)/__adf_publish_source/cr-adf-stg-001-adf/ARMTemplateForFactory.json" -ResourceGroupName "cr-stg-001-rgp" -DataFactoryName "cr-adf-stg-001-adf" -predeployment $false -deleteDeployment $true'
Step 3.3.3: ARM Template Deployment Task for Azure DevOps pipeline
- task: AzureResourceManagerTemplateDeployment@3
  displayName: 'ARM Template deployment'
  inputs:
    azureResourceManagerConnection: 'CodeRise Staging Subscription'
    resourceGroupName: 'cr-stg-001-rgp'
    csmFile: '$(System.DefaultWorkingDirectory)/__adf_publish_source/cr-adf-stg-001-adf/ARMTemplateForFactory.json'
    csmParametersFile: '$(System.DefaultWorkingDirectory)/__adf_publish_source/cr-adf-stg-001-adf/ARMTemplateParametersForFactory.json'
    overrideParameters: '-factoryName "cr-adf-stg-001-adf" …
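The overrideParameters value above is truncated; in practice it carries one entry per parameter surfaced by your parameter definition file. A hypothetical example is shown below (the parameter names are illustrative and must match those in your generated ARMTemplateParametersForFactory.json):

```yaml
overrideParameters: '-factoryName "cr-adf-stg-001-adf"
  -AzureKeyVault_properties_typeProperties_baseUrl "https://cr-kv-stg-001.vault.azure.net/"
  -AzureBlobStorage_connectionString "$(StagingStorageConnectionString)"'
```

Values that are secrets are best supplied from pipeline variables or a variable group rather than hard-coded in the pipeline definition.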
Similarly, you can repeat the configuration for other environments and also add approvals if needed.
Step-3.4: Automated CICD for Azure Data Factory deployment pattern
In conclusion, all commits to the development environment will now be synced with the collaboration branch (dev) in Git. To publish changes, you can publish the Azure Data Factory using the Azure Portal or programmatically, and the changes will automatically be committed to the publish branch (master).
Similarly, we can now deploy changes to other environments by creating a release for our Azure Data Factory release pipeline and triggering a deployment to the destination environment.
Finally, if you liked the blog, please feel free to add a comment below, and also make sure to check out our other blogs here