Starting your journey with Microsoft Azure Data Factory

Working as an Azure Data Engineer, you are responsible for designing and implementing Azure data platform solutions. As part of that role, you need to choose the best method to ingest the structured, semi-structured and unstructured data generated by various data sources, so that it can be processed in batches or in real time by your applications, then stored or visualized as the project requires.

In this article, we will go through the Microsoft Azure Data Factory service that can be used to ingest, copy and
transform data generated from various data sources.


Azure Data Factory is a cloud-managed service used to orchestrate data copying between different relational and non-relational data sources, hosted in the cloud or locally in your datacenters, and to transform that data to meet the business requirements. It is a big data service responsible for cloud-based Extract, Transform and Load (ETL), Extract, Load and Transform (ELT) and data ingestion processes, using scheduled data-driven workflows.

Using Azure Data Factory, you can easily create a workflow that connects to a specific on-premises or cloud data source using one of the 90+ built-in connectors. You can then transform the data using ADF mapping data flow activities, without needing Spark clustering or programming skills, while keeping the ability to write your own custom transformation code if the required transformation is not available. Finally, you can load the transformed data into centralized cloud storage to be processed by another analysis or cognitive service, or store it directly in a relational data store such as Azure SQL Database or Azure Synapse Analytics (formerly Azure SQL Data Warehouse), or in a non-relational data store such as Azure Cosmos DB, to be transformed and processed later.
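Behind the ADF authoring UI, connections and data references are stored as JSON documents. The sketch below shows, as plain Python dicts, a hypothetical linked service and dataset pair in the shape ADF uses; all names and the connection string are placeholders, not real resources.

```python
# Hypothetical JSON definitions (as Python dicts) for a linked service and a
# dataset, mirroring the shapes Azure Data Factory stores behind its UI.

# A linked service holds the connection information for a data store.
sql_linked_service = {
    "name": "MyAzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            # Placeholder connection string; real ones should come from the
            # ADF UI or Azure Key Vault, never hard-coded like this.
            "connectionString": "Server=tcp:myserver.database.windows.net;Database=mydb;"
        },
    },
}

# A dataset points at the data itself and references the linked service.
sql_dataset = {
    "name": "MyOrdersTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": sql_linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"tableName": "dbo.Orders"},
    },
}

print(sql_dataset["properties"]["linkedServiceName"]["referenceName"])
# → MyAzureSqlLinkedService
```

The dataset never stores credentials itself; it only references the linked service by name, which is why one linked service can back many datasets.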

In Azure Data Factory, you can use Azure Monitor, PowerShell and the health tab in the Azure portal to monitor the status of your developed and scheduled workflows, and to track any performance or failure issues encountered while copying or transforming data.
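These monitoring tools sit on top of ADF's "query pipeline runs" API, whose request body lets you filter runs by time window and status. A minimal sketch of that filter payload, built as a Python dict (the dates and the Failed-status filter are illustrative assumptions):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical body for ADF's queryPipelineRuns operation, following the
# public RunFilterParameters shape: a time window plus optional filters.
now = datetime(2021, 1, 2, tzinfo=timezone.utc)
run_filter = {
    "lastUpdatedAfter": (now - timedelta(days=1)).isoformat(),
    "lastUpdatedBefore": now.isoformat(),
    "filters": [
        # Only return pipeline runs that ended in failure.
        {"operand": "Status", "operator": "Equals", "values": ["Failed"]}
    ],
}
print(run_filter["filters"][0]["values"])  # → ['Failed']
```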

In addition, Azure Data Factory lets you use Azure DevOps and GitHub repositories for Continuous Integration and Delivery (CI/CD) of your ETL and ELT workflows while you develop them, before they are published to production. In other words, it allows you to move a workflow incrementally between the different environments, using Azure Resource Manager templates as the configuration storage.

Imagine the effort that you need to spend in developing a custom data movement solution that can perform the data
copy and transformation tasks in addition to the complexity and difficulty that you will face while trying to
integrate it with other systems and data sources!

Azure Data Factory Components

Azure Data Factory consists of a number of components that, together, allow you to build the data copy, ingestion and transformation workflows. These components include:

  • Pipeline: The logical container for activities that are managed and scheduled to run
    sequentially or in parallel as a single unit. A pipeline run is an instance of the pipeline execution, created
    after passing values to the pipeline parameters
  • Activity: A single execution step in the pipeline, used to ingest, copy or transform the
    data. Pipeline activities come in three main types: data movement, data transformation and control activities
  • Mapping Data Flow: Visually designed data transformation logic that requires no coding skills
    and is executed on a fully managed Spark cluster, which spins up and down based on the workload
  • Dataset: A reference to the data in the data source or sink that will be used in the pipeline
    as an activity input or output
  • Linked Service: A descriptive connection string used to connect to the data and compute
    resources used in the pipeline activities, in which you provide the name of the data source and the credentials
    used to connect to it. The information required to connect depends mainly on the type of the data source
  • Trigger: The processing unit that specifies when the pipeline will be executed. Azure Data Factory
    supports three main types of triggers: a Schedule trigger that invokes the pipeline at a specific time and
    frequency, a Tumbling Window trigger that fires on a periodic interval, and an Event-based trigger that invokes
    the pipeline in response to an event
  • Control flow: The orchestration controller for the pipeline activities, specifying the execution flow of
    the pipeline, in sequence or in parallel, with the ability to define execution branches and loops
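To see how these components fit together, here is a hypothetical pipeline definition, again as Python dicts in the JSON shape ADF uses: one Copy activity wired to input and output datasets, plus a schedule trigger that invokes the pipeline daily. All names are placeholders.

```python
# A minimal, hypothetical pipeline: a single Copy (data movement) activity
# referencing two datasets by name.
pipeline = {
    "name": "CopyOrdersPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyOrders",
                "type": "Copy",  # a data movement activity
                "inputs": [{"referenceName": "SourceOrders", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "SinkOrders", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "AzureSqlSource"},
                    "sink": {"type": "BlobSink"},
                },
            }
        ]
    },
}

# A schedule trigger that would run the pipeline once a day.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2021-01-01T00:00:00Z",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": pipeline["name"],
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
```

Note how everything is connected by reference: the activity names its datasets, the trigger names its pipeline, and each dataset (as shown earlier) names its linked service.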

Create a New Azure Data Factory

Before creating a new Data Factory that will be used to orchestrate the data copying and transformation, make sure that you have an Azure subscription, and that you are signed in with a user account that has the Contributor or Owner role, or is an administrator, on the Azure subscription.

In order to create a new Data Factory, open the Microsoft Azure portal in your web browser, sign in using an authorized user account, then search for Data Factory in the portal search panel and click on the Data Factories option, as shown below:

Azure Portal Search

In the Data Factories window, click on the + Add option to create a new data factory, as below:

Add a new Data Factory

From the Basics tab of the Create Data Factory window, provide the Subscription under which the Azure Data Factory will be created, an existing or new Resource Group where the ADF will be created, the Azure region nearest to you to host the ADF in, a unique and indicative name for the Data Factory, and whether to create a V1 or V2 data factory. It is highly recommended to create a V2 Data Factory to take advantage of the new features and enhancements, as shown below:

Create ADF_ Basic
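Under the hood, the Basics tab ends up as a simple ARM REST call: creating a V2 factory is a single PUT with little more than a region in the body. A sketch of that request, with a placeholder subscription ID, resource group and factory name:

```python
# Hypothetical values standing in for what you enter in the Basics tab.
subscription = "00000000-0000-0000-0000-000000000000"  # placeholder ID
resource_group = "my-rg"
factory_name = "my-adf-demo"  # must be globally unique

# The ARM resource URL for a V2 Data Factory.
url = (
    f"https://management.azure.com/subscriptions/{subscription}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    "?api-version=2018-06-01"
)

# The request body: essentially just the Azure region chosen in the portal.
body = {"location": "eastus"}
print(url)
```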

In the Git Configuration tab of the Create Data Factory window, you will be asked to configure a repository for your Data Factory CI/CD process, which helps you to move changes incrementally between the Development and Production environments. You can choose whether to configure Git during the ADF creation or later, after the Data Factory is created.

If you plan to configure Git now, you will be asked to specify whether to use a GitHub or an Azure DevOps repository, along with the repository name, the branch name and the repository root folder, as shown below:

ADF GIT Configuration
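The fields in the Git Configuration tab map onto the factory's repoConfiguration block. A hypothetical Azure DevOps example, with all names as placeholders (a GitHub repository would use the type "FactoryGitHubConfiguration" instead):

```python
# Hypothetical repoConfiguration for a Data Factory backed by Azure DevOps.
repo_configuration = {
    "type": "FactoryVSTSConfiguration",   # Azure DevOps (formerly VSTS)
    "accountName": "my-devops-org",       # placeholder organization
    "projectName": "my-project",          # placeholder project
    "repositoryName": "adf-pipelines",    # placeholder repository
    "collaborationBranch": "main",        # branch that publishes to ADF
    "rootFolder": "/",                    # where ADF JSON files are stored
}
```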

From the Networking tab of the Create Data Factory window, you need to decide whether to use a Managed VNET for the ADF, and the type of endpoint that will be used for the Data Factory connection, as below:

ADF Networking

After configuring the Data Factory network options, click on the Review + Create option to review
the selected options before creating the Data Factory, as shown below:

ADF Review

Verify your choices then click on the Create option to start creating the Data Factory, as below:

ADF Create

You can monitor the progress of the Data Factory creation from the Notifications button of the Azure portal, and a new window will be displayed once the Data Factory is created successfully, as shown below:

ADF Created

From the displayed window, click on the Go to resource button to open the created Data Factory, as
shown below:

ADF Overview

Now you will see that a new Data Factory is created, with the ability to check the Data Factory essential
information, the Azure Data Factory documentation and the pipelines and activities summary under the
Overview window.

You can also check the activities performed on the Data Factory under the Activity Log, control ADF permissions under Access Control, check and fix problems under Diagnose and Solve Problems, configure the ADF networking, lock the ADF to prevent changes to or deletion of the ADF resource, and use other monitoring, automation and troubleshooting options, as below:

ADF configurations


In this article, we provided the information you need to become familiar with Azure Data Factory and showed how to create a new Data Factory. In the next article, we will see how to use ADF to move data between different data sources. Stay tuned!

Ahmad Yaseen