Copy data between Azure data stores using Azure Data Factory

In the previous article, Starting your journey with Microsoft Azure Data Factory, we discussed the main concepts of Azure Data Factory, described the Data Factory components and showed how to create a new Data Factory step by step.

In this article, we will show how to use the Azure Data Factory to orchestrate copying data between Azure data
stores.

Copy Activity Overview

The Copy activity in Azure Data Factory copies data between data stores located on-premises and in the cloud, either to feed the copied data into downstream transformation or analysis tasks, or to write the transformed or analyzed data to a final store where it can be visualized.

The Copy activity supports a wide range of data stores, such as Azure data stores, on-premises relational and non-relational data stores, file stores, generic protocols such as HTTPS, and services and applications such as ServiceNow. Azure Data Factory supports reading from and writing to different file formats, including Avro, Binary, Delimited Text, Excel, JSON, ORC, Parquet and XML. For a complete list of the supported data sources (called sources) and data targets (called sinks) in the Azure Data Factory Copy activity, check the Supported Data Stores and Formats.

Azure Data Factory uses the Integration Runtime (IR) as a secure compute infrastructure to run the Copy activity across different network environments, making sure that the activity is performed in the region closest to the data store. You can think of it as a bridge between the Copy activity and the linked services.

Azure Data Factory supports three types of Integration Runtime: (1) the Azure Integration Runtime, which is used when copying data between data stores that are accessed publicly via the internet; (2) the Self-Hosted Integration Runtime, which is used to copy data from or to an on-premises data store or a network with access control; and (3) the Azure-SSIS Integration Runtime, which is used to run SSIS packages in the Data Factory, as we will see later in this article series. For more information, check the Integration Runtime in Azure Data Factory.

Take into consideration that you can perform the copy activity in the Data Factory using different tools and SDKs, including the Copy Data tool, the Azure portal, the .NET SDK, the Python SDK, Azure PowerShell, the REST API and Azure Resource Manager templates.
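
For example, connecting to a Data Factory from the Python SDK could look like the minimal sketch below. It assumes the azure-identity and azure-mgmt-datafactory packages are installed; the subscription, resource group and factory names are placeholders, and exact class or parameter names may vary slightly between SDK versions. The later snippets in this article reuse the adf_client created here.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholders: replace with your own subscription, resource group and factory.
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
factory_name = "<factory-name>"

# DefaultAzureCredential resolves Azure CLI, environment or managed identity credentials.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
```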

Copy Data from Azure Blob Storage to Azure SQL Database

In order to copy data from a text file stored in an Azure Blob Storage to an Azure SQL Database table, we need to
make sure that we have a storage account with a blob container that contains the data file, as shown below:

Blob Container

And an Azure SQL Database, with Allow Azure services and resources to access this server firewall
option enabled, to allow the Azure Data Factory to access that database server, as below:

Db Firewall

Also, we will create a new table in the Azure SQL Database into which the data from the text file stored in the blob container will be inserted, using a CREATE TABLE T-SQL statement like the one below:

Create Azure SQL Database Table
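
The exact schema depends on the columns of your source file. For illustration, the sketch below creates a hypothetical dbo.EmployeeList table from Python using pyodbc; the table name, columns and connection details are all placeholders to adapt to your own data, and the sql_conn_str variable is reused later to verify the copy.

```python
import pyodbc

# Hypothetical target table; adjust the columns to match your source file.
create_table_sql = """
CREATE TABLE dbo.EmployeeList (
    [ID]        INT IDENTITY(1,1) PRIMARY KEY,
    [FirstName] NVARCHAR(50) NOT NULL,
    [LastName]  NVARCHAR(50) NOT NULL
);
"""

# Placeholder connection details for the Azure SQL Database.
sql_conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<database>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

with pyodbc.connect(sql_conn_str) as conn:
    conn.execute(create_table_sql)
    conn.commit()
```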

Now we are ready to configure the Data Factory pipeline that will be used to copy the data from the blob container
to the Azure SQL Database. To configure the copy process, open the Azure Data Factory from the Azure Portal and
click on the Author & Monitor option under the Overview tab, as shown below:

Author & Monitor

From the opened Data Factory, you have two options to configure the copy pipeline: the first is to create the pipeline components one by one manually, using the Create Pipeline option; the second, which we will use in this demo, is to create the pipeline using the Copy Data tool, as shown below:

ADF Pipeline creation options

The first step in the Copy Data tool is to provide a unique name for the copy activity and specify whether to
schedule this copy process or run it once, as shown below:

Copy Data tool - Step 1

After that, the Copy Data wizard will ask you to specify the type of source data store in order to create a Linked
Service to connect to that data source. In this demo, we will copy data from an Azure Blob Storage,
as shown below:

Source Linked service

When you select the Azure Blob Storage source type and click Continue, a new window will be displayed, in which you need to provide a unique name for the Linked Service, the subscription where the storage account is created, the name of the storage account and the authentication method that will be used to connect to that storage account. After providing all the required information, click on Test Connection to verify that you can reach that storage account, then click on Create to create the Linked Service, as shown below:

Create Linked Service
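
For comparison, creating an equivalent source Linked Service with the Python SDK could be sketched as follows, reusing the adf_client from earlier; the storage account name and key are placeholders (in production, a Key Vault reference would be preferable to an inline key):

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService, LinkedServiceResource, SecureString)

blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;"
                  "AccountName=<account>;AccountKey=<key>")))

adf_client.linked_services.create_or_update(
    resource_group, factory_name, "BlobStorageLinkedService", blob_ls)
```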

Once created successfully, the source Linked Service will be displayed in the data sources window, as below:

Data sources Window

After that, you need to create the Dataset for the data source, in which you simply need to specify the input data
file, or folder in case you need to copy multiple files recursively, as shown below:

Source data set
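
In SDK terms, a delimited text Dataset pointing at that file could be sketched like this; the container and file names are placeholders:

```python
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation, DatasetResource, DelimitedTextDataset,
    LinkedServiceReference)

source_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="BlobStorageLinkedService"),
        location=AzureBlobStorageLocation(
            container="<container>", file_name="<input-file>.txt"),
        column_delimiter=",",
        first_row_as_header=True))

adf_client.datasets.create_or_update(
    resource_group, factory_name, "SourceDataset", source_ds)
```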

When you specify the input file, the Data Factory will detect the format of that file and allow you to review the file format settings, preview the source data and make any changes needed to fit your requirements, as shown below:

Input file format

The next step is to configure the Linked Service and Dataset for the destination data store. In this demo, we will copy the input file to an Azure SQL Database table. To achieve that, choose Azure SQL Database from the New Linked Service list, then click Continue to configure the sink Linked Service, as shown below:

sink Linked service type

In the displayed New Linked Service window, provide a unique name for the sink Linked Service, the subscription name, the Azure SQL Server and database names and, finally, the credentials that will be used by the Azure Data Factory to access the Azure SQL Database. After providing all the required information, click on the Test Connection option to verify the connection between the Data Factory and the Azure SQL Database, then click Create to create the Linked Service, as shown below:

Create Sink Linked Service
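
The equivalent sink Linked Service could be sketched with the SDK as below; the server, database and credentials are placeholders:

```python
from azure.mgmt.datafactory.models import (
    AzureSqlDatabaseLinkedService, LinkedServiceResource, SecureString)

sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value="Server=tcp:<server>.database.windows.net,1433;"
                  "Database=<database>;User ID=<user>;Password=<password>;")))

adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureSqlLinkedService", sql_ls)
```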

Once created successfully, the sink Linked Service will be listed in the Destination Data stores, as below:

Destination Data Sources

After creating the Linked Service that points to the sink data store, we need to create the sink Dataset, which specifies the target database table, as shown below:

Sink Dataset
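
A matching sink Dataset sketch, pointing at the hypothetical dbo.EmployeeList table created earlier:

```python
from azure.mgmt.datafactory.models import (
    AzureSqlTableDataset, DatasetResource, LinkedServiceReference)

sink_ds = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="AzureSqlLinkedService"),
        table_name="dbo.EmployeeList"))  # hypothetical target table

adf_client.datasets.create_or_update(
    resource_group, factory_name, "SinkDataset", sink_ds)
```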

Also, you need to review the schema mapping between the input data file and the sink data table and provide any
script that should be executed before copying the data, as shown below:

Schema mapping
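
Behind the wizard, this mapping becomes a translator on the Copy activity. A hypothetical sketch using the SDK's TabularTranslator is shown below, with column names matching the example table above (your source column names will differ):

```python
from azure.mgmt.datafactory.models import TabularTranslator

# Hypothetical mapping from the file's header columns to the table columns.
translator = TabularTranslator(mappings=[
    {"source": {"name": "first_name"}, "sink": {"name": "FirstName"}},
    {"source": {"name": "last_name"}, "sink": {"name": "LastName"}},
])
```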

Now we have the Linked Services and Datasets for both the source and sink data stores configured successfully. The next step is to configure the copy activity settings, such as the performance and fault tolerance settings, as below:

Copy Activity Settings
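
In the SDK, these options map to properties of the Copy activity itself. The sketch below wires together the datasets and translator defined earlier and enables two illustrative settings; exact property names may vary slightly between SDK versions:

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSink, CopyActivity, DatasetReference, DelimitedTextSource)

copy_activity = CopyActivity(
    name="CopyBlobToAzureSql",
    inputs=[DatasetReference(type="DatasetReference",
                             reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference",
                              reference_name="SinkDataset")],
    source=DelimitedTextSource(),
    # Optional pre-copy script, e.g. emptying the target table first.
    sink=AzureSqlSink(pre_copy_script="TRUNCATE TABLE dbo.EmployeeList;"),
    translator=translator,
    enable_skip_incompatible_row=True,  # fault tolerance: skip bad rows
    data_integration_units=4)           # performance setting
```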

From the Summary window, review all the copy process settings, and edit them if needed before running the pipeline, as shown below:

Summary window

Now that we have reviewed all the pipeline settings, the pipeline is ready for deployment and execution. The Data Factory will create all pipeline components and then execute the pipeline. Executing the pipeline means executing the Copy activity, which reads the data from the input file stored in Azure Blob Storage and writes it to the Azure SQL Database table, as shown below:

Pipeline deployment
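
Deploying and triggering the same pipeline from the SDK could be sketched as follows; the pipeline name is a placeholder:

```python
from azure.mgmt.datafactory.models import PipelineResource

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(
    resource_group, factory_name, "CopyBlobToSqlPipeline", pipeline)

# Trigger a one-off run and keep the run ID for monitoring.
run_response = adf_client.pipelines.create_run(
    resource_group, factory_name, "CopyBlobToSqlPipeline", parameters={})
print("Pipeline run ID:", run_response.run_id)
```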

To verify that the data has been copied to the Azure SQL Database table, connect to the Azure SQL Database using SSMS, then run a SELECT statement to read the table data, and you will see that the data has been copied successfully, as shown below:

Verify copy using SSMS
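
The same check can be scripted, reusing the hypothetical connection string and table from the sketch earlier:

```python
import pyodbc

with pyodbc.connect(sql_conn_str) as conn:
    for row in conn.execute("SELECT TOP (10) * FROM dbo.EmployeeList;"):
        print(row)
```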

Let us see where the different pipeline components are created in the Azure Data Factory. Click on the Author button of the Data Factory and review the available pipelines and datasets: you can see that the pipeline created using the Copy Data tool is saved in the Data Factory pipelines list, and that the source and destination datasets are available under the Datasets list. Clicking on the created pipeline, you will see the Copy activity that is used to copy the data from the source file to the destination table, as shown below:

ADF Author

Clicking on the Monitor button, you can review the execution logs of the created pipeline,
including the execution time, duration, and result, as shown below:

ADF monitor
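
The same run status and per-activity details can also be queried programmatically through the SDK, e.g.:

```python
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

# Overall pipeline run status ('InProgress', 'Succeeded', 'Failed', ...).
run_status = adf_client.pipeline_runs.get(
    resource_group, factory_name, run_response.run_id)
print(run_status.status)

# Per-activity details (duration, rows read/written, errors) for the run.
filters = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(hours=1),
    last_updated_before=datetime.utcnow() + timedelta(hours=1))
activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    resource_group, factory_name, run_response.run_id, filters)
for act in activity_runs.value:
    print(act.activity_name, act.status)
```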


The Linked Services of the Data Factory pipelines can be checked by clicking on the Manage button, allowing you to review all available Linked Services and edit them based on your requirements, as shown below:

ADF Manage

Conclusion

In this article, we showed how to copy data between Azure data stores and verify the pipeline components and
results. In the next article, we will see how to copy data from an On-premises data store to an Azure data store
using Azure Data Factory. Stay tuned!

Ahmad Yaseen