Copy data from On-premises data store to an Azure data store using Azure Data Factory

In the previous article, Copy data between Azure data stores using Azure Data Factory, we discussed how to copy data
stored in an Azure Blob Storage container to an Azure SQL Database table using Azure Data Factory, and we reviewed
the created pipeline components and results.

In this article, we will show how to copy data from an on-premises SQL Server database to an Azure SQL Database
using Azure Data Factory.

Prepare On-premises SQL Server

In order to copy data from an on-premises SQL Server database to an Azure SQL Database using Azure Data Factory,
make sure that the SQL Server version is 2005 or later, and that you have a SQL Server or Windows authentication
account that can connect to the on-premises SQL Server instance.
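
As a quick sanity check, a sketch like the following can be run on the on-premises instance. Note that adf_copy_user
and SourceDB are placeholder names used only for this example, not names required by Azure Data Factory:

-- Confirm the instance is SQL Server 2005 (9.x) or later
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('Edition')        AS Edition;

-- Create a SQL authentication login for Azure Data Factory to use;
-- adf_copy_user and SourceDB are placeholder names for this example
CREATE LOGIN adf_copy_user WITH PASSWORD = '<StrongPasswordHere>';
GO
USE SourceDB;
GO
CREATE USER adf_copy_user FOR LOGIN adf_copy_user;
-- ALTER ROLE requires SQL Server 2012 or later; use sp_addrolemember on older versions
ALTER ROLE db_datareader ADD MEMBER adf_copy_user;
GO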

As the SQL Server instance is located inside an on-premises network, you need to set up a self-hosted integration
runtime to connect to it. A self-hosted integration runtime provides a secure infrastructure to run the copy
activities between a cloud data store and a data store in a private network. The self-hosted integration runtime
should be installed on the on-premises machine where the SQL Server instance is hosted.

In order to set up the self-hosted integration runtime, open the Azure Data Factory from the Azure Portal, click on
the Manage icon on the left side of the window, select the Integration Runtimes tab
and click on the New option, as shown below:

New SH-IR

From the displayed Integration Runtime Setup window, choose the type of the integration runtime that you plan to
create, which is Self-Hosted Integration Runtime in our scenario, then click
Continue, as below:

IR Setup-Select type

After that, provide a unique, meaningful name for your SH-IR and click Create to proceed. A new
window will be displayed, providing the link used to download the latest version of the Self-Hosted IR and the
authentication key used to register it. Click on the download link, copy the installation media to the machine that
hosts the on-premises SQL Server instance, and copy the authentication key so you can register the SH-IR once it is
installed, as below:

SH-IR download link and key

Now run the installation wizard of the Self-Hosted IR on the on-premises machine where the SQL Server instance is
hosted. You will be asked to accept the terms of the license agreement and specify the installation path of
the SH-IR, then install it, as shown below:

SH-IR installation wizard

Once installed, a new window will be displayed to register the Self-Hosted IR to the Azure Data Factory using the
authentication key copied previously, as shown below:

SH-IR registration

The wizard will then show the nodes that are assigned to the SH-IR, with the ability to configure the security of
the SH-IR connection using certificates, as below:

SH-IR nodes

Finally, the Self-Hosted Integration Runtime is installed on the on-premises machine, registered and connected to
the Azure Data Factory, as shown below:

SH-IR connected

You can verify the SH-IR registration and connection from the Azure Data Factory, in the Integration Runtimes tab
under the Manage page, where the SH-IR is listed with a status of
Running, with the ability to upgrade the SH-IR to the latest version from that page, as shown below:

SH-IR status from ADF

Copy from On-premises to Azure SQL Database

With the Self-Hosted Integration Runtime installed and registered on the on-premises SQL Server machine, the source
data store is ready. On the Azure SQL Database side, we need to make sure that the Allow Azure services
and resources to access this server firewall option is enabled, in order to allow the Azure Data
Factory to access that database server.
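
This option is usually enabled from the server's firewall settings in the Azure Portal. As a hedged alternative, the
same setting can typically be applied with T-SQL against the master database of the Azure SQL logical server, using
the sp_set_firewall_rule procedure and the special 0.0.0.0 range that represents Azure services:

-- Run in the master database of the Azure SQL logical server.
-- The 0.0.0.0 - 0.0.0.0 range corresponds to the
-- "Allow Azure services and resources to access this server" option.
EXECUTE sp_set_firewall_rule
    @name = N'AllowAllWindowsAzureIps',
    @start_ip_address = '0.0.0.0',
    @end_ip_address = '0.0.0.0';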

To create a pipeline that will be used to copy data from the on-premises SQL Server to the Azure SQL Database, open
the Azure Data Factory from the Azure Portal and click on the Author & Monitor option under the
Overview tab, as shown below:

Author & Monitor

From the opened Data Factory, you can create and configure the copy pipeline using two methods: creating the
pipeline components one by one manually, using the Create Pipeline option, or creating the whole pipeline
using the Copy data tool, as shown below:

ADF-Copy Data Tool

First, you need to provide a unique name for the copy activity that indicates the main purpose of that pipeline,
then specify whether to schedule this copy process or run it once, as shown below:

Copy Data - Properties

After that, you need to create the Linked Service that connects to the source data store, which in our demo is the
on-premises SQL Server instance. To select the source data store type, click on the Database
category, click on Create new connection, then choose the SQL Server data store, as shown below:

Source data store type

The New Linked Service wizard will ask you to provide a unique name for the source Linked Service, the integration
runtime that will be used to connect to the source data store, which is the Self-Hosted IR, the name of the
on-premises SQL Server instance, the source database name and the credentials that will be used to connect to that
database server. To verify the provided connection properties, click on the Test Connection option, then click
Create to proceed with the Linked Service creation, as shown below:

Source Linked Service

And the created source Linked Service will be displayed under the Source Data stores, as shown below:

Source Data stores

With the source Linked Service created successfully, you need to create the Dataset, which is a reference to the
source table(s) you will read from. Here you will be requested to select the tables to be copied to the sink data
store, or to write a query that filters the data you are interested in, as shown below:

Source Dataset
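
For example, a source query might look like the sketch below; dbo.Employees and its columns are hypothetical names
used here for illustration only:

-- Hypothetical source query that filters the rows to be copied
SELECT EmployeeID, FirstName, LastName, HireDate
FROM dbo.Employees
WHERE HireDate >= '2020-01-01';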

You can also apply additional filters to the source data, as below:

Filter the source data

The next step is to configure the destination Linked Service, which is the Azure SQL Database that can be found
under the Azure category, as shown below:

Destination data store type

In the destination New Linked Service window, you need to provide a unique name for the destination Linked Service,
the subscription name, the Azure SQL server and database names and finally the credentials that will be used by the
Azure Data Factory to access the Azure SQL Database. After providing all required connection properties, click on
the Test Connection option to verify the connection between the Data Factory and the Azure SQL
Database then click Create to create the Linked Service, as shown below:

Destination Linked Service creation

And the destination Linked Service will be displayed under the Linked Services list, as below:

Destination Linked Service list

With the destination Linked Service created, you need to create the destination dataset by selecting the
destination table in the Azure SQL Database. A new table based on the source schema will be created automatically if
it does not exist; otherwise, the copied data will be appended to the existing table, as shown below:

Destination dataset
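
If you prefer to pre-create the destination table yourself rather than letting the tool create it, a sketch like the
following could be run against the Azure SQL Database; the table and column names are hypothetical and must match
your actual source schema:

-- Hypothetical destination table matching the example source query above
CREATE TABLE dbo.Employees
(
    EmployeeID INT          NOT NULL PRIMARY KEY,
    FirstName  NVARCHAR(50) NOT NULL,
    LastName   NVARCHAR(50) NOT NULL,
    HireDate   DATE         NULL
);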

In addition, you need to review the schema mapping between the source and destination tables, with the ability to
provide any script that should be executed before copying the data, as below:

Tables schema mapping
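
For example, a simple pre-copy script could truncate the destination table so the copy can be rerun without
duplicating rows; dbo.Employees is the same hypothetical table used in the earlier examples:

-- Hypothetical pre-copy script: empty the destination table before each load
TRUNCATE TABLE dbo.Employees;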

And finally, you are asked to configure the copy activity settings, such as the performance and fault tolerance
settings, as below:

Copy Data Settings

When you finish configuring all required settings, review all your choices in the Summary window, where you can
still edit them before starting the pipeline deployment process, then click Next to proceed:

Summary

At this step, the Linked Services and the pipeline will be created and then executed, as shown below:

Pipeline deployment

To edit the created pipeline components, click on the Edit pipeline button in the previous window,
or click directly on the Author tab, where you will be able to make changes to the pipeline copy
activity properties and the created datasets, as shown below:

Pipeline Author

Clicking on the Monitor button will move you to the Pipeline runs window that
shows the execution logs for all pipelines, including the execution time, duration and result, as shown below:

Pipeline Monitor

The Azure Data Factory Linked Services can be checked and edited from the copy activity properties or directly from
the Manage option, as shown below:

Pipeline Manage

Finally, to verify that the data was copied from the on-premises SQL Server database to the Azure SQL Database,
connect to the Azure SQL Database using SSMS and run the SELECT statement below to check the copied data:

Verify copy data
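
As a minimal sketch, assuming the hypothetical dbo.Employees destination table from the earlier examples, the
verification queries could look like this:

-- Count the rows that landed in the destination table
SELECT COUNT(*) AS CopiedRows
FROM dbo.Employees;

-- Spot-check a sample of the copied data
SELECT TOP (10) *
FROM dbo.Employees;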

Conclusion

In this article, we showed how to copy data from an on-premises data store to an Azure data store. In the next
article, we will show how to perform a transformation process for the data using the Azure Data Factory. Stay tuned!
