In this article, we will show how to prepare for one of the important Microsoft Azure exams, DP-200:
Implementing an Azure Data Solution.
The Implementing an Azure Data Solution certificate exam measures your intermediate-level knowledge in three main areas:
- Implementing data storage solutions, which accounts for up to 45% of the exam questions
- Managing and developing data processing solutions, which accounts for up to 30% of the exam questions
- Monitoring and optimizing data solutions, which accounts for up to 35% of the exam questions
There are no official prerequisites for this exam. It is recommended, but not mandatory, to take the Microsoft Azure Fundamentals (AZ-900) exam if you are new to Microsoft Azure, and
the Microsoft Azure Data Fundamentals (DP-900) exam if you are new to the Microsoft Azure data platform.
You can easily schedule the exam from the Implementing an Azure Data Solution certificate page. You need
to pass both the Implementing an Azure Data Solution (DP-200) and the Designing an Azure Data Solution (DP-201)
certificate exams in order to be certified as an Azure Data Engineer Associate. For more information about Microsoft Azure certificates, check
"It is time to specify your Microsoft Certifications path".
Microsoft Azure data engineers are responsible for all data-related design and implementation tasks. These include
provisioning the proper data storage service, ingesting streaming and batch data using a suitable mechanism,
transforming data between different sources and storage types, implementing security requirements and data retention
policies that meet the business requirements, and identifying and fixing performance bottlenecks during the
implementation and running phases.
The Implementing an Azure Data Solution certificate exam is designed for Microsoft Azure data engineers, data
professionals, data architects, and business intelligence professionals who will participate in the implementation
phase of the data-related tasks for any solution that is implemented using the relational and non-relational Azure
data services. These Microsoft Azure data services include Azure Cosmos DB, Azure SQL Database, Azure Synapse
Analytics, Azure Data Lake Storage, Azure Data Factory, Azure Stream Analytics, Azure Databricks, and Azure Blob storage.
To prepare for the Implementing an Azure Data Solution exam, you can go through the 7-module Implementing an Azure Data Solution learning path, a self-study
course provided by Microsoft that helps you gain the basic knowledge required to pass the exam.
Take into consideration that this exam covers a large number of subjects. In order to pass the exam, you need to
have enough knowledge of each subject, without going very deep into any of them. Personally, I prefer to be fully
prepared for certificate exams and to gain all the required knowledge, so that I can provide training in
the courses I am certified in and apply these skills at my customers’ sites. So, I will list all the skills measured in
this exam, along with the official resource for studying each subject.
Implement Data Storage Solutions
- Implement non-relational data stores
- Implement relational data stores
- Manage data security
Manage and Develop Data Processing Solutions
- Develop batch processing solutions
- Develop streaming solutions
Monitor and Optimize Data Solutions
- Monitor data storage
- Monitor data processing
- Optimize Azure data solutions
As with any exam, after completing the study material, you need to make sure that you are well prepared. You
can search the internet for free practice tests, such as those on the ExamTopics site or
any other free test, but only after making sure that you have finished studying the official course outline. To become
familiar with the format of Microsoft exams, check the Microsoft certificates Exam Formats and Questions Types.
In this article, I will provide some review questions that I usually use to measure my trainees' general skills and to make sure that they are ready for the Implementing an Azure Data Solution exam. Keep in mind that most of the exam questions are scenario-based questions in which you are asked to apply what you have learned about these subjects.
The type of data that can have its own schema defined at query time:
The process of duplicating the content for redundancy in order to meet the customers SLA in Microsoft Azure:
The Microsoft Azure data platform technology that is a globally distributed, multi-model database that can offer sub-second query performance and low latency:
Microsoft Azure Cosmos DB
The cheapest data store that can be used when you want to store your data without the need to query it directly:
Azure Storage Account
The Microsoft Azure Service that can be used to store documentation about a data source:
Azure Data Catalog
The Microsoft Azure Data Platform technology that is used to process data in an ELT framework:
Azure Data Factory
Working as a data engineer in a startup with limited funding, why would you prefer to use the Microsoft Azure
data storage instead of purchasing on-premises storage?
The Microsoft Azure pay-as-you-go billing model provides you with the ability to avoid buying expensive hardware that you may not use continuously
Assume that you are requested to store two video files as blobs. The first video file is business-critical and
requires a replication policy that creates multiple copies across geographically diverse datacenters. The second
video file is non-critical, and a local replication policy is sufficient. How should we store these two video files?
The two video files should be stored in separate storage accounts
When creating a new storage account, the name of a storage account should be:
When creating an Azure Data Lake Storage Gen 2 account, you need to configure it to process
analytical data workloads with the best performance. To achieve that, you should enable a specific option when
creating that account:
From the Advanced tab, set the Hierarchical Namespace to enabled
The tool that can be used to upload a single file to a Data Lake Storage Account (Gen 2) without the need for any installation or configuration:
Microsoft Azure Portal
The tool that can be used to perform a movement of hundreds of files from Amazon S3 to Azure Data Lake Storage:
Azure Data Factory
The Apache Storage technology that is encapsulated in Microsoft Azure Databricks:
The Notebook format that is used in Databricks:
The browsers recommended for best use with Databricks Notebook:
Chrome and Firefox
In order to connect the Spark cluster to the Azure Blob, we should:
Apache Spark can connect to databases like MySQL, Hive and other data stores using:
The recommended storage format to use with Spark, is:
In order to ensure that there is 99.999% availability for the reading and writing of all your data that is stored in a Cosmos DB database, you should:
Configure reads and writes of data for multi-region accounts with multi-region writes
You are requested to make the data that is stored in a Table Storage account located in the West US region available globally, so you should migrate it to:
Azure Cosmos DB Table API
The Cosmos DB API that provides a traversal language that enables connections and traversals across connected data:
In order to maximize the data integrity of data that is stored in a Cosmos DB, you should use _____ consistency level
You just created a new Azure SQL Database. Who will be responsible for performing operating system and database software updates?
The cloud provider: Microsoft Azure. Azure manages the hardware, software updates, and OS patches for you
A few days after provisioning your Azure SQL database, you find that you need additional IO throughput. The
performance model that should be used is:
The scale of compute that is used in Azure Synapse Analytics servers:
Assume that you have an Azure Synapse Analytics database, within this, you have a dimension table named Stores
that contains store information. There is a total of 263 stores nationwide. Store information is retrieved in
more than half of the queries that are issued against this database. These queries include staff information per
store, sales information per store and finance information. You want to improve the query performance of these
queries by configuring the table geometry of the stores table. The best table geometry to select for the Store
The default port for connecting to an enterprise data warehouse in Azure Synapse Analytics, is:
TCP port 1433
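As a minimal sketch of where that port number shows up in practice, the snippet below assembles an ODBC-style connection string. The server host name and the helper function are hypothetical and exist only for illustration; the `Driver`, `Server`, `Database`, `Encrypt` and `TrustServerCertificate` keywords follow the usual SQL Server ODBC connection string format.

```python
def build_connection_string(server: str, database: str, port: int = 1433) -> str:
    """Assemble an ODBC connection string that targets the given TCP port.

    Hypothetical helper: it only builds the string and opens no connection.
    """
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server=tcp:{server},{port}",  # host and port are separated by a comma
        f"Database={database}",
        "Encrypt=yes",
        "TrustServerCertificate=no",
    ]
    return ";".join(parts) + ";"


# "example-server.database.windows.net" is a placeholder host name.
conn_str = build_connection_string("example-server.database.windows.net", "Contoso")
print(conn_str)
```

If outbound traffic on port 1433 is blocked by a corporate firewall, a connection built from a string like this one would be expected to fail before authentication is even attempted.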
You have a Data Warehouse created with a database named Contoso. Within the database is a table named
DimSuppliers. The suppliers’ information is stored in a single text file named Suppliers.txt and is 1200MB in
size. It is currently stored in a container with an Azure Blob store. Your Azure Synapse Analytics is configured
as Gen 2 DW30000c. In order to maximize the performance of the data load, you should:
Split the text file into 60 files of 20MB each.
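The split itself can be done with any tool. The sketch below is one possible way to do it in Python before uploading the parts to the container, on the assumption that the count of 60 is chosen to line up with the 60 distributions that a dedicated SQL pool spreads its data across. The function name and output file naming are hypothetical, and the sketch splits by line count rather than by exact byte size, reading the whole file into memory for simplicity.

```python
import tempfile
from pathlib import Path


def split_text_file(source: Path, out_dir: Path, parts: int = 60) -> list[Path]:
    """Split `source` into `parts` files with similar line counts, keeping lines whole."""
    lines = source.read_text(encoding="utf-8").splitlines(keepends=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    base, extra = divmod(len(lines), parts)  # the first `extra` files get one more line
    written = []
    start = 0
    for i in range(parts):
        count = base + (1 if i < extra else 0)
        target = out_dir / f"{source.stem}_{i + 1:02d}{source.suffix}"
        target.write_text("".join(lines[start:start + count]), encoding="utf-8")
        written.append(target)
        start += count
    return written


if __name__ == "__main__":
    # Small stand-in for Suppliers.txt, written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Suppliers.txt"
        src.write_text("".join(f"{n},Supplier {n}\n" for n in range(600)), encoding="utf-8")
        files = split_text_file(src, Path(tmp) / "parts")
        print(len(files), "files written")
```

For a file as large as the 1200MB one in the question, a streaming approach that reads and writes line by line would be gentler on memory than this in-memory version.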
You have a Data Warehouse created with a database named Contoso. You have created a master key, followed by a
database scoped credential. After that, in order to copy data using Polybase, you should create:
An external data source
The Microsoft Azure technology that provides an ingestion point for data streaming in an event processing
solution that uses static data as a source, is:
Azure Blob storage
Will an application that publishes messages to Azure Event Hub very frequently get the best performance using
Advanced Message Queuing Protocol (AMQP), as it establishes a persistent socket?
By default, the number of partitions that a new Event Hub will have is:
Assume that an Event Hub goes offline before a consumer group can process the events it holds. Will those events be lost?
The job input that consumes data streams from applications at low latencies and high throughput:
The tool that can be used to view the key health metrics of your Stream Analytics jobs, is:
The Microsoft Azure Data Factory component that contains the transformation logic or the analysis commands of
the Azure Data Factory’s work, is called:
In order to move data from an Azure Data Lake Gen2 store to Azure Synapse Analytics, the Azure Data Factory
integration runtime that should be used in a data copy activity is:
The Mapping Data Flow transformation that is used to route data rows to different streams based on matching
conditions, is called:
The transformation that is used to load data into a data store or compute resource, is called:
The cloud service category that requires the greatest security effort on your part, is:
Infrastructure as a service (IaaS)
The best way to protect sensitive customer data is to encrypt:
Encrypt data both as it sits in your database and as it travels over the network
The Microsoft Azure service that helps in storing certificates to centrally manage them for your services:
Azure Key Vault
Your company is storing thousands of images in an Azure Blob storage account. The web application you are
developing needs to have access to these images. The best way to provide secure access for the third-party web
Use a Shared Access Signature to give the web application access.
The best method to gain insights into any unusual activity occurring with your storage account with minimal
Automatic Threat Detection
The most efficient way to secure a database to allow only access from a VNet while restricting access from the
internet is creating:
A server-level virtual network rule
If a mask is applied to a column in your database that holds a user’s email address, JohnCal@contoso.com, then the database administrator will be able to see the email address like:
JohnCal@contoso.com with no change
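To make the contrast concrete, here is a small Python imitation of the behaviour, not the database engine itself. It assumes the built-in email mask, which exposes the first letter and replaces the rest with a constant `XXX@XXXX.com` pattern; the `can_unmask` flag is a hypothetical stand-in for having administrative rights or the permission to see unmasked data.

```python
def mask_email(value: str) -> str:
    """Imitate the shape of the email mask: first letter, then a fixed pattern."""
    return f"{value[:1]}XXX@XXXX.com"


def read_email(value: str, can_unmask: bool) -> str:
    """Return the stored value as a given kind of user would see it.

    `can_unmask` is a hypothetical flag used only in this illustration.
    """
    return value if can_unmask else mask_email(value)


stored = "JohnCal@contoso.com"
print(read_email(stored, can_unmask=True))   # what the database administrator sees
print(read_email(stored, can_unmask=False))  # what a non-privileged user sees
```

The stored value never changes in either case; masking only affects how query results are presented to users without the required privilege.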
Is the “Encrypted communication” option turned on automatically when connecting to an Azure SQL Database or
Azure Synapse Analytics?
What are the steps that you should follow to set the encryption for the data stored in Stream Analytics?
It cannot be done as Stream Analytics does not store data
In order to respond to critical conditions and take automated corrective actions using Azure Monitor,
you should use:
Microsoft Azure Monitor Alerts
You are receiving an error message in Azure Synapse Analytics. You want to view information about the service
and get help solving the problem. What can you use to quickly check the availability of the service?
Diagnose and solve problems
While performing a daily data load to SQL Data Warehouse using Polybase with CTAS statements, the users are
complaining that the reports are running slow. In order to improve the performance of the report query, you
Create table statistics and keep it up to date
The maximum number of activities per pipeline in Azure Data Factory is:
While monitoring the job output of a streaming analytics job, the monitor reported back that there is a “Runtime
Errors > 0”. The issue is mainly related to:
The job can receive the data but is generating errors while processing the query.
The Recovery Point Objective for Azure Synapse Analytics is:
The interval at which a backup is taken for Azure Cosmos DB is: