Néstor Campos

Understanding Azure Synapse Analytics

  • Nov 9, 2020

Introduction

Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse, is a new analytics service developed by Microsoft. The new service is more than a name change: it is an evolution in how analytics is done within Azure. We'll look into the details in this guide.

All in One

Azure Synapse Analytics is a powerful, multi-functional engine for any modern data management environment. Its main advantage is that it is all in one: it offers several technologies and ways of working in a single service, which streamlines and unifies data development and management.

The functionalities it offers include:

  • Data Warehouse: The already popular Azure SQL Data Warehouse technology for storing and managing data for analysis and decision making, now offered through SQL pools.
  • Big Data engine: With Spark pools, engineers can run scalable Big Data processing and analytics in the languages Spark supports.
  • Serverless engine: Query Data Lakes directly with SQL statements, without provisioning a cluster.
  • Data flows: Develop ETL flows that read from or write to your Data Warehouse or Data Lake, using the same engine as Azure Data Factory.

SQL Pool

In Synapse Analytics, the Data Warehouse can be consumed through SQL pools, which allow you to query databases through clusters that are scalable both in number of machines and in their size.

In addition to scaling the pool, you can assign processing levels and pool resources according to roles and query types. You can also add row-level and column-level security so that each user sees only the data they are entitled to.
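
As a rough illustration of column-level security, the sketch below assembles a T-SQL `GRANT SELECT` statement that limits a user to specific columns. The schema, table, column, and user names are hypothetical, and the helper `column_grant` is not part of any Synapse SDK. It only builds the statement text, which you would then run against your SQL pool with whatever client you already use.

```python
import re

# Plain identifiers only; anything else is rejected rather than escaped.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _quote(name: str) -> str:
    """Wrap a validated identifier in T-SQL square brackets."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"[{name}]"


def column_grant(schema: str, table: str, columns: list[str], user: str) -> str:
    """Build a GRANT that lets `user` SELECT only the listed columns."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(_quote(c) for c in columns)
    return (
        f"GRANT SELECT ON {_quote(schema)}.{_quote(table)} "
        f"({column_list}) TO {_quote(user)};"
    )


# Hypothetical example: analysts may read region and amount, nothing else.
statement = column_grant("dbo", "Sales", ["Region", "Amount"], "analyst")
print(statement)
```

Validating identifiers before quoting them keeps the sketch from producing malformed SQL if a name arrives from user input. Row-level security works differently (through a security policy with a filter predicate) and is not covered here.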

Finally, you can do everything you already did with Azure SQL Data Warehouse, but in a way that is better aligned with current technologies.

Spark Pool

If, in addition to the SQL queries on the data in your Data Warehouse, you need to execute other types of queries and data transformations, you can take advantage of the Spark engine to create processes in notebooks, similar to how you would do it in other technologies, such as Databricks.

The Spark pool allows you to use Python, Scala, SQL, and even C# (.NET) to process data in a Big Data environment, which you can configure and scale according to your requirements: choose a fixed number of machines, or let the pool scale automatically with the processing needed at a given time.
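
To make the fixed-size versus autoscale choice concrete, here is a toy model in plain Python. It is not how Synapse decides to scale; the function `nodes_needed`, the notion of "tasks per node", and the bounds used below are all assumptions made for illustration. The only idea it shows is that an autoscaling pool stays between a minimum and a maximum node count that you configure.

```python
import math


def nodes_needed(pending_tasks: int, tasks_per_node: int,
                 min_nodes: int, max_nodes: int) -> int:
    """Return a node count for the current load, kept within the pool's bounds."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    if not 0 < min_nodes <= max_nodes:
        raise ValueError("expected 0 < min_nodes <= max_nodes")
    # Nodes the load would ideally use, before applying the bounds.
    wanted = math.ceil(pending_tasks / tasks_per_node)
    # Clamp to the configured range.
    return max(min_nodes, min(wanted, max_nodes))


# Hypothetical pool allowed to scale between 3 and 10 nodes.
for load in (0, 40, 500):
    print(load, nodes_needed(load, tasks_per_node=8, min_nodes=3, max_nodes=10))
```

Setting `min_nodes` equal to `max_nodes` in this model corresponds to a fixed-size pool.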

You can create as many pools as you require, for example, one for production processes and another for your engineers and data scientists to explore the Data Warehouse or Data Lakes.

Serverless

In addition to the pools, you can query Data Lakes directly (with Azure Data Lake Storage Gen2), executing SQL statements over files and directories. You pay only for what each query consumes, so you do not need to keep a cluster running for ad hoc queries. In other words, it is a serverless environment.

Serverless mode is already available in every Synapse Analytics environment. To use it, you only need to connect Synapse to your Azure Data Lake Storage Gen2 repository (and, of course, keep adequate governance over that repository to get the full potential of the serverless engine).
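
A serverless query over files in the lake typically uses the T-SQL `OPENROWSET` function. The sketch below builds such a query as a string; the helper `openrowset_query`, the storage account `mydatalake`, the container `sales`, and the file path are all hypothetical, and the sketch assumes the files are Parquet. Running the query still requires a connection to your workspace's serverless SQL endpoint, which is not shown.

```python
def openrowset_query(storage_account: str, container: str, path: str,
                     top: int = 10) -> str:
    """Build a serverless SQL query over Parquet files in a data lake."""
    if top <= 0:
        raise ValueError("top must be positive")
    relative = path.lstrip("/")
    url = f"https://{storage_account}.dfs.core.windows.net/{container}/{relative}"
    # Double any single quotes so the URL stays inside the T-SQL string literal.
    url = url.replace("'", "''")
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical lake layout: one folder per year, Parquet files inside.
query = openrowset_query("mydatalake", "sales", "2020/*.parquet")
print(query)
```

The wildcard in the path lets one statement read every matching file in the folder, which is what makes querying "files and directories" with SQL practical.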

Connection with Other Services

  • Power BI: Connect your data directly to Power BI datasets to build and publish your reports and dashboards.
  • Data Factory: Combine Synapse Analytics data flows with Azure Data Factory for your most robust ETL processes.
  • Machine Learning: Use your Spark processes to develop models and expose them through Azure Machine Learning, keeping your analytical processes interconnected, scalable, and highly available.

Courses

If you want to learn more about Synapse, Pluralsight has several courses dedicated to this great technology. Here are some of them:

Implementing a Cloud Data Warehouse

Deploying
