Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse, is an analytics service developed by Microsoft. The new service is more than a name change: it is an evolution in how analytics is done within Azure. We'll look into the details in this guide.
Azure Synapse Analytics is a powerful, multi-purpose engine for any modern data management environment. Its main advantage is that it is all-in-one: it brings several technologies and ways of working together in a single service, which streamlines and unifies data development and management.
The functionalities it offers include:
In Synapse Analytics, the Data Warehouse can be consumed through SQL pools, which allow you to query databases through clusters that are scalable both in number of machines and in their size.
In addition to the advantages of pool scaling, you can allocate processing capacity and pool resources according to roles and types of queries. You can also add security at the row and column level so that each user sees only the data they are authorized to access.
Finally, you can do everything you already did with Azure SQL Data Warehouse, but with tooling that is much more in line with current technologies.
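As a rough illustration of row- and column-level security, the sketch below builds the kind of T-SQL statements you might run against a SQL pool. The helper functions are illustrative only (they are not part of any Synapse SDK), and every table, column, role, policy, and function name is hypothetical.

```python
from typing import Sequence


def column_grant(table: str, columns: Sequence[str], principal: str) -> str:
    """Build a GRANT that exposes only the listed columns to a principal."""
    cols = ", ".join(columns)
    return f"GRANT SELECT ON {table} ({cols}) TO {principal};"


def row_filter_policy(policy: str, predicate_fn: str, filter_column: str, table: str) -> str:
    """Build a security policy that filters rows through a predicate function.

    Assumes the predicate function (an inline table-valued function)
    has already been created in the database.
    """
    return (
        f"CREATE SECURITY POLICY {policy}\n"
        f"ADD FILTER PREDICATE {predicate_fn}({filter_column}) ON {table}\n"
        f"WITH (STATE = ON);"
    )


# Hypothetical names, for illustration only.
print(column_grant("dbo.Customers", ["CustomerId", "Region"], "AnalystRole"))
print(row_filter_policy("SalesFilter", "security.fn_region_predicate", "Region", "dbo.Sales"))
```

The helpers interpolate identifiers directly into the statement, so they assume trusted, hard-coded names rather than user input.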
If, in addition to the SQL queries on the data in your Data Warehouse, you need to execute other types of queries and data transformations, you can take advantage of the Spark engine to create processes in notebooks, similar to how you would do it in other technologies, such as Databricks.
The Spark pool allows you to use Python, SQL and even C# (.NET) to process data in a Big Data environment, which you can configure and scale according to your requirements: you decide how many machines you need, or let the pool scale automatically according to the processing required at a given time.
You can create as many pools as you require, for example, one for production processes and another for your engineers and data scientists to explore the Data Warehouse or Data Lakes.
In addition to the pools, you can also query Data Lakes directly (with Azure Data Lake Storage Gen2). This lets you run SQL statements over files and directories in a simple way and pay only for what each query consumes, without keeping a cluster running for ad hoc queries. In other words, it is a serverless environment.
Serverless mode is already available in every Synapse Analytics environment. To use it, you only need to connect Synapse to the Azure Data Lake Storage Gen2 repository (and, of course, have adequate governance over that repository to take full advantage of the serverless engine).
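To make the serverless idea more concrete, here is a minimal sketch of a helper that assembles an `OPENROWSET` query over Parquet files in a Data Lake. The function name, storage account, container, and path are hypothetical placeholders, and the sketch assumes Parquet files and that the serverless endpoint already has access to the storage account.

```python
def serverless_parquet_query(storage_account: str, container: str, path: str, top: int = 10) -> str:
    """Build a T-SQL query that reads Parquet files directly from the lake."""
    # Data Lake Storage Gen2 exposes files through the dfs endpoint.
    url = f"https://{storage_account}.dfs.core.windows.net/{container}/{path}"
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    FORMAT = 'PARQUET'\n"
        f") AS [result];"
    )


# Hypothetical account, container, and folder names.
print(serverless_parquet_query("mydatalake", "sales", "2021/*.parquet"))
```

The resulting statement can be run from the serverless SQL endpoint of the workspace, and no cluster has to be provisioned beforehand.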
If you want to learn more about Synapse, Pluralsight has several courses dedicated to this great technology. Here are some of them: