In this course, you will learn how to create big data machine learning experiments using the Microsoft Machine Learning Server. Detailed code examples in both R and Python demonstrate how to scale your code and work with Apache Spark and SQL Server.
Big data often exceeds the capacity of in-memory data frames. In this course, Scalable Machine Learning with the Microsoft Machine Learning Server, you will learn how to build scalable, end-to-end machine learning experiments in both R and Python with the Microsoft Machine Learning Server. First, you will learn how to import, process, transform, and visualize big data. Next, we will cover how to write custom, scalable, distributed functions that can be executed in a number of compute contexts. In addition, you will learn how to use the state-of-the-art machine learning algorithms included in the MicrosoftML package. Then, we will integrate machine learning experiments into SQL Server. Finally, we will cover how to use the Machine Learning Server with Hadoop and Spark, including integration with popular frameworks such as PySpark, SparkR, and sparklyr. We will spin up an HDInsight cluster in Microsoft Azure, and also build a Spark development environment from scratch. When you’re finished with this course, you will have the skills and knowledge needed to build scalable machine learning experiments in R and Python using XDF files, the Hadoop Distributed File System, SQL Server, and Apache Spark.
Course Overview
Hi everyone. My name is Shawn Hainsworth. Welcome to my course, Scalable Machine Learning with the Microsoft Machine Learning Server. I am a Microsoft Certified Solutions Associate in Machine Learning. I work in business intelligence and data analytics solutions for the legal industry, and I blog as the legalbiguy. In this course, we are going to break the memory barrier and perform machine learning on big data. The major topics that we will cover include scaling data processing and visualization, distributing machine learning across processors and partitions, building machine learning pipelines with SQL Server, and building machine learning pipelines with Hadoop and Spark. We will work through detailed code examples in both R and Python. We will spin up a Hadoop and Spark cluster using HDInsight on Microsoft Azure and integrate with popular frameworks like PySpark, SparkR, and sparklyr. By the end of this course, you'll know how to write highly scalable distributed machine learning algorithms that run in a variety of compute contexts. We will develop in RStudio, Visual Studio Code, and Jupyter Notebooks. In addition, I will show you how to get started with your own Spark development cluster. Before beginning the course, you should have some experience programming in either R or Python, and it will be helpful, although not necessary, to have some familiarity with SQL Server, Hadoop, and Spark. I hope you will join me on this journey to learn machine learning techniques for big data with the Scalable Machine Learning with the Microsoft Machine Learning Server course at Pluralsight.