In the modern world, data has become a critical factor in how companies make business decisions and understand their customers. But making sense of this data is no small or cheap task. In this course, Managing Big Data with AWS Storage Options, you will learn how to process large amounts of data generated by your company using Amazon Web Services. First, you will learn how to collect, process, and store petabytes of data in Amazon Redshift using AWS Glue. Next, you will discover how to process and store real-time data using Amazon Kinesis Data Firehose. Finally, you will explore how to use Amazon Elastic MapReduce to provision clusters and perform analysis on your big data, turning it into meaningful insights. When you are finished with this course, you will be able to use Amazon Web Services to set up an infrastructure that collects and analyzes petabytes of data, taking your business to the next level. To get the most out of this course, you will need an AWS account to follow along with the exercises.
Course Overview

Hi everyone, my name is Nertil, and welcome to my course, Managing Big Data with AWS Storage Options. In today's world, data is the new oil, and managing that data to derive insights about your business and customers could be just what you need to take your firm to the next level. The best part is that with Amazon Web Services, you don't need any infrastructure of your own to get up and running. In this course, you will learn how to collect, store, and process big data for your firm using Amazon Redshift and Amazon Elastic MapReduce. Some of the major topics that we will cover include getting started with Amazon Redshift, using AWS Glue to perform ETL into our data warehouse, processing real-time data with Amazon Kinesis Data Firehose for real-time reports, provisioning EMR clusters for different scenarios, and securing and encrypting data on our EMR clusters. By the end of the course, you'll know how to use AWS Glue and Amazon Kinesis Data Firehose to populate your data warehouse with data from the different sources generated by your firm, and how to provision EMR clusters that digest petabytes of unstructured data into meaningful insights, so you can better understand your customers and your firm. Before beginning this course, I recommend being familiar with ETL, with writing ETL scripts using the PySpark framework, and with the Hadoop framework for processing large datasets across clusters. I hope you'll join me on this journey to learn big data solutions with the Managing Big Data with AWS Storage Options course, at Pluralsight.