Implement a Data Ingestion Solution Using AWS Glue
In this lab, you'll practice ingesting semi-structured JSON sample data from a source S3 data store into the AWS Glue Data Catalog. When you're finished, you'll have configured an AWS Glue crawler that scans S3 for new data on a 12-hour schedule.
Obtain Source Data Files
Download the JSON source data files, which will be uploaded to S3 and ingested by AWS Glue, from the official AWS Samples GitHub repository.
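The lab doesn't specify the exact repository URL, but fetching raw files from GitHub follows a single URL pattern. A minimal Python sketch; the `aws-samples/example-repo` name and file path are placeholders, not the lab's real repository:

```python
# Sketch of fetching sample files from GitHub over HTTPS. The owner,
# repo, branch, and file path used below are placeholders -- substitute
# the repository and files named in the lab instructions.
from urllib.parse import quote


def raw_github_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file in a repository."""
    return (
        "https://raw.githubusercontent.com/"
        f"{owner}/{repo}/{branch}/{quote(path)}"
    )

# To actually download a file (requires network access):
#   from urllib.request import urlretrieve
#   urlretrieve(raw_github_url("aws-samples", "example-repo", "main",
#                              "data/sample.json"), "sample.json")
```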
Create an S3 Bucket
Provision an S3 bucket that will be used by AWS Glue as the primary data store for its Data Catalog.
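If you prefer to script this step rather than use the console, a boto3 sketch follows. The bucket name is an assumption, not a value from the lab; note that the S3 API rejects a `CreateBucketConfiguration` naming `us-east-1`, so that region is special-cased:

```python
def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build the keyword arguments for the S3 CreateBucket call.

    For us-east-1 the CreateBucketConfiguration block must be omitted;
    for every other region it must name the region explicitly.
    """
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# With AWS credentials configured, the call itself would be:
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   s3.create_bucket(**create_bucket_kwargs("my-glue-lab-data", "us-east-1"))
```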
Upload Source Data Files to S3
Load the JSON source data files into the S3 data store.
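A scripted version of the upload might look like the sketch below. The `raw/` key prefix and the bucket name in the comment are assumptions; the helper simply maps each local JSON file to the S3 key it would be stored under:

```python
from pathlib import Path


def s3_keys(local_dir: str, prefix: str = "raw/") -> dict:
    """Map each local .json file to the S3 object key it will be uploaded under."""
    return {
        str(p): prefix + p.name
        for p in sorted(Path(local_dir).glob("*.json"))
    }

# With boto3, the upload loop would be:
#   import boto3
#   s3 = boto3.client("s3")
#   for local_path, key in s3_keys("./data").items():
#       s3.upload_file(local_path, "my-glue-lab-data", key)
```

Keeping the files under a common prefix such as `raw/` lets the crawler target that prefix rather than the whole bucket.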
Create the Crawler
Provision a crawler in AWS Glue to populate the AWS Glue Data Catalog every 12 hours with tables from the source S3 data store.
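Glue's `CreateCrawler` API takes a `Schedule` in AWS cron syntax, where `cron(0 */12 * * ? *)` fires every 12 hours. A sketch of the request parameters; the crawler name, database name, role ARN, and S3 path here are assumptions, not values from the lab:

```python
def crawler_config(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Build the parameters for Glue CreateCrawler with a 12-hour schedule."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # AWS cron fields: minutes hours day-of-month month day-of-week year
        "Schedule": "cron(0 */12 * * ? *)",
    }

# With credentials and an IAM role that can read the bucket:
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**crawler_config(
#       "json-ingest-crawler",
#       "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder ARN
#       "lab_database",
#       "s3://my-glue-lab-data/raw/",
#   ))
```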
Manually Run the Crawler
Run the crawler manually to verify that it performs as expected and to populate the AWS Glue Data Catalog with tables from the source S3 data store.
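A manual run can also be scripted: start the crawler, then poll `GetCrawler` until its state returns to `READY`. The polling helper below is a sketch with the state lookup injected as a callable, so it works against the live API or a stub; the crawler name in the comment is an assumption:

```python
import time


def wait_for_crawler(get_state, poll_seconds: float = 15.0,
                     timeout: float = 900.0) -> str:
    """Poll get_state() until the crawler reports READY, then return that state.

    get_state is any callable returning the crawler state string
    (e.g. "RUNNING", "STOPPING", "READY").
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state == "READY":
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("crawler did not return to READY before the timeout")

# Against the live API this would be wired up as:
#   import boto3
#   glue = boto3.client("glue")
#   glue.start_crawler(Name="json-ingest-crawler")
#   wait_for_crawler(
#       lambda: glue.get_crawler(Name="json-ingest-crawler")["Crawler"]["State"]
#   )
```

Once the run finishes, the discovered tables appear in the Data Catalog database the crawler was configured to use.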
Provided environment for hands-on practice
We will provide the credentials and environment necessary for you to practice right within your browser.
Follow along with the author’s guided walkthrough and build something new in your provided environment!
- Amazon S3
- AWS Glue