Skip to content

Contact sales

By filling out this form and clicking submit, you acknowledge our privacy policy.
  • Labs icon Lab
  • A Cloud Guru
Google Cloud Platform icon
Labs

Querying Data in S3 with Amazon Athena

This hands-on lab allows you to practice analyzing data stored in S3 using SQL queries in AWS Athena. Additionally, we use AWS Glue to reduce storage costs and increase the efficiency of query scans. We'll walk through an interactive query service which helps to analyze data for various AWS services, including CloudFront access logs. You create a table, load the data partitions, and query the data in the table using SQL. Note: Until a new version of this lab is created, please use the legacy Athena interface for the query editor.

Google Cloud Platform icon
Labs

Path Info

Level
Clock icon Intermediate
Duration
Clock icon 1h 0m
Published
Clock icon May 10, 2019

Contact sales

By filling out this form and clicking submit, you acknowledge our privacy policy.

Table of Contents

  1. Challenge

    Create a Table from S3 Bucket Metadata

    1. Navigate to the Amazon Athena service:
    • Click Get Started if this is our first trip into Athena, otherwise continue to #2
    1. First, add an S3 location for your queries by clicking on the 'Before you run your first query, you need to set up a query result location in Amazon S3.' link
    2. Paste in the S3 Bucket ARN we copied earlier, being sure to remove "arn:aws:s3:::" from the beginning of the data we paste in and including a trailling slash
    3. Once the S3 location is properly configured you will notice the Run query button has been made active.
    4. In the query editor paste the following query, then press Ctrl+Enter to run the query:
    CREATE database aws_service_logs
    
    1. Under Tables, select Create Table > from S3 bucket data.
    2. Step 1: Name and Location:
    • Database: aws_service_logs
    • Table: cf_access_optimized
    • Location: s3://Name of the generated S3 bucket/ (including trailing slash)
    1. Step 2: Data Format
    • Select Parquet
    1. Step 3: Columns
    • Bulk add columns using this data:
    time timestamp, location string, bytes bigint, requestip string, method string, host string, uri string, status int, referrer string, useragent string, querystring string, cookie string, resulttype string, requestid string, hostheader string, requestprotocol string, requestbytes bigint, timetaken double, xforwardedfor string, sslprotocol string, sslcipher string, responseresulttype string, httpversion string
    
    1. Step 4: Partitions
    • Column Name: year, Column Type: string
    • Column Name: month, Column Type: string
    • Column Name: day, Column Type: string
    • Click Create table
    1. Click Run query on the generated SQL statement. Ensure the S3 bucket location in the query matches the one generated in your lab environment.
  2. Challenge

    Add Partition Metadata

    1. Open a new query tab
    2. Run the following query: MSCK REPAIR TABLE aws_service_logs.cf_access_optimized
    3. Verify the partitions were created with the following query: SELECT count(*) AS rowcount FROM aws_service_logs.cf_access_optimized. You should see 207535 rows present in the table.
    4. Run the following query: SELECT * FROM aws_service_logs.cf_access_optimized LIMIT 10
  3. Challenge

    Query the Total Bytes Served in a Date Range

    1. Perform the following query:
    SELECT SUM(bytes) AS total_bytes
    FROM aws_service_logs.cf_access_optimized
    WHERE time BETWEEN TIMESTAMP '2018-11-02' AND TIMESTAMP '2018-11-03'
    
    1. Observe the value for total_bytes equals 87310409.

The Cloud Content team comprises subject matter experts hyper focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!

What's a lab?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Provided environment for hands-on practice

We will provide the credentials and environment necessary for you to practice right within your browser.

Guided walkthrough

Follow along with the author’s guided walkthrough and build something new in your provided environment!

Did you know?

On average, you retain 75% more of your learning if you get time for practice.

Start learning by doing today

View Plans