Lab · A Cloud Guru

Categorizing Uploaded Data Using AWS Step Functions

AWS provides Step Functions as a way to manage the flow of information through a pipeline of steps. A state machine can call services such as Lambda, Glue, Athena, and DynamoDB, make basic decisions, and wait for work to complete. Step Functions passes state information from one step to the next, and each step can act on that state. In this lab, we'll build a serverless pipeline that transcribes audio to text and sorts the data based on keywords in the transcript.

Prerequisites

This is an advanced lab designed to challenge you, and it rewards you with valuable experience in several parts of AWS. While solution code and a full walkthrough are available, you should attempt to solve the challenges on your own. Because this lab focuses on Step Functions, some resources and the more complex logic code are already provided in the lab environment. To get the most out of this lab, you should be familiar with the following:

  • IAM roles
  • S3 buckets
  • S3 events
  • Lambda functions
  • Python


Path Info

Level: Advanced
Duration: 1h 30m
Date: Feb 08, 2021


Table of Contents

  1. Challenge

    Prepare to Launch the Step Function

    While some resources have been provided, you will need to complete the following steps to finish configuring the environment.

    1. Create an IAM role that allows Step Functions to invoke Lambda functions.
    2. Create a Step Functions state machine. Use the default configuration for now; you will define it properly later. Note the state machine's ARN, because you need it in the next step.
    3. In the run-step-function-lambda function, set the STATEMACHINEARN environment variable to the ARN of the state machine you just created.
    4. Create an S3 event notification on the provided bucket that invokes run-step-function-lambda.
    5. Restrict the notification to .mp3 objects created in the upload folder (prefix upload/, suffix .mp3).
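The configuration in Challenge 1 can also be scripted rather than clicked through the console. The following is a minimal Python sketch, assuming boto3 is installed for the final call; the function ARN is a hypothetical placeholder and the helper names are illustrative, not part of the lab. It builds the trust policy that lets Step Functions assume the role, a permissions policy allowing lambda:InvokeFunction, and an S3 notification configuration whose prefix and suffix filter rules limit events to .mp3 objects under upload/.

```python
import json

# Hypothetical placeholder: substitute the real ARN of run-step-function-lambda.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:run-step-function-lambda"


def step_functions_trust_policy() -> dict:
    """Trust policy letting the Step Functions service assume the role (step 1)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "states.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def invoke_lambda_policy(function_arns: list) -> dict:
    """Permissions policy allowing the role to invoke the listed Lambda functions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arns,
            }
        ],
    }


def upload_notification_config(lambda_arn: str) -> dict:
    """Notification that fires only for .mp3 objects created under upload/ (steps 4-5)."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "upload/"},
                            {"Name": "suffix", "Value": ".mp3"},
                        ]
                    }
                },
            }
        ]
    }


def apply_notification(bucket: str, lambda_arn: str) -> None:
    """Attach the notification to the bucket. Requires AWS credentials."""
    import boto3  # imported here so the builders above work without boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=upload_notification_config(lambda_arn),
    )


if __name__ == "__main__":
    print(json.dumps(step_functions_trust_policy(), indent=2))
    print(json.dumps(upload_notification_config(LAMBDA_ARN), indent=2))
```

Two caveats if you take the API route: put_bucket_notification_configuration replaces the bucket's entire notification configuration rather than appending to it, and S3 also needs a resource-based permission to invoke run-step-function-lambda. The console adds that permission for you; through the API it is a separate Lambda add_permission call.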
  2. Challenge

    Create the Step Function Flow

    Implement the following logic as a Step Functions state machine. Check the lab instructions for resources that are already provided and handle some of these steps for you. You can also refer to the lab diagram for an example architecture.

    1. Start an Amazon Transcribe job to transcribe the audio uploaded to S3 into text.
    2. The Transcribe job can take a few minutes to run, so wait for it to complete.
    3. After 30 seconds, check the status of the Transcribe job to see if it has completed.
    4. If the Transcribe job has not completed, wait another 30 seconds and check its status again.
    5. If the Transcribe job has failed, stop the pipeline with an error.
    6. Once the Transcribe job has completed successfully, move the audio and transcript into categories based on the presence of keywords in the transcript.
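This flow maps naturally onto Amazon States Language: two Lambda Task states, a Wait state, a Choice state that loops back to the Wait, a Fail state, and a final Task. Below is a minimal sketch that emits such a definition as JSON from Python. The state names, the Lambda ARN prefix, and the $.transcribeStatus path (where transcribe-status-lambda is assumed to store the job status) are hypothetical; adjust them to match the provided functions.

```python
import json

# Hypothetical ARN prefix: replace with the account and region of your lab.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"


def build_definition(arn_prefix: str = ARN_PREFIX, status_path: str = "$.transcribeStatus") -> dict:
    """Amazon States Language definition for the transcribe-then-categorize flow.

    status_path is an assumption: point it at whatever key
    transcribe-status-lambda writes the job status into.
    """
    return {
        "Comment": "Transcribe an uploaded MP3, poll until done, then categorize it",
        "StartAt": "TranscribeAudio",
        "States": {
            "TranscribeAudio": {  # step 1: start the Transcribe job
                "Type": "Task",
                "Resource": arn_prefix + "transcribe-audio-lambda",
                "Next": "WaitForTranscribe",
            },
            "WaitForTranscribe": {  # steps 2-3: pause 30 seconds before polling
                "Type": "Wait",
                "Seconds": 30,
                "Next": "CheckTranscribeStatus",
            },
            "CheckTranscribeStatus": {
                "Type": "Task",
                "Resource": arn_prefix + "transcribe-status-lambda",
                "Next": "IsTranscribeDone",
            },
            "IsTranscribeDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": status_path, "StringEquals": "COMPLETED", "Next": "CategorizeData"},
                    {"Variable": status_path, "StringEquals": "FAILED", "Next": "TranscribeFailed"},
                ],
                # step 4: QUEUED or IN_PROGRESS loops back for another wait
                "Default": "WaitForTranscribe",
            },
            "TranscribeFailed": {  # step 5: stop the pipeline with an error
                "Type": "Fail",
                "Error": "TranscribeJobFailed",
                "Cause": "The Transcribe job reported FAILED",
            },
            "CategorizeData": {  # step 6: move audio and transcript into a category
                "Type": "Task",
                "Resource": arn_prefix + "categorize-data-lambda",
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

You could paste the printed JSON into the state machine's definition editor in the console, or pass it to aws stepfunctions update-state-machine with --state-machine-arn and --definition. Routing every status other than COMPLETED or FAILED to the Default branch keeps the polling loop simple: anything still queued or running just waits another 30 seconds.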
  3. Challenge

    Create the Lambda Business Logic

    The Lambda functions used in the Step Functions flow are partially written, but a lot remains to be done. Complete every TODO line.

    These functions use the IAM role transcribe-audio-lambda-role, which gives them access to S3, Transcribe, and CloudWatch.

    1. Finish writing transcribe-audio-lambda.
      • Get the state from the function's input.
      • Construct the parameters needed for the Transcribe job.
      • Add the transcript's file key and the Transcribe job name to the state.
    2. Finish writing transcribe-status-lambda.
      • Retrieve the Transcribe job name from the state.
      • Add the job's status to the state.
    3. Finish writing categorize-data-lambda.
      • Retrieve the state information. You've done this twice already; do you remember how?
      • Extract pieces of the state for further processing.
      • Determine the output location. The preceding code blocks in the function show how.
      • Move the files to their proper destinations.
      • Note that this function relies on the environment variable KEYWORDS, a comma-separated list of sensitive words. If you are using the provided audio file, set it to important if you want the audio to match, or to boring if you don't.
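For transcribe-audio-lambda and transcribe-status-lambda, the TODOs revolve around one pattern: with the default InputPath, ResultPath, and OutputPath settings, the Lambda input is the current state, and whatever the handler returns becomes the input to the next state. A minimal sketch follows, assuming the incoming state holds the upload's location under hypothetical "bucket" and "key" keys, that the new keys (transcribeJobName, transcriptKey, transcribeStatus) are names of your choosing, and that the audio is English. The provided skeletons may use different names. The pure helpers are kept apart from the boto3 calls so they can be checked without AWS access.

```python
import uuid


def build_transcribe_params(state: dict) -> dict:
    """Parameters for Transcribe's StartTranscriptionJob call."""
    job_name = "transcribe-" + uuid.uuid4().hex  # job names must be unique
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # assumption: English-language audio
        "MediaFormat": "mp3",
        "Media": {"MediaFileUri": f"s3://{state['bucket']}/{state['key']}"},
        # With only OutputBucketName set, Transcribe writes <job name>.json
        # to the root of this bucket.
        "OutputBucketName": state["bucket"],
    }


def add_job_to_state(state: dict, params: dict) -> dict:
    """Record the job name and the transcript's key for later states."""
    job_name = params["TranscriptionJobName"]
    return {**state, "transcribeJobName": job_name, "transcriptKey": f"{job_name}.json"}


def add_status_to_state(state: dict, response: dict) -> dict:
    """Copy the job status (QUEUED, IN_PROGRESS, COMPLETED, FAILED) into the state."""
    status = response["TranscriptionJob"]["TranscriptionJobStatus"]
    return {**state, "transcribeStatus": status}


def transcribe_audio_handler(event, context):
    """Sketch of transcribe-audio-lambda: the event is the incoming state."""
    import boto3

    params = build_transcribe_params(event)
    boto3.client("transcribe").start_transcription_job(**params)
    return add_job_to_state(event, params)  # becomes the next state's input


def transcribe_status_handler(event, context):
    """Sketch of transcribe-status-lambda."""
    import boto3

    response = boto3.client("transcribe").get_transcription_job(
        TranscriptionJobName=event["transcribeJobName"]
    )
    return add_status_to_state(event, response)
```

Returning a new dictionary that spreads the incoming state keeps every earlier key (bucket, key, and so on) available to later steps, which categorize-data-lambda depends on.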

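For categorize-data-lambda, the core work is reading the transcript, checking it against KEYWORDS, and moving both files. The sketch below assumes hypothetical state keys (bucket, key, transcriptKey) and hypothetical destination folders sensitive/ and general/; the lab's code may name them differently. It reads the text from Transcribe's standard output document (results.transcripts[0].transcript) and matches whole words case-insensitively, so a keyword such as important does not match unimportant. Because S3 has no rename operation, a move is a copy followed by a delete.

```python
import json
import os

# Hypothetical destination folders; the lab's own code may use different names.
MATCH_FOLDER = "sensitive/"
NO_MATCH_FOLDER = "general/"


def parse_keywords(raw: str) -> list:
    """Split the comma-separated KEYWORDS value, dropping blanks and case."""
    return [word.strip().lower() for word in raw.split(",") if word.strip()]


def transcript_text(transcribe_output) -> str:
    """Extract the full transcript from Transcribe's JSON output (str or bytes)."""
    return json.loads(transcribe_output)["results"]["transcripts"][0]["transcript"]


def choose_folder(text: str, keywords: list) -> str:
    """Pick the destination folder by whole-word, case-insensitive matching."""
    words = {token.strip(".,?!;:\"'").lower() for token in text.split()}
    return MATCH_FOLDER if any(k in words for k in keywords) else NO_MATCH_FOLDER


def destination_key(folder: str, key: str) -> str:
    """Keep the file name, swap the folder: upload/a.mp3 -> sensitive/a.mp3."""
    return folder + key.rsplit("/", 1)[-1]


def categorize_data_handler(event, context):
    """Sketch of categorize-data-lambda, using hypothetical state keys."""
    import boto3

    s3 = boto3.client("s3")
    bucket = event["bucket"]
    body = s3.get_object(Bucket=bucket, Key=event["transcriptKey"])["Body"].read()
    keywords = parse_keywords(os.environ.get("KEYWORDS", ""))
    folder = choose_folder(transcript_text(body), keywords)

    # S3 has no rename: "moving" an object is a copy followed by a delete.
    for key in (event["key"], event["transcriptKey"]):
        s3.copy_object(
            Bucket=bucket,
            Key=destination_key(folder, key),
            CopySource={"Bucket": bucket, "Key": key},
        )
        s3.delete_object(Bucket=bucket, Key=key)
    return {**event, "category": folder}
```

Whole-word matching keeps false positives down, but splitting on whitespace means a multi-word keyword would never match; if you need phrases, search the lowercased transcript text instead.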

  4. Challenge

    Categorize Audio Data

    With the previous steps complete, our serverless app is ready to do work! Let's test it out.

    1. Create an upload folder in the provided S3 bucket.
    2. Upload a test audio file to the upload folder.
    3. In the Step Functions console, view the input and output of each step as the state machine executes.
    4. Once the execution has completed, check the upload folder of the S3 bucket to confirm the file is no longer there.
    5. View the folder you categorized your audio into.
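These checks are normally done in the console, but they can also be scripted. This is a sketch, assuming boto3 and credentials for the lab account; the bucket name, state machine ARN, and local file name are placeholders you supply, and the helper names are illustrative. Run the steps separately, because the Transcribe wait loop means an execution can take a few minutes to finish.

```python
import os

UPLOAD_FOLDER = "upload/"


def upload_key(local_path: str, folder: str = UPLOAD_FOLDER) -> str:
    """S3 key for a local file dropped into the upload folder."""
    return folder + os.path.basename(local_path)


def leftover_files(keys: list, folder: str = UPLOAD_FOLDER) -> list:
    """Keys still under the folder, ignoring the zero-byte folder marker
    that the S3 console's "Create folder" button writes."""
    return [k for k in keys if k.startswith(folder) and k != folder]


def upload_test_file(bucket: str, local_path: str) -> None:
    """Step 2: uploading the .mp3 triggers the event notification."""
    import boto3

    boto3.client("s3").upload_file(local_path, bucket, upload_key(local_path))


def show_recent_executions(state_machine_arn: str, limit: int = 3) -> None:
    """Step 3: print status, input, and output of the newest executions."""
    import boto3

    sfn = boto3.client("stepfunctions")
    # ListExecutions returns the most recent executions first.
    listing = sfn.list_executions(stateMachineArn=state_machine_arn, maxResults=limit)
    for execution in listing["executions"]:
        detail = sfn.describe_execution(executionArn=execution["executionArn"])
        # "output" is only present once an execution has succeeded.
        print(detail["status"], detail["input"], detail.get("output"))


def check_upload_folder(bucket: str) -> list:
    """Step 4: after a successful run, this should return an empty list."""
    import boto3

    response = boto3.client("s3").list_objects_v2(Bucket=bucket, Prefix=UPLOAD_FOLDER)
    return leftover_files([obj["Key"] for obj in response.get("Contents", [])])
```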

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
