
Utilizing Write Sharding to Optimize Data Ingestion

In this lab, we investigate and improve a DynamoDB table loading script that is losing data by modifying the data item partition key to shard the table partitions.


Path Info

Level: Advanced
Duration: 45m
Published: Mar 13, 2020


Table of Contents

  1. Challenge

    Investigate Provided Instance and Data

    Log in to the provided EC2 instance with the credentials provided in the lab information. Look at the data being written to the table with the following command:

    cat dataload/bin_meta_10 | jq .
    
  2. Challenge

    (OPTIONAL) Run Unmodified `loadtable.py`

    Run the unmodified loadtable.py with the following command, and observe the results in the DynamoDB web console:

    python3 loadtable.py > load.log &
    

    (If you run this step, delete and re-create the AmazonBins table afterward, with a partition key of Partition (a string) and a sort key of bin (also a string), so that the modified script starts from an empty table.)
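If you prefer to re-create the table from a script rather than the console, the key schema described above can be sketched as follows. This is a minimal sketch, not the lab's own tooling: the helper names `amazon_bins_table_spec` and `recreate_table` are hypothetical, and the read and write capacity values are left as parameters because the lab text does not state them.

```python
def amazon_bins_table_spec(read_units, write_units):
    """Build CreateTable arguments for the AmazonBins table.

    The key names and types come from the lab text; the capacity values
    are assumptions supplied by the caller.
    """
    return {
        "TableName": "AmazonBins",
        "KeySchema": [
            {"AttributeName": "Partition", "KeyType": "HASH"},  # partition key
            {"AttributeName": "bin", "KeyType": "RANGE"},       # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "Partition", "AttributeType": "S"},
            {"AttributeName": "bin", "AttributeType": "S"},
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


def recreate_table(read_units, write_units):
    """Delete and re-create AmazonBins (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the spec can be inspected offline

    client = boto3.client("dynamodb")
    client.delete_table(TableName="AmazonBins")
    client.get_waiter("table_not_exists").wait(TableName="AmazonBins")
    client.create_table(**amazon_bins_table_spec(read_units, write_units))
    client.get_waiter("table_exists").wait(TableName="AmazonBins")
```

Calling `recreate_table` with the same capacity settings the original table used keeps the before and after runs comparable.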

  3. Challenge

    Modify `loadtable.py`

    Edit the transform function in loadtable.py to modify the Partition key for each item to create multiple partitions in the table. This can be accomplished by:

    • Generating random characters
    • Assigning alphanumeric partitions to records
    • Using some value from the existing data to increase the cardinality of the values stored in the Partition key
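The three approaches listed above can be sketched as follows. This is an illustration under stated assumptions, not the lab's solution: it assumes each item is a plain dict carrying `Partition` and `bin` attributes, the `transform` shown here is a hypothetical stand-in for the one in loadtable.py, and `SHARD_COUNT` is an arbitrary value.

```python
import hashlib
import random
import string

SHARD_COUNT = 10  # hypothetical; more shards spread writes over more partition keys


def random_suffix_shard(base_key, shard_count=SHARD_COUNT):
    """Append a random shard number to the base key."""
    return f"{base_key}#{random.randrange(shard_count)}"


def random_char_shard(base_key, length=1):
    """Append random lowercase alphanumeric characters to the base key."""
    alphabet = string.ascii_lowercase + string.digits
    return f"{base_key}#{''.join(random.choices(alphabet, k=length))}"


def calculated_shard(base_key, value, shard_count=SHARD_COUNT):
    """Derive the shard number from an existing attribute value.

    The same value always maps to the same shard, so a reader that knows
    the value can recompute the full partition key.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return f"{base_key}#{int(digest, 16) % shard_count}"


def transform(item):
    """Return a copy of the item with a sharded Partition key."""
    sharded = dict(item)
    sharded["Partition"] = calculated_shard(item["Partition"], item["bin"])
    return sharded
```

Random suffixes spread writes evenly, but a reader must then query every shard to find an item. Deriving the suffix from existing data keeps single-item lookups possible, at the cost of depending on how evenly that data is distributed.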
  4. Challenge

    Run Modified `loadtable.py`

    Run your modified version of loadtable.py:

    python3 loadtable.py > load2.log &
    
  5. Challenge

    Observe Results

    In the DynamoDB console, open the table's Metrics tab and observe the write capacity unit (WCU) usage, throttled write requests, and throttled write events. WCU usage should be around 2000, and both throttle metrics should be zero.
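The same figures can also be read from CloudWatch with a script. The sketch below is one possible approach, not part of the lab: the helper names are hypothetical, and it assumes the table is named AmazonBins. CloudWatch reports `ConsumedWriteCapacityUnits` as a sum per period, so the sum is divided by the period length to get a per-second figure comparable to a WCU rate.

```python
from datetime import datetime, timedelta, timezone


def write_metric_query(metric_name, minutes=30, period=60):
    """Build GetMetricStatistics arguments for a table-level DynamoDB metric."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "TableName", "Value": "AmazonBins"}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Sum"],
    }


def per_second(datapoint_sum, period=60):
    """Convert a per-period sum into an average per-second rate."""
    return datapoint_sum / period


def fetch_datapoints(metric_name):
    """Fetch datapoints, oldest first (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work offline

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**write_metric_query(metric_name))
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


def summarize():
    """Print consumed WCU per second and the total of write throttle events."""
    for point in fetch_datapoints("ConsumedWriteCapacityUnits"):
        print(point["Timestamp"], round(per_second(point["Sum"])))
    throttles = sum(p["Sum"] for p in fetch_datapoints("WriteThrottleEvents"))
    print("write throttle events:", throttles)
```

For example, a one-minute datapoint with a sum of 120000 consumed units corresponds to `per_second(120000)`, or 2000 units per second.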

The Cloud Content team comprises subject matter experts hyper-focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and much more.

