Utilizing Write Sharding to Optimize Data Ingestion
In this lab, we investigate a DynamoDB table-loading script that is losing data, then fix it by modifying each item's partition key so that writes are sharded across the table's partitions.
Investigate Provided Instance and Data
Log in to the provided EC2 instance using the credentials in the lab information. Inspect the data being written to the table with the following command:
cat dataload/bin_meta_10 | jq
(OPTIONAL) Run Unmodified `loadtable.py`
Run the unmodified `loadtable.py` with the following command, and observe the results in the DynamoDB web console:
python3 loadtable.py > load.log &
(If you do this step, re-create the `AmazonBins` table with a partition key of `Partition`, which is a string, and a sort key of `bin`, which is also a string.)
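The table definition described in the note above could be expressed as the following sketch. The table name, key names, and key types come from the lab text; the billing mode is an assumption, so match whatever capacity settings the lab's original table used. The helper name `amazon_bins_table_definition` is hypothetical.

```python
def amazon_bins_table_definition() -> dict:
    """Build keyword arguments for a DynamoDB CreateTable call."""
    return {
        "TableName": "AmazonBins",
        # Both key attributes are strings ("S").
        "AttributeDefinitions": [
            {"AttributeName": "Partition", "AttributeType": "S"},
            {"AttributeName": "bin", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "Partition", "KeyType": "HASH"},  # partition key
            {"AttributeName": "bin", "KeyType": "RANGE"},       # sort key
        ],
        # Assumption: on-demand billing. Use the lab's original settings if they differ.
        "BillingMode": "PAY_PER_REQUEST",
    }


# Usage (requires boto3 and AWS credentials on the instance):
#   import boto3
#   boto3.client("dynamodb").create_table(**amazon_bins_table_definition())
```

Delete the existing table first and wait for the deletion to finish; creating a table whose name is still in use will fail.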
Edit `loadtable.py` to modify the `Partition` key for each item to create multiple partitions in the table. This can be accomplished by:
- Generating random characters
- Assigning alphanumeric partitions to records
- Using some value from the existing data to increase the cardinality of the values stored in the partition key
Run Modified `loadtable.py`
Run your modified version of `loadtable.py` with the following command:
python3 loadtable.py > load2.log &
In the Metrics tab of the DynamoDB console, observe the write capacity unit (WCU) usage, throttled write requests, and throttled write events. WCU usage should be around 2000, and both throttle metrics should be zero.
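As a rough check on these numbers: a single DynamoDB partition accepts at most about 1,000 WCU per second, so items that all share one partition key value cannot sustain 2000 WCU without throttling. The sketch below estimates a lower bound on the number of distinct partition key values needed. The `headroom` parameter and the function name are hypothetical, and distinct key values can still hash to the same physical partition, so treat the result as a floor rather than a target.

```python
import math

PARTITION_WCU_LIMIT = 1000  # per-partition write limit, in WCU per second


def min_shards(target_wcu: int, headroom: float = 1.0) -> int:
    """Lower bound on distinct partition key values for a target write rate."""
    return math.ceil(target_wcu * headroom / PARTITION_WCU_LIMIT)


print(min_shards(2000))                # 2
print(min_shards(2000, headroom=2.0))  # 4
```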