- Lab
-
Libraries: If you want this lab, consider one of these libraries.
- Data
Analyze Sales Trends with MongoDB Aggregations
Dive into MongoDB’s powerful aggregation framework in this hands-on, beginner-friendly lab! Learn how to filter recent sales, calculate revenue by product or category, identify top-selling items, and format results like a pro data analyst.
Lab Info
Table of Contents
-
Challenge
Introduction
Step 1: Introduction to Analyzing Sales Trends with MongoDB Aggregations
This Code Lab introduces you to analyzing sales data using MongoDB's powerful aggregation framework. You will work with the
salescollection in thesalesDBdatabase to explore sales trends. In this lab, you will learn how to filter recent sales, calculate total revenue by product or category, identify top-selling items, format output fields, and combine these operations into a multi-stage aggregation pipeline.The dataset available for this lab will be loaded into the
salescollection within thesalesDBdatabase. It contains aggregated sales records with the following fields:date: The date of the sales transactions, stored as anISODate.product_name: The name of the product sold.category: The category of the product.quantity: The total quantity of the product sold on that date.revenue: The total revenue generated from the product on that date.
These fields allow you to analyze sales trends from August 24, 2024, to May 23, 2025.
Starting Notes:
- MongoDB Environment: MongoDB is already running in the lab environment. You will load the dataset into the
salesDBdatabase using thesalescollection in the steps below. - Accessing the Data: You are currently in a shell environment. You will first load the dataset using
mongoimport, then enter the MongoDB Shell (mongosh) to interact with thesalesDBdatabase. - Dataset Structure: The dataset contains 436 aggregated sales records. The
datefield will be stored as anISODate(e.g.,ISODate("2024-08-24T00:00:00Z")), withproduct_nameandcategoryas strings,quantityas an integer, andrevenueas a double. - Initial Setup: Begin by loading the dataset and inspecting it to understand its structure and contents.
Task 1.1: Load the Sales Data into MongoDB
In the terminal, run the following command to load the sales data into the
salescollection in thesalesDBdatabase:mongoimport --db=salesDB --collection=sales --type=json --file=sales.json --dropExpected Output
You should see output similar to the following in the **MongoDB Shell**:2025-05-17T21:12:34.123+0000 connected to: mongodb://localhost/ 2025-05-17T21:12:34.456+0000 dropping: salesDB.sales 2025-05-17T21:12:34.789+0000 436 document(s) imported successfully. 0 document(s) failed to import.This command uses
mongoimportto load the data, and the--dropoption ensures that any existingsalescollection is dropped before importing the new data, avoiding duplicates.Task 1.2: Connect to the Database and Inspect the Sales Collection
In the terminal, run the following command to enter the MongoDB Shell for the
salesDBdatabase:mongosh salesDBOnce inside the MongoDB Shell (
mongosh), execute the following commands to inspect thesalescollection:Count the total number of documents:
db.sales.countDocuments()Expected Output
`436`The
countDocuments({})command confirms the total number of records (436) in the collection.Inspect the first 5 documents in the sales collection:
db.sales.find().limit(5)Expected Output
You should see output similar to the following in the **MongoDB Shell**:[ { _id: ObjectId('68282780d16f8059e36cfa97'), date: ISODate('2024-08-25T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 3, revenue: 38.97 }, { _id: ObjectId('68282780d16f8059e36cfa98'), date: ISODate('2024-08-24T00:00:00.000Z'), product_name: 'Shirt', category: 'Apparel', quantity: 2, revenue: 59.98 }, { _id: ObjectId('68282780d16f8059e36cfa99'), date: ISODate('2024-08-24T00:00:00.000Z'), product_name: 'Headphones', category: 'Electronics', quantity: 1, revenue: 59.99 }, { _id: ObjectId('68282780d16f8059e36cfa9a'), date: ISODate('2024-08-26T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 1, revenue: 49.99 }, { _id: ObjectId('68282780d16f8059e36cfa9b'), date: ISODate('2024-08-26T00:00:00.000Z'), product_name: 'TV', category: 'Electronics', quantity: 1, revenue: 399.99 } ]The
find().limit(5)command displays the first five documents, showing the structure of the aggregated data, including thedate,product_name,category,quantity, andrevenuefields.Now that you have connected to the
salesDBdatabase and inspected thesalescollection, you are ready to proceed with filtering and analyzing the sales data using MongoDB aggregation pipelines. -
Challenge
Use `$match` to filter orders from the last 6 months
Step 2: Use
$matchto Filter Orders From the Last 6 MonthsIn this step, you'll learn how to use the
$matchstage in MongoDB's aggregation framework to filter orders from the last six months. You'll explore the$gteand$lteoperators for date filtering, understand how to constructISODateobjects, and see how to filter by additional fields likecategory. By the end, you'll be able to select specific subsets of orders for further analysis.For this lab, assume today's date is May 23, 2025. You'll filter orders from the last six months, which covers November 23, 2024, to May 23, 2025, inclusive. The
salescollection in thesalesDBdatabase, loaded in Step 1, contains 436 records spanning August 24, 2024, to May 23, 2025.Understanding the
$matchStageThe
$matchstage filters documents in an aggregation pipeline based on specified conditions. It's similar to thefind()method but is designed to work within a pipeline, passing only matching documents to subsequent stages. Here, you'll use$matchto select orders from the last six months.Filtering by Date with
$gteand$lteLet's start with a general example of filtering orders between two specific dates. Suppose you want to filter orders placed between April 23, 2025, and May 23, 2025. In MongoDB, dates are stored as
ISODateobjects, and you can compare them using operators like$gte(greater than or equal to) and$lte(less than or equal to).To specify dates in a query, use the
ISODate()constructor with the formatISODate("YYYY-MM-DD"). For example:ISODate("2025-04-23")represents April 23, 2025.ISODate("2025-05-23")represents May 23, 2025.
Here's how you can use
$matchto filter orders within this date range:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2025-04-23"), $lte: ISODate("2025-05-23") } } } ])After running the code in MongoDB Shell, you should see a result similar to the following:
{ _id: ObjectId('68292accaeb470deaab86bd4'), date: ISODate('2025-04-23T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 1, revenue: 12.99 } { _id: ObjectId('68292accaeb470deaab86bd5'), date: ISODate('2025-04-24T00:00:00.000Z'), product_name: 'Shirt', category: 'Apparel', quantity: 2, revenue: 59.98 } { _id: ObjectId('68292accaeb470deaab86bd6'), date: ISODate('2025-04-24T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 3, revenue: 149.97 } ... Type "it" for moreFor your analysis, you'll filter orders from the last six months. Given today's date of May 23, 2025, this means orders from November 23, 2024, to May 23, 2025.
Note: Using both
$gteand$lteoperators is not mandatory; you can use just$gteif your dataset has no future dates.
Task 2.1: Filter Orders from the Last Six Months
In the MongoDB Shell, write an aggregation pipeline with a
$matchstage to filter orders wheredateis between November 23, 2024, and May 23, 2025, inclusive. Add a$limitstage to display the first 5 matching documents.Solution
db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23"), $lte: ISODate("2025-05-23") } } }, { $limit: 5 } ])Expected Output
You should see five orders with dates between November 23, 2024, and May 23, 2025. For example:{ _id: ObjectId('68292accaeb470deaab86abf'), date: ISODate('2024-11-23T00:00:00.000Z'), product_name: 'Backpack', category: 'Accessories', quantity: 2, revenue: 79.98 } { _id: ObjectId('68292accaeb470deaab86ac0'), date: ISODate('2024-11-23T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 4, revenue: 51.96 } { _id: ObjectId('68292accaeb470deaab86ac1'), date: ISODate('2024-11-24T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 1, revenue: 12.99 } { _id: ObjectId('68292accaeb470deaab86ac2'), date: ISODate('2024-11-24T00:00:00.000Z'), product_name: 'Pen', category: 'Stationery', quantity: 2, revenue: 5.98 } { _id: ObjectId('68292accaeb470deaab86ac3'), date: ISODate('2024-11-23T00:00:00.000Z'), product_name: 'Phone', category: 'Electronics', quantity: 4, revenue: 2799.96 }
Filtering by Category
You can add more conditions to
$matchto filter by other fields, such ascategory. For example, to select only "Electronics" orders from the last six months:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23"), $lte: ISODate("2025-05-23") }, category: "Electronics" } } ])This pipeline keeps only documents where
dateis within the last six months andcategoryis "Electronics."
Task 2.2: Filter Orders by Category
In the MongoDB Shell, write an aggregation pipeline to filter orders from the last six months in the "Apparel" category. Add a
$limitstage to display the first 3 matching documents.Solution
db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23"), $lte: ISODate("2025-05-23") }, category: "Apparel" } }, { $limit: 3 } ])Expected Output
You should see three orders from the "Apparel" category with dates between November 23, 2024, and May 23, 2025. For example:{ _id: ObjectId('68292accaeb470deaab86ac4'), date: ISODate('2024-11-24T00:00:00.000Z'), product_name: 'Shirt', category: 'Apparel', quantity: 2, revenue: 59.98 } { _id: ObjectId('68292accaeb470deaab86ad4'), date: ISODate('2024-12-01T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 2, revenue: 99.98 } { _id: ObjectId('68292accaeb470deaab86adc'), date: ISODate('2024-12-06T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 4, revenue: 199.96 }
Conclusion
In this step, you've learned how to use the
$matchstage to filter orders by date and category. The$gteand$lteoperators, combined withISODateobjects, allow precise date filtering, while additional conditions can refine your query further. In the next step, you'll use$groupto calculate total revenue per product or category, building on this filtered data. -
Challenge
Use `$group` to Calculate Total Revenue per Product or Category
Step 3: Use
$groupto Calculate Total Revenue per Product or CategoryIn this step, you'll learn how to use the
$groupstage in MongoDB's aggregation framework to calculate total revenue per product or category. You'll understand the role of the_idfield and accumulators like$sum, and see how to group by different fields or use multiple accumulators. By the end, you'll be able to summarize your sales data effectively.Understanding the
$groupStageThe
$groupstage groups documents by a specified field and applies accumulators to compute aggregated values for each group. It's perfect for summarizing data, such as calculating total revenue or total quantity sold.Calculating Total Revenue per Product
To calculate the total revenue for each product, you'll group the documents by
product_nameand use the$sumaccumulator to add up therevenuefield. Here's how you can do it:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } } ])In this pipeline:
_id: "$product_name"specifies that you want to group by theproduct_namefield.total_revenue: { $sum: "$revenue" }calculates the sum of therevenuefield for each group.
This will give you a list of documents, each showing a product and its total revenue as follows:
{ _id: 'Book', total_revenue: 1740.66 } { _id: 'TV', total_revenue: 36399.090000000004 } { _id: 'Pen', total_revenue: 475.41 } { _id: 'Backpack', total_revenue: 6638.34 } { _id: 'Shirt', total_revenue: 4708.429999999999 } { _id: 'Phone', total_revenue: 125998.2 } { _id: 'Shoes', total_revenue: 8748.25 } { _id: 'Headphones', total_revenue: 9358.44 }
Task 3.1: Calculate Total Revenue per Category
In the MongoDB Shell, write an aggregation pipeline to calculate the total revenue for each category. Use the
$groupstage with_id: "$category"and the$sumaccumulator for therevenuefield. Add a$limitstage to display the first 3 results.Solution
db.sales.aggregate([ { $group: { _id: "$category", total_revenue: { $sum: "$revenue" } } }, { $limit: 3 } ])Expected Output
You should see three documents, each showing a category and its total revenue:{ _id: 'Stationery', total_revenue: 475.41 } { _id: 'Apparel', total_revenue: 13456.68 } { _id: 'Accessories', total_revenue: 6638.34 }
Using Multiple Accumulators
You can use multiple accumulators in the same
$groupstage to calculate different aggregated values. For example, you might want to calculate both the total revenue and the total quantity sold for each product.Here's how you can do it:
db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" }, total_quantity: { $sum: "$quantity" } } } ])In this pipeline:
_id: "$product_name"groups the documents byproduct_name.total_revenue: { $sum: "$revenue" }calculates the sum ofrevenue.total_quantity: { $sum: "$quantity" }calculates the sum ofquantity.
This will give you documents with both total revenue and total quantity for each product:
{ _id: 'Headphones', total_revenue: 9358.44, total_quantity: 156 } { _id: 'Book', total_revenue: 1740.66, total_quantity: 134 } { _id: 'TV', total_revenue: 36399.090000000004, total_quantity: 91 } { _id: 'Shirt', total_revenue: 4708.429999999999, total_quantity: 157 } { _id: 'Pen', total_revenue: 475.41, total_quantity: 159 } { _id: 'Phone', total_revenue: 125998.2, total_quantity: 180 } { _id: 'Backpack', total_revenue: 6638.34, total_quantity: 166 } { _id: 'Shoes', total_revenue: 8748.25, total_quantity: 175 }
Task 3.2: Use Multiple Accumulators
In the MongoDB Shell, write an aggregation pipeline to calculate the total revenue and the total quantity sold for each category. Use the
$groupstage with_id: "$category", and include two accumulators: one for summingrevenueand another for summingquantity. Add a$limitstage to display the first 2 results.Solution
db.sales.aggregate([ { $group: { _id: "$category", total_revenue: { $sum: "$revenue" }, total_quantity: { $sum: "$quantity" } } }, { $limit: 2 } ])Expected Output
You should see two documents, each showing a category, its total revenue, and its total quantity sold. For example:{ _id: 'Stationery', total_revenue: 475.41, total_quantity: 159 } { _id: 'Books', total_revenue: 1740.66, total_quantity: 134 }
Conclusion
In this step, you've learned how to use the
$groupstage to calculate total revenue per product or category. You've seen how to group by different fields and use multiple accumulators to compute various aggregated values. In the next step, you'll use$sortand$limitto find the top-selling items, building on the skills you've gained here. -
Challenge
Use $sort and $limit to Find the Top 5 Best-Selling Items
Step 4: Use
$sortand$limitto Find the Top 5 Best-Selling ItemsIn this step, you'll learn how to use the
$sortand$limitstages in MongoDB's aggregation framework to identify top-performing products. You'll explore sorting data in ascending and descending order, combine these stages to find the top 5 best-selling products by revenue, and understand how to rank items effectively. By the end, you'll be able to highlight key performers in your sales data.For this lab, you'll continue working with the
salescollection in thesalesDBdatabase.Understanding the
$sortStageThe
$sortstage arranges documents in an aggregation pipeline based on a specified field. You can sort in ascending order (lowest to highest) or descending order (highest to lowest), making it ideal for ranking items like top- or bottom-selling categories or products.Sorting with
$sortTo sort data, you'll use the
$sortstage with a field and a sort direction:1for ascending or-1for descending. For example, to sort categories by total revenue, you first group the data (from Step 3) and then sort it.Here's an example that groups by
category, calculates total revenue, and sorts bytotal_revenuein ascending order to show the worst-selling categories first:db.sales.aggregate([ { $group: { _id: "$category", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: 1 } } ])In this pipeline:
$groupcalculates the total revenue for each category.$sort: { total_revenue: 1 }sorts the results bytotal_revenuein ascending order (lowest to highest).
Running this in the MongoDB Shell gives you a result like this:
{ _id: 'Stationery', total_revenue: 475.41 } { _id: 'Books', total_revenue: 1740.66 } { _id: 'Accessories', total_revenue: 6638.34 } { _id: 'Apparel', total_revenue: 13456.68 } { _id: 'Electronics', total_revenue: 171755.73 }If you want descending order (highest to lowest, for top performers), use
-1instead:{ $sort: { total_revenue: -1 } }This would reverse the order, starting with the highest
total_revenue(e.g., "Electronics").Combining
$sortand$limitBy combining
$sortand$limit, you can pinpoint the top performers in your data. For example, to find the top 5 best-selling products by revenue, you'll group byproduct_name, sort by total revenue in descending order, and limit to 5 results.
Task 4.1: Find the Top 5 Best-Selling Products by Revenue
In the MongoDB Shell, write an aggregation pipeline to find the top 5 best-selling products by total revenue. Use
$groupto calculate total revenue per product,$sortto order by revenue in descending order, and$limitto show only the top 5 results.Solution
db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } }, { $limit: 5 } ])Expected Output
You should see five documents, each showing a product and its total revenue, sorted from highest to lowest:{ _id: 'Phone', total_revenue: 125998.2 } { _id: 'TV', total_revenue: 36399.09 } { _id: 'Headphones', total_revenue: 9358.44 } { _id: 'Shoes', total_revenue: 8748.25 } { _id: 'Backpack', total_revenue: 6638.34 }
Conclusion
In this step, you've learned how to use the
$sortand$limitstages to identify top-performing products. You've seen how to sort data in ascending (1) or descending (-1) order and combine with$limitto focus on key results. In the next step, you'll learn how to use the$projectstage to rename or format fields. -
Challenge
Add a $project Stage to Rename or Format Fields
Step 5: Add a
$projectStage to Rename or Format FieldsIn this step, you'll learn how to use the
$projectstage in MongoDB's aggregation framework to shape the output of your pipeline. You'll explore including or excluding fields, renaming fields, formatting values, and creating complex projections. By the end, you'll be able to customize your aggregation results to make them more readable and relevant for analysis.For this lab, you'll continue working with the
salescollection in thesalesDBdatabase.Understanding the
$projectStageThe
$projectstage reshapes documents in an aggregation pipeline by specifying which fields to include, exclude, or transform. It's like a filter for your output, allowing you to rename fields, format values, or compute new ones, making your results cleaner and more tailored to your needs.Using
$projectto Shape OutputTo begin, consider a simple
$projectexample. Suppose you've grouped your data byproduct_nameto calculate total revenue (from Step 3). You want to renametotal_revenuetorevenueand exclude the_idfield. Here's how you can do it:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, product: "$_id", revenue: "$total_revenue" } } ])In this pipeline:
$groupcalculates the total revenue per product.$projectrenames_idtoproductandtotal_revenuetorevenue, and excludes_idfrom the output.
Running this in the MongoDB Shell gives you a result like this:
{ product: 'Shirt', revenue: 4708.43 } { product: 'TV', revenue: 36399.09 } { product: 'Backpack', revenue: 6638.34 } { product: 'Phone', revenue: 125998.2 } { product: 'Book', revenue: 1740.66 } { product: 'Headphones', revenue: 9358.44 } { product: 'Pen', revenue: 475.41 } { product: 'Shoes', revenue: 8748.25 }Including, Excluding, and Suppressing Fields
In
$project, you can control which fields appear in the output:- Include a field: Set it to
1(e.g.,{ field: 1 }) to keep it. - Exclude a field: Set it to
0(e.g.,{ field: 0 }) to remove it. - Rename a field: Assign a new name using a
$reference (e.g.,{ newName: "$oldName" }). - Suppress the
_idfield: Explicitly set_id: 0to exclude it, as it's included by default.
For example, to include only
total_revenueand exclude_idafter grouping:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, total_revenue: 1 } } ])Running this in the MongoDB Shell produces:
{ total_revenue: 4708.43 } { total_revenue: 36399.09 } { total_revenue: 6638.34 } { total_revenue: 125998.2 } { total_revenue: 1740.66 } { total_revenue: 9358.44 } { total_revenue: 475.41 } { total_revenue: 8748.25 }You can't mix inclusion and exclusion for non-
_idfields in the same$projectstage (e.g.,{ field1: 1, field2: 0 }is invalid). Either specify all fields to include (others are excluded) or all fields to exclude (others are included).Formatting Fields
You can also use
$projectto format field values. For example, you might want to round thetotal_revenueto two decimal places using the$roundoperator. Here's how:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, product: "$_id", revenue: { $round: ["$total_revenue", 2] } } } ])This pipeline rounds
total_revenueto two decimal places, ensuring clean output. The result in the MongoDB Shell looks like:{ product: 'Shirt', revenue: 4708.43 } { product: 'TV', revenue: 36399.09 } { product: 'Backpack', revenue: 6638.34 } { product: 'Phone', revenue: 125998.2 } { product: 'Book', revenue: 1740.66 } { product: 'Headphones', revenue: 9358.44 } { product: 'Pen', revenue: 475.41 } { product: 'Shoes', revenue: 8748.25 }
Task 5.1: Project and Format Top Products
In the MongoDB Shell, write an aggregation pipeline to find the top 5 products by total revenue. Use
$groupto calculate total revenue per product,$projectto rename_idtoproduct, renametotal_revenuetorevenueand round it to two decimal places, exclude_id,$sortto order by revenue in descending order, and$limitto show the top 5 results.Solution
db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, product: "$_id", revenue: { $round: ["$total_revenue", 2] } } }, { $sort: { revenue: -1 } }, { $limit: 5 } ])Expected Output
You should see five documents, each showing a product and its rounded total revenue, sorted from highest to lowest:{ product: 'Phone', revenue: 125998.2 } { product: 'TV', revenue: 36399.09 } { product: 'Headphones', revenue: 9358.44 } { product: 'Shoes', revenue: 8748.25 } { product: 'Backpack', revenue: 6638.34 }
Conclusion
In this step, you've learned how to use the
$projectstage to shape and format your aggregation output. You've seen how to include or exclude fields, suppress_id, rename fields, and apply formatting like rounding. You've also tackled a complex projection to compute new fields. In the next step, you'll combine these all the techniques you've learned in this code lab. -
Challenge
Combine Stages into a 3-Step Pipeline and Interpret the Result
Step 6: Combine Stages into a 3-Step Pipeline and Interpret the Result
In this step, you'll learn how to build a multi-stage aggregation pipeline by combining
$match,$group,$sort,$project, and$limitto analyze your sales data. You'll see how each stage transforms the data, create a complete four-step pipeline to find top-performing categories, and interpret the results to identify trends. By the end, you'll be able to construct and understand complex pipelines to extract meaningful insights.For this lab, you'll continue working with the
salescollection in thesalesDBdatabase. Assume today's date is May 23, 2025, so you'll filter orders from the last six months, starting from November 23, 2024.Building a Three-Step Multi-Stage Pipeline
To start, consider a pipeline that filters orders from the last six months, groups them by
product_nameto calculate total revenue, and sorts the results by revenue. Here's how you can do it:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23") } } }, { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } } ])Running this in the MongoDB Shell gives you a result like this:
{ _id: 'Phone', total_revenue: 111298.41 } { _id: 'TV', total_revenue: 35999.1 } { _id: 'Headphones', total_revenue: 8338.61 } { _id: 'Shoes', total_revenue: 7948.41 } { _id: 'Backpack', total_revenue: 6118.47 } { _id: 'Shirt', total_revenue: 3658.78 } { _id: 'Book', total_revenue: 1389.93 } { _id: 'Pen', total_revenue: 415.61 }This pipeline filters recent orders, aggregates revenue by product, and ranks products by total revenue, with top performers like "Phone" and "TV" appearing first.
How Each Stage Transforms the Data
Each stage in the pipeline plays a specific role in shaping the data:
$match: Filters the documents to include only those that meet your criteria. Here, it selects orders from November 23, 2024 (because the dataset ends on May 23, 2025), reducing the dataset to relevant records. For example, it might narrow down thousands of orders to those within the six-month period.$group: Aggregates the filtered documents by a specified field, applying accumulators like$sum. In this case, it groups byproduct_nameand sums therevenuefield, transforming individual order documents into summary documents with total revenue.$sort: Orders the grouped documents based on a field, such astotal_revenue. Using-1(descending) ensures high-revenue items appear first, making it easy to identify top performers.
Together, these stages transform raw order data into a ranked list of products by revenue, ready for analysis.
A Four-Step Pipeline
Now, you'll build a four-step pipeline to analyze products for the last six months. This pipeline will filter orders, group by
product_name, sort by total revenue, and project the output for clarity:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23") } } }, { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } }, { $project: { _id: 0, product: "$_id", revenue: { $round: ["$total_revenue", 2] } } } ])Running this in the MongoDB Shell produces:
{ product: 'Phone', revenue: 111298.41 } { product: 'TV', revenue: 35999.1 } { product: 'Headphones', revenue: 8338.61 } { product: 'Shoes', revenue: 7948.41 } { product: 'Backpack', revenue: 6118.47 } { product: 'Shirt', revenue: 3658.78 } { product: 'Book', revenue: 1389.93 } { product: 'Pen', revenue: 415.61 }This output shows products ranked by total revenue, with fields renamed and revenue rounded for clarity.
Interpreting the Results
The results reveal key trends in your sales data:
- Electronics Dominates: With "Phone" ($111,298.41) and "TV" ($35,999.1) at the top, the "Electronics" category is the top performer, driven by high-value items. This suggests strong demand for tech products.
- Apparel is Strong: Products like "Shoes" ($7,948.41) and "Shirt" ($3,658.78) indicate that "Apparel" is a solid performer, though it trails behind "Electronics".
- Lower Performers: "Backpack" ($6,118.47, "Accessories"), "Book" ($1,389.93, "Books"), and "Pen" ($415.61, "Stationery") have lower revenues. "Stationery" is the weakest, possibly due to low unit prices or limited demand.
These insights can guide business decisions, such as focusing marketing efforts on "Electronics" or exploring ways to boost "Stationery" sales.
Task 6.1: Build a Five-Step Pipeline for Top Categories
In the MongoDB Shell, write an aggregation pipeline to find the top 3 categories by total revenue for orders from the last six months (from November 23, 2024). Use
$matchto filter orders by date,$groupto calculate total revenue per category,$sortto order by revenue in descending order,$projectto rename_idtocategoryandtotal_revenuetorevenue(rounded to two decimal places) and exclude_id, and$limitto show only the top 3 results.Solution
db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23") } } }, { $group: { _id: "$category", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } }, { $project: { _id: 0, category: "$_id", revenue: { $round: ["$total_revenue", 2] } } }, { $limit: 3 } ])Expected Output
You should see three documents, each showing a category and its rounded total revenue, sorted from highest to lowest:{ category: 'Electronics', revenue: 155636.12 } { category: 'Apparel', revenue: 11607.19 } { category: 'Accessories', revenue: 6118.47 }
Conclusion
In this step, you've learned how to combine
$match,$group,$sort,$project, and$limitinto a powerful multi-step pipeline to analyze sales data. You've seen how each stage transforms the data and interpreted the results to identify top-performing categories.Now that you've completed this code lab, you're ready to analyze sales trends with MongoDB aggregations!
About the author
Real skill practice before real-world application
Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
Learn by doing
Engage hands-on with the tools and technologies you’re learning. You pick the skill, we provide the credentials and environment.
Follow your guide
All labs have detailed instructions and objectives, guiding you through the learning process and ensuring you understand every step.
Turn time into mastery
On average, you retain 75% more of your learning if you take time to practice. Hands-on labs set you up for success to make those skills stick.