- Lab
- Data

Analyze Sales Trends with MongoDB Aggregations
Dive into MongoDB’s powerful aggregation framework in this hands-on, beginner-friendly lab! Learn how to filter recent sales, calculate revenue by product or category, identify top-selling items, and format results like a pro data analyst.

Path Info
Table of Contents
-
Challenge
Introduction
Step 1: Introduction to Analyzing Sales Trends with MongoDB Aggregations
This Code Lab introduces you to analyzing sales data using MongoDB's powerful aggregation framework. You will work with the
sales
collection in thesalesDB
database to explore sales trends. In this lab, you will learn how to filter recent sales, calculate total revenue by product or category, identify top-selling items, format output fields, and combine these operations into a multi-stage aggregation pipeline.The dataset available for this lab will be loaded into the
sales
collection within thesalesDB
database. It contains aggregated sales records with the following fields:date
: The date of the sales transactions, stored as anISODate
.product_name
: The name of the product sold.category
: The category of the product.quantity
: The total quantity of the product sold on that date.revenue
: The total revenue generated from the product on that date.
These fields allow you to analyze sales trends from August 24, 2024, to May 23, 2025.
Starting Notes:
- MongoDB Environment: MongoDB is already running in the lab environment. You will load the dataset into the
salesDB
database using thesales
collection in the steps below. - Accessing the Data: You are currently in a shell environment. You will first load the dataset using
mongoimport
, then enter the MongoDB Shell (mongosh
) to interact with thesalesDB
database. - Dataset Structure: The dataset contains 436 aggregated sales records. The
date
field will be stored as anISODate
(e.g.,ISODate("2024-08-24T00:00:00Z")
), withproduct_name
andcategory
as strings,quantity
as an integer, andrevenue
as a double. - Initial Setup: Begin by loading the dataset and inspecting it to understand its structure and contents.
Task 1.1: Load the Sales Data into MongoDB
In the terminal, run the following command to load the sales data into the
sales
collection in thesalesDB
database:mongoimport --db=salesDB --collection=sales --type=json --file=sales.json --drop
Expected Output
You should see output similar to the following in the **MongoDB Shell**:2025-05-17T21:12:34.123+0000 connected to: mongodb://localhost/ 2025-05-17T21:12:34.456+0000 dropping: salesDB.sales 2025-05-17T21:12:34.789+0000 436 document(s) imported successfully. 0 document(s) failed to import.
This command uses
mongoimport
to load the data, and the--drop
option ensures that any existingsales
collection is dropped before importing the new data, avoiding duplicates.Task 1.2: Connect to the Database and Inspect the Sales Collection
In the terminal, run the following command to enter the MongoDB Shell for the
salesDB
database:mongosh salesDB
Once inside the MongoDB Shell (
mongosh
), execute the following commands to inspect thesales
collection:Count the total number of documents:
db.sales.countDocuments()
Expected Output
`436`The
countDocuments({})
command confirms the total number of records (436) in the collection.Inspect the first 5 documents in the sales collection:
db.sales.find().limit(5)
Expected Output
You should see output similar to the following in the **MongoDB Shell**:[ { _id: ObjectId('68282780d16f8059e36cfa97'), date: ISODate('2024-08-25T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 3, revenue: 38.97 }, { _id: ObjectId('68282780d16f8059e36cfa98'), date: ISODate('2024-08-24T00:00:00.000Z'), product_name: 'Shirt', category: 'Apparel', quantity: 2, revenue: 59.98 }, { _id: ObjectId('68282780d16f8059e36cfa99'), date: ISODate('2024-08-24T00:00:00.000Z'), product_name: 'Headphones', category: 'Electronics', quantity: 1, revenue: 59.99 }, { _id: ObjectId('68282780d16f8059e36cfa9a'), date: ISODate('2024-08-26T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 1, revenue: 49.99 }, { _id: ObjectId('68282780d16f8059e36cfa9b'), date: ISODate('2024-08-26T00:00:00.000Z'), product_name: 'TV', category: 'Electronics', quantity: 1, revenue: 399.99 } ]
The
find().limit(5)
command displays the first five documents, showing the structure of the aggregated data, including thedate
,product_name
,category
,quantity
, andrevenue
fields.Now that you have connected to the
salesDB
database and inspected thesales
collection, you are ready to proceed with filtering and analyzing the sales data using MongoDB aggregation pipelines. -
Challenge
Use `$match` to filter orders from the last 6 months
Step 2: Use
$match
to Filter Orders From the Last 6 MonthsIn this step, you'll learn how to use the
$match
stage in MongoDB's aggregation framework to filter orders from the last six months. You'll explore the$gte
and$lte
operators for date filtering, understand how to constructISODate
objects, and see how to filter by additional fields likecategory
. By the end, you'll be able to select specific subsets of orders for further analysis.For this lab, assume today's date is May 23, 2025. You'll filter orders from the last six months, which covers November 23, 2024, to May 23, 2025, inclusive. The
sales
collection in thesalesDB
database, loaded in Step 1, contains 436 records spanning August 24, 2024, to May 23, 2025.Understanding the
$match
StageThe
$match
stage filters documents in an aggregation pipeline based on specified conditions. It's similar to thefind()
method but is designed to work within a pipeline, passing only matching documents to subsequent stages. Here, you'll use$match
to select orders from the last six months.Filtering by Date with
$gte
and$lte
Let's start with a general example of filtering orders between two specific dates. Suppose you want to filter orders placed between April 23, 2025, and May 23, 2025. In MongoDB, dates are stored as
ISODate
objects, and you can compare them using operators like$gte
(greater than or equal to) and$lte
(less than or equal to).To specify dates in a query, use the
ISODate()
constructor with the formatISODate("YYYY-MM-DD")
. For example:ISODate("2025-04-23")
represents April 23, 2025.ISODate("2025-05-23")
represents May 23, 2025.
Here's how you can use
$match
to filter orders within this date range:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2025-04-23"), $lte: ISODate("2025-05-23") } } } ])
After running the code in MongoDB Shell, you should see a result similar to the following:
{ _id: ObjectId('68292accaeb470deaab86bd4'), date: ISODate('2025-04-23T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 1, revenue: 12.99 } { _id: ObjectId('68292accaeb470deaab86bd5'), date: ISODate('2025-04-24T00:00:00.000Z'), product_name: 'Shirt', category: 'Apparel', quantity: 2, revenue: 59.98 } { _id: ObjectId('68292accaeb470deaab86bd6'), date: ISODate('2025-04-24T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 3, revenue: 149.97 } ... Type "it" for more
For your analysis, you'll filter orders from the last six months. Given today's date of May 23, 2025, this means orders from November 23, 2024, to May 23, 2025.
Note: Using both
$gte
and$lte
operators is not mandatory; you can use just$gte
if your dataset has no future dates.
Task 2.1: Filter Orders from the Last Six Months
In the MongoDB Shell, write an aggregation pipeline with a
$match
stage to filter orders wheredate
is between November 23, 2024, and May 23, 2025, inclusive. Add a$limit
stage to display the first 5 matching documents.Solution
db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23"), $lte: ISODate("2025-05-23") } } }, { $limit: 5 } ])
Expected Output
You should see five orders with dates between November 23, 2024, and May 23, 2025. For example:{ _id: ObjectId('68292accaeb470deaab86abf'), date: ISODate('2024-11-23T00:00:00.000Z'), product_name: 'Backpack', category: 'Accessories', quantity: 2, revenue: 79.98 } { _id: ObjectId('68292accaeb470deaab86ac0'), date: ISODate('2024-11-23T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 4, revenue: 51.96 } { _id: ObjectId('68292accaeb470deaab86ac1'), date: ISODate('2024-11-24T00:00:00.000Z'), product_name: 'Book', category: 'Books', quantity: 1, revenue: 12.99 } { _id: ObjectId('68292accaeb470deaab86ac2'), date: ISODate('2024-11-24T00:00:00.000Z'), product_name: 'Pen', category: 'Stationery', quantity: 2, revenue: 5.98 } { _id: ObjectId('68292accaeb470deaab86ac3'), date: ISODate('2024-11-23T00:00:00.000Z'), product_name: 'Phone', category: 'Electronics', quantity: 4, revenue: 2799.96 }
Filtering by Category
You can add more conditions to
$match
to filter by other fields, such ascategory
. For example, to select only "Electronics" orders from the last six months:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23"), $lte: ISODate("2025-05-23") }, category: "Electronics" } } ])
This pipeline keeps only documents where
date
is within the last six months andcategory
is "Electronics."
Task 2.2: Filter Orders by Category
In the MongoDB Shell, write an aggregation pipeline to filter orders from the last six months in the "Apparel" category. Add a
$limit
stage to display the first 3 matching documents.Solution
db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23"), $lte: ISODate("2025-05-23") }, category: "Apparel" } }, { $limit: 3 } ])
Expected Output
You should see three orders from the "Apparel" category with dates between November 23, 2024, and May 23, 2025. For example:{ _id: ObjectId('68292accaeb470deaab86ac4'), date: ISODate('2024-11-24T00:00:00.000Z'), product_name: 'Shirt', category: 'Apparel', quantity: 2, revenue: 59.98 } { _id: ObjectId('68292accaeb470deaab86ad4'), date: ISODate('2024-12-01T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 2, revenue: 99.98 } { _id: ObjectId('68292accaeb470deaab86adc'), date: ISODate('2024-12-06T00:00:00.000Z'), product_name: 'Shoes', category: 'Apparel', quantity: 4, revenue: 199.96 }
Conclusion
In this step, you've learned how to use the
$match
stage to filter orders by date and category. The$gte
and$lte
operators, combined withISODate
objects, allow precise date filtering, while additional conditions can refine your query further. In the next step, you'll use$group
to calculate total revenue per product or category, building on this filtered data. -
Challenge
Use `$group` to Calculate Total Revenue per Product or Category
Step 3: Use
$group
to Calculate Total Revenue per Product or CategoryIn this step, you'll learn how to use the
$group
stage in MongoDB's aggregation framework to calculate total revenue per product or category. You'll understand the role of the_id
field and accumulators like$sum
, and see how to group by different fields or use multiple accumulators. By the end, you'll be able to summarize your sales data effectively.Understanding the
$group
StageThe
$group
stage groups documents by a specified field and applies accumulators to compute aggregated values for each group. It's perfect for summarizing data, such as calculating total revenue or total quantity sold.Calculating Total Revenue per Product
To calculate the total revenue for each product, you'll group the documents by
product_name
and use the$sum
accumulator to add up therevenue
field. Here's how you can do it:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } } ])
In this pipeline:
_id: "$product_name"
specifies that you want to group by theproduct_name
field.total_revenue: { $sum: "$revenue" }
calculates the sum of therevenue
field for each group.
This will give you a list of documents, each showing a product and its total revenue as follows:
{ _id: 'Book', total_revenue: 1740.66 } { _id: 'TV', total_revenue: 36399.090000000004 } { _id: 'Pen', total_revenue: 475.41 } { _id: 'Backpack', total_revenue: 6638.34 } { _id: 'Shirt', total_revenue: 4708.429999999999 } { _id: 'Phone', total_revenue: 125998.2 } { _id: 'Shoes', total_revenue: 8748.25 } { _id: 'Headphones', total_revenue: 9358.44 }
Task 3.1: Calculate Total Revenue per Category
In the MongoDB Shell, write an aggregation pipeline to calculate the total revenue for each category. Use the
$group
stage with_id: "$category"
and the$sum
accumulator for therevenue
field. Add a$limit
stage to display the first 3 results.Solution
db.sales.aggregate([ { $group: { _id: "$category", total_revenue: { $sum: "$revenue" } } }, { $limit: 3 } ])
Expected Output
You should see three documents, each showing a category and its total revenue:{ _id: 'Stationery', total_revenue: 475.41 } { _id: 'Apparel', total_revenue: 13456.68 } { _id: 'Accessories', total_revenue: 6638.34 }
Using Multiple Accumulators
You can use multiple accumulators in the same
$group
stage to calculate different aggregated values. For example, you might want to calculate both the total revenue and the total quantity sold for each product.Here's how you can do it:
db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" }, total_quantity: { $sum: "$quantity" } } } ])
In this pipeline:
_id: "$product_name"
groups the documents byproduct_name
.total_revenue: { $sum: "$revenue" }
calculates the sum ofrevenue
.total_quantity: { $sum: "$quantity" }
calculates the sum ofquantity
.
This will give you documents with both total revenue and total quantity for each product:
{ _id: 'Headphones', total_revenue: 9358.44, total_quantity: 156 } { _id: 'Book', total_revenue: 1740.66, total_quantity: 134 } { _id: 'TV', total_revenue: 36399.090000000004, total_quantity: 91 } { _id: 'Shirt', total_revenue: 4708.429999999999, total_quantity: 157 } { _id: 'Pen', total_revenue: 475.41, total_quantity: 159 } { _id: 'Phone', total_revenue: 125998.2, total_quantity: 180 } { _id: 'Backpack', total_revenue: 6638.34, total_quantity: 166 } { _id: 'Shoes', total_revenue: 8748.25, total_quantity: 175 }
Task 3.2: Use Multiple Accumulators
In the MongoDB Shell, write an aggregation pipeline to calculate the total revenue and the total quantity sold for each category. Use the
$group
stage with_id: "$category"
, and include two accumulators: one for summingrevenue
and another for summingquantity
. Add a$limit
stage to display the first 2 results.Solution
db.sales.aggregate([ { $group: { _id: "$category", total_revenue: { $sum: "$revenue" }, total_quantity: { $sum: "$quantity" } } }, { $limit: 2 } ])
Expected Output
You should see two documents, each showing a category, its total revenue, and its total quantity sold. For example:{ _id: 'Stationery', total_revenue: 475.41, total_quantity: 159 } { _id: 'Books', total_revenue: 1740.66, total_quantity: 134 }
Conclusion
In this step, you've learned how to use the
$group
stage to calculate total revenue per product or category. You've seen how to group by different fields and use multiple accumulators to compute various aggregated values. In the next step, you'll use$sort
and$limit
to find the top-selling items, building on the skills you've gained here. -
Challenge
Use $sort and $limit to Find the Top 5 Best-Selling Items
Step 4: Use
$sort
and$limit
to Find the Top 5 Best-Selling ItemsIn this step, you'll learn how to use the
$sort
and$limit
stages in MongoDB's aggregation framework to identify top-performing products. You'll explore sorting data in ascending and descending order, combine these stages to find the top 5 best-selling products by revenue, and understand how to rank items effectively. By the end, you'll be able to highlight key performers in your sales data.For this lab, you'll continue working with the
sales
collection in thesalesDB
database.Understanding the
$sort
StageThe
$sort
stage arranges documents in an aggregation pipeline based on a specified field. You can sort in ascending order (lowest to highest) or descending order (highest to lowest), making it ideal for ranking items like top- or bottom-selling categories or products.Sorting with
$sort
To sort data, you'll use the
$sort
stage with a field and a sort direction:1
for ascending or-1
for descending. For example, to sort categories by total revenue, you first group the data (from Step 3) and then sort it.Here's an example that groups by
category
, calculates total revenue, and sorts bytotal_revenue
in ascending order to show the worst-selling categories first:db.sales.aggregate([ { $group: { _id: "$category", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: 1 } } ])
In this pipeline:
$group
calculates the total revenue for each category.$sort: { total_revenue: 1 }
sorts the results bytotal_revenue
in ascending order (lowest to highest).
Running this in the MongoDB Shell gives you a result like this:
{ _id: 'Stationery', total_revenue: 475.41 } { _id: 'Books', total_revenue: 1740.66 } { _id: 'Accessories', total_revenue: 6638.34 } { _id: 'Apparel', total_revenue: 13456.68 } { _id: 'Electronics', total_revenue: 171755.73 }
If you want descending order (highest to lowest, for top performers), use
-1
instead:{ $sort: { total_revenue: -1 } }
This would reverse the order, starting with the highest
total_revenue
(e.g., "Electronics").Combining
$sort
and$limit
By combining
$sort
and$limit
, you can pinpoint the top performers in your data. For example, to find the top 5 best-selling products by revenue, you'll group byproduct_name
, sort by total revenue in descending order, and limit to 5 results.
Task 4.1: Find the Top 5 Best-Selling Products by Revenue
In the MongoDB Shell, write an aggregation pipeline to find the top 5 best-selling products by total revenue. Use
$group
to calculate total revenue per product,$sort
to order by revenue in descending order, and$limit
to show only the top 5 results.Solution
db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } }, { $limit: 5 } ])
Expected Output
You should see five documents, each showing a product and its total revenue, sorted from highest to lowest:{ _id: 'Phone', total_revenue: 125998.2 } { _id: 'TV', total_revenue: 36399.09 } { _id: 'Headphones', total_revenue: 9358.44 } { _id: 'Shoes', total_revenue: 8748.25 } { _id: 'Backpack', total_revenue: 6638.34 }
Conclusion
In this step, you've learned how to use the
$sort
and$limit
stages to identify top-performing products. You've seen how to sort data in ascending (1
) or descending (-1
) order and combine with$limit
to focus on key results. In the next step, you'll learn how to use the$project
stage to rename or format fields. -
Challenge
Add a $project Stage to Rename or Format Fields
Step 5: Add a
$project
Stage to Rename or Format FieldsIn this step, you'll learn how to use the
$project
stage in MongoDB's aggregation framework to shape the output of your pipeline. You'll explore including or excluding fields, renaming fields, formatting values, and creating complex projections. By the end, you'll be able to customize your aggregation results to make them more readable and relevant for analysis.For this lab, you'll continue working with the
sales
collection in thesalesDB
database.Understanding the
$project
StageThe
$project
stage reshapes documents in an aggregation pipeline by specifying which fields to include, exclude, or transform. It's like a filter for your output, allowing you to rename fields, format values, or compute new ones, making your results cleaner and more tailored to your needs.Using
$project
to Shape OutputTo begin, consider a simple
$project
example. Suppose you've grouped your data byproduct_name
to calculate total revenue (from Step 3). You want to renametotal_revenue
torevenue
and exclude the_id
field. Here's how you can do it:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, product: "$_id", revenue: "$total_revenue" } } ])
In this pipeline:
$group
calculates the total revenue per product.$project
renames_id
toproduct
andtotal_revenue
torevenue
, and excludes_id
from the output.
Running this in the MongoDB Shell gives you a result like this:
{ product: 'Shirt', revenue: 4708.43 } { product: 'TV', revenue: 36399.09 } { product: 'Backpack', revenue: 6638.34 } { product: 'Phone', revenue: 125998.2 } { product: 'Book', revenue: 1740.66 } { product: 'Headphones', revenue: 9358.44 } { product: 'Pen', revenue: 475.41 } { product: 'Shoes', revenue: 8748.25 }
Including, Excluding, and Suppressing Fields
In
$project
, you can control which fields appear in the output:- Include a field: Set it to
1
(e.g.,{ field: 1 }
) to keep it. - Exclude a field: Set it to
0
(e.g.,{ field: 0 }
) to remove it. - Rename a field: Assign a new name using a
$
reference (e.g.,{ newName: "$oldName" }
). - Suppress the
_id
field: Explicitly set_id: 0
to exclude it, as it's included by default.
For example, to include only
total_revenue
and exclude_id
after grouping:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, total_revenue: 1 } } ])
Running this in the MongoDB Shell produces:
{ total_revenue: 4708.43 } { total_revenue: 36399.09 } { total_revenue: 6638.34 } { total_revenue: 125998.2 } { total_revenue: 1740.66 } { total_revenue: 9358.44 } { total_revenue: 475.41 } { total_revenue: 8748.25 }
You can't mix inclusion and exclusion for non-
_id
fields in the same$project
stage (e.g.,{ field1: 1, field2: 0 }
is invalid). Either specify all fields to include (others are excluded) or all fields to exclude (others are included).Formatting Fields
You can also use
$project
to format field values. For example, you might want to round thetotal_revenue
to two decimal places using the$round
operator. Here's how:db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, product: "$_id", revenue: { $round: ["$total_revenue", 2] } } } ])
This pipeline rounds
total_revenue
to two decimal places, ensuring clean output. The result in the MongoDB Shell looks like:{ product: 'Shirt', revenue: 4708.43 } { product: 'TV', revenue: 36399.09 } { product: 'Backpack', revenue: 6638.34 } { product: 'Phone', revenue: 125998.2 } { product: 'Book', revenue: 1740.66 } { product: 'Headphones', revenue: 9358.44 } { product: 'Pen', revenue: 475.41 } { product: 'Shoes', revenue: 8748.25 }
Task 5.1: Project and Format Top Products
In the MongoDB Shell, write an aggregation pipeline to find the top 5 products by total revenue. Use
$group
to calculate total revenue per product,$project
to rename_id
toproduct
, renametotal_revenue
torevenue
and round it to two decimal places, exclude_id
,$sort
to order by revenue in descending order, and$limit
to show the top 5 results.Solution
db.sales.aggregate([ { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $project: { _id: 0, product: "$_id", revenue: { $round: ["$total_revenue", 2] } } }, { $sort: { revenue: -1 } }, { $limit: 5 } ])
Expected Output
You should see five documents, each showing a product and its rounded total revenue, sorted from highest to lowest:{ product: 'Phone', revenue: 125998.2 } { product: 'TV', revenue: 36399.09 } { product: 'Headphones', revenue: 9358.44 } { product: 'Shoes', revenue: 8748.25 } { product: 'Backpack', revenue: 6638.34 }
Conclusion
In this step, you've learned how to use the
$project
stage to shape and format your aggregation output. You've seen how to include or exclude fields, suppress_id
, rename fields, and apply formatting like rounding. You've also tackled a complex projection to compute new fields. In the next step, you'll combine these all the techniques you've learned in this code lab. -
Challenge
Combine Stages into a 3-Step Pipeline and Interpret the Result
Step 6: Combine Stages into a 3-Step Pipeline and Interpret the Result
In this step, you'll learn how to build a multi-stage aggregation pipeline by combining
$match
,$group
,$sort
,$project
, and$limit
to analyze your sales data. You'll see how each stage transforms the data, create a complete four-step pipeline to find top-performing categories, and interpret the results to identify trends. By the end, you'll be able to construct and understand complex pipelines to extract meaningful insights.For this lab, you'll continue working with the
sales
collection in thesalesDB
database. Assume today's date is May 23, 2025, so you'll filter orders from the last six months, starting from November 23, 2024.Building a Three-Step Multi-Stage Pipeline
To start, consider a pipeline that filters orders from the last six months, groups them by
product_name
to calculate total revenue, and sorts the results by revenue. Here's how you can do it:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23") } } }, { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } } ])
Running this in the MongoDB Shell gives you a result like this:
{ _id: 'Phone', total_revenue: 111298.41 } { _id: 'TV', total_revenue: 35999.1 } { _id: 'Headphones', total_revenue: 8338.61 } { _id: 'Shoes', total_revenue: 7948.41 } { _id: 'Backpack', total_revenue: 6118.47 } { _id: 'Shirt', total_revenue: 3658.78 } { _id: 'Book', total_revenue: 1389.93 } { _id: 'Pen', total_revenue: 415.61 }
This pipeline filters recent orders, aggregates revenue by product, and ranks products by total revenue, with top performers like "Phone" and "TV" appearing first.
How Each Stage Transforms the Data
Each stage in the pipeline plays a specific role in shaping the data:
$match
: Filters the documents to include only those that meet your criteria. Here, it selects orders from November 23, 2024 (because the dataset ends on May 23, 2025), reducing the dataset to relevant records. For example, it might narrow down thousands of orders to those within the six-month period.$group
: Aggregates the filtered documents by a specified field, applying accumulators like$sum
. In this case, it groups byproduct_name
and sums therevenue
field, transforming individual order documents into summary documents with total revenue.$sort
: Orders the grouped documents based on a field, such astotal_revenue
. Using-1
(descending) ensures high-revenue items appear first, making it easy to identify top performers.
Together, these stages transform raw order data into a ranked list of products by revenue, ready for analysis.
A Four-Step Pipeline
Now, you'll build a four-step pipeline to analyze products for the last six months. This pipeline will filter orders, group by
product_name
, sort by total revenue, and project the output for clarity:db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23") } } }, { $group: { _id: "$product_name", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } }, { $project: { _id: 0, product: "$_id", revenue: { $round: ["$total_revenue", 2] } } } ])
Running this in the MongoDB Shell produces:
{ product: 'Phone', revenue: 111298.41 } { product: 'TV', revenue: 35999.1 } { product: 'Headphones', revenue: 8338.61 } { product: 'Shoes', revenue: 7948.41 } { product: 'Backpack', revenue: 6118.47 } { product: 'Shirt', revenue: 3658.78 } { product: 'Book', revenue: 1389.93 } { product: 'Pen', revenue: 415.61 }
This output shows products ranked by total revenue, with fields renamed and revenue rounded for clarity.
Interpreting the Results
The results reveal key trends in your sales data:
- Electronics Dominates: With "Phone" ($111,298.41) and "TV" ($35,999.1) at the top, the "Electronics" category is the top performer, driven by high-value items. This suggests strong demand for tech products.
- Apparel is Strong: Products like "Shoes" ($7,948.41) and "Shirt" ($3,658.78) indicate that "Apparel" is a solid performer, though it trails behind "Electronics".
- Lower Performers: "Backpack" ($6,118.47, "Accessories"), "Book" ($1,389.93, "Books"), and "Pen" ($415.61, "Stationery") have lower revenues. "Stationery" is the weakest, possibly due to low unit prices or limited demand.
These insights can guide business decisions, such as focusing marketing efforts on "Electronics" or exploring ways to boost "Stationery" sales.
Task 6.1: Build a Five-Step Pipeline for Top Categories
In the MongoDB Shell, write an aggregation pipeline to find the top 3 categories by total revenue for orders from the last six months (from November 23, 2024). Use
$match
to filter orders by date,$group
to calculate total revenue per category,$sort
to order by revenue in descending order,$project
to rename_id
tocategory
andtotal_revenue
torevenue
(rounded to two decimal places) and exclude_id
, and$limit
to show only the top 3 results.Solution
db.sales.aggregate([ { $match: { date: { $gte: ISODate("2024-11-23") } } }, { $group: { _id: "$category", total_revenue: { $sum: "$revenue" } } }, { $sort: { total_revenue: -1 } }, { $project: { _id: 0, category: "$_id", revenue: { $round: ["$total_revenue", 2] } } }, { $limit: 3 } ])
Expected Output
You should see three documents, each showing a category and its rounded total revenue, sorted from highest to lowest:{ category: 'Electronics', revenue: 155636.12 } { category: 'Apparel', revenue: 11607.19 } { category: 'Accessories', revenue: 6118.47 }
Conclusion
In this step, you've learned how to combine
$match
,$group
,$sort
,$project
, and$limit
into a powerful multi-step pipeline to analyze sales data. You've seen how each stage transforms the data and interpreted the results to identify top-performing categories.Now that you've completed this code lab, you're ready to analyze sales trends with MongoDB aggregations!
What's a lab?
Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.
Provided environment for hands-on practice
We will provide the credentials and environment necessary for you to practice right within your browser.
Guided walkthrough
Follow along with the author’s guided walkthrough and build something new in your provided environment!
Did you know?
On average, you retain 75% more of your learning if you get time for practice.