Featured resource
Forrester Wave Report 2025
Pluralsight named a Leader in the Forrester Wave™

Our tech skill development platform earned the highest scores possible across 11 criteria.

Learn more
  • Labs icon Lab
  • Data
Labs

Analyze Sales Trends with MongoDB Aggregations

Dive into MongoDB’s powerful aggregation framework in this hands-on, beginner-friendly lab! Learn how to filter recent sales, calculate revenue by product or category, identify top-selling items, and format results like a pro data analyst.

Labs

Path Info

Level
Clock icon Intermediate
Duration
Clock icon 45m
Published
Clock icon May 30, 2025

Contact sales

By filling out this form and clicking submit, you acknowledge our privacy policy.

Table of Contents

  1. Challenge

    Introduction

    Step 1: Introduction to Analyzing Sales Trends with MongoDB Aggregations

    This Code Lab introduces you to analyzing sales data using MongoDB's powerful aggregation framework. You will work with the sales collection in the salesDB database to explore sales trends. In this lab, you will learn how to filter recent sales, calculate total revenue by product or category, identify top-selling items, format output fields, and combine these operations into a multi-stage aggregation pipeline.

    The dataset available for this lab will be loaded into the sales collection within the salesDB database. It contains aggregated sales records with the following fields:

    • date: The date of the sales transactions, stored as an ISODate.
    • product_name: The name of the product sold.
    • category: The category of the product.
    • quantity: The total quantity of the product sold on that date.
    • revenue: The total revenue generated from the product on that date.

    These fields allow you to analyze sales trends from August 24, 2024, to May 23, 2025.

    Starting Notes:

    • MongoDB Environment: MongoDB is already running in the lab environment. You will load the dataset into the salesDB database using the sales collection in the steps below.
    • Accessing the Data: You are currently in a shell environment. You will first load the dataset using mongoimport, then enter the MongoDB Shell (mongosh) to interact with the salesDB database.
    • Dataset Structure: The dataset contains 436 aggregated sales records. The date field will be stored as an ISODate (e.g., ISODate("2024-08-24T00:00:00Z")), with product_name and category as strings, quantity as an integer, and revenue as a double.
    • Initial Setup: Begin by loading the dataset and inspecting it to understand its structure and contents.

    Task 1.1: Load the Sales Data into MongoDB

    In the terminal, run the following command to load the sales data into the sales collection in the salesDB database:

    mongoimport --db=salesDB --collection=sales --type=json --file=sales.json --drop
    
    Expected Output You should see output similar to the following in the **MongoDB Shell**:
    2025-05-17T21:12:34.123+0000    connected to: mongodb://localhost/
    2025-05-17T21:12:34.456+0000    dropping: salesDB.sales
    2025-05-17T21:12:34.789+0000    436 document(s) imported successfully. 0 document(s) failed to import.
    

    This command uses mongoimport to load the data, and the --drop option ensures that any existing sales collection is dropped before importing the new data, avoiding duplicates.

    Task 1.2: Connect to the Database and Inspect the Sales Collection

    In the terminal, run the following command to enter the MongoDB Shell for the salesDB database:

    mongosh salesDB
    

    Once inside the MongoDB Shell (mongosh), execute the following commands to inspect the sales collection:

    Count the total number of documents:

    db.sales.countDocuments()
    
    Expected Output `436`

    The countDocuments({}) command confirms the total number of records (436) in the collection.

    Inspect the first 5 documents in the sales collection:

    db.sales.find().limit(5)
    
    Expected Output You should see output similar to the following in the **MongoDB Shell**:
     [
      {
        _id: ObjectId('68282780d16f8059e36cfa97'),
        date: ISODate('2024-08-25T00:00:00.000Z'),
        product_name: 'Book',
        category: 'Books',
        quantity: 3,
        revenue: 38.97
      },
      {
        _id: ObjectId('68282780d16f8059e36cfa98'),
        date: ISODate('2024-08-24T00:00:00.000Z'),
        product_name: 'Shirt',
        category: 'Apparel',
        quantity: 2,
        revenue: 59.98
      },
      {
        _id: ObjectId('68282780d16f8059e36cfa99'),
        date: ISODate('2024-08-24T00:00:00.000Z'),
        product_name: 'Headphones',
        category: 'Electronics',
        quantity: 1,
        revenue: 59.99
      },
      {
        _id: ObjectId('68282780d16f8059e36cfa9a'),
        date: ISODate('2024-08-26T00:00:00.000Z'),
        product_name: 'Shoes',
        category: 'Apparel',
        quantity: 1,
        revenue: 49.99
      },
      {
        _id: ObjectId('68282780d16f8059e36cfa9b'),
        date: ISODate('2024-08-26T00:00:00.000Z'),
        product_name: 'TV',
        category: 'Electronics',
        quantity: 1,
        revenue: 399.99
      }
    ]
    

    The find().limit(5) command displays the first five documents, showing the structure of the aggregated data, including the date, product_name, category, quantity, and revenue fields.

    Now that you have connected to the salesDB database and inspected the sales collection, you are ready to proceed with filtering and analyzing the sales data using MongoDB aggregation pipelines.

  2. Challenge

    Use `$match` to filter orders from the last 6 months

    Step 2: Use $match to Filter Orders From the Last 6 Months

    In this step, you'll learn how to use the $match stage in MongoDB's aggregation framework to filter orders from the last six months. You'll explore the $gte and $lte operators for date filtering, understand how to construct ISODate objects, and see how to filter by additional fields like category. By the end, you'll be able to select specific subsets of orders for further analysis.

    For this lab, assume today's date is May 23, 2025. You'll filter orders from the last six months, which covers November 23, 2024, to May 23, 2025, inclusive. The sales collection in the salesDB database, loaded in Step 1, contains 436 records spanning August 24, 2024, to May 23, 2025.

    Understanding the $match Stage

    The $match stage filters documents in an aggregation pipeline based on specified conditions. It's similar to the find() method but is designed to work within a pipeline, passing only matching documents to subsequent stages. Here, you'll use $match to select orders from the last six months.

    Filtering by Date with $gte and $lte

    Let's start with a general example of filtering orders between two specific dates. Suppose you want to filter orders placed between April 23, 2025, and May 23, 2025. In MongoDB, dates are stored as ISODate objects, and you can compare them using operators like $gte (greater than or equal to) and $lte (less than or equal to).

    To specify dates in a query, use the ISODate() constructor with the format ISODate("YYYY-MM-DD"). For example:

    • ISODate("2025-04-23") represents April 23, 2025.
    • ISODate("2025-05-23") represents May 23, 2025.

    Here's how you can use $match to filter orders within this date range:

    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2025-04-23"), 
            $lte: ISODate("2025-05-23") 
          } 
        } 
      }
    ])
    

    After running the code in MongoDB Shell, you should see a result similar to the following:

    {
      _id: ObjectId('68292accaeb470deaab86bd4'),
      date: ISODate('2025-04-23T00:00:00.000Z'),
      product_name: 'Book',
      category: 'Books',
      quantity: 1,
      revenue: 12.99
    }
    {
      _id: ObjectId('68292accaeb470deaab86bd5'),
      date: ISODate('2025-04-24T00:00:00.000Z'),
      product_name: 'Shirt',
      category: 'Apparel',
      quantity: 2,
      revenue: 59.98
    }
    {
      _id: ObjectId('68292accaeb470deaab86bd6'),
      date: ISODate('2025-04-24T00:00:00.000Z'),
      product_name: 'Shoes',
      category: 'Apparel',
      quantity: 3,
      revenue: 149.97
    }
    ...
    Type "it" for more
    

    For your analysis, you'll filter orders from the last six months. Given today's date of May 23, 2025, this means orders from November 23, 2024, to May 23, 2025.

    Note: Using both $gte and $lte operators is not mandatory; you can use just $gte if your dataset has no future dates.


    Task 2.1: Filter Orders from the Last Six Months

    In the MongoDB Shell, write an aggregation pipeline with a $match stage to filter orders where date is between November 23, 2024, and May 23, 2025, inclusive. Add a $limit stage to display the first 5 matching documents.

    Solution
    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2024-11-23"), 
            $lte: ISODate("2025-05-23") 
          } 
        } 
      },
      { $limit: 5 }
    ])
    
    Expected Output You should see five orders with dates between November 23, 2024, and May 23, 2025. For example:
    {
      _id: ObjectId('68292accaeb470deaab86abf'),
      date: ISODate('2024-11-23T00:00:00.000Z'),
      product_name: 'Backpack',
      category: 'Accessories',
      quantity: 2,
      revenue: 79.98
    }
    {
      _id: ObjectId('68292accaeb470deaab86ac0'),
      date: ISODate('2024-11-23T00:00:00.000Z'),
      product_name: 'Book',
      category: 'Books',
      quantity: 4,
      revenue: 51.96
    }
    {
      _id: ObjectId('68292accaeb470deaab86ac1'),
      date: ISODate('2024-11-24T00:00:00.000Z'),
      product_name: 'Book',
      category: 'Books',
      quantity: 1,
      revenue: 12.99
    }
    {
      _id: ObjectId('68292accaeb470deaab86ac2'),
      date: ISODate('2024-11-24T00:00:00.000Z'),
      product_name: 'Pen',
      category: 'Stationery',
      quantity: 2,
      revenue: 5.98
    }
    {
      _id: ObjectId('68292accaeb470deaab86ac3'),
      date: ISODate('2024-11-23T00:00:00.000Z'),
      product_name: 'Phone',
      category: 'Electronics',
      quantity: 4,
      revenue: 2799.96
    }
    

    Filtering by Category

    You can add more conditions to $match to filter by other fields, such as category. For example, to select only "Electronics" orders from the last six months:

    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2024-11-23"), 
            $lte: ISODate("2025-05-23") 
          },
          category: "Electronics"
        } 
      }
    ])
    

    This pipeline keeps only documents where date is within the last six months and category is "Electronics."


    Task 2.2: Filter Orders by Category

    In the MongoDB Shell, write an aggregation pipeline to filter orders from the last six months in the "Apparel" category. Add a $limit stage to display the first 3 matching documents.

    Solution
    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2024-11-23"), 
            $lte: ISODate("2025-05-23") 
          },
          category: "Apparel"
        } 
      },
      { $limit: 3 }
    ])
    
    Expected Output You should see three orders from the "Apparel" category with dates between November 23, 2024, and May 23, 2025. For example:
    {
      _id: ObjectId('68292accaeb470deaab86ac4'),
      date: ISODate('2024-11-24T00:00:00.000Z'),
      product_name: 'Shirt',
      category: 'Apparel',
      quantity: 2,
      revenue: 59.98
    }
    {
      _id: ObjectId('68292accaeb470deaab86ad4'),
      date: ISODate('2024-12-01T00:00:00.000Z'),
      product_name: 'Shoes',
      category: 'Apparel',
      quantity: 2,
      revenue: 99.98
    }
    {
      _id: ObjectId('68292accaeb470deaab86adc'),
      date: ISODate('2024-12-06T00:00:00.000Z'),
      product_name: 'Shoes',
      category: 'Apparel',
      quantity: 4,
      revenue: 199.96
    }
    

    Conclusion

    In this step, you've learned how to use the $match stage to filter orders by date and category. The $gte and $lte operators, combined with ISODate objects, allow precise date filtering, while additional conditions can refine your query further. In the next step, you'll use $group to calculate total revenue per product or category, building on this filtered data.

  3. Challenge

    Use `$group` to Calculate Total Revenue per Product or Category

    Step 3: Use $group to Calculate Total Revenue per Product or Category

    In this step, you'll learn how to use the $group stage in MongoDB's aggregation framework to calculate total revenue per product or category. You'll understand the role of the _id field and accumulators like $sum, and see how to group by different fields or use multiple accumulators. By the end, you'll be able to summarize your sales data effectively.

    Understanding the $group Stage

    The $group stage groups documents by a specified field and applies accumulators to compute aggregated values for each group. It's perfect for summarizing data, such as calculating total revenue or total quantity sold.

    Calculating Total Revenue per Product

    To calculate the total revenue for each product, you'll group the documents by product_name and use the $sum accumulator to add up the revenue field. Here's how you can do it:

    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      }
    ])
    

    In this pipeline:

    • _id: "$product_name" specifies that you want to group by the product_name field.
    • total_revenue: { $sum: "$revenue" } calculates the sum of the revenue field for each group.

    This will give you a list of documents, each showing a product and its total revenue as follows:

    {
      _id: 'Book',
      total_revenue: 1740.66
    }
    {
      _id: 'TV',
      total_revenue: 36399.090000000004
    }
    {
      _id: 'Pen',
      total_revenue: 475.41
    }
    {
      _id: 'Backpack',
      total_revenue: 6638.34
    }
    {
      _id: 'Shirt',
      total_revenue: 4708.429999999999
    }
    {
      _id: 'Phone',
      total_revenue: 125998.2
    }
    {
      _id: 'Shoes',
      total_revenue: 8748.25
    }
    {
      _id: 'Headphones',
      total_revenue: 9358.44
    }
    

    Task 3.1: Calculate Total Revenue per Category

    In the MongoDB Shell, write an aggregation pipeline to calculate the total revenue for each category. Use the $group stage with _id: "$category" and the $sum accumulator for the revenue field. Add a $limit stage to display the first 3 results.

    Solution
    db.sales.aggregate([
      { 
        $group: { 
          _id: "$category", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { $limit: 3 }
    ])
    
    Expected Output You should see three documents, each showing a category and its total revenue:
    {
      _id: 'Stationery',
      total_revenue: 475.41
    }
    {
      _id: 'Apparel',
      total_revenue: 13456.68
    }
    {
      _id: 'Accessories',
      total_revenue: 6638.34
    }
    

    Using Multiple Accumulators

    You can use multiple accumulators in the same $group stage to calculate different aggregated values. For example, you might want to calculate both the total revenue and the total quantity sold for each product.

    Here's how you can do it:

    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" }, 
          total_quantity: { $sum: "$quantity" } 
        } 
      }
    ])
    

    In this pipeline:

    • _id: "$product_name" groups the documents by product_name.
    • total_revenue: { $sum: "$revenue" } calculates the sum of revenue.
    • total_quantity: { $sum: "$quantity" } calculates the sum of quantity.

    This will give you documents with both total revenue and total quantity for each product:

    {
      _id: 'Headphones',
      total_revenue: 9358.44,
      total_quantity: 156
    }
    {
      _id: 'Book',
      total_revenue: 1740.66,
      total_quantity: 134
    }
    {
      _id: 'TV',
      total_revenue: 36399.090000000004,
      total_quantity: 91
    }
    {
      _id: 'Shirt',
      total_revenue: 4708.429999999999,
      total_quantity: 157
    }
    {
      _id: 'Pen',
      total_revenue: 475.41,
      total_quantity: 159
    }
    {
      _id: 'Phone',
      total_revenue: 125998.2,
      total_quantity: 180
    }
    {
      _id: 'Backpack',
      total_revenue: 6638.34,
      total_quantity: 166
    }
    {
      _id: 'Shoes',
      total_revenue: 8748.25,
      total_quantity: 175
    }
    

    Task 3.2: Use Multiple Accumulators

    In the MongoDB Shell, write an aggregation pipeline to calculate the total revenue and the total quantity sold for each category. Use the $group stage with _id: "$category", and include two accumulators: one for summing revenue and another for summing quantity. Add a $limit stage to display the first 2 results.

    Solution
    db.sales.aggregate([
      { 
        $group: { 
          _id: "$category", 
          total_revenue: { $sum: "$revenue" }, 
          total_quantity: { $sum: "$quantity" } 
        } 
      },
      { $limit: 2 }
    ])
    
    Expected Output You should see two documents, each showing a category, its total revenue, and its total quantity sold. For example:
    {
      _id: 'Stationery',
      total_revenue: 475.41,
      total_quantity: 159
    }
    {
      _id: 'Books',
      total_revenue: 1740.66,
      total_quantity: 134
    }
    

    Conclusion

    In this step, you've learned how to use the $group stage to calculate total revenue per product or category. You've seen how to group by different fields and use multiple accumulators to compute various aggregated values. In the next step, you'll use $sort and $limit to find the top-selling items, building on the skills you've gained here.

  4. Challenge

    Use $sort and $limit to Find the Top 5 Best-Selling Items

    Step 4: Use $sort and $limit to Find the Top 5 Best-Selling Items

    In this step, you'll learn how to use the $sort and $limit stages in MongoDB's aggregation framework to identify top-performing products. You'll explore sorting data in ascending and descending order, combine these stages to find the top 5 best-selling products by revenue, and understand how to rank items effectively. By the end, you'll be able to highlight key performers in your sales data.

    For this lab, you'll continue working with the sales collection in the salesDB database.

    Understanding the $sort Stage

    The $sort stage arranges documents in an aggregation pipeline based on a specified field. You can sort in ascending order (lowest to highest) or descending order (highest to lowest), making it ideal for ranking items like top- or bottom-selling categories or products.

    Sorting with $sort

    To sort data, you'll use the $sort stage with a field and a sort direction: 1 for ascending or -1 for descending. For example, to sort categories by total revenue, you first group the data (from Step 3) and then sort it.

    Here's an example that groups by category, calculates total revenue, and sorts by total_revenue in ascending order to show the worst-selling categories first:

    db.sales.aggregate([
      { 
        $group: { 
          _id: "$category", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $sort: { 
          total_revenue: 1 
        } 
      }
    ])
    

    In this pipeline:

    • $group calculates the total revenue for each category.
    • $sort: { total_revenue: 1 } sorts the results by total_revenue in ascending order (lowest to highest).

    Running this in the MongoDB Shell gives you a result like this:

    { _id: 'Stationery', total_revenue: 475.41 }
    { _id: 'Books', total_revenue: 1740.66 }
    { _id: 'Accessories', total_revenue: 6638.34 }
    { _id: 'Apparel', total_revenue: 13456.68 }
    { _id: 'Electronics', total_revenue: 171755.73 }
    

    If you want descending order (highest to lowest, for top performers), use -1 instead:

    { $sort: { total_revenue: -1 } }
    

    This would reverse the order, starting with the highest total_revenue (e.g., "Electronics").

    Combining $sort and $limit

    By combining $sort and $limit, you can pinpoint the top performers in your data. For example, to find the top 5 best-selling products by revenue, you'll group by product_name, sort by total revenue in descending order, and limit to 5 results.


    Task 4.1: Find the Top 5 Best-Selling Products by Revenue

    In the MongoDB Shell, write an aggregation pipeline to find the top 5 best-selling products by total revenue. Use $group to calculate total revenue per product, $sort to order by revenue in descending order, and $limit to show only the top 5 results.

    Solution
    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $sort: { 
          total_revenue: -1 
        } 
      },
      { $limit: 5 }
    ])
    
    Expected Output You should see five documents, each showing a product and its total revenue, sorted from highest to lowest:
    { _id: 'Phone', total_revenue: 125998.2 }
    { _id: 'TV', total_revenue: 36399.09 }
    { _id: 'Headphones', total_revenue: 9358.44 }
    { _id: 'Shoes', total_revenue: 8748.25 }
    { _id: 'Backpack', total_revenue: 6638.34 }
    

    Conclusion

    In this step, you've learned how to use the $sort and $limit stages to identify top-performing products. You've seen how to sort data in ascending (1) or descending (-1) order and combine with $limit to focus on key results. In the next step, you'll learn how to use the $project stage to rename or format fields.

  5. Challenge

    Add a $project Stage to Rename or Format Fields

    Step 5: Add a $project Stage to Rename or Format Fields

    In this step, you'll learn how to use the $project stage in MongoDB's aggregation framework to shape the output of your pipeline. You'll explore including or excluding fields, renaming fields, formatting values, and creating complex projections. By the end, you'll be able to customize your aggregation results to make them more readable and relevant for analysis.

    For this lab, you'll continue working with the sales collection in the salesDB database.

    Understanding the $project Stage

    The $project stage reshapes documents in an aggregation pipeline by specifying which fields to include, exclude, or transform. It's like a filter for your output, allowing you to rename fields, format values, or compute new ones, making your results cleaner and more tailored to your needs.

    Using $project to Shape Output

    To begin, consider a simple $project example. Suppose you've grouped your data by product_name to calculate total revenue (from Step 3). You want to rename total_revenue to revenue and exclude the _id field. Here's how you can do it:

    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $project: { 
          _id: 0, 
          product: "$_id", 
          revenue: "$total_revenue" 
        } 
      }
    ])
    

    In this pipeline:

    • $group calculates the total revenue per product.
    • $project renames _id to product and total_revenue to revenue, and excludes _id from the output.

    Running this in the MongoDB Shell gives you a result like this:

    { product: 'Shirt', revenue: 4708.43 }
    { product: 'TV', revenue: 36399.09 }
    { product: 'Backpack', revenue: 6638.34 }
    { product: 'Phone', revenue: 125998.2 }
    { product: 'Book', revenue: 1740.66 }
    { product: 'Headphones', revenue: 9358.44 }
    { product: 'Pen', revenue: 475.41 }
    { product: 'Shoes', revenue: 8748.25 }
    

    Including, Excluding, and Suppressing Fields

    In $project, you can control which fields appear in the output:

    • Include a field: Set it to 1 (e.g., { field: 1 }) to keep it.
    • Exclude a field: Set it to 0 (e.g., { field: 0 }) to remove it.
    • Rename a field: Assign a new name using a $ reference (e.g., { newName: "$oldName" }).
    • Suppress the _id field: Explicitly set _id: 0 to exclude it, as it's included by default.

    For example, to include only total_revenue and exclude _id after grouping:

    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $project: { 
          _id: 0, 
          total_revenue: 1 
        } 
      }
    ])
    

    Running this in the MongoDB Shell produces:

    { total_revenue: 4708.43 }
    { total_revenue: 36399.09 }
    { total_revenue: 6638.34 }
    { total_revenue: 125998.2 }
    { total_revenue: 1740.66 }
    { total_revenue: 9358.44 }
    { total_revenue: 475.41 }
    { total_revenue: 8748.25 }
    

    You can't mix inclusion and exclusion for non-_id fields in the same $project stage (e.g., { field1: 1, field2: 0 } is invalid). Either specify all fields to include (others are excluded) or all fields to exclude (others are included).

    Formatting Fields

    You can also use $project to format field values. For example, you might want to round the total_revenue to two decimal places using the $round operator. Here's how:

    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $project: { 
          _id: 0, 
          product: "$_id", 
          revenue: { $round: ["$total_revenue", 2] } 
        } 
      }
    ])
    

    This pipeline rounds total_revenue to two decimal places, ensuring clean output. The result in the MongoDB Shell looks like:

    { product: 'Shirt', revenue: 4708.43 }
    { product: 'TV', revenue: 36399.09 }
    { product: 'Backpack', revenue: 6638.34 }
    { product: 'Phone', revenue: 125998.2 }
    { product: 'Book', revenue: 1740.66 }
    { product: 'Headphones', revenue: 9358.44 }
    { product: 'Pen', revenue: 475.41 }
    { product: 'Shoes', revenue: 8748.25 }
    

    Task 5.1: Project and Format Top Products

    In the MongoDB Shell, write an aggregation pipeline to find the top 5 products by total revenue. Use $group to calculate total revenue per product, $project to rename _id to product, rename total_revenue to revenue and round it to two decimal places, exclude _id, $sort to order by revenue in descending order, and $limit to show the top 5 results.

    Solution
    db.sales.aggregate([
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $project: { 
          _id: 0, 
          product: "$_id", 
          revenue: { $round: ["$total_revenue", 2] } 
        } 
      },
      { 
        $sort: { 
          revenue: -1 
        } 
      },
      { $limit: 5 }
    ])
    
    Expected Output You should see five documents, each showing a product and its rounded total revenue, sorted from highest to lowest:
    { product: 'Phone', revenue: 125998.2 }
    { product: 'TV', revenue: 36399.09 }
    { product: 'Headphones', revenue: 9358.44 }
    { product: 'Shoes', revenue: 8748.25 }
    { product: 'Backpack', revenue: 6638.34 }
    

    Conclusion

    In this step, you've learned how to use the $project stage to shape and format your aggregation output. You've seen how to include or exclude fields, suppress _id, rename fields, and apply formatting like rounding. You've also tackled a complex projection to compute new fields. In the next step, you'll combine these all the techniques you've learned in this code lab.

  6. Challenge

    Combine Stages into a 3-Step Pipeline and Interpret the Result

    Step 6: Combine Stages into a 3-Step Pipeline and Interpret the Result

    In this step, you'll learn how to build a multi-stage aggregation pipeline by combining $match, $group, $sort, $project, and $limit to analyze your sales data. You'll see how each stage transforms the data, create a complete four-step pipeline to find top-performing categories, and interpret the results to identify trends. By the end, you'll be able to construct and understand complex pipelines to extract meaningful insights.

    For this lab, you'll continue working with the sales collection in the salesDB database. Assume today's date is May 23, 2025, so you'll filter orders from the last six months, starting from November 23, 2024.

    Building a Three-Step Multi-Stage Pipeline

    To start, consider a pipeline that filters orders from the last six months, groups them by product_name to calculate total revenue, and sorts the results by revenue. Here's how you can do it:

    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2024-11-23")
          } 
        } 
      },
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $sort: { 
          total_revenue: -1 
        } 
      }
    ])
    

    Running this in the MongoDB Shell gives you a result like this:

    { _id: 'Phone', total_revenue: 111298.41 }
    { _id: 'TV', total_revenue: 35999.1 }
    { _id: 'Headphones', total_revenue: 8338.61 }
    { _id: 'Shoes', total_revenue: 7948.41 }
    { _id: 'Backpack', total_revenue: 6118.47 }
    { _id: 'Shirt', total_revenue: 3658.78 }
    { _id: 'Book', total_revenue: 1389.93 }
    { _id: 'Pen', total_revenue: 415.61 }
    

    This pipeline filters recent orders, aggregates revenue by product, and ranks products by total revenue, with top performers like "Phone" and "TV" appearing first.

    How Each Stage Transforms the Data

    Each stage in the pipeline plays a specific role in shaping the data:

    • $match: Filters the documents to include only those that meet your criteria. Here, it selects orders from November 23, 2024 (because the dataset ends on May 23, 2025), reducing the dataset to relevant records. For example, it might narrow down thousands of orders to those within the six-month period.
    • $group: Aggregates the filtered documents by a specified field, applying accumulators like $sum. In this case, it groups by product_name and sums the revenue field, transforming individual order documents into summary documents with total revenue.
    • $sort: Orders the grouped documents based on a field, such as total_revenue. Using -1 (descending) ensures high-revenue items appear first, making it easy to identify top performers.

    Together, these stages transform raw order data into a ranked list of products by revenue, ready for analysis.

    A Four-Step Pipeline

    Now, you'll build a four-step pipeline to analyze products for the last six months. This pipeline will filter orders, group by product_name, sort by total revenue, and project the output for clarity:

    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2024-11-23")
          } 
        } 
      },
      { 
        $group: { 
          _id: "$product_name", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $sort: { 
          total_revenue: -1 
        } 
      },
      { 
        $project: { 
          _id: 0, 
          product: "$_id", 
          revenue: { $round: ["$total_revenue", 2] } 
        } 
      }
    ])
    

    Running this in the MongoDB Shell produces:

    { product: 'Phone', revenue: 111298.41 }
    { product: 'TV', revenue: 35999.1 }
    { product: 'Headphones', revenue: 8338.61 }
    { product: 'Shoes', revenue: 7948.41 }
    { product: 'Backpack', revenue: 6118.47 }
    { product: 'Shirt', revenue: 3658.78 }
    { product: 'Book', revenue: 1389.93 }
    { product: 'Pen', revenue: 415.61 }
    

    This output shows products ranked by total revenue, with fields renamed and revenue rounded for clarity.

    Interpreting the Results

    The results reveal key trends in your sales data:

    • Electronics Dominates: With "Phone" ($111,298.41) and "TV" ($35,999.1) at the top, the "Electronics" category is the top performer, driven by high-value items. This suggests strong demand for tech products.
    • Apparel is Strong: Products like "Shoes" ($7,948.41) and "Shirt" ($3,658.78) indicate that "Apparel" is a solid performer, though it trails behind "Electronics".
    • Lower Performers: "Backpack" ($6,118.47, "Accessories"), "Book" ($1,389.93, "Books"), and "Pen" ($415.61, "Stationery") have lower revenues. "Stationery" is the weakest, possibly due to low unit prices or limited demand.

    These insights can guide business decisions, such as focusing marketing efforts on "Electronics" or exploring ways to boost "Stationery" sales.


    Task 6.1: Build a Five-Step Pipeline for Top Categories

    In the MongoDB Shell, write an aggregation pipeline to find the top 3 categories by total revenue for orders from the last six months (from November 23, 2024). Use $match to filter orders by date, $group to calculate total revenue per category, $sort to order by revenue in descending order, $project to rename _id to category and total_revenue to revenue (rounded to two decimal places) and exclude _id, and $limit to show only the top 3 results.

    Solution
    db.sales.aggregate([
      { 
        $match: { 
          date: { 
            $gte: ISODate("2024-11-23")
          } 
        } 
      },
      { 
        $group: { 
          _id: "$category", 
          total_revenue: { $sum: "$revenue" } 
        } 
      },
      { 
        $sort: { 
          total_revenue: -1 
        } 
      },
      { 
        $project: { 
          _id: 0, 
          category: "$_id", 
          revenue: { $round: ["$total_revenue", 2] } 
        } 
      },
      { $limit: 3 }
    ])
    
    Expected Output You should see three documents, each showing a category and its rounded total revenue, sorted from highest to lowest:
    { category: 'Electronics', revenue: 155636.12 }
    { category: 'Apparel', revenue: 11607.19 }
    { category: 'Accessories', revenue: 6118.47 }
    

    Conclusion

    In this step, you've learned how to combine $match, $group, $sort, $project, and $limit into a powerful multi-step pipeline to analyze sales data. You've seen how each stage transforms the data and interpreted the results to identify top-performing categories.

    Now that you've completed this code lab, you're ready to analyze sales trends with MongoDB aggregations!

What's a lab?

Hands-on Labs are real environments created by industry experts to help you learn. These environments help you gain knowledge and experience, practice without compromising your system, test without risk, destroy without fear, and let you learn from your mistakes. Hands-on Labs: practice your skills before delivering in the real world.

Provided environment for hands-on practice

We will provide the credentials and environment necessary for you to practice right within your browser.

Guided walkthrough

Follow along with the author’s guided walkthrough and build something new in your provided environment!

Did you know?

On average, you retain 75% more of your learning if you get time for practice.