Search Text-based Product Descriptions and Reviews in MongoDB

In this lab, you'll create and leverage a text index in MongoDB to search for products based on their descriptions and customer reviews, sort results by relevance, and filter by SKUs. This will give you hands-on practice with standard MongoDB techniques using a sample based on real-world product review data.

Level: Intermediate
Duration: 43m
Published: Jun 02, 2025

Table of Contents

  1. Challenge

    Introduction

    The dataset you'll use comes preloaded into a collection within a MongoDB database, and the environment comes prepared for you to write Node.js code to query this collection.

    Some notes:

    • You'll do all your coding in the file src/main.js. You should keep strict mode enabled by leaving 'use strict' as the first line.
    • For each new task, replace the body of the learnerFunction function from your previous task.
    • If you get stuck on a task, you can consult the solutions/ folder.
  2. Challenge

    Find Products Using a Text Index on Product Descriptions, and One That Also Includes Reviews

    This lab's dataset consists of reviews of consumer items, particularly frozen foods and beauty products. Each MongoDB document in the provided collection represents a single product review. Most products in this dataset include multiple reviews. Thus, several review docs will contain identical product information. Here's an example review document object:

    {
        _id: new ObjectId('68274ceb20f54e361fd6ae29'),
        'Product Description': 'Our natural, proprietary blend of shea butter and lavender calms and soothes minor skin irritations. Its hydrates and leaves skin delicately fragranced. Creates a fragrant spa bath, relaxing mind and body.',
        SKU: 'BE04SC2NA4E',
        'Review Title': 'The combined oils and scents are perfect!',
        'Review Rating': 5,
        'Review Date': 'March 9, 2020',
        'Review Content': 'I absolutely love this body oil! The combined oils and scents used are perfect. I like that it is a natural, organic, fair trade, and family product!'
    }
    

    Here, the fields 'Product Description' and SKU happen to be duplicated across four of the other reviews in this dataset regarding the same product.

    Searching for substrings within fields like 'Product Description' becomes computationally inefficient at scale. Not only that, but with English (among others) it takes extra work to handle irregular plurals with such an approach. For example, searching for a "tooth whitening kit" will exclude relevant matches for "teeth whitening kit".

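    Fortunately, MongoDB comes with a text index feature that mitigates both the inefficiency and much of the plural-handling awkwardness inherent in naive substring matching. The latter feature is called "stemming": indexed words and search terms are reduced to a common root, so a search for "kit" will match indexed fields containing "kits," and vice versa. MongoDB's stemming is suffix-based, so irregular pairs such as "tooth" and "teeth" may still fail to match each other.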

    To create an index on a given collection, pass an object to its createIndex method, where each key is the name of the field you want to index, and each corresponding value is the literal string 'text':

    collection.createIndex({ 'your field name goes here': 'text' })
    

    Once you've created an index on a collection, it persists. MongoDB automatically manages the index's maintenance, including adding entries to it whenever you insert new documents. The implication is that inserting documents is slowed a bit. However, for most use cases, the search performance boost gained more than compensates for the insertion performance lost.

    In other words, you create an index just once, and after that, it's ready to be leveraged in searches using the $search field within the $text operator passed to the find method:

    collection.find({ $text: { $search: 'your search query here' } })
    

    The find method returns a cursor, which points to the results so they can be retrieved in batches if you want. For the purposes of this lab, every call you make to find should have a call to toArray chained to its result. As its name implies, this returns all the retrieved docs in order as an array.

    That works for searching for "flavor" in the product description. What if you need results where that appears in reviews or their titles, too?

    In MongoDB, there can only be one text index on a given collection. Fortunately, that single index can include multiple fields.
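
As a rough sketch of the two object shapes described so far, the helpers below build an index specification for one or several fields and a $text filter. The names textIndexSpec and textFilter are hypothetical; they are not part of MongoDB or its Node.js driver.

```javascript
// Hypothetical helper (not a driver API): builds the object passed to
// createIndex, mapping every given field name to the literal string 'text'.
function textIndexSpec(...fieldNames) {
  return Object.fromEntries(fieldNames.map((name) => [name, 'text']));
}

// Hypothetical helper: builds the filter object passed to find.
function textFilter(query) {
  return { $text: { $search: query } };
}

// One field, as in the first index:
const singleField = textIndexSpec('Product Description');
// Several fields in the same (single) text index:
const multiField = textIndexSpec('Product Description', 'Review Title');

console.log(singleField, multiField, textFilter('flavor'));
```

With a real collection, these objects would be used as collection.createIndex(multiField) and collection.find(textFilter('flavor')).toArray().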

    MongoDB doesn't provide a way to modify an existing index. Since you already have an index — automatically assigned the name 'Product Description_text' by MongoDB — you'll need to remove it before you can add a new one:

    collection.dropIndex('Product Description_text')
      .then(() => collection.createIndex({
        'Product Description': 'text',
        'Review Title': 'text',
      }))
    

    To include multiple text fields, use as many key-value pairs (in the same format as in Task 1) as needed in the object you're passing to createIndex. Though your replacement index includes three fields, the method for searching the collection remains the same:

    collection.find({ $text: { $search: 'your search query here' } }).toArray()
    

    The only difference (if any) is in the results you'll get for a given query. In this case, "good" (or another match via stemming) appears in 14 product descriptions (describing 2 products), and also 12 reviews — only 3 of which are for those 14 product descriptions (and only for 1 product).

    If you were to rerun the same query against your original index, it would return 9 fewer review results and include 4 fewer products compared to your replacement index.
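
Putting this section's steps together, a minimal sketch might look like the following. The function name reindexAndSearch and its parameters are hypothetical, and collection is assumed to be a Collection object from the MongoDB Node.js driver.

```javascript
// Hypothetical helper: replaces the collection's only text index with one
// covering `fieldNames`, then runs a text search. `collection` is assumed to
// be a Node.js driver Collection; anything exposing dropIndex, createIndex,
// and find works, which also makes the function easy to exercise with a stub.
async function reindexAndSearch(collection, oldIndexName, fieldNames, query) {
  const spec = Object.fromEntries(fieldNames.map((name) => [name, 'text']));
  await collection.dropIndex(oldIndexName);
  await collection.createIndex(spec);
  // find returns a cursor; toArray drains it into an array of documents.
  return collection.find({ $text: { $search: query } }).toArray();
}

// Usage, inside an async function:
//   const docs = await reindexAndSearch(
//     collection,
//     'Product Description_text',
//     ['Product Description', 'Review Title'],
//     'good'
//   );
```

This sketch assumes the original index is still in place: if no index with the given name exists, dropIndex rejects and the new index is never created.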

  3. Challenge

    Find Products Matching One of Several Keywords

    When you want to search an indexed MongoDB collection for documents matching any of several keywords, it's very convenient — the structure of your JavaScript code doesn't change at all.

    All that changes is the query string itself, so that it includes all desired keywords, separated by spaces. For example, if you want results that include matches for green, blue, indigo, or violet, your $text operator's value would be { $search: 'green blue indigo violet' }.

    That works fine to find onion, lavender, or delicious (or any variants via stemming), and matches typical search engine behavior. It's good that's the default, since the exact phrase "onion lavender delicious" isn't a likely occurrence.
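
If your keywords arrive as an array, one way to build such a query string is to join them with spaces. This is a small sketch; anyKeywordFilter is a hypothetical helper name.

```javascript
// Hypothetical helper: joins keywords with spaces so that $text matches
// documents containing any of them (the default "OR" behavior).
function anyKeywordFilter(keywords) {
  return { $text: { $search: keywords.join(' ') } };
}

const filter = anyKeywordFilter(['onion', 'lavender', 'delicious']);
console.log(filter.$text.$search); // onion lavender delicious
```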

  4. Challenge

    Perform an Exact Phrase Search and Exclude Certain Terms

    Sometimes you might want to search for an exact phrase. You don't want the union of results for "bright" with the results for "red," or even their intersection. You want an even narrower set of results that match "bright red" as an exact, multi-word phrase.

    MongoDB text indexes don't store multi-word phrases. Nonetheless, there is a straightforward syntax for searching for them, again matching typical search engine behavior. Just surround the phrase in double quotation marks: { $search: '"bright red"' }.

    Note, however, that this can't leverage the stemming feature text indexes normally provide, so a search for "bright reds" won't match a document that only contains "bright red."

    The phrase itself must be wrapped in double quotation marks inside the search string; MongoDB doesn't treat single quotation marks as phrase delimiters. The JavaScript string literal around it can use either quote style, and whether the inner quotation marks need escaping follows the normal rules for JavaScript:

    /* context:
    
       { $search: '"bright red"' }
    
    */
    '"bright red"' // valid
    "\"bright red\"" // valid
    '\"bright red\"' // valid, but the escaping is unnecessary
    "'bright red'" // valid JavaScript, but not treated as an exact phrase
    '\'bright red\'' // same string as the previous line, so not a phrase search either
    "\'bright red\'" // likewise
    // ""bright red"" // invalid JavaScript
    // ''bright red'' // invalid JavaScript
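
A quick way to check which escaping variants produce the same search string is to compare them in Node.js. This sketch covers only the double-quoted phrase forms, and the variable names are purely illustrative.

```javascript
// Three JavaScript spellings of the same search string: "bright red"
// (with the double quotes as part of the string's contents).
const singleQuoted = '"bright red"';
const escapedDouble = "\"bright red\"";
const templateLiteral = `"bright red"`;

const allEqual =
  singleQuoted === escapedDouble && escapedDouble === templateLiteral;

console.log(allEqual); // true
console.log(singleQuoted.length); // 12 (10 characters plus the two quotes)
```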
    

    Now, suppose you want to search for "bright red", but don't want any results that mention "pink." MongoDB follows common search engine syntax here again: Negate any term by prefixing it with a hyphen (-). In this case, that means { $search: '"bright red" -pink' }. Note that MongoDB doesn't treat the hyphens in hyphenated phrases as negations, but as delimiters (just like spaces). In other words, { $search: 'bright-red' } doesn't negate red, and instead is the same as { $search: 'bright red' }. On the other hand, in an exact phrase, hyphens and spaces are both treated literally (neither as negations nor as delimiters).

    In other words:

    | Query String Contents | Matches for bright | Matches for red | Matches for bright red | Matches for bright-red |
    |-----------------------|--------------------|-----------------|------------------------|------------------------|
    | bright red            | included           | included        | (included)             | (included)             |
    | bright-red            | included           | included        | (included)             | (included)             |
    | "bright red"          | N/A                | N/A             | included               | N/A                    |
    | "bright-red"          | N/A                | N/A             | N/A                    | included               |
    | bright -red           | included           | excluded        | (excluded)             | (excluded)             |
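
As an illustration of the phrase and negation syntax, the sketch below assembles a search string from one exact phrase and a list of terms to exclude. The helper name phraseFilter is hypothetical.

```javascript
// Hypothetical helper: builds a $text filter from one exact phrase plus any
// number of single terms to exclude.
function phraseFilter(phrase, excludedTerms = []) {
  const parts = [`"${phrase}"`, ...excludedTerms.map((term) => `-${term}`)];
  return { $text: { $search: parts.join(' ') } };
}

console.log(phraseFilter('bright red', ['pink']).$text.$search);
// "bright red" -pink
```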

  5. Challenge

    Find Products Containing All Specified Keywords

    Negating terms is not the only way to narrow down your search results on a textual basis. Another search method that's more restrictive than the default "OR" behavior (matching any term) is "AND" behavior, which insists on matching all terms.

    To accomplish this with MongoDB, the trick is to treat each term you want to include as an exact phrase, using the syntax you saw in the previous two tasks. For example, { $search: '"bright" "red" "blues"' } will only return results that contain bright and red and blues (regardless of the terms' positions relative to each other or which phrases they may appear in). In this example:

    • "This bright red car runs well" won't match because "blues" isn't present.
    • "I love the red and blue hues outside on a bright day" won't match, because "blues" isn't present. (Remember, there's no stemming with exact phrases.)
    • "I love the reds and blues outside on a bright day" will match, despite the lack of stemming, because "red" is contained within "reds".

    Searching your multi-field MongoDB index like this will result in positive matches even when the different search terms appear in different fields. For example, a shea-butter soap may not contain "perfect" in its product description, but if that word appears in a review field, the document will still count as a match for a search that requires "shea", "soap", and "perfect".

    If need be, you can combine the techniques for exact phrases and for "AND"/all inclusions. For example, to make sure every result includes a match for both the phrase "bright red" and the phrase "dark blue," you would use { $search: '"bright red" "dark blue"' }.

    Likewise, with negation: { $search: 'bright red blue "dark blue" -"bright red" -"bright blue"' } will match bright, red, blue, or "dark blue" (as an exact phrase), but will exclude any results that contain either "bright red" or "bright blue" as exact phrases.
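
The "AND" technique and negated phrases can be sketched as a small query-string builder. The helper name allOfFilter and its parameters are hypothetical.

```javascript
// Hypothetical helper: wraps every required term or phrase in double quotes
// so that each one must be present ("AND" behavior), and prefixes every
// excluded term or phrase with a hyphen.
function allOfFilter(required, excluded = []) {
  const quote = (text) => `"${text}"`;
  const parts = [
    ...required.map(quote),
    ...excluded.map((text) => `-${quote(text)}`),
  ];
  return { $text: { $search: parts.join(' ') } };
}

console.log(allOfFilter(['bright', 'red', 'blues']).$text.$search);
// "bright" "red" "blues"
console.log(allOfFilter(['bright red', 'dark blue'], ['bright blue']).$text.$search);
// "bright red" "dark blue" -"bright blue"
```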

  6. Challenge

    Retrieve Relevance Score Alongside Search Results and Sort by It

    When you search using a text index, MongoDB has the ability to include a relevance score ("textScore") with each result. Since relevance scores are document metadata, this feature uses the $meta operator in a projection, like so: { score: { $meta: "textScore" } }.

    This argument is then the value corresponding to a projection key in the second parameter (an options object) given to find:

      collection.find(
        { $text: { $search: 'your query here' } },
        { projection: { score: { $meta: 'textScore' } } }
      )
    

    A projection using the $meta operator adds the score to each search result object as a number under the key score, with a larger number indicating a better match. In this lab, the results also come back ordered by that score, but MongoDB's documentation doesn't guarantee this: it states that text search results are returned unsorted by default unless you sort on the score.

    The Explicit Equivalent

    To request that ordering explicitly, add a sort key with a duplicate value:

      collection.find(
        { $text: { $search: 'your query here' } },
        {
          projection: { score: { $meta: 'textScore' } },
          sort:       { score: { $meta: 'textScore' } },
        }
      )
    

    Some consider it a best practice to specify this as well rather than rely on the default, implied behavior. Others consider this particular default unlikely to change. It's a tradeoff, but if your code follows other best practices like automated testing, at least a breaking change would be caught after an upgrade.
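
For reference, here is a sketch of the explicit form wrapped in a reusable function. The name searchWithScores is hypothetical, and collection is assumed to be a Collection object from the MongoDB Node.js driver.

```javascript
// Hypothetical helper: runs a text search and returns the matching documents
// with their relevance scores, sorted explicitly by score.
async function searchWithScores(collection, query) {
  const byTextScore = { score: { $meta: 'textScore' } };
  return collection
    .find(
      { $text: { $search: query } },
      { projection: byTextScore, sort: byTextScore }
    )
    .toArray();
}

// Usage, inside an async function:
//   const results = await searchWithScores(collection, 'your query here');
//   for (const doc of results) console.log(doc.score, doc.SKU);
```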

    You may come across snippets of MongoDB code that use a more concise syntax for what you just accomplished:
      collection.find(
        { $text: { $search: 'your query here' } },
        { score: { $meta: "textScore" } } // Caution: mongosh-specific projection shorthand!
      )
    

    However, that implicit syntax doesn't work in Node.js, where it will be silently ignored.

    Thankfully, the syntax you used to complete the task, while slightly more verbose, is compatible with both Node.js's MongoDB driver and mongosh, the MongoDB command-line shell.

  7. Challenge

    Use `$regex` to Find Product SKUs Matching a Naming Convention

    The dataset you're using includes a field called SKU (stock-keeping unit), which is an alphanumeric code unique to a product.

    Though the rest of the data in this lab is real data from the public domain, the SKUs in the dataset have been generated for the purposes of this lab exercise, to illustrate a use case typical to many types of SKU codes.

    In particular, each product in the dataset:

    • Is made by one of five different companies (numbered 01 through 05)
    • Falls into one of two different wider categories:
      1. "Beauty & Personal Care" ("BE")
      2. "Grocery & Gourmet Food" ("FO")
    • Falls into one of four different subcategories:
      1. "Personal Care" ("PC")
      2. "Pantry Staples" ("PS")
      3. "Skin Care" ("SC")
      4. "Frozen" ("FR")

    In this lab, a product's SKU will consist of

    1. A two-letter category code
    2. A two-digit company number
    3. A two-letter subcategory code
    4. A five-character product ID

    For example, you can tell that the SKU "FO02PSKSX1F" is for a product:

    • In the "Grocery & Gourmet Food" category ("FO")
    • From company number 02
    • In the "Pantry Staples" ("PS") subcategory

    If you wanted to find more products from the same company, you could search the dataset for SKUs that have "02" after the first two characters. Likewise, you could find more food products looking for SKUs that have "FO" as a prefix.

    In terms of matching using regular expressions, these examples correspond, respectively, to the patterns /^..02/ and /^FO/.
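
These patterns can be tried out in plain JavaScript before using them in a query. The sketch below also splits an SKU into its four parts; parseSku is a hypothetical helper, and it assumes the letter codes are always uppercase.

```javascript
// Sample SKU taken from the text above.
const sku = 'FO02PSKSX1F';

console.log(/^..02/.test(sku)); // true: company 02 follows the category code
console.log(/^FO/.test(sku));   // true: "FO" prefix

// Hypothetical helper: splits an SKU into category code, company number,
// subcategory code, and product ID. Assumes uppercase letter codes.
function parseSku(value) {
  const match = /^([A-Z]{2})(\d{2})([A-Z]{2})(.{5})$/.exec(value);
  if (!match) return null;
  const [, category, company, subcategory, productId] = match;
  return { category, company, subcategory, productId };
}

console.log(parseSku(sku));
```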

    In MongoDB, to search using a regular expression, you can use the name of the field you want to search instead of $text as the key in the object you pass as the first parameter to find. For the value, you can use the $regex operator and standard /pattern/ notation for regular expressions in JavaScript:

    collection.find({ SKU: { $regex: /^..02/ } })
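
A sketch of the same query with the company number as a parameter follows. The name findByCompany is hypothetical, and collection is assumed to be a Collection object from the MongoDB Node.js driver.

```javascript
// Hypothetical helper: finds every document whose SKU has the given
// two-digit company number right after the two-character category code.
async function findByCompany(collection, companyNumber) {
  if (!/^\d{2}$/.test(companyNumber)) {
    throw new TypeError('companyNumber must be a two-digit string, e.g. "02"');
  }
  // Validating the input first keeps unexpected regex syntax out of the pattern.
  const pattern = new RegExp(`^..${companyNumber}`);
  return collection.find({ SKU: { $regex: pattern } }).toArray();
}

// Usage, inside an async function:
//   const docs = await findByCompany(collection, '02');
```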
    

    Since $text is not used, this doesn't leverage a text index, unlike in previous tasks. Note that there are additional ways to search MongoDB collections using regular expressions, including:

    • Using the $in operator instead, which has an implicit "OR"/any behavior
      • For example, { SKU: { $in: [ /^FO/, /^BE/ ] } } would find results for products in either category (in this case, the entire dataset).
    • Combining the $regex operator with the $nin ("not in") operator, which behaves similarly to the use of exact phrase matches alongside negations as you saw a few tasks ago
      • For example, { SKU: { $regex: /^BE/i, $nin: [ 'BE04SC0GQPC' ] } } would search for all beauty products, but exclude a particular SKU.

      Note that $nin array values can also be regular expressions: Using $nin: [ /^BE0[14]/ ] instead would exclude beauty products made by companies 01 or 04.
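
To see which SKUs a combined filter like this would keep, you can approximate it locally. The sketch below is an illustration only: matchesLocally is a hypothetical function that mimics the filter for string SKUs, whereas MongoDB evaluates the real filter on the server.

```javascript
// Filter from the text: beauty products, except those from companies 01 or 04.
const filter = { SKU: { $regex: /^BE/i, $nin: [/^BE0[14]/] } };

// Rough local emulation of that filter for string SKUs (illustration only).
function matchesLocally(sku, { $regex, $nin }) {
  const excluded = $nin.some((entry) =>
    entry instanceof RegExp ? entry.test(sku) : entry === sku);
  return $regex.test(sku) && !excluded;
}

// 'BE04SC2NA4E' and 'FO02PSKSX1F' appear in this lab's text;
// 'BE02PC00000' is a made-up SKU for this example.
const samples = ['BE04SC2NA4E', 'BE02PC00000', 'FO02PSKSX1F'];
const kept = samples.filter((sku) => matchesLocally(sku, filter.SKU));

console.log(kept); // [ 'BE02PC00000' ]
```

With a real collection, the same filter object would be passed directly to find: collection.find(filter).toArray().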


    Congratulations on completing this lab!

Kevin has 25+ years in full-stack development. Now he's focused on PostgreSQL and JavaScript. He's also used Haxe to create indie games, after a long history in desktop apps and Perl back ends.
