
Search Text-based Product Descriptions and Reviews in MongoDB
In this lab, you'll create and leverage a text index in MongoDB to search for products based on their descriptions and customer reviews, sort results by relevance, and filter by SKUs. This will give you hands-on practice with standard MongoDB techniques using a sample based on real-world product review data.

-
Challenge
Introduction
The dataset you'll use comes preloaded into a collection within a MongoDB database, and the environment comes prepared for you to write Node.js code to query this collection.
Some notes:
- You'll do all your coding in the file `src/main.js`. You should keep strict mode enabled by leaving `'use strict'` as the first line.
- For each new task, replace the body of the `learnerFunction` function from your previous task.
- If you get stuck on a task, you can consult the `solutions/` folder.
-
Challenge
Find Products Using a Text Index on Product Descriptions, and One That Also Includes Reviews
This lab's dataset consists of reviews of consumer items, particularly frozen foods and beauty products. Each MongoDB document in the provided collection represents a single product review. Most products in this dataset include multiple reviews. Thus, several review docs will contain identical product information. Here's an example review document object:
```js
{
  _id: new ObjectId('68274ceb20f54e361fd6ae29'),
  'Product Description': 'Our natural, proprietary blend of shea butter and lavender calms and soothes minor skin irritations. Its hydrates and leaves skin delicately fragranced. Creates a fragrant spa bath, relaxing mind and body.',
  SKU: 'BE04SC2NA4E',
  'Review Title': 'The combined oils and scents are perfect!',
  'Review Rating': 5,
  'Review Date': 'March 9, 2020',
  'Review Content': 'I absolutely love this body oil! The combined oils and scents used are perfect. I like that it is a natural, organic, fair trade, and family product!'
}
```
Here, the fields `'Product Description'` and `SKU` happen to be duplicated across four of the other reviews in this dataset regarding the same product.
Searching for substrings within fields like `'Product Description'`
becomes computationally inefficient at scale. Not only that, but with English (among others) it takes extra work to handle plurals and other word variants with such an approach. For example, searching for a "tooth whitening kit" will exclude relevant matches for "teeth whitening kit".
Fortunately, MongoDB comes with a text index feature that mitigates both the inefficiency and the plural-handling awkwardness inherent in naive substring matching. The latter feature is called "stemming": indexed words and search terms are reduced to a common stem, so a search for "kit" will match indexed fields containing "kits," and vice versa. Be aware that MongoDB's stemming is suffix-based, so irregular pairs such as "tooth" and "teeth" are not guaranteed to match each other.
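To make the limitation of naive substring matching concrete, here is a minimal sketch in plain JavaScript (no database involved). The `descriptions` array and the `naiveSearch` helper are made up for illustration and are not part of the lab's dataset or code.

```js
// Hypothetical product descriptions, for illustration only.
const descriptions = [
  'Professional teeth whitening kit with LED light',
  'Tooth whitening kit for sensitive gums',
  'Lavender body oil',
];

// Naive substring matching: scans every string on every search and
// only finds exact character sequences (case-insensitively here).
function naiveSearch(docs, query) {
  const needle = query.toLowerCase();
  return docs.filter((doc) => doc.toLowerCase().includes(needle));
}

const hits = naiveSearch(descriptions, 'tooth whitening kit');
console.log(hits); // only the "Tooth whitening kit ..." entry; the "teeth" one is missed
```

Every search walks the full list, and the "teeth" product is skipped even though a shopper would consider it relevant.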
To create an index on a given `collection`, pass an object to its `createIndex` method, where each key is the name of the field you want to index, and each corresponding value is the literal string `'text'`:
```js
collection.createIndex({ 'your field name goes here': 'text' })
```
Once you've created an index on a collection, it persists. MongoDB automatically manages the index's maintenance, including adding entries to it whenever you insert new documents. The implication is that inserting documents is slowed a bit. However, for most use cases, the search performance boost gained more than compensates for the insertion performance lost.
In other words, you create an index just once, and after that, it's ready to be leveraged in searches using the `$search` field within the `$text` operator passed to the `find` method:
```js
collection.find({ $text: { $search: 'your search query here' } })
```
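Putting the two calls together, a task solution might look roughly like the sketch below. It assumes `collection` is a collection object from the official Node.js MongoDB driver; the function name `searchDescriptions` is hypothetical, and the lab's own `learnerFunction` may be shaped differently.

```js
// Sketch only: `collection` is assumed to be a Node.js MongoDB driver collection.
async function searchDescriptions(collection, query) {
  // Re-creating an index that already exists with the same definition
  // is a no-op, so running this more than once is harmless.
  await collection.createIndex({ 'Product Description': 'text' });

  // find() returns a cursor; toArray() drains it into an array of documents.
  return collection.find({ $text: { $search: query } }).toArray();
}
```

Calling `searchDescriptions(collection, 'flavor')` would then resolve to an array of the matching review documents.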
The `find` method returns a cursor, which points to the results so they can be retrieved in batches if you want. For the purposes of this lab, every call you make to `find` should have a call to `toArray` chained to its result. As its name implies, this returns all the retrieved docs in order as an array.
That works for searching for "flavor" in the product description. What if you need results where that appears in reviews or their titles, too?
In MongoDB, there can only be one text index on a given collection. Fortunately, that single index can include multiple fields.
MongoDB doesn't provide a way to modify an existing index. Since you already have an index (automatically assigned the name `'Product Description_text'` by MongoDB), you'll need to remove it before you can add a new one:
```js
collection.dropIndex('Product Description_text')
  .then(() => collection.createIndex({
    'Product Description': 'text',
    'Review Title': 'text',
  }))
```
To include multiple text fields, use as many key-value pairs (in the same format as in Task 1) as needed in the object you're passing to `createIndex`. Though your replacement index includes three fields, the method for searching the collection remains the same:
```js
collection.find({ $text: { $search: 'your search query here' } }).toArray()
```
The only difference (if any) is in the results you'll get for a given query. In this case, "good" (or another match via stemming) appears in 14 product descriptions (describing 2 products), and also 12 reviews — only 3 of which are for those 14 product descriptions (and only for 1 product).
If you were to rerun the same query against your original index, it would return 9 fewer review results and include 4 fewer products compared to your replacement index.
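For reference, the drop-and-recreate step could also be written with `async`/`await`, as in this sketch. It assumes the third indexed field is `'Review Content'` (the remaining text field in the example document) and that `collection` comes from the Node.js MongoDB driver; `replaceTextIndex` is a hypothetical name.

```js
// Sketch only: swaps the single-field text index for a three-field one.
async function replaceTextIndex(collection) {
  // A collection can hold only one text index, so drop the old one first.
  await collection.dropIndex('Product Description_text');

  // One index spanning all three text fields (third field is an assumption).
  return collection.createIndex({
    'Product Description': 'text',
    'Review Title': 'text',
    'Review Content': 'text',
  });
}
```

Awaiting the drop before creating the new index preserves the same ordering that the `.then()` chain guarantees.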
-
Challenge
Find Products Matching One of Several Keywords
When you want to search an indexed MongoDB collection for documents matching any of several keywords, it's very convenient — the structure of your JavaScript code doesn't change at all.
All that changes is the query string itself, so that it includes all desired keywords, separated by spaces. For example, if you want results that include matches for green, blue, indigo, or violet, your `$text` operator's value would be `{ $search: 'green blue indigo violet' }`.
That works fine to find onion, lavender, or delicious (or any variants via stemming), and matches typical search engine behavior. It's good that's the default, since the exact phrase "onion lavender delicious" isn't a likely occurrence.
-
Challenge
Perform an Exact Phrase Search and Exclude Certain Terms
Sometimes you might want to search for an exact phrase. You don't want the union of results for "bright" with the results for "red," or even their intersection. You want an even narrower set of results that match "bright red" as an exact, multi-word phrase.
MongoDB text indexes don't store multi-word phrases. Nonetheless, there is a straightforward syntax for searching for them, again matching typical search engine behavior. Just surround the phrase in double quotation marks inside the query string: `{ $search: '"bright red"' }`.
Note, however, that this can't leverage the stemming feature text indexes normally provide, so a search for "bright reds" won't match a document that only contains "bright red."
Note that MongoDB only recognizes double quotation marks as phrase delimiters. The JavaScript string literal that holds the query can itself use single or double quotes, and whether the inner quotation marks need escaping follows the normal rules for JavaScript:
```js
/* context: { $search: '"bright red"' } */
'"bright red"'      // valid
'\"bright red\"'    // valid, but the escaping is unnecessary
"\"bright red\""    // valid
"'bright red'"      // valid JavaScript, but single quotes don't mark a phrase
"\'bright red\'"    // valid JavaScript, but single quotes don't mark a phrase
'\'bright red\''    // valid JavaScript, but single quotes don't mark a phrase
// ''bright red''   // invalid JavaScript
// ""bright red""   // invalid JavaScript
```
Now, suppose you want to search for "bright red", but don't want any results that mention "pink." MongoDB follows common search engine syntax here again: Negate any term by prefixing it with a hyphen (`-`). In this case, that means `{ $search: '"bright red" -pink' }`.
Note that MongoDB doesn't treat the hyphens in hyphenated phrases as negations, but as delimiters (just like spaces). In other words, `{ $search: 'bright-red' }` doesn't negate `red`, and instead is the same as `{ $search: 'bright red' }`. On the other hand, in an exact phrase, hyphens and spaces are both treated literally (neither as negations nor as delimiters).
In other words:
| Query String Contents | Matches for `bright` | Matches for `red` | Matches for `bright red` | Matches for `bright-red` |
|-----------------------|----------------------|-------------------|--------------------------|--------------------------|
| `bright red`          | included             | included          | (included)               | (included)               |
| `bright-red`          | included             | included          | (included)               | (included)               |
| `"bright red"`        | N/A                  | N/A               | included                 | N/A                      |
| `"bright-red"`        | N/A                  | N/A               | N/A                      | included                 |
| `bright -red`         | included             | excluded          | (excluded)               | (excluded)               |
-
Challenge
Find Products Containing All Specified Keywords
Negating terms is not the only way to narrow down your search results on a textual basis. Another search method that's more restrictive than the default "OR" behavior (matching any term) is "AND" behavior, which insists on matching all terms.
To accomplish this with MongoDB, the trick is to treat each term you want to include as an exact phrase, using the exact-phrase syntax you saw earlier. For example, `{ $search: '"bright" "red" "blues"' }` will only return results that contain bright and red and blues (regardless of the terms' positions relative to each other or which phrases they may appear in). In this example:
- "This bright red car runs well" won't match because "blues" isn't present.
- "I love the red and blue hues outside on a bright day" won't match, because "blues" isn't present. (Remember, there's no stemming with exact phrases.)
- "I love the reds and blues outside on a bright day" will match, despite the lack of stemming, because "red" is contained within "reds".
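Quoting every term by hand gets tedious, so a small helper can build the `$search` string. This is a sketch, not part of the lab's code: `allTerms` is a hypothetical name, and it assumes the terms themselves contain no double quotes.

```js
// Wrap each term in double quotes so MongoDB treats it as required.
function allTerms(terms) {
  return terms.map((term) => `"${term}"`).join(' ');
}

const filter = { $text: { $search: allTerms(['bright', 'red', 'blues']) } };
console.log(filter.$text.$search); // "bright" "red" "blues"
```

The resulting `filter` object can be passed to `find` exactly like the hand-written query strings.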
Searching your multi-field MongoDB index like this will result in positive matches even when the different search terms appear in different fields. For example, a shea-butter soap may not contain "perfect" in its product description, but if it appears in a review field, it will still count as a match when searching so as to include "shea", "soap", and "perfect".
If need be, you can combine the techniques for exact phrases and for "AND"/all inclusions. For example, to make sure every result includes a match for both the phrase "bright red" and "dark blue," you would use `{ $search: '"bright red" "dark blue"' }`.
Likewise, with negation: `{ $search: 'bright red blue "dark blue" -"bright red" -"bright blue"' }` will only match documents that contain "dark blue" as an exact phrase (once a query includes a phrase, the individual terms bright, red, and blue no longer widen the results), and will exclude any results that contain either "bright red" or "bright blue" as exact phrases.
-
Challenge
Retrieve Relevance Score Alongside Search Results and Sort by It
When you search using a text index, MongoDB has the ability to include a relevance score (`"textScore"`) with each result. Since relevance scores are document metadata, this feature uses the `$meta` operator in a projection, like so: `{ score: { $meta: "textScore" } }`.
This argument is then the value corresponding to a `projection` key in the second parameter (an options object) given to `find`:
```js
collection.find(
  { $text: { $search: 'your query here' } },
  { projection: { score: { $meta: 'textScore' } } }
)
```
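As a sketch of how the scores might be consumed, the function below runs a scored search and keeps only each result's SKU and score. It assumes `collection` is a Node.js MongoDB driver collection; `searchWithScores` is a hypothetical name, and the explicit `sort` option is included so the ordering is stated in the code itself.

```js
// Sketch only: returns [{ SKU, score }, ...] for a text query.
async function searchWithScores(collection, query) {
  const docs = await collection
    .find(
      { $text: { $search: query } },
      {
        // Include the SKU plus the computed relevance score.
        projection: { SKU: 1, score: { $meta: 'textScore' } },
        // Ask for results ordered by that same score.
        sort: { score: { $meta: 'textScore' } },
      }
    )
    .toArray();

  // Each document carries a numeric `score`; higher means more relevant.
  return docs.map(({ SKU, score }) => ({ SKU, score }));
}
```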
A projection using the `$meta` operator has the handy side-effect of automatically sorting by the included scores, which MongoDB adds to each search result object as a number under the key `score`, with a larger number indicating a better match.
The Explicit Equivalent
Adding such a projection is the equivalent of adding a `sort` key with a duplicate value:
```js
collection.find(
  { $text: { $search: 'your query here' } },
  {
    projection: { score: { $meta: 'textScore' } },
    sort: { score: { $meta: 'textScore' } },
  }
)
```
Some consider it a best practice to specify this as well rather than rely on the default, implied behavior. Others consider this particular default unlikely to change. It's a tradeoff, but if your code follows other best practices like automated testing, at least a breaking change would be caught after an upgrade.
You may come across snippets of MongoDB code that use a more concise syntax for what you just accomplished:
```js
collection.find(
  { $text: { $search: 'your query here' } },
  { score: { $meta: "textScore" } } // Caution: mongosh-specific projection shorthand!
)
```
However, that implicit syntax doesn't work in Node.js, where it will be silently ignored.
Thankfully, the syntax you used to complete the task, while slightly more verbose, is compatible with both Node.js's MongoDB driver and `mongosh`, the MongoDB command-line shell.
-
Challenge
Use `$regex` to Find Product SKUs Matching a Naming Convention
The dataset you're using includes a field called SKU (stock-keeping unit), which is an alphanumeric code unique to a product.
Though the rest of the data in this lab is real data from the public domain, the SKUs in the dataset have been generated for the purposes of this lab exercise, to illustrate a use case typical to many types of SKU codes.
In particular, each product in the dataset:
- Is made by one of five different companies (numbered 01 through 05)
- Falls into one of two different wider categories:
- "Beauty & Personal Care" ("BE")
- "Grocery & Gourmet Food" ("FO")
- Falls into one of four different subcategories:
- "Personal Care" ("PC")
- "Pantry Staples" ("PS")
- "Skin Care" ("SC")
- "Frozen" ("FR")
In this lab, a product's SKU will consist of
- A two-letter category code
- A two-digit company number
- A two-letter subcategory code
- A five-character product ID
For example, you can tell that the SKU "FO02PSKSX1F" is for a product:
- In the "Grocery & Gourmet Food" category ("FO")
- From company number 02
- In the "Pantry Staples" ("PS") subcategory
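The SKU layout described above can be expressed as a single regular expression with one capture group per part. The following is a minimal sketch in plain JavaScript; `parseSku` is a hypothetical helper, and it assumes the five-character product ID uses only uppercase letters and digits, as in the SKUs shown in this lab.

```js
// Category (2 letters), company (2 digits), subcategory (2 letters),
// product ID (5 characters; uppercase letters or digits is an assumption).
const SKU_PATTERN = /^([A-Z]{2})(\d{2})([A-Z]{2})([A-Z0-9]{5})$/;

function parseSku(sku) {
  const match = SKU_PATTERN.exec(sku);
  if (!match) return null; // not in the expected format
  const [, category, company, subcategory, productId] = match;
  return { category, company, subcategory, productId };
}

console.log(parseSku('FO02PSKSX1F'));
// { category: 'FO', company: '02', subcategory: 'PS', productId: 'KSX1F' }
```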
If you wanted to find more products from the same company, you could search the dataset for SKUs that have "02" after the first two characters. Likewise, you could find more food products looking for SKUs that have "FO" as a prefix.
In terms of matching using regular expressions, these examples correspond, respectively, to the patterns `/^..02/` and `/^FO/`.
In MongoDB, to search using a regular expression, you can use the name of the field you want to search instead of `$text` as the key in the object you pass as the first parameter to `find`. For the value, you can use the `$regex` operator and standard `/pattern/` notation for regular expressions in JavaScript:
```js
collection.find({ SKU: { $regex: /^..02/ } })
```
Since `$text` is not used, this doesn't leverage a text index, unlike in previous tasks.
Note that there are additional ways to search MongoDB collections using regular expressions, including:
- Using the `$in` operator instead, which has an implicit "OR"/any behavior
  - For example, `{ SKU: { $in: [ /^FO/, /^BE/ ] } }` would find results for products in either category (i.e., in this case, the entire dataset).
- Combining the `$regex` operator with the `$nin` ("not in") operator, which behaves similarly to the use of exact phrase matches alongside negations as you saw a few tasks ago
  - For example, `{ SKU: { $regex: /^BE/i, $nin: [ 'BE04SC0GQPC' ] } }` would search for all beauty products, but exclude a particular SKU.
  - Note that `$nin` array values can also be regular expressions: Using `$nin: [ /^BE0[14]/ ]` instead would exclude beauty products made by companies 01 or 04.
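Because these patterns are ordinary JavaScript regular expressions, you can sanity-check them locally before sending them to MongoDB. The sketch below does that with SKUs mentioned in this lab; `skuFilter` is a hypothetical helper, not part of the lab's code.

```js
// Build a filter for a category prefix and, optionally, a company number.
function skuFilter(category, company) {
  const pattern = company ? `^${category}${company}` : `^${category}`;
  return { SKU: { $regex: new RegExp(pattern) } };
}

const sampleSkus = ['FO02PSKSX1F', 'BE04SC2NA4E', 'BE04SC0GQPC'];
const beautyFrom04 = skuFilter('BE', '04');

// Try the same pattern locally before using the filter with find().
const matching = sampleSkus.filter((sku) => beautyFrom04.SKU.$regex.test(sku));
console.log(matching); // [ 'BE04SC2NA4E', 'BE04SC0GQPC' ]
```

The `beautyFrom04` object has the same shape as the filters shown above, so it could be passed directly as the first argument to `find`.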
Congratulations on completing this lab!