
Lab: Build and Evaluate a Document Q&A RAG Application with LangChain

Build a practical Retrieval-Augmented Generation (RAG) system in this hands-on guided lab. RAG systems combine document retrieval with large language models to generate answers grounded in external knowledge. In this lab, you will learn how to load and chunk documents, generate embeddings, store them in a ChromaDB vector database, and create a RAG chain that retrieves relevant context to answer user questions using an LLM. You will also evaluate RAG performance using RAGAS and improve retrieval quality with advanced techniques such as MultiQueryRetriever, ContextualCompressionRetriever, and SelfQueryRetriever. By the end of the lab, you will understand how modern RAG pipelines are designed, evaluated, and optimized for real-world AI applications.

Lab Info
Level
Intermediate
Last updated
Apr 17, 2026
Duration
50m

Table of Contents
  1. Challenge

    Introduction

    Welcome to the Build and Evaluate a Document Q&A RAG Application with LangChain lab!

    In this hands-on lab, you will build and evaluate a complete RAG pipeline using LangChain, ChromaDB, and OpenAI models. You will load and chunk documents, generate vector embeddings, store them in a vector database, and create a RAG chain to retrieve relevant context and answer user questions.

    You will also evaluate the system using the RAGAS framework and explore advanced retrieval techniques such as MultiQueryRetriever, ContextualCompressionRetriever, and SelfQueryRetriever to improve retrieval accuracy.

    Retrieval-Augmented Generation (RAG) combines information retrieval with large language models to generate accurate, context-aware responses. Instead of relying only on pre-trained knowledge, the system retrieves relevant data from an external source and uses it as context for answering questions.

    A RAG system has two main stages:

    • Indexing: loading documents, splitting them into chunks, generating embeddings, and storing them in a vector database
    • Querying: retrieving relevant chunks for a user query and passing them to an LLM to generate an answer
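These two stages can be made concrete with a toy, pure-Python sketch. Word-overlap scoring stands in for real embeddings and a vector database, and the helper names (`chunk`, `retrieve`) are illustrative, not part of LangChain:

```python
# Toy RAG flow: index documents into chunks, then retrieve context for a query.
# Word-overlap scoring is a crude stand-in for embeddings + a vector DB.

def chunk(text, size=60):
    """Split text into fixed-size character chunks (indexing stage)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(query, chunk_text):
    """Crude relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk_text.lower().split()))

def retrieve(query, chunks, k=1):
    """Return the top-k most relevant chunks (querying stage)."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

docs = [
    "Horizon Paris offers a rooftop spa and city views.",
    "Check-in at Horizon London starts at 2:00 PM.",
]
index = [c for d in docs for c in chunk(d)]          # indexing
context = retrieve("What time is check-in at Horizon London?", index)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: check-in time?"
```

A real pipeline replaces `score` with embedding similarity and passes `prompt` to an LLM, but the index-then-query shape is the same.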

The following diagram shows the overall flow of the system you will build:

```
Indexing:  Documents → Load → Chunk → Embed → Vector DB
Querying:  Query → Embed → Retrieve (from Vector DB) → Augment Prompt with Context → LLM → Answer
```

### Key Takeaways
    
    - Build a complete Retrieval-Augmented Generation (RAG) pipeline using LangChain.
    
    - Load and split documents using multiple chunking strategies.
    
    - Store document embeddings in a ChromaDB vector database for similarity search.
    
    - Implement a RAG chain to retrieve relevant context and generate answers using an LLM.
    
    - Evaluate RAG performance using RAGAS metrics such as `faithfulness`.
    
- Improve retrieval quality using advanced retrievers like `MultiQueryRetriever`, `ContextualCompressionRetriever`, and `SelfQueryRetriever`.

## Prerequisites
    
    ### Basic Python Knowledge
    
    - Learners should be familiar with Python fundamentals such as functions, variables, lists, and basic control flow.
    
    ### Introductory AI or LLM Concepts
    
    - A basic understanding of Large Language Models (LLMs), embeddings, and Retrieval-Augmented Generation (RAG) concepts is helpful but not required.
    
    ### Text Editor, Terminal & Python Environment
    
    - Comfort using a code editor or IDE.
    
- Experience running Python programs from the terminal.

## Horizon Hotels RAG App
    
    A simple Horizon Hotels RAG application is provided for this lab. It uses a small knowledge base containing hotel information stored in multiple document formats.
    
    ### Lab Data Sources
    
    The knowledge base files are located in the `data` folder:
    
    - `horizon_hotel_profiles.pdf` — Property guide for all eight Horizon hotels, including amenities, dining options, and travel details.
    https://{{hostname}}--3000.pluralsight.run/data/horizon_hotel_profiles.pdf
    
    - `horizon_dining_guide.md` — Restaurants, chefs, menus, pricing, and dining experiences across Horizon properties.
    https://{{hostname}}--3000.pluralsight.run/data/horizon_dining_guide.md
    
    - `horizon_faq.html` — Frequently asked questions covering reservations, Wi-Fi, transfers, accessibility, and loyalty basics.
    https://{{hostname}}--3000.pluralsight.run/data/horizon_faq.html
     ### Solution Code
    
    info> The final code for each step is stored in the `__solution/code` folder. For instance, the final code for Step 2 is available in the `__solution/code/Step02` directory.
    
  2. Challenge

    Create Document Loaders

    In this step, you will implement document loaders to ingest data for the RAG application. You will use LangChain loaders to read PDF, Markdown, and HTML files and convert them into Document objects for chunking, embedding, and retrieval.

Copy the API key from the top bar. Then, open the `.env` file and replace the placeholder `<pluralsight-openai-api-key>` with your copied API key.

    Explanation
    • PyPDFLoader(file_path) initialises a LangChain document loader capable of reading PDF files and converting each page into a document object.

    • LangChain loaders return Document objects containing both page_content and metadata.

    • doc.metadata["source_file"] = pdf_file attaches the original file name as metadata.

    • This is useful for tracing the source of retrieved chunks, debugging retrieval results, and evaluating answers against ground-truth sources.

    • docs.extend(pdf_docs) adds all loaded documents to the final list returned by the function.
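The metadata-tagging pattern described above can be mimicked without LangChain. Here `Document` is a minimal stand-in for LangChain's class of the same name, and `load_with_source` is an illustrative helper, not lab code:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_with_source(pages, source_file):
    """Tag each loaded page with its originating file, as the lab's loader does."""
    docs = []
    for text in pages:
        doc = Document(page_content=text)
        doc.metadata["source_file"] = source_file  # trace chunks back to their file
        docs.append(doc)
    return docs

docs = load_with_source(["Page one text", "Page two text"],
                        "horizon_hotel_profiles.pdf")
```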

    Navigate to the first terminal and execute the following command to load the PDF files.
```bash
python -m app.loaders --document_type pdf
```

    You should see output in your terminal similar to the example below.

```
....
[9]
source: horizon_hotel_profiles.pdf
metadata keys: ['producer', 'creator', 'creationdate', 'author', 'keywords', 'moddate', 'subject', 'title', 'trapped', 'source', 'total_pages', 'page', 'page_label', 'source_file']
content: Horizon Cape Town V&A Waterfront, Cape Town, South Africa Star Rating 3 Stars City Cape Town, South Africa
...
```

<details>
    <summary>Explanation</summary>
    
    - `UnstructuredMarkdownLoader(file_path)` loads Markdown files and converts them into LangChain `Document` objects.
    
    - `BSHTMLLoader(file_path)` loads HTML documents and extracts readable content into `Document` objects.
    
    - `TextLoader(file_path, encoding="utf-8")` loads plain text files and ensures correct UTF-8 decoding.
    
    - The `load()` method reads the file and returns a list of `Document` objects containing the extracted text and metadata.
    
    - `doc.metadata["source_file"]` attaches the original file name to each document so the system can track the source of the content.
    
    - This metadata is useful for tracing retrieved chunks, debugging retrieval behaviour, and evaluating answers against ground-truth sources.
    
    These loaders enable the RAG system to ingest multiple document formats, including PDFs, Markdown files, HTML pages, and plain text files. In real-world scenarios, data may also come from sources such as CSV files, JSON files, Word documents (DOCX), Excel spreadsheets (XLSX), web pages, databases, or API responses.
    
</details>

Navigate to the first terminal and execute the following command to load the Markdown file.

```bash
python -m app.loaders --document_type markdown
```

    You should see output in your terminal similar to the example below.

```
...
Document type: markdown
Total documents loaded: 1
============================================================

[1]
source: horizon_dining_guide.md
metadata keys: ['source', 'source_file']
content: # Horizon Hotels & Resorts — Dining &
....
```
    

    Here are a few additional commands to test other document types:

```bash
python -m app.loaders --document_type html
python -m app.loaders --document_type html --document_loader html_loader
```

Congratulations! In this step, you implemented loaders to ingest documents from multiple formats, including PDF, Markdown, and HTML. These documents are now converted into LangChain `Document` objects with metadata that tracks their source.
  3. Challenge

    Split Documents into Chunks

    In this step, you will implement different chunking strategies to split documents into smaller, meaningful pieces for retrieval. You will use recursive chunking for PDFs and structure-aware chunking for Markdown and HTML files.

    Explanation
    • RecursiveCharacterTextSplitter is a LangChain text splitter that breaks documents into smaller chunks for embedding and retrieval.

    • chunk_size controls the maximum size of each chunk, while chunk_overlap ensures that neighbouring chunks share content to preserve context between chunks.

    • The separators list defines how the splitter attempts to break the text. It tries larger logical boundaries first (paragraphs and lines) before falling back to smaller ones, such as spaces and individual characters.

    • splitter.split_documents(docs) processes the input documents and returns a list of chunks while preserving metadata from the original documents.

    • Chunking improves retrieval accuracy because embeddings represent shorter, more focused sections of text rather than entire documents.
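A stripped-down version of the recursive strategy can be written in plain Python. This mirrors the separator-fallback idea (try paragraphs, then lines, then spaces, then raw characters) but omits overlap for brevity; it is not LangChain's exact implementation:

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text by trying large separators first, falling back to smaller ones."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        if sep and sep in text:
            pieces = text.split(sep)
            # Greedily merge pieces back together up to chunk_size.
            chunks, current = [], ""
            for piece in pieces:
                candidate = current + sep + piece if current else piece
                if len(candidate) <= chunk_size:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = piece
            if current:
                chunks.append(current)
            # Recurse on any merged chunk that is still too large.
            return [c for ch in chunks
                    for c in recursive_split(ch, chunk_size, separators)]
    # Last resort: hard character split.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

text = "para one.\n\npara two is a bit longer here.\n\npara three."
chunks = recursive_split(text, chunk_size=30)
```

Because the paragraph separator succeeds first, each paragraph becomes its own chunk rather than being cut mid-sentence.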

    Navigate to the first terminal and execute the following command to chunk data using `RecursiveCharacterTextSplitter`.
```bash
python -m app.chunker --document_type pdf
```

    You should see output in your terminal similar to the example below.

```
...
Document type: pdf
Documents processed: 9
PDF strategy: recursive
Total chunks: 17
Chunk sizes — avg: 645, min: 388, max: 827 chars
============================================================

[1]
source: horizon_hotel_profiles.pdf
hotel: unmatched
section: unmatched
size: 388 chars
metadata keys: ['producer', 'creator', 'creationdate', 'author', 'keywords', 'moddate', 'subject', 'title', 'trapped', 'source', 'total_pages', 'page', 'page_label', 'source_file', 'document_type']
content: Horizon Hotels & Resorts Property Guide
...
```

<details>
    <summary>Explanation</summary>
    
    - `MarkdownHeaderTextSplitter` splits Markdown documents based on their header structure rather than fixed character boundaries.
    
    - The `headers_to_split_on` setting tells the splitter to create chunks based on `##` and `###` headings and store those heading values in chunk metadata. The resulting chunks may still be large. Recursive chunking can be used to split them further if needed.
    
    - `strip_headers=False` keeps the headers inside the chunk content, which helps preserve context for retrieval and generation.
    
    - `HTMLHeaderTextSplitter` works in a similar way for HTML documents by splitting content based on header tags, such as `<h1>` and `<h2>`.
    
    - `md_splitter.split_text(doc.page_content)` splits the Markdown document text into structured chunks.
    
    - `html_splitter.split_text(doc.page_content)` splits the raw HTML content into structured chunks based on the HTML heading hierarchy.
    
    - This approach is called context-aware chunking because it uses the document's structure to create more meaningful chunks. Instead of splitting purely by size, it keeps related content grouped under headings, which can improve retrieval quality in a RAG system.
    
</details>

Observe the `chunk_documents` function in the `app/chunker.py` file. This function applies chunking strategies for each document type and enriches metadata.
    
    Navigate to the first terminal and execute the following command to chunk the data in the markdown file.
    
```bash
python -m app.chunker --document_type markdown
```

    You should see output in your terminal similar to the example below.

```
...
[1]
source: horizon_dining_guide.md
hotel: unmatched
section: unmatched
size: 277 chars
metadata keys: ['source', 'source_file', 'document_type']
content: # Horizon Hotels & Resorts — Dining & Restaurant Guide 2025   Welcome to the Horizon culinary experience. Each of our
...
```
    

    Here is an additional command to test another document type:

```bash
python -m app.chunker --document_type html
```

Congratulations! In this step, you implemented multiple chunking strategies to prepare documents for retrieval. You used recursive chunking for PDFs and structure-aware chunking for Markdown and HTML files.
  4. Challenge

    Create a ChromaDB Vector Store

    In this step, you will create a vector database to store the document chunks generated in the previous step. Each chunk will be converted into an embedding using the OpenAI embedding model (text-embedding-3-small) and stored in ChromaDB along with its metadata.

    Explanation
    • Storing chunks as embeddings in a vector database enables efficient similarity search, a key component of the retrieval stage in a RAG system.

    • Chroma.from_documents() creates a vector store by converting document chunks into embeddings and storing them in a ChromaDB collection.

    • documents=chunks provides the list of document chunks generated during the chunking step.

    • embedding=embeddings specifies the embedding model used to convert text into vector representations.

    • persist_directory=db_path tells ChromaDB where to store the vector database on disk so it can be reused without recomputing embeddings.

    • collection_name=collection_name assigns a name to the collection, allowing the vector store to organise and manage related embeddings.

    • Each stored vector is linked to the chunk text and its metadata, allowing the retriever to return both relevant content and its source information.
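Under the hood, a vector store boils down to "embed, store, compare by similarity". A toy version with hand-made 2-D vectors (no ChromaDB or OpenAI involved; the `store` entries and vectors are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# "Embeddings" here are hand-picked 2-D vectors; a real store holds
# model-generated vectors alongside each chunk's text and metadata.
store = [
    {"text": "Horizon Paris spa hours", "vector": [1.0, 0.1]},
    {"text": "Horizon Bali pool rules", "vector": [0.1, 1.0]},
]

def similarity_search(query_vector, k=1):
    """Return the text of the k entries most similar to the query vector."""
    ranked = sorted(store, key=lambda e: cosine(query_vector, e["vector"]),
                    reverse=True)
    return [e["text"] for e in ranked[:k]]

result = similarity_search([0.9, 0.2])  # closest to the "Paris" vector
```

`Chroma.from_documents()` automates exactly this: it embeds each chunk, persists the vectors, and exposes a similarity search over them.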

    Observe the CLI entry point and the `load_existing_vectorstore` function in the `app/vectorstore.py` file.
    • The CLI entry point tests this functionality by creating a test_collection with two sample documents.

    • load_existing_vectorstore loads an existing ChromaDB collection and returns a Chroma instance. This function reconnects to the previously created vector database using the same embedding model and collection name.

    Navigate to the first terminal and execute the following command to embed and store test sample documents in the vector store.

```bash
python -m app.vectorstore
```

    You should see output in your terminal similar to the example below.

```
...
Creating test vectorstore with 2 documents...
Collection: test_collection

✅ Vectorstore created successfully!
Collection count: 2

Testing retrieval...
Query: 'Paris hotel'
Result: Horizon Paris is a luxury hotel in the heart of Paris.
...
```

All documents for this lab have already been indexed using the `rag_indexing` file, which uses the functionality above.
    
    Navigate to the second terminal and execute the following command to start a Streamlit server.
```bash
python -m streamlit run streamlit/dashboard-chunk-browser.py
```

Navigate to https://{{hostname}}--8080.pluralsight.run to view the chunks created.

Congratulations! In this step, you created a ChromaDB vector store and stored the document chunks as embeddings along with their metadata.

  5. Challenge

Build a Basic RAG Chain

    In this step, you will implement the retrieval and generation components of the RAG pipeline. You will first create a retriever to fetch relevant document chunks from the vector store, then build a basic RAG chain that combines retrieval with the OpenAI LLM (gpt-5-mini) to answer user questions.

    Explanation
    • vectorstore.as_retriever() converts the Chroma vector store into a retriever object that can perform different types of searches.

    • search_type=retriever_type specifies the retrieval strategy. The default "similarity" retrieves chunks that are most similar to the query embedding.

    • search_kwargs={"k": retriever_k} controls how many document chunks are returned for each query.

    • In this lab, the default value k=6 means the retriever will return the top six most relevant chunks.

    • The retriever is the retrieval component of the RAG pipeline, responsible for selecting the most relevant chunks from the vector store and passing them as context to the language model.

    Explanation
    • ChatPromptTemplate.from_template() creates a reusable chat prompt template from a string. It allows placeholders such as {context} and {question} to be dynamically filled with retrieved content and the user’s query when the chain runs.

    • The RAG chain above combines retrieval and generation, allowing the model to answer questions using relevant information from the vector database.

    • This chain is built using LCEL (LangChain Expression Language), which allows components to be connected using the pipe operator | to form a processing pipeline.

    • {"context": retriever | RunnableLambda(format_docs), "question": RunnablePassthrough()} prepares the inputs for the prompt:

      • The retriever fetches relevant document chunks from the vector store.

      • RunnableLambda(format_docs) converts retrieved document objects into a formatted text string to insert into the prompt as context.

      • RunnablePassthrough() passes the user’s question directly into the chain without modification, allowing it to be included in the prompt alongside the context.

    • prompt_template formats the context and question into a structured prompt sent to the language model.

    • llm generates a response using the prompt and retrieved context.

    • StrOutputParser() converts the language model output into a clean string for easy display or return.
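The pipe-based composition that LCEL provides can be imitated with Python's `|` operator. `Step` below is an illustrative stand-in, not a LangChain class, and `fake_llm` replaces the real model call:

```python
class Step:
    """Wraps a function so steps can be chained with `|`, LCEL-style."""
    def __init__(self, fn):
        self.fn = fn
    def __or__(self, other):
        # Chaining produces a new Step that runs self, then other.
        return Step(lambda x: other.fn(self.fn(x)))
    def invoke(self, x):
        return self.fn(x)

def format_docs(docs):
    """Join retrieved chunks into one context string."""
    return "\n\n".join(docs)

# Stand-ins for retriever, prompt template, and LLM.
retrieve = Step(lambda q: {"context": format_docs(["Check-in is 2:00 PM."]),
                           "question": q})
prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: "Answer based on: " + p.splitlines()[0])

chain = retrieve | prompt | fake_llm
answer = chain.invoke("What is the check-in time?")
```

The real chain has the same shape: the retriever output feeds the prompt template, whose output feeds the LLM, whose output feeds `StrOutputParser()`.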

    Navigate to the first terminal and execute the following command to run the RAG chain. This command retrieves the top 3 most relevant chunks using the --k parameter.
```bash
python -m app.rag_chain --question "What is the check-in time at Horizon London?" --k 3
```

    You should see output in your terminal similar to the example below. Your results may vary.

```
...
Question: What is the check-in time at Horizon London?
Retriever: basic
Top-k: 3
============================================================

Answer: Check-in at Horizon London is at 2:00 PM (local time).
...
```
    

    The current prompt is basic and can be improved to better control how the model answers questions. In the next task, you will enhance the prompt to improve the quality and reliability of the responses.

    Explanation
    • Improving prompt design is an important step in building reliable RAG systems, as it helps control how the language model uses retrieved information to generate answers.

    • This enhanced prompt guides the language model to produce more accurate and grounded responses.

    • The instruction "Answer the question based only on the provided context" encourages the model to rely on retrieved documents rather than its own prior knowledge.

    • The fallback instruction "If the answer is not in the context, say 'This information is not available in our records.'" helps reduce hallucinations.

    • {context} is dynamically populated with the document chunks retrieved from the vector database.

    • {question} is populated with the user's query.

    • ChatPromptTemplate.from_template() creates a reusable prompt template where placeholders such as {context} and {question} are automatically filled when the chain runs.

    Congratulations! In this step, you implemented the core components of the RAG pipeline by creating a retriever, building a basic RAG chain, and improving the prompt template.
  6. Challenge

Implement Advanced Retrievers

    In this step, you will improve the RAG system by implementing three advanced retrievers: MultiQueryRetriever, ContextualCompressionRetriever, and SelfQueryRetriever.

    Why are advanced retrievers needed?
    • Advanced retrievers are needed in RAG systems because basic retrieval methods often struggle with complex queries, ambiguous questions, and retrieving truly relevant information. These limitations can lead to incomplete or less accurate responses.

    • Advanced techniques improve retrieval accuracy by generating more effective queries, filtering irrelevant content, and providing more precise context to the language model.

    • Approaches such as MultiQueryRetriever, ContextualCompressionRetriever, and SelfQueryRetriever enhance retrieval by generating query variations, extracting the most relevant document sections, and applying metadata-based filtering, improving the quality of generated answers.

    Explanation
    • MultiQueryRetriever improves retrieval quality by generating multiple variations of the user’s question using an LLM, allowing the system to capture different semantic interpretations.

    • This approach improves recall by retrieving relevant documents that may not match the exact wording of the original query.

    • Each generated query is used to retrieve documents from the vector store.

    • The results from all queries are combined and deduplicated to produce a richer set of retrieved contexts.

    • MultiQueryRetriever.from_llm() creates the retriever by connecting the base retriever with an LLM that generates query variations.

    • retriever=base_retriever specifies the underlying retriever used to perform document search.

    • llm=llm enables the retriever to generate alternative versions of the user’s query.

    • include_original=True ensures that the original query is included along with the generated variations for more reliable retrieval.
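The combine-and-deduplicate behaviour can be sketched in plain Python. In the real retriever an LLM generates the query variations; here they are hard-coded, and `search` is a word-overlap stand-in for vector search:

```python
def search(query, corpus, k=2):
    """Stand-in retriever: rank chunks by shared lowercase words with the query."""
    overlap = lambda c: len(set(query.lower().split()) & set(c.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

corpus = [
    "Horizon Bali has a quiet spa and garden villas.",
    "Horizon Dubai offers desert tours and a rooftop pool.",
    "Horizon Paris is steps from major museums.",
]

# An LLM would generate these variations; they are hard-coded for illustration.
queries = [
    "Which hotel is best for a relaxing vacation?",
    "Which hotel has a quiet spa?",
    "Which hotel is peaceful and restful?",
]

seen, merged = set(), []
for q in queries:
    for doc in search(q, corpus):
        if doc not in seen:          # deduplicate across query variants
            seen.add(doc)
            merged.append(doc)
```

Each variant pulls its own top-k results, and the merged, deduplicated list becomes the context set, which is why the lab's log reports "Retrieved 9 chunks (deduplicated)".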

    Navigate to the first terminal and execute the following command to test the above changes.
```bash
python -m app.advanced_retriever_multiquery --question "Which hotel is best for a relaxing vacation?"
```

    You should see output in your terminal similar to the example below. Observe the MultiQuery logs.

```
...
INFO:langchain_classic.retrievers.multi_query:Generated queries: ['Which hotels are best for a relaxing vacation, prioritizing spa services, quiet rooms, wellness programs, and tranquil surroundings?', 'Which hotels receive the highest guest ratings for relaxation, peacefulness, and comfort—ideal for a stress-free getaway?', 'Which boutique, adult-only, or all-inclusive hotels are recommended for a restful, low-activity vacation (beachfront or countryside)?']

Retrieved 9 chunks (deduplicated)

[1]
source: horizon_hotel_profiles.pdf
hotel: Horizon Bali
content: Horizon Bali - profile  Horizon Bali Ubud, Bali,
...
```

<details>
    <summary>Explanation</summary>
    
    - `ContextualCompressionRetriever` improves retrieval quality by filtering and compressing retrieved documents before they are passed to the language model.
    
    - This can improve answer quality by reducing unnecessary context and keeping only the most relevant information for the query.
    
    - `LLMChainExtractor.from_llm(llm)` creates a compressor that uses the LLM to extract the most relevant parts of each retrieved document.
    
    - `base_retriever` first retrieves an initial set of candidate documents from the vector store.
    
    - `base_compressor` then processes those documents and removes irrelevant content, retaining only the sections relevant to the user’s query.
    
    - The resulting compressed context is passed to the RAG chain, which helps reduce noise and improve the quality of the generated answer.
    
</details>

Navigate to the first terminal and execute the following command to test the above changes.

```bash
python -m app.advanced_retriever_compression --question "What is the check-in time at Horizon Paris?"
```

    You should see output in your terminal similar to the example below. Observe the compressed context.

```
...
--- Compression Retriever (LLMChainExtractor) ---
Retrieved 1 chunks (49 total chars)

[1]
source: horizon_hotel_profiles.pdf
hotel: Horizon Paris
size: 49 chars
content: Check-in is at 3:00 PM and check-out at 11:00 AM....
...
```
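The extraction idea behind the compression step, keeping only the passages that matter for the query, can be approximated without an LLM. This keyword filter is a crude, LLM-free stand-in for `LLMChainExtractor`, and the `doc` text is invented for illustration:

```python
def compress(doc_text, query):
    """Keep only sentences that share words with the query (a rough,
    LLM-free approximation of what LLMChainExtractor does)."""
    query_words = set(query.lower().split())
    kept = [s for s in doc_text.split(". ")
            if query_words & set(s.lower().split())]
    return ". ".join(kept)

doc = ("Horizon Paris has 120 rooms. Check-in is at 3:00 PM. "
       "The lobby bar serves cocktails until midnight.")
compressed = compress(doc, "check-in time")
```

The real compressor uses the LLM's judgement rather than keyword overlap, but the effect is the same: the chunk shrinks to just the query-relevant content, as in the 49-character result above.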
<details>
    <summary>Explanation</summary>
    
    - `SelfQueryRetriever` allows the language model to translate a user’s natural language question into a structured query with optional metadata filters.
    
    - `SelfQueryRetriever.from_llm()` connects the LLM with the vector store so it can interpret the question and generate both a semantic query and metadata filters.
    
    - `document_contents` describes the type of information stored in the documents, helping the LLM understand the context of the data.
    
    - `metadata_field_info` defines the available metadata fields that can be used for filtering during retrieval.
    
    - `search_kwargs={"k": retriever_k}` controls how many documents are retrieved after applying the generated query and filters.
    
    - `enable_limit=True` allows the LLM to include result limits (such as the number of documents to return) when constructing the structured query.
    
    - `verbose=True` prints the generated structured query and filters, which is useful for debugging and understanding how the LLM interprets the question.
    
    - Using a SelfQuery retriever enables more advanced retrieval by combining semantic search with metadata-based filtering, improving retrieval accuracy in RAG systems.
    
</details>

Navigate to the first terminal and execute the following command to test the above changes.

```bash
python -m app.advanced_retriever_selfquery --question "Which hotels in Asia have both pool and spa?"
```
    

    You should see output in your terminal similar to the example below. Observe the query and the filter.

```
...
INFO:langchain_classic.retrievers.self_query.base:Generated Query: query=' ' filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='continent', value='Asia'), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='has_pool', value=True), Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='has_spa', value=True)]) limit=None
Retrieved 3 chunks

[1]
hotel: Horizon Dubai | continent: Asia | price: $650 | pool: True | spa: True
content: Horizon Dubai - dining  ## Nami — Horizon Dubai   **Cuisine:** Japanese Omakase **Chef:** Takeshi Ono **Dress Code:** Smart casual **Reservations:** R...
...
```
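The structured query in the log above boils down to a set of metadata comparisons. Applying such a filter by hand looks like this (the `hotels` records are made up for illustration; the real retriever filters chunk metadata inside the vector store):

```python
# Illustrative metadata records; a real store attaches these to chunks.
hotels = [
    {"hotel": "Horizon Dubai", "continent": "Asia",   "has_pool": True,  "has_spa": True},
    {"hotel": "Horizon Bali",  "continent": "Asia",   "has_pool": True,  "has_spa": False},
    {"hotel": "Horizon Paris", "continent": "Europe", "has_pool": False, "has_spa": True},
]

# Equivalent of: and(eq(continent, 'Asia'), eq(has_pool, True), eq(has_spa, True))
filters = {"continent": "Asia", "has_pool": True, "has_spa": True}

matches = [h["hotel"] for h in hotels
           if all(h.get(attr) == value for attr, value in filters.items())]
```

The LLM's job in `SelfQueryRetriever` is to translate "Which hotels in Asia have both pool and spa?" into exactly this kind of filter, which is then combined with semantic search.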
Congratulations! In this step, you implemented `MultiQueryRetriever`, `ContextualCompressionRetriever`, and `SelfQueryRetriever` to improve the retrieval of relevant information from the vector store.
  7. Challenge

Evaluate RAG Using RAGAS

    In this step, you will evaluate the RAG system using RAGAS, a framework for assessing Retrieval-Augmented Generation pipelines. You will run a set of evaluation questions through the system, collect the retrieved contexts and generated answers, and use RAGAS metrics to measure retrieval quality and answer accuracy.

    RAGAS Overview

    RAGAS is an evaluation framework for measuring the performance of Retrieval-Augmented Generation (RAG) systems. It helps developers assess how effectively a RAG pipeline retrieves relevant information and generates accurate answers. RAGAS provides automated metrics to evaluate both the retrieval step and the generated response.

    In RAG evaluation, contexts refer to the document chunks retrieved from the vector store that are used to generate the answer. The evaluation process first runs the RAG pipeline to generate answers, then compares the generated answers and retrieved contexts against reference answers using RAGAS metrics.

    Key Metrics

    • Faithfulness (Generation) – Measures whether the generated answer is factually supported by the retrieved context, helping detect hallucinations.
    • Answer Relevancy (Generation) – Evaluates whether the generated answer directly addresses the user’s question.
    • Context Precision (Retrieval) – Measures how relevant the retrieved documents are to the query.
    • Context Recall (Retrieval) – Measures whether the retriever successfully retrieves the information needed to answer the question.

    A set of 2 questions has been created for this lab. You will use these questions to run the RAG system, retrieve relevant document contexts, generate answers, and evaluate the system’s performance. For this lab, you will focus only on evaluating faithfulness.

    https://{{hostname}}--3000.pluralsight.run/data/ragas_test_question.json

    Explanation
    • This function prepares evaluation data for the RAG system by retrieving relevant contexts and generating answers for each question in the dataset.

    • A retriever is created using get_advanced_retriever(...), which controls how documents are fetched based on retriever_mode and retriever_k.

    • The rag_chain is built using this retriever, allowing it to generate answers using retrieved context and the language model.

    • retriever.invoke(question) retrieves the most relevant document chunks from the vector store for the given question.

    • The retrieved documents are converted into plain text using doc.page_content and stored as retrieved_contexts.

    • rag_chain.invoke(question) generates the final answer using the RAG pipeline.

    • The output includes both retrieved_contexts and generated_with_rag, which are required for RAGAS evaluation.

    • Each result includes the question, the generated answer, the retrieved contexts, and the expected ground-truth answer.

    • This structured dataset enables evaluation using RAGAS metrics such as Faithfulness, Answer Relevancy, Context Precision, and Context Recall.
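Each evaluation record produced by this function has the same four-field shape that RAGAS consumes. Building one by hand makes the structure explicit (the values below are illustrative; in the lab they come from `retriever.invoke()` and `rag_chain.invoke()`):

```python
# One evaluation record in the shape RAGAS expects. Note that "contexts"
# is a list of retrieved chunk texts, not a single string.
record = {
    "question": "What is the check-in time at Horizon London?",
    "answer": "Check-in at Horizon London is at 2:00 PM.",
    "contexts": ["Horizon London: check-in 2:00 PM, check-out 11:00 AM."],
    "ground_truth": "Check-in is at 2:00 PM.",
}

required = {"question", "answer", "contexts", "ground_truth"}
missing = required - set(record)
```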

    Navigate to the first terminal and run the following command to prepare evaluation data for the RAGAS test.
```bash
python -m app.prepare_ragas_test_data
```

    You should see output in your terminal similar to the example below.

```
....
Generated 2 RAG answers with retrieved contexts
Output saved to: data/ragas_test_data_minimal.json
...
```
    

    You can view the test data here: https://{{hostname}}--3000.pluralsight.run/data/ragas_test_data_minimal.json

    Explanation
    • The dataset is constructed using Dataset.from_dict to match the format required by RAGAS.

    • "question" represents the user query.

    • "answer" contains the response generated by the RAG system.

    • "contexts" includes the retrieved document chunks used to generate the answer.

    • "ground_truth" provides the correct reference answer for evaluation.

    • The metrics variable defines which RAGAS metric will be used to evaluate performance.

    • In this lab, the faithfulness metric measures whether the generated answer is supported by the retrieved context.

    • The evaluate() function runs the evaluation using the dataset, selected metric, and LLM to assess the quality of the RAG system’s response.

In this lab, only the faithfulness metric is used to measure whether the generated answer is supported by the retrieved context. In real-world applications, multiple metrics are typically used, such as Answer Relevancy, Context Precision, and Context Recall, to evaluate both retrieval quality and answer accuracy.

Navigate to the first terminal and execute the following command to run the evaluation.

```bash
python -m app.ragas_evaluation
```

    You should see output similar to the example below. Any warnings can be ignored for this lab.

```
....
Results:
{'faithfulness': 1.0000}
...
```
    

    In this step, you evaluated the RAG system using the RAGAS framework. You used the generated answers and retrieved contexts to measure the system’s performance using the Faithfulness metric. Congratulations! You have successfully completed this lab.

About the author

Asmin Bhandari is a full stack developer with years of experience in designing, developing and testing many applications and web based systems.
