
Analyze Sales Data in DuckDB

A hands-on Code Lab demonstrating how quickly DuckDB can be installed and used to consolidate and analyze a realistic sales dataset (Orders, Order Line Items, Products, Customers). Learners install DuckDB in seconds, then query CSV, Excel, and Parquet files, and even remote HTTPS lookups, together in a single SQL workflow. The lab focuses on answering common business questions (revenue trends, top products, customer activity) while practicing lightweight validation, type correction, joins across heterogeneous sources, and materializing curated summaries.

Lab Info

Level: Intermediate
Last updated: May 05, 2026
Duration: 30m

Table of Contents
  1. Challenge

    Step 1: Introduction

    Overview

    This lab explores the variety of ways you can query data using DuckDB. Rather than focusing on a single format or source, you'll work across CSV, Excel, Parquet, local files, HTTP endpoints, and cloud object storage, using the same familiar SQL syntax.

    DuckDB meets your data where it lives, so you won't need to move or transform files before querying them.

    Note: This lab uses DuckDB v1.2.2.


    Working in This Lab

    For a smoother workflow, use scratch_pad.sql in the file tree to write and refine your SQL queries.


    Prerequisites

    Before starting this lab, you should have:

    • SQL fundamentals: Experience writing basic queries (SELECT, WHERE, JOIN, GROUP BY, ORDER BY).
    • Data concepts: Familiarity with tabular data (rows, columns, primary/foreign keys) and common data types (integers, strings, dates).
    • Command-line skills: Comfort navigating a terminal.
    • File formats: Prior exposure to CSV and Excel file formats.
    • DuckDB: No prior experience necessary; the lab teaches DuckDB from the ground up.
  2. Challenge

    Step 2: Start DuckDB

    Install DuckDB

    Goal: Start the DuckDB CLI and load the extensions required to query all data sources used in this lab.

    DuckDB extensions add support for data sources and formats that are not built into the core engine. In this lab, you’ll use extensions to read remote files, Excel workbooks, and cloud-style object storage.


    1. Start the DuckDB CLI from the command line:

      duckdb
      
    2. Install and load the required extensions inside DuckDB:

      -- Remote file access over HTTPS
      INSTALL httpfs;  
      LOAD httpfs;
      
      -- Excel reading via st_read()
      INSTALL spatial; 
      LOAD spatial; 
      
      -- Alternative Excel reader (DuckDB 1.x+)
      INSTALL excel;   
      LOAD excel;    
      
    3. Confirm all three extensions are active before moving on:

      SELECT 
        extension_name, 
        loaded, 
        installed
      FROM duckdb_extensions()
      WHERE extension_name 
      IN ('httpfs', 'spatial', 'excel');
      
      Result: When loaded correctly, the query should return:

      | extension_name | loaded | installed |
      |---|---|---|
      | httpfs | true | true |
      | spatial | true | true |
      | excel | true | true |

  3. Challenge

    Step 3: Query Raw Files

    Inspect the Data Sources

    Goal: Understand what data files exist, where they live, and what each source contains.

    Note: If your terminal does not already have a DuckDB session, type duckdb to start one.


    1. Preview the accounts CSV file using read_csv_auto:

      SELECT *
      FROM read_csv_auto('data/salesforce/accounts.csv')
      LIMIT 5;
      
      Result: Your results should look like this:

      | account_id | account_name | industry | created_date |
      |---|---|---|---|
      | A00001 | Account 1 | Retail | 2021-03-09 |
      | A00002 | Account 2 | Finance | 2022-03-13 |
      | A00003 | Account 3 | Retail | 2020-10-01 |
      | A00004 | Account 4 | Finance | 2022-08-01 |
      | A00005 | Account 5 | Finance | 2020-03-13 |

    2. Preview the opportunities CSV file using read_csv_auto:

      SELECT *
      FROM read_csv_auto('data/salesforce/opportunities.csv')
      LIMIT 5;
      
      Result: Your results should look like this:

      | id | account_id | amount | close_date | stage |
      |---|---|---|---|---|
      | O0000001 | A00387 | 194560.19 | 2021-10-17 | Closed Lost |
      | O0000002 | A01794 | 17925.62 | 2021-01-09 | Closed Lost |
      | O0000003 | A03871 | 182264.19 | 2022-12-04 | Proposal |
      | O0000004 | A02510 | 197361.29 | 2020-05-17 | Negotiation |
      | O0000005 | A00385 | 151152.83 | 2023-10-02 | Closed Won |

    3. Preview orders Excel sources using st_read:

      SELECT *
      FROM st_read('data/oltp/orders.xlsx')
      LIMIT 5;
      
      Result: Your results should look like this:

      | order_id | account_id | order_total | order_date |
      |---|---|---|---|
      | 1000000 | A02164 | 3293.55 | 2021-06-17 00:00:00 |
      | 1000001 | A02533 | 1738.11 | 2021-02-15 00:00:00 |
      | 1000002 | A01857 | 0.0 | 2024-02-18 00:00:00 |
      | 1000003 | A02719 | 21417.56 | 2022-08-12 00:00:00 |
      | 1000004 | A03232 | 4770.22 | 2023-12-05 00:00:00 |

    4. Preview inventory Excel sources using st_read:

      SELECT *
      FROM st_read('data/oltp/inventory.xlsx')
      LIMIT 5;
      
      Result: Your results should look like this:

      | sku | description | quantity | last_stocked |
      |---|---|---|---|
      | P000001 | Product 1 | 913 | 2023-04-29 00:00:00 |
      | P000002 | Product 2 | 133 | 2023-09-25 00:00:00 |
      | P000003 | Product 3 | 582 | 2024-01-24 00:00:00 |
      | P000004 | Product 4 | 680 | 2023-07-10 00:00:00 |
      | P000005 | Product 5 | 451 | 2023-05-25 00:00:00 |

    Create and Query a DuckDB Table

    Goal: Verify DuckDB's inferred schema and confirm the data types are correct. This acts as a sanity check so that inferred types don't cause silent data issues downstream.


    1. Materialize a CSV source to a temporary table:

      CREATE TEMP TABLE t_accounts AS
      SELECT *
      FROM read_csv_auto('data/salesforce/accounts.csv');
      
    2. Inspect the inferred schema with PRAGMA table_info:

      PRAGMA table_info('t_accounts');
      
    3. Use typeof() to inspect column types on sample rows:

      SELECT typeof(account_id), typeof(created_date), typeof(industry)
      FROM t_accounts
      LIMIT 5;
      
      Result: You should see the following:

      | typeof(account_id) | typeof(created_date) | typeof(industry) |
      |---|---|---|
      | VARCHAR | DATE | VARCHAR |
      | VARCHAR | DATE | VARCHAR |
      | VARCHAR | DATE | VARCHAR |
      | VARCHAR | DATE | VARCHAR |
      | VARCHAR | DATE | VARCHAR |

  4. Challenge

    Step 4: Create and Query DuckDB Tables

    Create and Query DuckDB Tables

    Goal: Load each raw source into a table so the rest of the lab can reference a clean name instead of a file path.

    Direct file queries are useful for quick inspection. Creating tables gives you stable names to use in joins, reports, and repeated analysis.


    1. Create a table for each source. In a Terminal with an active DuckDB session, run the following commands:

      INSTALL spatial;
      LOAD spatial;
      
      CREATE TABLE accounts AS 
        SELECT * 
        FROM read_csv_auto
        ('data/salesforce/accounts.csv');
      
      CREATE TABLE opportunities AS 
        SELECT * 
        FROM read_csv_auto 
        ('data/salesforce/opportunities.csv');
      
      CREATE TABLE orders AS 
        SELECT * 
        FROM st_read
        ('data/oltp/orders.xlsx');
      
      CREATE TABLE inventory AS 
        SELECT * 
        FROM st_read
        ('data/oltp/inventory.xlsx');
      
      
    2. Ensure all tables are accessible in the same DuckDB session:

      SHOW TABLES;
      
    3. Confirm all tables are registered and return rows:

      SELECT 
        'orders' AS table_name,
        COUNT(*) AS row_count 
      FROM orders
      
      UNION ALL
      
      SELECT 
        'accounts' AS table_name,
        COUNT(*) AS row_count 
      FROM accounts
      
      UNION ALL
      
      SELECT 
        'inventory' AS table_name,
        COUNT(*) AS row_count 
      FROM inventory
      
      UNION ALL
      
      SELECT 
        'opportunities' AS table_name,
        COUNT(*) AS row_count 
      FROM opportunities;
      
      Result: You should see the following:

      | table_name | row_count |
      |---|---|
      | orders | 50000 |
      | accounts | 5000 |
      | inventory | 500 |
      | opportunities | 15000 |

  5. Challenge

    Step 5: Write Your First Report

    Write Your First Report

    Goal: Create a CSV file from a DuckDB query result.

    You can use DuckDB to output query results directly to CSV, Parquet, or Excel files using the following syntax:

    COPY (
      QUERY
    )
    TO 'file.csv';
    

    The COPY ... TO command writes query results to a file, making it useful for reports, exports, and downstream analysis.

    Note: By default, DuckDB writes files to the current directory where the DuckDB session starts. In this lab, that directory is /home/ps-user/workspace/.
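
    As a concrete illustration, a report could be exported like this. The query uses the orders table created in Step 4; the output filename revenue_by_account.csv is an illustrative choice, not one mandated by the lab:

    ```sql
    -- Export total order revenue per account to a CSV report.
    -- (revenue_by_account.csv is a hypothetical filename.)
    COPY (
      SELECT
        account_id,
        COUNT(*)         AS order_count,
        SUM(order_total) AS total_revenue
      FROM orders
      GROUP BY account_id
      ORDER BY total_revenue DESC
    ) TO 'revenue_by_account.csv' (HEADER, DELIMITER ',');
    ```

    Swapping the target for 'revenue_by_account.parquet' (FORMAT PARQUET) writes the same result as Parquet instead.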

  6. Challenge

    Step 6: Persist Aggregations to File

    Persist Aggregations to File

    Goal: Save curated results so they can be reused or shared without rerunning raw source queries.

    Aggregating data into a summary table gives you a reusable result set for reporting and downstream analysis. Instead of rerunning joins against the raw source tables, you can query the curated table directly.
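
    One way this pattern can look, using the accounts and orders tables created in Step 4 (the summary table name acct_industry_revenue is illustrative):

    ```sql
    -- Materialize a curated summary: revenue per industry.
    -- (acct_industry_revenue is a hypothetical table name.)
    CREATE TABLE acct_industry_revenue AS
    SELECT
      a.industry,
      COUNT(*)           AS order_count,
      SUM(o.order_total) AS total_revenue
    FROM orders o
    JOIN accounts a ON o.account_id = a.account_id
    GROUP BY a.industry;

    -- Downstream queries now hit the curated table, not the raw sources.
    SELECT * FROM acct_industry_revenue ORDER BY total_revenue DESC;
    ```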

  7. Challenge

    Step 7: Access Remote Data with DuckDB

    Access Remote Data with DuckDB

    Goal: Use DuckDB's httpfs extension to read data from an HTTP endpoint hosted by a Flask sample application and from object storage.

    With httpfs, DuckDB can query remote files using the same SQL patterns you used for local files.
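
    With the extension loaded, a remote file can be read with the same table functions used for local files. The URL below is a placeholder, not the lab's actual Flask endpoint:

    ```sql
    -- Query a CSV served over HTTPS exactly like a local file.
    -- (https://example.com/data/accounts.csv is a placeholder URL.)
    INSTALL httpfs;
    LOAD httpfs;

    SELECT *
    FROM read_csv_auto('https://example.com/data/accounts.csv')
    LIMIT 5;
    ```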

    Conclusion

    In this lab you learned:

    • Installing and activating DuckDB extensions: powerful abstractions that unlock extra functionality, such as querying HTTP endpoints and S3 storage.
    • Querying multiple file types and writing results to dedicated files: one of DuckDB's key advantages is that you can start querying raw files with minimal setup or infrastructure.
    • Reading raw files into persisted DuckDB tables, stored inside a .duckdb database file.
    • Querying S3 objects and HTTPS-hosted files directly from DuckDB.
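
    The S3 pattern can be sketched as follows, using DuckDB's secrets manager (available since v0.10); the key ID, secret, region, and bucket path are all placeholders:

    ```sql
    -- Configure S3 access for httpfs (all credential values are placeholders).
    INSTALL httpfs;
    LOAD httpfs;

    CREATE SECRET my_s3 (
      TYPE S3,
      KEY_ID 'YOUR_KEY_ID',
      SECRET 'YOUR_SECRET',
      REGION 'us-east-1'
    );

    -- Query an object directly from the bucket.
    SELECT COUNT(*) FROM read_parquet('s3://example-bucket/orders.parquet');
    ```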

    If you enjoyed this content, consider exploring these DuckDB projects and features:

    • Community Extensions: a list of open-source, community-developed extensions for DuckDB.
    • DuckDB WASM: a live, in-browser example of DuckDB running via WebAssembly.
    • DuckLake: DuckDB's lakehouse extension.
About the author

Ian is a database enthusiast with particular expertise in the Microsoft data stack. He also has expert-level experience with ETL, Python development, cloud architecture, and systems automation. Ian holds the Microsoft Certified Solutions Expert (MCSE) certification for data management and analytics, as well as the AWS Solutions Architect Associate certification. He also writes for multiple technical blogs, including SuperFunSQL.com, which he founded in 2019.
