Build a Box Plot in Python to Visualize Clinical Drug Trial Effectiveness

In this Code Lab, you'll visualize clinical drug trial effectiveness using both Seaborn and Plotly. You'll begin by building classic box plots in Seaborn to explore treatment group outcomes and learn key statistical features like medians, outliers, and IQRs. Then, you'll transition to Plotly to build interactive versions that allow deeper exploratory analysis. By the end, you'll understand both static and interactive visualization approaches used widely across healthcare, data science, and analytics roles.

Level: Intermediate
Duration: 53m
Last updated: Jul 21, 2025


Table of Contents

  1. Challenge

    Build a Basic Box Plot with Seaborn

    Step 1: Build a Basic Box Plot to Explore Data Distributions

    Box plots are a powerful tool for visualizing the distribution of continuous data. They help you quickly identify key statistical features such as medians, quartiles, variability, and potential outliers — all critical when comparing groups or evaluating trends.

    In this Code Lab, you'll learn how to create your first box plots using Seaborn’s sns.boxplot() function. While practicing box plot fundamentals, you’ll work with real-world data from a clinical drug trial to explore treatment effectiveness across patient groups.


    What You’ll Learn in This Step

    • Build a basic box plot using Seaborn.
    • Understand box plot components: median, quartiles, whiskers, and outliers.
    • Assign variables to compare categories visually.

    You'll complete all tasks in a Jupyter notebook rather than in standalone .py files.

    Open the Notebook

    • Navigate to the workspace in the right panel.
    • Open the file: 1-step-one.ipynb.

    info> Important: You must save your notebook (Ctrl/Cmd + S) before clicking Validate. Validation checks the most recent saved checkpoint.

    How to Complete Each Task

    • Find the matching code cell labeled Task 1.1, Task 1.2, etc.
    • Write your code directly in that cell.
    • Run the cell using the Run button or by pressing Shift+Enter.
    • Save your progress using the Save icon or File > Save and Checkpoint.

    You do not need to use the terminal, create additional files, or call plt.savefig(). All code and output will appear inline.

    ### Create a Basic Box Plot

    Box plots help visualize how numerical data is distributed and compared across categories. They display key summary statistics including:

    • Median: The center line inside the box
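    • Interquartile Range (IQR): The height of the box, which spans the first quartile (bottom edge) to the third quartile (top edge)
    • Whiskers: Lines that extend from the box to the most extreme points not treated as outliers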
    • Outliers: The dots beyond the whiskers
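The four components above can also be computed by hand. The sketch below uses only the standard library. The sample scores are made up, and the quartile convention (`method="inclusive"`) is an assumption, since plotting libraries may interpolate quartiles slightly differently.

```python
import statistics


def box_plot_stats(values, whis=1.5):
    """Return the summary statistics a box plot draws (one common convention)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond these fences are drawn as individual outlier dots.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical effectiveness scores for a single treatment group.
scores = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 9.7]
stats = box_plot_stats(scores)
print(stats)
```

Here the single high score falls outside the upper fence, so it would be drawn as a dot rather than stretching the whisker.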

    Seaborn makes it easy to build box plots using the sns.boxplot() function. This function requires you to specify:

    • x: The column containing your categorical groupings
    • y: The column containing your numeric values to summarize
    • data: The full dataframe containing your dataset

    In this task, you'll build your very first box plot using Seaborn. The dataset has already been loaded for you.


    You’ll plot:

    • treatment_group on the x-axis
    • effectiveness_score on the y-axis

    This will allow you to visualize how the different treatment groups performed during the clinical trial.
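As a rough illustration of this task, the sketch below builds a small synthetic DataFrame and draws the plot. The column names come from the lab, but the group names and scores are made up and stand in for the dataset the notebook loads for you.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter; not needed in the lab notebook

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the lab's dataset (group names and values are hypothetical).
rng = np.random.default_rng(0)
groups = ["Placebo", "Drug A", "Drug B"]
df = pd.DataFrame({
    "treatment_group": np.repeat(groups, 40),
    "effectiveness_score": rng.normal(loc=np.repeat([5.0, 6.0, 7.0], 40), scale=1.0),
})

# One box per treatment group, summarizing that group's scores.
ax = sns.boxplot(x="treatment_group", y="effectiveness_score", data=df)
ax.set_title("Effectiveness score by treatment group")
```

In a notebook, the chart appears inline as soon as the cell runs.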

  2. Challenge

    Customize Box Plot Styling and Grouping with Seaborn

    Step 2: Customize Box Plot Styling and Grouping with Seaborn

    Now that you’ve built your first basic box plot, it’s time to explore how to customize its appearance. Being able to control the visual styling of box plots is critical for making them both more readable and more insightful.


    What You’ll Learn in This Step

    • Apply Seaborn themes with set_theme().
    • Apply color palettes with set_palette().
    • Customize box width, whiskers, and outlier markers using plot arguments.
    • Add subgroups using the hue parameter for grouped box plots.

    These are common tasks you’ll perform when building box plots for presentations, reports, and exploratory data analysis — especially in fields like healthcare and clinical trials where subgroup comparison is essential.

    You'll complete all tasks in a Jupyter notebook rather than in standalone .py files.

    Open the Notebook

    • Navigate to the workspace in the right panel.
    • Open the file: 2-step-two.ipynb.

    info> Important: You must save your notebook (Ctrl/Cmd + S) before clicking Validate. Validation checks the most recent saved checkpoint.

    How to Complete Each Task

    • Find the matching code cell labeled Task 2.1, Task 2.2, etc.
    • Write your code directly in that cell.
    • Run the cell using the Run button or by pressing Shift+Enter.
    • Save your progress using the Save icon or File > Save and Checkpoint.

    You do not need to use the terminal, create additional files, or call plt.savefig(). All code and output will appear inline.

    ### Controlling Seaborn Plot Styling with Themes and Palettes

    Seaborn’s set_theme() function controls many default settings that affect the appearance of all plots you create. This includes:

    • Background style
    • Gridlines
    • Axis tick styling
    • Context scaling (for presentations vs papers)

    You typically call set_theme() once at the beginning of your notebook to apply global styling.


    set_theme() Syntax

    sns.set_theme(
        style="darkgrid",
        palette="deep",
        context="notebook",
        font_scale=1,
        rc=None)
    
    • style: Controls background and gridlines (default: "darkgrid"; other common values: "whitegrid", "white", "dark", "ticks")
    • palette: Sets default colors for all categorical data (default: "deep")
    • context: Adjusts element scaling (default: "notebook"; other options: "paper", "talk", "poster")
    • font_scale: Multiplier to control overall font size
    • rc: Dictionary for fine-grained Matplotlib overrides

    Example

    sns.set_theme(style="whitegrid", palette="pastel", context="notebook", font_scale=1.2)
    

    set_palette() Function

    While you can pass palette= directly to set_theme(), you can also call set_palette() separately:

    sns.set_palette("pastel")
    

    This changes the colors applied to categorical variables.
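A minimal sketch of the two calls used together is shown below. The final comparison is only an illustrative check that the active color cycle now matches the "pastel" palette.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter; not needed in the lab notebook

import numpy as np
import seaborn as sns

# Global look: white background with gridlines, slightly larger fonts.
sns.set_theme(style="whitegrid", context="notebook", font_scale=1.2)

# Change only the colors, leaving the rest of the theme untouched.
sns.set_palette("pastel")

# Illustrative check: the current color cycle should now be the pastel palette.
palette_applied = np.allclose(sns.color_palette(), sns.color_palette("pastel"))
print(palette_applied)
```

Order matters here: calling set_theme() again later also applies its own palette argument, which replaces whatever set_palette() selected.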


    Here is a quick rule of thumb for most data visualization tasks:

    • Use set_theme() for global appearance.
    • Use set_palette() if you want to adjust colors separately after setting the theme.

    ### Customizing Box Appearances and Outlier Markers

    Box plots in Seaborn aren’t just fixed visuals — you can control many aspects of their appearance directly through function parameters. This allows you to fine-tune the plot for clarity, aesthetics, or specific analytic goals.


    Box Width (width)

    The width parameter controls how wide each box appears. Narrower boxes can be useful when comparing many categories.

    sns.boxplot(..., width=0.5)
    
    • Default width is 0.8.
    • Range is typically between 0.2 and 1.0 depending on spacing.

    Whisker Length (whis)

    The whis parameter controls how far the whiskers extend beyond the box.

    sns.boxplot(..., whis=1.5)
    
    • The default whis=1.5 means each whisker extends to the most extreme data point that lies within 1.5 times the interquartile range (IQR) of the box edge.
    • Lower values produce shorter whiskers, so more points are drawn as outliers.
    • Higher values include more data within the whiskers.

    Outlier Marker Styling (flierprops)

    Outliers (or “fliers”) are drawn as Matplotlib line markers, which is why they accept marker and line properties. You can customize how outlier markers look using the flierprops argument.

    flier_props = dict(marker='o',
                       markerfacecolor='red',
                       markersize=6,
                       linestyle='none')
    
    sns.boxplot(..., flierprops=flier_props)
    
    • marker: Shape of the outlier marker (such as 'o', 'x', '^', etc.)
    • markerfacecolor: Fill color
    • markersize: Size of outlier markers
    • linestyle='none': Disables connecting lines
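The three options above can be combined in a single call. The sketch below is one possible combination on synthetic data. The group names and scores are made up, and whis=1.0 is chosen only to make more points show up as outliers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter; not needed in the lab notebook

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the lab's dataset (hypothetical groups and values).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "treatment_group": np.repeat(["Placebo", "Drug A", "Drug B"], 40),
    "effectiveness_score": rng.normal(loc=np.repeat([5.0, 6.0, 7.0], 40), scale=1.0),
})

# Red filled circles for outliers, with no connecting line between them.
flier_props = dict(marker="o", markerfacecolor="red", markersize=6, linestyle="none")

ax = sns.boxplot(
    x="treatment_group",
    y="effectiveness_score",
    data=df,
    width=0.5,               # narrower boxes than the 0.8 default
    whis=1.0,                # shorter whiskers than the 1.5 default
    flierprops=flier_props,  # custom outlier markers
)
```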

    Note: If you omit flierprops, Seaborn uses default outlier styling.

    ### Grouping Categorical Data with the Hue Parameter

    Box plots often become even more powerful when you compare multiple subgroups within each category. In Seaborn, the hue parameter allows you to split each box into subgroups based on a second categorical variable.

    Instead of drawing a single box for each main category, Seaborn will draw multiple boxes side-by-side for each subgroup level.


    Hue Syntax

    sns.boxplot(x="category_col",
                y="numeric_col",
                hue="subgroup_col",
                data=df)
    
    • hue must be set to a column that contains a second categorical variable.
    • The unique values in the hue column define how many subgroup boxes get drawn within each main category.

    Example

    sns.boxplot(x="treatment_group",
                y="effectiveness_score",
                hue="gender",
                data=df)
    

    This would draw multiple boxes for each treatment_group split by gender.


    Behind the Scenes

    • Seaborn automatically assigns different colors (from the current palette) to each subgroup level.
    • Grouped box plots help you quickly evaluate whether different populations respond differently across your main categories.
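A runnable version of the grouped plot might look like the sketch below. The data is synthetic, and the gender values "Female" and "Male" are assumptions about what the column contains.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside Jupyter; not needed in the lab notebook

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the lab's dataset (hypothetical groups and values).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "treatment_group": np.repeat(["Placebo", "Drug A", "Drug B"], 40),
    "effectiveness_score": rng.normal(loc=np.repeat([5.0, 6.0, 7.0], 40), scale=1.0),
    "gender": np.tile(["Female", "Male"], 60),
})

# Two boxes per treatment group, one for each gender value.
ax = sns.boxplot(x="treatment_group", y="effectiveness_score", hue="gender", data=df)

# Seaborn adds a legend that maps each color to a subgroup level.
legend_labels = {text.get_text() for text in ax.get_legend().get_texts()}
print(legend_labels)
```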
  3. Challenge

    Build and Customize Interactive Box Plots with Plotly

    Step 3: Build and Customize Interactive Box Plots with Plotly

    So far, you’ve created static box plots using Seaborn. While static plots are useful for many tasks, sometimes you need more interactivity when exploring or presenting your data. That’s where Plotly comes in.

    In this step, you’ll use Plotly Express to build interactive box plots. With just a few lines of code, you can:

    • Hover over data points to see exact values.
    • Add subgroup comparisons automatically.
    • Dynamically adjust orientation, ordering, and more.

    Plotly integrates well with Pandas DataFrames and gives you immediate access to highly interactive visuals — perfect for clinical trial data exploration, presentations, or dashboards.



    What You’ll Learn in This Step

    • Create your first interactive box plot using px.box().
    • Control the orientation (horizontal or vertical) of your plot.
    • Customize category ordering using category_orders to control how groups appear.


    You'll complete all tasks in a Jupyter notebook rather than in standalone .py files.

    Open the Notebook

    • Navigate to the workspace in the right panel.
    • Open the file: 3-step-three.ipynb.

    info> Important: You must save your notebook (Ctrl/Cmd + S) before clicking Validate. Validation checks the most recent saved checkpoint.

    How to Complete Each Task

    • Find the matching code cell labeled Task 3.1, Task 3.2, etc.
    • Write your code directly in that cell.
    • Run the cell using the Run button or by pressing Shift+Enter.
    • Save your progress using the Save icon or File > Save and Checkpoint.

    You do not need to use the terminal, create additional files, or call plt.savefig(). All code and output will appear inline.

    ### Building Interactive Box Plots with Plotly Express

    plotly.express.box() creates interactive box plots directly from your DataFrame in a single function call. Compared to Seaborn, Plotly gives you full interactivity by default — no additional configuration required.

    You can immediately:

    • Hover to see exact values.
    • Zoom, pan, and export your plots.
    • Display multiple subgroup comparisons.

    px.box() Syntax

    px.box(
        data_frame,
        x=None,
        y=None,
        color=None,
        facet_col=None,
        orientation=None,
        points=None,
        category_orders=None,
        title=None,
        labels=None
    )
    

    Key Parameters You'll Use

    • data_frame: The full Pandas DataFrame
    • x: Categorical column for group comparisons
    • y: Numeric column for measurement values
    • color: Optional second categorical column to split colors
    • orientation: 'v' or 'h' to control vertical/horizontal
    • points: Controls whether individual data points are shown ("all", "outliers", "suspectedoutliers", or False)

    Controlling Box Plot Orientation in Plotly

    By default, box plots in Plotly Express are drawn vertically — with categories on the x-axis and values on the y-axis. But sometimes, a horizontal orientation makes your chart easier to read, especially if:

    • Category labels are long
    • You have many groups to display
    • The numeric range fits better horizontally

    Plotly makes this change simple using the orientation parameter inside px.box().



    orientation Parameter

    orientation = "v"  # vertical (default)
    orientation = "h"  # horizontal
    
    • "v" (vertical): Categories on x-axis, values on y-axis
    • "h" (horizontal): Categories on y-axis, values on x-axis


    Key Rule

    When switching orientations, you also swap your x and y arguments.

    # Vertical (default)
    px.box(df, x="category_col", y="value_col")
    
    # Horizontal
    px.box(df, x="value_col", y="category_col", orientation="h")
    

    When orientation is omitted, Plotly Express tries to infer it from the column types. Setting orientation explicitly and placing the numeric column on the matching axis keeps the result predictable.

    ### Customizing Category Order with Plotly Express

    By default, Plotly determines category order based on how they appear in your dataset. This might work for simple plots, but often you’ll want full control over how groups are displayed:

    • Logical sorting (e.g. control group before treatment groups)
    • Clinical priority
    • Presentation-ready ordering

    Plotly Express allows you to fully control this order using the category_orders parameter.



    category_orders Parameter

    category_orders = {
        "column_name": ["first_value", "second_value", "third_value", ...]
    }
    
    • The key is the column you want to control.
    • The list defines the exact order of categories.
    • This ensures categories appear exactly in the order you specify, regardless of how they're ordered in the DataFrame.


    Important

    • category_orders accepts multiple columns if you need to control order for multiple dimensions (useful in faceting).
    • Category names inside the list must exactly match the text values in your dataset.
    • Category names are case-sensitive. "Control" and "control" are treated as different categories.

    Controlling Point Display in Plotly Box Plots

    Plotly box plots have the ability to show individual data points along with the boxes themselves. This can be helpful for:

    • Showing the actual distribution of your observations
    • Highlighting outliers
    • Revealing clusters or gaps in the data

    You control how points are displayed using the points argument inside px.box().

    points Options

    | Value | What It Does |
    | --------------------- | --------------------------------------------------------- |
    | "outliers" | Show only points beyond the whiskers (default). |
    | "all" | Show all individual data points. |
    | "suspectedoutliers" | Show the outlier points and highlight the most extreme ones (below 4×Q1 − 3×Q3 or above 4×Q3 − 3×Q1). |
    | False | Do not show any individual points. |

    Example

    points="all"
    

    This shows every observation as a jittered dot.

    Tip: Showing all points can be helpful for presentations, but can look cluttered if you have very large datasets.

    ### Refining Box Width, Jitter, and Marker Opacity

    When you display all data points on a box plot, you can improve readability by customizing:

    • Box Width: Controls the width of each box relative to spacing between categories
    • Jitter: Adds horizontal spread to individual points so they don’t overlap
    • Opacity: Makes points partially transparent, reducing visual clutter when many observations overlap

    These parameters help create clearer, presentation-ready visuals — especially when you have lots of data.

    Parameters to Use:

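    | Parameter | What It Does |
    | --------- | ----------------------------------------------------------------- |
    | boxmode | px.box() argument that controls whether subgroup boxes sit side by side ("group") or are drawn on top of each other ("overlay") |
    | jitter | Trace attribute, set with fig.update_traces(), that controls how far individual points are spread across the box width |
    | opacity | Set with fig.update_traces(), for example through marker.opacity, to control how transparent the points are |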

    Tip: A jitter between 0.2 and 0.4 is usually enough to reduce overlap without creating noise.

    ### Applying Titles, Labels, and Styling for Presentation

    Once your plot shows all points and adjusted spacing, the last step is to prepare it for communication. This includes:

    • Titles: Provide clear context about what the chart shows
    • Axis Labels: Rename axes to be more descriptive and readable
    • Templates: Apply a cohesive style for fonts, gridlines, and colors

    Common Arguments for Presentation Styling

    | Parameter | Purpose |
    | ---------- | --------------------------------------- |
    | title | Sets the main chart title |
    | labels | Re-maps column names to friendly labels |
    | template | Applies a built-in visual style |

  4. Challenge

    Add Advanced Interactivity to Plotly Box Plots

    Step 4: Add Advanced Interactivity to Plotly Box Plots

    You've built interactive box plots using Plotly Express — now it's time to take advantage of some of Plotly's more advanced interactivity features.

    These options allow you to:

    • Control exactly what appears in tooltips when users hover over data points
    • Display additional fields directly in the hover text
    • Customize how quartiles are calculated
    • Apply built-in visual templates for consistent presentation styling

    These features are widely used in interactive dashboards, presentations, and exploratory data apps — helping users understand patterns more deeply by surfacing contextual details on demand.


    What You’ll Learn in This Step

    • Customize hover tooltips using hovertemplate for full text control.
    • Add multiple fields to the hover data using hover_data.
    • Control quartile calculation behavior with quartilemethod.
    • Apply pre-built Plotly templates to quickly style your charts.

    You'll complete all tasks in a Jupyter notebook rather than in standalone .py files.

    Open the Notebook

    • Navigate to the workspace in the right panel.
    • Open the file: 4-step-four.ipynb.

    info> Important: You must save your notebook (Ctrl/Cmd + S) before clicking Validate. Validation checks the most recent saved checkpoint.

    How to Complete Each Task

    • Find the matching code cell labeled Task 4.1, Task 4.2, etc.
    • Write your code directly in that cell.
    • Run the cell using the Run button or by pressing Shift+Enter.
    • Save your progress using the Save icon or File > Save and Checkpoint.

    You do not need to use the terminal, create additional files, or call plt.savefig(). All code and output will appear inline.

    ### Customize Box Plot Tooltips with `hovertemplate`

    When you hover over a box plot in Plotly, a tooltip appears with information about the data point. This tooltip is useful, but you may want more control over what appears and how it's displayed.

    The hovertemplate property lets you define the entire content of the tooltip using placeholder variables.


    hovertemplate Syntax:

    fig.update_traces(
        hovertemplate="Label: %{x}<br>Value: %{y}<extra></extra>"
    )
    
    • %{x} and %{y} insert the values from the x and y axes.
    • <br> adds a line break.
    • <extra></extra> removes the default trace label (which you often don’t need in box plots).

    Supported Placeholders
    • %{x}: Category label or axis value
    • %{y}: Numeric value or axis value
    • %{color}: Grouping label (if color is applied)
    • %{customdata[i]}: Value from a custom data array (advanced)
    • <extra></extra>: Hides trace label in the tooltip
    ### Add Extra Fields to Hover Tooltips with `hover_data`

    While hovertemplate gives you full control over tooltip formatting, sometimes you simply want to include extra data fields without customizing every line. That’s where the hover_data parameter comes in.

    This parameter lets you specify which columns from your dataset should be included in the tooltip — and whether each one is shown or hidden.


    hover_data Syntax

    px.box(
        data_frame=df,
        x="group_col",
        y="value_col",
        hover_data=["field1", "field2", "field3"]
    )
    
    • Provide a list of column names to include in the tooltip.
    • Columns must exist in your DataFrame.
    • Plotly will automatically format and display them as additional lines.

    Advanced Usage with Formatting

    You can pass a dictionary instead of a list to control formatting and visibility:

    hover_data={
        "field1": True,     # Show this field
        "field2": False,    # Hide but keep data accessible
        "field3": ":.2f"    # Format numbers to two decimals
    }
    
    ### Control Quartile Calculation and Apply Built-in Templates

    In a box plot, the box represents the middle 50% of values — between the first and third quartile. The whiskers extend outward to capture variability beyond that.

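    Plotly lets you control how these boundaries are calculated with the quartilemethod trace attribute, which you set with fig.update_traces().


    quartilemethod Options

    quartilemethod="linear"      # Default: interpolates between neighboring data points
    quartilemethod="inclusive"   # Splits the data at the median; with an odd sample size, the median is included in both halves
    quartilemethod="exclusive"   # Splits the data at the median; with an odd sample size, the median is excluded from both halves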
    

    These options change how the box edges and whiskers are computed — which affects the shape and spread of the box plot. Most of the time, you’ll use "linear", but other methods are helpful in regulated or statistical settings.


    Plotly Templates

    Plotly also supports built-in themes to instantly change the look of your chart — background, fonts, gridlines, and color cycles.

    Use the template argument inside your px.box() call to apply one:

    template="plotly_dark"
    

    Popular Built-in Templates
    • "plotly" (default)
    • "plotly_white"
    • "plotly_dark"
    • "ggplot2"
    • "seaborn"
    • "simple_white"
    • "presentation"
  5. Challenge

    Plotly Advanced Layouts and Presentation

    Step 5: Customize Layout and Axis Behavior in Plotly

    Once your box plots are built and styled, you may need to refine their layout — especially for presentations, dashboards, or scientific reports. This step teaches you how to go beyond default styles and take full control over how your charts behave and appear.

    You’ll learn how to annotate, arrange, and fine-tune axis behavior in ways that increase clarity and impact.


    What You’ll Learn in This Step

    • Add annotations to highlight key insights.
    • Use facet panels to compare distributions side by side.
    • Adjust axis ranges for more readable scaling.
    • Apply multiple layout customizations in one plot.

    You'll complete all tasks in a Jupyter notebook rather than in standalone .py files.

    Open the Notebook

    • Navigate to the workspace in the right panel.
    • Open the file: 5-step-five.ipynb.

    info> Important: You must save your notebook (Ctrl/Cmd + S) before clicking Validate. Validation checks the most recent saved checkpoint.

    How to Complete Each Task
    • Find the matching code cell labeled Task 5.1, Task 5.2, etc.
    • Write your code directly in that cell.
    • Run the cell using the Run button or by pressing Shift+Enter.
    • Save your progress using the Save icon or File > Save and Checkpoint.

    You do not need to use the terminal or create additional files. All code and output will appear inline.

    ### Add Visual Annotations to Emphasize Key Insights

    Annotations help you call out specific data points, groupings, or trends directly on a chart. This is especially useful in dashboards, presentations, or scientific communication where clarity is critical.

    Plotly supports multiple ways to add annotations. The most direct is fig.add_annotation(), a method of the figure object. You can also pass a list of annotation dictionaries to fig.update_layout(), which is the form shown below.


    Common Use Cases for Annotations:

    • Highlighting the median or outlier in a group
    • Calling attention to a specific treatment group
    • Labeling a notable range in your plot

    Annotation Syntax with update_layout()

    fig.update_layout(
      annotations=[
        dict(
          x=...,          # X coordinate (a data value, or 0 to 1 when xref="paper")
          y=...,          # Y coordinate (a data value, or 0 to 1 when yref="paper")
          text="Label",   # Text to display
          showarrow=True, # Whether to draw an arrow
          arrowhead=1     # Style of the arrowhead
        )
      ]
    )
    

    Annotation Position Tips
    • x and y should match values from your data (e.g. "Group A", 4.5).
    • Use xref="x" and yref="y" to pin to data coordinates.
    • Use xref="paper" and yref="paper" to anchor by plot size (from 0 to 1).
    • Arrows are optional — they help when pointing at small outliers.
    ### Split Your Plot into Panels by Category with Facets

    Sometimes it’s better to show multiple side-by-side plots rather than combine everything into one. Plotly lets you do this using facets — mini plots split by the values in a column.

    This is ideal for visually comparing subgroups using consistent axes and styling.


    How Faceting Works

    Facets are created using:

    • facet_col=: Splits the plot into panels arranged side by side, one column per category
    • facet_row=: Splits the plot into panels stacked vertically, one row per category

    You provide the name of a categorical column, and Plotly makes a subplot for each unique value.


    Facet Plot Parameters
    facet_col="columnName"
    facet_row="columnName"
    facet_col_wrap=2
    
    • facet_col: Assigns one panel per category along columns
    • facet_row: Does the same, but along rows
    • facet_col_wrap: Controls how many go per row before wrapping
    ### Adjust Axis Ranges for Better Focus and Readability

    By default, Plotly automatically sets the axes to fit your data. But sometimes, auto-scaling can distract or distort how data is perceived — especially if you want to zoom in on key ranges or create consistent axes across multiple charts.

    This is where range_x and range_y come in handy.


    What You Can Control

    With range_x and range_y, you can:

    • Zoom into a specific value range (e.g., [2, 8]).
    • Create consistent scaling across multiple charts.
    • Limit visual clutter when outliers are irrelevant.

    Axis Range Parameters
    range_y=[lower_limit, upper_limit]
    range_x=[lower_limit, upper_limit]
    

    These parameters accept a list of two numeric values.

    Example (for context only):

    range_y=[3, 7]
    
    ### Stack Multiple Layout Options in One Plot

    As your visualizations evolve, you’ll often want to combine multiple layout-level controls into a single figure. This helps align your plot with its communication purpose — whether for dashboards, reports, or presentations.

    In this final task, you’ll apply everything you’ve learned to produce a polished, custom box plot.


    What Can Be Combined

    Plotly Express lets you pass layout arguments directly into px.box(), including:

    • Axis range: range_y=[...], range_x=[...]
    • Figure style: template="...", title="..."
    • Box display: points=, notched=, etc.

    Example Parameters You Might Combine
    range_y=[2, 8]
    template="plotly_dark"
    title="Trial Response by Group"
    points="all"
    

    Note: This is illustrative, not prescriptive.

    ## Lab Summary: Review What You Built

    You’ve now completed a full data visualization pipeline using both Seaborn and Plotly Express — two very powerful libraries for visualizing statistical distributions in Python.

    Across this lab, you:

    • Learned how box plots reveal medians, quartiles, outliers, and overall spread
    • Customized chart styling and layout using Seaborn themes and Plotly templates
    • Used interactivity to explore subgroup effects and annotate key insights
    • Practiced visual storytelling techniques relevant to real-world healthcare and analytics workflows

    Explore the Final Chart

    If you'd like to see how all of these techniques can be combined in a professional-quality chart, open the file below:

    File: lab-recap.ipynb

    This recap notebook shows a fully customized interactive box plot that combines color, layout, interactivity, and styling — all using the same dataset you've worked with throughout the lab.
