
Introduction to Decision Trees

In this lab, you’ll practice creating a decision tree. When you’re finished, you’ll have a basic decision tree and a fundamental understanding of use cases for them.

Lab Info
Level
Intermediate
Last updated
Mar 05, 2026
Duration
20m

Table of Contents
  1. Challenge

    Introduction to Decision Trees

    Welcome to the "Introduction to Decision Trees" lab! This lab is designed to give you a foundational understanding of decision trees, including their use cases, interpretability, and their advantages and disadvantages.

    A decision tree is a supervised learning algorithm used for both classification and regression. In this lab you will focus on a classification decision tree, which models decisions and their possible consequences in a tree-like structure. Decision trees split data into smaller subsets based on feature conditions, which makes them easy to interpret while still effectively capturing patterns in the data.

    Learning Objectives

    • Understand common use cases of decision trees
    • Implement and train a decision tree, and measure its performance with key metrics
    • Understand the ethical considerations and interpretability of decision tree models
  2. Challenge

    Creation of Classification Data

    Since synthetic data creation is not the key focus of this lab, use the code provided below to generate the data. It produces features typical of classification problems, which align well with decision trees and random forests, both of which excel at classification problems that demand high interpretability.

    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_samples=500,  # Number of samples
        n_features=10,  # Number of features
        n_informative=8,  # Number of informative features
        n_redundant=2,  # Number of redundant features
        n_classes=2,  # Binary classification
        random_state=42
    )
    

    Once the synthetic data is generated, make sure to split it into training and test sets with a fixed random seed.
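One way to do that split, assuming an illustrative 80/20 ratio, is with scikit-learn's train_test_split (the data generation is repeated here so the snippet stands alone):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)

# stratify=y keeps the class balance consistent across both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (400, 10) (100, 10)
```

The random_state argument is what makes the split repeatable across runs.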

  3. Challenge

    Setting Up Decision Tree Models

    In this step you will choose the hyperparameters for the decision tree model.

    To initialize your decision tree model you will use the DecisionTreeClassifier from the scikit-learn library. For this basic demonstration you will use only three key parameters:

    criterion: the function used to measure the quality of a split. scikit-learn offers three options for classification:

    1. gini (Gini impurity) measures the probability of misclassifying a randomly chosen element from the dataset if it were labeled randomly according to the class distribution.
      • Faster to compute than entropy because it avoids logarithmic calculations.
      • Prefers larger class splits, leading to a more balanced tree.
    2. entropy, derived from Shannon's information theory, quantifies the amount of disorder (uncertainty) in a dataset.
      • Measures "purity" more rigorously than Gini.
      • More computationally expensive due to the logarithm.
      • Can result in deeper trees, since it may favor splits that produce multiple smaller classes.
    3. log_loss refers to log loss (binary cross-entropy), a measure primarily used in probabilistic classification, where predictions are probabilities rather than hard class labels. Note that in scikit-learn's DecisionTreeClassifier, the log_loss and entropy criteria are equivalent; both use Shannon information gain.
      • Log loss penalizes incorrect probabilistic predictions more heavily.
      • Unlike Gini and entropy, log loss operates on predicted probabilities rather than discrete class labels.
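To make the two impurity measures concrete, here is a small illustrative sketch (not part of the lab's required code) that computes Gini impurity and entropy for a node's class distribution:

```python
import math

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2) over class probabilities."""
    return 1.0 - sum(p ** 2 for p in probs)

def entropy(probs):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A perfectly pure node has zero impurity under both measures
print(gini([1.0, 0.0]))     # 0.0
# A 50/50 node is maximally impure for two classes
print(gini([0.5, 0.5]))     # 0.5
print(entropy([0.5, 0.5]))  # 1.0
```

Entropy peaks at 1.0 bit for two balanced classes while Gini peaks at 0.5, but in practice the two measures usually lead to similar splits.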

    max_depth: The maximum depth of the tree. Several other hyperparameters also limit tree growth, but for this example you will set only max_depth. Choosing a proper depth matters: a tree that is too shallow will underfit the data, while one that is too deep will overfit it and fail to generalize. Within this lab any max_depth under 6 is acceptable; feel free to adjust it to change the visualization of the tree.

    Note: In practice, the best way to determine the optimal depth is Grid Search with Cross-Validation.

    random_state: any integer value you like; fixing it ensures repeatable results across training runs.
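Putting the three parameters together, one possible initialization looks like the following; the specific values (gini, a depth of 4, seed 42) are illustrative choices, not requirements:

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative hyperparameters: gini criterion, depth capped at 4, fixed seed
dt_model = DecisionTreeClassifier(
    criterion="gini",
    max_depth=4,
    random_state=42,
)
print(dt_model.get_params()["max_depth"])  # 4
```

At this point the model is configured but not yet trained; fitting happens in the next challenge.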

  4. Challenge

    Training and Evaluating Decision Trees

    Now that your data is prepared and the model's hyperparameters are established, you can train the model and generate predictions with the two lines provided below.

    dt_model.fit(X_train, y_train)     # train on the training split
    y_pred = dt_model.predict(X_test)  # predict labels for the held-out test set
    

    Key Measurements:

    Decision trees can be evaluated with accuracy and with scikit-learn's classification report, which includes the following measurements:

    • Precision – How many predicted positives are actually positive?
    • Recall – How many actual positives were correctly predicted?
    • F1-score – Balance between precision and recall.
    • Accuracy – Overall correctness (not always the best metric for imbalanced datasets).
    • Support – The number of instances per class.
    • Macro vs. Weighted Average – Choose based on whether you want equal importance or class-weighted evaluation.
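The metrics above can be produced with scikit-learn's accuracy_score and classification_report. A minimal sketch, with the data generation and model setup from the earlier steps repeated so the snippet is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

dt_model = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=42)
dt_model.fit(X_train, y_train)
y_pred = dt_model.predict(X_test)

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
# Per-class precision, recall, F1-score, and support, plus macro/weighted averages
print(classification_report(y_test, y_pred))
```

The report prints one row per class plus the macro and weighted averages discussed below.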

    Below is a list of common scenarios, with the metric that matters most in each and why.

    1. When False Positives are Costly (e.g., spam detection, fraud detection)
      • Key Metric: Precision
      • Reason: Avoids mislabeling negatives as positives.
    2. When False Negatives are Costly (e.g., medical diagnosis, security alerts)
      • Key Metric: Recall
      • Reason: Ensures most actual positive cases are correctly identified.
    3. When Both False Positives and False Negatives Matter Equally (e.g., general classification tasks)
      • Key Metric: F1-score
      • Reason: Balances precision and recall for a fair evaluation.
    4. When Dealing with an Imbalanced Dataset (e.g., fraud detection with few fraud cases)
      • Key Metric: Weighted Average
      • Reason: Accounts for class distribution to prevent bias toward majority classes.
    5. When All Classes Should Be Treated Equally (e.g., multi-class problems where each class is equally important)
      • Key Metric: Macro Average
      • Reason: Gives equal importance to each class, regardless of frequency.
  5. Challenge

    Visualizing and Interpreting Decision Trees

    To visualize the tree, implement the code below. Properly named features increase readability and help you understand and reproduce the decisions made by the model.

    import matplotlib.pyplot as plt
    from sklearn.tree import plot_tree

    plt.figure(figsize=(12, 6))
    plot_tree(dt_model, feature_names=[f'Feature {i}' for i in range(X.shape[1])],
              class_names=['Class 0', 'Class 1'], filled=True)
    plt.title("Decision Tree Visualization")
    plt.show()
    

    Feel free to alter max_depth and other hyperparameters to see how the structure of the visualized tree changes.
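Beyond the plot, a fitted DecisionTreeClassifier also exposes feature_importances_, which summarizes how much each feature contributed to the splits. A short sketch, with the earlier data generation repeated so it runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=8,
                           n_redundant=2, n_classes=2, random_state=42)
dt_model = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)

# Importances sum to 1; larger values mean the feature drove more of the splits
for i, importance in enumerate(dt_model.feature_importances_):
    print(f"Feature {i}: {importance:.3f}")
```

Comparing these values against the plotted tree is a quick interpretability check: the features with the highest importance should appear near the root.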

About the author

I am Josh Meier, an avid explorer of ideas and a lifelong learner. I have a background in AI with a focus on generative AI. I am passionate about AI and the ethics surrounding its use and creation, have honed my skills in generative AI models, ethics, and applications, and strive to keep improving my understanding of these models.
