  • A Cloud Guru
Generate a Complete Report

In this lab, you create graphs from slices of Titanic survivability data loaded from CSV. A PDF of the solution notebook for this lab is available [here](https://github.com/linuxacademy/content-python-for-database-and-reporting/blob/master/pdf/hol_5_1_l_solution.pdf).


Path Info

Level: Intermediate
Duration: 1h 30m
Published: Mar 13, 2020


Table of Contents

  1. Challenge

    Start Jupyter Notebook Server and Access on Your Local Machine

    Connecting to the Jupyter Notebook Server

    Make sure that you have activated the virtual environment!

    1. To activate the virtual environment:
    conda activate base
    
    2. To start the server, run the following:
    python get_notebook_token.py
    

    This script starts the Jupyter Notebook server and keeps it running independently of the terminal.

    Note: The script prints a token in the terminal. Copy it and save it to a text file on your local machine.

    On Your Local Machine

    1. In a terminal window, enter the following:
    ssh -N -L localhost:8087:localhost:8086 cloud_user@<the public IP address of the Playground server>
    

    You will be prompted for a password; use the password you used to log in to the Playground remote server. Leave this terminal open. It will look as though nothing has happened, but the terminal must remain open for as long as you use the Jupyter Notebook server in this session.

    2. In the browser of your choice, enter the following address:

    http://localhost:8087

    This opens a Jupyter Notebook page that asks for the token you copied from the remote server.
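
If the page at http://localhost:8087 does not load, it can help to confirm that the SSH tunnel is actually listening on the local port. The following is a minimal sketch using only the Python standard library, meant to be run on your local machine. The helper name `port_is_open` is hypothetical and not part of the lab.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False


if __name__ == "__main__":
    # 8087 is the local end of the tunnel opened by the ssh command above.
    print("Tunnel reachable:", port_is_open("localhost", 8087))
```

A result of `False` usually means the `ssh -N -L` terminal was closed or the command was never started.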

  2. Challenge

    Import Required Packages and Create Dataframe From File

    Titanic Data: Factors Affecting Survivability

    This data set was found through a web search and is published by many different organizations. It describes individual passengers on the Titanic and records whether each one survived the disaster.

    The columns are defined as follows:

    • PassengerId - Indexed starting at 1
    • Survived - Survival (0 = No; 1 = Yes)
    • Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
    • Name - Name
    • Sex - Sex
    • Age - Age
    • SibSp - Number of Siblings/Spouses Aboard
    • Parch - Number of Parents/Children Aboard
    • Ticket - Ticket Number
    • Fare - Passenger Fare
    • Cabin - Cabin
    • Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)

    The questions we are asking:

    1. What part did age play?
    2. What part did gender play?
    3. Did the passenger class make a difference?

    Load the CSV Data Into a Dataframe

    import matplotlib.pyplot as plt
    import pandas as pd
    
    %matplotlib inline
    
    titanic_df = pd.read_csv('titanic.csv')
                           
    titanic_df.head()
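
Before slicing by age, it is worth checking whether any rows lack an `Age` value, because a comparison such as `titanic_df.Age < 12` silently skips rows with a missing age. The sketch below uses a small made-up sample in place of `titanic.csv`; the rows are illustrative and are not real passenger records.

```python
import io

import pandas as pd

# Hypothetical sample standing in for titanic.csv; the values are made up.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
    "3,1,3,female,\n"
    "4,0,2,male,\n"
)
sample_df = pd.read_csv(sample_csv)

# Count missing values per column; Age is the column of interest here.
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Rows with a missing Age match neither side of an age comparison,
# so these two counts add up to fewer rows than the frame contains.
younger = (sample_df.Age < 30).sum()
older = (sample_df.Age >= 30).sum()
print(younger, older, len(sample_df))
```

On the lab data, the same check would be `titanic_df.isna().sum()`.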
    
  3. Challenge

    Examine the Effect Age Had on Survivability

    Examine the Effect of Age on Survivability

    • Under 13
    • 13 - 24
    • 25 - 49
    • 50 - 74
    • 75 and Older

    # Under 13 (the original cutoff of Age < 12 left 12-year-olds out of every group)
    passengers_under_13 = titanic_df[titanic_df.Age < 13]
    passengers_under_13_survived = passengers_under_13[passengers_under_13.Survived == 1]
    passengers_under_13_percent_survived = passengers_under_13_survived.Age.count() / passengers_under_13.Age.count()
    
    # 13 to 24
    passengers_13_to_24 = titanic_df[(titanic_df.Age >= 13) & (titanic_df.Age < 25)]
    passengers_13_to_24_survived = passengers_13_to_24[passengers_13_to_24.Survived == 1]
    passengers_13_to_24_percent_survived = passengers_13_to_24_survived.Age.count() / passengers_13_to_24.Age.count()
    
    # 25 to 49
    passengers_25_to_49 = titanic_df[(titanic_df.Age >= 25) & (titanic_df.Age < 50)]
    passengers_25_to_49_survived = passengers_25_to_49[passengers_25_to_49.Survived == 1]
    passengers_25_to_49_percent_survived = passengers_25_to_49_survived.Age.count() / passengers_25_to_49.Age.count()
    
    # 50 to 74 (upper bound is Age < 75 so that 74-year-olds are included)
    passengers_50_to_74 = titanic_df[(titanic_df.Age >= 50) & (titanic_df.Age < 75)]
    passengers_50_to_74_survived = passengers_50_to_74[passengers_50_to_74.Survived == 1]
    passengers_50_to_74_percent_survived = passengers_50_to_74_survived.Age.count() / passengers_50_to_74.Age.count()
    
    # 75 and over
    passengers_75_over = titanic_df[titanic_df.Age >= 75]
    passengers_75_over_survived = passengers_75_over[passengers_75_over.Survived == 1]
    passengers_75_over_percent_survived = passengers_75_over_survived.Age.count() / passengers_75_over.Age.count()
    
    # Print the group size and the fraction that survived
    print(f'Under 13:\t{passengers_under_13.Age.count()} - {passengers_under_13_percent_survived:.2f}')
    print(f'13 - 24:\t{passengers_13_to_24.Age.count()} - {passengers_13_to_24_percent_survived:.2f}')
    print(f'25 - 49:\t{passengers_25_to_49.Age.count()} - {passengers_25_to_49_percent_survived:.2f}')
    print(f'50 - 74:\t{passengers_50_to_74.Age.count()} - {passengers_50_to_74_percent_survived:.2f}')
    print(f'75 & Over:\t{passengers_75_over.Age.count()} - {passengers_75_over_percent_survived:.2f}')
    
    
    # Show data as a bar chart, using the rates computed above instead of hard-coded values
    groups = ('Under 13', '13 - 24', '25 - 49', '50 - 74', '75 & Over')
    percentages = [
        passengers_under_13_percent_survived,
        passengers_13_to_24_percent_survived,
        passengers_25_to_49_percent_survived,
        passengers_50_to_74_percent_survived,
        passengers_75_over_percent_survived,
    ]
    plt.bar(groups, percentages, align='center', alpha=0.5)
    plt.ylabel("Fraction Survived")
    plt.title("Titanic Survivability by Age Group")
    

    This suggests that children under 13 may have been given some preferential treatment for lifeboats. However, the data does not say whether a passenger recorded as not surviving died during the sinking itself or afterward. Some children may have been more susceptible to environmental factors, such as temperature, and died in a lifeboat.

    Since there was only one passenger in the 75 & Over group, the survivability of that group is not useful and should not be considered.
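
The five age slices can also be produced in one step with `pandas.cut`, which assigns each row to a bin. This is an alternative sketch, not the lab's method. The sample data is made up, and the bin edges assume the groups are under 13, 13 - 24, 25 - 49, 50 - 74, and 75 and over.

```python
import pandas as pd

# Hypothetical sample data; ages and outcomes are made up for illustration.
sample_df = pd.DataFrame({
    "Age":      [4, 12, 19, 30, 45, 62, 80, None],
    "Survived": [1,  0,  0,  1,  0,  0,  1,    1],
})

# right=False makes each bin include its left edge and exclude its right edge,
# so the first bin holds ages from 0 up to, but not including, 13.
bins = [0, 13, 25, 50, 75, float("inf")]
labels = ["Under 13", "13 - 24", "25 - 49", "50 - 74", "75 & Over"]
sample_df["AgeGroup"] = pd.cut(sample_df["Age"], bins=bins, labels=labels, right=False)

# The mean of a 0/1 column is the fraction of survivors in each group.
# Rows with a missing Age get no group and are left out of the result.
summary = sample_df.groupby("AgeGroup", observed=False)["Survived"].agg(["count", "mean"])
print(summary)
```

Because every age falls into exactly one bin, this approach avoids gaps between hand-written boundaries. The `count` column also makes very small groups easy to spot.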

  4. Challenge

    Examine the Effect Gender Had on Survivability

    Examine the Effect of Gender on Survivability

    #### Male
    passengers_male = titanic_df[titanic_df.Sex == "male"]
    passengers_male_survived = passengers_male[passengers_male.Survived == 1]
    passengers_male_percent_survived = passengers_male_survived.Sex.count() / passengers_male.Sex.count()
    
    #### Female
    passengers_female = titanic_df[titanic_df.Sex == "female"]
    passengers_female_survived = passengers_female[passengers_female.Survived == 1]
    passengers_female_percent_survived = passengers_female_survived.Sex.count() / passengers_female.Sex.count()
    
    print(f'Male:\t{passengers_male.Sex.count()} - {passengers_male_percent_survived}')
    print(f'Female:\t{passengers_female.Sex.count()} - {passengers_female_percent_survived}')
    
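    # Show data as a bar chart, using the rates computed above instead of hard-coded values
    groups = ('Male', 'Female')
    percentages = [passengers_male_percent_survived, passengers_female_percent_survived]
    plt.bar(groups, percentages, align='center', alpha=0.5)
    plt.ylabel("Fraction Survived")
    plt.title("Titanic Survivability by Gender")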
    

    Female passengers survived at a much higher rate than male passengers, which suggests they were given preference for lifeboats. It would be interesting to break down the male survivors by age group. Hypothesis: Younger males survived at a higher rate.
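
One way to start on that hypothesis is to group by sex and a coarse age band at the same time. This is a sketch with made-up sample rows; the cutoff at age 13 and the `AgeBand` column are assumptions for illustration, not part of the lab.

```python
import pandas as pd

# Hypothetical sample data; not real passenger records.
sample_df = pd.DataFrame({
    "Sex":      ["male", "male", "male", "male", "female", "female"],
    "Age":      [5, 9, 30, 41, 8, 35],
    "Survived": [1, 0, 0, 0, 1, 1],
})

# Label each passenger as "child" (younger than 13, an assumed cutoff) or "adult".
sample_df["AgeBand"] = sample_df["Age"].lt(13).map({True: "child", False: "adult"})

# Fraction surviving for each (Sex, AgeBand) combination.
rates = sample_df.groupby(["Sex", "AgeBand"])["Survived"].mean()
print(rates)
```

On the lab data, rows with a missing `Age` would need to be dropped first, since `lt(13)` treats a missing age as not under 13.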

  5. Challenge

    Examine the Effect Passenger Class Had on Survivability

    Examine the Effect of Passenger Class on Survivability

    #### Passenger Class 1
    passengers_class_1 = titanic_df[titanic_df.Pclass == 1]
    passengers_class_1_survived = passengers_class_1[passengers_class_1.Survived == 1]
    passengers_class_1_percent_survived = passengers_class_1_survived.Pclass.count() / passengers_class_1.Pclass.count()
    
    #### Passenger Class 2
    passengers_class_2 = titanic_df[titanic_df.Pclass == 2]
    passengers_class_2_survived = passengers_class_2[passengers_class_2.Survived == 1]
    passengers_class_2_percent_survived = passengers_class_2_survived.Pclass.count() / passengers_class_2.Pclass.count()
    
    #### Passenger Class 3
    passengers_class_3 = titanic_df[titanic_df.Pclass == 3]
    passengers_class_3_survived = passengers_class_3[passengers_class_3.Survived == 1]
    passengers_class_3_percent_survived = passengers_class_3_survived.Pclass.count() / passengers_class_3.Pclass.count()
    
    
    print(f'Class 1:\t{passengers_class_1.Pclass.count()} - {passengers_class_1_percent_survived}')
    print(f'Class 2:\t{passengers_class_2.Pclass.count()} - {passengers_class_2_percent_survived}')
    print(f'Class 3:\t{passengers_class_3.Pclass.count()} - {passengers_class_3_percent_survived}')
    
    
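    # Show data as a bar chart, using the rates computed above instead of hard-coded values
    groups = ('Class 1', 'Class 2', 'Class 3')
    percentages = [
        passengers_class_1_percent_survived,
        passengers_class_2_percent_survived,
        passengers_class_3_percent_survived,
    ]
    plt.bar(groups, percentages, align='center', alpha=0.5)
    plt.ylabel("Fraction Survived")
    plt.title("Titanic Survivability by Passenger Class")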
    

    Class 1 passengers were clearly more likely to be saved. Whether that was because they were closer to the lifeboats or because they were given genuine preference cannot be determined from this data. Once again, looking at this data by age and gender would be interesting for further study.

    This is not an exhaustive review of the data available, but a simple review based on three independent attributes. Much more data could be analyzed for deeper, more specific ideas of how the surviving passengers were selected.
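
Looking at passenger class and gender together, as suggested above, can be sketched with `pivot_table`. The sample rows below are made up; on the lab data the same call would be applied to `titanic_df`.

```python
import pandas as pd

# Hypothetical sample data; not real passenger records.
sample_df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex":      ["female", "male", "female", "male", "female", "male", "male", "female"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0],
})

# One row per class, one column per sex; each cell is the fraction that survived.
survival_table = sample_df.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)
print(survival_table)
```

Each cell combines two of the attributes that the lab examines separately, which is a first step toward the deeper analysis mentioned above.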

The Cloud Content team comprises subject matter experts hyper focused on services offered by the leading cloud vendors (AWS, GCP, and Azure), as well as cloud-related technologies such as Linux and DevOps. The team is thrilled to share their knowledge to help you build modern tech solutions from the ground up, secure and optimize your environments, and so much more!
