Data Science on the Web with Streamlit

The open source Python library Streamlit is regarded as the fastest way to build data apps for the web to communicate data insights to non-technical audiences.

By Kimaru Thagana

Sep 21, 2020 • 4 Minute Read

Introduction

Data science uses scientific methods to extract information and insights from data and present them to relevant stakeholders. Custom data science tools are often developed as programs or scripts in languages such as Python. Data scientists may find it hard to share those scripts and programs, especially if the audience is non-technical. They may also need a web-based resource to reach a wider audience.

The need to do data science on the web through a simple, minimal interface gave rise to the open source Python library Streamlit. Streamlit is widely regarded as one of the fastest ways to build data apps for the web, and it is used by data scientists, data analysts, machine learning engineers, and business intelligence developers.

This guide will explore the library via a sample app.

Consider a scenario where you are the data scientist at a startup. You have been selected to design and develop an interactive tool for your company's client, a winery that has collected data on the chemical components of its wines. The client wants an interface where they can interact with the dataset, build simple visualizations on demand, and filter columns from the data.

The client is not available for a physical meeting but open to a remote one. They wish to have an interactive interface where they can make simple selections. For this task, you choose to use Streamlit to host the interface for sharing with the client.

This guide assumes you have at least intermediate knowledge of Python and some experience in data science.

Setup

To get started with Streamlit, install it with pip and then launch its bundled demo app by running the commands below in your terminal.

    pip install streamlit
    streamlit hello
    

If the demo app opens and runs in your browser, the installation was successful. Create a new Python file and name it app.py.

Sample App

Import the required libraries, set up the page title and header, and load the dataset.

    import streamlit as st
    import pandas as pd

    # Page title and header shown at the top of the app
    st.title("Winery Inc. Welcome")
    st.header("Data Visualization Board for Winery Inc Chemical Components")

    # Load the sample data described in the next section
    wine = pd.read_csv("wine_data.csv")
    

Sample Data

The data below is in CSV format and is an extract from the wine dataset. Dates are written day first (DD-MM-YYYY). Generate more values to build a sample dataset and save the file as wine_data.csv.

    date,pH,sulphur dioxide,acidity,color
    20-11-2019,0.025,2,4,white
    21-11-2019,0.015,6,3,red
    22-11-2019,0.147,2,7,red
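
One way to generate more values is a short standard-library script such as the sketch below. The function names `build_sample_rows` and `write_sample_csv`, the row count, and the value ranges are assumptions made for illustration. The ranges are only loosely based on the three rows in the extract, and the generated values are random placeholders rather than real measurements.

```python
import csv
import random
from datetime import date, timedelta


def build_sample_rows(n_rows=30, start=date(2019, 11, 20), seed=0):
    """Return n_rows of made-up readings shaped like the extract above."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same rows
    rows = []
    for offset in range(n_rows):
        day = start + timedelta(days=offset)
        rows.append({
            "date": day.strftime("%d-%m-%Y"),         # day-first, as in the extract
            "pH": round(rng.uniform(0.01, 0.15), 3),  # assumed range
            "sulphur dioxide": rng.randint(1, 8),     # assumed range
            "acidity": rng.randint(1, 8),             # assumed range
            "color": rng.choice(["white", "red"]),
        })
    return rows


def write_sample_csv(path="wine_data.csv", n_rows=30):
    """Write the generated rows to a CSV file with a header line."""
    rows = build_sample_rows(n_rows)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_sample_csv()` once from a Python shell or a small script creates wine_data.csv in the current directory, next to app.py.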
    

Visualize the dataframe and give the option of selecting which columns the user would like to see.

    selected_columns = st.multiselect('Select desired Columns', wine.columns.to_list(), default=['acidity','pH'])
    st.dataframe(wine[selected_columns])
    

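Display a line graph with the selected columns as input. Allowing selections lets the user compare more than one column within the visualization. The date column records the day the readings were taken, so it is set as the index and becomes the X axis of the chart. Because date is the index, it is removed from the list of plotted columns, which also keeps the chart working when date is not part of the selection. Only numeric columns should be selected for the chart, since a text column such as color cannot be drawn as a line.

    chart_columns = [c for c in selected_columns if c != 'date']
    st.line_chart(wine.set_index('date')[chart_columns])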
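
The chart preparation can also be made a little more defensive. The sketch below is one possible approach, not part of the original app: a hypothetical helper, `prepare_chart_data`, parses the day-first dates so the X axis sorts chronologically and keeps only the numeric columns from the selection. It uses pandas alone, so it can be tried outside Streamlit.

```python
import io

import pandas as pd


def prepare_chart_data(wine, selected_columns):
    """Index the data by parsed date and keep only numeric selected columns."""
    frame = wine.copy()
    # The sample file stores dates day first, for example 20-11-2019.
    frame["date"] = pd.to_datetime(frame["date"], format="%d-%m-%Y")
    frame = frame.sort_values("date").set_index("date")
    # date is now the index, so it is no longer among the columns.
    wanted = [c for c in selected_columns if c in frame.columns]
    # Drop text columns such as color, which cannot be drawn as lines.
    return frame[wanted].select_dtypes(include="number")


# Small in-memory copy of the extract, used here only to try the helper.
sample = pd.read_csv(io.StringIO(
    "date,pH,sulphur dioxide,acidity,color\n"
    "20-11-2019,0.025,2,4,white\n"
    "21-11-2019,0.015,6,3,red\n"
    "22-11-2019,0.147,2,7,red\n"
))
chart_data = prepare_chart_data(sample, ["acidity", "pH", "color", "date"])
```

In app.py, the returned frame could be passed straight to the chart, for example `st.line_chart(prepare_chart_data(wine, selected_columns))`.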
    

With the above setup, the interface allows the user to select columns to view in the dataset and in the graph.

Running the Project

To run your project, execute the following command in your terminal.

    streamlit run app.py

Streamlit serves the app locally and opens it in your default browser at localhost:8501, its default address and port.

Conclusion

Bringing data science and analytics to the web via Streamlit allows you to easily share your data science projects and research with colleagues and the general public. This skill is most useful in roles such as data analyst, data scientist, and business intelligence developer.

To further build on this guide, sign up for a free account on Heroku and follow this guide to upload your data science, machine learning, or data analytics project.

Kimaru T.

Kimaru is a firm believer in education as a tool for self-sufficiency. As a software development consultant living in Kenya, he mainly works to bring small and medium-sized businesses to the internet with custom solutions ranging from data processing to business digitization. Away from the field of coding and computer science, he participates as a mentor for young university students. In his free time, he prefers peace and quiet, away from screens but close to nature.