Data science involves using scientific methods to extract information and insights and present them to relevant stakeholders. Custom data science tools are often developed as programs or scripts in languages such as Python. Data scientists may find it hard to share these scripts and programs when communicating their insights, especially if the audience is non-technical. They may also need a web-based resource to reach a wider audience.
The need to do data science on the web through a simple, minimal interface is what gave rise to the open source Python library Streamlit. Streamlit is promoted as one of the fastest ways to build data apps for the web, and it is aimed at data scientists, data analysts, machine learning engineers, and business intelligence developers.
This guide will explore the library via a sample app.
Consider a scenario where you are the data scientist at a startup. You have been selected to design and develop an interactive tool for your company's client, a winery that has collected data on the chemical components of its wines. The client wants an interface where they can interact with the dataset, build simple visualizations on demand, and filter columns from the data.
The client is not available for a physical meeting but is open to a remote one, so the interface must be something they can use on their own by making simple selections. For this task, you choose Streamlit to build and host the interface you will share with the client.
This guide assumes you have at least intermediate knowledge of Python and some experience in data science.
To get started with Streamlit, install it with pip and then launch its built-in demo app by running the commands below in your terminal.

    pip install streamlit
    streamlit hello

If Streamlit opens and runs in the browser, the installation was successful.
Create a new Python file for the app. The run command later in this guide refers to it as app.py.
Import the required libraries and set up the page title and welcome text.

    import streamlit as st
    import pandas as pd

    st.title("Winery Inc. Welcome")
    st.header("Data Visualization Board for Winery Inc Chemical Components")
    wine = pd.read_csv("wine_data.csv")

The data below is in CSV format and is an extract from the wine dataset.
Generate more values to build a sample dataset and save the file under the name the script reads, wine_data.csv.

    date,pH,sulphur dioxide,acidity,color
    20-11-2019,0.025,2,4,white
    21-11-2019,0.015,6,3,red
    22-11-2019,0.147,2,7,red

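One way to generate more values is sketched below using only the Python standard library. The helper names (`build_rows`, `to_csv_text`) are hypothetical, and the value ranges are assumptions taken loosely from the three sample rows; they are not real measurements.

```python
import csv
import io
import random
from datetime import date, timedelta

HEADER = ["date", "pH", "sulphur dioxide", "acidity", "color"]


def build_rows(n_days, start=date(2019, 11, 20), seed=0):
    """Return n_days synthetic readings, one per consecutive day."""
    rng = random.Random(seed)  # seeded so the sample file is reproducible
    rows = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        rows.append([
            day.strftime("%d-%m-%Y"),            # same day-month-year format as the extract
            round(rng.uniform(0.01, 0.15), 3),   # assumed range, based on the sample rows
            rng.randint(2, 6),                   # assumed range for sulphur dioxide
            rng.randint(3, 7),                   # assumed range for acidity
            rng.choice(["white", "red"]),
        ])
    return rows


def to_csv_text(rows):
    """Serialize the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows(10)
csv_text = to_csv_text(rows)
print(csv_text)
```

To create the file the app reads, write `csv_text` to wine_data.csv, for example with `open("wine_data.csv", "w").write(csv_text)`.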
Display the dataframe and let the user select which columns they would like to see.

    selected_columns = st.multiselect('Select desired Columns', wine.columns.to_list(), default=['acidity','pH'])
    st.dataframe(wine[selected_columns])

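The `default` list passed to `st.multiselect` has to be drawn from the options, and Streamlit rejects a default value that is not among them. If the client's file might lack one of the preferred columns, a small guard can filter the defaults first. This is a minimal sketch in plain Python; `safe_defaults` is a hypothetical helper, not part of Streamlit, and the `alcohol` column is invented purely to show a missing name.

```python
def safe_defaults(options, preferred):
    """Return the preferred names that exist in options, keeping preferred order."""
    available = set(options)
    return [name for name in preferred if name in available]


columns = ["date", "pH", "sulphur dioxide", "acidity", "color"]

both_present = safe_defaults(columns, ["acidity", "pH"])
one_missing = safe_defaults(columns, ["acidity", "alcohol"])  # "alcohol" is not in the data

print(both_present)
print(one_missing)
```

In the app, the result could be passed as `default=safe_defaults(wine.columns.to_list(), ['acidity', 'pH'])`.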
Display a line graph that takes the selected columns as input. Allowing selections lets the user compare more than one column within the same visualization. The date column, which records the day each reading was taken, is set as the index and therefore becomes the X axis of the chart.
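The charting step described above might look like the following sketch. It assumes the dates use the day-month-year format of the sample extract, and `prepare_chart_data` is a hypothetical helper name. Dropping non-numeric columns such as color is an added design choice, since a text column does not plot meaningfully on a line chart. The sample rows are inlined so the snippet runs without the CSV file.

```python
import io

import pandas as pd

SAMPLE_CSV = """date,pH,sulphur dioxide,acidity,color
20-11-2019,0.025,2,4,white
21-11-2019,0.015,6,3,red
22-11-2019,0.147,2,7,red
"""


def prepare_chart_data(df, selected_columns):
    """Index the readings by date and keep only the selected numeric columns."""
    indexed = df.copy()
    # Parse day-month-year strings so the X axis is ordered chronologically.
    indexed["date"] = pd.to_datetime(indexed["date"], format="%d-%m-%Y")
    indexed = indexed.set_index("date").sort_index()
    numeric = indexed.select_dtypes("number").columns
    keep = [col for col in selected_columns if col in numeric]
    return indexed[keep]


wine = pd.read_csv(io.StringIO(SAMPLE_CSV))
chart_data = prepare_chart_data(wine, ["acidity", "pH", "color"])
print(chart_data)
```

In app.py, the prepared frame can then be handed to Streamlit's line chart with `st.line_chart(prepare_chart_data(wine, selected_columns))`.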
With the above setup, the interface allows the user to select columns to view in the dataset and in the graph.
To run your project, execute the following command in your terminal.

    streamlit run app.py

The project will open in your default browser at the local address and port localhost:8501.
Bringing data science and analytics to the web via Streamlit allows you to easily share your data science projects and research with colleagues and the general public. This skill is best applied in job positions such as data analyst, data scientist and business intelligence developer.