Doing Data Science with Python
This course shows you how to work on an end-to-end data science project in a standardized way using various Python libraries, including processing data, building and evaluating a machine learning model, and exposing the model as an API.
What you'll learn
Do you want to become a data scientist? If so, this course will equip you with the concepts and tools to get up to speed, and you can apply the skills you acquire here to any data science project in a standardized way.
This course, Doing Data Science with Python, takes a pragmatic approach to the end-to-end data science project cycle. You'll learn everything from extracting data from different types of sources to exposing your machine learning model as API endpoints that can be consumed in a real-world data solution. The course will help you not only understand various data science concepts but also implement them in an industry-standard way using Python and related libraries.
Table of contents
- Introduction 0m
- Overview 1m
- Python Distributions for Data Science 2m
- Python 3.x vs. Python 2.x 1m
- Demo: Installing Anaconda Distribution 4m
- Jupyter Notebook 2m
- Demo: Setting up Jupyter Notebook on Local Machine 2m
- Demo: Jupyter Notebook - Basics 7m
- Demo: Jupyter Notebook - Magic Functions 8m
- Data Science Project Template 3m
- Demo: Setting up Cookiecutter Data Science Project Template 5m
- Versioning for Data Science Projects 2m
- Demo: Add Project to Git 2m
- Summary 1m
- Introduction 1m
- Overview 1m
- Extracting Data from Databases 1m
- Demo: Extracting Data from Databases 8m
- Extracting Data Through APIs 2m
- Demo: Extracting Data Through APIs 4m
- Extracting Data Using Web Scraping 2m
- Demo: Web Scraping Using Requests and BeautifulSoup 7m
- Demo: Getting Titanic Dataset Using Requests : Part 1 - Initial Preparation 3m
- Demo: Getting Titanic Dataset Using Requests : Part 2 - Downloading Data 8m
- Demo: Creating Reproducible Script for Getting Titanic Data 5m
- Public Datasets 2m
- Committing Changes to Git 1m
- Summary 1m
- Introduction 5m
- Overview 1m
- Introduction to NumPy and Pandas 2m
- EDA: Basic Structure 1m
- Demo: Investigating Basic Structure 10m
- Demo: Selection, Indexing, and Filtering 6m
- EDA: Summary Statistics 1m
- Centrality Measure 0m
- Centrality Measure: Mean 2m
- Centrality Measure: Median 2m
- Spread Measure 1m
- Spread Measure: Range 2m
- Spread Measure: Percentiles and Boxplot 3m
- Spread Measure: Variance and Standard Deviation 2m
- Demo: Getting Summary Statistics for Numerical Features 5m
- Counts and Proportions 1m
- Demo: Summary Statistics for Categorical Feature 5m
- Summary 1m
- Introduction 1m
- Overview 1m
- EDA: Distributions 1m
- Univariate Distribution: Histogram and KDE Plot 5m
- Demo: Creating Univariate Distribution Plots 2m
- Bivariate Distribution: Scatter Plot 1m
- Demo: Creating Scatter Plots 4m
- EDA: Grouping 2m
- Demo: Grouping and Aggregation 5m
- Crosstab 1m
- Demo: Crosstab 2m
- Pivot Table 2m
- Demo: Pivot Table 3m
- Summary 1m
- Introduction 2m
- Overview 1m
- Data Munging 2m
- Missing Value: Issues and Solution 3m
- Missing Value Imputation Techniques 3m
- Demo: Treating Missing Values Using Pandas - Part 1 7m
- Demo: Treating Missing Values Using Pandas - Part 2 2m
- Demo: Treating Missing Values Using Pandas - Part 3 10m
- Outliers: Detection and Treatment 4m
- Demo: Detecting and Treating Outliers Using Pandas and NumPy 7m
- Feature Engineering 2m
- Demo: Feature Creation Using Pandas and NumPy – Part 1 2m
- Demo: Feature Creation Using Pandas and NumPy – Part 2 3m
- Demo: Feature Creation Using Pandas and NumPy – Part 3 1m
- Demo: Feature Creation Using Pandas and NumPy – Part 4 4m
- Categorical Feature Encoding 1m
- Categorical Feature Encoding: Binary Encoding 1m
- Categorical Feature Encoding: Label Encoding 2m
- Categorical Feature Encoding: One-hot Encoding 2m
- Demo: Categorical Feature Encoding Using Pandas 3m
- Demo: Drop and Reorder Columns Using Pandas 2m
- Demo: Save Dataframe to File Using Pandas 3m
- Demo: Reproducible Script for Data Processing Using Pandas and NumPy 7m
- Demo: Creating Visualization Using Matplotlib 7m
- Demo: Committing Changes to Git 1m
- Summary 1m
- Introduction 3m
- Overview 2m
- Machine Learning Basics 1m
- Machine Learning Basics: Representation and Generalization 2m
- Machine Learning Basics: Spam Classification 3m
- Machine Learning Basics: Supervised Learning 3m
- Machine Learning Basics: Unsupervised Learning 2m
- Titanic Disaster Data Challenge 2m
- Classifier 4m
- Performance Metrics 1m
- Performance Metrics: Accuracy 1m
- Performance Metrics: Precision and Recall 3m
- Classifier Evaluation 3m
- Baseline Model 2m
- Demo: Preparing Data for Machine Learning Model 6m
- Demo: Building and Evaluating Baseline Model 4m
- Demo: Making the First Kaggle Submission 5m
- Linear Regression Model 3m
- Logistic Regression Model 5m
- Demo: Building Logistic Regression Using Scikit-Learn 3m
- Demo: Making Second Kaggle Submission 2m
- Summary 1m
- Introduction 2m
- Overview 2m
- Underfitting vs. Overfitting 3m
- Regularization 4m
- Hyperparameter Optimization: GridSearch 2m
- Cross-validation 2m
- K-Fold Cross-validation 1m
- Demo: Hyperparameter Optimization Using GridSearchCV 3m
- Demo: Making Third Kaggle Submission 1m
- Feature Normalization and Standardization 2m
- Demo: Feature Normalization and Standardization Using Scikit-Learn 4m
- Model Persistence 1m
- Demo: Model Persistence Using Pickle 4m
- Machine Learning API Development 2m
- Demo: Hello World API Using Flask 5m
- Demo: Machine Learning API Using Flask 7m
- Demo: Committing Changes to Git 1m
- Summary 2m
- Where to Go from Here? 2m
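As a rough preview of three topics listed above (Baseline Model, Performance Metrics, and Model Persistence Using Pickle), here is a minimal sketch using only the Python standard library. It is not taken from the course: the labels are made-up toy data, the helper names (`fit_baseline`, `predict`, `accuracy`, `precision_recall`) are hypothetical, and the course demos most likely use Scikit-Learn rather than hand-written functions.

```python
import pickle
from collections import Counter

# Toy survival labels (1 = survived, 0 = did not survive).
# Assumption: invented for illustration, not the real Titanic data.
train_labels = [0, 0, 1, 0, 1, 0, 0, 1]
test_labels = [0, 1, 0, 0, 1]


def fit_baseline(labels):
    """Return a 'model' that remembers only the most frequent class."""
    majority_class, _ = Counter(labels).most_common(1)[0]
    return {"majority_class": majority_class}


def predict(model, n_rows):
    """Predict the stored majority class for every row."""
    return [model["majority_class"]] * n_rows


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def precision_recall(actual, predicted, positive=1):
    """Precision and recall for the positive class (0.0 when undefined)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


model = fit_baseline(train_labels)
predictions = predict(model, len(test_labels))
baseline_accuracy = accuracy(test_labels, predictions)
baseline_precision, baseline_recall = precision_recall(test_labels, predictions)

# Model persistence: serialize the model to bytes and load it back.
restored = pickle.loads(pickle.dumps(model))

print(baseline_accuracy, baseline_precision, baseline_recall, restored)
```

On this toy data the baseline always predicts class 0, so it reaches an accuracy of 0.6 while its recall for the positive class is 0.0. That gap is one reason to look at precision and recall alongside accuracy when comparing a real classifier against a baseline.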
Course FAQ
Yes! Python's robust libraries are ideal for manipulating data and it is a relatively easy language to learn for data analyst beginners!
Python and R are both great programming languages geared toward data science. However, Python is often easier for beginners, and it is a more general-purpose language with easy-to-read syntax. Python is better for raw data scraping, while R is more useful for analyzing already scrubbed data.
Yes. We will go over various widely used Python libraries such as NumPy, Scikit-Learn, Pandas, Pickle, Matplotlib, and Flask to help with extracting, cleaning, and processing data, and with building machine learning models.
Simply put, it is a combination of statistical and machine learning techniques through the use of Python programming to help analyze and interpret data.
Some previous exposure to Python or its libraries may come in handy, but is not required. Just come with an interest in data science.
Data science is a super popular field these days. Through data science we can find meaningful and valuable insights, and provide data-driven evidence to help organizations be more efficient and successful.