Author avatar

Chhaya Wagmi

Getting Started with BigML

Chhaya Wagmi

  • Oct 16, 2020
  • 7 Min read
  • 47 Views
  • Oct 16, 2020
  • 7 Min read
  • 47 Views
Data
Data Analytics
Machine Learning

Introduction

BigML is a machine learning software hosted on BigML.com. You can use this platform to perform basic data preprocessing and visualization, and build machine learning models even without coding experience. Both supervised models (linear regression, logistic regression, deepnets, etc.) and unsupervised models (clusters, anomalies, associations, etc.) can be trained and tested on this platform.

For beginners, BigML provides flexibility to build models on a dataset with a maximum size of 10KB free of cost. This guide will demonstrate how to create an end-to-end project on time series data forecasting to give you a clear idea of how you can initiate your own BigML project. The major steps involved in this project are:

  • Importing the time series dataset
  • Data preprocessing
  • Building a machine learning model
  • Applying the model on the test dataset

Importing the Time Series Dataset

Export the AirPassengers dataset from R or download it from Kaggle as a CSV file. This dataset has two columns, Month and #Passengers . None of the features have any missing or invalid value. The data format of the Month column is YYYY-MM. Here's a sneak peek of the data:

Month#Passengers
1949-01112
1949-02118
1949-03132
1949-04129
1949-05121

To upload the dataset in your BigML dashboard, start by creating a new project and name it Time Series Forecast.

Creating a new project

As you can observe in the above image, there are currently no available sources in your project. To bring the dataset inside Sources, click on the extreme right icon and upload the Airpassengers.csv file. This will open a green bar pop-up.

Data uploaded

The dataset has a size of only 1.7KB and therefore you can proceed ahead for data preprocessing.

Data Preprocessing

Click on the dataset and you should see the following screen, which consists of four features:

First view of the dataset

You only had two features at the start but now BigML has broken the first feature, Month, into Month.year and Month.month. To keep only the first two features, click on the Configure source button and change the data type of Month from Datetime to Numeric. Once the data type is set, click Update.

Changing the data type of Month

Have you noticed that you are still working in Sources? To create a dataset of the given two features, click on 1-CLICK DATASET button.

Creating a BigML dataset

This will migrate you from Sources to Datasets tab. Next, you need to split the complete dataset into two separate datasets, one for training the model and another for testing the model. Since you are currently working on a time series dataset, you should only perform a linear split (strict no to random split). To do so, click on the LINEAR SPLIT button, which will initiate the splitting process. After the split, the training data receives 80% of the total observations (115 out of 144).

Splitting data linearly

Building a Machine Learning Model

If you have ever worked on forecasting a time series data in R then you must be familiar with the auto.arima() function, which presents the best fit model out of various models. BigML also creates multiple forecasting models on a dataset and initially chooses the one that has the best metrics. You can also compare the best fit model with the rest of the models based on metrics like AIC, AICc, BIC, and R squared. Apart from presenting the best fit model, you can also visualize the decomposed model with its level, trend, and seasonality.

To start building the forecasting models on your dataset, click the TIME SERIES button.

Building forecasting models

This will present you with a dashboard with all the information about the best model, other models, and the decomposed model.

Best Model Best model

Comparison of Models Comparison of Models

Decomposed Model Decomposed model

Applying the Model on the Test Dataset

So far you have built the model on your training dataset. Now, you need to apply this model on the remaining 20% of data that is inside the testing dataset. To do so, click on the EVALUATE button, select the testing dataset, and then click the green Evaluate button.

Evaluating the model on the testing dataset

After the evaluation, you will receive a dashboard with the original training, testing, and forecasting data. Given below is the forecast for the first three best models. The testing data is differentiated in the highlighted area.

Forecasting on the testing dataset

You can forecast the original data based on any of the available models and, if needed, export the chart in a PNG image with or without legends.

Conclusion

BigML has an efficient, state-of-the-art infrastructure to support data processing, visualizations, and building supervised/unsupervised machine learning models. Currently, it has nine built-in datasets and even gives you the flexibility to upload your own datasets (maximum size 10KB) for your practice, free of cost. This guide presented an end-to-end project (importing/configuring a dataset and building/evaluating models) to get you started with BigML. The Pluralsight course Leveraging Online Resources for Python Analytics also covers BigML for building and sharing analytics. For a more detailed study, you can always refer to BigML documentation.

0