Machine learning and data science are quickly becoming essential practices in organizations that have raw data and want to derive insights from it.
With large volumes of data, efficiency becomes paramount: teams need to test solutions and hypotheses as fast as possible and push the best viable solution to production. Turi was built with this in mind. The motivation behind Turi is to create powerful machine learning and data science tools that allow quick progression from ideas to production.
This guide assumes that you have at least intermediate-level knowledge of Python and some background in machine learning and data science.
Assume you are a machine learning developer at a young and budding startup that seeks to employ efficient data and ML models in the computer vision and retail space. The iOS ML team in the startup is currently considering a tool that can allow rapid prototyping and still deliver results efficiently. Your team decides to give Turi a try in both image classification and predictive recommendation.
To set up and use Turi, you need either Turi Create, which is open source, or graphlab, which is offered under an academic license. Both are Python libraries used to build high-performing, large-scale data and machine learning applications, and both rely on a C++ back end for their speed and efficiency. Both are supported on Windows, macOS, and Linux-based operating systems (Turi Create supports Windows through the Windows Subsystem for Linux).
To install the open-source version, run the following command in your terminal:
pip install -U turicreate
In this example, you will use the popular Kaggle flowers dataset.
Download and copy the dataset to your working directory.
Import the turicreate library and give it an alias for easy reference:
import turicreate as tc
Since the flowers dataset is in the same directory, load the images using the image_analysis.load_images function:
data = tc.image_analysis.load_images('flowers', with_path=True)
Next, you will create the label column, which is the name of each subfolder.
import os
data['flower_name'] = data['path'].apply(lambda path: os.path.basename(os.path.dirname(path)))
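To see what the label expression above computes, here is a minimal sketch that applies the same logic to a few sample paths in plain Python. The file names are hypothetical and assume the dataset is laid out as flowers/<label>/<image>; the helper name label_from_path is likewise only for illustration.

```python
import os


def label_from_path(path):
    """Return the name of the folder that directly contains the file."""
    return os.path.basename(os.path.dirname(path))


# Hypothetical paths, assuming a flowers/<label>/<image> layout.
sample_paths = [
    "flowers/daisy/img_001.jpg",
    "flowers/rose/img_002.jpg",
    "flowers/tulip/img_003.jpg",
]

labels = [label_from_path(p) for p in sample_paths]
print(labels)
```

Because the label is taken from the parent folder, the approach only works if every image sits directly inside a folder named after its class.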
The loaded data is then saved to disk as an SFrame:
data.save('flowers.sframe')
Consider an SFrame a version of a pandas DataFrame that can hold far more data, since it is backed by disk space rather than being limited to memory.
To perform image classification, you will need to load the data, train a classification model, and finally, save and export the model.
The previously created flowers.sframe will be loaded into an SFrame object to allow manipulation and classification.
Load the data:
data = tc.SFrame("flowers.sframe")
Once loaded, split the data into training and testing sets in whichever ratio you see fit. In this example, 0.8 is used, so roughly 80% of the rows go to the training set.
train_data, test_data = data.random_split(0.8)
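Conceptually, a random split assigns each row to one of two sets at random. The sketch below illustrates the idea in plain Python, assuming each row lands in the first set with the given probability. The helper name random_split_rows and its seed parameter are hypothetical, and this is not Turi Create's implementation.

```python
import random


def random_split_rows(rows, fraction, seed=None):
    """Send each row to the first list with probability `fraction`."""
    rng = random.Random(seed)  # a private generator keeps the split reproducible
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


rows = list(range(1000))  # stand-in for the rows of a dataset
train_rows, test_rows = random_split_rows(rows, 0.8, seed=42)
print(len(train_rows), len(test_rows))
```

With a split of this kind the two sets are only approximately 80% and 20% of the data, and fixing the seed makes the split repeatable between runs.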
With Turi Create, it's easy to create a model with just one line of code. Pass the training set and the label you wish to predict. In this case, the label is flower_name:
model = tc.image_classifier.create(train_data, target='flower_name')
To make predictions on the test set, call the predict method:
predictions = model.predict(test_data)
At this point, the model is complete. For quality purposes, it is a good idea to evaluate the model and examine the accuracy.
metrics = model.evaluate(test_data)
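Accuracy is the fraction of test examples whose predicted label matches the true label. As a minimal illustration in plain Python, the sketch below computes it for a handful of made-up labels rather than real model output; the function name accuracy is an assumption for this example. In Turi Create, the corresponding value is typically read from the returned metrics as metrics['accuracy'].

```python
def accuracy(true_labels, predicted_labels):
    """Return the share of positions where the two label lists agree."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both sequences must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for illustration only.
true_labels = ["daisy", "rose", "tulip", "rose"]
predicted_labels = ["daisy", "rose", "rose", "rose"]

print(accuracy(true_labels, predicted_labels))  # 3 of 4 correct -> 0.75
```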
If you wish to view all the other evaluation metrics, print the full metrics object.
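Save the model with model.save, passing a path of your choice.
Export the model in Core ML format with model.export_coreml, passing a file name that ends in .mlmodel.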
With the exported .mlmodel file, you can now add image classification capability to your Apple app. This model can be used in an iPhone app that classifies flowers in real time or in pictures.
For this example, download the MovieLens dataset and copy it into your working directory.
Load the data into an SFrame object and print it:
actions = tc.SFrame.read_csv('./dataset/ml-20m/ratings.csv')
print(actions)
Split the data into training and validation sets and create a recommendation model:
training_data, validation_data = tc.recommender.util.random_split_by_user(actions, 'userId', 'movieId')
model = tc.recommender.create(training_data, 'userId', 'movieId')
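The idea behind a per-user split is to hold out some of each user's ratings for validation while keeping the rest for training, so the model is evaluated on users it has already seen. The sketch below illustrates that idea in plain Python. The function name split_by_user, the 20% holdout proportion, and the sample rows are assumptions for illustration, not Turi Create's implementation.

```python
import random
from collections import defaultdict


def split_by_user(rows, user_key, holdout_fraction=0.2, seed=0):
    """Hold out a fraction of every user's rows for validation."""
    rng = random.Random(seed)

    # Group the interaction rows by user.
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[user_key]].append(row)

    train, validation = [], []
    for user_rows in by_user.values():
        shuffled = user_rows[:]  # copy so the input is left untouched
        rng.shuffle(shuffled)
        n_holdout = int(len(shuffled) * holdout_fraction)
        validation.extend(shuffled[:n_holdout])
        train.extend(shuffled[n_holdout:])
    return train, validation


# Made-up ratings: 3 users with 10 movies each.
ratings = [
    {"userId": user, "movieId": movie, "rating": 3.0}
    for user in (1, 2, 3)
    for movie in range(10)
]

user_train, user_validation = split_by_user(ratings, "userId")
print(len(user_train), len(user_validation))
```

Splitting per user rather than per row keeps every user represented in the training set, which matters for models that cannot make personalized recommendations for users they have never seen.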
At this point, all that is required is a call to the recommend method:
results = model.recommend()

print(results)
In the modern age of data, skills in machine learning and data science are not only vital but also very marketable. These skills are much sought after for roles such as machine learning engineer, data scientist, chief data or information officer, business intelligence developer, and data analyst, among others.
To build on the skills learned by developing machine learning tools with Turi, the next step is to learn how to deploy solutions in production, either on the cloud or on devices.