SageMaker Studio Lab: How to experiment with ML for free
Looking for an easy way to experiment with machine learning on AWS? Amazon SageMaker Studio Lab is an awesome (and free) way to get started with ML.
Jun 08, 2023 • 7 Minute Read
In this post, we’ll talk about how you can get started with machine learning for free using Amazon SageMaker Studio Lab.
Coming out of AWS re:Invent 2021, we saw a push from AWS to make machine learning accessible and inclusive. AWS announced Amazon SageMaker Studio Lab (currently in preview), an extension of Amazon SageMaker.
SageMaker Studio Lab is a free machine learning service that allows you to spin up Jupyter notebooks quickly and requires no complex configurations to get started. The target audience of this service is developers, students, and data scientists wanting to learn and experiment with machine learning. I like that SageMaker Studio Lab accounts are separate from AWS accounts and only require an email to create — no credit card needed!
If you’re new to machine learning, this free service is a fantastic way to get started.
Requesting a lab account
To request a free SageMaker Studio Lab account, you’ll need to enter a valid email address. There is a waiting list for users to experience the service, which is still in preview. Once you have access, you’re able to start the runtime for your project and open a JupyterLab-based notebook.
Components of a lab account
SageMaker Studio Lab gives you a single project with a minimum of 15 GB of persistent storage, CPU (t3.xlarge) and GPU (g4dn.xlarge) runtimes, enterprise security, and a JupyterLab-based user interface. When I selected the compute type and started my session, it was timeboxed to 12 hours.
CPU works best for general-purpose computing tasks, while GPU is best suited for machine learning algorithms specifically optimized to run on GPUs. Upon further research, I learned CPU runtimes are limited to 12 hours while GPU runtimes are limited to 4 hours. Once the time remaining reaches zero, you’ll have to restart the project runtime, but all your files (including notebooks) are saved to persistent storage.
Training a machine learning model
The runtime only takes a few minutes to start, and upon opening your project, you’ll see a getting started screen with helpful information. If you’re new to machine learning, I recommend reading the getting started page in full.
As I explored the interface, I was happy to see an integration with Git that allowed me to access a local repository, initialize a repository, or clone an existing repository.
I created an empty notebook instead of cloning existing code for this example. I start by importing the libraries I need:
- Pandas is a data analysis and manipulation library for the Python programming language.
- Scikit-learn is a machine learning library for the Python programming language. The library is easy to use and robust as it features various classification, regression, and clustering learning algorithms.
- LogisticRegression is a class from Scikit-learn that allows you to apply logistic regression to a binary classification problem, resulting in a Yes (i.e., 1) or No (i.e., 0) prediction.
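The libraries described above can be brought into a notebook cell roughly as follows. This is a minimal sketch, assuming pandas and scikit-learn are installed in the runtime; importing train_test_split here is an assumption based on the split step later in the walkthrough.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Confirm the libraries are importable in the current runtime.
print("pandas version:", pd.__version__)
print("estimator:", LogisticRegression.__name__)
print("splitter:", train_test_split.__name__)
```

If an import fails, the package likely needs to be installed into the notebook's environment first.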
Now, I’m ready to start writing Python code to train a custom machine learning model. There are several options for loading the data you need to train your model. I uploaded preexisting crime data in CSV format from my local machine for this simple example.
The crime data (crime-data-cleaned2.csv) consists of cleaned and preprocessed stop and search crime data at the street level from the United Kingdom. Each row in the dataset represents a stop and search record, and within each record, a column (i.e., CommittedCrime) identifies whether or not that stop led to an arrest. This dataset was preprocessed and put in a format in which a machine can easily find trends and patterns.
Next, I define the column headers since they aren’t provided in the CSV file. Then, I read the CSV file into a Pandas DataFrame using read_csv().
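Loading a headerless CSV might look like the sketch below. Only the CommittedCrime column is named in this post, so the other column names are hypothetical, and a few made-up rows in memory stand in for crime-data-cleaned2.csv so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical column headers: only CommittedCrime is named in the article.
column_names = ["AgeRange", "Gender", "ObjectOfSearch", "CommittedCrime"]

# Stand-in for crime-data-cleaned2.csv: made-up, already-encoded rows
# with no header line, so this runs without the real file.
sample_csv = io.StringIO(
    "1,0,2,1\n"
    "2,1,0,0\n"
    "0,1,1,1\n"
    "3,0,2,0\n"
)

# header=None tells pandas the first line is data; names supplies the headers.
# With the real file, pass its path in place of sample_csv.
df = pd.read_csv(sample_csv, header=None, names=column_names)
print(df.head())
```

Passing header=None matters here: without it, pandas would treat the first data row as the header and silently drop a record.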
Next, I check the class distributions on the CommittedCrime column, which indicates whether or not a stop led to an arrest, to ensure my dataset is balanced. At 49.79% and 50.21%, my dataset is well balanced.
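One way to check the balance is value_counts() with normalize=True, which reports each class as a share of all rows. The sketch below uses ten made-up labels rather than the real dataset, so its percentages are illustrative only.

```python
import pandas as pd

# Made-up labels standing in for the CommittedCrime column
# (1 = stop led to an arrest, 0 = it did not).
df = pd.DataFrame({"CommittedCrime": [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]})

# normalize=True returns each class's share of rows instead of raw counts.
class_share = df["CommittedCrime"].value_counts(normalize=True) * 100
print(class_share.round(2))
```

A roughly even split means plain accuracy is a reasonable first metric; a heavily skewed split would call for resampling or a different metric.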
Next, I split the data into 70% for training and 30% for testing using the train_test_split() function from Scikit-learn.
I verify the split is successful.
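A 70/30 split and a quick check of the resulting sizes can be sketched as follows. The feature columns and the random data are made up for illustration, and random_state=42 is an arbitrary choice to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 100 rows, two hypothetical features plus the label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "FeatureA": rng.integers(0, 5, size=100),
    "FeatureB": rng.integers(0, 3, size=100),
    "CommittedCrime": rng.integers(0, 2, size=100),
})

X = df.drop(columns=["CommittedCrime"])  # inputs
y = df["CommittedCrime"]                 # label to predict

# test_size=0.3 holds out 30% of rows; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Verify the split by comparing row counts.
print("train rows:", len(X_train))
print("test rows:", len(X_test))
```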
Then, I create the logistic regression object and train the model using fit().
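Creating and fitting the estimator takes only a couple of lines. This sketch trains on made-up numeric data, not the crime dataset, and uses scikit-learn's default hyperparameters, which may differ from the settings used for this post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: the label depends on the first feature,
# so the model has a pattern to learn.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] > 0).astype(int)

# Create the estimator with default settings, then train it.
model = LogisticRegression()
model.fit(X_train, y_train)

print("classes:", model.classes_)
print("coefficient shape:", model.coef_.shape)
```

After fit(), the learned weights live in model.coef_ and model.intercept_, and the model is ready for predict().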
Now that I have a model trained to predict crime, I can evaluate its performance and make it available to any application that needs to predict crime.
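A common first evaluation for a balanced binary classifier is accuracy on the held-out test set. The sketch below assumes that metric (the post does not say which one was used) and again runs on made-up data, so the score it prints says nothing about the crime model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Made-up data with a simple linear pattern, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

# Score the model on rows it did not see during training.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")

# Predict a single new (hypothetical) record: returns 1 (Yes) or 0 (No).
new_record = np.array([[0.5, 1.2]])
print("Prediction:", model.predict(new_record)[0])
```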
Overall, I’m very excited about the doors SageMaker Studio Lab opens. It’s entirely free, with no need to provide billing or credit card information, which is a huge win. Additionally, it’s a standalone service that doesn’t depend on having an AWS account. As someone who has worked with Amazon SageMaker before, I found it easy to train a model using SageMaker Studio Lab.
If you’ve been curious about machine learning, now is your chance to get started.
Kesha Williams is an award-winning technology leader. She’s also an AWS Machine Learning Hero, AWS Partner Ambassador, and Alexa Champion.