Douglas Starnes

Computer Vision with Amazon Rekognition

  • Jul 21, 2020
  • 7 Min read
  • 1,858 Views

Introduction

Is it possible for a computer to see? If a computer analyzes an image, will it be able to interpret it in the same way a human does? In computer vision, we attempt to accomplish exactly that. And it can be done, with remarkable accuracy. It is still a difficult problem, though, and the required knowledge and resources are not within everyone's reach. As machine learning is used more frequently, how do those without the knowledge and resources needed for computer vision keep up?

AWS Rekognition

Amazon Web Services offers a product called Rekognition (pronounced like "recognition"). Rekognition analyzes images to predict which objects they contain, detect faces, transcribe text, and perform other tasks. To use Rekognition, you need to know little, if anything, about computer vision or machine learning. You simply point Rekognition to a stored image file, tell it what computer vision task you want, pay Amazon a little bit of money (there is a free allowance for 12 months), and get the results back. Sound simple? It is!

The recommended way to access Rekognition and other AWS products is through a client library. Many different AWS client libraries support different languages. I'll be using Python and the boto3 package for this guide. Check the documentation for the other languages that are supported. I won't go into installing boto3 here, but it's not difficult.

Getting Started

Before using Rekognition, you need to have images stored in a location that is accessible by Rekognition, generally in the cloud. For this guide, I'll upload several images to S3.

```python
import boto3

s3 = boto3.resource('s3')

for image in ['faces.jpg', 'objects.jpg', 'text.jpg']:
    s3.Bucket('ps-guide-rekognition').upload_file(image, image)
```

The images are from Unsplash and can be found respectively at

Detecting Objects

To use Rekognition, first create a client.

```python
rek = boto3.client('rekognition', region_name='us-east-1')
```

Notice the use of client instead of resource. Create a dict representing the location of the image in S3.

```python
image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'objects.jpg'}}
```
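Every Rekognition call in this guide takes a dict of the same shape, so a small helper can build it. This is only a minimal sketch: `s3_image` is a hypothetical name chosen for illustration, not part of boto3.

```python
def s3_image(bucket, name):
    """Build the Image argument for a file stored in S3.

    s3_image is a hypothetical helper for illustration, not a boto3 function.
    """
    return {'S3Object': {'Bucket': bucket, 'Name': name}}


# Same bucket and object key used in this guide
image_dict = s3_image('ps-guide-rekognition', 'objects.jpg')
print(image_dict)
```

The result can be passed to the Image keyword argument in the same way as the hand-written dict.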

Pass the image to the detect_labels method in the Image keyword argument, along with MaxLabels, the maximum number of labels to return.

```python
labels = rek.detect_labels(Image=image_dict, MaxLabels=10)
```

The resulting dict has a Labels key with the detected labels. Each label has a Name and a Confidence.

```python
for label in labels['Labels']:
    print('{} - {}'.format(label['Name'], label['Confidence']))
```

And here are the results.

```
Furniture - 99.87967681884766
Table - 99.26332092285156
Wood - 99.20465087890625
Desk - 98.95134735107422
Person - 98.12129974365234
Flooring - 98.12129974365234
Hardwood - 97.50928497314453
Floor - 88.36457061767578
Electronics - 85.74508666992188
Interior Design - 83.3857192993164
```
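If only the high-confidence labels matter, the response can be filtered after the call. The sketch below works on a hand-built sample that copies three of the results above, so it runs without AWS credentials. `confident_labels` and the threshold of 90 are assumptions made for illustration.

```python
def confident_labels(response, threshold=90.0):
    """Return (name, confidence) pairs at or above the threshold.

    confident_labels is a hypothetical helper, not part of boto3.
    """
    return [
        (label['Name'], label['Confidence'])
        for label in response.get('Labels', [])
        if label['Confidence'] >= threshold
    ]


# Sample shaped like a detect_labels response, using values from the results above
sample_response = {'Labels': [
    {'Name': 'Furniture', 'Confidence': 99.87967681884766},
    {'Name': 'Floor', 'Confidence': 88.36457061767578},
    {'Name': 'Electronics', 'Confidence': 85.74508666992188},
]}

for name, confidence in confident_labels(sample_response):
    print('{} - {:.1f}'.format(name, confidence))
```

The detect_labels method also accepts a MinConfidence keyword argument, which applies a similar cutoff on the service side.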

And here is the image:

The bounding boxes of the objects detected are also returned in the dict. And Rekognition can also detect objects in video, not just images.
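Bounding boxes appear under the Instances key of a label, with Left, Top, Width, and Height expressed as ratios of the image dimensions. Here is a rough sketch of converting one to pixel coordinates. The sample label, the image size, and the `box_to_pixels` helper are all made up for illustration and do not come from the image in this guide.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a ratio-based BoundingBox to (left, top, right, bottom) pixels.

    box_to_pixels is a hypothetical helper, not part of boto3.
    """
    left = round(box['Left'] * image_width)
    top = round(box['Top'] * image_height)
    right = round((box['Left'] + box['Width']) * image_width)
    bottom = round((box['Top'] + box['Height']) * image_height)
    return left, top, right, bottom


# Made-up label entry shaped like one item of labels['Labels']
sample_label = {
    'Name': 'Person',
    'Instances': [
        {'BoundingBox': {'Width': 0.25, 'Height': 0.5, 'Left': 0.5, 'Top': 0.25},
         'Confidence': 98.1},
    ],
}

# Assume a 1000 x 800 pixel image for the conversion
for instance in sample_label['Instances']:
    print(box_to_pixels(instance['BoundingBox'], 1000, 800))
```

Labels that describe the whole scene rather than a specific object may come back with an empty Instances list, so the loop simply prints nothing for them.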

Detecting Faces

To detect faces, call the detect_faces method and pass a dict to the Image keyword argument, just as with detect_labels. The Attributes keyword argument is a list of the facial attributes to return, such as age range and gender. For this guide, I'll pass a single value, ALL, to get all of the attributes.

```python
image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'faces.jpg'}}

faces = rek.detect_faces(Image=image_dict, Attributes=['ALL'])
```

This is the image:

The faces are stored in the FaceDetails key. There is only one face in this image. The keys of the face are the attributes.

```python
face = faces['FaceDetails'][0]
for key in face.keys():
    print(key)
```

```
BoundingBox
AgeRange
Smile
Eyeglasses
Sunglasses
Gender
Beard
Mustache
EyesOpen
MouthOpen
Emotions
Landmarks
Pose
Quality
Confidence
```

In the attributes, we can see that Rekognition predicted the AgeRange as 16-28, the Gender as Female with an almost 100% confidence, and that the subject's eyes were open but her mouth was not. And it predicted the subject's most likely emotion as CALM.

```python
face['AgeRange']
```

```
{'Low': 16, 'High': 28}
```

```python
face['Gender']
```

```
{'Value': 'Female', 'Confidence': 99.82478332519531}
```

```python
face['EyesOpen']
```

```
{'Value': True, 'Confidence': 98.96699523925781}
```

```python
face['MouthOpen']
```

```
{'Value': False, 'Confidence': 94.9305191040039}
```

```python
face['Emotions']
```

```
[{'Type': 'CALM', 'Confidence': 93.3696060180664},
 {'Type': 'ANGRY', 'Confidence': 2.4889822006225586},
 {'Type': 'HAPPY', 'Confidence': 1.7609130144119263},
 {'Type': 'SAD', 'Confidence': 1.0307590961456299},
 {'Type': 'CONFUSED', 'Confidence': 0.5804279446601868},
 {'Type': 'SURPRISED', 'Confidence': 0.35635992884635925},
 {'Type': 'DISGUSTED', 'Confidence': 0.24075132608413696},
 {'Type': 'FEAR', 'Confidence': 0.17220167815685272}]
```
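Rather than relying on the order of the Emotions list, one way to find the most likely emotion is to take the entry with the highest Confidence. This is a minimal sketch on a shortened copy of the output above. `top_emotion` is a hypothetical helper name.

```python
def top_emotion(face_detail):
    """Return the emotion entry with the highest confidence.

    top_emotion is a hypothetical helper, not part of boto3.
    """
    return max(face_detail['Emotions'], key=lambda emotion: emotion['Confidence'])


# Shortened copy of the Emotions output shown above
sample_face = {'Emotions': [
    {'Type': 'ANGRY', 'Confidence': 2.4889822006225586},
    {'Type': 'CALM', 'Confidence': 93.3696060180664},
    {'Type': 'HAPPY', 'Confidence': 1.7609130144119263},
]}

best = top_emotion(sample_face)
print(best['Type'], round(best['Confidence'], 1))
```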

As with object detection, video is also supported.

Text Detection

To detect text, call the detect_text method with just the Image keyword argument.

```python
text = rek.detect_text(Image={'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'text.jpg'}})
```

Here is an image with some text:

The detected text is in the TextDetections key and each detection has a DetectedText key.

```python
for detection in text['TextDetections']:
    print(detection['DetectedText'])
```

```
DANGER
HARD HAT
PROTECTION
REQUIRED
DANGER
HARD
HAT
PROTECTION
REQUIRED
```

It might look as if the text were detected twice. But notice that in the image the words "hard" and "hat" are on the same line. The first four detections are lines of text. The remaining detections are for words.
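Each detection also carries a Type field set to either LINE or WORD, which makes it possible to separate the two groups. The sketch below uses a hand-built sample based on part of the output above, and `split_detections` is a hypothetical helper name.

```python
def split_detections(response):
    """Separate detected text into lines and words using the Type field.

    split_detections is a hypothetical helper, not part of boto3.
    """
    lines = [d['DetectedText'] for d in response['TextDetections'] if d['Type'] == 'LINE']
    words = [d['DetectedText'] for d in response['TextDetections'] if d['Type'] == 'WORD']
    return lines, words


# Hand-built sample shaped like a detect_text response
sample_text = {'TextDetections': [
    {'DetectedText': 'DANGER', 'Type': 'LINE'},
    {'DetectedText': 'HARD HAT', 'Type': 'LINE'},
    {'DetectedText': 'DANGER', 'Type': 'WORD'},
    {'DetectedText': 'HARD', 'Type': 'WORD'},
    {'DetectedText': 'HAT', 'Type': 'WORD'},
]}

lines, words = split_detections(sample_text)
print('Lines:', lines)
print('Words:', words)
```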

And as you might have guessed, video is also supported.

Conclusion

AWS Rekognition is a simple, quick, and cost-effective way to detect objects, faces, text, and more in both still images and videos. You don't need to know anything about computer vision or machine learning. All you need to know is how to use the API through one of the client libraries. This guide used Python, but there are client libraries for other popular languages as well. This frees you from devoting resources to reinventing the wheel that Rekognition has already built. Thanks for reading!