Is it possible for a computer to see? If a computer analyzes an image, will it be able to interpret it the way a human does? In computer vision, we attempt to accomplish exactly that, and it can be done with remarkable accuracy. But it is still a difficult problem, and the knowledge and resources it requires are not within everyone's reach. So as machine learning becomes more widely used, how do those without the knowledge and resources needed for computer vision keep up?
Amazon Web Services offers a product called Rekognition (pronounced like "recognition"). Rekognition analyzes images to detect objects, find faces, transcribe text, and perform other tasks. But to use Rekognition, you need to know little, if anything, about computer vision or machine learning. You simply point Rekognition at a stored image file, tell it which computer vision task you want, pay Amazon a little bit of money (there is a free allowance for 12 months), and get the results back. Sound simple? It is!
The recommended way to access Rekognition and other AWS products is through a client library. AWS provides client libraries for many languages; check the documentation for the full list. I'll be using Python and the boto3 package for this guide. I won't go into installing boto3 here, but it's not difficult.
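For reference, a minimal install (assuming a working Python environment with pip available) is a one-liner:

pip install boto3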
Before using Rekognition, you need to have images stored in a location that is accessible by Rekognition, generally in the cloud. For this guide, I'll upload several images to S3.
import boto3

s3 = boto3.resource('s3')

for image in ['faces.jpg', 'objects.jpg', 'text.jpg']:
    s3.Bucket('ps-guide-rekognition').upload_file(image, image)
The images are from Unsplash.
To use Rekognition, first create a client.
rek = boto3.client('rekognition', region_name='us-east-1')
Notice the use of client instead of resource; boto3 exposes Rekognition only through the lower-level client interface. Next, create a dict representing the location of the image in S3.
image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'objects.jpg'}}
Pass the image to the detect_labels method in the Image keyword argument, along with the maximum number of labels to detect.
labels = rek.detect_labels(Image=image_dict, MaxLabels=10)
The resulting dict has a Labels key containing the detected labels. Each label has a Name and a Confidence.
for label in labels['Labels']:
    print('{} - {}'.format(label['Name'], label['Confidence']))
And here are the results.
Furniture - 99.87967681884766
Table - 99.26332092285156
Wood - 99.20465087890625
Desk - 98.95134735107422
Person - 98.12129974365234
Flooring - 98.12129974365234
Hardwood - 97.50928497314453
Floor - 88.36457061767578
Electronics - 85.74508666992188
Interior Design - 83.3857192993164
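If low-confidence labels are just noise for your use case, detect_labels also accepts a MinConfidence keyword argument that filters the results server-side. A minimal sketch, reusing the client and image dict from above:

labels = rek.detect_labels(Image=image_dict, MaxLabels=10, MinConfidence=90)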
And here is the image:
The bounding boxes of the detected objects are also returned in the dict. And Rekognition can also detect objects in video, not just images.
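As a short sketch of reading those boxes: each label carries an Instances list (empty for abstract labels such as Interior Design), and each instance has a BoundingBox whose Left, Top, Width, and Height are expressed as fractions of the image dimensions.

for label in labels['Labels']:
    for instance in label.get('Instances', []):
        box = instance['BoundingBox']
        # Coordinates are fractions of the image width/height, not pixels.
        print('{}: left={:.2f}, top={:.2f}, width={:.2f}, height={:.2f}'.format(
            label['Name'], box['Left'], box['Top'], box['Width'], box['Height']))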
To detect a face, call the detect_faces method, passing a dict to the Image keyword argument just as with detect_labels. The Attributes keyword argument is a list of the facial features to detect, such as age and gender. For this guide, I'll pass a single value, ALL, to get all of the attributes.
image_dict = {'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'faces.jpg'}}

faces = rek.detect_faces(Image=image_dict, Attributes=['ALL'])
This is the image:
The faces are stored in the FaceDetails key. There is only one face in this image. The keys of the face are the attributes.
face = faces['FaceDetails'][0]
for key in face.keys():
    print(key)
BoundingBox
AgeRange
Smile
Eyeglasses
Sunglasses
Gender
Beard
Mustache
EyesOpen
MouthOpen
Emotions
Landmarks
Pose
Quality
Confidence
In the attributes, we can see that Rekognition predicted the AgeRange as 16-28 and the Gender as Female with almost 100% confidence, and that the subject's eyes were open but her mouth was not. It also predicted the subject's most likely emotion as CALM.
face['AgeRange']
{'Low': 16, 'High': 28}

face['Gender']
{'Value': 'Female', 'Confidence': 99.82478332519531}

face['EyesOpen']
{'Value': True, 'Confidence': 98.96699523925781}

face['MouthOpen']
{'Value': False, 'Confidence': 94.9305191040039}

face['Emotions']
[{'Type': 'CALM', 'Confidence': 93.3696060180664},
 {'Type': 'ANGRY', 'Confidence': 2.4889822006225586},
 {'Type': 'HAPPY', 'Confidence': 1.7609130144119263},
 {'Type': 'SAD', 'Confidence': 1.0307590961456299},
 {'Type': 'CONFUSED', 'Confidence': 0.5804279446601868},
 {'Type': 'SURPRISED', 'Confidence': 0.35635992884635925},
 {'Type': 'DISGUSTED', 'Confidence': 0.24075132608413696},
 {'Type': 'FEAR', 'Confidence': 0.17220167815685272}]
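Since Emotions is just a list of dicts, picking the single most likely emotion is a one-liner. A minimal sketch:

# The emotion with the highest confidence score.
top_emotion = max(face['Emotions'], key=lambda e: e['Confidence'])
print(top_emotion['Type'])  # CALM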
And like with object detection, video is also supported.
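For video, the API is asynchronous: you start a job, then fetch the results once it finishes. A minimal sketch, assuming a video named faces.mp4 (a hypothetical upload) exists in the same bucket; real code would poll the job status or subscribe to an SNS notification instead of reading the results immediately:

job = rek.start_face_detection(
    Video={'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'faces.mp4'}})

# Later, once the job has finished processing:
result = rek.get_face_detection(JobId=job['JobId'])
print(result['JobStatus'])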
To detect text, call the detect_text method with just the Image keyword argument.
text = rek.detect_text(Image={'S3Object': {'Bucket': 'ps-guide-rekognition', 'Name': 'text.jpg'}})
Here is an image with some text:
The detected text is in the TextDetections key, and each detection has a DetectedText key.
for detection in text['TextDetections']:
    print(detection['DetectedText'])
DANGER
HARD HAT
PROTECTION
REQUIRED
DANGER
HARD
HAT
PROTECTION
REQUIRED
It might look as if the text were detected twice. But notice that in the image the words "hard" and "hat" are on the same line. The first four detections are lines of text; the remaining detections are individual words. Each detection carries a Type key that tells you which kind it is, as shown below.
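A minimal sketch that uses the Type key (Rekognition sets it to LINE or WORD) to print only the lines:

for detection in text['TextDetections']:
    if detection['Type'] == 'LINE':
        print(detection['DetectedText'])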
And as you might have guessed, video is also supported.
AWS Rekognition is a simple, quick, and cost-effective way to detect objects, faces, text, and more in both still images and videos. You don't need to know anything about computer vision or machine learning; all you need to know is the API for the client libraries. This guide used Python, but there are client libraries for other popular languages as well. That frees you from devoting resources to reinventing the wheel that Rekognition has already built. Thanks for reading!