Object detection is a subfield of computer vision that deals with identifying instances of semantic objects in digital images and videos. Usually, a detected object is marked by drawing a bounding box around it. In an image, this box is static, but in a video, the box moves, following the object from frame to frame.
Object detection technology has several applications, such as face detection, people counting, optical character recognition (OCR), and fault and defect detection. It is an active field of research and application, and big tech companies are investing in it and building tools for it. Examples include Google with the TensorFlow Object Detection API, Facebook with Detectron, and Amazon AWS with SageMaker, as well as ImageAI with its object detection module.
In this guide, you will go through an object detection example using Detectron2 by Facebook.
Assume you are a software developer looking to develop a proof of concept (PoC) of a face/person detection application for a security company. The client requests a PoC consisting of a simple program that, given an image, can draw a bounding box around a face/person if present.
To develop this, you choose to use Detectron2.
Detectron2 is a PyTorch-based, modular object detection library. It is the successor to Detectron, which was built on Caffe2. Detectron2 improves on its predecessor, especially in training time, where it is much faster. It also sports new features, such as Cascade R-CNN, panoptic segmentation, and DensePose, among others.
This guide assumes you have a fundamental understanding of computer vision and object detection, and at least intermediate knowledge of deep learning with PyTorch.
There are three main ways to set up Detectron2.
1. Using Docker: Use these instructions to run Detectron2 in a Docker container. This method requires you to be knowledgeable in Docker.
2. Building from source: To use this method, run the code block below. Note that the required version of G++ is >=5.
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
    # Or, if you have already cloned the repository into ./detectron2:
    python -m pip install -e detectron2
3. Installing Pre-Built Versions for Linux Only: Learn more about this method in the Detectron2 documentation.
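Whichever method you pick, it can help to confirm that the packages used in the rest of this guide are importable before going further. The following is a minimal sketch that uses only the Python standard library. The helper name missing_packages is hypothetical, and the list of package names is an assumption based on the imports that appear later in this guide.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: these are the top-level import names this guide relies on.
    required = ["torch", "detectron2", "cv2", "numpy"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that each package can be located. It does not verify versions or that the installed PyTorch build matches your CUDA setup.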
Pre-trained models are good for quick demos and can be downloaded from online resources such as the Detectron2 model zoo. To run object detection, pick any image and read it with OpenCV, as in the code below.
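    import detectron2
    import numpy as np
    import os, cv2
    from detectron2 import model_zoo
    from detectron2.engine import DefaultPredictor
    from detectron2.config import get_cfg
    from detectron2.utils.visualizer import Visualizer

    # Read the image from disk; OpenCV returns a BGR NumPy array (or None if the file is missing)
    im = cv2.imread("./input.jpg")
    # cv2.imshow needs a window name; waitKey keeps the window open until a key is pressed
    cv2.imshow("input", im)
    cv2.waitKey(0)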
Next, create a Detectron2 config (configuration variable) and a predictor to run inference (perform object detection) on the image you just loaded.
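    cfg = get_cfg()
    # Load the model definition that matches the weights
    cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
    # Load a weights file from online resources such as the model zoo
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
    predictor = DefaultPredictor(cfg)
    outputs = predictor(im)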
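At this point, inference has already run and its results are stored in the outputs variable. To check the number of identified objects and their predicted classes, inspect outputs["instances"].pred_classes, which holds one class index per detected object. The matching boxes are in outputs["instances"].pred_boxes.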
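As a minimal sketch of that check, the snippet below counts detections per class name. The helper summarize_detections is hypothetical and not part of Detectron2. It works on plain Python lists so that it runs without a model, and the example values are made up for illustration.

```python
from collections import Counter


def summarize_detections(class_ids, scores, class_names, min_score=0.5):
    """Count detections per class name, ignoring low-confidence ones.

    class_ids and scores are parallel lists, for example obtained with
    outputs["instances"].pred_classes.tolist() and
    outputs["instances"].scores.tolist().
    """
    counts = Counter()
    for class_id, score in zip(class_ids, scores):
        if score >= min_score:
            counts[class_names[class_id]] += 1
    return dict(counts)


# Made-up values for illustration only; not real model output.
example_names = ["person", "bicycle", "car"]
example_ids = [0, 0, 2, 1]
example_scores = [0.98, 0.91, 0.40, 0.75]

summary = summarize_detections(example_ids, example_scores, example_names)
print(summary)  # {'person': 2, 'bicycle': 1}
```

With real predictions, the class names would typically come from the thing_classes list of the dataset metadata, and the 0.5 threshold is an arbitrary choice you may want to tune.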
To visualize your results, you will require a special utility from Detectron2 called Visualizer.
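    from detectron2.utils.visualizer import Visualizer
    from detectron2.data import MetadataCatalog, DatasetCatalog

    # Visualizer expects RGB, so reverse OpenCV's BGR channel order
    # cfg.DATASETS.TRAIN is a tuple of dataset names; index 0 assumes the model zoo config was loaded into cfg
    visualizer = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
    img_output = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
    # Convert back to BGR for display; cv2.imshow needs a window name
    cv2.imshow("detections", img_output.get_image()[:, :, ::-1])
    cv2.waitKey(0)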
The result will be the previously loaded image, but with bounding boxes around the identified objects and the predicted class names on the boxes.
[Image: example output created by the Detectron2 team]
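Separately from the visualization, the PoC described at the start of this guide only needs boxes for people. The following is a minimal sketch of that filtering step. The helper select_boxes is hypothetical, the example detections are made up, and the class index 0 for "person" is an assumption about the COCO class list that you should confirm against the metadata's thing_classes.

```python
PERSON_CLASS_ID = 0  # assumption: index of "person" in the COCO class list


def select_boxes(boxes, class_ids, scores, wanted_id=PERSON_CLASS_ID, min_score=0.5):
    """Return the (x1, y1, x2, y2) boxes of confident detections of one class."""
    selected = []
    for box, class_id, score in zip(boxes, class_ids, scores):
        if class_id == wanted_id and score >= min_score:
            selected.append(tuple(box))
    return selected


# Made-up detections for illustration only; not real model output.
example_boxes = [
    [10.0, 20.0, 110.0, 220.0],
    [50.0, 60.0, 90.0, 100.0],
    [0.0, 0.0, 30.0, 40.0],
]
example_ids = [0, 2, 0]
example_scores = [0.97, 0.88, 0.30]

person_boxes = select_boxes(example_boxes, example_ids, example_scores)
print(person_boxes)  # [(10.0, 20.0, 110.0, 220.0)]
```

With real predictions, the boxes could be taken from outputs["instances"].pred_boxes.tensor.tolist() and each selected box drawn with cv2.rectangle. Note that the COCO classes include person but not face, so face detection would need a model trained for that task.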
In this guide, you have learned the basic use of Detectron2 by Facebook. This skill is relevant to several job roles, predominantly in the computer vision space. Positions that involve object detection include computer vision engineers, computer vision researchers, and image processing engineers.
To further build on the skills learned in this guide, challenge yourself to develop a custom object detection model that can detect anything you wish. To make it even more challenging, collect the dataset from scratch. For example, you might decide to collect image data on dogs and build an object detector that can identify dogs in images and draw bounding boxes around them. To help you get started, consider this Google Colab tutorial.