Object detection is a critical task in the field of computer vision. It involves identifying and localizing objects within an image or video frame. One of the most efficient and widely used techniques for real-time object detection is YOLO (You Only Look Once). Combined with OpenCV, a powerful library for computer vision tasks, YOLO can be implemented efficiently to perform real-time object detection. This tutorial will delve into the details of setting up and using YOLO with OpenCV to achieve robust object detection in real-time applications.
Introduction to YOLO
YOLO is a state-of-the-art, real-time object detection system. Unlike previous detection systems that repurpose classifiers or localizers to perform detection, YOLO frames object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. This makes it extremely fast and efficient.
Key Features of YOLO:
- Speed: YOLO is incredibly fast because it makes predictions with a single network evaluation, as opposed to sliding over regions or bounding boxes.
- End-to-End Training: YOLO can be trained end-to-end directly on detection performance.
- High Accuracy: YOLO achieves high accuracy with fewer background errors in object localization.
Setting Up Your Environment
Before we dive into the code, it’s crucial to set up our environment. We will need Python, OpenCV, and the YOLO weights and configuration files.
Prerequisites
- Python: Make sure you have Python installed. You can download it from python.org.
- OpenCV: Install OpenCV using pip:
pip install opencv-python
(The tracking exercise at the end of this tutorial uses OpenCV’s tracker API, which ships in the opencv-contrib-python package; install that instead if you plan to follow along with the exercise.)
- YOLO Files: Download the YOLO configuration and weights from the official YOLO website or the Darknet GitHub repository. You will need:
  - yolov3.cfg: The configuration file.
  - yolov3.weights: The pre-trained weights file.
  - coco.names: The file containing the class names.
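If you are unsure where to find these files, they are commonly fetched from Joseph Redmon’s site and the Darknet GitHub repository. A sketch of the usual download commands (verify the URLs are still live before relying on them):
wget https://pjreddie.com/media/files/yolov3.weights
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names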
Directory Structure
Create a project directory and organize your files as follows:
project/
├── yolov3.cfg
├── yolov3.weights
├── coco.names
└── detect.py
Loading YOLO with OpenCV
OpenCV provides an interface to load YOLO directly. We will start by loading the model and the class names.
Step 1: Load the YOLO Network
OpenCV’s dnn module makes it straightforward to load the YOLO network. Here’s how to do it:
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load class names
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Identify the network's output layers
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
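Note that the return shape of getUnconnectedOutLayers() changed around OpenCV 4.5.4 (older builds return an N×1 array rather than a flat one). If the list comprehension above raises an indexing error on your version, this defensive variant should handle both layouts:
# Flattening covers both the old Nx1 and the newer 1-D return layouts.
out_idx = np.array(net.getUnconnectedOutLayers()).flatten()
output_layers = [layer_names[int(i) - 1] for i in out_idx]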
Step 2: Prepare the Input Image
YOLO expects the input image to be preprocessed into a blob. This involves resizing the image, normalizing it, and rearranging its dimensions.
def preprocess_image(image):
    # Scale factor 1/255 (~0.00392) normalizes pixel values to [0, 1];
    # 416x416 is the network input size; swapRB=True converts BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    return blob
Step 3: Perform Forward Pass
Perform a forward pass through the network to get the detection results.
def get_detections(net, blob):
    # Run a single forward pass through the network's output layers.
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs
Processing YOLO Outputs
The output from the YOLO network is a set of raw detection vectors: each row holds four normalized box coordinates (center x, center y, width, height), an objectness score, and one score per class. We need to process these rows to extract usable bounding boxes, class IDs, and confidence scores.
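To see this concretely, here is an optional inspection of the raw output shapes, assuming the net, output_layers, and a blob from the previous steps:
# For the 80-class COCO model each row has 85 values:
# [center_x, center_y, width, height, objectness, 80 class scores].
net.setInput(blob)
for out in net.forward(output_layers):
    print(out.shape)  # e.g. (507, 85), (2028, 85), (8112, 85) for a 416x416 input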
Step 4: Extract Bounding Boxes
We will iterate over the outputs and extract bounding boxes, confidences, and class IDs. The frame’s width and height are passed in so the normalized coordinates can be scaled to pixel units.
def extract_boxes_confidences_classids(outputs, confidence_threshold, width, height):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            # Class scores start at index 5, after the box coords and objectness.
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > confidence_threshold:
                # Coordinates are normalized; scale them to pixel units.
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids
Step 5: Non-Maximum Suppression
Non-Maximum Suppression (NMS) is used to reduce the number of overlapping bounding boxes and keep the best ones.
def apply_nms(boxes, confidences, score_threshold, nms_threshold):
    indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    # Flatten so we get a plain sequence of indices regardless of OpenCV version.
    return np.array(indices).flatten()
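As a quick sanity check (not part of the pipeline), consider three boxes where the first two overlap heavily; with an NMS threshold of 0.4, only the higher-confidence box of the overlapping pair survives:
# Boxes are [x, y, w, h]; the first two overlap with IoU of roughly 0.78.
boxes = [[100, 100, 50, 80], [105, 102, 50, 80], [300, 50, 40, 40]]
confidences = [0.9, 0.6, 0.8]
print(apply_nms(boxes, confidences, 0.5, 0.4))  # expected: indices 0 and 2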
Step 6: Draw Bounding Boxes
We will draw the final bounding boxes on the image along with class names and confidence scores.
def draw_bounding_boxes(image, boxes, confidences, class_ids, indices, classes):
    for i in indices:
        box = boxes[i]
        x, y, w, h = box[0], box[1], box[2], box[3]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        # Pick a random color; cast to plain ints so OpenCV accepts it.
        color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
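Sampling a new random color on every call makes the box colors flicker from frame to frame. A common alternative, not used in the rest of this tutorial, is a fixed per-class color table generated once after loading the class names:
# One stable color per class, generated once at startup.
colors = np.random.uniform(0, 255, size=(len(classes), 3))
# Then inside draw_bounding_boxes:
# color = tuple(int(c) for c in colors[class_ids[i]])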
Real-Time Object Detection
Now that we have all the pieces in place, we can perform real-time object detection using a video stream from a webcam or a video file.
Step 7: Capture Video Stream
Capture the video stream using OpenCV’s VideoCapture.
cap = cv2.VideoCapture(0)  # Use 0 for the webcam. For a video file, provide the path to the file.

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape
    blob = preprocess_image(frame)
    outputs = get_detections(net, blob)
    boxes, confidences, class_ids = extract_boxes_confidences_classids(outputs, 0.5, width, height)
    indices = apply_nms(boxes, confidences, 0.5, 0.4)
    draw_bounding_boxes(frame, boxes, confidences, class_ids, indices, classes)

    cv2.imshow("Real-Time Object Detection", frame)
    key = cv2.waitKey(1)
    if key == 27:  # Press 'Esc' to exit
        break

cap.release()
cv2.destroyAllWindows()
Step 8: Putting It All Together
Combine all the steps into a single script for real-time object detection.
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load class names
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

def preprocess_image(image):
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    return blob

def get_detections(net, blob):
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs

def extract_boxes_confidences_classids(outputs, confidence_threshold, width, height):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > confidence_threshold:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids

def apply_nms(boxes, confidences, score_threshold, nms_threshold):
    indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    return np.array(indices).flatten()

def draw_bounding_boxes(image, boxes, confidences, class_ids, indices, classes):
    for i in indices:
        box = boxes[i]
        x, y, w, h = box[0], box[1], box[2], box[3]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape
    blob = preprocess_image(frame)
    outputs = get_detections(net, blob)
    boxes, confidences, class_ids = extract_boxes_confidences_classids(outputs, 0.5, width, height)
    indices = apply_nms(boxes, confidences, 0.5, 0.4)
    draw_bounding_boxes(frame, boxes, confidences, class_ids, indices, classes)

    cv2.imshow("Real-Time Object Detection", frame)
    key = cv2.waitKey(1)
    if key == 27:
        break

cap.release()
cv2.destroyAllWindows()
Optimizing Performance
Real-time object detection can be computationally intensive. Here are some tips to optimize performance:
- Use a GPU: Offload computations to a GPU. OpenCV’s dnn module can use CUDA, but only if OpenCV was built with CUDA support (the stock pip wheels are CPU-only):
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
- Reduce Input Size: Smaller input sizes reduce computational load. However, this may impact accuracy.
blob = cv2.dnn.blobFromImage(image, 0.00392, (320, 320), (0, 0, 0), True, crop=False)
- Batch Processing: Process multiple frames in a batch if latency is not critical.
- Model Optimization: Use a lighter YOLO model like YOLOv3-tiny for faster inference (see the sketch after this list).
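For example, the tiny variant is a drop-in replacement when loading the network, and a simple timer (standard-library time) makes the speed-up measurable. This is a sketch assuming the capture loop from Step 7; the yolov3-tiny files are available from the same Darknet sources as the full model:
import time

# Drop-in replacement: load the lighter tiny variant instead of the full model.
net = cv2.dnn.readNet("yolov3-tiny.weights", "yolov3-tiny.cfg")

# Inside the capture loop from Step 7, measure per-frame throughput:
prev = time.time()
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # ... run the detection pipeline on `frame` as before ...
    now = time.time()
    fps = 1.0 / max(now - prev, 1e-6)
    prev = now
    cv2.putText(frame, f"FPS: {fps:.1f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Real-Time Object Detection", frame)
    if cv2.waitKey(1) == 27:
        break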
Practice Exercise: Real-Time Object Detection with Custom Objects and Tracking
Objective:
The goal of this exercise is to implement a real-time object detection system using YOLO and OpenCV that can detect custom objects and track their movements across frames. The system will also log the coordinates of the detected objects in a file for further analysis.
Requirements:
- Detect and track multiple custom objects (e.g., cars and pedestrians).
- Use YOLO for object detection.
- Use OpenCV for video capture, preprocessing, and displaying results.
- Log the coordinates of the detected objects in each frame to a CSV file.
- Implement a basic tracking mechanism to assign unique IDs to detected objects (a sketch of one approach follows this list).
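The solution below takes a shortcut for object IDs (it reuses the NMS detection index, which is not stable across frames). For reference, here is a minimal sketch of a more principled approach: a hypothetical CentroidTracker helper, not used in the solution, that matches each detection to the nearest box center from the previous frame:
# Hypothetical helper (an assumption, not from the original tutorial):
# greedy nearest-center matching to keep IDs stable across frames.
class CentroidTracker:
    def __init__(self, max_distance=50):
        self.next_id = 0
        self.objects = {}  # object ID -> last known (cx, cy)
        self.max_distance = max_distance

    def update(self, boxes):
        """Return one persistent ID per [x, y, w, h] box, in order."""
        ids = []
        unmatched = dict(self.objects)
        for (x, y, w, h) in boxes:
            cx, cy = x + w // 2, y + h // 2
            # Greedily match to the closest unclaimed center from last frame.
            best_id, best_d = None, self.max_distance
            for oid, (ox, oy) in unmatched.items():
                d = ((cx - ox) ** 2 + (cy - oy) ** 2) ** 0.5
                if d < best_d:
                    best_id, best_d = oid, d
            if best_id is None:
                best_id = self.next_id  # no match: register a new object
                self.next_id += 1
            else:
                del unmatched[best_id]
            self.objects[best_id] = (cx, cy)
            ids.append(best_id)
        for oid in unmatched:  # drop objects that vanished this frame
            del self.objects[oid]
        return ids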
Steps:
1. Prepare the Dataset:
- Collect or download a dataset of the custom objects you want to detect.
- Use a tool like LabelImg to annotate the images and create training data.
2. Train YOLO Model:
- Train a YOLO model on your custom dataset using Darknet (an example command follows this list).
- Save the trained weights, configuration file, and class names.
3. Implement Detection and Tracking:
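Training itself happens in Darknet rather than OpenCV. A typical invocation looks like the following, where data/custom.data, the cfg file, and the pretrained darknet53.conv.74 backbone weights are placeholders for your own setup:
./darknet detector train data/custom.data cfg/yolov3_custom.cfg darknet53.conv.74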
Solution:
Step 1: Prepare the Dataset and Train YOLO
(Assuming you have already trained the YOLO model and have the following files: yolov3_custom.cfg, yolov3_custom.weights, and custom.names.)
Step 2: Implement Detection and Tracking
import cv2
import numpy as np
import csv
from collections import deque

# Load YOLO
net = cv2.dnn.readNet("yolov3_custom.weights", "yolov3_custom.cfg")

# Load class names
with open("custom.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

# Initialize video capture
cap = cv2.VideoCapture(0)

# Initialize a list to hold the active trackers
object_trackers = []

# Store recent positions of each object so its path can be drawn
object_paths = {}
max_path_length = 30
path_color = (0, 255, 0)  # fixed color for drawing paths

# Create a CSV file to log coordinates
csv_file = open("object_tracking_log.csv", mode='w', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["Frame", "ObjectID", "Class", "X", "Y", "Width", "Height"])

def preprocess_image(image):
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    return blob

def get_detections(net, blob):
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs

def extract_boxes_confidences_classids(outputs, confidence_threshold, width, height):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > confidence_threshold:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids

def apply_nms(boxes, confidences, score_threshold, nms_threshold):
    indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    return np.array(indices).flatten()

def draw_bounding_boxes(image, boxes, confidences, class_ids, indices, classes):
    global object_trackers, object_paths
    new_object_trackers = []
    new_object_paths = {}
    for i in indices:
        box = boxes[i]
        x, y, w, h = box[0], box[1], box[2], box[3]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
        # Log the coordinates (the NMS index doubles as a simple object ID here)
        csv_writer.writerow([frame_count, i, label, x, y, w, h])
        # Initialize a KCF tracker for this detection (requires opencv-contrib-python)
        tracker = cv2.TrackerKCF_create()
        tracker.init(image, tuple(box))
        new_object_trackers.append((tracker, class_ids[i]))
        # Update paths (the deque's maxlen trims old positions automatically)
        if i in object_paths:
            object_paths[i].append((x + w // 2, y + h // 2))
        else:
            object_paths[i] = deque([(x + w // 2, y + h // 2)], maxlen=max_path_length)
        new_object_paths[i] = object_paths[i]
    object_trackers = new_object_trackers
    object_paths = new_object_paths
    # Draw paths
    for object_id, path in object_paths.items():
        for j in range(1, len(path)):
            if path[j - 1] is None or path[j] is None:
                continue
            thickness = int(np.sqrt(max_path_length / float(j + 1)) * 2.5)
            cv2.line(image, path[j - 1], path[j], path_color, thickness)

frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape
    blob = preprocess_image(frame)
    outputs = get_detections(net, blob)
    boxes, confidences, class_ids = extract_boxes_confidences_classids(outputs, 0.5, width, height)
    indices = apply_nms(boxes, confidences, 0.5, 0.4)
    draw_bounding_boxes(frame, boxes, confidences, class_ids, indices, classes)

    # Update trackers
    new_object_trackers = []
    for tracker, class_id in object_trackers:
        success, box = tracker.update(frame)
        if success:
            x, y, w, h = [int(v) for v in box]
            color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            cv2.putText(frame, classes[class_id], (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
            new_object_trackers.append((tracker, class_id))
            # Log the coordinates
            csv_writer.writerow([frame_count, class_id, classes[class_id], x, y, w, h])
            # Update paths
            if class_id in object_paths:
                object_paths[class_id].append((x + w // 2, y + h // 2))
            else:
                object_paths[class_id] = deque([(x + w // 2, y + h // 2)], maxlen=max_path_length)
    object_trackers = new_object_trackers

    # Draw paths
    for object_id, path in object_paths.items():
        for j in range(1, len(path)):
            if path[j - 1] is None or path[j] is None:
                continue
            thickness = int(np.sqrt(max_path_length / float(j + 1)) * 2.5)
            cv2.line(frame, path[j - 1], path[j], path_color, thickness)

    cv2.imshow("Real-Time Object Detection and Tracking", frame)
    frame_count += 1
    key = cv2.waitKey(1)
    if key == 27:  # Press 'Esc' to exit
        break

cap.release()
csv_file.close()
cv2.destroyAllWindows()
Explanation:
- Preprocess Image: Convert the image to a blob and pass it through the YOLO network to get detections.
- Extract Boxes, Confidences, and Class IDs: Parse the outputs of the network to get the bounding boxes, confidence scores, and class IDs.
- Apply Non-Maximum Suppression: Filter out overlapping bounding boxes to keep only the most confident ones.
- Draw Bounding Boxes: Draw the bounding boxes and log the coordinates.
- Initialize and Update Trackers: Use OpenCV’s TrackerKCF_create() to track the objects across frames.
- Draw Paths: Keep track of object paths and draw them on the frame.
- Log Data: Log the coordinates of the detected objects to a CSV file for further analysis.
Conclusion:
This exercise provides a comprehensive practice for implementing a real-time object detection and tracking system using YOLO and OpenCV. It covers key aspects such as detection, tracking, logging, and visualization, making it a robust solution for various real-time applications.