Object detection is a critical task in the field of computer vision. It involves identifying and localizing objects within an image or video frame. One of the most efficient and widely used techniques for real-time object detection is YOLO (You Only Look Once). Combined with OpenCV, a powerful library for computer vision tasks, YOLO can be implemented efficiently to perform real-time object detection. This tutorial will delve into the details of setting up and using YOLO with OpenCV to achieve robust object detection in real-time applications.
Introduction to YOLO
YOLO is a state-of-the-art, real-time object detection system. Unlike previous detection systems that repurpose classifiers or localizers to perform detection, YOLO frames object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. This makes it extremely fast and efficient.
Key Features of YOLO:
- Speed: YOLO is incredibly fast because it makes predictions with a single network evaluation, as opposed to sliding over regions or bounding boxes.
- End-to-End Training: YOLO can be trained end-to-end directly on detection performance.
- High Accuracy: YOLO achieves high accuracy with fewer background errors in object localization.
Setting Up Your Environment
Before we dive into the code, it’s crucial to set up our environment. We will need Python, OpenCV, and the YOLO weights and configuration files.
Prerequisites
- Python: Make sure you have Python installed. You can download it from python.org.
- OpenCV: Install OpenCV using pip:
pip install opencv-python
(The tracking exercise at the end of this tutorial uses OpenCV’s tracker API, which ships in the opencv-contrib-python package; install that instead if you plan to follow along with the exercise.)
- YOLO Files: Download the YOLO configuration and weights from the official YOLO website or the Darknet GitHub repository. You will need:
  - yolov3.cfg: The configuration file.
  - yolov3.weights: The pre-trained weights file.
  - coco.names: The file containing the class names.
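If you are unsure where to find these files, they are commonly fetched from Joseph Redmon’s site and the Darknet GitHub repository. A sketch of the usual download commands (verify the URLs are still live before relying on them):
wget https://pjreddie.com/media/files/yolov3.weights
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names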
Directory Structure
Create a project directory and organize your files as follows:
project/
├── yolov3.cfg
├── yolov3.weights
├── coco.names
└── detect.py
Loading YOLO with OpenCV
OpenCV provides an interface to load YOLO directly. We will start by loading the model and the class names.
Step 1: Load the YOLO Network
OpenCV’s dnn module makes it straightforward to load the YOLO network. Here’s how to do it:
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load class names
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Identify the network's output layers
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
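Note that the return shape of getUnconnectedOutLayers() changed around OpenCV 4.5.4 (older builds return an N×1 array rather than a flat one). If the list comprehension above raises an indexing error on your version, this defensive variant should handle both layouts:
# Flattening covers both the old Nx1 and the newer 1-D return layouts.
out_idx = np.array(net.getUnconnectedOutLayers()).flatten()
output_layers = [layer_names[int(i) - 1] for i in out_idx]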
Step 2: Prepare the Input Image
YOLO expects the input image to be preprocessed into a blob. This involves resizing the image, normalizing it, and rearranging its dimensions.
def preprocess_image(image):
    # Scale factor 1/255 (~0.00392) normalizes pixel values to [0, 1];
    # 416x416 is the network input size; swapRB=True converts BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    return blob
Step 3: Perform Forward Pass
Perform a forward pass through the network to get the detection results.
def get_detections(net, blob):
    # Run a single forward pass through the network's output layers.
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs
Processing YOLO Outputs
The output from the YOLO network is a set of raw detection vectors: each row holds four normalized box coordinates (center x, center y, width, height), an objectness score, and one score per class. We need to process these rows to extract usable bounding boxes, class IDs, and confidence scores.
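To see this concretely, here is an optional inspection of the raw output shapes, assuming the net, output_layers, and a blob from the previous steps:
# For the 80-class COCO model each row has 85 values:
# [center_x, center_y, width, height, objectness, 80 class scores].
net.setInput(blob)
for out in net.forward(output_layers):
    print(out.shape)  # e.g. (507, 85), (2028, 85), (8112, 85) for a 416x416 input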
Step 4: Extract Bounding Boxes
We will iterate over the outputs and extract bounding boxes, confidences, and class IDs. The frame’s width and height are passed in so the normalized coordinates can be scaled to pixel units.
def extract_boxes_confidences_classids(outputs, confidence_threshold, width, height):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            # Class scores start at index 5, after the box coords and objectness.
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > confidence_threshold:
                # Coordinates are normalized; scale them to pixel units.
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids
Step 5: Non-Maximum Suppression
Non-Maximum Suppression (NMS) is used to reduce the number of overlapping bounding boxes and keep the best ones.
def apply_nms(boxes, confidences, score_threshold, nms_threshold):
    indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    # Flatten so we get a plain sequence of indices regardless of OpenCV version.
    return np.array(indices).flatten()
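As a quick sanity check (not part of the pipeline), consider three boxes where the first two overlap heavily; with an NMS threshold of 0.4, only the higher-confidence box of the overlapping pair survives:
# Boxes are [x, y, w, h]; the first two overlap with IoU of roughly 0.78.
boxes = [[100, 100, 50, 80], [105, 102, 50, 80], [300, 50, 40, 40]]
confidences = [0.9, 0.6, 0.8]
print(apply_nms(boxes, confidences, 0.5, 0.4))  # expected: indices 0 and 2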
Step 6: Draw Bounding Boxes
We will draw the final bounding boxes on the image along with class names and confidence scores.
def draw_bounding_boxes(image, boxes, confidences, class_ids, indices, classes):
    for i in indices:
        box = boxes[i]
        x, y, w, h = box[0], box[1], box[2], box[3]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        # Pick a random color; cast to plain ints so OpenCV accepts it.
        color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
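Sampling a new random color on every call makes the box colors flicker from frame to frame. A common alternative, not used in the rest of this tutorial, is a fixed per-class color table generated once after loading the class names:
# One stable color per class, generated once at startup.
colors = np.random.uniform(0, 255, size=(len(classes), 3))
# Then inside draw_bounding_boxes:
# color = tuple(int(c) for c in colors[class_ids[i]])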
Real-Time Object Detection
Now that we have all the pieces in place, we can perform real-time object detection using a video stream from a webcam or a video file.
Step 7: Capture Video Stream
Capture the video stream using OpenCV’s VideoCapture.
cap = cv2.VideoCapture(0)  # Use 0 for the webcam. For a video file, provide the path to the file.

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape
    blob = preprocess_image(frame)
    outputs = get_detections(net, blob)
    boxes, confidences, class_ids = extract_boxes_confidences_classids(outputs, 0.5, width, height)
    indices = apply_nms(boxes, confidences, 0.5, 0.4)
    draw_bounding_boxes(frame, boxes, confidences, class_ids, indices, classes)

    cv2.imshow("Real-Time Object Detection", frame)
    key = cv2.waitKey(1)
    if key == 27:  # Press 'Esc' to exit
        break

cap.release()
cv2.destroyAllWindows()
Step 8: Putting It All Together
Combine all the steps into a single script for real-time object detection.
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Load class names
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

def preprocess_image(image):
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    return blob

def get_detections(net, blob):
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs

def extract_boxes_confidences_classids(outputs, confidence_threshold, width, height):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > confidence_threshold:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids

def apply_nms(boxes, confidences, score_threshold, nms_threshold):
    indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    return np.array(indices).flatten()

def draw_bounding_boxes(image, boxes, confidences, class_ids, indices, classes):
    for i in indices:
        box = boxes[i]
        x, y, w, h = box[0], box[1], box[2], box[3]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape
    blob = preprocess_image(frame)
    outputs = get_detections(net, blob)
    boxes, confidences, class_ids = extract_boxes_confidences_classids(outputs, 0.5, width, height)
    indices = apply_nms(boxes, confidences, 0.5, 0.4)
    draw_bounding_boxes(frame, boxes, confidences, class_ids, indices, classes)

    cv2.imshow("Real-Time Object Detection", frame)
    key = cv2.waitKey(1)
    if key == 27:
        break

cap.release()
cv2.destroyAllWindows()
Optimizing Performance
Real-time object detection can be computationally intensive. Here are some tips to optimize performance:
- Use a GPU: Offload computations to a GPU. OpenCV’s dnn module can use CUDA, but only if OpenCV was built with CUDA support (the stock pip wheels are CPU-only):
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
- Reduce Input Size: Smaller input sizes reduce computational load. However, this may impact accuracy.
blob = cv2.dnn.blobFromImage(image, 0.00392, (320, 320), (0, 0, 0), True, crop=False)
- Batch Processing: Process multiple frames in a batch if latency is not critical.
- Model Optimization: Use a lighter YOLO model like YOLOv3-tiny for faster inference (see the sketch after this list).
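For example, the tiny variant is a drop-in replacement when loading the network, and a simple timer (standard-library time) makes the speed-up measurable. This is a sketch assuming the capture loop from Step 7; the yolov3-tiny files are available from the same Darknet sources as the full model:
import time

# Drop-in replacement: load the lighter tiny variant instead of the full model.
net = cv2.dnn.readNet("yolov3-tiny.weights", "yolov3-tiny.cfg")

# Inside the capture loop from Step 7, measure per-frame throughput:
prev = time.time()
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # ... run the detection pipeline on `frame` as before ...
    now = time.time()
    fps = 1.0 / max(now - prev, 1e-6)
    prev = now
    cv2.putText(frame, f"FPS: {fps:.1f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Real-Time Object Detection", frame)
    if cv2.waitKey(1) == 27:
        break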
Practice Exercise: Real-Time Object Detection with Custom Objects and Tracking
Objective:
The goal of this exercise is to implement a real-time object detection system using YOLO and OpenCV that can detect custom objects and track their movements across frames. The system will also log the coordinates of the detected objects in a file for further analysis.
Requirements:
- Detect and track multiple custom objects (e.g., cars and pedestrians).
- Use YOLO for object detection.
- Use OpenCV for video capture, preprocessing, and displaying results.
- Log the coordinates of the detected objects in each frame to a CSV file.
- Implement a basic tracking mechanism to assign unique IDs to detected objects (a sketch of one approach follows this list).
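The solution below takes a shortcut for object IDs (it reuses the NMS detection index, which is not stable across frames). For reference, here is a minimal sketch of a more principled approach: a hypothetical CentroidTracker helper, not used in the solution, that matches each detection to the nearest box center from the previous frame:
# Hypothetical helper (an assumption, not from the original tutorial):
# greedy nearest-center matching to keep IDs stable across frames.
class CentroidTracker:
    def __init__(self, max_distance=50):
        self.next_id = 0
        self.objects = {}  # object ID -> last known (cx, cy)
        self.max_distance = max_distance

    def update(self, boxes):
        """Return one persistent ID per [x, y, w, h] box, in order."""
        ids = []
        unmatched = dict(self.objects)
        for (x, y, w, h) in boxes:
            cx, cy = x + w // 2, y + h // 2
            # Greedily match to the closest unclaimed center from last frame.
            best_id, best_d = None, self.max_distance
            for oid, (ox, oy) in unmatched.items():
                d = ((cx - ox) ** 2 + (cy - oy) ** 2) ** 0.5
                if d < best_d:
                    best_id, best_d = oid, d
            if best_id is None:
                best_id = self.next_id  # no match: register a new object
                self.next_id += 1
            else:
                del unmatched[best_id]
            self.objects[best_id] = (cx, cy)
            ids.append(best_id)
        for oid in unmatched:  # drop objects that vanished this frame
            del self.objects[oid]
        return ids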
Steps:
1. Prepare the Dataset:
- Collect or download a dataset of the custom objects you want to detect.
- Use a tool like LabelImg to annotate the images and create training data.
2. Train YOLO Model:
- Train a YOLO model on your custom dataset using Darknet (an example command follows this list).
- Save the trained weights, configuration file, and class names.
3. Implement Detection and Tracking:
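Training itself happens in Darknet rather than OpenCV. A typical invocation looks like the following, where data/custom.data, the cfg file, and the pretrained darknet53.conv.74 backbone weights are placeholders for your own setup:
./darknet detector train data/custom.data cfg/yolov3_custom.cfg darknet53.conv.74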
Solution:
Step 1: Prepare the Dataset and Train YOLO
(Assuming you have already trained the YOLO model and have the following files: yolov3_custom.cfg, yolov3_custom.weights, and custom.names.)
Step 2: Implement Detection and Tracking
import cv2
import numpy as np
import csv
from collections import deque

# Load YOLO
net = cv2.dnn.readNet("yolov3_custom.weights", "yolov3_custom.cfg")

# Load class names
with open("custom.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

# Initialize video capture
cap = cv2.VideoCapture(0)

# Initialize a list to hold the active trackers
object_trackers = []

# Store recent positions of each object so its path can be drawn
object_paths = {}
max_path_length = 30
path_color = (0, 255, 0)  # fixed color for drawing paths

# Create a CSV file to log coordinates
csv_file = open("object_tracking_log.csv", mode='w', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["Frame", "ObjectID", "Class", "X", "Y", "Width", "Height"])

def preprocess_image(image):
    blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    return blob

def get_detections(net, blob):
    net.setInput(blob)
    outputs = net.forward(output_layers)
    return outputs

def extract_boxes_confidences_classids(outputs, confidence_threshold, width, height):
    boxes = []
    confidences = []
    class_ids = []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > confidence_threshold:
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)
    return boxes, confidences, class_ids

def apply_nms(boxes, confidences, score_threshold, nms_threshold):
    indices = cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold)
    return np.array(indices).flatten()

def draw_bounding_boxes(image, boxes, confidences, class_ids, indices, classes):
    global object_trackers, object_paths
    new_object_trackers = []
    new_object_paths = {}
    for i in indices:
        box = boxes[i]
        x, y, w, h = box[0], box[1], box[2], box[3]
        label = str(classes[class_ids[i]])
        confidence = confidences[i]
        color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
        # Log the coordinates (the NMS index doubles as a simple object ID here)
        csv_writer.writerow([frame_count, i, label, x, y, w, h])
        # Initialize a KCF tracker for this detection (requires opencv-contrib-python)
        tracker = cv2.TrackerKCF_create()
        tracker.init(image, tuple(box))
        new_object_trackers.append((tracker, class_ids[i]))
        # Update paths (the deque's maxlen trims old positions automatically)
        if i in object_paths:
            object_paths[i].append((x + w // 2, y + h // 2))
        else:
            object_paths[i] = deque([(x + w // 2, y + h // 2)], maxlen=max_path_length)
        new_object_paths[i] = object_paths[i]
    object_trackers = new_object_trackers
    object_paths = new_object_paths
    # Draw paths
    for object_id, path in object_paths.items():
        for j in range(1, len(path)):
            if path[j - 1] is None or path[j] is None:
                continue
            thickness = int(np.sqrt(max_path_length / float(j + 1)) * 2.5)
            cv2.line(image, path[j - 1], path[j], path_color, thickness)

frame_count = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break

    height, width, channels = frame.shape
    blob = preprocess_image(frame)
    outputs = get_detections(net, blob)
    boxes, confidences, class_ids = extract_boxes_confidences_classids(outputs, 0.5, width, height)
    indices = apply_nms(boxes, confidences, 0.5, 0.4)
    draw_bounding_boxes(frame, boxes, confidences, class_ids, indices, classes)

    # Update trackers
    new_object_trackers = []
    for tracker, class_id in object_trackers:
        success, box = tracker.update(frame)
        if success:
            x, y, w, h = [int(v) for v in box]
            color = tuple(int(c) for c in np.random.uniform(0, 255, size=(3,)))
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            cv2.putText(frame, classes[class_id], (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
            new_object_trackers.append((tracker, class_id))
            # Log the coordinates
            csv_writer.writerow([frame_count, class_id, classes[class_id], x, y, w, h])
            # Update paths
            if class_id in object_paths:
                object_paths[class_id].append((x + w // 2, y + h // 2))
            else:
                object_paths[class_id] = deque([(x + w // 2, y + h // 2)], maxlen=max_path_length)
    object_trackers = new_object_trackers

    # Draw paths
    for object_id, path in object_paths.items():
        for j in range(1, len(path)):
            if path[j - 1] is None or path[j] is None:
                continue
            thickness = int(np.sqrt(max_path_length / float(j + 1)) * 2.5)
            cv2.line(frame, path[j - 1], path[j], path_color, thickness)

    cv2.imshow("Real-Time Object Detection and Tracking", frame)
    frame_count += 1
    key = cv2.waitKey(1)
    if key == 27:  # Press 'Esc' to exit
        break

cap.release()
csv_file.close()
cv2.destroyAllWindows()
Explanation:
- Preprocess Image: Convert the image to a blob and pass it through the YOLO network to get detections.
- Extract Boxes, Confidences, and Class IDs: Parse the outputs of the network to get the bounding boxes, confidence scores, and class IDs.
- Apply Non-Maximum Suppression: Filter out overlapping bounding boxes to keep only the most confident ones.
- Draw Bounding Boxes: Draw the bounding boxes and log the coordinates.
- Initialize and Update Trackers: Use OpenCV’s TrackerKCF_create() to track the objects across frames.
- Draw Paths: Keep track of object paths and draw them on the frame.
- Log Data: Log the coordinates of the detected objects to a CSV file for further analysis.
Conclusion:
This exercise provides a comprehensive practice for implementing a real-time object detection and tracking system using YOLO and OpenCV. It covers key aspects such as detection, tracking, logging, and visualization, making it a robust solution for various real-time applications.