A Deep Dive into the Supervision Library: An AI Vision Assistant for Python

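In computer vision and machine learning, data processing and result visualization often decide whether a project succeeds. This article takes a close look at Supervision, a Python library designed to simplify the workflow of AI vision projects.

What is Supervision?

Supervision is an open-source Python library that provides a set of utilities for computer vision projects, particularly object detection, segmentation, and tracking. Its API integrates with popular machine learning frameworks such as YOLO and Detectron2, and it simplifies the path from model inference to result visualization.

Core features at a glance

Supervision's main features include, among others:

Annotation visualization (bounding boxes, masks, labels, and so on)
Dataset processing and conversion
Detection filtering and post-processing
Video stream processing
Performance analysis tools
Integration with multiple computer vision frameworks

Installing Supervision

Supervision can be installed with pip: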

pip install supervision

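If you need the full feature set (including video processing support), install with an extra. Quote the argument so the shell does not interpret the brackets. The extras that exist vary by release, so check the project's installation notes if the name below is not recognized:

pip install "supervision[full]"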
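Since parts of the API differ between releases, it can help to confirm which version is installed before running the examples. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of Supervision.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("supervision")
    print(f"supervision version: {version or 'not installed'}")
```

Returning None instead of raising keeps the check usable at the top of a script that wants to print a friendly message and exit.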

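Basic usage example

Let's start with a simple example that visualizes detection results with Supervision. The examples in this article follow an older release of the library: newer releases rename Detections.from_yolov8 to Detections.from_ultralytics, draw text labels with LabelAnnotator rather than BoxAnnotator, and replace color helpers such as Color.red() with constants such as Color.RED.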

import cv2
import supervision as sv
from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO('yolov8n.pt')

# Read the image
image = cv2.imread('image.jpg')

# Run inference and convert the result to a Detections object
results = model(image)[0]
detections = sv.Detections.from_yolov8(results)

# Create the annotator
box_annotator = sv.BoxAnnotator()

# Build one "class confidence" label per detection
labels = [
    f"{model.model.names[class_id]} {confidence:0.2f}"
    for class_id, confidence in zip(detections.class_id, detections.confidence)
]

# Annotate a copy of the image
annotated_image = box_annotator.annotate(
    scene=image.copy(),
    detections=detections,
    labels=labels
)

# Show the result
sv.plot_image(annotated_image)

Working with detection results

The Detections class is the core of how Supervision handles detection results. Let's look at how to manipulate them.

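# Filter out low-confidence detections
high_confidence_detections = detections[detections.confidence > 0.7]

# Keep only detections of a specific class
person_detections = detections[detections.class_id == 0]  # assumes class 0 is "person"

# Read the bounding box coordinates (x_min, y_min, x_max, y_max)
for bbox in person_detections.xyxy:
    print(f"Bounding box: {bbox}")

# Compute the center point of each detection
centers = person_detections.get_anchors_coordinates(sv.Position.CENTER)
print(f"Center points: {centers}")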
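To make the filtering and center computation above concrete, here is a library-free sketch of the same two ideas: keeping only the boxes whose score passes a threshold, and taking the midpoint of each box. The function names `filter_by_confidence` and `box_centers` are hypothetical, and this is not how Supervision implements them internally (it works on NumPy arrays).

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def filter_by_confidence(
    boxes: Sequence[Box], scores: Sequence[float], threshold: float
) -> Tuple[List[Box], List[float]]:
    """Keep the boxes (and their scores) whose score is strictly above threshold."""
    keep = [score > threshold for score in scores]
    kept_boxes = [box for box, flag in zip(boxes, keep) if flag]
    kept_scores = [score for score, flag in zip(scores, keep) if flag]
    return kept_boxes, kept_scores


def box_centers(boxes: Sequence[Box]) -> List[Tuple[float, float]]:
    """Return the (x, y) midpoint of every box."""
    return [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]


if __name__ == "__main__":
    boxes = [(0, 0, 10, 20), (10, 10, 30, 30)]
    scores = [0.9, 0.5]
    kept_boxes, kept_scores = filter_by_confidence(boxes, scores, 0.7)
    print(kept_boxes, kept_scores, box_centers(kept_boxes))
```

The boolean list `keep` plays the role of the mask in `detections[detections.confidence > 0.7]`: one flag per detection, applied to every per-detection field at once.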

Advanced annotation features

Supervision offers several annotation styles to cover different visualization needs.

# Create different kinds of annotators
# (the text_* arguments belong to BoxAnnotator in older releases)
box_annotator = sv.BoxAnnotator(
    thickness=2,
    text_thickness=1,
    text_scale=0.5
)
mask_annotator = sv.MaskAnnotator()
label_annotator = sv.LabelAnnotator()
circle_annotator = sv.CircleAnnotator()

# Combine several annotations on the same image
annotated_image = box_annotator.annotate(image.copy(), detections)
annotated_image = mask_annotator.annotate(annotated_image, detections)
annotated_image = label_annotator.annotate(annotated_image, detections)
annotated_image = circle_annotator.annotate(annotated_image, detections)

sv.plot_image(annotated_image)

Video processing

Supervision simplifies the video pipeline so that processing a video stream is nearly as simple as processing a single frame.

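# Read the video metadata and create a frame generator
video_info = sv.VideoInfo.from_video_path("video.mp4")
frame_generator = sv.get_video_frames_generator("video.mp4")

# Initialize the tracker
byte_tracker = sv.ByteTrack()

# Process every frame
with sv.VideoSink("output.mp4", video_info) as sink:
    for frame in frame_generator:
        results = model(frame)[0]
        detections = sv.Detections.from_yolov8(results)
        detections = byte_tracker.update_with_detections(detections)
        # Rebuild the labels for this frame so they match its detections
        labels = [
            f"#{tracker_id} {model.model.names[class_id]}"
            for tracker_id, class_id in zip(detections.tracker_id, detections.class_id)
        ]
        annotated_frame = box_annotator.annotate(
            scene=frame.copy(),
            detections=detections,
            labels=labels
        )
        sink.write_frame(annotated_frame)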
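Trackers such as ByteTrack decide which detection in the current frame continues which existing track largely by how much the boxes overlap. The usual overlap measure is intersection over union (IoU). The sketch below is a generic, library-free version for illustration; the function name `iou` is hypothetical and this is not Supervision's own code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in the range [0, 1]."""
    # Corners of the overlapping rectangle (may be empty)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    previous_box = (0, 0, 10, 10)
    current_box = (5, 0, 15, 10)
    print(f"IoU: {iou(previous_box, current_box):.3f}")
```

A box that moved only slightly between frames scores close to 1, while unrelated boxes score 0, which is what makes IoU a practical matching signal.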

Dataset tools

Supervision provides some convenient tools for working with datasets.

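import random

# Load a COCO dataset
dataset = sv.DetectionDataset.from_coco(
    images_directory_path="train/images",
    annotations_path="train/annotations.json"
)

# Randomly sample four images and visualize their annotations.
# Indexing the dataset as (path, image, annotations) assumes a recent release.
sample_annotator = sv.BoxAnnotator()
annotated_samples = []
for index in random.sample(range(len(dataset)), 4):
    _, sample_image, annotations = dataset[index]
    annotated_samples.append(sample_annotator.annotate(sample_image.copy(), annotations))
sv.plot_images_grid(images=annotated_samples, grid_size=(2, 2), size=(16, 16))

# Convert to another format (YOLO)
dataset.as_yolo(
    images_directory_path="yolo/images",
    annotations_directory_path="yolo/labels",
    data_yaml_path="yolo/data.yaml"
)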
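The core of a COCO to YOLO conversion is a change of box convention: COCO stores a box as (x_min, y_min, width, height) in pixels, while YOLO label files store (x_center, y_center, width, height) normalized by the image size. The sketch below shows only that arithmetic; the function name `coco_to_yolo` is hypothetical, and a real conversion also handles class mapping and file layout.

```python
from typing import Tuple


def coco_to_yolo(
    bbox: Tuple[float, float, float, float], image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a COCO pixel box (x_min, y_min, w, h) to normalized YOLO (cx, cy, w, h)."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


if __name__ == "__main__":
    # A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image
    print(coco_to_yolo((100, 50, 200, 100), image_width=400, image_height=200))
```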

Advanced analysis features

Supervision also includes some higher-level analysis tools, such as zone counting and heat map generation.

import numpy as np

# Define the region of interest
polygon = np.array([[100, 100], [300, 100], [300, 300], [100, 300]])
zone = sv.PolygonZone(polygon, frame_resolution_wh=(640, 480))

# Create the analysis tools
zone_annotator = sv.PolygonZoneAnnotator(zone=zone, color=sv.Color.red())
heat_map_annotator = sv.HeatMapAnnotator()

# Process the video and analyze it (a fresh generator, since generators are single-use)
frame_generator = sv.get_video_frames_generator("video.mp4")
with sv.VideoSink("analysis_output.mp4", video_info) as sink:
    for frame in frame_generator:
        results = model(frame)[0]
        detections = sv.Detections.from_yolov8(results)
        # Update the zone count
        zone.trigger(detections)
        # Annotate; the heat map annotator accumulates detections across calls
        annotated_frame = box_annotator.annotate(frame.copy(), detections)
        annotated_frame = zone_annotator.annotate(annotated_frame)
        annotated_frame = heat_map_annotator.annotate(annotated_frame, detections)
        sink.write_frame(annotated_frame)
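Zone counting comes down to one question per detection: does its anchor point lie inside the polygon? A common way to answer it is the ray-casting test sketched below. This is a generic illustration with hypothetical function names (`point_in_polygon`, `count_in_zone`); Supervision's PolygonZone may use a different technique internally.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_in_zone(points: Sequence[Point], polygon: Sequence[Point]) -> int:
    """Number of points that fall inside the polygon."""
    return sum(point_in_polygon(p, polygon) for p in points)


if __name__ == "__main__":
    square = [(100, 100), (300, 100), (300, 300), (100, 300)]
    centers = [(200, 200), (50, 200), (400, 400)]
    print(count_in_zone(centers, square))
```

An odd number of crossings means the point is inside. Points exactly on an edge are a boundary case that this simple version does not treat specially.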

Custom annotation styles

Supervision lets you customize the appearance of annotations.

# Custom colors and styling
class CustomColor:
    BOX = sv.Color(r=255, g=0, b=0)        # red box
    TEXT = sv.Color(r=255, g=255, b=255)   # white text

# Only arguments accepted by the older BoxAnnotator are used here; in that
# release the label background takes the box color
custom_annotator = sv.BoxAnnotator(
    color=CustomColor.BOX,
    text_color=CustomColor.TEXT,
    text_padding=2,
    thickness=3
)

annotated_image = custom_annotator.annotate(
    scene=image.copy(),
    detections=detections,
    labels=labels
)
sv.plot_image(annotated_image)

Integration with different frameworks

Supervision supports integration with several popular frameworks.

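# Create Detections objects from different frameworks

# From YOLOv8
detections = sv.Detections.from_yolov8(results)

# From Detectron2
# outputs = predictor(image)
# detections = sv.Detections.from_detectron2(outputs)

# From MMDetection
# result = inference_detector(model, image)
# detections = sv.Detections.from_mmdetection(result)

# From TorchVision (built directly from the output tensors)
# output = model(image)[0]
# detections = sv.Detections(
#     xyxy=output["boxes"].detach().cpu().numpy(),
#     confidence=output["scores"].detach().cpu().numpy(),
#     class_id=output["labels"].detach().cpu().numpy().astype(int),
# )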

Utility functions

Supervision also includes a number of handy utility functions.

import cv2

# Image processing
resized_image = sv.scale_image(image=image, scale_factor=0.5)
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # color conversion via OpenCV

# Video metadata
video_info = sv.VideoInfo.from_video_path("video.mp4")
print(video_info.total_frames, video_info.fps)

# File system
sv.list_files_with_extensions(
    directory="dataset/images",
    extensions=["jpg", "png"]
)

# Drawing
sv.draw_text(
    scene=image.copy(),
    text="Sample Text",
    text_anchor=sv.Point(100, 100),
    text_color=sv.Color.red(),
    text_scale=1.0,
    text_thickness=2,
    background_color=sv.Color.white()
)

Performance optimization tips

Performance becomes especially important when processing large volumes of data.

# Run inference on batches of frames instead of one frame at a time
def write_batch(frames, sink):
    batch_results = model(frames)  # Ultralytics models accept a list of frames
    for frame, results in zip(frames, batch_results):
        detections = sv.Detections.from_yolov8(results)
        annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
        sink.write_frame(annotated_frame)

batch_size = 4
with sv.VideoSink("output.mp4", video_info) as sink:
    frames = []
    for frame in sv.get_video_frames_generator("video.mp4"):
        frames.append(frame)
        if len(frames) == batch_size:
            write_batch(frames, sink)
            frames = []
    if frames:  # flush the final, partial batch
        write_batch(frames, sink)
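Grouping frames into fixed-size batches is independent of any vision library, so it can be factored into a small generator. The sketch below is one way to do it with the standard library; the helper name `batched` is hypothetical here (Python 3.12 ships a similar `itertools.batched` that yields tuples).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items; the last one may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Integers stand in for video frames
    for batch in batched(range(10), 4):
        print(batch)
```

Because it consumes the input lazily, the generator works directly on a frame generator without loading the whole video into memory.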

A practical application

Let's look at a complete pedestrian counting example.

import numpy as np
import supervision as sv
from ultralytics import YOLO

# Initialize the model and tools
model = YOLO('yolov8n.pt')
byte_tracker = sv.ByteTrack()
box_annotator = sv.BoxAnnotator()

# Define the counting zone
counting_zone = np.array([[200, 150], [800, 150], [800, 600], [200, 600]])
zone = sv.PolygonZone(polygon=counting_zone, frame_resolution_wh=(1280, 720))
zone_annotator = sv.PolygonZoneAnnotator(
    zone=zone,
    color=sv.Color.green(),
    text_color=sv.Color.black(),
    text_scale=2,
    text_thickness=4,
    text_padding=8
)

# Process the video
with sv.VideoSink("people_counting.mp4", sv.VideoInfo.from_video_path("input.mp4")) as sink:
    for frame in sv.get_video_frames_generator("input.mp4"):
        # Inference
        results = model(frame)[0]
        detections = sv.Detections.from_yolov8(results)
        # Keep only people (class_id=0)
        detections = detections[detections.class_id == 0]
        # Update the tracker
        detections = byte_tracker.update_with_detections(detections)
        # Update the counting zone
        zone.trigger(detections)
        # Annotate
        labels = [
            f"#{tracker_id} {model.model.names[class_id]} {confidence:0.2f}"
            for tracker_id, class_id, confidence in zip(
                detections.tracker_id, detections.class_id, detections.confidence
            )
        ]
        annotated_frame = box_annotator.annotate(
            scene=frame.copy(),
            detections=detections,
            labels=labels
        )
        annotated_frame = zone_annotator.annotate(annotated_frame)
        sink.write_frame(annotated_frame)

# current_count is the number of people inside the zone in the last processed frame
print(f"People in zone (last frame): {zone.current_count}")
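Note that zone.current_count reflects a single frame. To count how many distinct people entered the zone over the whole video, one option is to accumulate the tracker IDs seen inside it. The sketch below shows that bookkeeping without any vision dependency; the class name `UniqueCounter` is hypothetical.

```python
from typing import Iterable, Set


class UniqueCounter:
    """Accumulates tracker IDs and reports how many distinct ones have been seen."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def update(self, tracker_ids: Iterable[int]) -> int:
        """Record the IDs observed in one frame and return the running total."""
        self._seen.update(int(tracker_id) for tracker_id in tracker_ids)
        return len(self._seen)

    @property
    def total(self) -> int:
        return len(self._seen)


if __name__ == "__main__":
    counter = UniqueCounter()
    # Each list stands in for the tracker IDs inside the zone in one frame
    for ids_in_frame in ([1, 2], [2, 3], []):
        counter.update(ids_in_frame)
    print(f"Distinct people seen in zone: {counter.total}")
```

In the loop above, assuming zone.trigger(detections) returns a boolean mask with one entry per detection, the per-frame input would be detections.tracker_id[mask].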