Annotating Pose Keypoints in Videos with MediaPipe (Basic and Advanced Versions)

Preface

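Sign-language video recognition falls into two broad approaches. One feeds the video directly into a network. The other first extracts keypoints and then feeds those keypoints into the network. In this article I show how to use MediaPipe to annotate keypoints in sign-language videos.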

Code

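If you just want to use the code directly, here it is. You will need to set up the environment yourself, since I no longer remember the exact setup.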

Basic code

This part implements the main functionality. The later versions are modifications built on top of it.

import os
import cv2
import numpy as np
import mediapipe as mp

# Keypoint filter settings
filtered_hand = list(range(21))
filtered_pose = [11, 12, 13, 14, 15, 16]  # keep only torso and arm keypoints
HAND_NUM = len(filtered_hand)
POSE_NUM = len(filtered_pose)

# Initialise the MediaPipe models (with tuned detection parameters)
mp_hands = mp.solutions.hands
mp_pose = mp.solutions.pose
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    # Too high and a hand that is not confidently detected is skipped; a low value
    # detects more completely (low is better when there is no background clutter)
    min_detection_confidence=0.1,
    # Too high and tracking is given up when it is lost; a low value gives better continuity
    min_tracking_confidence=0.1,
)
pose = mp_pose.Pose(
    static_image_mode=False,
    model_complexity=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5,
)


def get_frame_landmarks(frame):
    """Get the keypoints of a single RGB frame (thread-safety issue fixed)."""
    all_landmarks = np.full((HAND_NUM * 2 + POSE_NUM, 3), np.nan)  # initialise to NaN
    # Run sequentially (not in threads) to keep the data reliable
    # Hand keypoints
    results_hands = hands.process(frame)
    if results_hands.multi_hand_landmarks:
        for i, hand_landmarks in enumerate(results_hands.multi_hand_landmarks[:2]):  # at most two hands
            hand_type = results_hands.multi_handedness[i].classification[0].index
            points = np.array([(lm.x, lm.y, lm.z) for lm in hand_landmarks.landmark])
            if hand_type == 0:  # right hand
                all_landmarks[:HAND_NUM] = points[filtered_hand]
            else:  # left hand
                all_landmarks[HAND_NUM:HAND_NUM * 2] = points[filtered_hand]
    # Body keypoints
    results_pose = pose.process(frame)
    if results_pose.pose_landmarks:
        pose_points = np.array([(lm.x, lm.y, lm.z) for lm in results_pose.pose_landmarks.landmark])
        all_landmarks[HAND_NUM * 2:HAND_NUM * 2 + POSE_NUM] = pose_points[filtered_pose]
    return all_landmarks


def get_video_landmarks(video_path, start_frame=0, end_frame=-1):
    """Get the keypoints of a video (with debug output)."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video file: {video_path}")
        return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if end_frame < 0 or end_frame >= total_frames:
        end_frame = total_frames - 1  # frame indices run from 0 to total_frames - 1
    frames = []
    frame_index = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret or frame_index > end_frame:
            break
        if frame_index >= start_frame:
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            landmarks = get_frame_landmarks(frame_rgb)
            # Check whether any valid keypoint was detected
            if np.all(np.isnan(landmarks)):
                print(f"No keypoints detected in frame {frame_index}")
            # Keep NaN-only frames as well, so row i stays aligned with video frame start_frame + i
            frames.append(landmarks)
        frame_index += 1
    cap.release()
    if not frames or np.all(np.isnan(frames)):
        print("Warning: no keypoints were detected at all")
        return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
    return np.stack(frames)


def draw_landmarks(video_path, output_path, landmarks):
    """Draw the keypoints onto the video."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video file: {video_path}")
        return
    fps = cap.get(cv2.CAP_PROP_FPS)  # keep the float value (e.g. 29.97) to avoid drift
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    landmark_index = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if landmark_index < len(landmarks):
            # Draw the keypoints
            for i, (x, y, _) in enumerate(landmarks[landmark_index]):
                if not np.isnan(x) and not np.isnan(y):
                    px, py = int(x * width), int(y * height)
                    # Right hand green, left hand red, body blue (OpenCV uses BGR)
                    color = ((0, 255, 0) if i < HAND_NUM else
                             (0, 0, 255) if i < HAND_NUM * 2 else
                             (255, 0, 0))
                    cv2.circle(frame, (px, py), 4, color, -1)
        landmark_index += 1
        out.write(frame)
    cap.release()
    out.release()


# Process all videos
video_root = "./doc/补充版/正式数据集/"
output_root = "./doc/save/"
# np.save and cv2.VideoWriter do not create missing directories, so create them up front
os.makedirs(os.path.join(output_root, "npy"), exist_ok=True)
os.makedirs(os.path.join(output_root, "MP4"), exist_ok=True)
for video_name in os.listdir(video_root):
    if not video_name.endswith(('.mp4', '.avi', '.mov')):
        continue
    video_path = os.path.join(video_root, video_name)
    print(f"\nProcessing video: {video_name}")
    # Get the keypoints
    landmarks = get_video_landmarks(video_path)
    print(f"Got keypoints for {len(landmarks)} frames")
    # Save the .npy file
    base_name = os.path.splitext(video_name)[0]
    np.save(os.path.join(output_root, "npy", f"{base_name}.npy"), landmarks)
    # Generate the video with the keypoints drawn on it
    output_video = os.path.join(output_root, "MP4", f"{base_name}_landmarks.mp4")
    draw_landmarks(video_path, output_video, landmarks)
print("All done!")

Usage is simple. Set video_root to the directory that holds your videos and output_root to the directory for the results, and the script should run as is.

Preprocessing

# Keypoint filter settings
filtered_hand = list(range(21))
filtered_pose = [11, 12, 13, 14, 15, 16]  # keep only torso and arm keypoints
HAND_NUM = len(filtered_hand)
POSE_NUM = len(filtered_pose)

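Here you select the keypoints you need. Each hand normally has 21 keypoints. For pose (and face, if you use it) you can also choose which points to keep. The index of each point is documented online; in MediaPipe Pose, indices 11 to 16 are the left and right shoulders, elbows and wrists.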
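The sketch below lists what each row of the per-frame (N, 3) array holds under the settings above. The helper `build_layout` is hypothetical. The right/left order follows `get_frame_landmarks`: right hand first, then left hand, then body. The names for Pose indices 11 to 16 come from the MediaPipe Pose landmark list.

```python
# Hypothetical helper: label every row of the combined keypoint array
filtered_hand = list(range(21))
filtered_pose = [11, 12, 13, 14, 15, 16]

# MediaPipe Pose landmark names for the indices kept above
POSE_NAMES = {11: "left_shoulder", 12: "right_shoulder", 13: "left_elbow",
              14: "right_elbow", 15: "left_wrist", 16: "right_wrist"}


def build_layout(hand_ids, pose_ids):
    """Return one label per row: right hand, then left hand, then body."""
    layout = [f"right_hand_{k}" for k in hand_ids]
    layout += [f"left_hand_{k}" for k in hand_ids]
    layout += [POSE_NAMES.get(k, f"pose_{k}") for k in pose_ids]
    return layout


layout = build_layout(filtered_hand, filtered_pose)
print(len(layout))  # 48 rows = 21 + 21 + 6
print(layout[42])   # first body row: left_shoulder
```

If you change `filtered_pose`, the body rows shift accordingly. Indices outside the name table fall back to a generic `pose_<k>` label.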

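# Initialise the MediaPipe models (with tuned detection parameters)
mp_hands = mp.solutions.hands
mp_pose = mp.solutions.pose
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    # Too high and a hand that is not confidently detected is skipped; a low value
    # detects more completely (low is better when there is no background clutter)
    min_detection_confidence=0.1,
    # Too high and tracking is given up when it is lost; a low value gives better continuity
    min_tracking_confidence=0.1,
)
pose = mp_pose.Pose(
    static_image_mode=False,
    model_complexity=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5,
)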

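Parameter tuning: the hand model and the pose model can be tuned separately. static_image_mode says whether the input is a set of independent still images. False means it is not, and here I am processing video. In video mode there is an extra tracking threshold, min_tracking_confidence. Still images have no temporal continuity, so that parameter is ignored when static_image_mode is True. max_num_hands is the maximum number of hands to detect, and the comments above explain how I tuned the other two parameters. The pose parameters work much the same way, with a few differences you can look up in the documentation.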

Function walkthrough

def get_frame_landmarks(frame):
    """Get the keypoints of a single RGB frame (thread-safety issue fixed)."""
    all_landmarks = np.full((HAND_NUM * 2 + POSE_NUM, 3), np.nan)  # initialise to NaN
    # Run sequentially (not in threads) to keep the data reliable
    # Hand keypoints
    results_hands = hands.process(frame)
    if results_hands.multi_hand_landmarks:
        for i, hand_landmarks in enumerate(results_hands.multi_hand_landmarks[:2]):  # at most two hands
            hand_type = results_hands.multi_handedness[i].classification[0].index
            points = np.array([(lm.x, lm.y, lm.z) for lm in hand_landmarks.landmark])
            if hand_type == 0:  # right hand
                all_landmarks[:HAND_NUM] = points[filtered_hand]
            else:  # left hand
                all_landmarks[HAND_NUM:HAND_NUM * 2] = points[filtered_hand]
    # Body keypoints
    results_pose = pose.process(frame)
    if results_pose.pose_landmarks:
        pose_points = np.array([(lm.x, lm.y, lm.z) for lm in results_pose.pose_landmarks.landmark])
        all_landmarks[HAND_NUM * 2:HAND_NUM * 2 + POSE_NUM] = pose_points[filtered_pose]
    return all_landmarks

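This function processes a single frame. It first reserves a NumPy array for all keypoints and fills it with NaN. It then runs hand detection and body (pose) detection separately, writes the detected points into that array, and returns it. Any part that was not detected stays NaN, so every frame has the same shape.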
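The fixed-slot idea can be tried without MediaPipe or a video. In the sketch below, the helper `assemble_frame` is hypothetical and random arrays stand in for real detections. It shows that an undetected hand simply leaves its 21 rows as NaN.

```python
import numpy as np

HAND_NUM, POSE_NUM = 21, 6
filtered_pose = [11, 12, 13, 14, 15, 16]


def assemble_frame(right_hand=None, left_hand=None, pose33=None):
    """Hypothetical helper: put detected points into the fixed, NaN-prefilled layout.

    right_hand / left_hand: (21, 3) arrays; pose33: the full (33, 3) Pose output.
    Anything passed as None (not detected) stays NaN.
    """
    out = np.full((HAND_NUM * 2 + POSE_NUM, 3), np.nan)
    if right_hand is not None:
        out[:HAND_NUM] = right_hand
    if left_hand is not None:
        out[HAND_NUM:HAND_NUM * 2] = left_hand
    if pose33 is not None:
        out[HAND_NUM * 2:] = pose33[filtered_pose]  # keep only the selected body points
    return out


# Example: left hand and body detected, right hand missing
rng = np.random.default_rng(0)
frame = assemble_frame(left_hand=rng.random((21, 3)), pose33=rng.random((33, 3)))
print(frame.shape)                        # (48, 3)
print(np.isnan(frame[:HAND_NUM]).all())   # True: right-hand slots are still NaN
```

Because the shape never changes, frames can later be stacked into one (frames, 48, 3) array without any padding logic.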

def get_video_landmarks(video_path, start_frame=0, end_frame=-1):
    """Get the keypoints of a video (with debug output)."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video file: {video_path}")
        return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if end_frame < 0 or end_frame >= total_frames:
        end_frame = total_frames - 1  # frame indices run from 0 to total_frames - 1
    frames = []
    frame_index = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret or frame_index > end_frame:
            break
        if frame_index >= start_frame:
            frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            landmarks = get_frame_landmarks(frame_rgb)
            # Check whether any valid keypoint was detected
            if np.all(np.isnan(landmarks)):
                print(f"No keypoints detected in frame {frame_index}")
            # Keep NaN-only frames as well, so row i stays aligned with video frame start_frame + i
            frames.append(landmarks)
        frame_index += 1
    cap.release()
    if not frames or np.all(np.isnan(frames)):
        print("Warning: no keypoints were detected at all")
        return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
    return np.stack(frames)

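This function runs keypoint detection over a whole video. It reads the video frame by frame and converts each frame from BGR (OpenCV's channel order) to RGB (what MediaPipe expects). It then passes the frame to the single-frame function and stacks the per-frame results into one array, which it returns.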
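The stacked result has shape (frames, HAND_NUM * 2 + POSE_NUM, 3), which is (frames, 48, 3) with the settings above. The sketch below shows one way to inspect it afterwards. The helper `hand_presence` is hypothetical and the data is synthetic; it checks, frame by frame, whether each hand was detected.

```python
import numpy as np

HAND_NUM = 21


def hand_presence(landmarks):
    """Per-frame booleans: was the right / left hand detected?

    landmarks: (T, 48, 3) array as returned by get_video_landmarks.
    A hand counts as present if any of its x values is not NaN.
    """
    right = ~np.isnan(landmarks[:, :HAND_NUM, 0]).all(axis=1)
    left = ~np.isnan(landmarks[:, HAND_NUM:2 * HAND_NUM, 0]).all(axis=1)
    return right, left


# Synthetic 4-frame clip: the left hand is missing in frame 2
clip = np.zeros((4, 48, 3))
clip[2, HAND_NUM:2 * HAND_NUM] = np.nan
right, left = hand_presence(clip)
print(right.tolist())  # [True, True, True, True]
print(left.tolist())   # [True, True, False, True]
```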

def draw_landmarks(video_path, output_path, landmarks):
    """Draw the keypoints onto the video."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video file: {video_path}")
        return
    fps = cap.get(cv2.CAP_PROP_FPS)  # keep the float value (e.g. 29.97) to avoid drift
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    landmark_index = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if landmark_index < len(landmarks):
            # Draw the keypoints
            for i, (x, y, _) in enumerate(landmarks[landmark_index]):
                if not np.isnan(x) and not np.isnan(y):
                    px, py = int(x * width), int(y * height)
                    # Right hand green, left hand red, body blue (OpenCV uses BGR)
                    color = ((0, 255, 0) if i < HAND_NUM else
                             (0, 0, 255) if i < HAND_NUM * 2 else
                             (255, 0, 0))
                    cv2.circle(frame, (px, py), 4, color, -1)
        landmark_index += 1
        out.write(frame)
    cap.release()
    out.release()

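This function draws the resulting keypoints. It takes the video path, the output path and the keypoint array, then reads the video. On each frame it draws a filled circle for every keypoint and writes the frame to the output video. MediaPipe returns x and y normalized to [0, 1], which is why they are multiplied by the frame width and height.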
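The coordinate conversion is the only non-obvious step in the drawing code. The sketch below isolates it; the helper `to_pixels` is hypothetical. It truncates with `int()` like the drawing code does, and it returns `None` for undetected (NaN) points, which are skipped instead of drawn.

```python
import numpy as np


def to_pixels(points, width, height):
    """Hypothetical helper: map normalized (x, y) to pixel coordinates.

    x and y are normalized to [0, 1] by image width and height, so they are
    scaled before drawing; NaN points (not detected) become None.
    """
    pixels = []
    for x, y in points[:, :2]:
        if np.isnan(x) or np.isnan(y):
            pixels.append(None)
        else:
            pixels.append((int(x * width), int(y * height)))  # truncate like the drawing code
    return pixels


pts = np.array([[0.5, 0.25, 0.0], [np.nan, 0.1, 0.0]])
print(to_pixels(pts, 640, 480))  # [(320, 120), None]
```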

Advanced version with logging

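This version builds on the original. Besides drawing the individual points, it now connects them with lines. The result looks like this:
[Figure: result with keypoints connected by lines]
Logging was also added to analyze the resulting video stream. When keypoints are missing in the current frame (for example, one hand was not detected), the frame is counted as a dropped frame. The number of dropped frames is recorded. Each dropped frame is also saved as an image, together with the 2 frames before it and the 3 frames after it.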
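The line drawing has one subtle step. The body connections are written in MediaPipe Pose indices (11 to 16), but in the keypoint array the body rows start at 2 * HAND_NUM. The sketch below isolates that index conversion using the same `filtered_pose.index(...)` lookup as the drawing code. The helper name `remap_connections` is hypothetical.

```python
HAND_NUM = 21
filtered_pose = [11, 12, 13, 14, 15, 16]
POSE_CONNECTIONS = [(11, 12), (11, 13), (13, 15), (12, 14), (14, 16)]


def remap_connections(connections, kept_ids, offset):
    """Convert (a, b) pairs in MediaPipe indices to rows of the combined array.

    Pairs that use a point not in kept_ids are dropped.
    """
    rows = []
    for a, b in connections:
        if a in kept_ids and b in kept_ids:
            rows.append((offset + kept_ids.index(a), offset + kept_ids.index(b)))
    return rows


print(remap_connections(POSE_CONNECTIONS, filtered_pose, 2 * HAND_NUM))
# [(42, 43), (42, 44), (44, 46), (43, 45), (45, 47)]
```

The hand connections need no lookup, because all 21 hand points are kept. The left hand only adds an offset of HAND_NUM.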

import os
import cv2
import numpy as np
import mediapipe as mp
from collections import deque

# Keypoint filter settings
filtered_hand = list(range(21))
filtered_pose = [11, 12, 13, 14, 15, 16]  # keep only torso and arm keypoints
HAND_NUM = len(filtered_hand)
POSE_NUM = len(filtered_pose)

# Initialise the MediaPipe models (with tuned detection parameters)
mp_hands = mp.solutions.hands
mp_pose = mp.solutions.pose
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    min_detection_confidence=0.1,  # low: detect as completely as possible
    min_tracking_confidence=0.1,   # low: better continuity
)
pose = mp_pose.Pose(
    static_image_mode=False,
    model_complexity=1,
    min_detection_confidence=0.7,
    min_tracking_confidence=0.5,
)

# Connections between the 21 hand keypoints (MediaPipe hand model)
HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),         # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),         # index finger
    (0, 9), (9, 10), (10, 11), (11, 12),    # middle finger
    (0, 13), (13, 14), (14, 15), (15, 16),  # ring finger
    (0, 17), (17, 18), (18, 19), (19, 20),  # little finger
]
# Torso and arm connections (Pose indices 11-16: shoulders, elbows, wrists)
POSE_CONNECTIONS = [
    (11, 12),            # left shoulder - right shoulder
    (11, 13), (13, 15),  # left arm
    (12, 14), (14, 16),  # right arm
]


def get_frame_landmarks(frame):
    """Get the keypoints of a single RGB frame (thread-safety issue fixed)."""
    all_landmarks = np.full((HAND_NUM * 2 + POSE_NUM, 3), np.nan)  # initialise to NaN
    # Hand keypoints
    results_hands = hands.process(frame)
    if results_hands.multi_hand_landmarks:
        for i, hand_landmarks in enumerate(results_hands.multi_hand_landmarks[:2]):  # at most two hands
            hand_type = results_hands.multi_handedness[i].classification[0].index
            points = np.array([(lm.x, lm.y, lm.z) for lm in hand_landmarks.landmark])
            if hand_type == 0:  # right hand
                all_landmarks[:HAND_NUM] = points[filtered_hand]
            else:  # left hand
                all_landmarks[HAND_NUM:HAND_NUM * 2] = points[filtered_hand]
    # Body keypoints
    results_pose = pose.process(frame)
    if results_pose.pose_landmarks:
        pose_points = np.array([(lm.x, lm.y, lm.z) for lm in results_pose.pose_landmarks.landmark])
        all_landmarks[HAND_NUM * 2:HAND_NUM * 2 + POSE_NUM] = pose_points[filtered_pose]
    return all_landmarks


def get_video_landmarks(video_path, start_frame=0, end_frame=-1):
    """Get video keypoints (strict frame alignment + dropped-frame statistics)."""
    output_dir = "./doc/save_log/log"
    os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    output_root = os.path.join(output_dir, video_name)
    os.makedirs(output_root, exist_ok=True)
    log_file_path = os.path.join(output_root, f"{video_name}.txt")
    with open(log_file_path, 'w', encoding='utf-8') as log_file:
        cap = cv2.VideoCapture(video_path)
        if not cap.isOpened():
            print(f"Cannot open video file: {video_path}", file=log_file)
            return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        if end_frame < 0 or end_frame >= total_frames:
            end_frame = total_frames - 1  # frame indices run from 0 to total_frames - 1
        # Pre-allocate an all-NaN array so that row i always belongs to frame start_frame + i
        results = np.full((end_frame - start_frame + 1, HAND_NUM * 2 + POSE_NUM, 3), np.nan)
        missing_frames = []
        frame_index = 0
        results_index = 0  # index into the results array
        frame_buffer = deque(maxlen=3)  # the current annotated frame and the 2 before it
        save_next = 0  # how many upcoming frames still have to be saved
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret or frame_index > end_frame:
                break
            if start_frame <= frame_index <= end_frame:
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                landmarks = get_frame_landmarks(frame_rgb)
                # Draw the keypoints onto the frame (nothing is saved yet) and buffer it
                only_draw_landmarks(frame, landmarks, width, height)
                frame_buffer.append((frame_index, frame.copy()))
                results[results_index] = landmarks
                valid_points = np.sum(~np.isnan(landmarks[:, :2]))
                if valid_points != 2 * (HAND_NUM * 2 + POSE_NUM):
                    # Save the 2 previous frames and this one now, the next 3 as they arrive
                    for buf_idx, buf_frame in frame_buffer:
                        save_path = os.path.join(output_root, f"frame_{buf_idx:04d}_near_nan.png")
                        cv2.imwrite(save_path, buf_frame)
                    save_next = 3
                    missing_frames.append(frame_index)
                    print(f"Dropped-frame warning - frame {frame_index}: too few valid points "
                          f"({valid_points}/{2 * landmarks.shape[0]})", file=log_file)
                elif save_next > 0:
                    save_path = os.path.join(output_root, f"frame_{frame_index:04d}_near_nan.png")
                    cv2.imwrite(save_path, frame)
                    save_next -= 1
                results_index += 1
            frame_index += 1
        cap.release()
        # Summary report
        total_processed = end_frame - start_frame + 1
        print("\nKeypoint detection report:", file=log_file)
        print(f"Frame range: {start_frame}-{end_frame} ({total_processed} frames)", file=log_file)
        print(f"Successful frames: {total_processed - len(missing_frames)}", file=log_file)
        print(f"Dropped frames: {len(missing_frames)}", file=log_file)
        if missing_frames:
            print("Dropped at: " + ", ".join(map(str, missing_frames)), file=log_file)
        print(f"Drop rate: {len(missing_frames) / max(total_processed, 1):.1%}", file=log_file)
    return results


def _draw_segment(frame, landmarks, a, b, width, height, color):
    """Draw a line between rows a and b of landmarks if both points are valid."""
    x1, y1, _ = landmarks[a]
    x2, y2, _ = landmarks[b]
    if not np.isnan([x1, y1, x2, y2]).any():
        pt1 = (int(x1 * width), int(y1 * height))
        pt2 = (int(x2 * width), int(y2 * height))
        cv2.line(frame, pt1, pt2, color, 2)


def only_draw_landmarks(frame, landmarks, width, height):
    """Draw keypoints and connection lines onto one frame (in place, nothing is saved)."""
    # Points: right hand green (0-20), left hand red (21-41), body blue (42+)
    for i, (x, y, _) in enumerate(landmarks):
        if not np.isnan(x) and not np.isnan(y):
            color = ((0, 255, 0) if i < HAND_NUM else
                     (0, 0, 255) if i < HAND_NUM * 2 else
                     (255, 0, 0))
            cv2.circle(frame, (int(x * width), int(y * height)), 4, color, -1)
    # Lines: right hand (first 21 points) and left hand (next 21 points)
    for start_idx, end_idx in HAND_CONNECTIONS:
        _draw_segment(frame, landmarks, start_idx, end_idx, width, height, (0, 255, 0))
        _draw_segment(frame, landmarks, start_idx + HAND_NUM, end_idx + HAND_NUM,
                      width, height, (0, 0, 255))
    # Lines: body (convert Pose indices to rows; body rows start at 2 * HAND_NUM)
    for start_idx, end_idx in POSE_CONNECTIONS:
        if start_idx in filtered_pose and end_idx in filtered_pose:
            _draw_segment(frame, landmarks,
                          2 * HAND_NUM + filtered_pose.index(start_idx),
                          2 * HAND_NUM + filtered_pose.index(end_idx),
                          width, height, (255, 0, 0))


def draw_landmarks(video_path, output_path, landmarks):
    """Draw keypoints and connection lines onto every frame of the video."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print(f"Cannot open video file: {video_path}")
        return
    fps = cap.get(cv2.CAP_PROP_FPS)  # keep the float value (e.g. 29.97) to avoid drift
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    landmark_index = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        if landmark_index < len(landmarks):
            # Same drawing routine as used for the saved debug frames
            only_draw_landmarks(frame, landmarks[landmark_index], width, height)
        landmark_index += 1
        out.write(frame)
    cap.release()
    out.release()


# Process all videos
video_root = "./doc/补充版/正式数据集/"
output_root = "./doc/try_log/"
os.makedirs(os.path.join(output_root, "npy"), exist_ok=True)
os.makedirs(os.path.join(output_root, "MP4"), exist_ok=True)
for video_name in os.listdir(video_root):
    if not video_name.endswith(('.mp4', '.avi', '.mov')):
        continue
    video_path = os.path.join(video_root, video_name)
    print(f"\nProcessing video: {video_name}")
    # Get the keypoints
    landmarks = get_video_landmarks(video_path)
    print(f"Got keypoints for {len(landmarks)} frames")
    # Save the .npy file
    base_name = os.path.splitext(video_name)[0]
    np.save(os.path.join(output_root, "npy", f"{base_name}.npy"), landmarks)
    # Generate the video with keypoints and lines drawn on it
    output_video = os.path.join(output_root, "MP4", f"{base_name}_landmarks.mp4")
    draw_landmarks(video_path, output_video, landmarks)
print("All done!")

Function walkthrough

def get_video_landmarks(video_path, start_frame=0, end_frame=-1):
    """Get video keypoints (strict frame alignment + dropped-frame statistics)."""
    output_dir = "./doc/save_log/log"
    os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    output_root = os.path.join(output_dir, video_name)
    os.makedirs(output_root, exist_ok=True)
    log_file_path = os.path.join(output_root, f"{video_name}.txt")
    with open(log_file_path, 'w', encoding='utf-8') as log_file:
        cap = cv2.VideoCapture(video_path)
        if not cap.isOpened():
            print(f"Cannot open video file: {video_path}", file=log_file)
            return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        if end_frame < 0 or end_frame >= total_frames:
            end_frame = total_frames - 1  # frame indices run from 0 to total_frames - 1
        # Pre-allocate an all-NaN array so that row i always belongs to frame start_frame + i
        results = np.full((end_frame - start_frame + 1, HAND_NUM * 2 + POSE_NUM, 3), np.nan)
        missing_frames = []
        frame_index = 0
        results_index = 0  # index into the results array
        frame_buffer = deque(maxlen=3)  # the current annotated frame and the 2 before it
        save_next = 0  # how many upcoming frames still have to be saved
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret or frame_index > end_frame:
                break
            if start_frame <= frame_index <= end_frame:
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                landmarks = get_frame_landmarks(frame_rgb)
                # Draw the keypoints onto the frame (nothing is saved yet) and buffer it
                only_draw_landmarks(frame, landmarks, width, height)
                frame_buffer.append((frame_index, frame.copy()))
                results[results_index] = landmarks
                valid_points = np.sum(~np.isnan(landmarks[:, :2]))
                if valid_points != 2 * (HAND_NUM * 2 + POSE_NUM):
                    # Save the 2 previous frames and this one now, the next 3 as they arrive
                    for buf_idx, buf_frame in frame_buffer:
                        save_path = os.path.join(output_root, f"frame_{buf_idx:04d}_near_nan.png")
                        cv2.imwrite(save_path, buf_frame)
                    save_next = 3
                    missing_frames.append(frame_index)
                    print(f"Dropped-frame warning - frame {frame_index}: too few valid points "
                          f"({valid_points}/{2 * landmarks.shape[0]})", file=log_file)
                elif save_next > 0:
                    save_path = os.path.join(output_root, f"frame_{frame_index:04d}_near_nan.png")
                    cv2.imwrite(save_path, frame)
                    save_next -= 1
                results_index += 1
            frame_index += 1
        cap.release()
        # Summary report
        total_processed = end_frame - start_frame + 1
        print("\nKeypoint detection report:", file=log_file)
        print(f"Frame range: {start_frame}-{end_frame} ({total_processed} frames)", file=log_file)
        print(f"Successful frames: {total_processed - len(missing_frames)}", file=log_file)
        print(f"Dropped frames: {len(missing_frames)}", file=log_file)
        if missing_frames:
            print("Dropped at: " + ", ".join(map(str, missing_frames)), file=log_file)
        print(f"Drop rate: {len(missing_frames) / max(total_processed, 1):.1%}", file=log_file)
    return results

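A quick look at this function, since it changed the most. It adds frame_buffer, which caches recent frames so that the ones I want to keep as records can be written out. Before a frame is cached, only_draw_landmarks is called on it. That function only draws the keypoints onto the image and does not save it, so the saved images show clearly where the problem occurred.
The most important check is `if valid_points != 2 * (HAND_NUM * 2 + POSE_NUM):`, the valid-point test. A keypoint with a NaN coordinate is not a valid point. The total is multiplied by 2 because each point contributes two values, x and y. When there are too few valid points, the frame is logged and the images are saved. At the end, a summary report is written.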
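The detection rule and the snapshot window can be checked offline on a saved array. The helper names `find_dropped_frames` and `snapshot_frames` in the sketch below are hypothetical. Frame numbers here are 0-based positions in the array, and the window matches the "2 before, 3 after" rule described above.

```python
import numpy as np


def find_dropped_frames(landmarks):
    """Indices of frames where any keypoint lacks a valid x or y.

    Same rule as valid_points != 2 * (HAND_NUM * 2 + POSE_NUM):
    a complete frame has exactly 2 non-NaN values (x, y) per keypoint.
    """
    valid_points = (~np.isnan(landmarks[:, :, :2])).sum(axis=(1, 2))
    return np.flatnonzero(valid_points != 2 * landmarks.shape[1]).tolist()


def snapshot_frames(dropped, total, before=2, after=3):
    """Frames to save as images: each dropped frame plus 2 before and 3 after it."""
    keep = set()
    for f in dropped:
        keep.update(range(max(f - before, 0), min(f + after, total - 1) + 1))
    return sorted(keep)


clip = np.zeros((10, 48, 3))
clip[4, 0] = np.nan  # one right-hand point missing in frame 4
dropped = find_dropped_frames(clip)
print(dropped)                              # [4]
print(snapshot_frames(dropped, len(clip)))  # [2, 3, 4, 5, 6, 7]
print(f"drop rate: {len(dropped) / len(clip):.1%}")
```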

Advanced version (Kalman filter)

from collections import deque  # used for frame_buffer


class Kalman1D:
    """Scalar Kalman filter with a constant-position model (F = H = 1)."""

    def __init__(self):
        self.x = 0
        self.P = 1
        self.F = 1
        self.H = 1
        self.R = 0.01   # measurement noise
        self.Q = 0.001  # process noise
        self.initiated = False

    def update(self, measurement):
        if not self.initiated:
            self.x = measurement
            self.initiated = True
        # Predict
        self.x = self.F * self.x
        self.P = self.F * self.P * self.F + self.Q
        # Update
        K = self.P * self.H / (self.H * self.P * self.H + self.R)
        self.x += K * (measurement - self.H * self.x)
        self.P = (1 - K * self.H) * self.P
        return self.x


def init_kalman_filters(num_points):
    # One independent filter per keypoint and per coordinate (x, y, z)
    return [[Kalman1D() for _ in range(3)] for _ in range(num_points)]


def get_video_landmarks(video_path, start_frame=0, end_frame=-1):
    output_dir = "./doc/save_log/log"
    os.makedirs(output_dir, exist_ok=True)
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    output_root = os.path.join(output_dir, video_name)
    os.makedirs(output_root, exist_ok=True)
    log_file_path = os.path.join(output_root, f"{video_name}.txt")
    filters = init_kalman_filters(HAND_NUM * 2 + POSE_NUM)
    with open(log_file_path, 'w', encoding='utf-8') as log_file:
        cap = cv2.VideoCapture(video_path)
        if not cap.isOpened():
            print(f"Cannot open video file: {video_path}", file=log_file)
            return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        if end_frame < 0 or end_frame >= total_frames:
            end_frame = total_frames - 1
        results = np.full((end_frame - start_frame + 1, HAND_NUM * 2 + POSE_NUM, 3), np.nan)
        missing_frames = []
        frame_index = 0
        results_index = 0
        frame_buffer = deque(maxlen=3)  # the current annotated frame and the 2 before it
        save_next = 0                   # how many upcoming frames still have to be saved
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret or frame_index > end_frame:
                break
            if start_frame <= frame_index <= end_frame:
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                landmarks = get_frame_landmarks(frame_rgb)
                # Apply the Kalman filter (only to values that were detected)
                for i, (x, y, z) in enumerate(landmarks):
                    for j, val in enumerate([x, y, z]):
                        if not np.isnan(val):
                            landmarks[i][j] = filters[i][j].update(val)
                only_draw_landmarks(frame, landmarks, width, height)
                frame_buffer.append((frame_index, frame.copy()))
                results[results_index] = landmarks
                valid_points = np.sum(~np.isnan(landmarks[:, :2]))
                if valid_points != 2 * (HAND_NUM * 2 + POSE_NUM):
                    for buf_idx, buf_frame in frame_buffer:
                        save_path = os.path.join(output_root, f"frame_{buf_idx:04d}_near_nan.png")
                        cv2.imwrite(save_path, buf_frame)
                    save_next = 3
                    missing_frames.append(frame_index)
                    print(f"Dropped-frame warning - frame {frame_index}: too few valid points "
                          f"({valid_points}/{2 * landmarks.shape[0]})", file=log_file)
                elif save_next > 0:
                    save_path = os.path.join(output_root, f"frame_{frame_index:04d}_near_nan.png")
                    cv2.imwrite(save_path, frame)
                    save_next -= 1
                results_index += 1
            frame_index += 1
        cap.release()
        total_processed = end_frame - start_frame + 1
        print("\nKeypoint detection report:", file=log_file)
        print(f"Frame range: {start_frame}-{end_frame} ({total_processed} frames)", file=log_file)
        print(f"Successful frames: {total_processed - len(missing_frames)}", file=log_file)
        print(f"Dropped frames: {len(missing_frames)}", file=log_file)
        if missing_frames:
            print("Dropped at: " + ", ".join(map(str, missing_frames)), file=log_file)
        print(f"Drop rate: {len(missing_frames) / max(total_processed, 1):.1%}", file=log_file)
    return results

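The remaining parts are the same as above, so I won't repeat them here. With R=0.01, the filtered keypoints noticeably lag behind the video. With R=0.00001, the result is almost indistinguishable from not using the Kalman filter at all. That is what happens on my dataset; the filter may work better on other datasets. The reason is that R is the measurement noise. A larger R makes the filter trust each new detection less, which smooths more but makes the estimate trail behind moving hands. A very small R makes the filter follow the raw detections almost exactly.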
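With F = H = 1 the filter assumes each point stays where it was. On a point that moves steadily it therefore always trails behind, and a larger R means a smaller gain and a longer lag. The sketch below reproduces this on a synthetic ramp with the same Q = 0.001 as Kalman1D. It is pure Python, independent of the video pipeline, and `kalman_1d` is a hypothetical name.

```python
def kalman_1d(measurements, R, Q=0.001):
    """Scalar Kalman filter with F = H = 1 (same model as Kalman1D)."""
    x, P = measurements[0], 1.0
    estimates = []
    for z in measurements:
        # Predict: constant-position model, uncertainty grows by Q
        P = P + Q
        # Update: the gain K shrinks as the measurement noise R grows
        K = P / (P + R)
        x = x + K * (z - x)
        P = (1 - K) * P
        estimates.append(x)
    return estimates


ramp = [0.01 * t for t in range(100)]  # a point moving at constant speed
for R in (0.01, 0.00001):
    est = kalman_1d(ramp, R)
    print(f"R={R}: lag at last frame = {ramp[-1] - est[-1]:.5f}")
```

Lowering R removes the lag but also removes the smoothing, which matches the trade-off described above. A model that also tracks velocity is one way around it.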

Advanced version (constant-velocity Kalman filter)

from collections import deque  # used for frame_buffer


class Kalman1D_Velocity:
    def __init__(self):
        self.x = np.array([[0.], [0.]])                # initial state: [position, velocity]
        self.P = np.eye(2)                             # state covariance
        self.F = np.array([[1., 1.], [0., 1.]])        # state transition
        self.H = np.array([[1., 0.]])                  # observation matrix
        self.R = np.array([[0.01]])                    # observation noise
        self.Q = np.array([[0.001, 0.], [0., 0.001]])  # process noise
        self.initiated = False

    def predict(self):
        self.x = np.dot(self.F, self.x)
        self.P = np.dot(self.F, np.dot(self.P, self.F.T)) + self.Q
        return self.x[0, 0]

    def update(self, measurement):
        if not self.initiated:
            self.x[0, 0] = measurement
            self.x[1, 0] = 0.0
            self.initiated = True
            return measurement
        # Predict
        self.predict()
        # Update
        S = np.dot(self.H, np.dot(self.P, self.H.T)) + self.R
        K = np.dot(np.dot(self.P, self.H.T), np.linalg.inv(S))
        z = np.array([[measurement]])
        y = z - np.dot(self.H, self.x)
        self.x = self.x + np.dot(K, y)
        self.P = self.P - np.dot(K, np.dot(self.H, self.P))
        return self.x[0, 0]

    def update_or_predict(self, measurement):
        if np.isnan(measurement):
            if not self.initiated:
                # Nothing seen yet: stay NaN instead of predicting from the zero initial state
                return np.nan
            return self.predict()
        return self.update(measurement)


def init_kalman_filters(num_points):
    # One independent filter per keypoint and per coordinate (x, y, z)
    return [[Kalman1D_Velocity() for _ in range(3)] for _ in range(num_points)]


def get_video_landmarks(video_path, start_frame=0, end_frame=-1):
    output_dir = "./doc/save_log/log"
    os.makedirs(output_dir, exist_ok=True)
    video_name = os.path.splitext(os.path.basename(video_path))[0]
    output_root = os.path.join(output_dir, video_name)
    os.makedirs(output_root, exist_ok=True)
    log_file_path = os.path.join(output_root, f"{video_name}.txt")
    filters = init_kalman_filters(HAND_NUM * 2 + POSE_NUM)
    with open(log_file_path, 'w', encoding='utf-8') as log_file:
        cap = cv2.VideoCapture(video_path)
        if not cap.isOpened():
            print(f"Cannot open video file: {video_path}", file=log_file)
            return np.empty((0, HAND_NUM * 2 + POSE_NUM, 3))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        if end_frame < 0 or end_frame >= total_frames:
            end_frame = total_frames - 1
        results = np.full((end_frame - start_frame + 1, HAND_NUM * 2 + POSE_NUM, 3), np.nan)
        missing_frames = []
        frame_index = 0
        results_index = 0
        frame_buffer = deque(maxlen=3)  # the current annotated frame and the 2 before it
        save_next = 0                   # how many upcoming frames still have to be saved
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret or frame_index > end_frame:
                break
            if start_frame <= frame_index <= end_frame:
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                raw = get_frame_landmarks(frame_rgb)
                landmarks = raw.copy()
                # Apply the Kalman filter: update when detected, predict when missing
                for i in range(landmarks.shape[0]):
                    for j in range(3):
                        landmarks[i, j] = filters[i][j].update_or_predict(raw[i, j])
                only_draw_landmarks(frame, landmarks, width, height)
                frame_buffer.append((frame_index, frame.copy()))
                results[results_index] = landmarks
                # Judge drops on the raw detections; the predictions would otherwise hide them
                valid_points = np.sum(~np.isnan(raw[:, :2]))
                if valid_points != 2 * (HAND_NUM * 2 + POSE_NUM):
                    for buf_idx, buf_frame in frame_buffer:
                        save_path = os.path.join(output_root, f"frame_{buf_idx:04d}_near_nan.png")
                        cv2.imwrite(save_path, buf_frame)
                    save_next = 3
                    missing_frames.append(frame_index)
                    print(f"Dropped-frame warning - frame {frame_index}: too few valid points "
                          f"({valid_points}/{2 * raw.shape[0]})", file=log_file)
                elif save_next > 0:
                    save_path = os.path.join(output_root, f"frame_{frame_index:04d}_near_nan.png")
                    cv2.imwrite(save_path, frame)
                    save_next -= 1
                results_index += 1
            frame_index += 1
        cap.release()
        total_processed = end_frame - start_frame + 1
        print("\nKeypoint detection report:", file=log_file)
        print(f"Frame range: {start_frame}-{end_frame} ({total_processed} frames)", file=log_file)
        print(f"Successful frames: {total_processed - len(missing_frames)}", file=log_file)
        print(f"Dropped frames: {len(missing_frames)}", file=log_file)
        if missing_frames:
            print("Dropped at: " + ", ".join(map(str, missing_frames)), file=log_file)
        print(f"Drop rate: {len(missing_frames) / max(total_processed, 1):.1%}", file=log_file)
    return results

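This version no longer lags far behind, but the points feel a bit "floaty". Sometimes the result looks slightly better than with no filter. Part of the floating likely comes from update_or_predict: when a point is missing, it keeps extrapolating with the last estimated velocity until the point is detected again.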
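The sketch below shows both behaviours on synthetic data. It uses a hypothetical function `cv_kalman` with NumPy only, and the same F, H, Q and R as Kalman1D_Velocity. On a steadily moving point the constant-velocity model has almost no lag. During a gap it keeps extrapolating with the last velocity. Like the adjusted class above, it returns NaN until the first real measurement arrives.

```python
import numpy as np


def cv_kalman(measurements, r=0.01, q=0.001):
    """Constant-velocity Kalman filter over a 1-D series; NaN = missing measurement."""
    F = np.array([[1., 1.], [0., 1.]])  # position += velocity each frame
    H = np.array([[1., 0.]])            # only the position is observed
    Q = q * np.eye(2)
    R = np.array([[r]])
    x, P = None, np.eye(2)
    out = []
    for z in measurements:
        if x is None:                   # not initialised yet
            if np.isnan(z):
                out.append(np.nan)      # nothing seen so far: stay NaN
                continue
            x = np.array([[z], [0.]])   # start at the first measurement, zero velocity
            out.append(z)
            continue
        x = F @ x                       # predict
        P = F @ P @ F.T + Q
        if not np.isnan(z):             # update only when a measurement exists
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (np.array([[z]]) - H @ x)
            P = P - K @ H @ P
        out.append(float(x[0, 0]))
    return out


ramp = [0.01 * t for t in range(50)] + [np.nan] * 5  # moving point, then a 5-frame gap
est = cv_kalman([np.nan] + ramp)
print(est[0], round(est[50], 4), round(est[-1], 4))
```

In real footage a hand often stops or turns back during a gap. In that case the extrapolated position overshoots, which is one plausible source of the "floaty" look.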

Summary

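Basic version: the overall pipeline is fairly clear. The videos in my dataset have simple backgrounds and rarely cause false detections, so I set the thresholds very low. Even so, frames are still dropped for reasons I don't yet understand, and this needs further investigation.
Advanced version: only after adding logging did I realize how serious the dropped frames and false detections are; stepping through frame by frame turns up many of them. Neither the Kalman filter nor the constant-velocity Kalman filter seems to handle this well.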

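This article was contributed by an internet user. The views expressed are solely the author's and do not represent the position of this site. This site only provides information storage services, does not own the rights, and bears no related legal responsibility. If you repost it, please credit the source: http://www.pswp.cn/web/81575.shtml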

