3D Gaussian splatting 04: 代码阅读-提取相机位姿和稀疏点云

3D Gaussian splatting 01: 环境搭建
3D Gaussian splatting 02: 快速评估
3D Gaussian splatting 03: 用户数据训练和结果查看
3D Gaussian splatting 04: 代码阅读-提取相机位姿和稀疏点云
3D Gaussian splatting 05: 代码阅读-训练整体流程
3D Gaussian splatting 06: 代码阅读-训练参数
3D Gaussian splatting 07: 代码阅读-训练载入数据和保存结果
3D Gaussian splatting 08: 代码阅读-渲染

代码阅读: 预处理阶段

convert.py 用于从帧系列中提取相机参数, 相机位姿和对象特征点的稀疏点云, 从 convert.py 的代码可以看到转换阶段的处理流程:

1. 3D稀疏点云重建

## Feature extraction
feat_extracton_cmd = colmap_command + " feature_extractor "\"--database_path " + args.source_path + "/distorted/database.db \--image_path " + args.source_path + "/input \--ImageReader.single_camera 1 \--ImageReader.camera_model " + args.camera + " \--SiftExtraction.use_gpu " + str(use_gpu)
exit_code = os.system(feat_extracton_cmd)
if exit_code != 0:logging.error(f"Feature extraction failed with code {exit_code}. Exiting.")exit(exit_code)## Feature matching
feat_matching_cmd = colmap_command + " exhaustive_matcher \--database_path " + args.source_path + "/distorted/database.db \--SiftMatching.use_gpu " + str(use_gpu)
exit_code = os.system(feat_matching_cmd)
if exit_code != 0:logging.error(f"Feature matching failed with code {exit_code}. Exiting.")exit(exit_code)### Bundle adjustment
# The default Mapper tolerance is unnecessarily large,
# decreasing it speeds up bundle adjustment steps.
mapper_cmd = (colmap_command + " mapper \--database_path " + args.source_path + "/distorted/database.db \--image_path "  + args.source_path + "/input \--output_path "  + args.source_path + "/distorted/sparse \--Mapper.ba_global_function_tolerance=0.000001")
exit_code = os.system(mapper_cmd)
if exit_code != 0:logging.error(f"Mapper failed with code {exit_code}. Exiting.")exit(exit_code)

提取流程

特征点提取 feature_extractor
生成每一帧的特征点
特征匹配 exhaustive_matcher
图像间匹配的特征点对, 建立2D-2D几何约束
稀疏三维重建 mapper
生成相机位姿 + 稀疏3D点云, 通过SfM恢复3D结构和相机运动

说明

相机模型参数camera_model, 默认使用的是OPENCV
用skip_matching参数可以跳过这个步骤, 这个阶段生成的数据会保存在 distorted 目录下, 这是用 colmap 直接处理产生的结果

2. 图像去畸变

### Image undistortion
## We need to undistort our images into ideal pinhole intrinsics.
img_undist_cmd = (colmap_command + " image_undistorter \--image_path " + args.source_path + "/input \--input_path " + args.source_path + "/distorted/sparse/0 \--output_path " + args.source_path + "\--output_type COLMAP")
exit_code = os.system(img_undist_cmd)
if exit_code != 0:logging.error(f"Mapper failed with code {exit_code}. Exiting.")exit(exit_code)

将上一步生成的文件, 使用colmap image_undistorter处理后输出到 sparse 目录, 最后移动到 sparse/0 目录下.

image_undistorter 涉及的处理包含

去畸变 Undistortion: 根据相机的内参和畸变系数(通常来自 database.db 或 cameras.txt), 对原始图像进行去畸变处理, 在images目录下生成无畸变的图像
图像重投影: 将原始图像重新投影到新的虚拟相机坐标系下, 确保去畸变后的图像尽可能保持直线性和几何一致性, 新产生的稀疏重建数据输出到 sparse 目录下.

output/
├── images/               # 去畸变后的图像, 文件名与原始图像一致, 但内容已去除畸变
├── sparse/               # 稀疏重建数据(适配去畸变后的图像)
│   ├── cameras.bin       # 更新后的相机内参(去除了畸变参数, 通常为 `SIMPLE_PINHOLE` 或 `PINHOLE` 模型)
│   ├── images.bin        # 更新后的图像位姿
│   └── points3D.bin      # 3D点云数据(与原始稀疏模型一致)

3. 生成小尺寸图片

source_file = os.path.join(args.source_path, "images", file)destination_file = os.path.join(args.source_path, "images_2", file)
shutil.copy2(source_file, destination_file)
exit_code = os.system(magick_command + " mogrify -resize 50% " + destination_file)

如果指定了resize参数, 则将images中的图片序列按 1/2, 1/4, 1/8 大小压缩后放到对应的 images_2, images_4, images_8 目录下.

提取结果数据结构

在Convert阶段, 使用Colmap处理输入帧序列, 在3D场景的稀疏重建完成后, model 默认会被导出到 bin 文件中, 因为这样比较紧凑, 节省空间, 在结果目录中生成以下文件

cameras.bin: 相机内参(焦距、畸变系数等)
images.bin: 每张图像的外参(旋转矩阵、平移向量)
points3D.bin: 稀疏三维点云(坐标、颜色、关联的图像特征)

二进制文件不能直接查看, 如果要查看需要先导出为文本形式

模型导出为文本

界面方式

File -> Open Project, 选择 distorted/sparse/0/project.ini
project.ini 里面有结果数据集 database_path, image_path 对应的路径
Import Model
弹出窗口会显示 cameras.bin, frames.bin, images.bin, points3D.bin 这些文件所在的目录, 直接点Open
Export Model as Text
选择目录导出

界面方式需要先有个 project.ini 才能打开然后才能导出, 对于 image_undistorter 之后的数据, 可以直接用命令行

colmap model_converter --input_path ./sparse/0  --output_path ./export  --output_type TXT

相机参数 cameras.txt

分别将 distorted/sparse/0 和 sparse/0 下面的文件导出对比一下,

这个是 distorted 的

# Camera list with one line of data per camera:
#   CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
# Number of cameras: 1
1 OPENCV 1366 768 1293.8711353466817 1286.2774515116257 683 384 0.061212269737348043 -0.0680218 70521804764 0.0027510647480376441 -0.00096792591781855999

从参数可以看到, 只有1个相机, 类型为 OPENCV, 宽1366, 高768, 后面的对应OPENCV的参数为fx, fy, cx, cy, k1, k2, p1, p2, 参数说明在这里 sensor models opencv camera_calibration_and_3d_reconstruction

这个是 sparse 的,

# Camera list with one line of data per camera:
#   CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]
# Number of cameras: 1
1 PINHOLE 1342 753 1293.8711353466817 1286.2774515116257 671 376.5

可以看到 image_undistorter 处理后, 将相机类型简化为 PINHOLE 了. 宽1342, 高753, 后面对应PINHOLE参数fx, fy, cx, cy.

稀疏三维点云 points3D.txt

每一行代表3D世界坐标系中的一个点, 每一行的数据记录的是每个特征点的编号, 在世界坐标系中的三轴坐标, 颜色, 误差, 对应的帧ID(有多个)等

# 3D point list with one line of data per point:
#   POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK[] as (IMAGE_ID, POINT2D_IDX)
# Number of points: 95681, mean track length: 6.0265674480826918
1 -4.6429054040200315 -0.12682166809568018 5.4170983660932128 163 177 139 0.36651172613355359 91 228 97 1097  1 1084  2 1041  3 1098
2 -2.8858762120805075 -0.14464560656096839 9.1629341058577598 151 128 96  0.39002500983496419 91 963 97 1371 94 6903 93 6795 92 1260 1 1467 98 7488 90 847 89 938 86 1686
3 -2.0084821657841556 -0.081920789643777775 8.8585159154051105 138 105 91 0.41792357452137563 91 1080 97 1389 95 1376 94 1289 93 1292 92 1327 99 1553 100 1585 103 935 104 972 106 1279105 1269

帧位姿信息信息 frames.txt

每行记录对应一帧, 用于描述每一帧图像的相机姿态, 即相机在世界坐标系下的位置和旋转.

# Frame list with one line of data per frame:
#   FRAME_ID, RIG_ID, RIG_FROM_WORLD[QW, QX, QY, QZ, TX, TY, TZ], NUM_DATA_IDS, DATA_IDS[] as (SENSOR_TYPE, SENSOR_ID, DATA_ID)
# Number of frames: 106
1 1 0.99836533904705349 0.04193352296417753 0.038821862334836317 0.0010452014780679608 -0.25128396315431845 -1.2152713376026862 3.0253864464155615 1 CAMERA 1 1
2 1 0.99835087534385147 0.042428195611485663 0.038620024616921315 -0.00196764222042955 -0.13775248425073805 -1.2180580685699627 3.0103416379580277 1 CAMERA 1 2
3 1 0.99855111805603081 0.043597372641527229 0.024544817561746674 -0.019811250810262224 -0.040750492657603658 -1.2271622331366634 3.0046993738476142 1 CAMERA 1 3

其中

FRAME_ID: 帧ID,
RIG_ID: 相机系统ID, 单相机时为 1
RIG_FROM_WORLD: 此相机相对于世界坐标系的位姿(旋转 + 平移)
- [QW, QX, QY, QZ]: 四元数(Quaternion)表示从世界坐标系到相机坐标系的旋转
- [TX, TY, TZ]: 平移向量, 表示相机在世界坐标系中的位置
NUM_DATA_IDS: 关联的数据条目数量(1 表示单相机)
DATA_IDS[]
- SENSOR_TYPE: 传感器类型
- SENSOR_ID: 传感器ID
- DATA_ID: 数据ID, 与 FRAME_ID 一致

如果要将世界坐标系中的点转换到相机坐标系, 公式为如下, 其中 $R$ 是四元数对应的旋转矩阵, $T$ 是平移向量
$P_camera = R * P_world + T$

帧位姿和 2D 到 3D 的点对应关系 images.txt

从一开始的注释可以看到, 总共有106帧, 平均每帧有 5439.9 个特征点. 每一帧对应两行数据, 因为第二行数据特别长, 这里只选了第一帧, 且第二行只复制了一部分

# Image list with two lines of data per image:
#   IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME
#   POINTS2D[] as (X, Y, POINT3D_ID)
# Number of images: 106, mean observations per image: 5439.8867924528304
1 0.99836533904705349 0.04193352296417753 0.038821862334836317 0.0010452014780679608 -0.25128396315431845 -1.2152713376026862 3.0253864464155615 1 00001.png
825.15373304728928 -5.6413415027986957 -1 164.01311518616183 -3.143878378690431 -1 234.84114206197438 -3.2774780931266605 -1 410.44616170102927 -4.4760000788289176 -1 410.44616170102927 -4.4760000788289176 -1 498.88078573599353 -4.4877569648373878 -1 569.35841783248441 -5.1971020964857075 -1 100.7993779156044 -2.2199628505093756 -1 356.56707555817059 -3.3454093456278997 -1 448.63533653913873 -3.8738229730629996 -1 448.63533653913873 -3.8738229730629996 -1 3.8371802816758418 -1.3898702174245159 -1 14.528644520511875 -1.3897672966739947 -1 14.528644520511875 -1.3897672966739947 -1 58.303840482829514 -1.6884535247444319 -1 183.3108855651688 -2.0632513592141777 -1 225.6034102754262 -2.3950756662687809 -1 225.6034102754262 -2.3950756662687809 -1 379.73286273610427 -3.4805414725628907 -1 379.73286273610427 -3.4805414725628907 -1 19.916327860588694 -1.2292769819643468 -1 19.916327860588694 -1.2292769819643468 10907 79.77255155967282 -1.4162716901826116 -1 238.963680618659 -2.0952003476307937 -1 247.31417122505815 -1.9649340021943544 -1 284.2464937295465 -2.3623328654380202 -1 306.65943761573396 -2.2745898954538575 -1 542.61914847275943 -3.5028586005029183 -1 581.95785661473622 -3.3856713839217605 -1 581.95785661473622 -3.3856713839217605 -1 815.63607208237715 -3.4239040200370709 -1 288.53829875350897 -1.9195248965336305 -1 1166.9736271841496 -1.7695890660907594 ...

第一行是图像位姿和元信息, 和 frames.txt 中的数据是一致的