The Bonn RGB-D Dynamic Dataset
Abstract: This is a dataset for RGB-D SLAM containing highly dynamic sequences. We provide 24 dynamic sequences, in which people perform different tasks such as manipulating boxes or playing with balloons, plus 2 static sequences. For each scene we provide the ground truth pose of the sensor, recorded with an OptiTrack Prime 13 motion capture system. The sequences are in the same format as the TUM RGB-D Dataset, so the same evaluation tools can be used. Furthermore, we provide a ground truth 3D point cloud of the static environment, recorded with a Leica BLK360 terrestrial laser scanner.
Related publication
Emanuele Palazzolo, Jens Behley, Philipp Lottes, Philippe Giguère, Cyrill Stachniss, “ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals”, Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2019. PDF
BibTeX:
@InProceedings{palazzolo2019iros,
  author    = {E. Palazzolo and J. Behley and P. Lottes and P. Gigu\`ere and C. Stachniss},
  title     = {{ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals}},
  booktitle = {Proc.~of the IEEE/RSJ Int.~Conf.~on Intelligent Robots and Systems (IROS)},
  year      = {2019},
  url       = {https://www.ipb.uni-bonn.de/pdfs/palazzolo2019iros.pdf},
  codeurl   = {https://github.com/PRBonn/refusion},
  videourl  = {https://youtu.be/1P9ZfIS5-p4},
}
Ground Truth Point Cloud
We provide the full ground truth point cloud of 394,109,339 points, as well as a subsampled section of 54,676,774 points that is more convenient for evaluation. The point clouds are in ASCII PLY format. To convert a model from the reference frame of the RGB-D sensor to that of the ground truth model, refer to the Evaluation section below.
Download Full Point Cloud (4.0 GB).
Download Subsampled Section (676.0 MB).
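For reference, a minimal sketch of loading the cloud with Open3D (the filename below is a placeholder; use the file you downloaded):

import open3d as o3d

# Load the ground truth cloud (ASCII PLY). The filename is hypothetical.
cloud = o3d.io.read_point_cloud("bonn_groundtruth_subsampled.ply")
print(cloud)  # reports the number of points

# Optional: voxel-downsample further (5 cm voxels, an arbitrary choice) to
# speed up nearest-neighbor queries when comparing a model to the cloud.
cloud = cloud.voxel_down_sample(voxel_size=0.05)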
Evaluation
For evaluating the reconstructed model with respect to the ground truth, it is first necessary to transform both into the same coordinate frame. To convert a model from the reference frame of the sensor to that of the ground truth, one can use the following transformation:
$\mathbf{T}_\mathrm{g}=\mathbf{T}_\mathrm{ROS}^{-1}\mathbf{T}_0\mathbf{T}_\mathrm{ROS}\mathbf{T}_\mathrm{m}$,
where $\mathbf{T}_\mathrm{m}$ is the transformation between the reference frame of the RGB-D sensor and that of the markers used by the motion capture system, $\mathbf{T}_\mathrm{ROS}$ transforms the coordinate frame of the motion capture system to the one used in the file groundtruth.txt in the sequences, and $\mathbf{T}_0$ is the first pose read from the file groundtruth.txt.
The value of $\mathbf{T}_\mathrm{m}$ obtained from our calibration is the following:
$ \mathbf{T}_\mathrm{m} =
\begin{pmatrix}
1.0157 & 0.1828 & -0.2389 & 0.0113 \\
0.0009 & -0.8431 & -0.6413 & -0.0098 \\
-0.3009 & 0.6147 & -0.8085 & 0.0111 \\
0 & 0 & 0 & 1.0000
\end{pmatrix}
$.
$\mathbf{T}_\mathrm{ROS}$ is needed due to a bug in the ROS node that interfaces the framework with the motion capture system (it swaps the y and z axes and flips the sign of the x axis), and its value is:
$\mathbf{T}_\mathrm{ROS}=
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$.
Finally, $\mathbf{T}_0$ has to be read from the file groundtruth.txt included in the sequence to be evaluated.
To simplify this process, we provide a Python script that computes $\mathbf{T}_\mathrm{g}$ given $\mathbf{T}_0$. The script requires numpy, numpy-quaternion, and numba, all of which can be installed with pip.
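For reference, the computation can also be sketched with plain numpy as follows. This is a minimal sketch, not the provided script; it assumes groundtruth.txt follows the TUM RGB-D format "timestamp tx ty tz qx qy qz qw", with comment lines starting with #:

import numpy as np

def quat_to_rot(qx, qy, qz, qw):
    # Convert a unit quaternion (x, y, z, w) to a 3x3 rotation matrix.
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
    ])

# T_m and T_ROS as given above.
T_m = np.array([[ 1.0157,  0.1828, -0.2389,  0.0113],
                [ 0.0009, -0.8431, -0.6413, -0.0098],
                [-0.3009,  0.6147, -0.8085,  0.0111],
                [ 0.0,     0.0,     0.0,     1.0   ]])
T_ROS = np.array([[-1.0, 0.0, 0.0, 0.0],
                  [ 0.0, 0.0, 1.0, 0.0],
                  [ 0.0, 1.0, 0.0, 0.0],
                  [ 0.0, 0.0, 0.0, 1.0]])

# Read the first pose T_0 from groundtruth.txt.
with open("groundtruth.txt") as f:
    first = next(line for line in f if not line.startswith("#"))
_, tx, ty, tz, qx, qy, qz, qw = map(float, first.split())
T_0 = np.eye(4)
T_0[:3, :3] = quat_to_rot(qx, qy, qz, qw)
T_0[:3, 3] = [tx, ty, tz]

# T_g = T_ROS^{-1} T_0 T_ROS T_m
T_g = np.linalg.inv(T_ROS) @ T_0 @ T_ROS @ T_m

Applying $\mathbf{T}_\mathrm{g}$ to the reconstructed model then brings it into the coordinate frame of the ground truth point cloud. Note that $\mathbf{T}_\mathrm{ROS}$ is its own inverse, so np.linalg.inv is used above only to mirror the formula.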
RGB-D Sequences
The intrinsic parameters of the camera used to record the sequences are:
fx = 542.822841
fy = 542.576870
cx = 315.593520
cy = 237.756098
d0 = 0.039903
d1 = -0.099343
d2 = -0.000730
d3 = -0.000144
d4 = 0.000000
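As an illustration of how these parameters are used, the sketch below back-projects a depth image into a 3D point cloud with the pinhole model. It assumes the TUM RGB-D conventions (16-bit depth images scaled by a factor of 5000 per meter) and that d0..d4 are the k1, k2, p1, p2, k3 coefficients of the OpenCV plumb-bob distortion model; both conventions are assumptions here, and lens distortion is ignored for brevity:

import numpy as np

FX, FY = 542.822841, 542.576870
CX, CY = 315.593520, 237.756098

def depth_to_cloud(depth, depth_factor=5000.0):
    # depth: HxW uint16 depth image; depth_factor=5000 assumed (TUM convention).
    v, u = np.indices(depth.shape)           # pixel row/column coordinates
    z = depth.astype(np.float64) / depth_factor
    x = (u - CX) * z / FX                    # pinhole back-projection
    y = (v - CY) * z / FY
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]          # drop invalid (zero-depth) pixels

For metric evaluation one would first undistort the images using the d0..d4 coefficients (e.g., with cv2.undistort).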