First, an entire floor of the Stata Center was captured using a Kinect. This data, together with sensor poses from Lidar-based SLAM, was then used to build a simple plane-based 3D model of about 1 MB using PCL.
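The post does not say exactly how the planes were extracted. PCL does provide RANSAC-based plane segmentation, so as an assumption, the following minimal pure-Python sketch shows RANSAC fitting of one dominant plane to a point cloud. The function names, the 2 cm inlier threshold and the iteration count are hypothetical choices for illustration, not the authors' pipeline.

```python
import math
import random

def plane_from_points(p1, p2, p3):
    """Return (unit_normal, d) with normal . x + d = 0, or None if the points are collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None  # degenerate sample: no unique plane
    n = [c / norm for c in n]
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d

def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Fit the plane supported by the most points; returns ((normal, d), inlier_indices)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        # Inliers are points whose distance to the candidate plane is below the threshold.
        inliers = [i for i, p in enumerate(points)
                   if abs(sum(n[k] * p[k] for k in range(3)) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

In a full pipeline one would typically remove the inliers and repeat to extract further planes (walls, floor, ceiling), which keeps the resulting model very compact.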
They then developed a robust particle-filter algorithm for real-time localization within this model.
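The post does not describe the resampling step of the particle filter. Assuming a common choice, systematic (low-variance) resampling after normalizing the weights, a minimal sketch looks like this; the helper names are hypothetical.

```python
import random

def normalize(weights):
    """Scale weights to sum to 1; fall back to uniform if every particle scored zero."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def effective_sample_size(weights):
    """1 / sum(w^2) for normalized weights; low values indicate weight degeneracy."""
    return 1.0 / sum(w * w for w in weights)

def systematic_resample(particles, weights, rng=random):
    """Draw len(particles) samples using evenly spaced pointers with one random offset."""
    n = len(particles)
    weights = normalize(weights)
    start = rng.random() / n
    out, cumulative, i = [], weights[0], 0
    for k in range(n):
        u = start + k / n
        # Advance to the particle whose cumulative weight interval contains u.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
    return out
```

Systematic resampling uses a single random number per step, which keeps its variance low and its cost linear in the number of particles.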
The approach works by using the 3D model to generate simulated range images for a virtual camera located at each particle pose. This correctly simulates the image formation model and allows for a disparity-parameterized likelihood function. As a result, the filter can correctly use the very noisy RGB-D points up to 20 meters away, data that has usually been discarded until now.
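To illustrate why scoring in disparity lets distant points contribute, here is a minimal sketch that converts depth z to disparity d = f·B/z and scores measured against simulated depth with a Gaussian in disparity. The focal length, baseline, noise sigma and the small uniform outlier term are all illustrative assumptions, not values from the original work.

```python
import math

FOCAL_PX = 575.0    # assumed focal length in pixels (illustrative)
BASELINE_M = 0.075  # assumed projector-camera baseline in metres (illustrative)

def depth_to_disparity(z):
    """Disparity in pixels for a depth z in metres: d = f * B / z."""
    return FOCAL_PX * BASELINE_M / z

def log_likelihood(measured, simulated, sigma_disp=1.0,
                   outlier_weight=0.05, disp_range=100.0):
    """Sum of per-pixel log-likelihoods, with errors measured in disparity.

    Pixels without a valid measurement or rendered surface (None or <= 0) are skipped.
    The uniform outlier term is an assumption that keeps one bad pixel from zeroing a particle.
    """
    norm = 1.0 / (sigma_disp * math.sqrt(2.0 * math.pi))
    total = 0.0
    for zm, zs in zip(measured, simulated):
        if zm is None or zs is None or zm <= 0.0 or zs <= 0.0:
            continue
        err = depth_to_disparity(zm) - depth_to_disparity(zs)
        gauss = norm * math.exp(-0.5 * (err / sigma_disp) ** 2)
        total += math.log((1.0 - outlier_weight) * gauss + outlier_weight / disp_range)
    return total
```

With these assumed constants, a 1 m error at 20 m is only about 0.1 px of disparity, while the same 1 m error at 2 m is about 7 px. The likelihood therefore tolerates the larger metric noise of far points instead of discarding them.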
Particles are propagated using the FOVIS visual odometry library, so no wheel odometry or IMU is required.
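As a sketch of the propagation step, assume each particle is a 4x4 homogeneous pose and the visual odometry increment is expressed in the body frame (z up). Each particle is composed with the increment and then perturbed by sampled noise. The noise model here (translation plus yaw only) and its sigmas are simplifying assumptions; FOVIS itself is not called.

```python
import math
import random

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def pose(x, y, z, yaw):
    """Homogeneous transform: rotation by yaw about z, then translation (x, y, z)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def propagate(particles, vo_delta, sigma_xyz=0.01, sigma_yaw=0.005, rng=random):
    """Apply the body-frame VO increment to every particle, then add sampled noise."""
    out = []
    for p in particles:
        noise = pose(rng.gauss(0.0, sigma_xyz), rng.gauss(0.0, sigma_xyz),
                     rng.gauss(0.0, sigma_xyz), rng.gauss(0.0, sigma_yaw))
        out.append(matmul4(matmul4(p, vo_delta), noise))
    return out
```

Because the increment is applied in each particle's own frame, a 1 m forward step moves a particle facing +y to (0, 1, 0) and a particle facing +x to (1, 0, 0). The cloud keeps its spread of hypotheses while following the measured motion.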
The approach is implicitly robust to dynamic objects and people in the environment. Moreover, because RGB-D sensors are used, such challenges could also be handled explicitly by dedicated detectors.
Because the approach uses the GPU (via OpenGL) for scene rendering and depth buffering, it is efficient enough to run in real time with hundreds of particles. This allows the particle cloud to explore broadly around the source location, resulting in robust 6-DOF localization.
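The GPU renderer is not reproduced here. As a CPU stand-in for the depth-buffering idea, the following sketch ray-casts a tiny pinhole image against infinite planes and keeps, per pixel, the nearest surface in front of the camera, which is what an OpenGL depth test does. The image size, focal length and the use of unbounded planes (rather than bounded polygons) are hypothetical simplifications.

```python
def render_depth(planes, cam_pos, cam_rot, width=8, height=6, focal=6.0):
    """Return a row-major z-depth image (None where no plane is hit).

    planes:  list of (normal, d) with normal . x + d = 0 in world coordinates.
    cam_rot: 3x3 camera-to-world rotation; the camera looks along its own +z axis.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    image = []
    for v in range(height):
        row = []
        for u in range(width):
            # Ray with unit z-component in the camera frame, so the hit parameter t is the z-depth.
            ray_cam = ((u - cx) / focal, (v - cy) / focal, 1.0)
            ray = [sum(cam_rot[i][k] * ray_cam[k] for k in range(3)) for i in range(3)]
            nearest = None
            for n, d in planes:
                denom = sum(n[i] * ray[i] for i in range(3))
                if abs(denom) < 1e-12:
                    continue  # ray parallel to the plane
                t = -(sum(n[i] * cam_pos[i] for i in range(3)) + d) / denom
                # Depth test: keep the closest surface in front of the camera.
                if t > 0.0 and (nearest is None or t < nearest):
                    nearest = t
            row.append(nearest)
        image.append(row)
    return image
```

In a filter, this would be called once per particle pose and the resulting image scored against the measured depth. That per-particle loop is exactly the work a GPU parallelizes, which is what makes hundreds of particles affordable.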