email: | potthast<at>usc.edu |
---|---|

project: | Point Cloud Segmentation using Graphical Models |

mentor: | Alex Trevor and Koen Buys |

I am a third-year PhD student in the Robotics Embedded Systems Lab at the University of Southern California. My research focuses on perception, N-dimensional data processing, and Next-Best-View estimation for robotic applications.

In this project I propose to implement a state-of-the-art segmentation approach using graphical models and a highly efficient approximate inference algorithm, which results in faster and better segmentations. Furthermore, I propose to create a new PCL package, 'pcl_ml', containing the necessary machine learning techniques, such as Markov network structures, inference algorithms and low-level functions, written in a general way so that they can be reused for future applications.

The supervised segmentation is a two-step process consisting of a training phase and a segmentation phase. In the training phase we extract the objects from the scene. We use FPFH features as classifiers and as a prior assignment for the unary potentials of the CRF. We compute the FPFH histogram features for all points of an object. To reduce computation and the number of feature comparisons in the recognition step, we run a k-means clustering algorithm and cluster the features into 10 classes. The training objects can be seen in the following image.

In the segmentation and recognition step we use the learned features to assign prior probabilities to a new scene. The prior assignment of the most likely label can be seen in the following image. As you can see, many points of the objects we want to segment and recognize are not labeled correctly. This is because the distance between the FPFH features being compared is too large. However, as a first estimate, FPFH features are well suited. Their advantage is that they capture only the geometry and not the color information. With this, the training data set can be much smaller.
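The matching of a scene feature against the learned cluster centers can be sketched as a nearest-center lookup. This is only a minimal illustration under assumptions: the names `Feature`, `squaredDistance` and `nearestCenter` are hypothetical, the squared Euclidean distance is one possible choice of metric, and the optional rejection threshold `max_sq_dist` is not taken from the project.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// One histogram feature, e.g. the 33 bins of an FPFH signature.
using Feature = std::vector<float>;

// Squared Euclidean distance; assumes both features have the same length.
float squaredDistance(const Feature& a, const Feature& b)
{
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Returns the index of the closest cluster center, or -1 if no center is
// closer than max_sq_dist (hypothetical threshold, disabled by default).
int nearestCenter(const Feature& f, const std::vector<Feature>& centers,
                  float max_sq_dist = std::numeric_limits<float>::infinity())
{
  int best = -1;
  float best_dist = max_sq_dist;
  for (std::size_t c = 0; c < centers.size(); ++c) {
    const float d = squaredDistance(f, centers[c]);
    if (d < best_dist) {
      best_dist = d;
      best = static_cast<int>(c);
    }
  }
  return best;
}
```

With 10 cluster centers per object, each scene point needs only 10 comparisons per object instead of one comparison per training point.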

As a second step, to refine the assignment, we use the fully connected CRF. The following image shows the segmentation and labeling after 10 iterations.
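To illustrate what one "iteration" of the refinement does, here is a naive mean-field update for a fully connected CRF with a Potts label compatibility. It is a sketch under assumptions, not the project code: the names `Table`, `softmin` and `naiveMeanField` are hypothetical, the kernel values are passed in as a dense matrix, and the message passing is the direct O(N²) sum. The approach of Krähenbühl and Koltun replaces that sum with approximate high-dimensional filtering on the permutohedral lattice.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Table of doubles: [node][label] for unaries and beliefs, [node][node] for the kernel.
using Table = std::vector<std::vector<double>>;

// Turns energies into a normalized distribution with q(l) proportional to exp(-energy(l)).
std::vector<double> softmin(const std::vector<double>& energy)
{
  const double e_min = *std::min_element(energy.begin(), energy.end());
  std::vector<double> q(energy.size());
  double sum = 0.0;
  for (std::size_t l = 0; l < energy.size(); ++l) {
    q[l] = std::exp(e_min - energy[l]);  // shifted by e_min for numerical stability
    sum += q[l];
  }
  for (double& v : q) v /= sum;
  return q;
}

// Runs synchronous mean-field updates and returns the beliefs q[node][label].
Table naiveMeanField(const Table& unary, const Table& kernel, int iterations)
{
  const std::size_t n = unary.size();
  Table q(n);
  for (std::size_t i = 0; i < n; ++i) q[i] = softmin(unary[i]);  // start from the unaries

  for (int it = 0; it < iterations; ++it) {
    Table next(n);
    for (std::size_t i = 0; i < n; ++i) {
      std::vector<double> energy = unary[i];
      for (std::size_t l = 0; l < energy.size(); ++l) {
        for (std::size_t j = 0; j < n; ++j) {
          if (j == i) continue;
          // Potts model: pay kernel[i][j] for the belief mass of node j on other labels.
          energy[l] += kernel[i][j] * (1.0 - q[j][l]);
        }
      }
      next[i] = softmin(energy);
    }
    q.swap(next);
  }
  return q;
}
```

A node with a weak, wrong unary that is strongly connected to confidently labeled neighbours is pulled towards their label within a few iterations.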

The following two pictures show the input data set. On the left you see the point cloud captured with an Asus camera. On the right you see the hand-labeled data set. I will show different segmentation results using different features and different levels of noise when setting the labels as unary potentials. One challenging part of the image (red circle) is the boundary between the box and the table. The box in the lower right corner has (almost) the same color as the table.

In the following image sequence you'll see the segmentation using only color information. The input labels are assigned with 50% noise, meaning each unary potential is assigned a random label with 50% probability. From left to right, the results after X iterations can be seen, where X is [0, 1, 3, 5, 10, 15]. Notice that when using only color information the table label grows into the box (red circle).
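The noise model can be sketched as follows. This is a minimal illustration, assuming integer labels in the range [0, num_labels) and a uniform draw over all labels; the function name `corruptLabels` and its parameters are hypothetical.

```cpp
#include <random>
#include <vector>

// Replaces each label with a uniformly drawn label with probability `noise`.
// Assumes labels are integers in [0, num_labels).
std::vector<int> corruptLabels(const std::vector<int>& labels, int num_labels,
                               double noise, unsigned seed)
{
  std::mt19937 rng(seed);
  std::bernoulli_distribution replace(noise);
  std::uniform_int_distribution<int> any_label(0, num_labels - 1);

  std::vector<int> noisy(labels);
  for (int& l : noisy) {
    if (replace(rng)) l = any_label(rng);  // may coincide with the original label
  }
  return noisy;
}
```

Because the random draw can hit the original label, the expected fraction of labels that actually change in this sketch is noise · (M − 1) / M for M labels, slightly less than the nominal noise level.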

In the next image sequence we use only the normals as features. One can see that normals by themselves are very powerful. However, we will also see that using only normal information has its limitations. The number of iterations per image and the noise level are kept the same.

Lastly, color + normal features are used for segmentation. Notice that using color and normal features converges extremely fast. After only 5 iterations we have a very acceptable result.

In the second segmentation experiment I wanted to push the algorithm to the limit. For this I made the unary potentials extremely noisy: each potential is assigned a random label with 80% probability. The first image sequence shows the segmentation result from left to right after different numbers of iterations [0, 1, 3, 5, 10, 15]. For the first test we again use only color features. We can see that with only color features the algorithm performs poorly, which is not surprising. Changing the weights might help a little; however, to make it a fair comparison I kept the weights and the standard deviations of the Gaussian kernels constant.

Next we use only the normals as features. Using the normals gives surprisingly good results. The background is labeled almost perfectly, as are the objects on the table. The table itself, however, remains unlabeled. At this point I have no good explanation for why this is the case. Further investigation might be interesting.

Lastly we use color + normal features. I did not expect such a good result. The only part that seems to be mislabeled is the table legs.

I implemented the following two papers and modified them to handle n-dimensional data inputs given as a point cloud.

- Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, Philipp Krähenbühl, Vladlen Koltun
- Fast high-dimensional filtering using the permutohedral lattice, A. Adams, J. Baek, and M. A. Davis.

The input for the segmentation algorithm is an ordered or unordered point cloud with XYZRGB fields (this will be extended further). To convert the input point cloud into a Conditional Random Field I am using a modified voxel grid for scene discretization. Each cell with a measurement becomes a node in the graphical model. For now the edge potential features incorporate position as well as color information.
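For reference, the pairwise kernel used by Krähenbühl and Koltun combines an appearance term (position and color) with a smoothness term (position only). The sketch below evaluates that kernel for two nodes. The names `Node`, `KernelParams` and `pairwiseKernel` are hypothetical, the weights and standard deviations are free parameters, and whether the project uses exactly this two-term form is an assumption.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// One CRF node: the position and color of an occupied voxel (hypothetical layout).
struct Node {
  std::array<double, 3> xyz;
  std::array<double, 3> rgb;
};

// Weights and standard deviations of the two Gaussian kernels.
struct KernelParams {
  double w_appearance;  // weight of the position + color kernel
  double theta_pos;     // std. dev. of position in the appearance kernel
  double theta_color;   // std. dev. of color in the appearance kernel
  double w_smooth;      // weight of the position-only kernel
  double theta_smooth;  // std. dev. of position in the smoothness kernel
};

double squaredDistance(const std::array<double, 3>& a, const std::array<double, 3>& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < 3; ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// k(i, j) = w1 * exp(-|p_i - p_j|^2 / (2 theta_pos^2) - |c_i - c_j|^2 / (2 theta_color^2))
//         + w2 * exp(-|p_i - p_j|^2 / (2 theta_smooth^2))
double pairwiseKernel(const Node& a, const Node& b, const KernelParams& p)
{
  const double dp = squaredDistance(a.xyz, b.xyz);
  const double dc = squaredDistance(a.rgb, b.rgb);
  const double appearance =
      p.w_appearance * std::exp(-dp / (2.0 * p.theta_pos * p.theta_pos) -
                                dc / (2.0 * p.theta_color * p.theta_color));
  const double smoothness =
      p.w_smooth * std::exp(-dp / (2.0 * p.theta_smooth * p.theta_smooth));
  return appearance + smoothness;
}
```

Nearby nodes with similar color get a large kernel value and are therefore pushed towards the same label, while a color difference removes the appearance contribution and leaves only the short-range smoothness term.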

As a first step I am using the algorithm for supervised segmentation, with a hand-labeled input scene. In the following picture the input cloud (left) and the labeled cloud (right) can be seen.

The labels are used to initialize the unary potentials of the CRF. The potentials are initialized as follows. A point with an associated label gets a probability of 0.3 that the label is correct. Furthermore, I am assigning 10% of the points a randomly chosen wrong label.
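One way to turn such a label into unary energies is sketched below. It is an illustration under assumptions: the remaining probability mass is spread uniformly over the other labels, and the energy is the negative log-probability. Neither detail is stated above, and the name `unaryEnergies` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unary energies (negative log-probabilities) for one node.
// Requires num_labels >= 2 and label < num_labels.
std::vector<double> unaryEnergies(std::size_t label, std::size_t num_labels,
                                  double p_correct = 0.3)
{
  // Assumption: the remaining mass is shared equally by all other labels.
  const double p_other = (1.0 - p_correct) / static_cast<double>(num_labels - 1);
  std::vector<double> energy(num_labels, -std::log(p_other));
  energy[label] = -std::log(p_correct);
  return energy;
}
```

Under this assumption the assigned label only has the lowest energy if 0.3 exceeds 1 / num_labels, that is, for four or more labels.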

In the next image you can see on the left the noisy point cloud initialized with the unary energies. On the right you can see the result after segmentation.

To be able to construct a Conditional Random Field from a point cloud I am using a voxel grid to discretize the sensor input. This allows working in grid cells rather than with Euclidean distances. I ran into the following problem, which made it necessary to extend the current voxel grid implementation with a new subclass inherited from pcl::VoxelGrid<PointT>. The current implementation of the voxel grid filter does a linear interpolation over the position and the other PointCloud fields of all points lying in the same grid cell. This is problematic for the point type PointXYZRGBL, which assigns a label to every point in the cloud. Interpolating these labels can make them wrong, since they are just simple unsigned integer values. In my implementation I use a voting scheme for the field 'label' instead: if several points lie in the same grid cell, the label with the highest occurrence wins.
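The voting rule for a single grid cell can be sketched with the standard library alone. This is not the code of the modified filter: the function name `majorityLabel`, the tie-breaking rule and the return value for an empty cell are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Returns the label that occurs most often among the points of one grid cell.
// Ties go to the smallest label value; an empty cell returns 0 (both arbitrary choices).
std::uint32_t majorityLabel(const std::vector<std::uint32_t>& labels)
{
  std::map<std::uint32_t, std::size_t> counts;
  for (std::uint32_t l : labels) ++counts[l];

  std::uint32_t best = 0;
  std::size_t best_count = 0;
  for (const auto& entry : counts) {  // std::map iterates in ascending label order
    if (entry.second > best_count) {
      best = entry.first;
      best_count = entry.second;
    }
  }
  return best;
}
```

For a cell containing the labels 1, 1 and 3, voting returns 1, whereas averaging would give about 1.67, which is not a valid label and after rounding would point to label 2, an object that is not present in the cell at all.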

On the left side you can see the behaviour of the modified voxel grid filter. On the right side, which shows the standard voxel grid filter, you can see that the labels are wrong due to the interpolation.