email: | desinghkar@gmail.com |
---|---|

project: | RGBD Segmentation in Cluttered Indoor Environments |

mentor: | Zoltan-Csaba Marton |

Hi! I am a PhD student in Computer Science department at Brown University.

Object segmentation is a prominent computer vision problem, and with the advent of RGBD sensors like Kinect this problem is solved much better than just using the color information. The objective is to develop framework, by integrating existing methods to do effective RGB-D object segmentation in clutter indoor environments. Overall approach is to divide the scene into segments and then group them to form an object using machine learning techniques.

This project aims at getting features for each surface patch and group them to gether based on a machine learning technique. I have been experimenting on the features that could represent the surface patches belonging to an object and how they relate with the inter and intra object surface patches. Below are some of them, with their description and plots.

For all the below analysis please look at the image below and the segments derived out of basic region growing which doesn’t make use of RGB color and purely based on surface curvature.

**ColorHistogram:**
Binning colors is pretty normal thing to get the appearance cues. But they can be done in two ways. One that captures the color channels independently
and the other that captures a dependent binning. In this section I am showing the independent binning of values on three different channels. Independent means at a pixel/point in scene we check RGB values different and increment the bin where these values belong to. Below is the plot that shows how the histograms look like for RGB and HSV color space.

**ColorHistogram 3D:**
Binning colors dependently means having a matrix with RGB in 3 dimensions and increment the value of the (r, g, b) bin only if that combination is
satified. This gives a 3D histogram. Below images show the 3D histogram concatenated to a 1D histogram for RGB, HSV and YUV color space. This binning style is
referred from [1].

**Verticality:**
Verticality actually represents how the surface patch is orientated with respect to the camera viewpoint. Histogram is developed by binning the
difference of angles between the normals and the direction in which the camera is pointing to (i.e the positive z axis for any point cloud from
Kinect). Below is the histogram plot for this feature. However this is not so useful to this segmentation project but more aimed towards the object
discovery and saliency related problems to distinguish the a surface patch from its peers. This is implemented based on [2] which was adopted from [3].

**Dimentionality Compactness:**
This actually shows how compact a surface patch is. One could do a PCA of a surface patch and derive a local frame of reference. This when followed by
creating a bounding box gives the 3 dimensions in which the patch ranges the max. This bounding box will have 3 dimensions and their ranges are computed. Once this ranges (xrange, yrange, zrange) are sorted into min_range, mid_range and max_range, two ratios are computed. 1) min_range / max_range and 2) mid_range / max_range. Below is plot of this histogram for the above mentioned segments. This is implemented based on [2] which was adopted from [3].

**Perspective scores:**
This is the ratio of the area projected in the image to the maximum area spread by the region in 3D. The pixel_range below means the bounding box range in the
particular direction of the image pixels. xrange, yrange and zrange are the dimentions of the 3D bounding box of the surface patch. Note that PCA shouldn’t done here as we are comparing the 3D surface patch with its appearance on the image. This is implemented based on [2] which was adopted from [3]. Below are the elements of the
histogram.

pixel_x_range (in meters) / xrange

pixel_x_range (in meters) / yrange

pixel_x_range (in meters) / zrange

pixel_y_range (in meters) / xrange

pixel_y_range (in meters) / yrange

pixel_y_range (in meters) / zrange

diameter_of_bounding_box_pixels (in meters)/ diameter_of_cuboid_bounding_box_in_3d

area_of_bounding_box (in metersquare) / area_of_the_largest_two_dimentions_in_3d

**Contour Compactness:**

This is the ratio of the perimeter to the area of the region. This is the ratio of number of boundary points computed for a segment to the total number of points in the region. This is implemented based on [2] which was adopted from [3].

**Referrences:**

[1] A.Richtsfeld, T. Mörwald, J. Prankl, M. Zillich and M. Vincze: Learning of Perceptual Grouping for Object Segmentation on RGB-D Data; Journal of Visual Communication and Image Representation (JVCI), Special Issue on Visual Understanding and Applications with RGB-D Cameras, July 2013.

[2] Karthik Desingh, K Madhava Krishna, Deepu Rajan, C V Jawahar, “Depth really Matters: Improving Visual Salient Region Detection with Depth”, BMVC 2013.

[3] Alvaro Collet Romea, Siddhartha Srinivasa, and Martial Hebert, “Structure Discovery in Multi-modal Data : a Region-based Approach”, ICRA 2011.

These features will go into the features module of the pcl_trunk pretty soon. Currently working on the relational features which tells how two surfaces are related to each other. Next blog post should be on that.

Almost close to the end stage of GSoC, but this project has long way to go! Hope to keep pushing stuff till the entire pipeline is up on PCL.

Global Radius-based surface descriptor concatenates the RSD descriptor as discussed in the previous post to represent the complete object. GRSD gives a good description of the 3D shape of the object. Below are the set of objects and its GRSD descriptors i.e. histograms. I have used University of Washington’s “Large scale RGBD dataset” for the experiments.

- For an object whose surface is planar but has 2 different planes in the view
- For an object whose surface is planar but has 1 planes in the view
- For an object whose surface is spherical
- For an object whose surface is cylinderical but doesn’t have any planar surface in view

It can be seen that all the descriptors are different from eachother. Planes and box surfaces are similar as the surface characteristics are similar in this case. Both GRSD and RSD are pushed into the pcl-trunk for people to use. The test files for these two features are also included in the trunk for the basic usage of the same.

Currently working on the NURBS for small surface patches. Since NURBS are already available in PCL we will be looking at how to tailor the same for our needs. After this we plan to work on the features that compute the relationship between the surface patches.

RSD Feature is a local feature histogram that describes the surface local to a query point. There is pcl implementation for this that is available in the features folder. With the help of my mentor I understood the algorithm by which this feature is obtained. To very if this is working perfectly we took a real object whose radius is known and generated the RSD computation on the entire point cloud of the object. This gives RSD Feature Histogram for all the points in the pointcloud. We can also get the min and max radius of the local surface patch around each point in the pointcloud. I generated various combination of parameters to know how the radius computed varies. Below is the object used which has a radius of 3.5cm which is 0.035m

Below are some of the params chosen and their corrresponding effect on the min and max radius in the local surface patch of each point. For Normal Radius search = 0.03 Max_radius = 0.7 (maximum radius after which everything is plane) RSD_radius search = 0.03

For Normal Radius search = 0.03

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03

For Normal Radius search = 0.02

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03 - This is found to be good way for generating histograms

I tried to do MLS smoothing on the point cloud data and then compute the RSD feature which makes the normal computation better and resulting in consistency over all the points on the object surface.

For Normal Radius search = 0.03

Max_radius = 0.7 (maximum radius after which everything is plane)

RSD_radius search = 0.03

For Normal Radius search = 0.03

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03

For Normal Radius search = 0.02

Max_radius = 0.1 (maximum radius after which everything is plane)

RSD_radius search = 0.03 - This is found to be good way for generating histograms

Now I tested out how the actual feature looks like at a point on the sphere to check if it matches with the histogram in the paper. The same is compared between raw point cloud from the kinect and MLS smoothened point cloud. Below is the result of the same.

It was really hard to fix the previous image that it can show the histograms with values and good resolution. So below is the snapshot of the spherical and cylinderical surfaces.

Cylinderical Surface:

Spherical Surface:

Next post will have the details of how GRSD results are and how they differentiate the characteristics of two surfaces. GRSD code from the author will be integrated into the PCL code base. We also plan to categorize the pipeline into modules that fit into the PCL code base as features, surface and segmentation sections. These information will be posted in the next post.