Google sponsored development for 11 projects in the summer of 2011.

For a complete list of all the present and past PCL code sprints please visit http://www.pointclouds.org/blog.

Today Marius has applied the patch for multithreaded search to the git head of FLANN. So, multithreaded search is available now! Give it a try! If you encounter any problems, contact the mailing list and we will try to resolve them as quickly as possible :)

This will also be my last blogpost as this is the end of GSoC. I would like to thank all PCL developers (especially Marius, my mentor) for their help during the summer and for the opportunity they gave me by accepting me for GSoC 2011! It was a blast and you’ll probably see me around!

Cool results using the PFHRGB feature. It is based on the PFH feature which can be found in the following publication:

- R.B. Rusu, N. Blodow, Z.C. Marton, M. Beetz. - “Aligning Point Cloud Views using Persistent Feature Histograms”, In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, September 22-26 2008.

The implementation of this feature in PCL currently uses a histogram of 125 bins. The PFHRGB feature we introduced uses an additional 125 bins for storing color information around the point of interest. By using Pararth’s feature evaluation framework, we concluded from a simple experiment that PFHRGB are more distinguishable than simple PFH features.

The experiment was carried out at 3 different feature search radii. A single cloud from the Kinect was used - the original input as source and a transformed version as target. After subsampling, feature signatures were calculated for each point. Correspondences are made by using a simple nearest neighbor heuristic in feature space. The results are the following:

- PFH vs PFHRGB at search_radius 0.02: 4% vs 24% repeatability
- PFH vs PFHRGB at search_radius 0.05: 37% vs 85% repeatability
- PFH vs PFHRGB at search_radius 0.06: 46% vs 92% repeatability

To apply the newly implemented feature to a real world situation, I recorded 6 frames of my desk and incrementally aligned them by using PFHRGB features and RANSAC. The whole procedure took about 5 minutes for registering 5 pairs of clouds.

Today is the official “pencils down” date for GSoC 2011. My project is now officially “over,” and I’m writing this blog post as a final summary of my work this summer, and as a look into the future of surface reconstruction for PCL.

This summer, I’ve completed a large majority of what I’ve set out to do. I’ve reviewed the surface reconstruction API, implemented Marching Cubes, ported over Poisson Reconstruction, connected our surface reconstruction algorithms to VTK’s surface smoothing techniques, and did a lot of further research into the world of surface reconstruction. All in all, I’m pretty satisfied with the amount of progress that has been made over these past couple months.

That said, I feel there’s still a lot more work to be done. Some tutorials should be written for these surface reconstructions, as well as unit tests and more detailed documentation. The Poisson reconstruction code is basically a copy of Misha’s, so this should be ported further, to utilize our own in house Octree and Marching Cubes code. The Marching Cubes code should also be further extended to handle octree based voxel grids. Finally, there are many other surface reconstruction algorithms out there that should be added to the library.

I’ve had a great time this summer working on PCL. Having come in with very little experience in professional or open source software development, it’s been a great learning experience for me, teaching me a lot about efficient C++ coding as well as code management and design. I’ve also found the topic of surface reconstruction very interesting, and will be continuing to work on improving PCL’s surface reconstruction library in the future. Thanks to all the PCL developers, and see you on the mailing list!

Ah, long time no seen! I’ve been busy with the redo of my exam (business skills, a broad view on economics. Wasn’t really my cup of tea but I think I passed it this time *thumbs up*). Back to GSoC then! Currently I’m wrapping up for the final submission to the Google repos and for the merge with the git head of FLANN: doxygen comments, unit tests using GTest and writing a paragraph in the manual of FLANN on how to use the multithreading search.

I’ll send out another blog post when this is finished and the multithreaded search is is merged succesfully with the git head of FLANN!

I’m supposed to add an example to the Statistical removal tutorial that use pcl::PointXYZRGB instead of pcl::PointXYZ, but the input file provided contains no RGB data, so I need a different file to work with for the example. I’ve also been requested to provide an example of finding the volume of irregularly-shaped objects from PointClouds (such as leaves on a bush), and am now just waiting on the files to be provided so that I can write the example.

I started working on a lot of different things lately. A short description of every little branch will be given here, more details when I reach cool results.

In a previous post, I mentioned the Surfel smoothing algorithm I implemented. A lot of modifications have been made to it, to make it closer to the original paper by Amenta et al. After this, benchmarking it against the Moving Least Squares reconstruction algorithm was necessary. The immediate conclusions are that MLS performs faster and with better reconstruction results and I am still working on understanding why Amenta ‘s smoothing method would be theoretically much more precise. Also, in this section, we are looking into adapting a smoothing algorithm with real-time performance for the Kinect.

Next, I looked into trying to combine a few of the things I wrote this summer for a novel application for PCL. So, I decided to try hand pose/gesture recognition. I have successfully used the pyramid feature matching I talked about previously to get some rough initial classification results. I am now trying to use Support Vector Machines in combination with pyramid matching and different kinds of features in order to train classifiers to recognize hand poses recorded with the kinect. For those interested, I have recorded a rather extensive dataset with 8 different hand postures with 10 different images per class. They are uploaded in the dataset repository of PCL.

SmoothedSurfacesKeypoint is up and running. Just need to add unit tests. What this algorithm does is extract features from a set of smoothed versions of the input cloud by looking at the “height” differences between the points smoothed with different scales.

As for bulky programming tasks, I proposed the pcl_tools idea on the mailing list and contributed with a few tools such as: add_gaussian_noise, compute_cloud_error, mls_smoothing, passthrough_filter. Also, two new apps for the kinect have come to life: openni_mls_smoothing and openni_feature_persistence - which can be found in the apps/ folder.

Not all of them are currently commited to trunk, but I have been working on creating new features by integrating color information into 3D feature signatures. So, today I started using Pararth’s feature benchmarking framework to compare the performance of each feature signature.

Last, but not least, I contacted the Willow Garage people working with Android and trying to help in that direction too.

Over and out!

I’ve successfully ported over Misha’s Poisson reconstruciton code, and here are the results:

This model was generated with the sample data on Misha’s site, which provides both points and normals. In an ideal case, the results look extremely good. When the points alone are ported in without any normal information, the image quality is a bit worse:

These norms were generated in PCL, with a K-NN value of 10. This is a great illustration of how important a good set of surface normals are for accurate reconstruction using implicit surface algorithms like Poisson reconstruction.

A few days ago, I sent a patch containing my GSoC work to Marius, and it can be found in FLANN (git head) now. If you want to try it, you simply have to install CUDA and then build FLANN from git. Instructions on how to use the new index type can be found in the manual. In short, I finished all three mandatory points of my roadmap, but sadly there wasn’t enough time for the implementation of high-dimensional index structures in FLANN.

This was probably my final GSoC update, since I’m leaving on a trip tomorrow and returning on sunday. I really enjoyed working with this project, so I guess you can expect some more improvements on the CUDA FLANN kd tree after GSoC :-)

I have added a tutorial which explains the functionality of the FeatureEvaluationFramework class, for benchmarking feature descriptor algorithms. Also, the tutorial explains a sample use case, which is “determining effect of search radius on FPFHEstimation computations”.

I have mentioned how the Framework can be extended to include testing of many (all?) feature algorithms, quite easily.

I run many times already existing GICP implementation, but unfortunately I couldn’t find out why the algorithm doesn’t converge. This made me think that this happens because of some numerical issues infiltrated into the new code. Needing a ground truth for numerical comparisons we decided to start from original implementation based on ANN and GSL and compare the results of each intermediate algorithm’s step from original implementation and PCL ported version. In what fallows I will try to summarize what I did and my conclusions:

**1.** remodeled the original implementation in order to get it closer to PCL “registration” for an easier future integration

- Break some data structures which made the data access harder and change the computation flow.

**2.** replaced ANN searching with FLANN searching

- I compared ANN vs FLANN results required by the computation of covariance matrix for each point and by the correspondence point searching. They are almost identical. For covariance matrices computation are used 20 neighbors. Both ANN and FLANN returned same neighbors. This allows me to go deeply with future comparisons because I was sure that I am comparing same points but retained in different containers (original and PCL). Notice here that this is not happening all the times, but in not happening case I tried to filter both point clouds and then it worked.

**3.** reimplemented matrix and vector operations based on GSL with their Eigen equivalent. The GSL operations are using original point cloud containers while Eigen operations are using PCL point cloud containers.

- I compared each intermediate result and all the differences between them is less than 1e-10, which I think is acceptable for beginning.

**4.** reimplemented functions necessary for the evaluation of the function to be optimized and function’s Jacobian for the pairs of corresponding points. Also here the GSL operations are using original point cloud containers while Eigen operations are using PCL point cloud containers.

- Additionally from original implementation I called both new functions (evaluation function and it’s Jacobian). After each call, I computed the differences between the values returned by the new implemented functions and their original versions. These differences are less than 1e-10 in both cases: function evaluation and jacobian evaluation.

**5.** replaced the original optimization based on GSL with the new one based on LM. CMINPACK offers multiple variants for minimize the sum of the squares of M nonlinear functions in N variables.

Using LMDER1

- Requires to be hand-computed both the function values and the function Jacobian.
- In the first test scenario, the GICP optimization loop, additionally to original optimization based on GSL, optimizes the same transformation matrix using LMDER1 and the same set of correspondences. The numerical results are quite different. The same transformation optimized by GSL converges to different values if it is optimized by LMDER1.
- In the second test scenario, the GICP optimization loop separately computes the correspondences for GSL optimization and LMDER1 optimization. As in previous scenario the numerical results are quite different. Additionally it seems that LMDER1 get blocked into a local minim.
- In conclusion I think that LMDER1 convergence parameters represent a critical point. I am intending to test in parallel the GSL optimization vs. LMDER1 optimization in order to verify the robustness of their convergences.
- After listening Nizar suggestions I think that I tried to use LMDER1 somehow in and improper way because of reasons related the GICP mathematic model.

Using LMDIF1

- Requires to be hand-computed only the function values. The function Jacobian is computed by a forward-difference approximation.
- As in the second scenario of LMDER1, the GICP optimization loop separately computes the correspondences for GSL optimization and LMDIF1 optimization. The results are almost the same but still the translation components significantly differ.

Using LMSTR1

- Requires to be hand-computed the function values and the Jacobian’s rows.
- As in the case of LMDIF1, the optimization converges to different values than original implementation.

As Zoltan suggested, I have implemented extra step to remove the triangles in the overlapping region given by 2 meshes. The first version proposed by Zoltan is that we will go over the triangles in the first mesh and check if the its center has a nearest neighbor in the other cloud that is closer than the maximum distance from the center to one of the vertices and this nearest neighbor has to be in at least one triangle of other mesh, we will delete this triangle of the first mesh. To search for nearest neighbor, I have used k-nearest neighbor search with k =1.

To delete triangle from a mesh, we set a vertex to FREE if this triangle was only a triangle which the vertex was in. After that update SFN and FFN of other two vertices. If this was not the only triangle the vertex was in, it has to be set to FRINGE. Updating the sfn and ffn here is a bit more tricky. Check if on of the other triangles this vertex is in also has as another vertex one of the other points. Following Zoltan suggested I have implemented three more functions to complete this task.

And here are results:

The 1st mesh:

The 2st mesh which overlapps 1st mesh:

The result after removing overlapped triangles and merging 2 meshes:

It’s been a while since I’ve posted a blog update, but I haven’t been sitting around lately. Since my last blogpost I completed the multithreading of FLANN by making use of the Intel Threading Building Blocks library. This embodies that now all knn- and radiusSearches make use of multiple cores when specified by the user. The parallelism can be exploited when multiple query points are passed trough the knn- or radiusSearch interface and the amount of cores is specified in the SearchParam struct.

Because using Intel TBB provides some additional overhead (using function objects, spawning threads(!), ...) over the normal singlecore FLANN implementation, it is expected that it will only pay off when enough work is being done by the spawned worker threads. How is work increased? Ofcourse as the k and radius parameter increases and/or the more query points are passed through the interface, the more work is introduced and we start benefiting from the multithreaded version at a particular point. Cfr. the GPU implementation of FLANN (Andreas).

**Multithreading results**

Opposed to the GPU implementation, multithreading starts to pay off at much smaller workloads. Depending on the k/radius parameter, multithreaded search can already start paying of at as few as 100 query points. You can have a look at the benchmarkings I did on my i5 450M processor (quadcore; 2 pyshical cores, each being a logical 2 core processor): FLANN TBB payoff comparison. This comparison is actually for knnSearch, but I also did the benchmarkings for radiusSearch and they are exactly the same. This ofcourse makes sense as the radius also just acts as a “work increasing” parameter.

We can see that a fairly consistent speedup of 2x can be achieved when we transcend the payoff barrier. I’ve also made a benchmark for comparing traditional FLANN with FLANN using 2 cores and FLANN using 4 cores: FLANN numof cores comparison. Interesting to see is that introducing twice as many cores doesn’t always double the search time performance as we all intuitively might think. You can see that going from 2 to 4 cores introduces an extra gain in searchtime speedup, but not as significant a speedup as going from 1 to 2 cores.

**Future steps**

Currently I’m wrapping up for the GSoC period, meaning that I’m cleaning code and integrating it nicely into FLANN, writing documentation and unit tests for it and so on. Once this is done the deadline of GSoC will probably be reached.

Things that I want to perfect / examine later on is having a look at how the caching mechanisms in Intel TBB could potentially increase the performance of multithreaded FLANN even further. For example a first thing that I can have a look at is, when building the KDtree, try and see what the effect is of allocating each node or points in that node on single cache lines instead of using the standard malloc procedure, which doesn’t offer any guarantee on this. This way we might already have an increase in searchtime because of the better fitting into and the more effective use of, the cache.

Another more challenging but logical extention of this first optimization is to take into account which query points were assigned to which CPU, such that the warmed up on-processor cache can be used for doing the search through the tree. Potential major speedup can be obtained if large portions of the kdtree can be fit into the on-processor cache. This is detailed optimization work, but truly interesting stuff to look at and speed up FLANN even further.

I’ve successfully ported over Misha’s Poisson reconstruction code into PCL. As of now, the interface is still rough, and needs refinement, however the reconstruction algorithm does run. Unfortunately, I was unable to code my own version of Poisson reconstruction, due to summer of code ending soon, but I think that an in-house implementation would be the best in the long term, since marching cubes is already implemented, and a more complex, octree based marching cubes algorithm could be then used in tandem with any manner of implicit surface algorithm. This would allow future implementations of implicit surface reconstructions to reuse this code and make life that much easier for developers. The way that the Poisson code is structured now, it is a standalone algorithm with its own implementation of marching cubes, matrices, etc.

More details, along with pictures, coming soon!

Sorry for the late post, but the progress since my last post includes a commandline tool for benchmarking feature descriptor algorithms, which I have added under /trunk/test/

On a related note, I have written a simple tool for extracting features from given pointcloud, which is under /tools/extract_feature.cpp. Currently it supports the PFH, FPFH and VFH algorithms, and I will be adding more. An issue I am facing is of how to dynamically select the point_type of the input cloud, depending on the input PCD file, and currently it is converting all input clouds into PointXYZ types.

Sorry for the long time without updates, I was basically working on the GPU tree build and didn’t have any interesting results until now. Basically the result is: Building the kd tree on the GPU can be about 2-3 times faster than the FLANN CPU build method, though I didn’t do any complete benchmarks until now.

A nice side effect is that the GPU tree build leads to a slightly different memory layout that makes tree traversal a bit (~5%) faster than before.

I’m currently starting to polish the code, document it and write tests to verify that it works correctly. If everything goes the way I planned, it should be ready for a merge with FLANN HEAD by the end of the week.