PCL Developers blog

All blog posts

Multithreaded search available in FLANN
Friday, August 26, 2011

Today Marius has applied the patch for multithreaded search to the git head of FLANN. So, multithreaded search is available now! Give it a try! If you encounter any problems, contact the mailing list and we will try to resolve them as quickly as possible :)

This will also be my last blogpost as this is the end of GSoC. I would like to thank all PCL developers (especially Marius, my mentor) for their help during the summer and for the opportunity they gave me by accepting me for GSoC 2011! It was a blast and you’ll probably see me around!

PFHRGB Feature results!
Wednesday, August 24, 2011

Cool results using the PFHRGB feature. It is based on the PFH feature which can be found in the following publication:

  • R.B. Rusu, N. Blodow, Z.C. Marton, M. Beetz. - “Aligning Point Cloud Views using Persistent Feature Histograms”, In Proceedings of the 21st IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, September 22-26 2008.

The implementation of this feature in PCL currently uses a histogram of 125 bins. The PFHRGB feature we introduced uses an additional 125 bins for storing color information around the point of interest. By using Pararth’s feature evaluation framework, we concluded from a simple experiment that PFHRGB features are more distinctive than plain PFH features.

The experiment was carried out at 3 different feature search radii. A single cloud from the Kinect was used: the original input as source and a transformed version of it as target. After subsampling, feature signatures were calculated for each point, and correspondences were established using a simple nearest-neighbor heuristic in feature space. The results are the following:

  • PFH vs PFHRGB at search_radius 0.02: 4% vs 24% repeatability
  • PFH vs PFHRGB at search_radius 0.05: 37% vs 85% repeatability
  • PFH vs PFHRGB at search_radius 0.06: 46% vs 92% repeatability

To apply the newly implemented feature to a real world situation, I recorded 6 frames of my desk and incrementally aligned them by using PFHRGB features and RANSAC. The whole procedure took about 5 minutes for registering 5 pairs of clouds.

../_images/pfhrgb_registration_clouds.png ../_images/pfhrgb_registration_color.png
Pencils Down
Monday, August 22, 2011

Today is the official “pencils down” date for GSoC 2011. My project is now officially “over,” and I’m writing this blog post as a final summary of my work this summer, and as a look into the future of surface reconstruction for PCL.

This summer, I completed a large majority of what I set out to do. I reviewed the surface reconstruction API, implemented Marching Cubes, ported over Poisson Reconstruction, connected our surface reconstruction algorithms to VTK’s surface smoothing techniques, and did a lot of further research into the world of surface reconstruction. All in all, I’m pretty satisfied with the amount of progress that has been made over these past couple of months.

That said, I feel there’s still a lot more work to be done. Some tutorials should be written for these surface reconstructions, as well as unit tests and more detailed documentation. The Poisson reconstruction code is basically a copy of Misha’s, so this should be ported further to utilize our own in-house Octree and Marching Cubes code. The Marching Cubes code should also be extended further to handle octree-based voxel grids. Finally, there are many other surface reconstruction algorithms out there that should be added to the library.

I’ve had a great time this summer working on PCL. Having come in with very little experience in professional or open source software development, it’s been a great learning experience for me, teaching me a lot about efficient C++ coding as well as code management and design. I’ve also found the topic of surface reconstruction very interesting, and will be continuing to work on improving PCL’s surface reconstruction library in the future. Thanks to all the PCL developers, and see you on the mailing list!

Wrapping up
Thursday, August 18, 2011

Ah, long time no see! I’ve been busy with the redo of my exam (business skills, a broad view of economics; not really my cup of tea, but I think I passed it this time, thumbs up). Back to GSoC then! Currently I’m wrapping up for the final submission to the Google repos and for the merge with the git head of FLANN: doxygen comments, unit tests using GTest, and a paragraph in the FLANN manual on how to use the multithreaded search.

I’ll send out another blog post when this is finished and the multithreaded search is merged successfully with the git head of FLANN!

Thursday, August 18, 2011

I’m supposed to add an example to the Statistical removal tutorial that uses pcl::PointXYZRGB instead of pcl::PointXYZ, but the input file provided contains no RGB data, so I need a different file to work with for the example. I’ve also been asked to provide an example of finding the volume of irregularly-shaped objects in point clouds (such as leaves on a bush), and am now just waiting on the files to be provided so that I can write the example.

Work updates - a lot of different topics
Thursday, August 18, 2011

I have started working on a lot of different things lately. A short description of every little branch is given here; more details will follow when I reach cool results.

  1. In a previous post, I mentioned the surfel smoothing algorithm I implemented. A lot of modifications have been made to bring it closer to the original paper by Amenta et al. After this, benchmarking it against the Moving Least Squares reconstruction algorithm was necessary. The immediate conclusion is that MLS is faster and produces better reconstruction results, and I am still working on understanding why Amenta’s smoothing method should theoretically be much more precise. Also in this area, we are looking into adapting a smoothing algorithm with real-time performance for the Kinect.

  2. Next, I looked into combining a few of the things I wrote this summer into a novel application for PCL. So, I decided to try hand pose/gesture recognition. I have successfully used the pyramid feature matching I talked about previously to get some rough initial classification results. I am now trying to use Support Vector Machines in combination with pyramid matching and different kinds of features in order to train classifiers to recognize hand poses recorded with the Kinect. For those interested, I have recorded a rather extensive dataset of 8 different hand postures with 10 different images per class. They are uploaded in the dataset repository of PCL.

  3. SmoothedSurfacesKeypoint is up and running. Just need to add unit tests. What this algorithm does is extract features from a set of smoothed versions of the input cloud by looking at the “height” differences between the points smoothed with different scales.

  4. As for bulky programming tasks, I proposed the pcl_tools idea on the mailing list and contributed a few tools such as: add_gaussian_noise, compute_cloud_error, mls_smoothing, passthrough_filter. Also, two new apps for the Kinect have come to life: openni_mls_smoothing and openni_feature_persistence - both can be found in the apps/ folder.

  5. Not all of them are committed to trunk yet, but I have been working on creating new features by integrating color information into 3D feature signatures. So, today I started using Pararth’s feature benchmarking framework to compare the performance of each feature signature.

  6. Last, but not least, I contacted the Willow Garage people working with Android and am trying to help in that direction too.

Over and out!

Poisson Reconstruction Results
Wednesday, August 17, 2011

I’ve successfully ported over Misha’s Poisson reconstruction code, and here are the results:

../_images/poisson_horse1.png ../_images/poisson_horse2.png

This model was generated with the sample data on Misha’s site, which provides both points and normals. In this ideal case, the results look extremely good. When the points alone are fed in without any normal information, the quality is a bit worse:


These normals were generated in PCL, with a K-NN value of 10. This is a great illustration of how important a good set of surface normals is for accurate reconstruction using implicit surface algorithms like Poisson reconstruction.

GPU KD Tree Integrated Into FLANN
Tuesday, August 16, 2011

A few days ago, I sent a patch containing my GSoC work to Marius, and it can be found in FLANN (git head) now. If you want to try it, you simply have to install CUDA and then build FLANN from git. Instructions on how to use the new index type can be found in the manual. In short, I finished all three mandatory points of my roadmap, but sadly there wasn’t enough time for the implementation of high-dimensional index structures in FLANN.

This was probably my final GSoC update, since I’m leaving on a trip tomorrow and returning on Sunday. I really enjoyed working on this project, so I guess you can expect some more improvements on the CUDA FLANN kd-tree after GSoC :-)

Tutorial For Using The Benchmarking Class
Monday, August 15, 2011

I have added a tutorial which explains the functionality of the FeatureEvaluationFramework class, for benchmarking feature descriptor algorithms. Also, the tutorial explains a sample use case, which is “determining effect of search radius on FPFHEstimation computations”.

I have also described how the framework can quite easily be extended to cover the testing of many (all?) feature algorithms.

The convergence of GICP based on Levenberg–Marquardt optimizer
Sunday, August 14, 2011

I ran the already-existing GICP implementation many times, but unfortunately I couldn’t find out why the algorithm doesn’t converge. This made me think that it happens because of some numerical issues that infiltrated the new code. Needing a ground truth for numerical comparisons, we decided to start from the original implementation based on ANN and GSL, and to compare the results of each intermediate step of the algorithm between the original implementation and the PCL ported version. In what follows I will try to summarize what I did and my conclusions:

1. remodeled the original implementation in order to get it closer to PCL “registration” for easier future integration

  • Broke up some data structures which made data access harder, and changed the computation flow.

2. replaced ANN searching with FLANN searching

  • I compared the ANN and FLANN results required for computing the covariance matrix of each point and for the correspondence search. They are almost identical. The covariance matrices are computed from 20 neighbors, and both ANN and FLANN returned the same neighbors. This allowed me to go deeper with the following comparisons, because I was sure that I was comparing the same points, just stored in different containers (original and PCL). Note that this does not happen every time; in the cases where it didn’t, filtering both point clouds first made it work.

3. reimplemented the GSL-based matrix and vector operations with their Eigen equivalents. The GSL operations use the original point cloud containers while the Eigen operations use the PCL point cloud containers.

  • I compared each intermediate result, and all the differences between them are less than 1e-10, which I think is acceptable for a start.

4. reimplemented the functions needed to evaluate the function being optimized and its Jacobian over the pairs of corresponding points. Here too, the GSL operations use the original point cloud containers while the Eigen operations use the PCL point cloud containers.

  • In addition to the original implementation, I called both newly implemented functions (the evaluation function and its Jacobian). After each call, I computed the differences between the values returned by the new functions and their original versions. These differences are less than 1e-10 in both cases: function evaluation and Jacobian evaluation.

5. replaced the original GSL-based optimization with the new one based on LM. CMINPACK offers multiple variants for minimizing the sum of the squares of M nonlinear functions in N variables.

  • Using LMDER1:
      • Requires both the function values and the function Jacobian to be computed by hand.
      • In the first test scenario, the GICP optimization loop, in addition to the original GSL-based optimization, optimizes the same transformation matrix using LMDER1 and the same set of correspondences. The numerical results are quite different: the same transformation converges to different values when optimized by LMDER1 than by GSL.
      • In the second test scenario, the GICP optimization loop computes the correspondences separately for the GSL optimization and the LMDER1 optimization. As in the previous scenario, the numerical results are quite different. Additionally, LMDER1 seems to get stuck in a local minimum.
      • In conclusion, I think the LMDER1 convergence parameters are a critical point. I intend to test the GSL optimization against the LMDER1 optimization in parallel in order to verify the robustness of their convergence.
      • After listening to Nizar’s suggestions, I think I was using LMDER1 in an improper way, for reasons related to the GICP mathematical model.
  • Using LMDIF1:
      • Requires only the function values to be computed by hand; the function Jacobian is computed by a forward-difference approximation.
      • As in the second LMDER1 scenario, the GICP optimization loop computes the correspondences separately for the GSL optimization and the LMDIF1 optimization. The results are almost the same, but the translation components still differ significantly.
  • Using LMSTR1:
      • Requires the function values and the Jacobian’s rows to be computed by hand.
      • As with LMDIF1, the optimization converges to different values than the original implementation.

Finished and back to school
Saturday, August 13, 2011

As Zoltan suggested, I have implemented an extra step to remove the triangles in the region where 2 meshes overlap. In the first version Zoltan proposed, we go over the triangles of the first mesh; if a triangle’s center has a nearest neighbor in the other cloud that is closer than the maximum distance from the center to one of its vertices, and that neighbor belongs to at least one triangle of the other mesh, we delete the triangle from the first mesh. For the nearest neighbor lookup, I used k-nearest neighbor search with k = 1.

To delete a triangle from a mesh, we set a vertex to FREE if the deleted triangle was the only triangle that vertex belonged to, and then update the SFN and FFN of the other two vertices. If it was not the vertex’s only triangle, the vertex has to be set to FRINGE. Updating the SFN and FFN is a bit trickier in this case: we check whether one of the other triangles containing this vertex also has one of the other points as a vertex. Following Zoltan’s suggestion, I implemented three more functions to complete this task.

And here are results:

The 1st mesh:


The 2nd mesh, which overlaps the 1st mesh:


The result after removing the overlapping triangles and merging the 2 meshes:

Multithreaded FLANN
Friday, August 12, 2011

It’s been a while since I’ve posted a blog update, but I haven’t been sitting around. Since my last blog post I completed the multithreading of FLANN by making use of the Intel Threading Building Blocks library. This means that all knn and radius searches now make use of multiple cores when the user requests it. The parallelism is exploited when multiple query points are passed through the knnSearch or radiusSearch interface and the number of cores is specified in the SearchParams struct.

Because using Intel TBB introduces some additional overhead (function objects, spawning threads(!), ...) over the normal single-core FLANN implementation, it is expected to only pay off when enough work is done by the spawned worker threads. How does work increase? Of course, as the k or radius parameter increases and/or more query points are passed through the interface, more work is introduced, and at a particular point we start benefiting from the multithreaded version. Cf. the GPU implementation of FLANN (Andreas).

Multithreading results

As opposed to the GPU implementation, multithreading starts to pay off at much smaller workloads. Depending on the k/radius parameter, multithreaded search can already start paying off at as few as 100 query points. You can have a look at the benchmarks I did on my i5 450M processor (2 physical cores, each presenting 2 logical cores): FLANN TBB payoff comparison. This comparison is actually for knnSearch, but I also ran the benchmarks for radiusSearch and they are exactly the same. This of course makes sense, as the radius also just acts as a “work increasing” parameter.

We can see that a fairly consistent speedup of 2x can be achieved once we pass the payoff barrier. I’ve also made a benchmark comparing traditional FLANN with FLANN using 2 cores and FLANN using 4 cores: FLANN number of cores comparison. It is interesting to see that introducing twice as many cores doesn’t always double the search performance, as we all might intuitively think. Going from 2 to 4 cores gives an extra gain in search time, but not as significant a speedup as going from 1 to 2 cores.

Future steps

Currently I’m wrapping up for the GSoC period, meaning that I’m cleaning code and integrating it nicely into FLANN, writing documentation and unit tests for it and so on. Once this is done the deadline of GSoC will probably be reached.

One thing I want to perfect / examine later on is how the caching mechanisms in Intel TBB could increase the performance of multithreaded FLANN even further. For example, a first thing to look at when building the kd-tree is the effect of allocating each node, or the points in a node, on single cache lines instead of using the standard malloc procedure, which offers no guarantee on this. This way we might already see an improvement in search time because of a better fit into, and more effective use of, the cache.

Another more challenging but logical extension of this first optimization is to take into account which query points were assigned to which CPU, so that the warmed-up on-processor cache can be reused when searching through the tree. A major speedup is possible if large portions of the kd-tree fit into the on-processor cache. This is detailed optimization work, but truly interesting stuff to look at to speed up FLANN even further.

Misha’s Poisson Reconstruction
Thursday, August 11, 2011

I’ve successfully ported Misha’s Poisson reconstruction code into PCL. As of now, the interface is still rough and needs refinement, but the reconstruction algorithm does run. Unfortunately, I was unable to code my own version of Poisson reconstruction, due to summer of code ending soon, but I think an in-house implementation would be best in the long term: marching cubes is already implemented, and a more complex, octree-based marching cubes algorithm could then be used in tandem with any manner of implicit surface algorithm. This would allow future implementations of implicit surface reconstructions to reuse this code and make life that much easier for developers. As the Poisson code is structured now, it is a standalone algorithm with its own implementation of marching cubes, matrices, etc.

More details, along with pictures, coming soon!

Commandline Tool For Benchmarking Features
Sunday, August 07, 2011

Sorry for the late post, but the progress since my last post includes a commandline tool for benchmarking feature descriptor algorithms, which I have added under /trunk/test/.

On a related note, I have written a simple tool for extracting features from a given point cloud, which is under /tools/extract_feature.cpp. Currently it supports the PFH, FPFH and VFH algorithms, and I will be adding more. An issue I am facing is how to dynamically select the point type of the input cloud depending on the input PCD file; currently it converts all input clouds into PointXYZ types.

Nearly Done
Sunday, August 07, 2011

Sorry for the long time without updates; I was basically working on the GPU tree build and didn’t have any interesting results until now. The result is: building the kd-tree on the GPU can be about 2-3 times faster than the FLANN CPU build method, though I haven’t done any complete benchmarks yet.

A nice side effect is that the GPU tree build leads to a slightly different memory layout that makes tree traversal a bit (~5%) faster than before.

I’m currently starting to polish the code, document it and write tests to verify that it works correctly. If everything goes the way I planned, it should be ready for a merge with FLANN HEAD by the end of the week.

Work status
Saturday, August 06, 2011

Lately I have been working on integrating the octree GPU functions with the search class, since GPU functions only have an advantage when we have queries in batch mode. For that, a module as mentioned in my previous post was written, which takes a vector of queries as input and calls the appropriate GPU function. Right now the GPU function call is not working correctly, which I am still working on.

KdTree Search tutorial complete
Thursday, August 04, 2011

So I finished up the KdTree tutorial, and added a theoretical primer to it to explain a little bit about what’s going on. This addition should be live on the tutorial page as soon as the website updates.

Combine mesh update and texture mapping
Thursday, August 04, 2011

A small update on combining mesh update and texture mapping. I just created the function that will update the texture mesh when a new point cloud is added to the mesh. Now I will move to the final part of the project.

Update the mesh without recreating the entire mesh
Wednesday, August 03, 2011

It’s been such a long time since I updated my progress after finishing the 1st part. I am working on part II of my roadmap. I just finished implementing the surface mesh update for when a new point cloud is added to an existing surface mesh, without recreating the entire mesh. Basically, the implementation of the greedy triangulation algorithm is based on the algorithm in the paper “A fast and efficient projection-based approach for surface reconstruction”. The terminology is that, at any given state of the algorithm, each data point is marked as a FREE, FRINGE, BOUNDARY or COMPLETED point. To update the mesh without recreating it from scratch for an updated cloud, as Zoltan suggested, we assign the new points the FREE state, change the states of BOUNDARY points to FRINGE, and enter the triangulation loop, which keeps the current mesh and only adds the new triangles.

To test this function, I have split the bunny point cloud dataset into 2 point clouds.

This figure shows the 2 meshes after using the greedy triangulation algorithm on the 2 separate datasets.


And here is the result of the updated mesh after one point cloud is added to the existing mesh.


And here is the original mesh before splitting.

KdTree search tutorial.
Tuesday, August 02, 2011

At the moment I’m working on a tutorial for the KdTree module. It is nearly done; I just need to finish up the theoretical primer and then it will be completely good to go. I’ve also gotten several emails regarding the tutorials that I have already written, which I need to sit down and respond to. I believe one is about the RANSAC tutorial and the other has to do with my concave/convex hull tutorial.

VTK Smoothing Integration
Monday, August 01, 2011

I’ve finished integrating VTK smoothing algorithms into PCL. There is now an object, VtkSmoother, that takes in a PolygonMesh and outputs a new, smoothed PolygonMesh. The algorithm first optionally subdivides the polygons into smaller triangles, using either a linear, loop, or butterfly subdivision filter. It then runs either VTK’s Laplacian or WindowedSinc smoothing algorithm, and outputs the result into another PolygonMesh. To read more on these algorithms, the VTK documentation gives good information on them: http://www.vtk.org/doc/release/5.6/html/.

I’ve spent most of my time analyzing the WindowedSinc smoothing algorithm, and the different parameters such as the number of iterations, and the size of the window. Also, I’ve taken a look at the different subdivision methods.

On the Bunny dataset, there aren’t many differences between the Linear, Loop, and Butterfly subdivision methods:

../_images/bunny_smooth_linear.png ../_images/bunny_smooth_loop.png ../_images/bunny_smooth_butterfly.png

The difference is very slight between them all, in both results and in computational time.

The number of iterations affects how much smoothing is performed. The Windowed Sinc algorithm generally converges quickly, so the number of iterations doesn’t affect the results much, and 20 iterations is usually sufficient. Here are the results after 5, 10, and 15 iterations:

../_images/bunny_smooth_5.png ../_images/bunny_smooth_10.png ../_images/bunny_smooth_15.png

The window size also affects the amount of smoothing, and is the most effective way of changing the results. In general, values between 0.01 and 0.001 work well. Here are results with 0.1, 0.01, and 0.001 window size:

../_images/bunny_smooth_0p1.png ../_images/bunny_smooth_0p01.png ../_images/bunny_smooth_0p001.png

One improvement is to include hole filling for the mesh. Other work integrating more polygon mesh algorithms from VTK is also possible, and should be straightforward in the future.

Conclusions for integration after some more detailed benchmarking
Wednesday, July 27, 2011

Since I mentioned in my earlier blog post that I would investigate why SC2 and Libnabo show smaller build times than FLANN, here is the conclusion on integration (and some other interesting stuff I picked up along the way):

The larger build times of FLANN

SplitCloud2 and Libnabo seemed to have smaller build times than FLANN, but this was due to using the PCL-wrapped FLANN version instead of FLANN directly. The larger build time of the PCL-wrapped FLANN is due to a copy of the point cloud being made in the PCL kd-tree wrapper, which really should be avoided. You can see the difference in build time between wrapped and direct FLANN in these benchmarking figures (SplitCloud2 is also included):


You can see now that it is in no way beneficial (in realistic scenarios) to choose SplitCloud2 over FLANN.

Equivalent kdtrees in libnabo and FLANN

I would also like to mention that I did some refactoring of the benchmark framework in my trunk. For the Libnabo and FLANN benchmarks I added a way to easily build kd-trees with the same leafsize (number of points in a leaf node), so that benchmarks can be created where the two libraries always fight with equal weapons. For the default leafsize currently used in PCL (=15), we can see that FLANN performs better on both measurements (build and search time):


For different meaningful leafsizes, FLANN keeps outperforming Libnabo, meaning that integrating Libnabo (or certain aspects of it) would not be beneficial in any way.

Optimal leafsize for kdtree

It also turns out that some more performance can be gained by changing the leafsize used for building the FLANN kd-tree. Changing the leafsize from 15 to 50 [1] improves the kd-tree build time by approx 20%, while radius search times only incur an increase of approx 5%, and knn search times an increase of approx 5-10%. This is a valuable insight that can be used to choose a leafsize that balances the build and search times for its particular use in PCL. The results of these benchmarks are linked below for those who are interested:




Multithreading of FLANN

Sorry for the late update on the final conclusion of integration for Libnabo and SC2; this is because I have been context switching between multithreading development and interpreting benchmarking results. Currently I’ve set up a test project that locates Intel TBB using pkg-config and a FindTBB.cmake file I wrote. This is a clean and basic project that I can easily expand while developing a multithreaded variant of the SingleKDTreeIndex in FLANN, which I’m working on as we speak. The goal is to highly optimise this index; when CMake detects that Intel TBB is installed on the system, it will maximally exploit the capabilities of the hardware platform.

[1] I noted from the benchmarks that a leafsize of 50 provides a good balance between kd-tree build time improvement and search time degradation; whether the current leafsize value should be changed is being discussed on the developer mailing list. To that end, some more fine-grained benchmarks could be generated to reach a well-founded consensus.

First version of registration visualizer
Tuesday, July 26, 2011

The first version of the RegistrationVisualizer is ready and can be found in the visualization module. In order to be used, it requires setting:

  • the registration method whose intermediate steps will be rendered
  • the maximum number of correspondence pairs between the intermediate point cloud and the target point cloud. These pairs will be displayed as lines. Note that if the maximum number of correspondences is not set, the entire set of correspondences will be displayed, and the visualization will most probably freeze as the registration algorithm progresses. For the future I intend to speed up the correspondence lines remove/update sequence.

During the registration process, a visualization window with two viewports is opened. The left viewport displays:

  • the initial positions of the source (red) and target (blue) point clouds.

The right viewport displays:

  • the target point cloud (blue)
  • the intermediate positions of the source point cloud (yellow), obtained after applying the estimated rigid transformation
  • the correspondences, between the points of the intermediate point cloud and the points of the target point cloud, used for estimating the rigid transformation. These correspondences are drawn as randomly colored lines.
../_images/RegistrationVisualizer.png ../_images/RegistrationVisualizerBun04.png

Currently the RegistrationVisualizer class uses PCLVisualizer. Once I succeed in running the PCLVisualizer display in a separate thread, RegistrationVisualizer will be able to inherit from PCLVisualizer.

Here is a code snippet showing how to use this class:

// Create the registration visualizer which will display points of type pcl::PointXYZ
pcl::RegistrationVisualizer<pcl::PointXYZ, pcl::PointXYZ> registrationVisualizer;

// Set the maximum number of correspondences which will be displayed. Note that if
// this number is greater than 30 the visualization thread is slowed down and not all
// intermediate results will be displayed
registrationVisualizer.setMaximumDisplayedCorrespondences (30);

// Start the display window

// Prepare the registration method to be visualized

// ...

pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;

// ...

// Register the registration method
registrationVisualizer.setRegistration (icp);

// Start running the registration method
icp.align (source_aligned);

// Print the registration results

// ...

// Use a loop to enable examination of final results
while (true)
  boost::this_thread::sleep (boost::posix_time::seconds (1));

I also added a unit test to the visualization tools in the visualization module. I ran it on trunk/test/bun0.pcd and trunk/test/bun4.pcd and obtained the second screenshot above.

Radius Search and GPU Tree Build
Tuesday, July 26, 2011

My current status: radius search works, but could use some speed improvements... GPU tree build: a naive implementation basically works, but would only be about 25-50% faster than the CPU version. Zhou et al. achieved a speedup of about 8x, so I'm starting over with an approach that is closer to theirs. (I'm not trying to implement their approach exactly, as some implementation details are missing from the paper, so I don't know for sure how they implemented it...)

RANSAC tutorial done, moving on
Tuesday, July 26, 2011

At last the Random Sample Consensus tutorial is done. It contains two separate examples: one using the plane sample consensus model, and the other using the spherical sample consensus model. I was trying to use a cylindrical model, but since I have not yet learned how to generate surface normals I was unable to continue, so I went with the spherical model instead. Now I am moving on; my mentor, Vincent, suggested that I work on a tutorial for either the Range Images module or the KdTree module, so that is what I am going to begin working on.

Octree GPU functions
Sunday, July 24, 2011

I started by reading the octree search functions with an eye toward porting them to the GPU. For this I have been going through the GPU functions written by Anatoly over the last few days. Besides this, I have added two functions which will act as an interface to the CUDA search functions.

virtual int
radiusSearchGPU (const std::vector<PointT>& points, double radius, std::vector<std::vector<int> >& k_indices,
                 std::vector<std::vector<float> >& k_distances, int max_nn = -1) const;

virtual int
nearestKSearchGPU (const std::vector<PointT>& points, int k, std::vector<std::vector<int> >& k_indices,
                   std::vector<std::vector<float> >& k_sqr_distances);

Keeping in mind that GPU power can only be leveraged when multiple queries are searched at once, I have used a vector of points as the input to these functions. Next I'll try to implement some of the octree search functions on the GPU.

Interest Region Extraction and other things
Saturday, July 23, 2011

I have not updated my blog for a substantial period because I had not done any cool things until the last few days. I have spent most of my time polishing the code I have written so far this summer. All the 'boring' bits (documentation, unit testing, small bug fixes and optimizations here and there) have been eating up my time.

As for the more interesting subjects: I have created a new architecture, as agreed upon with other PCL developers, for the MultiscaleFeaturePersistence. It can easily be used with any type of feature extractor/descriptor and I am attaching an example of the results using FPFH, as it was done in the paper describing the method:

  • Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, and Michael Beetz - “Persistent Point Feature Histograms for 3D Point Clouds”, Proceedings of the 10th International Conference on Intelligent Autonomous Systems (IAS-10), 2008, Baden-Baden, Germany.

In a few words, this algorithm should extract only the features whose signatures are distinguishable from the mass of features found in the respective scene, such as corners in an office scene like the one in the image. (Please note that the color image is not calibrated with the depth image on my Kinect under OS X; I did not find a way to calibrate it properly.)

../_images/office_multiscale_feature_persistence_fpfh.png ../_images/table_scene_multiscale_feature_persistence.png

Another thing worth mentioning is the StatisticalMultiscaleInterestRegionExtraction, inspired by the work Ranjith Unnikrishnan did for his PhD thesis. I have polished and tested the algorithm and it is now in working state. In a previous blog post I mentioned that I need geodesic distances between all point pairs inside the input point cloud, as well as geodesic radius searches, in order to do the statistical computations. It turns out that the method I initially went for is considered one of the efficient ones: build a graph in which each point is connected by edges to its K nearest neighbors (I chose 16 as a decent number; the results made sense). Then I create an NxN distance matrix as the output of Johnson's shortest-path algorithm for sparse graphs. For radius searches, I simply go through a whole row of the distance matrix and extract only the entries below my threshold (an O(N) operation that could be improved to O(log N) with binary trees, at the expense of some additional work during initialization). The result, applied to the Stanford dragon:
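The graph-based approach above can be sketched in a few dozen lines. This is a self-contained illustration, not the actual PCL code, and all names in it are hypothetical: since the edge weights (Euclidean distances) are non-negative, Johnson's algorithm effectively reduces to one Dijkstra run per source point, which is what the sketch does, and the kNN search is done by brute force for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Point { double x, y, z; };

static double euclidean (const Point &a, const Point &b)
{
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt (dx * dx + dy * dy + dz * dz);
}

// Build the kNN graph: every point gets an edge to its k nearest neighbors,
// weighted by Euclidean distance (edges are kept undirected).
static std::vector<std::vector<std::pair<int, double> > >
buildKnnGraph (const std::vector<Point> &pts, int k)
{
  const int n = static_cast<int> (pts.size ());
  std::vector<std::vector<std::pair<int, double> > > adj (n);
  for (int i = 0; i < n; ++i)
  {
    std::vector<std::pair<double, int> > cand;
    for (int j = 0; j < n; ++j)
      if (j != i)
        cand.push_back (std::make_pair (euclidean (pts[i], pts[j]), j));
    std::sort (cand.begin (), cand.end ());
    for (int m = 0; m < k && m < static_cast<int> (cand.size ()); ++m)
    {
      adj[i].push_back (std::make_pair (cand[m].second, cand[m].first));
      adj[cand[m].second].push_back (std::make_pair (i, cand[m].first));
    }
  }
  return adj;
}

// All-pairs geodesic distances: one Dijkstra run per source point.
static std::vector<std::vector<double> >
allPairsGeodesic (const std::vector<Point> &pts, int k)
{
  const std::vector<std::vector<std::pair<int, double> > > adj = buildKnnGraph (pts, k);
  const int n = static_cast<int> (pts.size ());
  const double inf = std::numeric_limits<double>::infinity ();
  std::vector<std::vector<double> > D (n, std::vector<double> (n, inf));
  for (int s = 0; s < n; ++s)
  {
    typedef std::pair<double, int> QItem;  // (distance, node)
    std::priority_queue<QItem, std::vector<QItem>, std::greater<QItem> > q;
    D[s][s] = 0.0;
    q.push (std::make_pair (0.0, s));
    while (!q.empty ())
    {
      const double d = q.top ().first;
      const int u = q.top ().second;
      q.pop ();
      if (d > D[s][u])
        continue;  // stale queue entry
      for (std::size_t e = 0; e < adj[u].size (); ++e)
      {
        const int v = adj[u][e].first;
        const double w = adj[u][e].second;
        if (d + w < D[s][v])
        {
          D[s][v] = d + w;
          q.push (std::make_pair (D[s][v], v));
        }
      }
    }
  }
  return D;
}

// Geodesic radius search: a linear scan over one row of the matrix.
static std::vector<int>
geodesicRadiusSearch (const std::vector<std::vector<double> > &D, int query, double radius)
{
  std::vector<int> result;
  for (std::size_t j = 0; j < D[query].size (); ++j)
    if (D[query][j] <= radius)
      result.push_back (static_cast<int> (j));
  return result;
}
```

For a chain of points the geodesic distance accumulates along the graph edges even when the Euclidean distance would be smaller, which is exactly the property the statistical computations need.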

All benchmarking results in one post!
Thursday, July 21, 2011

UPDATE: a debug build sneaked up on me and polluted the results... Hence this final benchmarking update. Now it's a fair battle between the libraries and FLANN :)

The automated benchmarking framework is finished and you can all find it here: http://svn.pointclouds.org/flann/nickon/trunk/automated%20benchmark/. Please read the README file that is included and everything should be clear after that :) The included python scripts are able to draw graphs of the benchmarking results and bundle all of them into a single html page. Have a look at the scripts I used for benchmarking libnabo and nnn and you should be ready to easily create your own benchmark automation.


The results for libnabo can be found by clicking on this link: Libnabo vs. FLANN. Brute-force search times for the libnabo library are intentionally not plotted in the graphs, since they would force a different y-axis scale and make it hard to compare the kd-tree implementations of libnabo with the one in FLANN. The actual search times of the brute-force search can still be found in the results table above the respective graph.

The results indicate that FLANN is still the faster library for kNN search compared to libnabo. For almost every point cloud FLANN 'outperforms' libnabo in terms of search times. Outperforming might be an overstatement here, because the search times are really close, but FLANN still performs a tad better than libnabo. When we look at build times, libnabo gains some ground. Of course the build time of the brute-force search is just a fraction of a second (only a NearestNeighbourSearch object is initialized), but the kd-tree indices of libnabo are consistently built faster than the kd-tree index built by FLANN.

FLANN's better search-time performance could be due to a different and hence better-constructed kd-tree. Otherwise, the faster search times can be attributed to other optimizations in FLANN, and we could benefit from building the tree the way libnabo does.


The results for NNN can be found by clicking on this link: NNN vs. FLANN (realistic range). The same comment applies here. To avoid overloading the graph, only the search times of the optimized versions of NNN, SplitCloud (SC) and SplitCloud2 (SC2) are plotted. The radius ranges considered in this 'realistic' scenario contain up to 15% of the entire point cloud.

We can see that for all datasets, FLANN performs better at smaller radii, because the kd-tree exploits the sparsity of the point cloud. Noticeably, the SplitCloud2 method comes pretty close to the search times of FLANN! We can also see that the build time of SplitCloud2 is always smaller than the build time of the FLANN kd-tree; depending on the point cloud distribution, it can even be half of FLANN's build time.

To get a view of the behaviour of the naive methods at larger radii, I've also included a 'zoomed out' view of the graphs. You can find them here: NNN vs. FLANN (full range). We notice that as we move to larger radii, the benefit of the more naive methods (especially NNN, which doesn't use any data structure at all) shows [1]. However, these radii are not very realistic scenarios, so this is really just to show the trend of these methods.

So, to conclude the benchmarking of NNN: for realistic radius ranges, the kd-tree approach of FLANN performs better than any naive method. We must however note that the SplitCloud2 method is a good tradeoff, since its build time is noticeably smaller than FLANN's. For few searches this might prove valuable.


Now, since phase 2 is all about integration of the previously mentioned libraries, but they don't introduce any real performance gains w.r.t. FLANN (except for the faster build times of libnabo, which I will investigate), I can immediately move forward to phase 3, which is the multithreading of FLANN. I have already initiated this phase (sorry for the late blog post on the libnabo results, my bad) and am currently evaluating Intel TBB for managing optimal thread spawning, cache coherency, etc. for the FLANN library.

[1] I already mentioned that the larger the radii become, the less benefit the SplitCloud methods have, because they construct almost entirely overlapping point clouds.

VTK Smoothing Results
Wednesday, July 20, 2011

This week I've been working on integrating the VTK surface smoothing algorithms into the surface reconstruction code. I've successfully converted a PCL PolygonMesh into VTK format and run the subdivision and surface smoothing algorithms on it, using vtkLinearSubdivisionFilter and vtkWindowedSincPolyDataFilter for these tasks. The results for the gp3 bunny are very nice:

../_images/bunny_gp3_nosmooth.png ../_images/bunny_gp3_smooth_0p001.png

There are also parameters you can tweak for the surface smoothing filter. Here I change the pass band from 0.001 to 0.1:


More updates to come as I implement more smoothing features.

Debugging GICP and visualization of registration process
Wednesday, July 20, 2011

A lot of time has passed since my last post. In the past days I tested the GICP implementation committed by Nizar and tried to display the registration process of two point clouds.

In order to visualize the registration process I implemented a callback function. What remains is the synchronization of the registration thread and the visualization thread, which is in progress now.

GICP I still have to test more deeply. Right now I am digging into the code and I hope to get GICP running as soon as possible.

Status Update
Wednesday, July 20, 2011

Last week I was waiting for a FLANN API change that is necessary for radius search on the GPU. As the next steps (implementation of radius search, integration into FLANN, and documentation) depend on it, I started with the next independent task: kd-tree construction on the GPU. This isn't as easy as the GPU search was, but I think it's about half done. Most of the individual functions should work, but not all of them are finished yet (and of course, the pieces still need to be put together).

I just noticed that the API change is done in Git, so I'll put the GPU build on hold and finish radius search first.

Random Sample Consensus tutorial
Tuesday, July 19, 2011

So I finally got the PCLVisualizer demo to work. I just ended up completely removing all of the Point Cloud Library components from my computer and re-compiling and installing the trunk. I believe the problem I had been encountering was that I had installed some pcl-developer tools via apt-get. Anyway, I'm now able to run the tutorial without any issues. This is going to be helpful for making my tutorials because I'll be able to make pictures to include in my documents. I have a lot of the code for my Random Sample Consensus tutorial written, but I'm running into a few issues with displaying it so that everything makes sense. I think what I have written works and demonstrates the algorithm; I'm just working on some pictures that show what's going on.

Work status
Saturday, July 16, 2011

Last week I refined various interface functions for the different search methods. The deleteTree() and addPointsFromInputCloud() functions in the Octree class were merged into setInputCloud(). The functions inside the OrganizedDataIndex class were merged into OrganizedNeighborSearch under the names approxNearestKSearch() and approxRadiusSearch(). Besides these, following the suggestion to include a function that selects the best search method based on the input cloud, I added a new method:

Search<PointXYZ>* search = new AutotunedSearch<PointXYZ>(AUTO_TUNED);
search->evaluateSearchMethods(cloudIn, NEAREST_K_SEARCH|NEAREST_RADIUS_SEARCH);

It can compare different data structures on the basis of run time for the search methods given in the second argument. Next I'll start interfacing the GPU functions in the search class and put some CPU-GPU checks in the auto-tuned search class.

Improved Dot Product Based Voxelization
Friday, July 15, 2011

I developed a new method of voxelizing point clouds that uses a dot product to eliminate the double surface problem.

In the current naive voxelization method, if a point is found within a voxel, that voxel is filled (set to 1), and the neighboring voxels' vertices are filled accordingly. Then, when the isosurface is extracted, the marching cubes algorithm marches with an iso-value between 0 and 1. This causes a double surface effect:


This is undesirable, as it's not an accurate representation of the isosurface, and it uses twice as many polygons to draw. A simple way of solving this is to somehow fill the inside of the isosurface; however, for objects that aren't enclosed this won't quite work, and it also requires all the voxels to be enumerated, reducing performance. A quick fix is to take the dot product between the point's normal vector and the vector between each neighboring voxel's center and the point itself:


If the dot product is less than 0, then that voxel’s vertices are not filled.
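The quick fix boils down to a single predicate per neighboring voxel. A minimal self-contained sketch of the idea (Vec3 and shouldFillVoxel are hypothetical names, not the actual implementation):

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

static double dot (const Vec3 &a, const Vec3 &b)
{
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Keep a neighboring voxel only if its center lies on the side of the
// point that the surface normal faces; voxels "behind" the surface are
// rejected, which prevents the second, inner surface from appearing.
static bool shouldFillVoxel (const Vec3 &point, const Vec3 &normal, const Vec3 &voxel_center)
{
  const Vec3 to_center = { voxel_center.x - point.x,
                           voxel_center.y - point.y,
                           voxel_center.z - point.z };
  return dot (normal, to_center) >= 0.0;
}
```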

This results in a single surface, but it also introduces holes into the surface:

../_images/bunny_mcdot1.png ../_images/bunny_mcdot2.png

Not great, but a modest improvement.

More Research on Surface Reconstruction
Friday, July 15, 2011

Doing some more digging on surface reconstruction, I found this nice summary of current surface reconstruction algorithms: http://stackoverflow.com/questions/838761/robust-algorithm-for-surface-reconstruction-from-3d-point-cloud.

In general, there are two categories of algorithms: computational geometry and implicit functions. Computational geometry includes algorithms that attempt to build the surface from the ground up, for example the ball-pivoting algorithm or our own greedy triangulation algorithm. Implicit function algorithms create an implicit function for a point cloud and use something like marching cubes to extract a surface. Examples are Poisson reconstruction and my own naive method of voxelizing data. These algorithms are typically more robust to noisy data and point clouds of varying density, and can also be implemented to process large inputs efficiently. A drawback is that they require normals for all points; however, PCL's surface normal estimation methods should take care of this problem :).

Since implicit functions seem so widespread, I believe a more powerful version of marching cubes is needed. Therefore, after integrating the VTK surface smoothing tools, and possibly Poisson as well, I will begin revamping my marching cubes code to make it as general and efficient as possible, so that any number of these implicit functions can be plugged in easily to the marching cubes base class.

A Comparison of Current Surface Reconstruction Algorithms Part 2
Thursday, July 14, 2011

In this post, I’ll go over Organized Fast Mesh, Concave/Convex Hull, and Poisson reconstruction algorithms, as well as Moving Least Squares and SurfelSmoothing.

Poisson reconstruction voxelizes the data and combines this with an octree-based implementation of marching cubes. Code is available at http://www.cs.jhu.edu/~misha/Code/PoissonRecon/ and is currently being ported into PCL; however, a more tightly integrated version could probably also be made as a child class of MarchingCubes, provided that the MarchingCubes class is extended to handle octree-based volumes.

Organized Fast Mesh is a simple reconstruction algorithm for organized point clouds. Neighboring points are connected to construct a triangular mesh. This algorithm is fast, and very simple, but requires points to be ordered.

Concave/Convex Hull are algorithms that construct a concave/convex hull defined by the input point cloud. These algorithms are not based on the base reconstruction algorithm, but instead a separate implementation. For more information on convex hulls, see http://en.wikipedia.org/wiki/Convex_hull. For more information on the algorithm, see http://www.pointclouds.org/documentation/tutorials/hull_2d.php#hull-2d.

Moving Least Squares is not a reconstruction algorithm, but instead a method of smoothing and resampling noisy data. More information can be found at http://www.pointclouds.org/documentation/tutorials/resampling.php#moving-least-squares. Finally, SurfelSmoothing is a tool for smoothing a cloud with normal vectors.

A Comparison of Current Surface Reconstruction Algorithms Part 1
Wednesday, July 13, 2011

As of now, PCL implements the following surface reconstruction algorithms:

  • Greedy Projection Triangulation
  • Grid Projection
  • Marching Cubes
  • Organized Fast Mesh
  • Concave/Convex Hull

Additionally, Poisson reconstruction is being added as well. There are other classes found in the library, Moving Least Squares and SurfelSmoothing, that do not actually perform reconstruction, but instead are tools to help improve surface reconstruction in some way.

In this post, I'll compare three of these reconstruction methods: Greedy Projection Triangulation, Grid Projection, and Marching Cubes.

The Greedy Projection Triangulation is a greedy algorithm that begins with a seed triangle and connects subsequent triangles to it, repeating until all points are connected. The algorithm is described in the paper "On Fast Surface Reconstruction Methods for Large and Noisy Datasets" by Marton, Rusu, and Beetz, available at http://files.rbrusu.com/publications/Marton09ICRA.pdf. This algorithm runs quickly and produces good-quality meshes on the test data set. One potential downside is that it requires several parameters, such as the distance between connected points and the minimum and maximum angles, which may need to be tuned for any given dataset.


The Grid Projection algorithm is a grid-based surface reconstruction algorithm. Points are first partitioned into voxels, and a vector field is constructed in which the vector at any given position is directed at the nearest point of the cloud. The surface is then determined by examining where vectors with opposite directions point at each other. The edges of the voxels that this surface crosses are determined, and padding cells (cells neighboring the voxels containing the points) are also added. The center point of each voxel is then projected based on the edge intersections, and the surface is reconstructed by connecting these center points. A more detailed explanation can be found at http://www.pointclouds.org/news/surface-reconstruction-from-point-clouds.html. The output is very high quality and requires only two parameters, padding and resolution; however, the algorithm takes significantly longer than Greedy Projection or Marching Cubes: tens of seconds, as opposed to under a second for the sample data set.


The Marching Cubes algorithm is a well-known method of extracting an isosurface from a set of voxels. An explanation of the algorithm can be found at http://paulbourke.net/geometry/polygonise/. The algorithm takes a set of voxels and produces a list of polygons, so the data must first be voxelized. The current implementation of marching cubes uses a simple naive method: filling the voxel in which a point lies. This method is fast, but the output quality is significantly lower than that of the other methods. For more details on Marching Cubes output quality, see my previous blog post. The only input required for this naive method is the resolution. Output quality could be improved with cleverer ways of voxelizing the data, as well as by implementing dynamically scaled voxels with an octree data structure.
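The naive voxelization step feeding marching cubes can be sketched in a few lines. This is a self-contained illustration with hypothetical names, not the PCL code: each point simply marks the grid cell that contains it.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <vector>

struct Pt { double x, y, z; };

// Naive voxelization: each point marks only the grid cell containing it.
// `res` is the number of cells per axis over the bounding box [min, max].
static std::set<std::array<int, 3> >
voxelize (const std::vector<Pt> &pts, const Pt &min, const Pt &max, int res)
{
  std::set<std::array<int, 3> > filled;
  for (std::size_t p = 0; p < pts.size (); ++p)
  {
    int i = static_cast<int> ((pts[p].x - min.x) / (max.x - min.x) * res);
    int j = static_cast<int> ((pts[p].y - min.y) / (max.y - min.y) * res);
    int k = static_cast<int> ((pts[p].z - min.z) / (max.z - min.z) * res);
    // clamp points lying exactly on the upper bound into the last cell
    if (i >= res) i = res - 1;
    if (j >= res) j = res - 1;
    if (k >= res) k = res - 1;
    std::array<int, 3> cell = { { i, j, k } };
    filled.insert (cell);
  }
  return filled;
}
```

The binary filled/empty grid is exactly what makes the extracted isosurface so coarse; smarter voxelization assigns fractional values instead.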


In my next post, I will cover the remaining reconstruction algorithms, and the other toolkit classes found in the surface reconstruction package.

Forward progress and set-backs
Wednesday, July 13, 2011

I've been dedicating a lot more of my time to this organization lately, and one thing I really need to do is learn how to use PCL's visualization tools so that I can create pretty pictures for use in my tutorials. I've been attempting to run the PCLVisualizer demo, but I keep running into issues with the pcl-trunk library installed. Initially I was unable to even compile the demo source file provided with the tutorial, but Radu had me make a small change to the code that fixed that problem. After that, though, I ran into an even bigger problem: the compiled program seg-faults every time I try to run it, no matter what command-line argument is specified (except -h). Radu and Vincent both told me it ran fine for them, and neither of them was working with the trunk, so I'm currently compiling PCL from branch 1.0.x. Hopefully after I get that installed I will be able to complete the tutorial without issue. On to other progress: I updated the layout of our tutorial page this past week. I added a table of contents to the beginning of the page, re-ordered the sections and made everything look a little bit cleaner. I completed another merge, this time of the concatenate-fields and concatenate-points tutorials, because they had so much (code and explanations) in common. I am now working on a tutorial for the sample_consensus module, specifically for RANSAC.

Debugged Harris vertical detector
Tuesday, July 12, 2011

Fig. 1. Psychedelic cat.

It seems I finally debugged the detector and dominant orientation estimation. The color shows detector response at some scale (red is low, blue is high), spheres are keypoints at local maxima.

Now I want to check its robustness to transformations such as rotation about the horizontal axes, holes, subsampling, etc. Unfortunately, the SHREC page does not contain ground-truth correspondences, so I emailed the organisers about that.

Benchmarking results for NNN
Tuesday, July 12, 2011

As promised, here are the results of the benchmarking tests for NNN. You can download them @ http://dl.dropbox.com/u/32660516/NNN_benchmark.zip.

The benchmarking results you can download were run on real-world point clouds grabbed from a Kinect. You can find them @ http://svn.pointclouds.org/data/. In the benchmarking results I only included radii that cover up to approximately 15% of the volume of the tightest bounding box around the whole point cloud, since larger radii are rarely used in practice.

We can see that FLANN by far outperforms NNN in these situations. Its build time is also lower than the build time of the SplitCloud techniques used in NNN, so no advantage can be gained there either.

I have partially added support for libnabo to the automated benchmarking program, but there are still some errors I need to fix before I can properly generate benchmarking figures for libnabo.

Stay tuned!

Finish part I of my roadmap
Tuesday, July 12, 2011

Finally, I have finished part I of my roadmap:

  • Starting from the existing methods (the greedy surface triangulation algorithm) for obtaining a triangle mesh, I designed and implemented the API for texture mapping onto a mesh.
  • I finished implementing texture synthesis from the paper "Texture Synthesis on Subdivision Surfaces".
  • I finished designing and implementing texture mapping and blending of two overlapping textures onto a surface mesh.

Here are some snapshots of mapping two overlapping textures onto a mesh:

../_images/july1201.png ../_images/july1200.png

And here are examples of texture atlas for further processing.

../_images/tex_00.jpg ../_images/tex_10.jpg
Automated benchmarking for NNN and FLANN
Monday, July 11, 2011

I just finished the automated benchmarking for NNN and FLANN. Tomorrow I'll post figures for some more realistic benchmarks, meaning that I will only take into account point clouds captured by a Kinect device, and I'll try to provide a brief overview of my findings. Also, since large radii that cover almost the entire point cloud aren't relevant in real-world situations, radii at that scale will not be covered in the benchmarking tests. For this, I'll have to take into account the scale of every individual point cloud being benchmarked, but normally this won't take too much time.

So tomorrow you can expect a zip file with some folders containing the benchmarking figures for a radius search with one random query point in all of the ‘real world point clouds’ you can find @ http://svn.pointclouds.org/data/. All of these results are automatically generated by the benchmarking program, written in Python and C++. I’ll upload this program to the svn repo and post a link, so you can try it out yourself with different settings if you like.

Tomorrow I'll also start inspecting libnabo some more. For this library I've already added some support for initializing the appropriate data structures that are necessary for calling the appropriate procedures, but benchmarking for this library isn't supported yet. I'll try to get this finished tomorrow, together with some benchmarking figures of this library versus FLANN.

Performance In Comparison To Other Implementations
Monday, July 11, 2011

Basically my kd-tree is up and running now. The performance is always better than FLANN's, but the speed advantage in some tests is as low as 8x, while it is usually around 10x-15x, and in some rare peak cases almost 25x. I tried finding other GPU NN search implementations to compare my results against, but those are quite rare. One thing I found was http://nghiaho.com/?p=437, but that search was slightly slower than my implementation on synthetic data sets and much slower on real-world data.

In "GPU-accelerated Nearest Neighbor Search for 3D Registration" by Deyuan Qiu, Stefan May, and Andreas Nüchter, a GPU kd-tree is used for ICP and, via an approximate search, achieves a speedup of up to 88x, though this is for the complete ICP procedure with a slower CPU and a slightly faster GPU than my system. Also, no information about the NN search time alone is given, so to compare speeds I would have to test the complete ICP setup. If I interpret the results and Fig. 4 correctly, their search algorithm is faster than my implementation when only a coarse approximation of the nearest neighbor is needed, but when an exact nearest neighbor is required, I suspect that my code should be a lot faster.

Sunday, July 10, 2011

Since the speedup in comparison to the KDTree CPU implementation didn’t seem that high to me, I tried several things to optimize the search algorithm. These are:

  • Structure of Arrays (SoA): split the node structure into several (32 bit and 128 bit) structs and store each part in a single array: normally, this is supposed to result in performance gains on GPUs, but here, it always decreased performance.
  • Recursive Search: In the beginning, I re-wrote the tree traversal to work without a stack. But for this, I needed to store the parent index of each node, which increased the struct size by 128 bit due to alignment issues. I tried to instead keep a stack in shared memory and do the search in a recursive way. But this was also about 10% slower in the benchmarks presented in the last post.
  • Rotating split axes: Another way to save 128 bit on the node struct size was to rotate the split axis and remove the information about it from the struct. Result: again, no speed improvement.

So it seems like I came up with the best solution on the first try. I still have one or two ideas that might increase speed, but my hopes aren’t that high.

I think that kd-tree traversal just isn't well suited to GPU computation because of the large number of random memory accesses, and generally the high number of memory accesses combined with the low amount of computation.

Integration of Kdtree, Octree and OrganizedNeighborClass into search base class
Saturday, July 09, 2011

I have done the initial integration of all search classes into one search base class. The class hierarchy so far is:

class Search
class OctreePointCloud : public Search
class Kdtree : public Search
class OrganizedNeighborSearch : public Search
class AutotunedSearch : public Search

So, the appropriate search functions can be called as:


Search<PointXYZ>* kdtree = new KdTree<PointXYZ>();
kdtree->nearestKSearch (test_point, no_of_neighbors, k_indices, k_distances);
kdtree->radiusSearch (test_point, radius, k_indices, k_distances);


Search<PointXYZ>* octree = new OctreePointCloud<PointXYZ>(0.1f);
octree->addPointsFromInputCloud ();
octree->nearestKSearch (test_point, no_of_neighbors, k_indices, k_distances);
octree->radiusSearch (test_point, radius, k_indices, k_distances);


Search<PointXYZ>* organized_index = new OrganizedNeighborSearch<PointXYZ>();
organized_index->setPrecision(1);  // 1 for using OrganizedNeighborSearch functions and 0 for using OrganizedDataIndex functions
organized_index->nearestKSearch (test_point, no_of_neighbors, k_indices, k_distances);
organized_index->radiusSearch (test_point, radius, k_indices, k_distances);

Auto-Tuned Search: for now, AutotunedSearch takes the desired data structure as a parameter when initiating the search.

Search<PointXYZ>* auto_search = new AutotunedSearch<PointXYZ>(KDTREE_FLANN); // or OCTREE, ORGANIZED_INDEX
auto_search->nearestKSearch (test_point, no_of_neighbors, k_indices, k_distances);
auto_search->radiusSearch (test_point, radius, k_indices, k_distances);

The above functions are implemented and pushed into trunk. Right now I am working on the AutotunedSearch class to make it more user-friendly.

Filter & surface tutorials and merge
Saturday, July 09, 2011

So I made some progress with my project this week, all of the tutorials that I have done so far are now live on the tutorial section of the website. I ended up merging both the ConvexHull and the ConcaveHull tutorials into one tutorial because of how much code, and how many explanations were similar or the same between the two of them. I also wrote 2 more tutorials for the only two classes in the filter module that did not already have tutorials, specifically for the ConditionalRemoval and RadiusOutlierRemoval classes. I realized how similar the code for both of these was to the StatisticalOutlierRemoval tutorial written by Radu, so I decided to combine all of them into one all encompassing tutorial on removing outliers using different methods.

Real Data Benchmark
Saturday, July 09, 2011

So far, I had only benchmarked the algorithms on synthetic data, where the search and query points were randomly scattered inside a 3D unit cube. There, the minimal kd-tree was faster than FLANN by a factor of about 20 in the 1NN case, as shown in my last benchmark post. In some (unpublished) benchmarks, the FLANN kd-tree ported to the GPU was always about 8 times faster than FLANN with the same random data set.

In this post, I'm going to show some results with more realistic data sets: I modified Nick's benchmarking code to test the kNN search of FLANN, my FLANN GPU port and my minimal kd-tree on the GPU. Here, two pcd files are loaded; the first one is used to build the NN index, while the points from the second one are used as query points. All the tests used the data from the office dataset available in the PCL data repository and were performed on a Phenom II 955 CPU and a GTX 260 GPU.

For the first test, ‘office2.pcd’ was used both as search and query points. The result is this:


As you can see, with real-world data, the FLANN GPU kd tree is always faster than both FLANN on the CPU and the minimal tree. Build time is about 3% slower for the GPU FLANN algorithm than for the normal FLANN implementation, and both are about twice as fast as the minimal tree. (The GPU build times include the upload to the GPU, and the GPU searches include the up- and downloads to and from the GPU.) This pattern, with both FLANN variants building about twice as fast as the minimal tree, repeated in all the tests, so build times are not shown in the other benchmarks.

The search time of the FLANN GPU algorithm is shortest; in the case of k=1, it is about 6x faster than FLANN, and 30% faster than the minimal kd tree. With k=64, the speeds are already equal. This is because the GPU is bad at handling the complexity of maintaining the result heap used in the kNN search, which takes more and more time with increasing k.
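
The result-heap bookkeeping that becomes the bottleneck can be sketched in a few lines of standard C++ (a simplified illustration, not FLANN's actual data structure; the `KnnHeap` name is hypothetical): a bounded max-heap keeps the k best candidates, and its top element, the current worst of the k, supplies the shrinking distance bound used for pruning.

```cpp
#include <cstddef>
#include <queue>

// Bounded max-heap of the k smallest distances seen so far.
// The top element is the current k-th best distance, which a
// kd-tree search can use to prune subtrees. The larger k is,
// the more push/pop traffic this structure generates.
struct KnnHeap {
    std::size_t k;
    std::priority_queue<float> heap;  // max-heap on distance

    explicit KnnHeap(std::size_t k_) : k(k_) {}

    void insert(float dist) {
        if (heap.size() < k) {
            heap.push(dist);
        } else if (dist < heap.top()) {
            heap.pop();  // evict the current worst candidate
            heap.push(dist);
        }
    }
    // Worst distance still inside the result set (pruning bound).
    float worstDist() const { return heap.top(); }
    bool full() const { return heap.size() == k; }
};
```

On a GPU, each thread needs its own such heap; with large k they no longer fit in fast on-chip memory, which is exactly the slowdown described above.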

In the second test, ‘office1.pcd’ was used as the search points, while the query points came from ‘office2.pcd’:


Here, the minimal kd tree was not tested, as it already timed out at k=32. With k=1, the FLANN GPU search takes only 0.025s, while the FLANN search takes 0.86s, which is 34x the GPU search time! In this test, the decreasing speed advantage with increasing k can be seen again. For example, with k=128, FLANN takes 19s, and the CUDA FLANN 8s. To show the differences at low values of k, the following image shows only the tests up to k=16:


The last test used ‘office3.pcd’ to build the tree and searched for the points in ‘office4.pcd’. Here, only the FLANN and GPU FLANN results are shown, as the minimal tree started to time out on the GPU very soon.


Here, the GPU search time was about 10x faster for the case of k=1, but the speed advantage decreased to a factor of 7 at k=8 and 2 at k=128.

To repeat the benchmark with your own pcd files, please download, build and install my patched FLANN from svn+ssh://svn@svn.pointclouds.org/flann/amuetzel/trunk/flann. Then, build the benchmarking code from svn+ssh://svn@svn.pointclouds.org/flann/amuetzel/trunk/benchmark. Finally, you can run the benchmark as “nnlib_benchmark <searchdata.pcd> <querydata.pcd>” and you’ll get a nice output of the search times, plus an output text file that can be plotted using gnuplot. If you want to use the code in your own application,

The result did not really surprise me. I guess the main reason for the fast results is the spatial organization of the search and query points; as both are (organized) point clouds from a Kinect, query points that are next to each other in the cloud array are likely also spatially close. This leads to more coherent memory accesses, which is a good thing since the search itself is computationally cheap but severely limited by memory access performance. The slow memory accesses also explain the huge slowdowns with large values of k. There is not enough (fast) on-chip memory to store the neighbor heaps, so they have to be stored in (slow) GPU RAM. Thus, inserting new neighbors becomes expensive, as the data often has to be moved around in the heap structure. As a conclusion, I think I can say: if you need few nearest neighbors, use your GPU, but if you need many nearest neighbors or have some spare CPU cores to parallelize the search, use the CPU.

If you test the code, I would really appreciate it if you tell me about the results, along with your system specs (GPU+CPU)! Of course, if you have any trouble building it, you can also contact me ;-) I would be especially interested in any benchmarks on Fermi-series cards, to see if the cache found on these cards improves performance.

Some more benchmarks will follow, for example about the approximate NN search and with some more other data sets.

Sixth week
Thursday, July 07, 2011

This week I returned to work on the project. I wanted to make sure the implementation of the Harris vertical detector works correctly, so I tested it on artificial data. The results did not look right, so I debugged the code and found one source of errors.

It seems that the interface changed in the new version of FLANN, and the indices returned by radius search were no longer sorted by distance. As a result, a lot of points were ignored during the neighborhood analysis.
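
Independent of the library version, downstream code can defend itself against this by sorting the returned (index, distance) pairs itself. A hedged sketch with a hypothetical `sortByDistance` helper:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sort radius-search results by ascending distance, so code that
// assumes sorted neighborhoods keeps working no matter what order
// the search library returns its results in.
void sortByDistance(std::vector<int>& indices, std::vector<float>& dists) {
    std::vector<std::pair<float, int>> paired(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
        paired[i] = {dists[i], indices[i]};
    std::sort(paired.begin(), paired.end());  // lexicographic: distance first
    for (std::size_t i = 0; i < paired.size(); ++i) {
        dists[i] = paired[i].first;
        indices[i] = paired[i].second;
    }
}
```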

I fixed that and the keypoints look great on the sphere:


Here the colour codes the detector response. The dominant orientations are still estimated incorrectly. I am going to fix that next week and then compare the detector with some other ones.

Thursday, July 07, 2011

For those of you looking at my previous benchmarking results, there are some erroneous numbers in there. You should ignore all NNN, SC, SC2 “without returning distances” timings from the fully detailed benchmarking results. A bug introduced overhead for this method and hence polluted the numbers for the NNN, SC and SC2 (without returning distances) methods.

Fortunately, this doesn't affect the evaluation I made in the previous blogpost. What we do need to straighten out is the following:

  • standard NNN, SplitCloud and SplitCloud2 (without returning distances) will actually be faster than, respectively, standard NNN, SplitCloud and SplitCloud2 (with returning distances)

So the results of the optimized version, which doesn't return distances, will be faster than those of the non-optimized one, which of course sounds intuitive. How much faster, I will straighten out in a following blogpost, as I am currently working on an automated benchmarking program. This program will let me keep the level of detail in the benchmarks while greatly reducing the time needed to produce the results, as everything (including the graphs) will be generated automatically.

Once this program is finished I will provide some new benchmarks that are evaluated on “real world” pointclouds, so this way we will have a fresh and entirely correct view of how NNN compares to FLANN. By then you will also see some benchmarking figures pop up on Andreas’ blog, so be sure to check those out as well. They are all related to optimizing the current nearest neighbor searching library used in PCL (FLANN).

Extracting Keypoints And Performing Registration
Wednesday, July 06, 2011

I have completed the Feature Evaluation Framework class, except for (i) integrating keypoint extraction, and (ii) registration of source and target clouds. Adding these functionalities will require some reading on my part, which I plan to do next.

Also, I am playing with various methods of visualizing the output of the FeatureCorrespondenceTests. I intend to first run a series of tests on Feature algorithms to gather a set of benchmarking results, and then devise a way to visualize them.

I am stuck on a minor problem involving taking the ground truths as input, specifically converting an Eigen::Vector3f and Quaternion to an Eigen::Matrix4f representation. Hopefully I'll find something useful in the Eigen documentation.
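
For reference, with Eigen this conversion is only a couple of calls (write `q.toRotationMatrix()` into the top-left 3x3 block of an identity Matrix4f and the translation vector into the top-right column). A hedged, Eigen-free sketch of the same arithmetic, with a hypothetical `toMatrix4` helper building a row-major 4x4 transform:

```cpp
#include <array>

// Build a 4x4 homogeneous transform (row-major) from a unit
// quaternion (w, x, y, z) and a translation (tx, ty, tz). This is
// the arithmetic that Eigen's q.toRotationMatrix() performs,
// followed by filling in the translation column.
std::array<float, 16> toMatrix4(float w, float x, float y, float z,
                                float tx, float ty, float tz) {
    std::array<float, 16> m{};  // zero-initialized
    m[0]  = 1 - 2 * (y * y + z * z);
    m[1]  = 2 * (x * y - w * z);
    m[2]  = 2 * (x * z + w * y);
    m[3]  = tx;
    m[4]  = 2 * (x * y + w * z);
    m[5]  = 1 - 2 * (x * x + z * z);
    m[6]  = 2 * (y * z - w * x);
    m[7]  = ty;
    m[8]  = 2 * (x * z - w * y);
    m[9]  = 2 * (y * z + w * x);
    m[10] = 1 - 2 * (x * x + y * y);
    m[11] = tz;
    m[15] = 1;  // homogeneous row (0, 0, 0, 1)
    return m;
}
```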

Generating texture atlas for 2 overlapping textures onto a surface mesh
Wednesday, July 06, 2011

I am working on generating a texture atlas for 2 overlapping textures on a surface mesh. This part basically packs all the individual textures into atlases. I am using the OpenCV library to line all the textures up horizontally. As far as I know, common rendering functions require that the width and height of an atlas be at most 2048, so we may need multiple atlases to store all the textures. My implementation is working fine, except for a few unexpected texture mappings that still need to be corrected when I map the mesh with the generated atlases. I will post some images after I fix the errors.
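
The packing logic itself can be sketched without OpenCV (a hedged illustration of the idea only; `packHorizontally` and treating the 2048 limit as a parameter are my assumptions, not the actual implementation): line textures up left to right and open a new atlas whenever the running width would exceed the limit.

```cpp
#include <vector>

struct Placement { int atlas; int x; };  // which atlas, and x offset in it

// Lay rectangles of the given widths out left to right, starting a
// new atlas whenever the running width would exceed maxWidth
// (e.g. 2048, a common texture-size limit).
std::vector<Placement> packHorizontally(const std::vector<int>& widths,
                                        int maxWidth) {
    std::vector<Placement> out;
    int atlas = 0, x = 0;
    for (int w : widths) {
        if (x + w > maxWidth) {  // would overflow: open a new atlas
            ++atlas;
            x = 0;
        }
        out.push_back({atlas, x});
        x += w;
    }
    return out;
}
```

Each placement then also yields the texture-coordinate offset to apply to the faces using that texture.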

Status Update
Wednesday, July 06, 2011

It’s been a while since I’ve updated! I was out of town for the 4th of July holiday, and therefore didn’t accomplish a lot last week. This week, I’ve been porting over the Poisson reconstruction algorithm, and also looking into the VTK smoothing algorithms, as well as playing with MeshLab to see what they have to offer.

MeshLab is a program that manipulates polygonal meshes, built on the VCG Library (http://vcg.sourceforge.net/index.php/Main_Page). One of its features is surface reconstruction of point clouds, so its implementations are good to look at to see what we may need to implement. From what I can tell, the two main reconstruction algorithms are Marching Cubes and Poisson. Since Poisson shouldn't really be a separate reconstruction algorithm, I will look into the code and see exactly how Marching Cubes and Poisson differ in this case.

Benchmarking results of NNN vs FLANN
Wednesday, July 06, 2011

I’ve done some extensive benchmarking of the NNN (Naive Nearest Neighbor) library and the library for nearest neighbor searching currently used in PCL, namely FLANN (Fast Library for Approximate Nearest Neighbors).

NNN provides 3 ways of doing a radius search for the nearest neighbours, each more naive than the other. Naive isn't always a bad thing here: the more naive the method, the less overhead it incurs. For example, the most naive method of NNN doesn't need to build a data structure for the radius search, but just calculates the results directly. The two other methods are the SplitCloud and SplitCloud2 methods, which subdivide the provided cloud into respectively 8/64 smaller (overlapping!) subclouds. Radius search can then be performed on each of these (smaller! [1]) subclouds, such that the radius search is contained within that subcloud (because of the overlapping construction of the subclouds), meaning that during the search in one particular subcloud, no other subclouds need to be visited.
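
The "most naive" method boils down to a linear scan over the cloud (a hedged sketch; `radiusSearchNaive` is a hypothetical name, not NNN's API). Comparing squared distances avoids a sqrt per point, which is the kind of micro-optimization the "without returning distances" variant pushes even further:

```cpp
#include <vector>

struct Pt { float x, y, z; };

// Most naive radius search: no index structure at all, just a scan.
// Returns the indices of all points within `radius` of `q`.
std::vector<int> radiusSearchNaive(const std::vector<Pt>& cloud,
                                   const Pt& q, float radius) {
    std::vector<int> result;
    const float r2 = radius * radius;  // compare squared distances
    for (int i = 0; i < (int)cloud.size(); ++i) {
        float dx = cloud[i].x - q.x;
        float dy = cloud[i].y - q.y;
        float dz = cloud[i].z - q.z;
        if (dx * dx + dy * dy + dz * dz <= r2)
            result.push_back(i);
    }
    return result;
}
```

SplitCloud and SplitCloud2 run exactly this scan, but only over the one (smaller, overlapping) subcloud that contains the query ball.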

As I’ve said, the benchmarking is quite extensive, and it has cost me quite some time to collect these results. Although such results provide a good foundation for drawing conclusions, I’ll keep the time spent in mind for the benchmarking of the libnabo library and probably will try to keep the benchmarking more brief than this one.

I’ll present here some graphs I created from the benchmarking numbers accompanied with my conclusions. Feel free to look at the fully detailed results, which can be found by clicking here.

../_images/nnnvsflann0.5.png ../_images/nnnvsflann1.png ../_images/nnnvsflann2.png

The first three graphs show the search time of both the standard NNN (without splitting the cloud, so no build time) and FLANN for radiuses = 0.5, 1 and 2 on a cloud of points randomly generated in the interval [0,1]. The radius is thus respectively underfitting, nearly fitting and overfitting all points in the point cloud. We can see that we start benefiting from the standard NNN approach as soon as the radius becomes fairly large in comparison with the point cloud, which is where FLANN starts performing worse.

The case where the radius = 2, and thus overfits all the points, is just there to show that there is a standard NNN method with an optimization that tries to avoid calculating the Euclidean distance, and hence cannot return the distance of each point to the query point. In some cases this optimization can have an impact on the time spent on the radius search. In this particular case, where the radius overfits all points in the point cloud, the optimization is exploited maximally, resulting in a significantly better radius search time.

../_images/splitcloud0.10.png ../_images/splitcloud0.25.png ../_images/splitcloud0.50.png ../_images/splitcloud1.png

Next I compared the SplitCloud and SplitCloud2 methods with the standard NNN search method. A similar optimization as in the standard NNN can be applied, but it is not represented in the graphs, as the SplitCloud and SplitCloud2 methods only incur overhead from this approach, resulting in times that are far worse than without the “optimization”.

As said before, the SplitCloud and SplitCloud2 methods rely on splitting the cloud into respectively 8/64 subclouds, taking the radius into account. We can see that for smaller radiuses, i.e. radiuses not containing a large fraction of the points in the cloud, splitting makes sense. This is intuitive, as for larger radiuses we don't benefit from splitting the cloud into almost entirely overlapping subclouds. However, in the cases where SplitCloud and SplitCloud2 can make a difference compared to the standard NNN (radius < 1), FLANN is still the faster option in terms of radius search times.

For the case where the radius = 1, SplitCloud and SplitCloud2 perform approximately as well as the standard NNN, which we know from the preceding figures outperforms FLANN at radius = 1.

../_images/buildtimesc.png ../_images/buildtimesc2.png

Although FLANN “dominates” [2] the SplitCloud methods in terms of search time in the lower radius cases, another thing to consider is the build time of the data structure needed to gain this performance. We can see that the build times of the SplitCloud2 and especially the SplitCloud method greatly outperform the build time of the FLANN kd-tree [3].

Conclusion Overall we can conclude that in the cases where the radius is large with respect to the point cloud, the direct method (standard NNN, with returning distances) of performing a radius search is more appropriate. The SplitCloud and SplitCloud2 cases are more delicate. I can carefully say that using the SplitCloud or SplitCloud2 method is more suitable when performing radius searches for only a small number of query points. This way we can maintain the advantage gained when building the data structure.

Future work As stated in this rather lengthy but complete blogpost, I will proceed with benchmarking libnabo more briefly. After the conclusions drawn from the benchmarks, and with the blessing of Marius, I will start integrating the two libraries into FLANN.

What I would also like to mention is that I am curious what the performance of the SplitCloud and SplitCloud2 methods would be if they benefited from multithreading. Although this approach is clearly suitable for multithreading (when multiple query points are provided), the NNN library doesn't introduce this kind of concurrency. This might be an interesting thing to consider when starting the second phase of my roadmap (which consists of multithreading FLANN after I'm done with the integrations).

[1] in case of a meaningful subdivision

[2] mind the scale on the graphs, on the relative scale FLANN seems to dominate the SplitCloud methods, but on the absolute scale (in the case of a low number of query points) there isn’t much difference because the radius search times are really small.

[3] again, mind the scale

GICP integration and a break for diploma presentation
Tuesday, July 05, 2011

Yesterday was my diploma presentation. Last week was a crazy week: I had to finish and bind the written thesis and also refine the final results. I tried to do something for GSoC too.

The current GICP implementation depends on the ANN and GSL libraries. PCL 0.7 became independent of ANN and now relies on FLANN, so the same must happen for GICP. Also, GSL and PCL have incompatible licenses, so in order to integrate GICP into PCL it is necessary to replace the GSL functions with Eigen and CMINPACK equivalents.

During the last week I studied the ANN, FLANN, GSL, Eigen and CMINPACK documentation. For the ANN and GSL functions used by GICP, I proposed equivalents from FLANN, Eigen and CMINPACK. You can find a draft, which is open to new suggestions, here; if you would like to add something, please post on the PCL blog.

My goals for next days are to apply the listed equivalences and add GICP to PCL without ANN and GSL.

Thank you for understanding the delay introduced by my graduation!

First tutorial lives
Tuesday, July 05, 2011

First off I’d like to apologize for the lack of posts as of late. I’ve been dealing with several unexpected family issues that have required a significant amount of traveling, and my time and attention. Regardless, my focus will be dedicated to this program soley for the rest of the summer. I’ve more or less finished a turorial for the IterativeClosestPoint class in the Registration module. I’m still becoming comfortable with the PCL code, so if anyone has some suggestions or anything please shoot me an email. I’ve also realized that there is an unconsistency between the tutorials about certain things like ‘using namespace...’. I think it could be beneficial to decided on a standard style that can be applied across all of the tutorials.

Edit: I ended up also going ahead and working on a simple tutorial for the surface module. It's for the ConcaveHull class. There is something wrong with the PCL code at the moment, because the final concave hull that gets returned is composed of 0 points. This is also true for ConvexHull right now. Other than that, the tutorial is correct.

Surfel Surface Smoothing and Pyramid Feature Matching
Tuesday, July 05, 2011

A few days have passed and it’s time for a new blog post :-).

To begin with, I added a useful new algorithm to the PCL library: Surfel smoothing based on the work of:

  • Xinju Li and Igor Guskov - “Multiscale features for approximate alignment of point-based surfaces”

It is an interesting iterative modification of the Gaussian mesh smoothing. Right now only the smoothing part works, the salient point extraction seems to need more testing, as I am not totally convinced about the results. Nevertheless, you can see a picture below of the effect:


Another interesting result I have is the use of:

  • Kristen Grauman and Trevor Darrell - “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”

I have implemented a generic class for comparing surfaces represented by unordered sets of features. The class then returns a value in [0, 1] representing the similarity between the two surfaces. To test it, I tried the very naive, raw representation of PPF features (i.e., pair features between all the pairs in the cloud) and the results are rather interesting:

  • my face with two different expressions: 0.867585
  • my face compared to a watering can: 0.586149
  • watering can compared to a chair: 0.399959
  • obviously, any object compared to itself: 1.0
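
At its core, the pyramid match kernel builds on histogram intersection. A single-level, normalized version already yields a similarity in [0, 1] (a hedged sketch of the idea only, not the actual PCL class; the real kernel sums weighted intersections over a hierarchy of increasingly coarse histograms):

```cpp
#include <algorithm>
#include <vector>

// One-level histogram intersection, normalized to [0, 1].
// Assumes both histograms have the same number of bins. The pyramid
// match kernel extends this over multiple resolutions with
// level-dependent weights.
float histogramSimilarity(const std::vector<float>& a,
                          const std::vector<float>& b) {
    float inter = 0, total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        inter += std::min(a[i], b[i]);  // mass shared by both
        total += std::max(a[i], b[i]);  // mass in either
    }
    return total > 0 ? inter / total : 1.0f;  // identical empties match
}
```

Identical histograms score exactly 1.0, disjoint ones 0.0, mirroring the self-similarity of 1.0 reported above.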

Monday, July 04, 2011

Last week, after some discussions with Nico, I managed to correctly compile and install OpenCV with CUDA support. This part was messy for me. After that I recompiled the PCL source code WITH_CUDA and compiled kinect_ransac.cpp in order to get more familiar with the framework. During the weekend I started to implement the MSAC algorithm, and I can say I have made a little progress here. The code is not done yet, but it is due tomorrow night. Today I am planning to implement and test the sample consensus model functions needed to compute MSAC - this is inspired by pcl/sample_consensus/. The next post, probably tomorrow night, will contain some results from the MSAC implementation with CUDA.

Benchmarking OrganizedNeighborClass and OrganizedDataIndex using office dataset
Sunday, July 03, 2011

I tried benchmarking the search functions in OrganizedNeighborClass and OrganizedDataIndex classes using the office dataset which Julius gave. The dataset is available at: http://svn.pointclouds.org/data/office/ .

Following are the time benchmarks done on the radiusSearch functions of OrganizedNeighborSearch and OrganizedDataIndex. The number of nearest neighbors is the same for both classes in most of the experiments; however, the indices of the nearest neighbors vary between the classes. In some cases the number of nearest neighbors also differs, but the counts are close, which suggests that OrganizedDataIndex might be an approximate version of OrganizedNeighborSearch. The search point is chosen at random, ensuring that it is not NaN. The time reported for each search radius is the average over 100 iterations.

Data: office1.pcd

Search Point: -1.16809, 0.0467238, 4.906

Search Radius | OrganizedNeighborSearch time | OrganizedDataIndex time | Number of Nearest Neighbors
0 0.0511418 1.3416e-06 1
0.1 0.00958581 3.3005e-05 1172
0.2 0.0102894 0.000175295 3917
0.3 0.0118625 0.000422468 7432
0.4 0.0148702 0.000436519 9336
0.5 0.0157803 0.000821651 11850
0.6 0.0166538 0.00117252 14238
0.7 0.0177392 0.00146132 16663
0.8 0.0187914 0.0017615 21237
0.9 0.0198163 0.0020671 25461

Data: office2.pcd

Search Point: -0.723531, 0.218086, 1.347

Search Radius | OrganizedNeighborSearch time | OrganizedDataIndex time | Number of Nearest Neighbors
0.000 0.09484 0.00000 1
0.100 0.01409 0.00022 1743
0.200 0.01513 0.00045 3739
0.300 0.01823 0.00091 14067
0.400 0.02444 0.00195 25933
0.500 0.02855 0.00289 44137
0.600 0.03438 0.00409 63127
0.700 0.04080 0.00531 74889
0.800 0.05044 0.00649 92927
0.900 0.05616 0.00811 111587

Data: office3.pcd

Search Point: 0.0144343, -0.43784, 1.263

Search Radius | OrganizedNeighborSearch time | OrganizedDataIndex time | Number of Nearest Neighbors
0.000 0.07388 0.00000 1
0.100 0.01170 0.00012 4538
0.200 0.01426 0.00061 13565
0.300 0.01612 0.00110 24162
0.400 0.01890 0.00196 36527
0.500 0.02515 0.00354 53609
0.600 0.03459 0.00555 73626
0.700 0.05051 0.00613 92836
0.800 0.06044 0.00957 152082
0.900 0.06897 0.01178 199561

Data: office4.pcd

Search Point: -0.447771, 0.0920419, 1.306

Search Radius | OrganizedNeighborSearch time | OrganizedDataIndex time | Number of Nearest Neighbors
0.000 0.04692 0.00000 1
0.100 0.00948 0.00009 3093
0.200 0.00970 0.00038 12636
0.300 0.01246 0.00083 28916
0.400 0.01408 0.00162 48606
0.500 0.01939 0.00282 75014
0.600 0.02332 0.00395 102830
0.700 0.03006 0.00550 131138
0.800 0.03190 0.00629 154212
0.900 0.03357 0.00718 168709

For now, I have integrated OrganizedDataIndex into OrganizedNeighborSearch with a setMethod() function and pushed it into trunk. I have also started integrating the Octree functions into the search base class.

More FLANN progress
Sunday, July 03, 2011

Last week, I didn’t have that much time for the GSoC work. But at least I finished the 1NN and kNN searches, integrated them into FLANN and uploaded the code to svn.pointclouds.org/flann/amuetzel/trunk/flann. To use it, first build the modified flann library you can find there.

The only difference to using the normal FLANN KdTreeSingleIndex is to create the index like this:

flann::Index<flann::L2<float> > flannindex( dataset, flann::KDTreeCudaLowdimIndexParams() );

If you do this, you will likely get an exception about an unknown index type as soon as you run the code. To solve this, #define FLANN_USE_CUDA before including flann.hpp and link libflann_cuda to the executable. The CUDA index types are enabled this way to avoid having to link the lib when no CUDA index is going to be used.

If you want to try the code, there is one other thing to keep in mind: Right now, it only works with the flann::L2<float> distance on 3D vectors.

In the next days, I will start working on four things, likely in this order: restructuring the GPU kernel code to make it easier to adapt to other distance types, porting the minimal kd-tree to the FLANN API, making the interface work with 2D and 4D vectors, and finally creating a benchmark with real-world data sets. I hope to be able to reuse some parts of Nick's benchmark code for that, as he is doing the comparison of the NN search libraries at the moment.

Finalising The Feature Test Class
Friday, July 01, 2011

This week I have been focusing on finalising the Feature Test class, to be used for benchmarking of Feature Descriptor algorithms. The class supports the following pipeline:

  • loadData (source and target clouds, ground truth)
  • setThreshold (either a single threshold value, or a threshold range, specified by lower bound, upper bound, and delta)
  • setParameters (specific to the Feature Descriptor algorithm, given as a “key1=value1, key2=value2, ...” string)
  • performDownsampling (filter input clouds through VoxelGrid filter, with specified leaf size)
  • extractKeypoints (extract keypoints from the downsampled clouds)
  • computeFeatures (compute features of the keypoints if extracted, or else on the preprocessed clouds)
  • computeCorrespondences (match the source and target features) (or) computeTransformation (register the source and target clouds)
  • computeResults (evaluate total successes and/or runtime statistics)

Important tasks include:

  • The FeatureEvaluationFramework class should support running multiple tests over a single independent variable, e.g. input clouds, threshold, parameters, leaf size, etc.
  • The output of each set of runs should be published in CSV format, which can be plotted in a graph/table
  • An executable “evaluate_feature” located in “trunk/test/” should allow running of multiple tests by choosing an independent variable and other input values, through the command line
  • It should also support quickly running some predefined standard tests on Feature algorithms, to compare the results of newly implemented algorithms with previous ones

Bye bye exams! Welcome GSoC!
Thursday, June 30, 2011

Finally my exams are done! Eagerly waiting for my result won’t be an option this year, as GSoC will keep my mind on some more important stuff!

I got all the libraries (libnabo, NNN, FLANN) to build on Ubuntu 10.04 LTS and am ready to benchmark their performance. My primary focus now will be on benchmarking the libraries and finding the sweet spots of each. You'll find out more in my next blog post :)

Updates about my progress
Wednesday, June 29, 2011

A lot has happened since my last update. I have read a lot about applying the multiscale concept in 3D feature representation, as I found this field was not exploited enough - and it is very successful when applied to 2D images.

As such, I implemented the multiscale feature persistence algorithm proposed by Radu et al. in “Persistent Point Feature Histograms for 3D Point Clouds”. Also, I tried the rather slow method introduced by Ranjith Unnikrishnan in his PhD thesis. I am still at the stage of tuning the algorithms, but they seem to work as expected so far. In this process, I ran into software engineering problems again - having to decide on the architecture of new abstract classes in the PCL framework - for this I asked the help of the other developers and will decide on the best solution after some thorough discussions.

Another interesting problem I faced was that of geodesic distances inside point clouds. There are quite a few methods for this; most of them involve creating a graph representation of the point cloud using different heuristics. I am not yet convinced about the ideal approach, so I will be looking into it in the next days.
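
Once a neighborhood graph has been built (whatever heuristic produces it), the usual recipe is a shortest-path search over its Euclidean edge weights. A hedged sketch using Dijkstra over an adjacency list (`geodesicFrom` is a hypothetical helper, not PCL code):

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Approximate geodesic distances from `source` to every node of a
// neighborhood graph. graph[u] lists (neighbor, edge length) pairs,
// where the edge length is the Euclidean distance between points.
std::vector<float> geodesicFrom(
    int source,
    const std::vector<std::vector<std::pair<int, float>>>& graph) {
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> dist(graph.size(), inf);
    using Item = std::pair<float, int>;  // (distance so far, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry
        for (auto [v, w] : graph[u])
            if (d + w < dist[v]) {  // found a shorter path to v
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}
```

The quality of the approximation then depends entirely on the graph-construction heuristic (k-NN edges, radius edges, etc.).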

Furthermore, I contacted the author of an algorithm I implemented a few weeks ago, Slobodan Ilic, to ask for some hints about his own experiments on noisy data. After following some of his advice, I managed to tune the parameters so that it works satisfactorily on Kinect datasets such as my dad's garage :-) - where it detects a chair and a watering can:


I forgot to mention that I received my very own Kinect and spent some time playing with it (under Mac OS X - and it works after a looot of tweaks and hacks!!!). I will be collecting all the datasets I am using for my own experiments around the house and will upload them in a structured way to the PCL dataset repository: svn+ssh://svn@svn.pointclouds.org/data

VTK Surface Smoothing First Look
Monday, June 27, 2011

VTK has a couple of different surface smoothing algorithms at its disposal: vtkSmoothPolyDataFilter and vtkWindowedSincPolyDataFilter. These methods take in a set of polygons in VTK format and output a smoothed version of the same. There are also algorithms that subdivide existing surfaces, vtkLoopSubdivisionFilter and vtkButterflySubdivisionFilter; these take in a set of triangles and split each triangle into four. I can imagine situations where one would wish to first split a mesh into a finer set of triangles and then apply smoothing. I believe integrating these smoothing filters would be simple: just a matter of converting the data to VTK format, running the filters, and converting back.

Poisson Reconstruction Summary
Monday, June 27, 2011

After a more detailed examination of the Poisson reconstruction algorithm, I've discovered that it is actually a method for constructing a voxel grid, on which marching cubes is then run. Specifically, the paper (http://research.microsoft.com/en-us/um/people/hoppe/poissonrecon.pdf) outlines a method using an octree. Since Misha (http://www.cs.jhu.edu/~misha/Code/PoissonRecon/) has allowed us to integrate his code into PCL, I will most likely do a separate port of this code into PCL as its own class. This is non-ideal, however. I believe the best solution would be to use the existing marching cubes implementation to polygonize the octree that is output by the code. However, as I wish to add as many methods as possible, I will first implement the Poisson method in its entirety, and only later update the marching cubes code to take octree inputs as well. It could be that adding octree support to the marching cubes code is simple; I will keep this in mind and change plans accordingly depending on what I find.

2 overlapping textures onto a surface mesh
Monday, June 27, 2011

Recently, I have generated an example of 2 overlapping textures on a surface mesh. The texture mapping is based on the algorithm in “Texture Synthesis on Subdivision Surfaces”, but I haven't implemented the blending part yet.

Here are some snapshots:

../_images/june_2700.png ../_images/june_27_201.png

Fifth week
Saturday, June 25, 2011

As I described earlier, I had faced the problem of normal orientation. When the orientations are random, the descriptor and detector can be estimated incorrectly, so I tried to make them invariant to individual normal orientations.

Since my implementation of the Harris detector uses only squares of projections, it does not require consistent orientations. So I only needed to change the implementation of the dominant direction estimation, which tries to find the direction in which most normals point (a weighted window is used). I decided to vote for directions in pi instead of 2pi, which makes the method insensitive to orientation flips.
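
The folding trick can be sketched as follows (a hedged illustration; `foldedBin` is a hypothetical helper, not the actual detector code): an angle and its flipped counterpart theta + pi land in the same histogram bin, so the vote no longer depends on the sign of the normal.

```cpp
#include <cmath>

// Vote an angle into a histogram over [0, pi) instead of [0, 2*pi),
// so a direction and its flipped counterpart (theta + pi) fall into
// the same bin -- making the vote insensitive to normal sign.
int foldedBin(float theta, int numBins) {
    const float pi = 3.14159265358979f;
    float folded = std::fmod(theta, pi);
    if (folded < 0) folded += pi;              // map into [0, pi)
    int bin = (int)(folded / pi * numBins);
    return bin < numBins ? bin : numBins - 1;  // guard the upper boundary
}
```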

I am leaving for the UK for a summer school in a few hours, so I won't be able to work during the next week. On returning, I plan to test the repeatability of this detector on the SHREC dataset and check whether the dominant direction is estimated robustly. Then I plan to use spin images to estimate the quality of the overall voting framework.

Next Experiments
Saturday, June 25, 2011

The last days I have been working on translating the FLANN KDTreeSingleIndex to a GPU version. FLANN's implementation of the kd-tree has two main advantages over my implementation: first, the worst-case performance when the query points lie far away from the closest search points is much better; second, it is possible to store multiple points in a single leaf node, which is not possible with the minimal tree representation.

I hope that both points will be beneficial in the GPU implementation. In the last days, I worked on the tricky problem of translating the search algorithm to a non-recursive version, along with the necessary changes in the data structure. Right now, I am porting this to CUDA and will report on the performance as soon as it is done.
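
To illustrate the transformation (a simplified standard-C++ sketch, not FLANN's actual code): the recursive descent becomes a loop over an explicit stack of (node, bound) entries, where the bound is the axis distance to the splitting plane that lets a subtree be pruned later. Building stays recursive here; only the search is non-recursive, which is the part that has to run inside a GPU kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

struct Pt { float x[3]; };

struct KdTree {
    struct Node { int point = -1, axis = 0, left = -1, right = -1; };
    std::vector<Node> nodes;
    std::vector<Pt> pts;

    explicit KdTree(std::vector<Pt> points) : pts(std::move(points)) {
        std::vector<int> idx(pts.size());
        for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = (int)i;
        if (!idx.empty()) build(idx, 0, (int)idx.size(), 0);
    }

    int build(std::vector<int>& idx, int lo, int hi, int depth) {
        if (lo >= hi) return -1;
        int axis = depth % 3, mid = (lo + hi) / 2;
        std::nth_element(idx.begin() + lo, idx.begin() + mid, idx.begin() + hi,
                         [&](int a, int b) { return pts[a].x[axis] < pts[b].x[axis]; });
        int id = (int)nodes.size();
        nodes.push_back({idx[mid], axis, -1, -1});
        int l = build(idx, lo, mid, depth + 1);   // building stays recursive
        int r = build(idx, mid + 1, hi, depth + 1);
        nodes[id].left = l;
        nodes[id].right = r;
        return id;
    }

    // Non-recursive 1-NN: an explicit stack replaces the call stack.
    int nearest(const Pt& q) const {
        if (nodes.empty()) return -1;
        int best = -1;
        float bestD = 1e30f;  // squared distance to the best point so far
        std::vector<std::pair<int, float>> stack = {{0, 0.0f}};
        while (!stack.empty()) {
            auto [n, bound] = stack.back();
            stack.pop_back();
            if (n < 0 || bound * bound >= bestD) continue;  // prune subtree
            const Node& nd = nodes[n];
            const Pt& p = pts[nd.point];
            float d = 0;
            for (int k = 0; k < 3; ++k) {
                float t = p.x[k] - q.x[k];
                d += t * t;
            }
            if (d < bestD) { bestD = d; best = nd.point; }
            float diff = q.x[nd.axis] - p.x[nd.axis];
            int nearChild = diff < 0 ? nd.left : nd.right;
            int farChild  = diff < 0 ? nd.right : nd.left;
            stack.push_back({farChild, std::fabs(diff)});  // may be pruned later
            stack.push_back({nearChild, 0.0f});            // descend here first
        }
        return best;
    }
};
```

On the GPU, the stack would live in per-thread local memory and the node array in a flat device buffer, which is why the data structure also has to change.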

Effect of Downsampling on Feature Computations
Friday, June 24, 2011

This week I have been running tests on the benchmark dataset using the FeatureEvaluationFramework class.

As suggested by Michael, I have added functionality to the Framework class to preprocess the input clouds, i.e. downsample them using a VoxelGrid filter, to reduce the running times of the feature computations.

For this, I have modified the Framework class as well as the FeatureCorrespondenceTest class. The major changes to the code are (reflected here):

  • Added functions to control preprocessing of input clouds, using VoxelGrid filter to downsample the clouds.
  • Restructured the functions for running the tests, added a function to run tests over a range of leaf sizes (of VoxelGrid filter).
  • Added a minimal TestResult class to store the results of each test case as a separate object. Functions can be (will be) added to this class for stuff like printing to CSV, publishing to ReST, plotting graphs, etc.
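
To make the leaf-size effect concrete, here is a standalone sketch of what the VoxelGrid leaf size does to the input size. This is not the PCL implementation: it only counts occupied voxels (the real filter also replaces the points in each voxel with their centroid), but it shows why a larger leaf means fewer points.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <vector>

struct Pt { float x, y, z; };

// Each point falls into the voxel with index floor(p / leaf); one output
// point is kept per occupied voxel.
std::size_t downsampledSize (const std::vector<Pt>& cloud, float leaf)
{
  std::set<std::vector<int> > voxels;
  for (std::size_t i = 0; i < cloud.size (); ++i)
  {
    std::vector<int> idx (3);
    idx[0] = static_cast<int> (std::floor (cloud[i].x / leaf));
    idx[1] = static_cast<int> (std::floor (cloud[i].y / leaf));
    idx[2] = static_cast<int> (std::floor (cloud[i].z / leaf));
    voxels.insert (idx);
  }
  return voxels.size ();
}
```

Sweeping the leaf size in this sketch mirrors the table below: as the leaf shrinks toward the sensor resolution, the preprocessed input size approaches the original cloud size.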

I used this class to run FPFHEstimation algorithm on a sample dataset (cloud_000.pcd) for various values of the leaf sizes. Here are the results:

Feature Name: FPFHEstimation
Parameters: threshold=0.01, searchradius=0.003
Dataset: cloud_000.pcd
Input size: 307200

Machine Config: Intel Core 2 Duo P8700 @ 2.53 GHz, 4GB RAM, Ubuntu 10.10


Leaf size Preprocessed Input Size No. of Successful Correspondences Time Taken For Source Features Time Taken For Target Features Total Time For Feature Computation
0.5 28 1 0 0 0
0.1 369 56 0 0 0.01
0.05 1232 1219 0.01 0.02 0.04
0.01 22467 22465 3.18 3.21 6.78
0.007 40669 40667 10.69 10.74 22.47
0.005 69912 69910 31.86 31.82 66.42
0.001 234228 234226 671.67 674.85 1390.69
0.0007 235729 235727 729.75 722.02 1497.79

Note: In case of FPFHEstimation, the total time taken for Feature computation includes the time for calculating normals of the input points.

Anyone volunteering to provide benchmark results using this class (along with your machine config) would be highly appreciated.

2nd Texture mapping - Texture synthesis.
Friday, June 24, 2011

I haven’t posted a progress update for a while. Here is my update for this week:

  • Implemented texture synthesis based on the paper “Texture Synthesis on Subdivision Surfaces”.
  • Completed the texture parameterization part; the next step will be blending the texture and generating the texture atlas.

Here is my first result after texture parameterization step:

../_images/snapshot01.png ../_images/snapshot02.png

The result still has a few unexpected mappings and is not smooth because:

  • I haven’t found the optimal solution for selecting the edge that determines the angle (alpha) with the vector field (face orientation) computed on each face.
  • The texture is not blended yet. I have started working on it.

Poisson Reconstruction & Marching Cubes Code Structure
Friday, June 24, 2011

After doing some preliminary research into Poisson Reconstruction, it turns out that the algorithm is actually just a novel method of voxelizing data. In their paper, Kazhdan et al. state that they use a modified version of Marching Cubes, utilizing an octree. Given this new finding, I will restructure my Marching Cubes code to be more generic in its voxelization methods. I believe the best approach would be to create a virtual voxelization function in the MarchingCubes class, and have child classes specify the voxelization technique. This way, any type of voxelization can be combined with the Marching Cubes isosurface extraction.

An even more general method of decoupling the voxelization from Marching Cubes is to create two separate classes, Voxelizer and MarchingCubes, and have the voxelizer create a data structure of voxels, that is passed into the MarchingCubes surface reconstructer. This would require a standard Voxel data structure as well.
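
The virtual-function option can be sketched like this. All names here are made up for illustration and are not the PCL API; the point is only the shape of the design: the base class owns the shared isosurface extraction and delegates the voxelization step to a virtual method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base class drives the reconstruction; child classes supply the voxelization.
class MarchingCubesBase
{
  public:
    virtual ~MarchingCubesBase () {}
    std::vector<float> reconstruct (const std::vector<float>& cloud)
    {
      std::vector<float> grid = voxelize (cloud);
      // ... the shared marching cubes isosurface extraction would run on `grid` ...
      return grid;
    }
  protected:
    virtual std::vector<float> voxelize (const std::vector<float>& cloud) = 0;
};

// One possible child: binary occupancy over a toy 1-D "grid" covering [0, 1).
// A Poisson-based child would instead fill the grid with an indicator estimate.
class OccupancyMarchingCubes : public MarchingCubesBase
{
  protected:
    virtual std::vector<float> voxelize (const std::vector<float>& cloud)
    {
      std::vector<float> grid (10, 0.0f);
      for (std::size_t i = 0; i < cloud.size (); ++i)
        if (cloud[i] >= 0.0f && cloud[i] < 1.0f)
          grid[static_cast<int> (cloud[i] * 10.0f)] = 1.0f;
      return grid;
    }
};
```

The separate Voxelizer/MarchingCubes design mentioned above would split the same two responsibilities into two objects connected by a standard voxel data structure instead of an inheritance relationship.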

Weekly Status Report
Friday, June 24, 2011

This week, I accomplished the following:

  • finished Marching Cubes implementation
  • began research into better voxelization methods
  • started research on Poisson Reconstruction
  • started to decouple voxelization from Marching Cubes

Next tasks include:

  • decide on a method to decouple voxelization from Marching Cubes
  • implement Poisson reconstruction
  • develop good unit tests for the new classes
  • look into other novel surface reconstruction algorithms
Comparing OrganizedNeighborClass with OrganizedDataIndex
Thursday, June 23, 2011

During the last few days I integrated OrganizedNeighborClass into the base search class and pushed it into trunk. The functions inside OrganizedDataIndex were merged into OrganizedNeighborClass. OrganizedNeighborClass performs the search operation by projecting the search sphere onto the disparity image and then doing a 2D search in the projected search space. OrganizedDataIndex performs the same search operation by projecting the search point onto 2D and then defining the search space around it. So in the first case we have a reduced projected search space with the overhead of the projection computation, whereas in the second case the 2D search space is bigger but there is no projection overhead. To decide which method works best on organized data, I ran some benchmark tests comparing the time taken by radiusSearch for both methods.
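
The projection step can be sketched with a pinhole camera model. This is a back-of-the-envelope illustration, not the PCL code, and the intrinsics in the test are made up: it projects the center of the search sphere into the image and derives the half-width of the 2D search window from the radius and the depth.

```cpp
#include <cassert>
#include <cmath>

// Project a 3-D search point (x, y, z) and a search radius into pixel space,
// assuming a pinhole model with focal lengths (fx, fy) and principal point (cx, cy).
void projectSearchWindow (float x, float y, float z, float radius,
                          float fx, float fy, float cx, float cy,
                          float& u, float& v, float& half_width_px)
{
  u = fx * x / z + cx;              // pixel coordinates of the sphere center
  v = fy * y / z + cy;
  half_width_px = fx * radius / z;  // a sphere of radius r at depth z spans ~2*r*fx/z pixels
}
```

This also shows why the projected window pays off: the window shrinks with depth, so far-away search spheres touch only a few pixels.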

Here are some of the results:

Point Cloud Resolution Search Radius OrganizedDataIndex time (s) OrganizedNeighborSearch time (s)
640 x 480 0.0249623 0.00160503 0.00819993
640 x 480 0.73077 0.000736952 0.00690317
640 x 480 0.861978 0.000688791 0.00860214
1024 x 768 0.0519982 0.00165892 0.0255649
1024 x 768 0.367919 0.00194287 0.019875
1024 x 768 0.809822 0.00166297 0.0175359

The k-nearest-neighbor module is not working correctly yet, which I hope to debug soon.

Plans for the next few days include:

  • Committing the integrated code into trunk
  • Benchmarking organized search functions with kinect data
  • Start with the integration of Octree functions
Marching Cubes Results
Wednesday, June 22, 2011

I recently completed work on the marching cubes algorithm. The marching cubes algorithm is an algorithm for constructing a polygonal isosurface from a set of volumetric data. For a more in-depth overview of the algorithm, see Paul Bourke’s excellent overview (and code) at http://paulbourke.net/geometry/polygonise/. Working off of his generously supplied code, the actual algorithm was easy to implement. The difficult part was actually creating the volumetric data to generate the surface from.

In other applications, a volumetric data set, such as a CT scan, is provided, and creating the volume is trivial: each vertex in the voxel is simply a point from this data set. In the case of point clouds, however, this set of voxels must first be created. As a first cut, I implemented a simple method for creating the voxels. For each point in the cloud, I found the voxel that contains it, and set every vertex in that voxel to be 1, and also updated all of the neighboring voxels so that the correct vertices are set to 1 as well. All other vertices are set by default to 0. Marching cubes is then run on the resulting voxel set, using an isovalue of 0.5. One great side effect of this method is that many empty voxels can be filtered out, and not traversed over, saving both computation time and memory. The results are as follows:
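
The vertex-marking step can be sketched as follows. The grid layout is an assumption for illustration (this is not the PCL code): the voxel containing a point gets all 8 of its corner vertices set to 1, and because corners are shared between cells, marking them automatically updates the neighboring voxels too.

```cpp
#include <cassert>
#include <cmath>
#include <set>
#include <vector>

// Vertices that carry the value 1; every other vertex implicitly stays 0.
typedef std::set<std::vector<int> > VertexSet;

void markVoxelCorners (float x, float y, float z, float leaf, VertexSet& on_vertices)
{
  int ix = static_cast<int> (std::floor (x / leaf));
  int iy = static_cast<int> (std::floor (y / leaf));
  int iz = static_cast<int> (std::floor (z / leaf));
  for (int dx = 0; dx <= 1; ++dx)
    for (int dy = 0; dy <= 1; ++dy)
      for (int dz = 0; dz <= 1; ++dz)
      {
        std::vector<int> v (3);
        v[0] = ix + dx; v[1] = iy + dy; v[2] = iz + dz;
        on_vertices.insert (v);
      }
  // marching cubes then runs over the occupied cells with an isovalue of 0.5
}
```

Storing only the "on" vertices is also where the memory savings mentioned above come from: empty voxels never appear in the set and are never traversed.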

../_images/bunny_mc1.png ../_images/bunny_mc2.png

These results can be contrasted with the greedy surface triangulation method:

../_images/bunny_gp31.png ../_images/bunny_gp32.png

The algorithm does produce a smooth, continuous isosurface, with some caveats:

  • The isosurface looks “blocky,” in that there are only sharp 45 degree angles to outline curves. This can be contrasted with the greedy surface triangulation method which can produce more smooth looking surfaces.
  • The isosurface has holes in some areas, some of which are highlighted in the second picture.
  • Even though the model used should be a planar segment, the algorithm produces a shell-like triangle enclosure that surrounds the actual point cloud. This can be seen more easily in the second image.
  • The size of the voxel must be chosen carefully. If the size is too small, you will get another point cloud, except with polygonal spheres encapsulating your points. If too large, you’ll get degenerate shapes that don’t convey any structure. This is actually a difficult problem to solve automatically, and so far I’ve only been doing it heuristically. The following images show some examples of poorly chosen leaf sizes.
../_images/bunny_mc3.png ../_images/bunny_mc4.png

From these results, I think that more work should be done in the actual creation of the voxel grid, to make marching cubes behave better. The marching cubes algorithm is straightforward; voxelization of data is the interesting part. I’ll continue to think about this and post any new discoveries. From an implementation standpoint, I think that we should also split the code into 2 parts: voxelization and marching cubes. This will enable the user to pick different methods of constructing the voxel grid, and use the one that works best for them. For now, I think the easiest thing to do will be to add it as a flag when running the algorithm, however in the future we might consider creating a separate voxel class, and allowing the user to manually create the voxels themselves.

Next tasks include:

  • Complete unit testing and push the new code to the trunk
  • Begin Poisson reconstruction
  • Find a more robust voxelization method
GICP Registration
Wednesday, June 22, 2011

Short update regarding the progress of GICP integration :

  • Below are displayed two points clouds registered by GICP
    • As inputs, I used two point clouds taken from Kinect sensor.
      • green -> target
      • red -> source
      • blue -> source aligned to target
      • first -> front view
      • second -> top view
    • No initial guess for transformation -> identity matrix
../_images/GICPRegistrationFrontView.png ../_images/GICPRegistrationTopView.png
  • Even though the two point clouds look a little different (the target contains many more wall points), GICP succeeded in aligning them
  • The computational time must be improved in order to use it for real-time registration
Virtual rendering of CAD models as pointcloud...
Wednesday, June 22, 2011

A couple of new functions have been added to pcl::visualization to obtain point clouds from CAD models: renderView() and renderViewTesselatedSphere(). We can add different geometries to the visualizer and, once the virtual scene is created, simply call the renderView() method to obtain a point cloud of the scene as seen from the virtual camera. The second function is mainly for object recognition methods that use synthetic models as training data and need to simulate views of the models seen from different viewpoints. A more rudimentary version of renderViewTesselatedSphere() is how the point cloud version of the background robot at pointclouds.org was created :) after merging eighty views of it and applying some filters...

int main (int argc, char ** argv) {
  // read the CAD model from a PLY file
  vtkSmartPointer<vtkPLYReader> readerQuery = vtkSmartPointer<vtkPLYReader>::New ();
  readerQuery->SetFileName (argv[1]);
  readerQuery->Update (); // force the reader to produce its output
  vtkSmartPointer<vtkPolyData> polydata = readerQuery->GetOutput ();

  pcl::visualization::PCLVisualizer vis ("Visualizer");

  vis.addModelFromPolyData (polydata, "mesh1", 0);

  // set up the virtual camera (window_size, pos, focal and view are plain arrays)
  vis.camera_.window_size[0] = 480; vis.camera_.window_size[1] = 480;
  vis.camera_.pos[0] = 0;   vis.camera_.pos[1] = 5;   vis.camera_.pos[2] = 5;
  vis.camera_.focal[0] = 0; vis.camera_.focal[1] = 0; vis.camera_.focal[2] = 0;
  vis.camera_.view[0] = 0;  vis.camera_.view[1] = 1;  vis.camera_.view[2] = 0;

  // call render in the visualizer to obtain a point cloud of the scene
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_out (new pcl::PointCloud<pcl::PointXYZ> ());
  vis.renderView (256, 256, cloud_out);

  return 0;
}
And the results: you can see the CAD model and two partial views generated from the same viewpoint at different resolutions (the top one at 64x64 and the bottom one at 256x256).

More exams...
Tuesday, June 21, 2011

They don’t seem to end, do they... I finished my exam for Digital Electronics today: 5 hours of intensive digital circuit drawing and design. My brain is fried!

But now for GSoC achievements:

The past evenings, I found some time to spend on GSoC, so I’ve typed my internship plan and Radu signed it. Thanks Radu :)

I’ve also been trying to get the current CUDA implementation that Andreas has developed up till now to compile on Windows. As I have an NVIDIA Optimus-enabled device, for which Linux drivers aren’t supported, all CUDA development work will be on Windows for me. It’s not that bad actually, because the CUDA implementation should run on Windows too, so it’s a good thing to get all possible cross-platform errors fixed as early in the development stage as we can.

After fixing a decent number of build errors, there is still one build issue that keeps me from compiling the implementation and that we can’t seem to get rid of: the CUDA compiler won’t compile source files where the FLANN header gets included. Strangely, Andreas has the same version of the compiler and on Linux it doesn’t complain. Together we are trying to figure out how to solve this tedious issue.

Progress report
Monday, June 20, 2011

Due to some minor problems I was not able to work on GSoC last week. Today I downloaded and compiled the updates to the CUDA code (knowing that Nico had updated it last week) and started to get a general idea of what is going on. The plan is to better understand what was done tonight, and to test some of the code on my machine tomorrow. After this I will get some guidance from Nico on what to do next.

ICP Registration
Sunday, June 19, 2011

My activity from last week includes:

  • Installing and taking first datasets from the new Kinect sensor
  • Testing ICP registration on some datasets
    • As input I used two point clouds which differ only slightly, but even so ICP unfortunately gets stuck in a local minimum
    • The computed translation (top view -> right) is good, but the orientation must be adjusted (front view -> left)
      • left - front view
      • right - top view
../_images/ICPRegistrationFrontView.png ../_images/ICPRegistrationTopView.png
  • The computational time of ICP keeps it away from real-time registration

In progress working:

  • In the next days it is GICP’s turn - it looks promising
    • I have to study the already existing implementation closely
  • Integrating GICP into PCL
  • The registration topic is turning out to be a sensitive one

My goals for next weeks are:

  • Think about the detection and matching of 3D global features
    • I think that 3D global features are the key to real-time registration
  • Continue integrating GICP into PCL
Sunday, June 19, 2011

Object recognition and pose estimation for Kinect data trained on synthetic CAD models.


Image shows the recognized CAD models overlayed on the point cloud.

An example blog post
Saturday, June 18, 2011

This is a very simple example blog post.

Lolcatz, or Keypoint detection
Friday, June 17, 2011

Everybody is posting fancy pictures, so I will too. I implemented Harris height-map keypoints; here is a visualization.


Note that I intentionally set a high non-maxima suppression interval to make the picture visually plausible.

The directions for voting are defined as described in my previous post. The problem is that the normal map is discontinuous. For example, if all normals are directed towards the model centroid, there are discontinuities at the points whose tangent goes through the centroid. In such cases the dominant orientation cannot be estimated properly.


I tried to use MLS to get a consistent map of normals, but fixed-radius MLS failed on that model. Another solution is to use greedy triangulation, but it is unlikely to work with real-world scans. So I am going to ignore the orientation of normals and vote for two locations.

Work Status
Friday, June 17, 2011

Committed the base search class into trunk, inheriting the KdTree class, with all the kdtree unit tests working fine. Next I am working on integrating the OrganizedNeighborSearch class into the base class. For now, it is giving errors in certain unit tests, which I hope to debug soon.

So, during next week, my plans are to complete the integration of the OrganizedNeighborSearch class into the base search class, commit it, and start looking at the Octree implementation.

Coding inertia
Friday, June 17, 2011

As a newcomer to large-scale coding projects, one interesting problem I’ve had is the amount of time it took me to begin writing actual code. In the beginning, I spent a long time simply examining the code, trying to understand all the different functions and existing tools (e.g. boost shared pointers, eigen vectors, search trees, etc.). It was a bit frustrating as I felt I was not productive, but in retrospect, all the time I spent reading the code, playing with simple examples, and just thinking really pays off now that I’m writing code. I feel like after a slow start, I can finally begin building up some momentum and getting some tangible results.

Weekly status update
Friday, June 17, 2011

This week, I completed the following tasks:

  • I completed a review of the surface reconstruction API. Ultimately, I decided to leave it be, and closely monitor it in relation to the development of new surface reconstruction algorithms.
  • I successfully added and compiled a skeleton MarchingCubes class. I’ve completed the code to voxelize the input point cloud and am beginning to code the actual surface reconstruction algorithm.

Next tasks include:

  • complete coding marching cubes
  • debug and design unit tests for the marching cubes algorithm
  • begin Poisson reconstruction and research new, novel reconstruction algorithms
First Code Uploaded
Thursday, June 16, 2011

I finally cleaned up the code a lot and restructured it so that it can be used in regular cpp files that are not processed by nvcc. (It’s not integrated into PCL, but will go straight into FLANN.)

If you want to try it, even though I wouldn’t consider it production-ready: the code is located at http://dl.dropbox.com/u/32615544/kdtree.zip until it is integrated into FLANN. The interface is slightly similar to FLANN, but it can’t be used as a drop-in replacement in the current form. You will likely need to change one include path in the CMakeLists.txt, as it points to a directory in my own home directory, but apart from that, it should work out of the box.

Any benchmark results would be welcome, together with information about your GPU and CPU! (I usually execute the tests with 1M points.)

Evaluating Feature Descriptor Algorithms
Wednesday, June 15, 2011

Finally, I have finished integrating a Feature Evaluation Framework class for testing multiple feature descriptor algorithms, on multiple datasets, with multiple sets of parameters. The code for it is here.

To test any algorithm, first a Test class for that algorithm must be derived from the FeatureCorrespondenceTest class. Then link the algorithm with the FeatureEvaluationFramework class, so that it can be tested on user provided datasets and parameters.

I have written a toy program to test FeatureEvaluationFramework class. It uses FPFHTest to run the FPFHEstimation algorithm on two sample datasets, using three sets of parameters.

The output of the test is currently printed to standard output in the following manner:

$ ./test_feature
----------Test Details:----------
Feature Name:  FPFHEstimation
Input Dataset: bun0.pcd
Parameters:    threshold=0.01, searchradius=0.03
----------Test Results:----------
Input Size:    397
Successes:     397
Failures:      0

Click here to see the complete output.

Next steps would be:

  • Deciding a good format for storing and visualizing the test results.
  • Using the class to run tests on the conference room benchmark dataset provided by Michael.
  • Uploading the tested code to the repository, and the test results to PCL website and/or repository.
  • Going ahead with the Registration Framework, in a similar manner.
GSoC Updates
Wednesday, June 15, 2011

Some more updates about my latest work. I fully implemented the ‘Model Globally, Match Locally: Efficient and Robust 3D Object Recognition’ paper. The results are as presented by the authors (Please see the figure for a dataset used by the authors in the paper). I also added a small optimization that will theoretically speed up the object recognition in larger scenes (will do some proper benchmarks next).


An interesting problem I faced during this time was finding a proper way to average rotations, as my algorithm clusters the many results it gets from an internal Hough Transform and outputs the average pose of each such cluster. It seems that the naive solution of directly averaging the quaternions in vector space and then normalizing is acceptable (you can find the proof in the publication “On Averaging Rotations” by Claus Gramkow).
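
The naive averaging scheme can be written down in a few lines. This standalone sketch uses plain arrays instead of the Eigen/PCL quaternion types: average component-wise after flipping each quaternion into the hemisphere of the first one (since q and -q represent the same rotation), then renormalize.

```cpp
#include <cassert>
#include <cmath>

// Average n unit quaternions, each stored as (w, x, y, z).
void averageQuaternions (const float q[][4], int n, float out[4])
{
  for (int k = 0; k < 4; ++k) out[k] = 0.0f;
  for (int i = 0; i < n; ++i)
  {
    // flip into the hemisphere of the first quaternion before summing
    float dot = q[i][0] * q[0][0] + q[i][1] * q[0][1] + q[i][2] * q[0][2] + q[i][3] * q[0][3];
    float s = dot < 0.0f ? -1.0f : 1.0f;
    for (int k = 0; k < 4; ++k) out[k] += s * q[i][k];
  }
  float norm = std::sqrt (out[0] * out[0] + out[1] * out[1] + out[2] * out[2] + out[3] * out[3]);
  for (int k = 0; k < 4; ++k) out[k] /= norm;
}
```

Without the hemisphere flip, a cluster containing q and -q would average towards zero and the normalization would blow up; with it, identical rotations average to themselves.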

As the next steps, I am intending to test the algorithm on noisier data such as the one from the Kinect. In this process, a repository of pcd files for such tasks will be created and will be made available to the Pointclouds.org community.

Voxelization thoughts
Tuesday, June 14, 2011

The marching cubes algorithm is an algorithm used on volumetric data to construct isosurfaces. For example, CT scan data may have a spectrum of values from 0 to 255, and the user would pick a value that corresponds to something interesting (for example, 192 may correspond to bone), and then an isosurface would be extracted that traces that value.

The implementation for PCL is a bit different because a point cloud is a discrete set of points, without a spectrum of values. In fact, the data isn’t even in voxel format, so the cloud must first be converted. There are some parameters that must be tweaked first, though, such as the voxel size and how to determine the value at each voxel. After some thought, I’ve decided that the best way to do this would be to create a high-resolution voxel grid, with voxels no larger than the smallest distance between points. As a first cut, we’ll iterate through the point cloud and assign each point’s nearest vertex a value of 1. There are other ways of computing the value of each vertex, such as a weighted average over the points nearest to a vertex based on distance. Once the code is up and running, I’ll have to run tests to see which method works best.

Third week
Monday, June 13, 2011

As I noted earlier, neither of the keypoint detectors implemented in PCL is suitable for monochrome unorganized clouds. So I needed to come up with a new one.

A vertical world prior (though it does not always hold) can help in recognition. In many applications we know that the vertical axis of an object remains constant. For example, a car is usually oriented horizontally, while a pole is usually strictly vertical. In contrast, toy car models might have any orientation. So, if the application allows it, the keypoint detector should be invariant only to horizontal rotations. Scale invariance (across octaves) often does not matter either, since e.g. car parts are usually of similar size.

Therefore, it is natural to consider the cloud as a height map. The derivatives of this function along the horizontal axes are essentially the projections of the point normals onto those axes. So the Harris operator can be computed across the scale-space, and its maxima give (hopefully) reliable keypoints. Intuitively, keypoints are the points where the projection of the normal onto the horizontal plane changes significantly.

It is still unclear how to estimate the dominant orientation in this setting (the horizontal keypoint orientation may be required by the voting framework). As with images, we can take the direction of maximum gradient, which here means the most horizontal normal (summed in a Gaussian-weighted window). For horizontal planes there is no dominant orientation, but the Harris operator value is low there, so this fits the defined keypoint detector. I implemented these ideas during the previous week and am now ready to verify them experimentally.
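
The Harris response from projected normals can be sketched as follows. This is an illustration of the idea, not the PCL implementation: the horizontal components (nx, ny) of the neighborhood normals play the role of the height-map gradient, and the Gaussian window weights are left out for brevity (k = 0.04 is the usual Harris choice).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Accumulate the second-moment matrix M of the horizontal normal components
// and evaluate the Harris operator det(M) - k * trace(M)^2.
float harrisResponse (const std::vector<std::pair<float, float> >& normals_xy,
                      float k = 0.04f)
{
  float a = 0.0f, b = 0.0f, c = 0.0f;   // M = [a b; b c]
  for (std::size_t i = 0; i < normals_xy.size (); ++i)
  {
    float nx = normals_xy[i].first, ny = normals_xy[i].second;
    a += nx * nx; b += nx * ny; c += ny * ny;
  }
  return (a * c - b * b) - k * (a + c) * (a + c);
}
```

As the text says, a horizontal plane (all normals vertical, so nx = ny = 0) gives a response of zero, gradients in a single direction give a negative response, and only corner-like neighborhoods with gradients in both directions score high.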

Unrelated to that, we also encountered a problem with the FLANN radius search. We need it to estimate peaks in the ISM voting space. So we declare a point structure like this:

struct EIGEN_ALIGN16 VoteISM
{
  PCL_ADD_POINT4D;    // adds the x, y, z members (SSE-aligned)
  float votePower;
  float data_c[4];
};

POINT_CLOUD_REGISTER_POINT_STRUCT (VoteISM,           // here we assume XYZ + "votePower" (as fields)
  (float, x, x)
  (float, y, y)
  (float, z, z)
  (float, votePower, votePower)
)
pcl::KdTreeFLANN<VoteISM>::radiusSearch() works correctly only if the sorted_ flag is false. If the result should be sorted, repetitions appear in the indices. Obviously, the sorting does not work right:

int n = 0;
int* indices_ptr = NULL;
DistanceType* dists_ptr = NULL;
if (indices.cols > 0) {
    n = indices.cols;
    indices_ptr = indices[0];
    dists_ptr = dists[0];
}
RadiusResultSet<DistanceType> result_set(radius, indices_ptr, dists_ptr, n);
nnIndex->findNeighbors(result_set, query[0], searchParams);
size_t cnt = result_set.size();
if (n > 0 && searchParams.sorted) {
    std::sort(make_pair_iterator(dists_ptr, indices_ptr),
              make_pair_iterator(dists_ptr + cnt, indices_ptr + cnt),
              pair_iterator_compare<DistanceType*, int*>());
}
I am not sure I defined the point type correctly, so those errors may be due to alignment violations or something similar. I am going to analyze the problem and then submit a bug report if needed.

Base Class to KdTree inherited class
Monday, June 13, 2011

Last week I started coding the base search class, with the initial focus on the KdTree functions to be inherited from the base class. The basic construction of the generic class is almost done and I hope to commit it after some final tweaks.

After this I will look at the functions in the OrganizedDataIndex and OrganizedNeighborSearch classes, which will be interfaced by the base search class.

final API notes
Monday, June 13, 2011

After a discussion on the pcl-developers mailing list, I realized that my API design was a bit overcomplicated. Now that I think about it, I guess that I should have realized that this is an API meant for more seasoned programmers that will be using PCL, as opposed to some new user that may break things easily. Also in the interest of backwards compatibility, we decided to ultimately leave the API as is, and modify it later if needed for other surface reconstruction algorithms.

Next up on my to do list:

  • Implement the Marching Cubes algorithm
  • Implement Poisson reconstruction
  • Do more research on surface reconstruction algorithms and implement some of them
Including citations in a blog post
Monday, June 13, 2011

We recently encountered a problem with the way our blogpost extension was handling citations, but I tracked down the problem, and I believe I’ve got it fixed now. It should now be possible to cite an article [DixonCVPR2011] from within a blogpost, as long as the citation target is also defined in your post. If you want to see an example, take a look at the source for this post here.

[DixonCVPR2011]On analyzing video with very small motions, Michael Dixon et al.
1st Texture mapping.
Monday, June 13, 2011

This week I have done:

  • Created an OBJ_IO class to save a 3D model with texture in the Wavefront (.OBJ) format.
  • Designed a texture data structure for mapping texture onto the mesh. (Just one texture material per mesh right now.)
  • Implemented a basic texture mapping similar to OpenGL’s placing of texture maps onto polygons. The texture coordinates are stored per-vertex, and the texture is mapped in the Y plane.

Here is my result after mapping texture onto a bunny:

../_images/bunny06.png ../_images/bunny02.png

The result is not perfect for a few reasons:

  • The mesh is not smoothed.
  • The texture is simply mapped in the Y plane. Texture coordinates are computed based on X and Z.
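
The planar mapping in the second point can be sketched as follows. The bounding-box normalization here is an assumption for illustration: the UV coordinates of a vertex come from its X and Z only, so the texture is effectively projected straight down the Y axis onto the mesh.

```cpp
#include <cassert>
#include <cmath>

// Compute (u, v) for a vertex from its x and z, normalized by the mesh
// bounding box in those two axes.
void planarUV (float x, float z,
               float min_x, float max_x, float min_z, float max_z,
               float& u, float& v)
{
  u = (x - min_x) / (max_x - min_x);
  v = (z - min_z) / (max_z - min_z);
}
```

This also explains the artifacts: any two vertices that share the same (x, z) but differ in height receive the same texel, which is why steep parts of the bunny look stretched.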

And here is my updated code

Next steps:

  • Implement another texture mapping to improve the result.
  • Work on overlapping textures onto a mesh.
Progress report
Monday, June 13, 2011

This week I started by reading the paper underlying the M-estimator SAmple Consensus: “MLESAC: A new robust estimator with application to estimating image geometry”, by P. H. S. Torr and A. Zisserman. I also started reading and testing the existing RandomSampleConsensus class. At the moment I am trying to figure out how to implement MSAC using the existing CUDA code. As I progress I will post more details about the implementation.

GPU vs CPU Benchmarking Results
Monday, June 13, 2011

In the last days, I finally figured out why Shawn Brown’s CUDA code (http://www.cs.unc.edu/~shawndb/) crashed on my PC (actually my fault), so here are some benchmarking results! 4 implementations in total were tested: FLANN (KDTREE_SINGLE_INDEX), the existing CUDA code, my CUDA code and my CPU fallback. The tests were done on a Phenom II 955 CPU with 12GB of RAM and a GTX260 GPU.

The main difference between the original and my CUDA code is that I don’t use a stack for tree traversal; instead, I evaluate the decision about which branch to follow again when moving up the tree. So in the table, CUDA means Shawn Brown’s code, and Stackless CUDA is my code. Both use minimal trees with rotating split axes, while FLANN tries to select the optimal split axes and has to store extra information about the nodes.

I used two artificial test cases. All of them consisted of randomly distributed 3D points, but their distribution is different for each test.

Test 1: In the first test, all the points have x,y,z coordinates in the range of [0,1[. In all tests, the number of query and search points is the same, but the points are different. (All times are in seconds.)

Algorithm Build Time, 10k points 1-NN 16-NN Build Time, 100k points 1-NN 16-NN
FLANN 0.003662 0.008056 0.045166 0.048583 0.141357 0.667724
CUDA 0.011676 0.001118 0.014650 0.143819 0.014667 0.115384
Stackless CUDA 0.057128 0.002929 0.013916 0.102539 0.010742 0.152343
CPU Fallback 0.057128 0.011230 n/a 0.102539 0.138671 n/a
Algorithm Build Time, 1M points 1-NN 16-NN Build Time, 4M points 1-NN 16-NN
FLANN 0.681396 2.213623 8.890380 3.179687 10.11523 37.945800
CUDA 1.651866 0.133243 1.100975 7.380264 0.547088 OOM
Stackless CUDA 0.654541 0.107910 1.694580 3.307128 0.468261 OOM
CPU Fallback 0.654541 1.908203 n/a 3.307128 9.09375 n/a

All GPU search times include transfer times, and the build time of the last two algorithms is always the same as they work on the same tree. (Also, the transfer time to the GPU is included here.)

The 16-NN test could not be performed with my CPU fallback because it is not implemented yet. The CUDA implementations failed at the largest data set because of a simple reason: My GTX260 has 768MB of RAM, but the buffers for storing the tree and returning the results would use more than 500 MB. The result: std::bad_alloc on allocation of the thrust::device_vector.
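
The memory figure can be sanity-checked with a quick calculation. The buffer layout here is an assumption (one int index and one float distance per returned neighbor), but it already puts the result buffers alone at around half a gigabyte before the tree is even stored.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for the k-NN result buffers: one index and one distance
// per (query, neighbor) pair.
std::size_t resultBufferBytes (std::size_t num_queries, std::size_t knn)
{
  return num_queries * knn * (sizeof (int) + sizeof (float));
}
```

For the failing case, 4M queries with 16 neighbors each give 4,000,000 x 16 x 8 bytes = 512 MB of results, consistent with the out-of-memory failure on a 768 MB card.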

Test 2: Here, the x coordinate of the search points was limited to [0,.1[, everything else remained the same. About 90% of the query points were outside the range of the search points now.

Algorithm Build Time, 10k points 1-NN 16-NN Build Time, 100k points 1-NN 16-NN
FLANN 0.003906 0.019042 0.077636 0.050292 0.348632 1.320312
CUDA 0.011676 0.013418 0.053049 0.148135 0.365875 1.450399
Stackless CUDA 0.068359 0.008300 0.024902 0.093261 0.213378 0.470214
CPU Fallback 0.068359 0.081054 n/a 0.093261 2.822753 n/a
Algorithm Build Time, 1M points 1-NN 16-NN
FLANN 0.705566 7.083007 25.00878
CUDA 1676.332 timeout OOM
Stackless CUDA 0.655761 11.65576 OOM
CPU Fallback 0.655761 147.4165 n/a

In this test, FLANN was noticeably slower than in the first test, but the CUDA results of the 1M point test were even slower than FLANN, and the 4M point test always timed out. Since CUDA usually kills GPU kernels that take longer than about 4 seconds, some of the tests could not be completed.

In conclusion, it is possible to say that for “uniform” data sets, the GPU implementation is faster than FLANN by a factor of about 20. But for skewed data sets like in the second test, the performance drops significantly. The overhead of storing explicit information about the splits might result in a speed gain on the GPU as well, even though this might also result in a worse performance due to more memory accesses. So I think it might be interesting to implement and benchmark that as well. But first, my next step would be to test a more “real-world”-like case. This could be done by taking two kinect frames and searching for the neighbors of frame 1 in frame 2, which would be an approximation of the searches performed in the ICP algorithm.

For this, I think I'll have to do some restructuring of my code first. At the moment, it requires that the calling code is also compiled via nvcc, which doesn't work for some PCL code, especially the files that include Eigen headers. Somehow nvcc doesn't like template-heavy code... As soon as this is done, I'll publish the code somewhere. This should be done by the end of the week. I hope I can start integrating it into FLANN by then, too.

ICP Registration
Sunday, June 12, 2011

My achievements for last week include:

  • Reading more references about GICP
  • Successfully compiling and running already existing GICP implementation
  • Taking some PCDs from the Tyzx stereo camera, though I am unlikely to use them because they contain too much noise
    • We should be able to distinguish some furniture objects from my room:
      • Left - front view
      • Right - top view
../_images/tyzxOfficePCDFrontView.png ../_images/tyzxOfficePCDTopView.png

Work in progress:

  • Testing PCL filters in order to reduce PCD noise
  • Implementing the ICP-based registration prototype
  • Waiting for the new Kinect sensor to arrive, which should happen no later than Wednesday

My goals for next week are:

  • Read some references about detection / extraction of 3D global features
  • Install the new Kinect sensor
  • Start integrating the existing GICP implementation into PCL
My Second Week @GSOC
Sunday, June 12, 2011

A lot of things have happened in the last week. I officially finished my undergraduate studies and I now am a Bachelor of Science in Electrical Engineering and Computer Science :-). It was a really crazy week, but I managed to fit in some hours of coding and reading. Yesterday was my first full day of GSoC.

As I said in my previous blog post, I started implementing 'Model Globally, Match Locally: Efficient and Robust 3D Object Recognition', and a couple of hours ago I had the first results on some datasets. There is quite a lot of work to be done from the structural point of view, as everything has to be smoothly blended into the PCL framework. The picture below shows my result compared to the FPFH features (using the application from the "Aligning object templates to a point cloud" tutorial: <http://pointclouds.org/documentation/tutorials/template_alignment.php#template-alignment>):


I am now reading Radu’s PhD thesis - an exceptional document I truly recommend for any object recognition enthusiast or any computer vision person in general. You can find it here: <http://files.rbrusu.com/publications/RusuPhDThesis.pdf>

More GPU Programming
Friday, June 10, 2011

The last days, I continued working on the kD tree. I spent quite a few hours on a problem where the 1-NN GPU code would find a slightly different point from the CPU implementation, and finally traced it down to floating point imprecisions...

Additionally, the kNN search works now. All of this is still a naive implementation, but can be faster than FLANN in some cases. Benchmarks will hopefully follow this week, next week otherwise.

Ruminations on API design
Thursday, June 09, 2011

My first task for my GSoC project is to design an API for all surface reconstruction methods in PCL. As I’m new to large scale software development and have very little background in software engineering practices in general, I thought it a good idea to first do a bit of research into API design. Some googling turned up these two links:

In short, I’ve gotten this general impression of API design:

  • The API is just as simple as it sounds - the interface that users see to access the methods. In this case, users will probably create some sort of object that will then have methods to construct a surface given certain parameters.
  • That being said, API design is not necessarily simple. In general, however, the cardinal rule seems to be this: make it easy for users to use the API correctly, and hard for them to use incorrectly.

My proposed API would be a class, SurfaceReconstruction, that should be as black-box as possible - that is, parameters are set, reconstruct() is called, and you're done. The input is a valid point cloud plus optional tuning parameters, and the output is a surface mesh. I believe planning is very important, especially in the early stages, but now of course comes the hard (and fun) part: actually coding it up.

(Exam?) progress report
Thursday, June 09, 2011

Not much has happened since last week, if you do not count the gain in business skills I have acquired during the last couple of days ;) Unfortunately, but as expected, I've been really busy studying for my exams so far... T minus 1 day before I have to go and take my first exam!

After that, it's another burden that will drop off my shoulders, and I might have some mental power left to fiddle a bit with compilation of all the different libraries on Linux and Windows. I'll post another status update when we come to that point :)

Second week
Wednesday, June 08, 2011

During the last week, keypoint detection was the focus of my attention. It is an important step in the object detection pipeline.

pcl::Keypoint is implemented by two classes: SIFTKeypoint and NarfKeypoint. The former needs a textured point cloud (and in fact ignores geometry), while the latter can be applied only to range images. So neither is suitable in the general case.

There were some surveys of geometric keypoint detection in 3DIMPVT 2011. I probably need to implement some of them: [Salti2011], [Yu2011].

In the meantime, a keypoint detector can be mocked by simple subsampling. It can be used for prototyping as well as a baseline for recognition method evaluation. The problem is that there's no efficient subsampling method. For example, VoxelGrid does not preserve the cloud structure and copies the cloud. This is quite inconvenient, since some descriptor implementations assume the points belong to the original cloud. So I decided to implement a cloud-preserving variant of VoxelGrid.

A cloud-preserving filter retains only points that already belong to the cloud. There's no need to return a copy of those points; they can also be referenced by their indices. So I prototyped the interface of a cloud-preserving filter and am waiting for comments from the mentors. Filters such as PassThrough and xRemoval are already cloud-preserving, so it is easy to make them implement the new interface.

Also, PCL keypoints and filters both transform the cloud to some smaller cloud, sometimes a subcloud. As noted above, they could be used in similar cases. So probably we should unify those interfaces.

In parallel, I work on the implementation of ISM-based recognition method, which will be hopefully added to PCL when/if it proves to work decently.


[Salti2011] A Performance Evaluation of 3D Keypoint Detectors. Samuele Salti, Federico Tombari, and Luigi Di Stefano.
[Yu2011] An Evaluation of Volumetric Interest Points. Tsz-Ho Yu, Oliver J. Woodford, and Roberto Cipolla.
Structure of the generic search interface class
Wednesday, June 08, 2011

After some initial problems in designing the basic interface and properly using the boost pointers, along with some discussions on the developers list, the basic interface for the search functions is working well.

So, there will be one Search base class and three inherited classes for kdtree, octree and organized search. The base class will hold a pointer that is set to the appropriate child class during initialization. The functions available to the user in the search interface are defined as virtual functions, which are overridden by identically named functions in the inherited classes.

I hope to complete the common interface for kdtree and organized search by the next week, after which I can move on to octree.

Wednesday, June 08, 2011

Last week I was very busy - hence no posts. I started getting familiar with the Thrust library and with the project. As a quick todo, these days I have to get in contact with Nico and discuss more details about the roadmap. On the plus side, I am participating in an HPC summer school ("Toward petaflop numerical simulation on parallel hybrid architectures", INRIA Sophia Antipolis), where I have the great opportunity of listening to Jack Dongarra and other interesting people. As far as GSoC is concerned, starting from this week I will have enough time to perform all the necessary tasks, as I have finished the work needed for my master thesis and for the internship.

Gabe O’Leary’s status update for Week 2
Wednesday, June 08, 2011

It has been a rather hectic past week for me, as my house has flooded for the second time in one month and I had to meet with a lawyer because my landlord is refusing to pay for damages. Anyway, other than that, I've been compiling a list of all "interesting" classes, which can be found here. I will begin by making tutorials for modules that currently have none. Most of the modules have at least one tutorial, but several do not. Keypoints, Octree and Registration are the only three modules which currently have no tutorials. Vincent, my mentor, suggested that I make a tutorial for the IterativeClosestPoint class in the Registration module, so that is the next thing I will do.

Feature Evaluation Framework
Tuesday, June 07, 2011

After discussing the basic structure of the Feature Correspondence Test and Feature Evaluation Framework classes with Michael, I have been working on the class implementations, and getting them to run bug-free on a toy program. I hope to get a basic version of these classes finished in the next two days.

The important points I would like to mention are:

  • Modified the format of the ground truths of the feature test to accept an Eigen::Matrix4f as the transformation matrix mapping the source input points onto their corresponding points.
  • Read up on homogeneous transformations from the point of view of 4x4 matrices, to get a better understanding of how this should work out in my code.
  • Worked on integrating a Framework class with the FeatureCorrespondenceTest class, to automate the testing of multiple algorithms with different parameters.

Once I get hold of the benchmark datasets, I will run some actual tests on the code to eliminate runtime bugs, and upload it here subsequently.

Next task would be to write up the Registration Test Framework and RegistrationTest base class, which would be very similar to the present task, at least at an abstract level.

Working on 3D OBJ file format
Tuesday, June 07, 2011

As far as I know, the VTK file format is not well suited to representing a 3D model with texture. So I decided to write code to save a polygon mesh (later, a mesh with texture) as an OBJ file, which can represent a textured 3D model better than the VTK format. Here is a link describing the OBJ file format: http://en.wikipedia.org/wiki/Wavefront_.obj_file.
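For reference, a minimal textured OBJ mesh: `v` lines are vertex positions, `vt` lines are texture coordinates, and `f` lines are faces whose entries pair 1-based vertex and texture-coordinate indices.

```
# one textured triangle
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
vt 0.0 0.0
vt 1.0 0.0
vt 0.0 1.0
f 1/1 2/2 3/3
```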

Here is my code

Some minor setbacks
Tuesday, June 07, 2011

As I’m on the quarter system, there’s been quite a bit of delay on my part getting started on my project. Thankfully, I’ve finished classes, and now I’m ready to get going!

This week, I’ve successfully installed PCL on my Mac. This was actually a non-trivial task, with a lot of building and tweaking in order for everything to go. I’ve noted the process, and am going to try to post another blog entry on how to install on OS X 10.6, for those other unfortunate Mac users that don’t want to resort to using a VM to develop.

I’ve also set up an Eclipse environment to develop in, another non-trivial task for people like me who are used to coding in a simple editor like Emacs. I’m also considering writing up a short post on how to set up Eclipse.

Finally, I’ve worked through some of the tutorials. While coding I encountered an interesting duality of sorts: the ease of following the tutorial code and coding up new examples of my own, and the difficulty in understanding exactly how to set up the tutorials in the first place. Another final post might be my thoughts on the sort of mindset one needs to work on, and use PCL. Using PCL is obviously different than coding for PCL, and I think it’s important to remember the distinction of using it as a tool, vs actually improving that tool. That being said, it’s also important to understand both clearly, as the tool has to be useful for the user, or else it’s a useless tool.

Next steps include: coming up with a unified surface reconstruction API, and implementation of some totally sweet surface reconstruction algorithms.

Discussing Things and Learning CUDA
Tuesday, June 07, 2011

It’s been a busy last week. So far, I’ve been discussing PCL and FLANN API changes for the GPU NN search on the mailing list and with Marius and I started getting familiar with CUDA and Thrust. Additionally, I got the existing CUDA NN search to compile after getting help from Shawn, but it results in runtime errors on my PC...

Additionally, I just ported my toy 3D OpenCL kD-Tree to CUDA. Nice to see that the “porting” was basically a copy&paste job. And it works about 30-50% faster than before on the same graphics card. Good to see that it’s faster than the OpenCL version, but also kind of sad since it won’t work on AMD graphics cards.

Starting ICP Registration
Saturday, June 04, 2011

My achievements for last week include:

  • Successfully compiling PCL and its dependencies under Ubuntu
  • Running some examples
  • Reading references about the ICP algorithm
  • Starting to code the first point cloud registration prototype, which uses the ICP algorithm

My goals for next week are:

  • Read references about GICP
  • Get the ICP-based registration prototype working
  • Get the existing GICP implementation running
Multi tasking
Thursday, June 02, 2011

As a student from Belgium, I am in a difficult period at the moment, as it is both the start of GSoC and the start of my exam period... To keep the university from eating any more of my GSoC coding time, I will be doing my best to pass my exams in the first session so I won't have to do re-examinations. I hope Murphy is on my side on this :)

I will be trying to be as active as possible during the exam period, so I can really get it going when my exams are finished. At the moment I installed Ubuntu 10.04 LTS and as you can see I’ve posted my first blog update. I have been discussing some design considerations with Marius and Andreas by email regarding the interface adaptations FLANN will need to undergo to allow the maximum degree of parallelization.

My plan for the next days is to study for my exam (Business Skills, wish me luck :D) and to try to get PCL, nnn, libnabo and FLANN to compile successfully on Linux.

Bug fixes
Tuesday, May 31, 2011

Today I fixed a couple of bugs in our blogpost extension, and I’m writing this post as a test.

First, Alex reported that he was unable to include images inside of a .. blogpost::. The problem was solved by adding in a call to process_images when creating the blogbody content.


Second, Gabe found that the :doc: role failed to work when included in a post, and this turned out to be a similar problem. The solution was to add a call to resolve_references.

Finally, Pararth was unable to link to downloadable files, but that appears to be working now, too.

I suspect I’ll need to make a few more fixes along these lines, but I’m afraid that will have to wait until another day. If any of you run into any more problems, please let me know.


Apparently, there’s one more problem I’ll need to solve to get :doc: and :download: working. It looks like those commands are assuming our site’s root is “pointclouds.org/” instead of “pointclouds.org/gsoc/”, so all of the automatically generated links are missing the “gsoc/”. I’ll have to look into that when I have a chance...

Update 2

Fixed. The problem was that a path defined relative to, say, pointclouds.org/gsoc/mdixon/ wouldn't be valid in a blogpost that was copied over to pointclouds.org/gsoc/index.php. The resolve_references function is designed to handle that, but it didn't follow the same conventions as process_images or process_downloads, so I needed to pass in a different "docname" in order to get the right behavior.

Start of the Second Week
Tuesday, May 31, 2011

The good news: CUDA and PCL are working now on my ArchLinux PC. So no Ubuntu for me :-) The way it works is kind of hackish for now. I compile the CPU code with GCC 4.6, but pass the option '-DCUDA_NVCC_FLAGS="--compiler-bindir=/opt/gcc-4.4"' to cmake, so that nvcc uses GCC 4.4 for the CPU stuff in the .cu files. At least it works...

The bad news: The GPU implementation on http://www.cs.unc.edu/~shawndb/ doesn’t build on my PC. Missing header files in the project, as well as nvcc errors such as “error: attribute “global” does not apply here”. Seems like I have to leave out the benchmarking and start directly with the implementation without learning anything from that implementation...

In the meantime, I’m figuring out the optimal NN search API to allow for GPU and CPU searches. See the mailing lists ;-)

First week
Monday, May 30, 2011

I had tried to build PCL before, but last week I decided to build the latest trunk version with all dependencies, including the optional ones, under Windows XP with MSVS 2010. I left out only the install system, OpenNI and CUDA (since I don't have the hardware for the latter two). The tutorial was useful, though there were a few things where I got stuck.

VTK. First I downloaded the Windows installer, but it turned out I needed to download the sources to build the required libraries. VTK took a few hours to build (and a few more for the debug build). I still did not understand how to set the build variables TCL_LIBRARY and TK_LIBRARY.

wxWidgets compiled successfully. Some libs were still missing, e.g. WX_mono. Also, I did not understand how to set the wxWidgets_wxrc_EXECUTABLE variable, since neither CMake nor I could find the wxrc executable. Overall, I am not sure whether visualization works. Is there any variant of test_visualization?

For QHull, contrary to the tutorial, I used qhullstatic.lib, since there was no qhull.lib. This was not really intuitive, and at first I tried qhull6.lib.

There were a few errors in the build, since I did not link OpenNI. There seems to be no way to turn it off in the CMake configuration. There was only one failed test, pcl_keypoints, and, again, I am not sure about visualization.

First Week In
Monday, May 30, 2011

One week into the coding period, the work I have completed includes:

  • Downloaded, compiled and installed the PCL library and its dependencies.
  • Gone through the majority of the tutorials, and successfully executed them on my machine.
  • Started preliminary work on my first task, developing a benchmarking utility for Feature Correspondence and Registration Tests. (See project roadmap for details)

For Feature Correspondence Tests, I have gone ahead with the following design:

FeatureCorrespondenceTest is a base class that implements high level tasks for benchmarking. To test any specific algorithm, derive a new class from this base and implement functions for:

  1. Taking algorithm parameters as input.
  2. Computing feature descriptors using that algorithm.

Use an object of this derived class in the main function to run the tests.

The reason I went ahead with this is that each feature descriptor algorithm can differ in the number and types of its parameters, and the exact procedure of computing the feature descriptors may differ to an extent that cannot be captured solely by templates. So each algorithm should have a separate class for testing it, while the common functionality is implemented in the base class.

The source code is over here.

This is still a preliminary version of the utility, meant to test the general design. I will be adding more functionality and addressing robustness issues this week. Any ideas/comments/suggestions are highly welcome.

Status update week 1
Monday, May 30, 2011

I finally got PCL to compile on my machine. The progress I've made so far:

  • Installed all the dependencies (except OpenNI - still working on that)
  • Changed the Ubuntu distribution several times, so that I have no nvcc problems
  • Installed the CUDA SDK, dev drivers and so on
  • At the moment I have started checking the existing CUDA code

This week will be a full week for me due to the end of my INRIA internship.

Gabe O’Leary’s status update for Week 1
Sunday, May 29, 2011

So I finally got PCL to compile this week, but that was using:

svn co http://svn.pointclouds.org/pcl/trunk pcl-trunk

I had been unable to properly use ssh and svn to download the code. This was because I had been trying to put the repository in my root folder and therefore had to use sudo, which bypassed my regular public ssh key and used a different one. Anyway, all of that now works fine. Once I downloaded the repository I attempted to compile, and it only got to 5 percent before giving me a build error. After discussion on the IRC channel we realized this was an error that had actually already been committed, and it was resolved. My first milestone, which I am now working on, is going over the list of tutorials and finding out what is missing. In order to do this I have created a file, located at gsocweb/source/goleary/tutorials.rst, that will keep track of all components that exist in PCL and whether or not they currently have a tutorial. You can view this list here. I am going to be populating this list over the next few days, but I would appreciate any assistance, as I am not hugely familiar with the Point Cloud Library and all of its components as of yet.

My very first status update
Sunday, May 29, 2011

I finally convinced myself to write something on my GSoC page. I actually started playing with PCL before the 1.0 release and before the GSoC coding period started (because I will be offline during the next week as I am graduating => all the fun stuff of moving out, tons of documents (love Germans for that) etc. - will keep me overly busy).

Now for the techy bits. I am using a Mac - with an Ubuntu VM just in case. After some very helpful discussions on the IRC channel and the bug tracker, I realized that the compiler bundled with Snow Leopard is outdated, so I installed the MacPorts gcc 4.5. After hitting another ton of problems (mostly related to Apple anti-Unix stupidity) I managed to get all the 3rd party libraries up and running, and PCL now happily compiles and all of the examples run. I saw that there are other GSoCers using Macs who switched to Ubuntu. If there are requests, I could write a more comprehensive tutorial on how to install PCL on OS X. Just for fun, you can find a picture of pcd_viewer running on OS X on my main page. For this, I made a small application that takes an image and creates a beveled point cloud out of it using OpenCV and PCL (I have to add some code too :-) ):

#include <cv.h>
#include <highgui.h>
#include <iostream>

#include "pcl/io/pcd_io.h"
#include "pcl/point_types.h"

using namespace cv;
using namespace std;

int main (int argc, char **argv)
{
  if (argc != 3) {
    cerr << "Syntax: ./sampleImage2PCL image_file_name output_pcd_file" << endl;
    return 1;
  }

  // Load the image as grayscale, so each pixel is a single intensity value
  Mat image = imread (argv[1], 0);
  pcl::PointCloud<pcl::PointXYZ> cloud;

  for (int i = 0; i < image.rows; ++i) {
    for (int j = 0; j < image.cols; ++j) {
      if (image.at<unsigned char> (i, j) > 10) {
        // Extrude bright pixels into a small column of points along z
        for (float z = 0; z < 0.04; z += 0.001)
          cloud.points.push_back (pcl::PointXYZ (((float)j) / image.cols, ((float)i) / image.rows, z));
      }
    }
  }

  pcl::io::savePCDFileASCII (argv[2], cloud);

  return 0;
}
I also started looking into the more advanced tutorials and had fun playing with parameters and experimenting a bit. Furthermore, I already have a small collection of publications about 3D features I will read and analyze. I will start implementing the ‘Model Globally, Match Locally: Efficient and Robust 3D Object Recognition’ within the next days.

I mentioned earlier about the fact that I added a picture on my main page. There seems to be a bug with the blogpost and blogbody macros - cannot add images/figures inside a blogpost.

Update: now images work

Progress status on 28th May
Saturday, May 28, 2011

The work done till now includes:

  • Downloaded and compiled the complete code successfully, ran a few tests, and it all went smoothly
  • Understood the basic tutorials on reading/writing PCD files and some advanced tutorials on the octree data structure. I even tested a few code samples for nearest neighbour search on octrees.

Currently I am reading the implementation of nearest neighbour search operations in octree and kdtree and trying to identify the relevant parameters in reference to the nearest neighbour operation.

Khai Tran’s status update on Week 1
Friday, May 27, 2011

This week I have done:

  • Installed and built the PCL library and its dependencies
  • Built and tested some tutorials using PCL

In progress:

  • Reading references on texture blending
  • Working on the surface reconstruction API for texture mapping onto a mesh
CUDA Madness
Friday, May 27, 2011

I seriously started wondering why Linux people even look at CUDA... Trying to install it right now, but it seems like only horribly outdated distros with GCC < 4.5 are supported. (And in turn, most of those aren't really supported any more.) I hope I'll get GCC 4.4 running on Archlinux, otherwise I'll have to switch to an unsupported Fedora. Or is OpenSUSE 11.2 still supported?

Update: GCC 4.4 is running now, but I still get tons of errors in the PCL CUDA code. Never thought that this would be such a dependency nightmare :-( Seems like I have to try Ubuntu 10.04 then...

Late Start
Thursday, May 26, 2011

Today I finally returned to Germany and started with my GSoC work. I’m one of the GPU NN guys and, of course, developing on Linux. I already compiled PCL on my laptop, but currently I’m working on getting all the stuff (incl. CUDA) to run on my desktop PC, since the laptop doesn’t have a Nvidia graphics card.

In the meantime, I’m having a look at the FLANN KD-Tree implementation and the CUDA one mentioned in my roadmap and already encountered the first problem: the CUDA implementation was developed with Visual Studio, which means: no Linux Makefile, meh...

A first observation I made when comparing my own kD-tree implementation with FLANN: I use a left-balanced median split scheme, which stores the points of the tree without any overhead and switches cyclically between the split axes. When using randomly generated tree and query points in the 3D unit cube, my implementation is slightly faster than FLANN, and the OpenCL version that I did a few months ago is about 8 times faster than FLANN on a GTX 260 vs. a Phenom II 955. But when using configurations such as ones where the x axis of all query points is restricted to [0, 0.5] while the y axis of all tree points can only take values in [0, 0.25], the performance degrades seriously. The CPU implementation takes about 30 times longer, as does the OpenCL one, but FLANN's search time does not change. So I guess the flexibility of choosing optimal split planes is worth the overhead of storing additional info, which is what FLANN does if I read it correctly.

In the next days, I will focus on the first milestone, the definition of a suitable interface for GPU KD-Tree search, should the current FLANN interface not prove sufficient.

Hello, world!
Wednesday, May 25, 2011
Test blogpost
Wednesday, May 25, 2011

These days I learned how to add content to the developer blogs, and today I also added some content.

Here’s a code snippet:

// This is a really fun block of code...
int n = 10;
for (int i = 0; i < n; ++i)
  printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0.

Setting up Linux tools
Wednesday, May 25, 2011

Some days ago I decided to move from Windows to Linux. I installed Linux tools and got in touch with them. Now I’m compiling PCL. Today I learned how to add content to the developer blogs.

Here’s a code snippet:

// This is a really boring block of code...
int n = 10;
for (int i = 0; i < n; ++i)
  printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0

My first status update
Tuesday, May 24, 2011

Today I learned how to add content to the developer blogs.

Here’s a code snippet:

// This is a really boring block of code...
int n = 10;
for (int i = 0; i < n; ++i)
  printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0

My first status update
Tuesday, May 24, 2011

Today I learned how to add content to the developer blogs.

Here’s a code snippet:

// This is a really boring block of code...
int n = 10;
for (int i = 0; i < n; ++i)
  printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0

And now for something completely different
Tuesday, May 24, 2011

I’ve spent a large amount of time installing tons of libraries and software on my Apple laptop, and after many, many problems and headaches I decided to just install Ubuntu on my computer to make everything easier. After doing a bunch of research on compatibility and some other issues having to do with the EFI firmware on my computer, I finally started to install it and attempted to partition my hard drive. For some god-forsaken reason my computer is apparently incapable of moving some of the files on my hard drive, so it can’t partition the drive. After going to all this trouble, Vincent (my mentor) suggested that I just use a virtual machine. Why didn’t I think of that in the first place? Well, anyway, now I’m installing Ubuntu in a virtual machine, and then I’m going to have to reinstall all of the libraries and everything necessary for PCL, as well as the software that Michael told everyone to install.

My first status update
Monday, May 23, 2011

Today I learned how to add content to the developer blogs.

Here’s a code snippet:

  // This is a really boring block of code...
  int n = 10;
  for (int i = 0; i < n; ++i)
    printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0

Khai Tran’s first status update
Monday, May 23, 2011

This is the first update to the developer blogs.

Here’s a code snippet sample:

// Hello world
for (int i = 0; i < 10; ++i)
  cout << "Hello World" << endl;

And here’s an equation:


Gabe O’Leary’s first status update
Monday, May 23, 2011

Today I learned how to add content to the developer blogs.

Here’s a code snippet:

// This is a really boring block of code...
int n = 10;
for (int i = 0; i < n; ++i)
  printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0

How to install Sphinx (and some other stuff)
Friday, May 20, 2011

In my earlier post, I discussed how to write and publish status updates using Sphinx and svn. However, as you might imagine, before you can follow those instructions, you’ll need to install Sphinx (and a couple of additional dependencies for the extensions we use). Assuming you don’t already have them installed, here’s what you need to do:

First, install Sphinx 1.0.7. I have Python 2.6, so this is the one I installed:

~$ easy_install http://pypi.python.org/packages/2.6/S/Sphinx/Sphinx-1.0.7-py2.6.egg

Next, install python-dateutil. (Used by our blog extension to parse the date strings)

~$ easy_install http://labix.org/download/python-dateutil/python-dateutil-1.5.tar.gz

Finally, install dvipng. (Used by the pngmath extension to generate LaTeX formulas)

~$ sudo apt-get install dvipng

After you have these installed, you should be able to “make html” (see earlier instructions) without problems.

P.S. As you can tell, these instructions assume you’re working in Linux. If you’re planning to develop in Mac OS X or Windows, then installing dvipng may be a little more complicated for you. (I haven’t tried to install it on Windows, but this page suggests that “The quickest way is to download and install MikTeX - http://miktex.org/2.9/setup ”, so you might try that.)

How to write a blog post
Thursday, May 19, 2011

In order to make sure the mentors and other interested students can keep up-to-date about the progress of each project, every student will have their own “developer blog” that they will post to every couple of days. In this blog post, I’ll describe the infrastructure we’ll be using for this and give some instructions for how to write your status updates.

Because we already use ReStructuredText (ReST) as the document format for so many other aspects of our site, we’ve decided to use the same approach for writing and publishing regular status updates. ReST is a markup language that is designed to be relatively clean and readable. There are also extensions that make it easy to include code snippets and equations in your document, which comes in very handy when writing technical content. If you’re unfamiliar with the format, take a look at some of the source files for our tutorials in pcl/trunk/doc/tutorials/content/; that should give you a few examples of how it looks and what you can do with it.

Much like with our documentation and tutorials, the content for the developer blogs will be stored in an svn repository. We’re keeping this content separate from our source tree, so to get started, you’ll need to check out the gsocweb repository:

~$ svn co svn+ssh://svn@svn.pointclouds.org/gsocweb gsocweb

If you examine the directory structure, you’ll see that the root directory contains a source directory, which will contain all of the source .rst files for the site, and an exts directory that contains some useful extensions. (A build directory will be created the first time you “make,” but these files don’t belong in the repository.)

Under the source directory, you’ll see that every student/developer has their own subdirectory. Each student directory should have the following files:

  • index.rst: A homepage that tells everyone a little bit about you (this page should link to the other pages)
  • roadmap.rst: A page that describes the plans for the project you’re working on
  • status.rst: A page that aggregates all of your status updates in one place
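Putting these pieces together, a checkout should look roughly like this. (This is a sketch: the Makefile and the top-level index.rst are assumptions based on the “make html” step, and exact names may differ.)

```
gsocweb/
├── Makefile               # assumed; drives "make html"
├── exts/                  # Sphinx extensions (including the blogpost directive)
├── source/
│   ├── index.rst          # assumed site front page
│   └── your_username/
│       ├── index.rst      # about you; links to the other pages
│       ├── roadmap.rst    # plans for your project
│       └── status.rst     # aggregates all of your status updates
└── build/                 # created on first "make"; not checked in
```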

Please take a minute to update the information on your index page. Basic contact information is all that’s really necessary, but feel free to add whatever additional content you’d like.

The next step is to write your first status update. To write a new status update, create a new .rst file in your directory (you can name it whatever you’d like, but for the sake of discussion, let’s call it “blog.rst”) and add the following to it:

.. blogpost::
   :title: My first status update
   :author: your_username
   :date: 5-19-2011

   Today I learned how to add content to the developer blogs.

   Here's a code snippet:

   .. code-block:: c

      // This is a really boring block of code...
      int n = 10;
      for (int i = 0; i < n; ++i)
        printf ("%d\n", i);

   And here's an equation:

   .. math::

      ax^2 + bx + c = 0

You can write your posts in any file or subdirectory, and the Sphinx parser will automatically find them and make sure they get added to the right blog pages. You aren’t required to keep them all in one file, nor do you need to keep each post in a file of its own. Feel free to organize the files in whatever way makes the most sense to you.

Just make sure every post is contained within its own .. blogpost:: directive, and don’t forget to fill in the correct values for the author and date. The title argument is optional—a default title will be generated if you don’t feel like making one up—but try to use descriptive titles when you can, since that will help your readers to find the posts that are most relevant to them.
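For example, a single blog.rst holding two posts might look like the sketch below (the titles, dates, and post text here are made up):

```rst
.. blogpost::
   :title: Checked out the repository
   :author: your_username
   :date: 5-20-2011

   Short update text goes here.

.. blogpost::
   :title: First build working
   :author: your_username
   :date: 5-22-2011

   Another update; each post sits in its own directive.
```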

Now that you’ve created a new file and written a .. blogpost:: entry in it, generate the html files like so:

~/gsocweb$ make html

Preview the new post at ./build/html/your_username/blog.html, and if it looks good, commit your changes. (Remember: since we created a new file for this post, we need to svn add it before we commit).

~/gsocweb$ svn add source/your_username/blog.rst
~/gsocweb$ svn commit -m "first post"

The commit will trigger our webserver to rebuild the content, so the update will show up on the website soon after. And that’s it! The post you just wrote will appear in the status.html file under your directory, and the ten most recent posts (by any user) will show up on the main index page at http://www.pointclouds.org/gsoc/.

Thanks for taking the time to update your page. We can’t wait to start reading about all the exciting things you’ll be developing with us this summer!


This is still a new set-up, and we might not have all the bugs worked out yet. If you run into errors or think something is amiss, please email the PCL GSoC mailing list (gsoc2011@pointclouds.org).

My first status update
Wednesday, May 18, 2011

Today I learned how to add content to the developer blogs.

Here’s a code snippet:

// This is a really boring block of code...
int n = 10;
for (int i = 0; i < n; ++i)
  printf ("%d\n", i);

And here’s an equation:

ax^2 + bx + c = 0

An example blog post
Tuesday, May 17, 2011

This is a very simple example blog post. Check out Michael’s blog for instructions on writing your own posts.