Q: What is a code sprint?
A code sprint is an accelerated research and development program in which we pair talented developers with senior researchers and engineers for 3-6 months of extended programming work. The model is inspired by the Google Summer of Code initiative and is meant to create open source solutions for interesting 2D/3D computer perception problems. Each code sprint is financially supported by a different institution (see below).
Q: What does this blog represent?
The Point Cloud Library (PCL) developer blog represents a great way to keep track of daily updates in PCL sponsored code sprints. Our developers are writing about their experiences and progress updates while working on exciting PCL projects.
The following shows the current list of ongoing code sprints. Click on any of the logos below to go to the respective sprint blog page.
The following code sprints have been completed:
We would like to thank our financial sponsors for their generous support. For details on how to start a new code sprint or contribute to PCL, please visit http://www.pointclouds.org/about and our organization’s web site at http://www.openperception.org/get-involved/.
In the previous blog post I described my attempts to find a good balance between the contributions of the three terms (distance, normal, and color) to edge weight computation. As it often happens, as soon as I was done with the evaluation and the blog post, I realized that there is another type of information that could be considered: curvature. And indeed, it proved to have a very positive effect on the performance of the random walker segmentation.
Let me begin by exposing a problem associated with edge weights computed using the normal term alone (i.e. depending only on the angular distance between the normals of the vertices). Consider the following scene (left):
Voxelized point cloud (left) and a close-up view of the graph edges in the region where the tall and round boxes touch (right). The edges are colored according to their weights (from dark blue for small weights to dark red for large weights).
Most of the edges in the boundary region (right) are dark blue; however, there are a number of red edges with quite large weights. This sort of boundary is often referred to as a “weak boundary” and, not surprisingly, has a negative effect on the performance of many segmentation algorithms. You can imagine that a boundary like this is a disaster for flood-fill segmentation, because the flood will happily propagate through it. Luckily, the random walker algorithm is known for its robustness against weak boundaries:
Segmentations produced by the random walker algorithm using three different choices of seeds (shown with colored markers).
In the first two cases one of the seeds is very close to the weak boundary, whereas another one is far away. In the third case there are multiple green seeds placed along the boundary, however a single purple seed is able to “resist” them from its remote corner.
This robustness has its limits, of course. In the figure below the “table seed” is placed at the rear of the table, far from the boundaries with the boxes. The box segments managed to “spill” onto the table through the weak boundaries:
Segmentation failure when a seed is placed too far from
a weak boundary
One way to address this problem is to make the sigma of the normal term smaller, thereby penalizing differences in normal orientation more. Enabling the distance term might also help, because the edges that contribute to the boundary weakness are often diagonal and therefore longer than average. The figure below (left) shows the graph edges in the same boundary region with new weights, computed using a decreased normal term sigma (10% of the original) and with the distance term enabled. (The overall shift of the edge colors towards blue is due to the latter.)
Graph edges with weights computed with a smaller normal term sigma and the distance term enabled. Close-up view of the region where the tall and round boxes touch (left) and top-down view of the rear of the table (right).
Still, there are several edges with relatively large weights in the boundary region, but there are no large-weight paths connecting vertices on both sides of the boundary anymore. We got rid of the weak boundary, but this came at a price. Although the table itself is flat, the cloud that we get from the Kinect is not, and the further from the camera the more wavy it is. The image on the right shows a top-down view of the rear of the table, where the waves are particularly large. The edges that belong to the cavities between the “waves” were heavily penalized and virtually disappeared. Random walkers will have a hard time traversing this part of the table on their way to the boxes!
Having considered all of this, I came to the conclusion that some additional geometric feature is needed to improve the weighting function. Curvature was the first candidate, especially since we get it for free anyway when estimating voxel normals (via PCA of the covariance matrix of the voxel neighborhood). I added one more exponential term to the weighting function:

exp(-k_ij / sigma_k),

where k_ij = k_i * k_j is simply the product of the voxel curvatures. Similarly to how it is done in the normal term, the product is additionally multiplied by a small constant if the angle between the voxels is convex (in order not to penalize convex boundaries).
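As a sketch, the extended edge weight with the curvature term could look as follows. The form of the curvature term and the discount constant for convex boundaries are my assumptions based on the description above, not the actual implementation.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the extended edge weight: the distance, normal, and color
// terms multiplied by the new curvature term. Constants are assumed.
double edgeWeightWithCurvature(double d,      // Euclidean distance
                               double theta,  // angular difference
                               double c,      // color difference
                               double k1, double k2,  // voxel curvatures
                               bool convex,
                               double sd, double sn, double sc, double sk)
{
  // Product of voxel curvatures, discounted for convex boundaries
  // so that convex edges are not penalized (0.01 is a placeholder).
  double curvature = k1 * k2;
  if (convex)
    curvature *= 0.01;
  return std::exp(-d / sd) * std::exp(-theta / sn) *
         std::exp(-c / sc) * std::exp(-curvature / sk);
}
```

Note how a convex edge between two curved voxels keeps a much larger weight than a concave one with the same curvatures.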
The figure below demonstrates the edge weights computed using the new term alone:
Graph edges with weights computed using only the new curvature term. Close-up view of the region where the tall and round boxes touch (left) and top-down view of the rear of the table (right).
The boundary between the boxes is perfectly strong, whereas the weights in the rear of the table are not penalized too much. The seeding that resulted in a segmentation failure before no longer causes problems. I used the set of random seedings described in the last post to find the best sigmas for the extended weighting function. Then I generated 50 new seedings to test and compare the performance of the old (without the curvature term) and the new (with the curvature term) weighting functions. The figure below summarizes the distributions of under-segmentation errors:
The performance improved significantly in terms of stability, quality, and number of failures (in fact, there are no failures at all).
The random walker segmentation algorithm requires that the data are modeled as a weighted graph, and the choice of edge weighting function has a great impact on the performance of the algorithm. In this blog post I will describe the weighting function and parameters that I ended up using.
Before talking about the weights of the edges between vertices, let’s discuss the vertices themselves. As mentioned in the previous blog posts, I have a pre-processing step where the input cloud is voxelized using OctreePointCloudAdjacency. Voxelization serves three purposes:
The voxels consequently become vertices of the graph, and each of them is connected with its neighbors by an edge. Each voxel has several properties: 3D position, normal orientation, and color, which may be used in edge weight computation.
As mentioned in the very first blog post on random walker segmentation, originally I used the edge weighting function from the following paper:
Later on I introduced several modifications. Now for a pair of vertices v_i and v_j the weight is defined as a product of three exponential terms:

w_ij = exp(-d_ij / sigma_d) * exp(-theta_ij / sigma_n) * exp(-c_ij / sigma_c),

where d_ij, theta_ij, and c_ij are the Euclidean, angular, and color differences between the voxels, and the sigmas are used to balance their contributions. Compared to the weighting function of Lai et al., the color term was added, and scaling by mean values was removed.
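A minimal sketch of this weighting function as I reconstruct it from the description above; the exact form in the implementation may differ slightly.

```cpp
#include <cassert>
#include <cmath>

// Edge weight as a product of three exponential terms, one per
// difference measure, each balanced by its own sigma.
double edgeWeight(double d,      // Euclidean distance between voxels
                  double theta,  // angular difference between normals
                  double c,      // color difference
                  double sigma_d, double sigma_n, double sigma_c)
{
  return std::exp(-d / sigma_d) *
         std::exp(-theta / sigma_n) *
         std::exp(-c / sigma_c);
}
```

Identical voxels yield a weight of 1, and the weight decays towards 0 as any of the differences grows.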
I devised the following procedure to find appropriate values for the sigmas. I took a scene with a known ground truth segmentation and generated 50 random proper seedings. Here by a “proper seeding” I mean a set of seeds where each seed belongs to a distinct ground truth segment, and each ground truth segment has a single seed (see the example in the figure below on the left). For each of these seedings I ran the random walker and computed the under-segmentation error (defined in one of the earlier blog posts; see the example in the figure below on the right). Then I analyzed the distributions of errors resulting from different sigma values.
Voxelized point cloud with one of the randomly generated proper seedings used in the experiments.
Under-segmentation error of the segmentation produced by the random walker from the given seeds (erroneous voxels are highlighted).
Note that the ground truth itself is not perfect, because it is often impossible to tell apart the points at the boundary of two objects. Consequently, the ground truth segmentation is somewhat random at the boundaries, and we should not expect (or strive for) any segmentation algorithm to produce exactly the same result. The under-segmentation error displayed above amounts to 1039 erroneous voxels, which is pretty much the best performance we could expect from a segmentation algorithm with this ground truth.
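Since the under-segmentation error is used throughout this evaluation, here is a minimal stand-in that simply counts mislabeled voxels under a fixed label correspondence. The actual metric defined in the earlier post may treat boundaries and label matching differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count voxels whose predicted label differs from the ground truth.
// This is only a sketch of an error count; the real under-segmentation
// error from the earlier post may be defined differently.
std::size_t countErroneousVoxels(const std::vector<int>& truth,
                                 const std::vector<int>& predicted)
{
  std::size_t errors = 0;
  for (std::size_t i = 0; i < truth.size(); ++i)
    if (truth[i] != predicted[i])
      ++errors;
  return errors;
}
```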
Let’s begin by examining the influence of the angular term. In this experiment I set Euclidean sigma to the value of voxel resolution and color sigma to 0.1. Below is a plot of under-segmentation error distributions for different choices of angular sigma (note that larger sigmas correspond to less influence):
Each distribution is visualized using a boxplot. The three main features are the position of the red bar (the median of the distribution), the size of the box (50% of the values fall inside the box), and the number and positions of pluses (outliers). The first gives an idea of the average performance of the algorithm. The second expresses the segmentation stability with respect to the seed choice (a smaller box means better stability). The third indicates segmentation failures. Indeed, a significant deviation of the under-segmentation error means that the output segmentation has large mislabeled regions, which may be deemed a failure.
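For reference, the boxplot statistics described above can be sketched as follows. The 1.5 * IQR rule for outliers is an assumption (it is the common boxplot convention); the plotting tool used for the figures may use different whisker settings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median, interquartile box, and outlier count for a boxplot.
struct BoxplotStats { double median, q1, q3; int outliers; };

static double medianOf(std::vector<double> v)
{
  std::sort(v.begin(), v.end());
  std::size_t n = v.size();
  return (n % 2) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

BoxplotStats boxplot(std::vector<double> v)
{
  std::sort(v.begin(), v.end());
  std::size_t n = v.size();
  // Quartiles as medians of the lower and upper halves.
  std::vector<double> lower(v.begin(), v.begin() + n / 2);
  std::vector<double> upper(v.begin() + (n + 1) / 2, v.end());
  BoxplotStats s;
  s.median = medianOf(v);
  s.q1 = medianOf(lower);
  s.q3 = medianOf(upper);
  double iqr = s.q3 - s.q1;
  s.outliers = 0;
  for (double x : v)  // points beyond 1.5 * IQR count as outliers
    if (x < s.q1 - 1.5 * iqr || x > s.q3 + 1.5 * iqr)
      ++s.outliers;
  return s;
}
```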
Clearly, the median values of the distributions are almost the same. The differences are very small and, due to the discussed properties of the under-segmentation error, cannot be used to draw conclusions about which sigmas are better. The box sizes, however, vary significantly. The sigmas from 0.1 to 1.1 yield the most stable performance, and the number of failures is also smaller for those sigmas. This evaluation does not provide enough information to choose any particular sigma in this range, so for now I settled on 0.2 (it is the second most stable, but yields fewer failures than 0.1).
In order to explore the influence of the color term, I set Euclidean sigma to the value of voxel resolution again and angular sigma to 0.2:
Note the last column, which shows the error distribution when the color term is removed completely. We see that sigmas from 0.1 to 0.15 provide slightly more stable results. Unfortunately it is not visible in the plot, but the number of outlier pluses is also smaller for these sigmas. So I chose 0.15 as a result of this evaluation.
As for the distance sigma, it turned out to have very little influence on the results. In most cases introducing the distance term does not change the output at all; still, in a few cases it helps to avoid a complete segmentation failure. Setting this sigma to the voxel resolution value gave the best results.
Finally, let me demonstrate the segmentations produced with the chosen sigmas. Among the 50 random seedings only 5 resulted in segmentation failure:
5 failed segmentations
Clearly, failures happened when a seed was placed either exactly on the boundary between two objects (#2), or on the outermost voxels of an object (#1, #3, #4, #5).
The remaining 45 seedings yielded good segmentations. Below are 15 of them (selected randomly):
15 successful segmentations
In the past weeks I decided to put the “spectral thing” on hold and turned back to the random walker segmentation. In this blog post I will talk about refactoring of the random walker algorithm and my experiments with different linear solvers. In the follow-up post I will explore how the segmentation results depend on seed placement, as well as discuss edge weighting functions and the choice of parameters for them.
First of all, I refactored and cleaned up my implementation. Recall that the random walker algorithm could be applied to cluster any kind of data as long as there is a way to model it using a weighted graph. I decided that it makes sense to have a generic templated implementation of the algorithm which would work with any weighted graph. An obvious choice to represent graphs in C++ is to use the primitives available in the Boost Graph Library (BGL). This library is very generic, feature-rich, and flexible, though at the expense of a rather steep learning curve. I took their implementation of the Boykov-Kolmogorov max-flow algorithm as an example of how to design the interface for a generic graph-based algorithm. In my case the public interface is just one templated function:
template <class Graph, class EdgeWeightMap, class VertexColorMap>
bool randomWalkerSegmentation (Graph& g, EdgeWeightMap weights, VertexColorMap colors);
The user has to provide a graph, a property map that associates a weight to each edge of the graph, and a property map that contains initial vertex colors. (I adopted the term “colors” instead of “labels” because BGL has a pre-defined vertex property type with this name.) The output of the algorithm (i.e. label assignment) is written back to the color map. Internally the function instantiates a class, which does all the boring work of constructing and solving a system of linear equations, as well as interpreting its solution as a label assignment.
While this generic graph segmentation function might be useful for someone, the general audience will be interested in a class that implements a complete point cloud segmentation pipeline. This class should take care of converting an input cloud into a weighted graph, segmenting it, and turning the random walker output into a labeled point cloud. At the moment it is not clear to me how the first step should be designed. Indeed, there are multiple ways to represent a point cloud as a weighted graph, both in terms of topology and edge weights. Currently I voxelize the input cloud and use the 26-neighborhood to establish edges between nodes. Alternatively, one may work with the full point cloud and use some fixed-radius neighborhood. One more option might be to generate a mesh and work with it. Exploration of these possibilities remains as future work.
The second issue that I addressed recently was the performance of the algorithm. The main computational effort is spent on solving a sparse system of linear equations, where the number of equations is determined by the number of unlabeled vertices (i.e. basically the size of the whole point cloud). For example, the typical size of voxelized scenes from the OSD dataset that I use in my experiments is about 30000 vertices. Originally, I used the ConjugateGradient solver of Eigen, because it is “recommended for large symmetric problems”. The time needed to segment a typical point cloud with this solver is about 1 second on my three-year-old i5 laptop. I decided to try other options available in Eigen. In particular, I tested BiCGSTAB with Diagonal and IncompleteLUT preconditioners, as well as the SimplicialLLT, SimplicialLDLT, and SimplicialCholesky solvers. The figure below shows the runtime of these solvers with respect to the problem size. (Only one of the Simplicial*** solvers is plotted, as they demonstrated very similar performance.)
The computation time depends linearly on the problem size for all solvers; however, SimplicialLDLT has a much smaller growth rate. For a typical 30 thousand vertices problem it needs about 200 ms. What's more, it can solve for multiple right-hand sides at the same time, whereas ConjugateGradient and BiCGSTAB cannot. This means that as the number of labels (i.e. desired segments) grows, the computation time does not increase.
In fact, Eigen offers some more options such as CholmodSupernodalLLT, which is a wrapper for SuiteSparse package, and SparseLU, which uses the techniques from the SuperLU package. Unfortunately, the former complained that the matrices that I provide are not positive definite (though they actually are), and the latter is a very recent addition that is only available in Eigen 3.2 (which I do not have at the moment).
Taking the evaluation results into account, I switched to the SimplicialLDLT solver in my random walker implementation.
We have implemented an algorithm which processes frames independently (i.e. without using information from the previous frame). We also currently assume that both curbs (left and right) are present in the scene.
Below you can see a projection of the labeled DEM onto the left image. Green points correspond to the left sidewalk, blue points to the right one, and red points mark the road surface. The algorithm couldn't find the right curb in this image, so the right side of the road was labeled incorrectly. The good news is that the left curb was detected correctly.
However, our goal is to label the road on the image, not on the DEM. So, if we mark each pixel with the label of the corresponding DEM cell, we get the following labeling of the road surface:
You can see a lot of holes in the road area. They are caused by holes in the disparity map. We decided not to fill them, because someone or something could be located there and we have no information about those regions.
A disparity map of this frame is shown below. Points without disparity are marked with red.
I am pleased to announce that I just finished the second Toyota Code Sprint. The topic we tackled this time was superquadrics applied to computer vision tasks such as object modeling and object detection.
The code we developed has not yet been integrated into PCL (though it uses PCL for all the processing); it lives in a separate repository, which you can find here: https://github.com/aichim/superquadrics .
An extensive report that presents some of the theory behind the concepts and algorithms we used in the project, as well as implementation details and results can be found at the end of this post.
At the end of this work, we present a set of ideas that can be used to extend the project in subsequent code sprints:
Hello everybody. For the last few weeks I have been trying to train an SVM for car recognition. For this purpose I used clouds of the city of Enschede, Netherlands, that I had manually labeled earlier. The training set consists of 401 clouds of cars and 401 clouds of other objects (people, trees, signs, etc.). As the classifier, I used the Support Vector Machine implementation from the libSVM library.
During training I used 5-fold cross-validation and a grid search to find the best values of gamma and the soft margin C (parameters of the Gaussian kernel). The best accuracy achieved during cross-validation was 91.2718%.
The model obtained after training was then used for recognition. The test set consists of 401 cars and 401 other objects; the training and test sets were taken randomly from different scanned streets. The best accuracy achieved so far on the test set is 90.7731% (728 correctly recognized objects out of 802).
As for descriptors, I used a combination of the RoPS feature and some global features such as the height and width of the oriented bounding box. The RoPS feature was calculated at the center of mass of the cloud, with a support radius big enough to include all the points of the given cloud.
Since RoPS is better suited to local feature extraction, I believe that using it with ISM and Hough-transform voting will result in higher accuracy.
In this entry, I’ll go over the evaluation dataset, the reasons I decided to create my own as opposed to using existing ones, and the needed criteria to evaluate multi-descriptor performance.
The first requirement is a real-world, challenging dataset, to help quantify the limiting factors of each given descriptor as a function of the associated sensor. The result is a ground truth confidence interval for each descriptor's performance. The dataset in this case contains two subsets: subset (A) with no scene differences, and subset (B), which differs only in photometric properties and sensor noise.
The second requirement is a dataset where a percentage of any given scene is invalid or unsuitable for a subset of the descriptors in the multi-descriptor set. This requirement is expressed in subset (C), which contains a large percentage of out-of-range depth data.
The third requirement is a minimal dataset for efficient and tractable computational performance, as expressed in subsets A, B, and C.
To evaluate each descriptor's performance, pose estimation, expressed as a single-angle drift around one axis, versus a known ground truth is used as the comparative measurement. Additionally, the inlier rate is considered when evaluating the quality of correspondence selection.
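A sketch of how the single-angle drift could be measured; the wrap-around handling to [-180, 180] degrees is my assumption, since the post does not spell out a formula.

```cpp
#include <cassert>
#include <cmath>

// Signed drift between an estimated rotation angle and the ground
// truth, wrapped to [-180, 180] degrees.
double angleDrift(double estimatedDeg, double groundTruthDeg)
{
  double d = std::fmod(estimatedDeg - groundTruthDeg, 360.0);
  if (d > 180.0)  d -= 360.0;
  if (d < -180.0) d += 360.0;
  return d;
}
```

The wrapping matters near the 0/360 boundary: an estimate of 350 degrees against a ground truth of 10 degrees is a 20-degree drift, not 340.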
In the several previous posts I tried to provide some insight into how the spectral clustering technique may be applied to the point cloud processing domain. In particular, I demonstrated different visualizations of eigenvectors and also did a manual analysis of one particular scene. In this blog post I will (finally) describe my algorithm that does automatic analysis of eigenvectors, which leads to unsupervised supervoxel clustering.
The input of the clustering algorithm is a set of the first K eigenvectors of the graph Laplacian. Each eigenvector has N elements that correspond to the N supervoxels in the original problem space. The task of the algorithm is to determine the number of clusters that the data points form in the subspace spanned by the eigenvectors and, of course, assign points to these clusters.
The key insight drawn from previous examinations and discussions of the eigenvectors is that the clusters are linearly separable in one-dimensional subspaces spanned by the eigenvectors. In other words, for every pair of clusters there exists at least one eigenvector so that in its subspace these clusters are linearly separable. Based on this premise I built an algorithm which is a pretty straightforward instance of divisive hierarchical clustering approach. It starts with a single cluster that contains all the data points and recursively splits it in a greedy manner. The following pseudo-code summarizes the algorithm:
The interesting parts are, of course, the FindBestSplit and SplitQuality functions. But to get this straight, let me first define what a “split” is. A split is a pair (u, t), where u is the eigenvector along whose subspace the split occurs, and t is a threshold value. The points whose corresponding elements in the eigenvector are less than t go to the first cluster, and the remaining points go to the second. For example, the figure below shows two example splits, each marked with a red line:
Example splits in the subspaces spanned by two of the eigenvectors.
Which of these two splits is better? Intuitively, the one on the left is more promising than the one on the right. But how to define the split quality? My initial approach was to use the difference between the points immediately above and below the splitting line as the measure. This worked to some extent, but was not good enough. Then I switched to a measure based on the relative densities of the bands above and below the splitting line. Consider the figure below:
Example splits with three bands highlighted. The “split” band is shown in red, the “top” and “bottom” bands are shown in yellow.
The “split” band is the region of low density around the splitting line. The “top” and “bottom” bands are the high-density regions immediately above and below the “split” band. I will omit the details of how these bands are computed, because it is likely that I will modify the implementation in the future. The quality of the split is then defined as a function of the densities of these bands: the lower the density of the “split” band relative to the “top” and “bottom” bands, the better the split.
With this quality measure at hand, the FindBestSplit function simply iterates over all available one-dimensional subspaces (i.e. over all eigenvectors) and finds the split with the highest quality.
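A minimal sketch of FindBestSplit over the one-dimensional subspaces. For SplitQuality it uses the initial gap-based measure mentioned above (the difference between the points immediately above and below the splitting line) rather than the final density-based one, whose exact formula is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A split: which eigenvector's subspace it lives in, the threshold,
// and its quality under the simple gap-based measure.
struct Split { std::size_t eigenvector; double threshold; double quality; };

Split findBestSplit(const std::vector<std::vector<double>>& eigenvectors)
{
  Split best{0, 0.0, -1.0};
  for (std::size_t e = 0; e < eigenvectors.size(); ++e)
  {
    std::vector<double> v = eigenvectors[e];
    std::sort(v.begin(), v.end());
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
    {
      // Quality = gap between consecutive projected points.
      double gap = v[i + 1] - v[i];
      if (gap > best.quality)
        best = Split{e, 0.5 * (v[i] + v[i + 1]), gap};
    }
  }
  return best;
}
```

The divisive clustering then applies the best split, recurses on the two resulting clusters, and stops when no split of sufficient quality remains.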
The performance of the algorithm is excellent on simple scenes:
Spectral supervoxel clustering of simple table-top scenes
And is rather good (though definitely not perfect) on more cluttered ones:
Spectral supervoxel clustering of cluttered table-top scenes
For example, the green book in the first scene is split into two clusters, and in the second scene two small boxes in the center of the image are erroneously merged into one cluster. I think these issues are closely related to the split quality measure and the threshold associated with it. There are certainly ways to improve both, and I plan to work on them in the future.
Before presenting the clustering algorithm as promised in the last blog post, I decided to motivate it by showing how the eigenvectors may be analyzed “by hand”. Hopefully, this will also provide more intuition about what these eigenvectors actually are and how they are related to the data.
Just to remind you, the problem I am trying to solve is about segmenting a set of supervoxels into meaningful components. Here a meaningful component means a subset of supervoxels that are close to each other in the Euclidean sense and are separated from the rest by a sharp change in orientation. If we view each supervoxel as a point, then the problem is about clustering points in a d-dimensional space. (Currently d = 6, since supervoxels have 3 Euclidean coordinates plus 3 coordinates of the normal vector; however, additional dimensions, e.g. color, may be added later.) The difficulty of the problem comes from the fact that the components may have arbitrary irregular shapes in these dimensions. Therefore I want to map these points to some other space where the components will correspond to tight clusters, perhaps even linearly separable ones. The current idea is to use a subspace spanned by the first few eigenvectors of the graph Laplacian of the original data.
In the last blog post I provided a visualization of the eigenvectors in the original problem space. For convenience, here are the first four eigenvectors again:
The first 4 eigenvectors in the original problem space
I want to perform clustering in a subspace though, so it is helpful to develop an intuition about what the data looks like in it. The figure below shows the data points (that is, supervoxels) projected onto each of the first four eigenvectors. (Here and in what follows the data is whitened, i.e. de-meaned and scaled to have unit variance. Additionally, the values are sorted in increasing order.)
Data points in subspaces spanned by the first 4 eigenvectors
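For completeness, the whitening step mentioned above can be sketched like this. Using the population variance is an assumption; the sample variance would work equally well here.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Whiten a 1D projection: subtract the mean and scale to unit variance.
std::vector<double> whiten(const std::vector<double>& v)
{
  double mean = 0.0;
  for (double x : v) mean += x;
  mean /= v.size();

  double var = 0.0;
  for (double x : v) var += (x - mean) * (x - mean);
  var /= v.size();  // population variance (assumed convention)

  double stddev = std::sqrt(var);
  std::vector<double> out;
  for (double x : v)
    out.push_back((x - mean) / stddev);
  return out;
}
```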
Obviously, in each of these subspaces (except for the second) the data is linearly separable in two clusters. What is not obvious, however, is how many clusters there will be in the combined subspace. The next figure shows data points in subspaces spanned by two different pairs of eigenvectors:
Data points in subspaces spanned by first and third (left)
and fourth and third eigenvectors (right)
Now it becomes evident that there are at least three clusters. Will there be more if we consider the subspace spanned by all three of these eigenvectors? It turns out there will be; see the point cloud below:
Unfortunately, we have just reached the limit of how many dimensions can be conveniently visualized. For this particular data set, though, it is enough: no more clusters appear if we consider additional eigenvectors. Summarizing, in the subspace spanned by the first four eigenvectors the data points form four tight and well-separated (linearly separable) clusters. These clusters actually correspond to the three boxes and the table in the original problem space.
Now the only thing left is to develop an algorithm which would do this kind of analysis automatically!
Correspondence rejection classes implement methods that help eliminate correspondences based on specific criteria such as distance, median distance, normal similarity, or RANSAC, to name a few. A couple of additional filters I've experimented with include a uniqueness measure and Lowe's ratio measure, as in “Distinctive image features from scale invariant keypoints”, D.G. Lowe, 2004. I've also explored the tradeoffs of implementing the filters within CorrespondenceEstimation itself versus as external CorrespondenceRejection classes. The former is computationally more efficient if the rejection process is done in one pass, while the latter allows for scene-specific sequential filter banks.
What follows is a quick reference guide to the available correspondence rejection classes, with remarks extracted from the source code.
I concluded the last blog post by noting that it seems to be possible to segment objects based on analysis of the eigenvectors of Laplacian constructed from the point cloud. This time I will provide a visual interpretation of eigenvectors and then describe the problem of their analysis.
Let me start with a quick note on eigenvector computation. As mentioned before, the eigenvectors are obtained through eigendecomposition of the Laplacian that represents the surface. In the beginning I used the SelfAdjointEigenSolver of the Eigen library. It runs in O(n^3) time (where n is the number of points), which obviously does not scale well. Later I switched to SLEPc. It can limit the computation to a desired number of first eigenpairs and therefore does the job much faster, but still seems to scale polynomially. Therefore I decided to execute supervoxel clustering as a pre-processing step and then compute the distances over the supervoxel adjacency graph, which has a dramatically smaller size than the original point cloud.
Now let’s turn to the eigenvalues themselves. The figure below demonstrates the voxelized point cloud of a simple scene (left) and supervoxel adjacency graph (right), where the adjacency edges are colored according to their weights (from dark blue for small weights to dark red for large weights):
Voxelized point cloud (voxel size 0.006 m).
Supervoxel adjacency graph (seed size 0.025 m), colored according to edge weight.
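For readers unfamiliar with the construction, the unnormalized Laplacian of such a weighted adjacency graph is L = D - W, where W is the weighted adjacency matrix and D is the diagonal degree matrix. Whether a normalized variant is used here is not stated, so this sketch shows the simplest form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build the unnormalized graph Laplacian L = D - W from a dense
// weighted adjacency matrix W (dense only for illustration; the real
// graphs are sparse).
std::vector<std::vector<double>>
laplacian(const std::vector<std::vector<double>>& W)
{
  std::size_t n = W.size();
  std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
  {
    double degree = 0.0;
    for (std::size_t j = 0; j < n; ++j)
    {
      degree += W[i][j];   // degree = sum of incident edge weights
      L[i][j] = -W[i][j];  // off-diagonal entries of -W
    }
    L[i][i] += degree;     // diagonal: degree - W[i][i]
  }
  return L;
}
```

Each row of L sums to zero, which is why the constant vector is always an eigenvector with eigenvalue 0.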
Eigendecomposition of the Laplacian of this weighted adjacency graph yields a set of eigenvalue/eigenvector pairs (lambda_i, u_i). Each eigenvector has as many elements as there are vertices in the graph. Therefore it is possible to visualize an eigenvector by painting each supervoxel according to its corresponding element in the vector. The figure below shows the first 9 eigenvectors, which correspond to the smallest eigenvalues:
First 9 eigenvectors of the graph Laplacian
The first eigenvector clearly separates the scene into two parts: the third box and everything else. In the second eigenvector the table is covered with gradient, but the first and third boxes have (different) uniform colors and, therefore, stand out. The third eigenvector highlights the second box, and so on.
I have examined quite a number of eigenvectors of different scenes, and I think the following common pattern exists. The first few eigenvectors tend to break the scene into several regions with uniform colors and sharp edges. In the subsequent eigenvectors gradients begin to emerge. Typically, a large part of the scene is covered with a gradient, and a smaller part (corresponding to a distinct component of the graph) has some uniform color.
The overall goal is to figure out the number of distinct components of the graph (that is, objects in the scene) and segment them out. As I admitted before, it is clear that the eigenvectors capture all the information needed to do this, so the question is how to extract it. In fact, this problem has already received a lot of attention from researchers under the name of “Spectral Clustering”. (Yeah, I made quite a detour through all these distance measures on 3D surfaces before I came to know it). The standard approach is described in the following paper:
In a nutshell, the original problem space typically has many dimensions (in our case the dimensions are Euclidean coordinates, normal orientations, point colors, etc.). The clusters may have arbitrary irregular shapes, so they can be separated neither linearly nor with hyper-spheres, which renders standard techniques like K-means inapplicable. The good news is that in the subspace spanned by the first few eigenvectors of the Laplacian of the graph (constructed from the original data), the data points form tight clusters, and thus K-means can be used. This effect is evident in the eigenvectors that I demonstrated earlier. Unfortunately, the number of clusters still needs to be known. There is literature that addresses automatic selection of the number of clusters, but I have not seen any simple and reliable method so far.
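To make the pipeline concrete, here is a minimal numpy sketch of standard spectral clustering (similarity graph, Laplacian, eigendecomposition, K-means in the eigen-subspace) on toy 2D data. The data, the Gaussian similarity, and the number of clusters are all assumptions for illustration; this is not PCL code.

```python
import numpy as np

# Toy 2D data: two tight blobs (a hypothetical stand-in for supervoxel features).
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
                 rng.normal(5.0, 0.1, (20, 2))])

# Weighted adjacency from Gaussian similarity of pairwise squared distances.
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
np.fill_diagonal(W, 0.0)

# Graph Laplacian L = D - W and the eigenvectors of its smallest eigenvalues.
L = np.diag(W.sum(1)) - W
vals, vecs = np.linalg.eigh(L)
k = 2
embedding = vecs[:, :k]  # each row: one data point in the spectral subspace

# Plain K-means (a few Lloyd iterations) in the embedded space.
centers = embedding[[0, -1]]  # crude initialization from two extremes
for _ in range(10):
    labels = ((embedding[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.array([embedding[labels == c].mean(0) for c in range(k)])

print(labels)
```

In the embedded space each blob collapses to a near-single point, which is exactly why plain K-means succeeds there while it could fail on irregularly shaped clusters in the original space.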
In the next blog post I will describe a simple algorithm that I have developed to analyze the eigenvectors and demonstrate the results.
With my current work on optimizing correspondence estimation across the uv/xyz domains, it is worth providing an overview of the available correspondence estimation classes in PCL. For a high-level treatment of the registration API, please refer to the registration tutorial.
Correspondence estimation attempts to match keypoints in a source cloud to keypoints in a target cloud, based on some similarity measure, in our case feature descriptors. Although scene-relevant descriptor parameters and correspondence thresholds may reduce erroneous matches, outliers persist and degrade pose estimation. This is due to the implied assumption that for each source keypoint a corresponding target keypoint exists. The difficulty of estimating model- or scene-specific descriptor parameters is another factor.
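The PCL classes themselves are C++, but the underlying idea can be sketched in a few lines. Below is a hypothetical numpy illustration of descriptor matching with two common rejection strategies, a reciprocal (mutual) nearest-neighbour check and a distance-ratio test; the function name, data, and threshold are invented for the example and do not mirror the PCL API.

```python
import numpy as np

def match_descriptors(src, tgt, ratio=0.8):
    """Sketch: nearest-neighbour matching with a reciprocal check and a
    ratio test to reject ambiguous correspondences (names hypothetical)."""
    d = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    nn_tgt = d.argmin(1)  # best target for each source descriptor
    nn_src = d.argmin(0)  # best source for each target descriptor
    matches = []
    for i, j in enumerate(nn_tgt):
        if nn_src[j] != i:           # keep mutual nearest neighbours only
            continue
        row = np.sort(d[i])
        if row.size > 1 and row[0] > ratio * row[1]:
            continue                 # best match not clearly better than 2nd
        matches.append((i, j))
    return matches

# Tiny toy descriptors: the third source point has no clear counterpart.
src = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
tgt = np.array([[0.1, 0.0], [1.1, 0.0], [9.0, 9.0]])
print(match_descriptors(src, tgt))
```

The rejection steps do not remove the underlying problem (a source keypoint may simply have no true counterpart), but they discard the matches where the descriptor distances cannot distinguish a winner.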
What follows is a quick reference guide to the available correspondence estimation classes, with remarks extracted from the source code.
Last time I wrote about distance measures on 3D surfaces, though I did not give any details about how they are computed. In this blog post I will give a formal definition, followed by two important properties that simplify the computation and provide insights that might help to solve the ultimate goal: identification of distinct objects in a scene.
Given a mesh (or a point cloud) that represents a surface, a discretization of the Laplace-Beltrami operator (LBO) is constructed. This discretization is a sparse symmetric matrix of size n × n, where n is the number of vertices (points) in the surface. The non-zero entries of this matrix are the negated weights of the edges between adjacent vertices (points) off the diagonal, and the vertex degrees on the diagonal. This matrix is often referred to as the Laplacian. Eigendecomposition of the Laplacian yields pairs (λ_i, φ_i), where λ_i are eigenvalues and φ_i are the corresponding eigenvectors.
The diffusion distance is defined in terms of the eigenvalues and eigenvectors of the Laplacian as follows:

d_diffusion(x, y)² = Σ_i e^(−2 t λ_i) (φ_i(x) − φ_i(y))²

The biharmonic distance bears a strong resemblance to it:

d_biharmonic(x, y)² = Σ_i (φ_i(x) − φ_i(y))² / λ_i²

Here φ_i(x) means the x-th element of eigenvector φ_i. Both distances have a similar structure: a sum over all eigenpairs, where the summands are squared differences between corresponding elements of the eigenvectors, scaled by some function of the eigenvalues. There are two properties of these distances that I would like to stress.
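Both distances are straightforward to evaluate once the eigenpairs are known. Below is a small numpy sketch on a hypothetical 4-vertex path graph; the graph and the choice t = 1 are assumptions for illustration only.

```python
import numpy as np

def diffusion_dist(vals, vecs, x, y, t=1.0):
    # Skip the first eigenpair: its eigenvalue is zero, its eigenvector constant.
    diff = vecs[x, 1:] - vecs[y, 1:]
    return np.sqrt(np.sum(np.exp(-2.0 * t * vals[1:]) * diff ** 2))

def biharmonic_dist(vals, vecs, x, y):
    diff = vecs[x, 1:] - vecs[y, 1:]
    return np.sqrt(np.sum(diff ** 2 / vals[1:] ** 2))

# Laplacian of a 4-vertex path graph 0-1-2-3 with unit edge weights.
n = 4
L = 2.0 * np.eye(n)
L[0, 0] = L[-1, -1] = 1.0
for i in range(n - 1):
    L[i, i + 1] = L[i + 1, i] = -1.0
vals, vecs = np.linalg.eigh(L)

# The end points of the path are farther apart than neighbours, as expected.
print(diffusion_dist(vals, vecs, 0, 3), diffusion_dist(vals, vecs, 0, 1))
```

Note that only squared differences of eigenvector elements enter the sums, so the sign ambiguity of numerically computed eigenvectors does not affect either distance.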
Firstly, the summands form a decreasing sequence. The figure below illustrates this point with the eigenvalues of the Laplacian of a typical point cloud:
In the left image the first hundred eigenvalues are plotted (except λ_0, which is always zero). Note that the values are normalized (i.e. divided by a common factor). The magnitudes of the eigenvalues increase rapidly. On the right the multipliers of both the diffusion and the biharmonic distances are plotted (also computed with the normalized eigenvalues); the biharmonic distance multiplier is plotted for several choices of its parameter. Clearly, only the first few terms in the summation are needed to approximate either of the distances. This has an important consequence: there is no need to solve the eigenproblem completely; rather, it suffices to find a limited number of eigenpairs with small eigenvalues.
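The truncation argument is easy to check numerically. The sketch below compares the biharmonic distance computed from the full spectrum of a hypothetical 100-vertex path graph against the value obtained from only the 10 smallest non-zero eigenpairs:

```python
import numpy as np

# Laplacian of a hypothetical 100-vertex path graph with unit edge weights.
n = 100
L = 2.0 * np.eye(n)
L[0, 0] = L[-1, -1] = 1.0
for i in range(n - 1):
    L[i, i + 1] = L[i + 1, i] = -1.0
vals, vecs = np.linalg.eigh(L)

def biharmonic(x, y, k):
    """Biharmonic distance using only the k smallest non-zero eigenpairs."""
    diff = vecs[x, 1:k + 1] - vecs[y, 1:k + 1]
    return np.sqrt(np.sum(diff ** 2 / vals[1:k + 1] ** 2))

full = biharmonic(0, n - 1, n - 1)  # all non-zero eigenpairs
approx = biharmonic(0, n - 1, 10)   # only the 10 smallest
print(abs(full - approx) / full)    # relative truncation error
```

Because the 1/λ² multiplier decays so quickly, the 10-term approximation is already within a fraction of a percent of the full sum, which in practice means an iterative solver can stop after a handful of eigenpairs.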
Secondly, the distance between two points x and y depends on the differences between their corresponding elements φ_i(x) and φ_i(y) in the eigenvectors. The figure below demonstrates the (sorted) elements of a typical eigenvector:
One may see that there are groups of elements with the same value; for example, there is a group of about one hundred elements with nearly identical values. The pair-wise distances between the points that correspond to these elements will therefore be close to zero. In other words, plateaus in the eigenvector graph correspond to sets of mutually close points, and such sets may be interpreted as objects.
Summing up, it seems it should be possible to identify distinct objects in a point cloud by analyzing the eigenvectors of the Laplacian (even without explicitly computing any of the distance measures). Moreover, only the first few eigenvectors are relevant, so it is not necessary to solve the eigenproblem entirely.
Today I started experimenting with half (http://half.sourceforge.net/), a C++ header-only library that provides an IEEE 754-conformant 16-bit half-precision floating-point type.
The results are mixed: I ended up with considerably lighter binary files (50% smaller, as expected), but there is a loss of precision when converting back to ASCII.
The file outdoor.ptx, when encoded to binary, is just 62M instead of 124M.
The conversion back to ASCII is not so great, though (probably because of rounding). Below are 10 lines from the original ASCII file and from the one generated by converting the binary file back.
Original ASCII file:
Using half precision float:
As you can notice, differences appear starting at the sixth decimal position. It could be worth trying different rounding options to see whether they help.
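The effect is easy to reproduce with numpy's float16 type, which implements the same IEEE 754 half-precision format. The sample value below is made up, and the exact decimal position where the error shows up depends on the magnitude of the value.

```python
import numpy as np

# Hypothetical sample value of the order of a PTX point coordinate.
x = np.float32(0.123456)
half = np.float16(x)     # encode to 16-bit half precision
back = np.float32(half)  # decode back to single precision
print(float(x), float(back), float(abs(x - back)))
```

With only an 11-bit significand, half precision carries roughly three significant decimal digits, so a small round-trip error on every coordinate is the price paid for the 50% size reduction.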
By using OpenMP parallelization directives I achieved better results for ASCII reading and for LZF + J2K PTX file encoding. Here are the improved results.
The experimental protocol is very similar to the one used earlier: 10 consecutive runs of the leica_ascii2binary tool with the LZF + JP2K encoding option. Each time we measure the ASCII file reading time and the writing time.
File | Nb of points | ASCII | LZF + JP2K
The ASCII reading times with and without OpenMP for both files are compared in the figure below. You can notice the 10-second gap achieved through the use of OpenMP.
The LZF + J2K encoding times with and without OpenMP are compared next. The graph shows a gain of almost 3 seconds during encoding.