**Q:** *What is a code sprint?*

A **code sprint** is an accelerated research and development program in which we
pair talented developers with senior researchers and engineers for 3-6 months of extended
programming work. The model is inspired by the Google Summer of Code
initiative and is meant to create open source solutions for interesting 2D/3D
computer perception problems. Each code sprint is financially supported by a
different institution (see below).

Code sprints are made possible through the involvement of our host organization, Open Perception. Please see our mission page for more information.

**Q:** *What does this blog represent?*

The Point Cloud Library (PCL) **developer blog** is a great way to keep
track of daily updates in PCL-sponsored code sprints. Our developers
write about their experiences and progress while working on exciting
PCL projects.

The following shows the current list of ongoing code sprints. Click on any of the logos below to go to the respective sprint blog page.

The code sprints that have been completed are listed below:

We would like to thank our financial sponsors for their generous support. For details on how to start a new code sprint or contribute to PCL, please visit http://www.pointclouds.org/about and our organization’s web site at http://www.openperception.org/get-involved/.

**Introduction**

The principal goal of this project is to implement the algorithm and app developed in:

- Karpathy, A., Miller, S., Fei-Fei, L. "Object Discovery via Shape Analysis".

This automatic object discovery will be useful for robotics and autonomous vehicles, since it can find objects of different classes. The approach compares several shape parameters of a segment to decide which class it belongs to. A simple example explains the idea: if we want to find a glass in an unknown room, we cannot have a model of every glass in the world, but we can define a number of shape parameters that capture what glasses have in common.

The parameters that define an object are based on the "Objectness measures" section of the paper. This is the core of the app and its most important part, and probably a good place to begin coding, because it is quite easy to implement.

**Introduction**

I begin with the main part of the code, and probably the easiest to implement. In this part I write the code that computes these measures for each point cloud. The input is a point cloud (or a set of point clouds) and the output is a vector with the values of the resulting measures. To simplify the code and make it reusable, this part is kept separate from the segmentation, so anyone who wants to use a different segmentation than the one proposed in the paper can still use it.

**"Objectness measures"**

We have 5 different measures:

- Compactness: This measure looks at how compact an object is, by searching for the minimum bounding sphere that contains the whole surface of the mesh of the point cloud.

- Symmetry: This measure analyzes the reflective symmetry along the three principal axes, based on the eigenvalues of the scatter matrix.

- Smoothness: This measure evaluates the quality of the points of a segment; a segment whose points are spread uniformly over its surface scores high, while one with scattered points scores low.

- Local Convexity: This measure determines the convexity of each polygon edge in the mesh of the point cloud, and scores each segment by the percentage of its edges that are convex.

- Global Convexity: This measure is computed from the convex hull; the mean distance between the points and the convex hull gives the measure.
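
Two of these measures can be sketched in a few lines. The following is an illustrative NumPy sketch (not the project's C++ code): the real compactness measure uses an exact minimum bounding sphere over the mesh surface, which I approximate here with centroid distances, and the symmetry score here is a simplified eigenvalue ratio.

```python
import numpy as np

def compactness(points):
    """Simplified compactness proxy: how tightly the points fill their
    bounding sphere (approximated by the sphere centred at the centroid
    with radius equal to the largest centroid distance)."""
    d = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return d.mean() / d.max()

def symmetry(points):
    """Symmetry proxy from the eigenvalues of the scatter matrix along
    the three principal axes: close to 1.0 for an isotropic cloud."""
    c = points - points.mean(axis=0)
    eigvals = np.sort(np.linalg.eigvalsh(c.T @ c / len(c)))
    return eigvals[0] / eigvals[-1]

# Points sampled on a unit sphere score high on both proxies.
rng = np.random.default_rng(0)
sphere = rng.normal(size=(5000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
# compactness(sphere) and symmetry(sphere) are both close to 1
```

A flat, scattered segment would score low on both, which is the intuition behind using these scores to separate objects from clutter.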

**"Future work"**

Next, I need to implement the segmentation that provides the data for measuring the objects, and to test this code.

Also, there is one measure missing: one that uses all the segments in different scenes to obtain a better classification. This is the "recurrence" measure. It depends on the number of scenes and segments, so it needs the segmentation first, followed by an analysis of how many similar segments appear across all scenes. Objects with similar segments should end up in the same category.

Second part of the code: "Segmentation and creation of object candidates" (third post)

Wednesday, July 30, 2014

**Introduction**

I am working on the segmentation step, which is based on the graph-based segmentation proposed by Felzenszwalb and Huttenlocher.

The Global Radius-based Surface Descriptor (GRSD) concatenates the RSD descriptors discussed in the previous post to represent the complete object. GRSD gives a good description of the 3D shape of the object. Below are a set of objects and their GRSD descriptors, i.e. histograms. I have used the University of Washington "Large-Scale RGB-D Dataset" for the experiments.

- For an object whose surface is planar but has 2 different planes in the view
- For an object whose surface is planar but has 1 plane in the view
- For an object whose surface is spherical
- For an object whose surface is cylindrical but doesn't have any planar surface in view

It can be seen that all the descriptors are different from each other. Planes and box surfaces give similar histograms, since their surface characteristics are similar. Both GRSD and RSD have been pushed into PCL trunk for people to use. Test files for these two features are also included in trunk to show their basic usage.
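
To make "different from each other" concrete, two such histograms can be compared with a histogram distance. This is an illustrative NumPy sketch with toy data, not the PCL API; the post does not specify a distance function, so the chi-square distance is my choice here.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (normalized first):
    0 for identical histograms, larger for more dissimilar shapes."""
    p = h1 / (h1.sum() + eps)
    q = h2 / (h2.sum() + eps)
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

planar = np.array([40.0, 10, 2, 1, 0, 0])     # toy "plane-like" histogram
spherical = np.array([2.0, 5, 20, 25, 8, 1])  # toy "sphere-like" histogram
print(chi_square(planar, planar))             # 0.0
print(chi_square(planar, spherical))          # clearly greater than 0
```

With real GRSD histograms, a nearest-neighbour search under such a distance is one straightforward way to match an observed object against known categories.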

Currently I am working on NURBS for small surface patches. Since NURBS support is already available in PCL, we will look at how to tailor it to our needs. After this we plan to work on the features that compute the relationship between the surface patches.

**Introduction**

In the previous phase, we presented how to obtain a statistical model from a set of face meshes. The next step in our project is to "match" the mean face of the database with a face obtained from a 3D camera scan, like the one in the picture below. The matching is done by applying the following methods alternately.

**Rigid Transformation**

This method is very similar to the Iterative Closest Point (ICP) algorithm: the goal is to estimate a rotation matrix and a translation vector that move the average face to an optimal position near the face from the Kinect. Essentially, we minimize the alignment error, which is done by calculating the Jacobian matrix of the system.

Of course this process is applied iteratively, and below are presented a few stages of positioning of the model over the scan:

The correspondences between the points are determined by comparing the orientation of the vertices of the model with the closest points from the scan. Here you can see an example of the correspondences. Note that at the edges of the scan the correspondences tend to form pyramids, because many points of the model share the same closest neighbour.
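
For a fixed set of correspondences, the rigid step has a well-known closed-form solution, sketched below in NumPy (the Kabsch/Horn SVD solution; the implementation described in the post instead iterates correspondence search together with a Jacobian-based minimization, so this is only an illustration of one inner step):

```python
import numpy as np

def rigid_fit(model, scan):
    """Rotation R and translation t minimizing ||R @ m_i + t - s_i||^2
    over corresponding rows, via SVD of the cross-covariance matrix."""
    mu_m, mu_s = model.mean(axis=0), scan.mean(axis=0)
    H = (model - mu_m).T @ (scan - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: keep det(R) = +1
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t

# Recover a known pose from exact correspondences.
rng = np.random.default_rng(1)
model = rng.normal(size=(100, 3))
angle = 0.4
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([0.1, -0.2, 0.5])
scan = model @ R_true.T + t_true
R, t = rigid_fit(model, scan)
```

In practice the correspondences are noisy and partially wrong (the "pyramids" at the scan edges), which is why the alignment is applied iteratively rather than in one shot.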

**Non-Rigid Transformation**

Once the model is roughly aligned, we need to modify the shape of the model to match the face from the scan. For this we make use of the eigenvectors computed in the previous phase and calculate a Jacobian matrix for the system F + E·b, where E is the matrix of eigenvectors, F is the current form of the model and b is the vector of basis coefficients that need to be determined.

However, there is one more constraint to apply: minimizing the sum of the terms b_i^2 / λ_i, where λ_i is the eigenvalue of the corresponding eigenvector. Therefore, to the Jacobian matrix of this transformation we need to add a diagonal matrix with 1/λ_i on the diagonal.
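
Because the model is linear in the coefficients b, the same regularized system also has a closed-form solve via the normal equations, which makes the role of the 1/λ_i diagonal easy to see. This is an illustrative NumPy sketch with toy dimensions, not the post's Jacobian-based iteration:

```python
import numpy as np

def fit_basis_coefficients(E, eigvals, mean_shape, target, reg=1e-3):
    """Minimize ||mean_shape + E @ b - target||^2 + reg * sum(b_i^2 / eigvals_i).
    The reg/eigvals_i terms are exactly the diagonal matrix added to the
    normal equations; large eigenvalues (common modes) are penalized less."""
    r = target - mean_shape
    A = E.T @ E + reg * np.diag(1.0 / eigvals)
    return np.linalg.solve(A, E.T @ r)

# Tiny synthetic check: orthonormal basis, near-exact recovery for small reg.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(30, 4)))   # 30-D shapes, 4 eigenvectors
mean_shape = rng.normal(size=30)
b_true = np.array([1.0, -0.5, 0.25, 0.1])
target = mean_shape + Q @ b_true
b = fit_basis_coefficients(Q, np.ones(4), mean_shape, target, reg=1e-9)
```

Increasing `reg` pulls the solution toward the mean face, which is the intended effect of the eigenvalue constraint: implausible shape deformations are suppressed.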

Results

The work on the LCCP implementation is mainly done and the pull request is awaiting its approval. Thanks a lot for the help of the PCL community (especially Sergey and Victor); thanks to your comments the LCCP algorithm is now in much better shape.

Talking about shapes: the next milestone in this GSOC project is the implementation of a shape generator which can be used to create various labeled scenes, which in turn can be used to create unit tests or benchmarks for part and/or object segmentation algorithms. I have already written a big part of the code. Now the question is: to which module of PCL should this generator go? I am thinking about putting it into the geometry module. Any comment on this is more than welcome! The next image shows an assembled animal-like object which has been generated from simple geometric shapes (mainly geons).

**INTRODUCTION**

In this post, I will briefly describe the current state of the stereo module and the new features added.

Currently, the stereo module encompasses two local matching algorithms:

1. A block-based algorithm, implemented using the Box-Filtering technique proposed in [McDonnell81].
2. Adaptive Cost 2-pass Scanline Optimization, presented in [Wang06].

Both methods use the Sum of Absolute Differences (SAD) as the dissimilarity measure.

As mentioned in the previous blog post, the first objective of the present project is to implement the local approach proposed in [Min1] for dense correspondence estimation in a pair of grayscale rectified images, with an efficient cost aggregation step. Additionally, the cost aggregation step is based on the method presented in [Yoon06], where the weighting function uses a similarity measure based on color and spatial distances.

**DETAILS**

To this end, a new class CompactRepresentationStereoMatching was created in the stereo module. This class inherits from GrayStereoMatching, which in turn inherits from StereoMatching, since some pre- and post-processing methods are re-implemented. The new class has five public member functions: setRadius, setFilterRadius, setNumDispCandidates, setGammaS and setGammaC, which set three private data members of type int (radius, filter_radius and num_disp_candidates) and two of type double (gamma_c and gamma_s). It also implements the virtual method compute_impl.

**radius** corresponds to the radius of the cost aggregation window, with default value equal to 5.

**filter_radius** corresponds to the radius of the box filter used for the computation of the likelihood function. The default value is 5.

**num_disp_candidates** is the number of the subset of the disparity hypotheses used for the cost aggregation. The default value is 60.

**gamma_c** is the color bandwidth used for cost aggregation based on adaptive weights. The default value is 15.

**gamma_s** is the spatial bandwidth used for cost aggregation based on adaptive weights. The default value is 25.

Similarly to the previous methods, the current class is based on the SAD matching function, and it estimates the per-pixel cost efficiently using the Box-Filtering algorithm.
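
The SAD-plus-box-filter combination can be sketched as follows. This is an illustrative NumPy toy, not the PCL class described above (the function names and the simple uniform window are my choices; the real class uses adaptive weights and a compact candidate set):

```python
import numpy as np

def box_sum(img, radius):
    """Window sums via an integral image: O(1) per pixel for any window
    radius, which is the point of the Box-Filtering technique."""
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    H, W = img.shape
    out = np.empty_like(img, dtype=float)
    for y in range(H):
        y0, y1 = max(0, y - radius), min(H, y + radius + 1)
        for x in range(W):
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            out[y, x] = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    return out

def sad_disparity(left, right, max_disp, radius=2):
    """Winner-Takes-All disparity from box-filtered SAD costs."""
    H, W = left.shape
    cost = np.full((max_disp + 1, H, W), np.inf)
    for d in range(max_disp + 1):
        ad = np.abs(left[:, d:].astype(float) - right[:, :W - d].astype(float))
        cost[d, :, d:] = box_sum(ad, radius)
    return np.argmin(cost, axis=0)
```

Note the key property: thanks to the integral image, the aggregation cost per pixel does not depend on the window radius, only on the number of disparity hypotheses, which is exactly what the compact representation of [Min1] then reduces.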

To test the algorithm, the Middlebury stereo benchmark (http://vision.middlebury.edu/stereo/) dataset is going to be used.

[McDonnell81] | McDonnell, M. J. “Box-filtering techniques”. Computer Graphics and Image Processing 17.1, 65-70, 1981. |

[Wang06] | Wang, Liang, et al. “High-quality real-time stereo using adaptive cost aggregation and dynamic programming.” 3D Data Processing, Visualization, and Transmission, Third International Symposium on. IEEE, 2006. |

[Yoon06] | K.-J. Yoon and I.-S. Kweon. “Locally Adaptive Support-Weight Approach for Visual Correspondence Search”. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 924–931, 2005. |

The RSD feature is a local feature histogram that describes the surface around a query point. A PCL implementation is available in the features module. With the help of my mentor I understood the algorithm by which this feature is obtained. To verify that it works correctly, we took a real object whose radius is known and ran the RSD computation on the entire point cloud of the object. This gives an RSD feature histogram for every point in the point cloud. We can also get the minimum and maximum radius of the local surface patch around each point. I tried various combinations of parameters to see how the computed radius varies. Below is the object used, which has a radius of 3.5 cm (0.035 m).

Below are some of the parameters chosen and their corresponding effect on the min and max radius of the local surface patch of each point:

- Normal radius search = 0.03, Max_radius = 0.7 (maximum radius after which everything is considered a plane), RSD radius search = 0.03
- Normal radius search = 0.03, Max_radius = 0.1, RSD radius search = 0.03
- Normal radius search = 0.02, Max_radius = 0.1, RSD radius search = 0.03 (found to be a good way of generating histograms)

I then applied MLS smoothing to the point cloud data before computing the RSD feature. This improves the normal computation, resulting in consistency over all the points on the object surface.

The same parameter combinations were tried on the smoothed cloud:

- Normal radius search = 0.03, Max_radius = 0.7 (maximum radius after which everything is considered a plane), RSD radius search = 0.03
- Normal radius search = 0.03, Max_radius = 0.1, RSD radius search = 0.03
- Normal radius search = 0.02, Max_radius = 0.1, RSD radius search = 0.03 (found to be a good way of generating histograms)

Next I examined what the actual feature looks like at a point on the sphere, to check whether it matches the histogram in the paper. The same comparison is made between the raw point cloud from the Kinect and the MLS-smoothed point cloud. Below is the result.

It was really hard to fix the previous image so that it shows the histograms with their values at a good resolution. So below are snapshots of the spherical and cylindrical surfaces.

Cylindrical Surface:

Spherical Surface:

The next post will have details of the GRSD results and how they differentiate the characteristics of two surfaces. The GRSD code from the author will be integrated into the PCL code base. We also plan to split the pipeline into modules that fit into the PCL code base as features, surface and segmentation sections.

As the first weeks of GSOC are over, I am going to summarize my progress so far. The goal of my project is to apply machine learning techniques to collected skeletal data for activity recognition. Most research in this area has used NiTE or the Kinect SDK for skeleton tracking. As PCL already has a pose detector available, we want to try using it to collect skeletal information. This, however, requires some modifications of the gpu/people module, which has been the major focus of my work so far.

**First steps**

Before the actual start of coding it was important to research the existing methods for action recognition. We decided to implement a novel approach published by *Ferda Ofli, Rizwan Chaudhry, Gregorij Kurillo, René Vidal, Ruzena Bajcsy - "Sequence of the Most Informative Joints (SMIJ): A new representation for human skeletal action recognition"*. In the presented paper, skeleton joints are ranked based on their variance, and the sequence of the resulting rankings is used as the input feature vector for classification.

The next step was to make a draft of activities that we are planning to recognize. We decided to stick with the set of actions proposed by the IAS-Lab of University of Padua, which includes: *check watch, cross arms, kick, get up, pick up, point, punch, scratch head, sit down, stand, throw from bottom up, throw over head, turn around, walk and wave*.

**Trying out the pcl/gpu/people module**

Obtaining skeleton data with the gpu/people module was not as easy as it seemed at first sight. After the challenge of compiling the source with GPU support enabled and making it run with OpenNI, the detector worked with the following configuration:

- Ubuntu 14.04 LTS
- CUDA-toolkit 5.5.22
- Openni 1.5.4
- Avin2 Sensor Kinect driver v 5.1.2

The tree files necessary for the detection are not provided in trunk and should be downloaded from: https://github.com/PointCloudLibrary/data/tree/master/people/results

The pose detector runs on the RGB and depth data obtained from an RGB-D sensor (we use a Microsoft Kinect) and produces a colour labelling of each pixel, depicting the corresponding body part. After testing the detector, the following observations should be mentioned:

- The people detector worked very well in certain positions; the best case is frontal orientation at near range, with no walls or ground visible.
- Despite the picture in the tutorial, the current implementation does not provide positions of the skeletal joints, which was also an issue for discussion. Consequently, obtaining the joint positions is one of the challenges of this project.
- The program has problems with large surfaces, mostly walls and the floor, which are often labelled as part of the human. This issue also occurs in the official demo video. As our project requires full body visibility, it is necessary to fix this problem (especially for the floor).
- In unusual positions, especially while turning, the body segmentation is usually correct, but the labelling often fails.

Trying out the pcl/gpu/people pose detector with one tree (left), three trees (middle), problems with large surfaces (right)

**Extending the pose detector to estimate the positions of the joints**

At the beginning it was necessary to browse through the code in order to understand how the program works. The detection happens by calling the process() function of the PeopleDetector class. At the end of the function, the variable sorted2 (of type BlobMatrix) contains the data of each body-part label.

There are 26 labels (currently 24 used) all together:
*Lfoot, Lleg, Lknee, Lthigh, Rfoot, Rleg, Rknee, Rthigh, Rhips, Lhips, Neck, Rarm, Relbow, Rforearm, Rhand, Larm, Lelbow, Lforearm, Lhand, FaceLB, FaceRB, FaceLT, FaceRT, Rchest, Lchest, Rshoulder, Lshoulder*.

The labels *Rshoulder* and *Lshoulder* exist but are currently not implemented in the detection.

The PeopleDetector class was extended with an array containing the coordinates of the joints, and finding a way to calculate those coordinates was a major challenge. The first idea was to simply use the mean values of the corresponding blobs; in spite of the simplicity of this approach, the results were satisfying. The second idea was to use the buildTree() function, which estimates the optimal blob tree starting from the neck, and then to recursively browse through the child blobs and use their mean values. The buildTree() function uses the "ideal" lengths of the limbs to estimate the optimal blobs (those optimal values are defined here). I also want to thank Koen Buys for giving me tips on the calculation.

As we are interested in the position of the limbs, using the mean values of the blobs is not always appropriate. For example, we are more interested in the upper border of the hip, which is connected to the torso, instead of the hip centroid. Besides this, we are also interested in the shoulder position, which was not implemented. The elbow label was also a special case as it usually has very small area and is often not detected. Consequently, I made some additional modifications to estimate those positions, which are described below.

Shoulders:

- Basic idea: use the highest part of the chest blob as the shoulder position.
- The point cloud of the left/right chest blob is extracted.
- The maximum Y-value of this point cloud is calculated.
- All the points of the chest blob whose Y-value is close to the maximum (chosen threshold: 10 cm) are taken; their 3D mean value is calculated and used as the shoulder position.

Elbows:

- If an elbow blob already exists, nothing special is done: its mean value is used.
- Otherwise, the elbow is taken as the point of the upper-arm blob with the largest distance from the shoulder.

Hips:

- The mean of the "lowest points" (within a certain threshold) of the hip blob is used, rather than the mean of the whole blob. This modification was made because the blob itself covers the whole torso.

In general, the quality of the joint positions depends directly on the quality of the labelling. As no tracking is implemented yet, the movement of the joints is not continuous.
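
The shoulder and elbow heuristics above can be sketched in a few lines. This is an illustrative NumPy version (in the real code the blob clouds come from the detector's BlobMatrix, and the Y-up convention here is an assumption taken from the "maximum Y" description):

```python
import numpy as np

def estimate_shoulder(chest_points, threshold=0.10):
    """Mean of the chest-blob points whose Y-value lies within `threshold`
    (10 cm) of the blob's maximum Y: the 'highest part of the chest'."""
    y = chest_points[:, 1]
    top = chest_points[y >= y.max() - threshold]
    return top.mean(axis=0)

def estimate_elbow(arm_points, shoulder):
    """Fallback when no elbow blob was detected: the upper-arm point
    farthest from the shoulder."""
    d = np.linalg.norm(arm_points - shoulder, axis=1)
    return arm_points[np.argmax(d)]

# toy chest blob: only the two highest points contribute to the shoulder
chest = np.array([[0.0, 1.0, 0.0], [0.2, 0.95, 0.0],
                  [0.0, 0.5, 0.0], [0.0, 0.1, 0.0]])
shoulder = estimate_shoulder(chest)
```

The hip estimate is the mirror image of the shoulder one: the same thresholded mean, but taken from the lowest points of the hip blob.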

Skeleton visualization: “good” example with “checking watch” action (left), labelling fails when the floor is visible (right)

**Using Ground Plane People Detector for body segmentation**

As mentioned before, the original People Pose Detector has some problems with large surfaces, especially the floor. We tried to solve this problem by combining the original People Pose Detector with Ground Plane People Detector (implemented by my mentor, Matteo Munaro), to segment the body cluster before the actual labelling.

In the resulting application, three points of the ground plane are selected first, after which the Ground Plane People Detector removes the ground plane and estimates the point cloud belonging to the human. The points of the cluster are then projected onto the depth image, and all other depth pixels are set to very high values.
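
The cluster-to-depth-image step might look like the following sketch. This is a hedged NumPy illustration, not the project's code: the pinhole intrinsics fx, fy, cx, cy and the `far` placeholder value are my assumptions.

```python
import numpy as np

def cluster_depth_image(points, shape, fx, fy, cx, cy, far=10.0):
    """Project the 3D cluster points (camera frame, metres) into a depth
    image and set every pixel not covered by the cluster to a very large
    depth, so the labelling step effectively ignores it."""
    H, W = shape
    depth = np.full(shape, far)
    z = points[:, 2]
    u = np.round(fx * points[:, 0] / z + cx).astype(int)
    v = np.round(fy * points[:, 1] / z + cy).astype(int)
    ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth[v[ok], u[ok]] = z[ok]
    return depth
```

Setting the background to a large depth rather than zero keeps the detector's depth-based reasoning consistent: the background looks "far away" instead of invalid.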

Some additional corrections were added to improve the segmentation results (depth filtering, and extending the legs, as too many ground-floor points are removed). Additionally, the RGB and depth calibration (8-pixel shift) is done as proposed by Lingzhu Xiang.

Using the Ground Plane People Detector improves the performance significantly if the full body visibility is required as it completely solves the large-surfaces-problem.

We also had to decide what to do when the Ground Plane People Detector does not detect the human (i.e. none of the detected clusters had a confidence above the defined threshold). In this case we use the segmentation from the last frame in which the person was detected.

Pose detector with full body visibility without (left) and with (right) segmentation.

Examples of activities: cross arms, check watch, kick, point, turn, wave

**Storing the data**

I am currently working on completing the framework for data collection. Storing skeletal information (the 3D positions of the joints) in TXT files is already implemented. The People Pose Detector already includes optional storage of depth and RGB data as PNG images. However, we decided to store the RGB and depth images in a more efficient way using the **lzf format** (thanks to Lingzhu Xiang for the tip). Another idea I am working on right now is to run the people pose detector offline on the stored images, to use the full speed.

**Introduction**

The goal of this project is to implement a program that will modify the expressions of several scanned faces according to the facial expressions captured by an RGB-D camera.

The first step is to create a statistical model based on a training database of faces. The training set used so far is the one provided by the FaceWarehouse project; it consists of 3D meshes stored in .obj files. For further information, please consult the following link: http://gaps-zju.org/facewarehouse/

**Approach**

For each face in the training set, a column vector S was created containing the coordinates of every vertex of the mesh. Afterwards, the average vector and the covariance matrix were calculated. Normally, the covariance matrix would be calculated directly from the outer products of the centered vectors S_i; however, this matrix is 34530 by 34530, and to compute the statistical model only the most significant eigenvectors are required. To speed up the calculations, a matrix T was formed by joining the centered vectors, and the eigenvectors of the small matrix T^T T were calculated. It is important to note that the size of T^T T is determined by the number of faces, and that the eigenvectors of the covariance matrix can be obtained by left-multiplying T to the eigenvectors of T^T T. Once the eigenvectors are calculated, the statistical model is obtained as the average face plus a weighted sum of eigenvectors, where the weight of each eigenvector is determined by multiplying a random number in the range [-2, 2] with the corresponding eigenvalue. The final results of this phase are presented below: first the average face, then the model.

As you can see, the model obtained is a bit flattened compared to the mean face; that is because the majority of faces in the training set are somewhat rounded. However, this project needs a model that takes several types of faces into consideration, and this is why we need to consider the covariance of the samples in the database.
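
The small-matrix trick described above reads as follows in a NumPy sketch (illustrative dimensions; the real data has 34530 coordinates per face):

```python
import numpy as np

def build_face_model(S):
    """S: one face per row (n_faces x n_coords). Instead of eigendecomposing
    the huge n_coords x n_coords covariance, eigendecompose the small
    n_faces x n_faces matrix T^T T and left-multiply its eigenvectors by T."""
    mean = S.mean(axis=0)
    T = (S - mean).T                       # n_coords x n_faces
    vals, vecs = np.linalg.eigh(T.T @ T)   # small system
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10                    # drop null directions
    E = T @ vecs[:, keep]                  # eigenvectors of the covariance
    E /= np.linalg.norm(E, axis=0)
    return mean, E, vals[keep]

# sanity check on toy data: columns of E are eigenvectors of T T^T
rng = np.random.default_rng(3)
S = rng.normal(size=(5, 12))               # 5 faces, 12 coordinates each
mean, E, vals = build_face_model(S)
T = (S - mean).T
```

The key identity is that if w is an eigenvector of T^T T with eigenvalue λ, then T·w is an eigenvector of T·T^T (and hence of the covariance) with the same eigenvalue, so only the small system ever needs to be solved.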

Future Steps

- For this model, only the vertices of the faces were used; the texture coordinates also need to be taken into consideration. Unfortunately, the database does not provide any colour information as of yet. Once the data is available, the model needs to be adapted for this feature.
- Once the statistical model is fully configured, a 3D registration algorithm must be applied to project the facial expression of a testing sample onto the model.

References

T. Vetter and V. Blanz, "A Morphable Model for the Synthesis of 3D Faces", Max-Planck-Institut, Tübingen, Germany.

1. Reduction of computational redundancy in cost aggregation in stereo matching.

Saturday, June 21, 2014

**INTRODUCTION**

A stereo image pair can be used to estimate the depth of a scene. To do so, it is necessary to perform pixel matching to find the correspondences between both images. Different methods for stereo correspondence have been proposed, and they are classified into two classes:

- Correlation-based algorithms: Produce a dense set of correspondences.
- Feature-based algorithms: Produce a sparse set of correspondences.

Additionally, correlation-based algorithms are usually classified into two main groups: local (window-based) and global algorithms. However, some methods do not fit into either group and are classified in between.

The current work is based on correlation-based algorithms, more specifically local, window-based methods, intended for applications where a dense and fast output is required.

The input of the algorithm is two calibrated images, i.e. images for which the camera geometry is known. The images are also rectified in order to limit the correspondence search to 1D.

**CORRESPONDENCE ESTIMATION**

The general methodology for local stereo vision approaches can be summarized as follows. An energy cost is computed for every pixel p by comparing the reference image with the d-shifted right image:

(1) e(p, d) = |I_ref(p) − I_right(p − d)|

Then, the aggregated cost is computed as an adaptive sum of the per-pixel costs over a support window N(p):

(2) E(p, d) = Σ_{q ∈ N(p)} w(p, q) · e(q, d)

Finally, a Winner-Takes-All rule selects the best of all the disparity hypotheses:

(3) d(p) = argmin_d E(p, d)

This whole process is complex and time-consuming, since it is repeated for every hypothesis d. A representation of the conventional approach can be observed in the next figure [Min1].

Min et al. [Min1] introduced a new methodology to reduce this complexity by finding a compact representation of the per-pixel likelihood, under the assumption that low likelihood values do not provide really informative support. In this approach, only a pre-defined number of disparity candidates per pixel is selected for the cost aggregation step. The subset of disparity hypotheses corresponds to the local maxima of the likelihood function, which is pre-filtered to reduce noise, as shown in the following example:

The disparity hypothesis estimation and cost aggregation processes proposed by Min et al. are depicted in the next figure, where Sc is the subset of disparity hypotheses with size Dc:
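
The candidate-selection idea can be sketched like this (an illustrative NumPy snippet, not the paper's implementation: I pick local minima of the cost profile, which correspond to local maxima of the likelihood, and the pre-filtering step is omitted):

```python
import numpy as np

def disparity_candidates(cost, num_candidates):
    """Per pixel, return the indices of up to `num_candidates` disparities
    that are local minima of the cost profile (likelihood local maxima),
    ordered from lowest cost upward. cost has shape (D, H, W)."""
    above = np.concatenate([cost[:1] + 1, cost[:-1]], axis=0)
    below = np.concatenate([cost[1:], cost[-1:] + 1], axis=0)
    is_min = (cost <= above) & (cost <= below)
    masked = np.where(is_min, cost, np.inf)
    return np.argsort(masked, axis=0)[:num_candidates]

# toy cost profile at a single pixel: local minima at d = 1 and d = 3
cost = np.array([3.0, 1.0, 2.0, 0.5, 4.0]).reshape(5, 1, 1)
cands = disparity_candidates(cost, 2)
print(cands[:, 0, 0])   # [3 1], best candidate first
```

Aggregation is then run only over these Dc candidates per pixel instead of all D hypotheses, which is where the computational saving comes from.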

[Min1] | Min, D., Lu, J., & Do, M. N. “A revisit to cost aggregation in stereo matching: How far can we reduce its computational redundancy?.” In IEEE International Conference on Computer Vision (ICCV), 2011 (pp. 1567-1574). |

2. Fitting superquadrics (a.k.a. The Horror of Box Constrained Non-Linear Optimization)

Friday, June 20, 2014

In this post we will see some initial results for fitting superquadrics to full pointclouds. Let’s quickly start with the math so you have a good idea of how the code works.

Let me remind you of the superellipsoid equation (yes, I will keep bringing it up again and again so it stays forever in your brain):

(1)

We will call the expression to the left . A 3D point() will belong to the canonical superquadric defined by the parameters if . To have a general superquadric, we must consider the translation and rotation terms, hence the general equation (1) has the following form:

(2)

where is the translation of the superellipsoid center with respect to a given global frame and are the column vectors of the rotation matrix of the superellipsoid (again, with respect to some defined global axes). In fact, to be completely rigurous, we should express as a function of both the point being evaluated and the superellipsoid parameters being used: where

In order to find the superellipsoid that best fits a given full point cloud (composed of 3D points p_i with i = 1, …, n), we need to minimize the error between each point and equation (2). In its most basic form, we could try to minimize:

min Σ_i ( F(p_i) − 1 )²

Some wise people suggested a couple of modifications to the basic version above and came up with this:

min Σ_i ( √(a·b·c) · ( F(p_i)^e1 − 1 ) )²

The factor √(a·b·c) makes sure that the superellipsoid obtained is the smallest possible. The additional exponent e1 improves the convergence time.

As of now, I will not go into more details on the math behind since I am running out of time to write this entry. However, there are a few things that you should remember of this post:

- Our goal is to solve a non-linear least-squares problem.
- We have 11 parameters to find (a, b, c, e1, e2, plus the translation and rotation).
- Our parameters are bounded, i.e. they have upper and lower limits. This constrains the type of algorithms we can use to optimize our solution.
- The denser the point cloud, the more terms appear in the sum above.
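
For reference, the implicit function and the modified residual look like this in a NumPy sketch (illustrative only; the actual fitting hands these residuals to a bounded non-linear least-squares solver, which is not shown here):

```python
import numpy as np

def superellipsoid_F(points, a, b, c, e1, e2):
    """Canonical superellipsoid implicit function: F = 1 on the surface,
    F < 1 inside, F > 1 outside."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    xy = np.abs(x / a) ** (2 / e2) + np.abs(y / b) ** (2 / e2)
    return xy ** (e2 / e1) + np.abs(z / c) ** (2 / e1)

def fit_residuals(points, a, b, c, e1, e2):
    """Residuals of the modified cost: sqrt(a*b*c) keeps the recovered
    superellipsoid small, the extra e1 exponent speeds up convergence."""
    F = superellipsoid_F(points, a, b, c, e1, e2)
    return np.sqrt(a * b * c) * (F ** e1 - 1.0)

# points on a sphere of radius 0.25 (Case 0) give F = 1, residual = 0
pts = np.array([[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.25]])
print(superellipsoid_F(pts, 0.25, 0.25, 0.25, 1.0, 1.0))   # [1. 1. 1.]
```

For the general (rotated and translated) case, each point is first mapped into the superellipsoid frame with R^T (p − t) before evaluating F, which adds the remaining 6 of the 11 parameters.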

As of now, we have implemented the method described in [Duncan13] to fit segmented, full point clouds to superellipsoids. In this post we present some initial results obtained for 4 test cases (sphere, box, cylinder and oval shape). We tested them in 3 scenarios:

- Canonical superellipsoids with no noise.
- General superellipsoids (rotation and translation added), no noise.
- General superellipsoids with noise (up to 5% of the length of the smallest principal axis).

Let’s start with the initial test cases:

Case 0: Parameters used

| Parameter | Sphere | Ellipsoid | Cylinder | Box |
|---|---|---|---|---|
| a | 0.25 | 0.04 | 0.05 | 0.025 |
| b | 0.25 | 0.08 | 0.05 | 0.08 |
| c | 0.25 | 0.06 | 0.1 | 0.18 |
| e1 | 1 | 0.75 | 0.25 | 0.1 |
| e2 | 1 | 0.75 | 1.0 | 0.1 |

The point clouds are shown in the following figure and are also available in my repository.

The code used for the results shown in the rest of this post can be found here.

For the first test case we generated fully sampled point clouds using the sampling code presented in our previous post. To initialize the minimizer, we used the point cloud's bounding box for the superellipsoid dimensions and global transform. For all cases we used an initial value of 0.5 for e1 and 1.0 for e2 (these values are in the middle of the allowed range).

Results for the fitting are shown in the following table. It can be seen that the fitting works pretty well, which is kind of expected since this is the most basic case.

Case 0: Results

| Parameter | Sphere | Ellipsoid | Cylinder | Box |
|---|---|---|---|---|
| a | 0.247 | 0.039 | 0.0499 | 0.025 |
| b | 0.247 | 0.079 | 0.049 | 0.079 |
| c | 0.247 | 0.059 | 0.099 | 0.179 |
| e1 | 0.99 | 0.753 | 0.271 | 0.10 |
| e2 | 0.99 | 0.753 | 0.97 | 0.13 |

For case 1, we modified the test point clouds by applying a transformation to them. Details of the transformations for each case are shown below (the shape parameters remain constant, so we omit them).

Case 1: Parameters

| Parameter | Sphere | Ellipsoid | Cylinder | Box |
|---|---|---|---|---|
| x | 0.5 | -0.6 | -0.4 | -0.1 |
| y | 0.8 | 0.2 | 0.7 | 0.3 |
| z | 0.0 | 0.0 | 0.3 | 0.5 |
| roll | 0.0 | 0.2 | 0.6 | 0.0 |
| pitch | 0.0 | 0.5 | 0.9 | 0.0 |
| yaw | 0.3 | 0.3 | 0.8 | 0.8 |

The results are shown below. We observed that the parameters that remain constant keep approximately the same fitted values as in Case 0, so we won’t repeat them in the next table.

**Case 1: Results**

| Parameter | Sphere | Ellipsoid | Cylinder | Box |
|---|---|---|---|---|
| x | 0.5 | -0.6 | -0.4 | -0.1 |
| y | 0.8 | 0.199 | 0.69 | 0.3 |
| z | 0.0 | 0.0 | 0.29 | 0.49 |
| roll | 0.0 | 0.199 | -0.117 | 0.0 |
| pitch | 0.0 | 0.49 | 1.02 | 0.0 |
| yaw | 0.0 | 0.29 | -0.055 | -2.34 |

We can observe that the translation values are well fitted, while the rotation values are not. In the next post we will discuss some ideas to fix that.

Finally, we added noise to the pointclouds. The noise amplitudes used are shown in the following table (meaning that a uniform disturbance between −δ and +δ was randomly applied to each point in the pointcloud).

**Case 2: Parameters**

| Parameter | Sphere | Ellipsoid | Cylinder | Box |
|---|---|---|---|---|
| δ (noise amplitude) | 0.01 | 0.002 | 0.0025 | 0.0015 |
| Percentage | 4% | 5% | 5% | 6% |

The pointclouds are shown in the following figure:

The final parameters are shown below:

**Case 2: Results**

| Parameter | Sphere | Ellipsoid | Cylinder | Box |
|---|---|---|---|---|
| a | 0.24 | 0.039 | 0.049 | 0.025 |
| b | 0.245 | 0.079 | 0.049 | 0.08 |
| c | 0.249 | 0.0601 | 0.102 | 0.19 |
| e1 | 1 | 0.76 | 0.35 | 0.33 |
| e2 | 0.91 | 0.71 | 0.92 | 0.1 |
| x | 0.49 | -0.59 | -0.39 | -0.1 |
| y | 0.8 | 0.19 | 0.70 | 0.3 |
| z | 0.0 | 0.0 | 0.30 | 0.49 |
| roll | 0.0 | 0.19 | 0.95 | 0.0 |
| pitch | 0.0 | 0.50 | 0.47 | 0.0 |
| yaw | -2.8 | 0.30 | 1.34 | -2.3 |

I should probably put a parallel table with all the original values so you can compare them visually more easily. In any case, a couple of observations:

- The rotation values show the biggest final errors.
- Noise levels that exceed 5% are not acceptable: the fitted values of e1 and e2 vary significantly, altering the shape of the object. The other parameters remain reasonably accurate.
- We are not using any loss function to alleviate the effect of outliers. That did not prove particularly important here, but it will probably matter when we experiment with data that is not synthetically obtained (real pointclouds of objects).
- A crucial requirement for a good fit is the initialization of the Z axis of the object (the axis of revolution).

[Duncan13] Duncan, Kester, et al. “Multi-scale superquadric fitting for efficient shape and pose recovery of unknown objects.” Robotics and Automation (ICRA), 2013 IEEE International Conference on. IEEE, 2013.

In this post I will talk about superquadric uniform sampling. You might be wondering why you should care about sampling. Well, there are three reasons:

- *Synthetic data:* To test our superquadric fitting algorithm (next section) we will start by using canonical superquadrics with some added noise as a baseline. Once a fitting algorithm works with these complete pointclouds, we will be able to proceed to more complex shapes (that are not necessarily superellipsoids, but close enough to attempt to fit them).
- *Debugging purposes:* Visualization of our results.
- *Beautiful math:* The math of this part is simple, yet elegant. If you want to know the details, peek at the code (linked below) or read the paper mentioned later in this entry.

From last time, you might remember that superellipsoids can be expressed with the explicit (parametric) equations:

$$x = a \cos^{e_1}\eta \, \cos^{e_2}\omega, \qquad y = b \cos^{e_1}\eta \, \sin^{e_2}\omega, \qquad z = c \sin^{e_1}\eta$$

Now, let’s say that we want to generate pointclouds of different superellipsoids. For now, let’s assume we only care about canonical superellipsoids (no translation and no rotation). A first idea would probably be to just sample the values of $\eta$ and $\omega$ uniformly to generate the 3D samples. Let’s see what happens if we use this simple approach:

You might notice that the results for the sphere and the oval shape are reasonably well distributed; however, the results for the cylinder and the box are far from even. In general, the naive approach of uniformly sampling $\eta$ and $\omega$ works well for high values of $e_1$ and $e_2$ (where high means closer to 1 than to 0). For lower values (such as the box and cylinder cases, where $e_1 \le 0.25$) the performance of naively sampling the angles degrades.

The following figure shows a more reasonable sampling:

The samples above were obtained with a simple technique proposed by Pilu and Fisher [PiluFisher95]. These authors noticed that in order to obtain a uniform sampling, the 3D distance between consecutive samples has to be constant; however, constant angle steps do not produce constant distances between samples.

An implementation of the algorithm proposed by Pilu and Fisher can be found in my PCL fork, while the source example sampleSQ.cpp generates a pointcloud for a superellipsoid defined by user input parameters.

[PiluFisher95] Pilu, Maurizio, and Robert B. Fisher. “Equal-distance sampling of superellipse models.” DAI Research Paper (1995).

**Difficult poses for the current PCL people detector**

This project aims to improve the people detection capability in PCL. To validate the improvement we need measurements and data as proof, so the first step of the project is to collect and establish datasets of relevant human poses. PCL’s people detector assumes that humans are in upright poses rather than arbitrary ones, and the multiple stages of its detection pipeline (segmentation, clustering, and classification) rely on this assumption, making it susceptible to the varied human poses found in the real world. I therefore intend to collect various representative poses that deviate from the upright pose. As discussed with my mentor Matteo, these include:

- Crouching
- Hands up
- Sitting
- Dancing/running
- Lying on sofa or floor

These poses are not equally difficult. Crouching, dancing, and running pose challenges to the image-based human classifier, which currently assumes a sample window of a full-body upright pose. Sitting likely involves occlusion of parts of the body and increases the difficulty of segmentation. Hands up would introduce errors in segmentation, or even false segmentations in the current head-based subclustering, since it determines each subcluster by its highest local maximum, which could erroneously be a hand. Lying on a sofa or on the floor would completely defeat the current approach to segmentation, because the person is no longer an independent object to be segmented. So these are the types of data we will collect and evaluate upon.

**Methods to collect data**

There are different ways to collect point cloud data but, as suggested in OpenNI and PCL Recording Codes, the most efficient capturing method in PCL is `pcl_openni_image` (or rather the only method that works, because the other ones in PCL either fail outright or explode in I/O), which grabs RGBD frames in the compact pclzf format (samples). You use `pcl_openni_image '#1'` to test the Kinect configuration and `pcl_openni_image '#1' -visualize 0` to save batches of frames. This produces a compact data stream of about 400 KB per frame, or 12 MB/second, with point clouds reconstructed afterwards when the data is used. You can visualize the data with `pcl_image_grabber_viewer -pclzf -dir captured_frames/ -fps 30`, or programmatically replace `pcl::OpenNIGrabber` with `pcl::ImageGrabber`, as the image grabber viewer does. The other way to collect data is to save raw frames and camera information in ROS bags, play back the image frames to reconstruct the point cloud on the fly, and feed the depth-registered point cloud topic to PCL via `pcl_ros`.

**Problems with depth registration**

The pclzf format seems fine, until the frames are actually reconstructed into point clouds, which reveals a large bias in the depth registration:

Depth registration is the first step of data collection. PCL’s people detector makes use of an image-based people classifier, which relies on a correctly color-registered point cloud. Somehow there are peculiarities in how the depth registration, and the calibration prior to it, are performed. The image above shows that the color of the background is mapped onto points that belong to the person.

My guess was that this might be caused by a wrong calibration, or a loss of calibration during the capturing and reconstruction process. So after redoing the intrinsic calibration of the RGB and IR cameras and the extrinsic calibration as described in the openni_launch tutorials and this Kinect calibration tutorial, I tried the other method with ROS `openni_launch depth_registration:=true publish_tf:=true/false`. The problem with depth registration persists: OpenNI seems to ignore my calibration parameters and frame transforms no matter what, and only uses its built-in factory default parameters:

It turns out PCL does not perform any internal calibration at all and relies on OpenNI providing a correct depth registration, and there is no way of updating OpenNI’s calibration parameters. Yes, you can calibrate the Kinect all you want, but there is no way to make use of the result in the existing code base. This thread about what happens behind OpenNI depth registration, with pointers inside, is a good read.

We can still do the whole depth registration process manually, as in pointcloud_utils.cpp, but unfortunately the data captured by `pcl_openni_image` is already depth registered with the wrong extrinsic calibration. To avoid spending too much time on perfecting the data calibration, I decided to make a minimal fix to the extrinsic error in the depth registration of the pclzf data by shifting the RGB layer along the X axis, in pseudo-code like this:

```
// Shift the RGB layer 8 pixels along the X axis to compensate for the
// extrinsic registration offset; the last 8 columns are left unchanged
// so the lookup stays in bounds.
for (int y = 0; y < height; y++)
  for (int x = 0; x < width - 8; x++)
    cloud(x, y).rgb = cloud(x + 8, y).rgb;
```

The result looks mostly acceptable with some residual “borders” around and can be improved later on:

Note the green box to the right of the person.

**Datasets and first results**

I collected datasets of various poses: standing, crouching, sitting, and lying on a sofa. One standing dataset looks like this (selected thumbnails):

There is also this result demonstrating the failure mode of the current head based subclustering method:

These extra erroneous segments (green boxes) are caused by the hands-up pose, with which the current clustering method recognizes the hands as heads. So this is what I will improve upon.

The following video shows the first results on all currently collected datasets. Green boxes are segmented clusters and red boxes are people detections. The minimum height is set to 0.1, so you will see lots of green boxes; this is for evaluating the current performance of the segmentation.

Some major observations for the results:

- Segmentation is very good and almost never misses the true person clusters.
- Hands up pose likely generates a bunch of false positive clusters, as explained above.
- Segmentation totally fails in the sofa dataset.
- The classifier has some room for improvement.

So this is about everything for this time. Next time I will write about dataset annotation and performance improvement of some first techniques that I will implement.

Superquadrics are a family of geometric shapes that can represent a wide array of diverse primitives using a small number of parameters. As an example, look at the figure below: the first row depicts 5 common household objects, and the second row shows the superquadrics that most closely resemble them.

Superquadrics were initially introduced in the computer graphics community by Alan Barr [Barr81], but they were later adopted by the robotics community as an effective modelling tool to approximate object shapes. In general, superquadrics include superellipsoids and supertoroids, but for most practical uses we only care about superellipsoids. These can be expressed with the following formula:

$$\left( \left( \frac{x}{a} \right)^{\frac{2}{e_2}} + \left( \frac{y}{b} \right)^{\frac{2}{e_2}} \right)^{\frac{e_2}{e_1}} + \left( \frac{z}{c} \right)^{\frac{2}{e_1}} = 1 \qquad (1)$$

As can be seen, superellipsoids in their canonical form can be expressed by 5 parameters:

- $a, b, c$: Scaling factors along the principal axes.
- $e_1$: Shape factor of the superellipsoid cross section in a plane orthogonal to XY and containing the Z axis.
- $e_2$: Shape factor of the superellipsoid cross section in a plane parallel to XY.

If a general transformation is considered, then the total number of parameters required to define a superellipsoid is 11 (the 6 additional ones being the rotation and translation degrees of freedom).

Expression (1) shows the implicit equation of the superellipsoids. Its parametric solution can be expressed as:

$$x = a \cos^{e_1}\eta \, \cos^{e_2}\omega, \qquad y = b \cos^{e_1}\eta \, \sin^{e_2}\omega, \qquad z = c \sin^{e_1}\eta$$

with $\eta \in [-\pi/2, \pi/2]$ and $\omega \in [-\pi, \pi)$. In our next post we will learn how to generate pointclouds for superellipsoids (which is not as simple as just sampling $\eta$ and $\omega$!). Stay tuned :)

[Barr81] Barr, Alan H. “Superquadrics and angle-preserving transformations.” IEEE Computer Graphics and Applications 1.1 (1981): 11-23.