Point cloud has become a popular representation for three-dimensional (3-D) models in recent years owing to the improvements in digital scanning technology. Understanding the shape of objects in point clouds is one of the most fundamental problems in shape processing, and can promote meaningful research, such as multidimensional media, dealing with semantic-based knowledge systems fields. The key challenge to shape understanding is to improve the structure and the topology representation, with the ultimate goal being to obtain an optimized shape representation model. Naturally, shape representation becomes more difficult with a large number of loops in an object.
In this study, the understanding and recognition of an object are related to the cognition and learning of geometric and topological properties of an object, and the purpose is to distinguish the object from others by determining the object type based on the geometric and semantic features. Many existing algorithms refer to semantic segmentation.12.3.–4 For a better understanding of 3-D shapes, some methods were used to analyze the object by utilizing its structure,56.7.–8 and others made use of the topology.910.11.–12 Ning et al.13 introduced a shape decomposition and skeleton extraction method that could understand the shape of objects without constructing the mesh or any other surface representation, but it would fail when dealing with an object containing a loop.
To overcome this problem, we propose an improved algorithm, extracting junction points to handle the existence of a loop in objects and generating an improved shape semantic graph (SSG) representation of the objects. The SSG is defined by a set of detected components and their connections. Based on the SSG, our algorithm can not only process the simply connected object, but also deal with the nonsimple connected object. We validate our method by recognizing objects on various point-sampled models. We can also use our SSG to identify different objects from a point-sampled models database. In summary, our paper makes the following contributions:
1. An automatic extraction method of a decomposed skeleton that increases junction points aiming to better determine the complete topology.
2. An improved representation of an object shape called an SSG that can better describe object containing a loop.
3. An efficient recognition algorithm, based on a two-step similarity measurement of an improved SSG.
The rest of this paper is organized as follows. In Sec. 2, we review previous studies that are closely related to ours. In Sec. 3, we describe our process and give detailed steps of our proposed method with the terminology involved in the method. After a discussion of the decomposition method in Sec. 4, we present an SSG of the object, and analyze on its properties in Sec. 5. In Sec. 6, a two-step similarity measurement method is described that recognizes different objects based on the SSG. Finally, we discuss the efficiency and of our decomposition method and make comparisons by providing the results of the boundary extraction and SSG. We also demonstrate the application of object recognition based on the SSG in point model database.
In recent years, researchers have proposed many shape understanding methods for planar shapes.1415.16.–17 In 3-D shape understanding, Attene et al.18 classified the methods of shape understanding into two categories from the computational perspective: the representation based on geometry and structure, and the representation based on topology.
Geometry and Structure-Based Representation
Geometry and structure-based representation involves the scope of the object (geometry representation), the object characteristics, and object decomposition components (structure representation). Generally, the definition of an ideal shape descriptor is required to capture and compute the main features of a surface, and extract the geometric shape that is invariant to rotation, translation, and scaling. This representation distinguishes the local and global features that could be combined and abstracted into a compact representation that is useful in various applications such as shape matching, shape retrieval, and shape comparison. The research institution, Institute of Applied Mathematics and Information Technologies—Genova (IMATI CNR), developed the project AIM@SHAPE. The goal was to develop a semantic-based shape representation, and design a semantic-oriented tool to obtain, construct, transfer, and deal with the shape using related knowledge.
Generally, the feature representation of a surface has two types: a local surface descriptor and a global surface descriptor. The global surface descriptor mainly describes the whole or one of the most significant features of the 3-D objects that are commonly used in 3-D object matching and classification. The local surface descriptor represents the geometrical features of a vertex’s neighborhood on the surface, which can be used in object identification, model matching, and shape registration.
At present, there are numerous studies focused on using local surface descriptors to represent the shape of an object.1920.–21 In the following, we will review several similar studies in shape understanding of the 3-D object.
De Figueiredo and Kehtarnavaz5 assumed that the object was composed of some smooth fragmentation sets and denoted the object as an attribute graph in which the attributes of each node was defined by the Gaussian curvature of the corresponding fragmentation. Their object recognition was primarily based on the graph matching method. Stein and Medioni6 focused on the intensive data, and they generally adopted two main characteristics that were coded to retrieve and match information from database quickly. Chua and Jarvis7 presented a feature descriptor—point signature, then used it for recognition of an object based on the calculation of characteristics and voting mechanism. Johnson and Hebert8 proposed a spin image representation method to determine the surface shape from the dense sample points. Sidi et al.22 introduced an unsupervised cosegmentation method to reveal the semantic shape parts and established their correspondence across the set. Guo et al.20 provided a guidance for the selection of an appropriate 3-D feature descriptor in different applications, and further they presented a comprehensive survey of existing local surface feature-based 3-D object recognition methods.21
Topology-based representation can capture and understand the shape of spatial objects by matching and distinguishing different objects through mathematical tools.23 Classical tools such as Morse theory,23,24 Reeb graph, homotype, and homology are suitable for dealing with several issues related to shape understanding. Morse theory sets the foundation for associating the topology of a given manifold with the critical points of a smooth function defined on the manifold. In recent years, Morse theory has been applied successfully to data visualization in the scalar field, and is often used to construct a multiresolution structure.
Many methods adopted the Reeb graph to analyze the topological structure of models and obtain a representation of the corresponding topology, further generating the semantic description of an object that can be used for shape understanding and recognition.
Hilaga et al.9 made use of the surface geodesic distance as the Morse function and proposed a multiscale Reeb graph algorithm. Based on the Morse theory and Reeb graph, Biasotti10 presented an extended Reeb graph which can be used to represent the relationship of points. This method could provide an effective way to classify, simplify, and store the model. Dey et al.11 investigated the mesh segmentation using a smooth function defined on the discrete domain based on Morse theory. Xiao et al.12 adopted a discrete Reeb graph approach to analyze the topological structure of the human model. In addition, Bespalov et al.25 proposed a distance matrix of vertices, by which the model is decomposed to acquire the Reeb graph. Tung and Francis26 provided an incremental Reeb graph algorithm that adopted the height function as the Morse function. Hui and Floriani27 proposed a two-level topological decomposition method and the decomposition relationship between components to implement shape understanding. Schnabel et al.28 described semantic entities as a constrained shape graph, and studied the shape understanding, including the shape problem of 3-D point cloud data. Floriani et al.29 presented TopMesh, a tool for extracting topological information from nonmanifold, 3-D objects with parts of nonuniform dimensions.
Many works are proposed for extracting the skeleton from an object to represent the topology. Dey and Sun30 introduced a mathematical definition of the curve skeleton. Mesh contraction using Laplacian smoothing was first employed by Au et al.31,32 to find skeletons of mesh surfaces and was extended to handle point sets by Cao et al.33 Tagliasacchi et al.34 presented a mesh-based contraction algorithm by incorporating Voronoi pole structure into the mean curvature flow. Our work is related to the research by Ning et al.13 that had errors when dealing with an object containing loops. Our work depends on the following crucial aspects:
(1) Our algorithm integrates the structure and topological characteristics and proposes an optimized SSG that could make shape representation more intuitive and robust and (2) our topological relation representation devises a boundary surface for different parts of the object, thus maintaining the topological structure information.
In this paper, we propose an effective method to understand and recognize the object based on decomposition and topology relations. Decomposition means decomposing the object into components, and topology relations indicate the connection between these components. The characteristics of an object can be easily recognized using the structure and topology representation. Assume that the 3-D object is represented by and the skeleton is denoted by , we give the following definition:
Decomposition: The original object is comprised of a set of points and can be decomposed into different parts , in which . Each part has a relatively independent meaning for .
Topological relations: We use a decomposed, centralized skeleton to describe the topological relations of . can provide the shape of and has a one-to-one correspondence with , i.e., , and corresponds to , and so on.
Feature points: A set of critical points that could be considered as the representative points for different parts, denoted as . They conform to the norm of human perception.
Central point: The center of the object is defined as the point with the minimum average geodesic distance to all other points, especially the point that satisfies .
Junction points: The intersection of two neighboring parts and denoted as .
The feature points are often located on the contour of objects. We first adopted the alpha-shape-based method to detect the contour points, and then obtained the optimal feature points by clustering those points and refining the points of local curvature maxima.13 , , and are essential elements in topological relations representation. The details of our algorithm are given as follows:
1. The object is decomposed into a few disjointed, meaningful sets based on the chosen feature points .
2. The geodesic distance between feature points and central point is calculated, and the points on the geodesic lines are labeled according to different ingredients and are called the initial surface skeleton IS. Moreover, the label variation of points on IS could be used to generate the junction points/boundary surface points .
3. IS is placed in the center of the object and is simplified to obtain the final surface skeleton .
4. The SSG is constructed based on , , , , and .
5. A similarity measurement is designed using the SSG of to recognize the object.
Decomposition and Topology Representation for the Object
Decomposition of the Object
Decomposition produces the object structure including the component information and can be used to guide the topological structure generation. A perception-based approach to decompose the object in a point cloud is presented by Ning et al.13 that follows a rule that segments an object along lines of negative minimum curvature. To determine the number of patches, we calculate and select the critical feature points based on perception information to represent each patch. Taking the critical marker sets as a guide, each marker is spread to a meaningful region by curvature-based decomposition and further constraints are provided by the variation of normals. A brief introduction to its strong connection to our work is as follows.
1. The method first automatically extracts those feature points with local curvature maxima that appear on a convex hull of contour points.
2. Second, the feature points and the variance in the normal direction over the variance in the neighborhood are determined to weigh whether the region is smooth.
3. Next, taking the feature points as the seed points, our method identifies connected point sets by growing the seed points on the basis of the constraint that the points belonging to one patch have little variation of normal vectors.
Afterward, object represented by point cloud is divided into parts ( is not less than the number of critical points), thus, , . Here, is called a patch of . As such, the object is comprised of . Moreover, if the object has too many points, we can simplify the data to save the computation time while keeping its characteristics using Morton ordering.35
Topological Relations of Components
We use the method in Ning et al.13 to handle the teapot data and the result is shown in Fig. 1. We found that the extracted surface skeleton misses the important loop information of the teapot handle that may have an impact on the complete shape understanding. Therefore, we propose an improved method based on the junction points of different components in object .
The initial surface skeleton consists of a set of geodesic lines from the feature points to the center of the object that can be denoted as , where is the number of feature points.
The decomposed skeleton is composed of points on , with labels corresponding to different regions of .
The junction points are those points whose neighbor vertices (based on -d tree) exist label variation of initial surface skeleton.
After decomposing and labeling each decomposed part of , the junction points are determined by several steps:
1. Detect the point whose label appears mutated in different regions such as and corresponding to label and . These points are called junction points.
2. Guided by the contour points, we judge the label variations among the neighboring points and record the frequency with which each label appears. Then we select those points whose neighbors are only labeled and as the junction points. We repeat this step until all the points on the boundary surface are checked.
Figure 2(b) shows the junction points between decomposed parts. For example, for two decomposed parts and , we detect one point as the junction point/boundary point first. Then we take as the seed point, find its neighboring points, and mark those who have the label of “1” and “2” as the junction points. The junction points are clustered according to the nearest neighbor points. For the object with loops, the junction points are clustered into two different parts leading to two different boundary surfaces (Fig. 3). Based on the final boundary surface of the teapot, the surface skeleton can be obtained by connecting the feature points Fp with the corresponding point on the boundary surface and with center of the teapot (Fig. 4).
Based on the surface skeleton extraction results, we should move it further into the interior of the object to centralize the skeleton.
Let be the final surface skeleton set. The arbitrary point on is moved into the interior of the object in the reverse surface normal direction, with contracting procedure36 that is calculated by , with the Newtonian potential function . The -nearest neighboring point set is . The iteration continues until and the final position of is recorded.
If the points on the final skeleton are dense, we can simplify and smooth the skeleton according to the label variation and the angle between two arbitrary conjoint segments (the segment is the line created by connecting two neighboring points). Finally, a smooth, simplified, and centralized skeleton is generated, which will be applied in the next section.
Shape Semantic Graph
As a shape representation, the SSG can describe the topology of objects efficiently and has wide applications in 3-D model retrieval. The SSG is unique and can capture the critical topology information for the object.
The SSG is defined by , where represents the decomposed subparts , , and , recording the topological relations between the two subparts when there is a joint that transitions from one of the labeled parts to the other connecting the skeleton points. After decomposition, we regard each part as a node [represented by the left one in Fig. 5(a)] and two nodes have an edge, if and only if they are adjacent. The adjacent relations can be evaluated by the skeletal structure [Fig. 5(b)].
For the node in the SSG , if there is only one adjacent node, especially if the node degree is 1, the note is called the endpoint of the SSG and can be treated as a feature point of the object (“loop” structure is excluded). The junction points between the two decomposed components construct a triangle node [Fig. 5(b)].
The node is corresponding to part in the object (the node may belong to the decomposition part and also to the point on the boundary surface of two parts). Each has a tag to record the type of the node . If , then the node in SSG belongs to the junction points. If , the node in SSG belongs to a decomposition part.
The topological relationship of object components can be represented by , where is composed of . records the neighborhood around nodes and has a length that is determined by the geodesic distances between two nodes, especially, . The length of can be calculated by , and denotes the geodesic distance between two points of the object.
Object Recognition Based On Similarity Measurement
The similarity measurement between two models is calculated by
The similarity measurement is used to select the most similar model in the database. Assuming that the input model is , after choosing a model from the database, the comparison steps for the SSG , of and are as follows:
1. Compare the number of nodes and . If they are consistent, then , and go to step (2). Else, and is not the model similar to . Choose another model and continue (1).
2. Compare the number of nodes that have the largest degree in and . If then and go to (3). Else, . Then choose another model and start from (1).
3. Compare the number of nodes whose degree is 1 in and . If they are consistent, let and go to (4). Else, . Then choose another model and start from (1).
4. Compare the number of nodes whose degree is 2 in and . If they are consistent, let and insert the model into a model set that is empty initially. It is a matched model. Else . Then choose another model and start from (1).
Based on (1) to (4), we can acquire a set of models that match the given model . The type of must be consistent with that of one model in . Thus, we analyze the types of the models in and choose one that appears frequently as the type of . After the initial choice, we can rule out those data that do not match with .
Figure 7 shows the initial choice from the database. Assume the given model is tippy [see Fig. 7(a)]. We can find a series of data from the database using the initial similarity matching, shown in Fig. 7(b). Since each object in the shape database has seven models under different poses, it could recognize the type of the object according to the frequency that the object appears in . Hence, the given model can be recognized as tippy in Fig. 7(b).
Progressive Similarity Measurement
Objects with different shapes can be distinguished by the initial similarity measurement, however, it is difficult to make a distinction between different poses of one object. For example, a four-leg table is consistent with four-leg tables in the database regardless of its model decomposition results, skeleton structure, or shape semantic maps. Then how do we acquire the consistent results corresponding to the initial input model? In addition, for a given model, one or more similar models could be detected (as shown in Fig. 7). However, how do we determine which one is the closest or most consistent with the given object? In order to solve these problems, we need an additional similarity detection called “progressive similarity comparison” after the initial choice.
The progressive measurement of two models is defined by the histogram of the geodesic distance between the point with a degree of 1 and the point with the largest degree. The geodesic distance is divided into equal intervals by using the maximum and minimum geodesic distances.
Experimental Results Analysis
All the tests in the paper are on a PC that has an Intel Core2 Duo 2.80 GHz CPU, together with an integrated graphics card, Intel G33/G31 express chipset, and 2G of RAM. All the data used in our experiments are from the Princeton segmentation benchmark.38 We chose seven objects under seven different poses as testing data.
Analysis on Decomposition Algorithm
Figure 9(a) shows the time in each stage: (-nearest neighbor), Seg (segmentation process), Bou (boundary detection), Clu (clustering for further critical points selection), and Cri (final critical points determination) for different point cloud data. The computational complexities of the different stages are, respectively: : , Bou: , Clu: , Cri: , and Seg: . , boundary detection, clustering, and the selection of critical points are very fast and finish in a matter of a few seconds as opposed to minutes. Our segmentation process is also fast when the data size is points. Figure 9(b) shows the relationship between the total running time and the data size. Compared with Figs. 9(a) and 9(b), the running time and the total time are improved after simplification shown in Fig. 9(c) and Fig. 9(d). We compared the time before and after data simplification for bunny. Detailed data are recorded and the execution times for both bunny and simplified bunny are compared in Table 1. Notice that the times in the table are used for preprocessing, which signifies that we only perform the computation once. However, the bigger data (such as bunny) can be simplified to improve the efficiency using Morton ordering. The results are shown in Fig. 10. Figures 10(b) and 10(c) show the data after using different sampling rates 50% and 33%, and the data after sampling contain 17,777 and 13,571 points, respectively. After simplification, the decomposition time could be improved effectively from 111.238 to 18.565 s. In addition, the simplification process is short and can process 2 million points of data in seconds. This could improve the execution time dramatically.
Datasets and time analysis on each period in decomposition.
|Dataset||Data size||Run time (s)||Total time|
Comparisons on Decomposition Algorithm
Figure 11 shows the structure of four different objects (palm, octopus, tippy, and cactus). We also compared our results with related works (Ma et al.,39 Richtsfeld and Vincze,40 and Yamazaki et al.41) in Fig. 12. The result of Yamazaki et al.41 is displayed in Fig. 12(a) [referred to as segmenting point sets (SPS)]. The results on the horse data using the Ma’s method39 [the segmentation with critical point and fuzzy clustering (SFC) method] are shown in Fig. 12(b). Richtsfeld and Vincze40 obtained good results with a core part extraction that is based on the radical reflection [see Fig. 12(c) for segmentation based on radial reflection (SRR)]. Our method decomposes the object into meaningful components [Fig. 12(d)]. In view of the fact that our method can decompose the object into more detailed information, e.g., legs, ears, body, and horse faces, ours is useful for additional semantic labeling and recognition, especially in 3-D retrieval.
To run our algorithm, we first transform the meshes into point clouds, and then map the segmentation results back to the meshes, as described before. Figure 13 shows a qualitative comparison on the tippy model and the cup model. We compared our segmentation method with the other six segmentation algorithms including deep learning,42 randomized cut,44 normalized cuts,45 random walks,46 -means,47 and approximate convexity analysis (WcSeg)43 on the PSB database.
Shape Semantic Graph Results
It is necessary to extract the boundary and skeleton for additional processing of the SSG. Figures 14(a)–14(e) show the skeleton extraction results of several data (e.g., octopus, cactus, hand, horse, and teapot). Figure 15 shows the boundary extraction results on selected data of the object database from the Princeton segmentation benchmark.38 It demonstrates that the boundary of each object can describe the shape contour of the object effectively from the optimal view.
The SSG of the shape database is also obtained [shown in Fig. 16(a)]. The similarity measurement and progressive similarity measurement are implemented on the SSG in Fig. 16(a). Afterward, the corresponding models are retrieved, as shown in Fig. 16(b). The first column in Fig. 16(b) indicates the query object. The objects with the dotted colored boxes belong to the same category as the query object by using the similarity measurement, and the final recognized result is represented by the dotted ellipses after the progressive similarity measurement.
The efficiency of the SSG when it is used to retrieve objects from the database is quite high. In our experiments, the database for objects in the point cloud is small, only containing 20 objects with 20 different postures (400 objects). In fact, the SSG, the degree of each vertex in the SSG, and the progressive measurement for 400 objects are obtained and recorded offline. For retrieval, we only need to compare the SSGs of two models, efficiently accomplished using the similarity measurement with the complexity of the algorithm. Table 2 shows the execution time when using the similarity measurement of the SSGs for seven input models. Here, the execution time refers to online time not including the offline time. This demonstrates that our method is a fast and efficient way to implement object understanding and recognition.
Execution time for similarity measurement. DS is the size of dataset, VN denotes the number of vertices in SSG, VN1 is the number of vertices whose degree is 1, VN2 is the number of vertices whose degree is 2, VDmax is the maximum degree in SSG, time refers to the consuming time for the online operation.
We demonstrate the whole process of SSG generation for the object with loops. Figure 17 shows the whole process of shape decomposition [see in Figs. 17(a)–17(e)], skeleton extraction [from Figs. 17(f)–17(r)], and finally, SSG generation [Fig. 17(s)].
Figures 18 and 19 compare the work of Ning et al.13 with our algorithm on the handling of objects with loops. It is evident that our approach produces skeletons that capture the more general geometry better, especially for the vase data in Fig. 19. The vase has more complex loops; however, the detailed shape is not presented in the corresponding SSG in Fig. 19(b). Compared with the skeletons shown in Ning et al.,13 the optimized skeletons in our paper contain more geometrical and topological information.
If we take the teapot as query input, the SSG of the teapot in Ning et al.13 is displayed in the first column in Fig. 20(a) and the retrieval results would be those with the dotted colored boxes belonging to the same category as the query object (teapot) after using the similarity measurement. Moreover, it is difficult to distinguish the data in the third and the fourth columns. Compared to our results in Fig. 20(b), after the similarity measurement it retrieves two similar results, respectively, in the fourth and last columns.
Our method can decompose objects depending on the number of patches that is determined by the feature points. When dealing with incomplete point cloud data, our method occasionally selects incorrect feature points, thus leading to incorrect topology.
Another limitation of our method is that the boundary surface between two neighboring regions could be smoothed and improved, providing a means of enhancing the shape decomposition results conversely.
In this paper, we propose a method to describe the shape semantic of different objects in point clouds. Our method is based on shape decomposition that produces components and structures of the object. The skeleton provides the topology of the object. Feature points, junction points, and the center point of the object are used to obtain a centralized skeleton to represent the topology. An SSG is generated to describe the shape of object and the similarity measurement provides an effective way to recognize different objects. Experimental results demonstrate the effectiveness and advantages of the SSG for understanding and recognition of objects with or without loops. Future research will concentrate on the improvement of the boundary surface between two neighboring regions to acquire a smooth and accurate boundary that can also be used to improve the shape decomposition results. In addition, more considerations should be given to the high-level semantics of an object when there are deformations (such as the tail of an animal, which may be straight, bent, or even curly).
The work was supported in part by the National Natural Science Foundation of China under Nos. 61272284, 61302135, 61472319 and 61571439; in part by China Postdoctoral Science Foundation under No. 2014M552469; in part by the Doctoral Fund of Ministry of Education of China under No. 20126118120022; in part by Shaanxi Postdoctoral Science Foundation under No. 434015014; in part by the Science and Technology Project of Xi’an under No. CXY1440(5).
Xiaojuan Ning received her PhD in 2011. She is an associate professor at Xi’an University of Technology. Meanwhile, she has cooperated with the Lab of LIAMA and National Laboratory of Pattern Recognition at Institute of Automation, CAS. Her current research interests include scene modeling and shape analysis.
Yinghui Wang received his PhD degree from Northwest University, Xi’an, China, in 2002. From 2003 to 2005, he was a postdoctoral fellow at Peking University, Beijing, China. Now he is a professor at the Institute of Computer Science and Engineering, Xi’an University of Technology, China. His research interests include image analysis and pattern recognition.
Weiliang Meng is an assistant professor in National Laboratory of Pattern Recognition (NLPR) at Institute of Automation, Chinese Academy of Sciences. He received the BE degree in computer science from Civil Aviation University of China in 2003, an MSc degree in computer application from Tianjin University in 2006 and a PhD degree in computer application from State Key Laboratory of Computer Science at Institute of Software, Chinese Academy of Sciences, in 2010. His research interests include geometry modeling, image-based modeling and rendering, and augmented reality.
Xiaopeng Zhang is a professor in the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. He received his PhD degree in computer application from the Institute of Software, Chinese Academy of Sciences. His research interests include 3D-shape analysis, complex modeling and rendering, and augmented reality.