Operationally optimal vertex-based shape coding with arbitrary direction edge encoding structures

Abstract. The intention of shape coding in the MPEG-4 is to improve the coding efficiency as well as to facilitate the object-oriented applications, such as shape-based object recognition and retrieval. These require both efficient shape compression and effective shape description. Although these two issues have been intensively investigated in data compression and pattern recognition fields separately, it remains an open problem when both objectives need to be considered together. To achieve high coding gain, the operational rate-distortion optimal framework can be applied, but the direction restriction of the traditional eight-direction edge encoding structure reduces its compression efficiency and description effectiveness. We present two arbitrary direction edge encoding structures to relax this direction restriction. They consist of a sector number, a short component, and a long component, which represent both the direction and the magnitude information of an encoding edge. Experiments on both shape coding and hand gesture recognition validate that our structures can reduce a large number of encoding vertices and save up to 48.9% bits. Besides, the object contours are effectively described and suitable for the object-oriented applications.


Introduction
To facilitate the applications of object-oriented storage, retrieval, editing, and interaction, modern multimedia communications require that video content be easily accessible on an object basis. The MPEG-4 video compression standard provides such functionalities by describing video objects not only by texture but also by shape. Because of severe mobile environments and massive image and video retrieval demands, a good shape coding scheme has to provide efficient compression as well as effective description. [1] These two requirements have been extensively investigated separately in the data compression [2][3][4] and pattern recognition [5][6][7] fields over the past two decades. However, jointly considering both objectives in one framework remains an open problem.
It is well known that the vertex-based shape representation can handle both shape coding and shape description in a natural way. As a result, this representation can be directly applied to contour-based object-oriented applications. In general, there are two main vertex-based shape coding frameworks. The first [2][3][4] treats vertex selection and encoding separately; therefore, it loses rate-distortion (RD) optimality. The other is the operational rate-distortion (ORD) optimal framework. [1,8,9] It jointly considers vertex selection and encoding as a shortest path problem in a directed acyclic graph (DAG); therefore, it can guarantee optimality in the RD sense.
The main limitation of the ORD optimal framework, however, is that its optimality is contingent on the chosen parameters. Performance enhancements to this framework therefore aim to relax the constraints imposed by these parameters. Many relaxations have focused on the admissible vertex set, [1,9,10] the sliding window strategy, [1,10,11] the edge distortion measurement, [1][12][13][14][15][16] and the code table, [17][18][19] but few have concentrated on the edge encoding structure. The problem these methods address is to find an optimal polygon that can be encoded with the lowest bit rate for a given admissible distortion, where the optimality is contingent on the given eight-direction edge encoding structure. [20] Since the location of the current vertex is strongly correlated with the location of the previous one, it is usually assumed that the vertices are encoded differentially. Thus, the bit rate for the entire polygon is the sum of all the edge rates, which are determined not only by the code table but also by the edge encoding structure. Moreover, the approximation quality depends largely on the ability of the edge encoding structure to represent the contour characteristics. Therefore, the edge encoding structure plays a vital role in both compression and description performance.
There is an inherent limitation in the eight-direction structure. An edge available to this structure must intersect the horizontal axis at an angle that is an integer multiple of $\pi/4$ (named a restricted edge). An edge that is not in one of these eight restricted directions (named an unrestricted edge) cannot be selected, even if its distortion is no larger than the given admissible distortion. Consequently, this results in the following two problems, as shown in Figs. 1(a) and 1(b):
1. From a data compression perspective, it is a waste of bits, as a large number of short restricted edges are needed when the contour segments are not exactly in the eight restricted directions.
2. From a pattern recognition perspective, it cannot describe the object contour well, as a large number of selected vertices may not be at or near the corners of the object contour when the contour segments are not exactly in the eight restricted directions.
The above phenomena motivate us to relax the edge direction restriction. To achieve this, we propose two arbitrary direction edge encoding structures, called the 8- and the 16-sector structures, which can encode both restricted and unrestricted edges. First, we partition the digital coordinates into 8 or 16 sectors, and then decompose the encoding edge into a short component and a long component. After that, we encode the sector information with a fixed length code (FLC), and the short and the long components with either run length codes (RLC) or variable length codes (VLC) in a different way, providing a further encoding bit reduction. We seamlessly embed these structures into the ORD optimal vertex-based shape coding algorithms with various parameter setups to improve their RD performance as well as to achieve better description results. Some representative results produced by our proposals are shown in Figs. 1(c) and 1(d).
The rest of this paper is organized as follows. Section 2 introduces the ORD optimal shape coding framework, the related enhancements, and the traditional eight-direction edge encoding structure. Section 3 presents the 8- and the 16-sector edge encoding structures, respectively. Full analyses of the experimental results on both shape coding and hand gesture recognition are provided in Sec. 4. Finally, conclusions and future work are given in Sec. 5.
The problem of finding the polygon $P$ with the lowest bit rate for a given admissible distortion $D_{\max}$ can be written as

$$\min_{P} R(P) \quad \text{subject to} \quad D(P) \le D_{\max}, \tag{1}$$

where $R(P)$ is the sum of all the edge rates and $D(P)$ is the maximum edge distortion. Formulation (1) is essentially a shortest path problem in a weighted DAG $G = (A, E_A, w)$, as shown in Fig. 2. [9] A valid path of order $K$ from vertex $v_0$ to vertex $v_K$ is an ordered set $\{v_0, \ldots, v_K\}$ whose length is defined as follows:

$$\sum_{k=1}^{K} w(v_{k-1}, v_k), \tag{2}$$

where $w(u, v)$ is a weight function of the edge $(u, v)$ defined as follows:

$$w(u, v) = \begin{cases} \infty, & d(u, v) > D_{\max} \\ r(u, v), & d(u, v) \le D_{\max}. \end{cases} \tag{3}$$

The above definition of the weight function leads to a length of infinity for every path that includes an edge with a distortion above $D_{\max}$. Therefore, the shortest path will not include such an edge. Every finite-length path that starts at vertex $a_{0,0}$ and ends at vertex $a_{N_C-1,0}$ has a length equal to the rate of the polygon it represents. Therefore, the shortest of all these paths corresponds to the polygon with the smallest bit rate, which is the solution to the problem in (1). Let $R^*(a_{i,m})$ denote the minimum rate from $a_{0,0}$ to $a_{i,m}$ via a polygon approximation. Let $q(a_{j,n})$ be the back pointer of $a_{j,n}$, which is used to remember the optimal path. Then, the shortest path can be found efficiently by the DAG shortest path algorithm [21] formalized in Algorithm 1.

Fig. 1 (caption partially recovered) ... 16-sector structure with VN = 13, R = 141 bits. The eight-direction structure produces many redundant short edges and therefore wastes bits; our two arbitrary direction structures avoid these redundant short edges, so they save a large number of bits and produce more compact results. (Notation: $D_{\max}$, admissible distortion; VN, vertex number; R, encoding bits. Legend: dashed line, original contour; solid line, approximated polygon; asterisk, vertex.)

Fig. 2 Weighted DAG with vertices $a_{0,0}, a_{1,0}, \ldots, a_{4,0}$ and edge weights $w(a_{i,0}, a_{j,0})$, $0 \le i < j \le 4$.
We now explain how Algorithm 1 works. In lines 1 and 2, the admissible vertex set and the length of the sliding window for each contour point are calculated. In line 3, the rate for encoding the starting point of the contour is assigned to the rate of the first polygon vertex, and the rate for reaching any other admissible vertex is set to infinity. The "for loops" in lines 4 and 5 select the start vertex of a polygon edge, and the "for loops" in lines 6 and 7 select its end vertex within the sliding window of the start vertex. Lines 8-10 calculate the edge distortion and the edge rate, from which the edge weight is determined. The most important part of this algorithm is the comparison in line 11. It tests whether the new rate, $R^*(a_{i,m}) + w(a_{i,m}, a_{j,n})$, to reach admissible vertex $a_{j,n}$, given that the last vertex was $a_{i,m}$, is smaller than the smallest rate found so far to reach $a_{j,n}$, $R^*(a_{j,n})$. If it is indeed smaller, then it is assigned as the new minimum rate to reach admissible vertex $a_{j,n}$, $R^*(a_{j,n}) = R^*(a_{i,m}) + w(a_{i,m}, a_{j,n})$, and the back pointer of $a_{j,n}$, $q(a_{j,n})$, is set to point to $a_{i,m}$, since this is the previous vertex used to achieve $R^*(a_{j,n})$. This algorithm leads to the optimal solution because, when the rate $R^*(a_{i,m})$ of a vertex $a_{i,m}$ is given, the selection of the future vertices $a_{j,n}$, $i < j \le L(i)$, is independent of the selection of the past vertices $a_{k,l}$, $0 \le k < m$.
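The relaxation step described above can be illustrated with a minimal Python sketch. It is a simplified illustration rather than the authors' implementation: it assumes one admissible vertex per contour point (no admissible vertex band), a fixed sliding window, and the maximum point-to-segment distance as the edge distortion. The names `ord_optimal_polygon`, `max_distance_to_segment`, and the two toy rate functions are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]


def max_distance_to_segment(points: Sequence[Point], a: Point, b: Point) -> float:
    """Largest Euclidean distance from any of `points` to the segment ab."""
    ax, ay = a
    vx, vy = b[0] - ax, b[1] - ay
    seg_len_sq = vx * vx + vy * vy
    worst = 0.0
    for px, py in points:
        # Project the point onto the segment and clamp to its end points.
        t = ((px - ax) * vx + (py - ay) * vy) / seg_len_sq if seg_len_sq else 0.0
        t = min(1.0, max(0.0, t))
        worst = max(worst, math.hypot(px - (ax + t * vx), py - (ay + t * vy)))
    return worst


def ord_optimal_polygon(contour: Sequence[Point], d_max: float,
                        edge_rate: Callable[[int, int], float],
                        window: int, start_rate: float = 0.0):
    """Return (minimum rate, vertex indices) of the cheapest admissible polygon."""
    n = len(contour)
    best = [math.inf] * n   # R*(a_j): smallest rate found so far to reach vertex j
    back = [-1] * n         # q(a_j): back pointer to the previous polygon vertex
    best[0] = start_rate
    for i in range(n - 1):                                   # start vertex
        for j in range(i + 1, min(n - 1, i + window) + 1):   # end vertex in window
            d = max_distance_to_segment(contour[i + 1:j], contour[i], contour[j])
            dx, dy = contour[j][0] - contour[i][0], contour[j][1] - contour[i][1]
            w = edge_rate(dx, dy) if d <= d_max else math.inf   # weight, as in (3)
            if best[i] + w < best[j]:                        # the key comparison
                best[j] = best[i] + w
                back[j] = i
    if math.isinf(best[-1]):
        raise ValueError("no admissible polygon reaches the last contour point")
    path: List[int] = [n - 1]
    while path[-1] != 0:
        path.append(back[path[-1]])
    path.reverse()
    return best[-1], path


def toy_eight_direction_rate(dx: int, dy: int) -> float:
    """Toy RLC rate: 3 + run for restricted edges, infinity otherwise."""
    ax, ay = abs(dx), abs(dy)
    return 3 + max(ax, ay) if (ax == 0 or ay == 0 or ax == ay) else math.inf


def toy_arbitrary_direction_rate(dx: int, dy: int) -> float:
    """Toy rate with no direction restriction: 4 + larger component magnitude."""
    return 4 + max(abs(dx), abs(dy))


contour = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
restricted = ord_optimal_polygon(contour, 1.0, toy_eight_direction_rate, window=4)
relaxed = ord_optimal_polygon(contour, 1.0, toy_arbitrary_direction_rate, window=4)
```

On this toy contour, the restricted rate function forces every contour point to become a vertex, while the relaxed one allows a single edge from the first to the last point.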

Enhancements to the Operational Rate-Distortion Optimal Shape Coding
As we can see, the major problem in the ORD optimal shape coding algorithm is that its optimality is contingent on the chosen admissible vertex set, [1,9,10] sliding window strategy, [1,10,11] distortion measurement, [12][13][14][15][16] code table, [17][18][19] and edge encoding structure. The former four parameters have been intensively investigated to relax their constraints and improve the shape coding performance. Here, we provide a brief summary.

Algorithm 1 The ORD optimal shape coding algorithm.
1: Calculate the ordered admissible vertex set;
2: Calculate the length of the sliding window $L(i)$, $i = 0, 1, \ldots$;
3: Assign the rate of the contour starting point to the first polygon vertex and set the rate of every other admissible vertex to infinity;
4-5: for each start vertex $a_{i,m}$ do
6-7: for each end vertex $a_{j,n}$ within the sliding window of $a_{i,m}$ do
8: Calculate the edge distortion $d(a_{i,m}, a_{j,n})$;
9: Calculate the edge rate $r(a_{i,m}, a_{j,n})$;
10: Assign the edge weight $w(a_{i,m}, a_{j,n})$ based on definition (3);
11: if $R^*(a_{i,m}) + w(a_{i,m}, a_{j,n}) < R^*(a_{j,n})$ then
12: $R^*(a_{j,n}) = R^*(a_{i,m}) + w(a_{i,m}, a_{j,n})$;
13: $q(a_{j,n}) = a_{i,m}$;

[...] measurement for shape coding (ADMSC) was presented to improve the accuracy of SAD and DB, and in Ref. 14, its fast algorithm was presented using the chord-length parameterization. Recently, the perceptual distortion measure has been proposed to improve the visual consistency of SAD and DB from the perspective of vision psychology. [15] See Ref. 16 for a contemporary review.

VLC Table Optimization:
The goal of this research is to remove the conditioning of the ORD optimal solution on an ad hoc VLC table. In Ref. 17, the VLC optimization for intra shape coding was proposed using the unconditional symbol probabilities of encoding edges. Then, it was extended to inter shape coding [18] and scalable shape coding [19] using the conditional symbol probabilities given the prior knowledge from the previously encoded frames and layers, respectively.

Traditional Eight-Direction Edge Encoding Structure
While intensive investigations have been made on the above four parameters, little attention has been paid to the edge encoding structure. Almost all the vertex-based ORD optimal shape coding algorithms employ the eight-direction edge encoding structure. [20] In this structure, the eight-connected chain code and the run-length encoding are combined by representing the edge between two admissible vertices by an angle $\alpha_{\text{8-dir}}$ and a run $\beta_{\text{8-dir}}$, which form the symbol $(\alpha_{\text{8-dir}}, \beta_{\text{8-dir}})$, as illustrated in Fig. 3(a). Each of the possible symbols $(\alpha_{\text{8-dir}}, \beta_{\text{8-dir}})$ is assigned a probability, and the resulting stream of the polygon $P$'s edges can be optimally encoded using these probabilities.
In practical implementations, the code is designed under the hypothesis that the probability mass function of $(\alpha_{\text{8-dir}}, \beta_{\text{8-dir}})$ is separable [20] and that $\alpha_{\text{8-dir}}$ is uniformly distributed over the eight restricted directions. Thus, $\alpha_{\text{8-dir}}$ can be encoded separately, and the optimal code for $\alpha_{\text{8-dir}}$ is a 3-bit FLC. There are two main codes for $\beta_{\text{8-dir}}$. One is based on the hypothesis that $\beta_{\text{8-dir}}$ is geometrically distributed, $P(\beta_{\text{8-dir}} = j) = (1 - p)\,p^{\,j-1}$, $j \ge 1$, with parameter $p = 0.5$. In this case, the optimal code for $\beta_{\text{8-dir}}$ is an RLC with $\beta_{\text{8-dir}} - 1$ zeros and a final "1." Note that any positive integer value of $\beta_{\text{8-dir}}$ is codable; therefore, the code has no edge length restriction. The other is based on the hypothesis that $\beta_{\text{8-dir}}$ is piecewise uniformly distributed. In this case, the optimal code for $\beta_{\text{8-dir}}$ is a VLC, which has the general form $[YX]$, where $Y$ is the length of $X$. Note that the largest possible value of $\beta_{\text{8-dir}}$ has to be a finite positive integer $\varepsilon$. A smaller $\varepsilon$ often results in fewer bits for each edge, but requires more edges for the entire polygon. Conversely, a larger $\varepsilon$ often results in fewer edges, but requires more bits for each edge. Usually, $\varepsilon = 15$ strikes a good balance between the edge rate and the edge number over a wide range of contour characteristics and admissible distortions. [11] Thus, 15 codewords are needed. Reference 20 gives a typical example of such a code table, in which $Y$ and $X$ are a 2-bit FLC and a $\lfloor \log_2 \beta_{\text{8-dir}} \rfloor$-bit FLC, respectively, where $\lfloor \cdot \rfloor$ is the floor operator.
To illustrate the eight-direction structure-based edge encoding scheme, consider a restricted edge $(-7, 7)$ (for simplicity, we use the coordinates of the ending point as the coordinates of the edge when this edge starts at the origin, similarly hereinafter), as shown in Fig. 3(a). Its angle and run take the values of $3\pi/4$ and 7, which are encoded with a 3-bit FLC and a 7-bit RLC (a 4-bit VLC), resulting in 10 bits (7 bits) in total.
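The bit counts in this example can be reproduced with a short Python sketch. It assumes the code lengths stated above (a 3-bit FLC for the angle, a run-length code of length equal to the run, and a $[YX]$ VLC of $2 + \lfloor \log_2 \beta \rfloor$ bits with $\varepsilon = 15$); the function names are hypothetical.

```python
import math

MAX_RUN_VLC = 15  # epsilon: largest run codable by the VLC in this sketch


def eight_direction_symbol(dx: int, dy: int):
    """Return (angle index k, run) with angle k*pi/4, or None if unrestricted."""
    ax, ay = abs(dx), abs(dy)
    if (ax, ay) == (0, 0) or (ax != ay and ax != 0 and ay != 0):
        return None  # zero-length edge or edge outside the eight directions
    angle_index = round(math.atan2(dy, dx) / (math.pi / 4)) % 8
    return angle_index, max(ax, ay)


def eight_direction_rate(dx: int, dy: int, code: str = "RLC") -> float:
    """Bits needed for one edge; infinity when the edge cannot be encoded."""
    symbol = eight_direction_symbol(dx, dy)
    if symbol is None:
        return math.inf
    _, run = symbol
    if code == "RLC":
        return 3 + run                      # run - 1 zeros plus a final "1"
    if code == "VLC":
        if run > MAX_RUN_VLC:
            return math.inf
        # 2-bit length field Y plus floor(log2(run)) bits for X
        return 3 + 2 + (run.bit_length() - 1)
    raise ValueError("code must be 'RLC' or 'VLC'")
```

For the edge $(-7, 7)$, the sketch yields angle index 3 and run 7, and an unrestricted edge such as $(-3, 7)$ receives an infinite rate.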
The above eight-direction structure-based edge encoding scheme is quite simple and easy to implement.However, the inherent limitation can be clarified as follows.
From a structural perspective, the uniquely decodable code for each edge consists of two parts representing $\alpha_{\text{8-dir}}$ and $\beta_{\text{8-dir}}$, and the total length of this code is $3 + \beta_{\text{8-dir}}$ when RLC is used or $5 + \lfloor \log_2 \beta_{\text{8-dir}} \rfloor$ when VLC is used. For simplicity, assume that an edge of run $\beta_{\text{8-dir}}$ approximates a contour segment of $\beta_{\text{8-dir}}$ pixels. Then, the efficiency of this code, usually measured in bits per contour pixel, is $3/\beta_{\text{8-dir}} + 1$ for the RLC and $(5 + \lfloor \log_2 \beta_{\text{8-dir}} \rfloor)/\beta_{\text{8-dir}}$ for the VLC. Note that both sequences increase as $\beta_{\text{8-dir}}$ decreases; therefore, the shorter the run, the less efficient these codes are. Figures 1(a) and 1(b) show that many short restricted edges are needed for contour segments that are not exactly in the eight restricted directions. Therefore, the eight-direction edge encoding structure is inefficient.
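A small Python sketch makes the trend concrete. It only evaluates the two code lengths given above divided by the run, under the same simplifying assumption that a run of $\beta$ covers $\beta$ contour pixels; `bits_per_pixel` is a hypothetical helper name.

```python
def bits_per_pixel(run: int, code: str = "RLC") -> float:
    """Bits per contour pixel of one eight-direction edge of the given run."""
    if run < 1:
        raise ValueError("run must be a positive integer")
    if code == "RLC":
        bits = 3 + run
    elif code == "VLC":
        bits = 5 + (run.bit_length() - 1)   # 5 + floor(log2(run))
    else:
        raise ValueError("code must be 'RLC' or 'VLC'")
    return bits / run


# Tabulate the cost for runs up to epsilon = 15.
table = {run: (bits_per_pixel(run, "RLC"), bits_per_pixel(run, "VLC"))
         for run in range(1, 16)}
```

In this tabulation a run of 1 costs 4 bits per pixel with RLC and 5 with VLC, while a run of 15 costs 1.2 and roughly 0.53, which illustrates why many short edges are expensive.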
From a description perspective, a good shape description should make the proportion of selected vertices at or near the corners of the original shape contour as high as possible so as to benefit the object-oriented applications. [22] However, when the contour segments are not exactly in the eight restricted directions, the eight-direction structure not only consumes a large number of closely spaced vertices, but also amplifies the contour quantization error and noise disturbances. This affects the subjective quality of the reconstructed contours and degrades the performance of the object-oriented applications.
Further, to reveal the problem of the existing eight-direction structure quantitatively from a systematic perspective, we map the contour segment in Fig. 1(b) into a weighted DAG, as shown in Fig. 2. It is observed that all 10 edge distortions satisfy the admissible distortion, but 5 out of 10 edge weights are assigned infinity due to the direction restriction, making the shortest path in the DAG quite long and thus the bit rate for the entire polygon much larger. Therefore, we should relax the direction restriction to increase the number of available edges. The following section follows this idea.
3 Arbitrary Direction Edge Encoding Structures
3.1 Eight-Sector Edge Encoding Structure
A simple and intuitive structure for an edge of arbitrary direction consists of a quadrant number $\alpha_{\text{quad}}$, an x-component magnitude $\beta_{\text{quad}}$, and a y-component magnitude $\gamma_{\text{quad}}$, which form the symbol $(\alpha_{\text{quad}}, \beta_{\text{quad}}, \gamma_{\text{quad}})$. The quadrant number $\alpha_{\text{quad}}$ is defined as follows: quadrant 0 is the set of $(x, y)$ such that $x \ge 1$, $y \ge 0$; quadrant 1: $x \le 0$, $y \ge 1$; ...; and quadrant 3: $x \ge 0$, $y \le -1$, as illustrated in Fig. 3(b). Assume that $\alpha_{\text{quad}}$, $\beta_{\text{quad}}$, and $\gamma_{\text{quad}}$ are independently distributed; $\alpha_{\text{quad}}$ is uniformly distributed over the four quadrants; and $\beta_{\text{quad}}$ and $\gamma_{\text{quad}}$ are geometrically or piecewise uniformly distributed. Then, $\alpha_{\text{quad}}$ can be separately and optimally encoded with a 2-bit FLC, and $\beta_{\text{quad}}$ and $\gamma_{\text{quad}}$ can be optimally encoded with RLCs or VLCs. The procedure to implement the quadrant structure-based edge encoding scheme is as follows:
1. Determine the quadrant number $\alpha_{\text{quad}}$ and encode it with a 2-bit FLC.
2. If $\alpha_{\text{quad}}$ is even, increase $\beta_{\text{quad}}$ by 1 so that the value of $\beta_{\text{quad}}$ ranges from 1; if $\alpha_{\text{quad}}$ is odd, increase $\gamma_{\text{quad}}$ by 1 so that the value of $\gamma_{\text{quad}}$ ranges from 1.
3. If RLC is selected, encode $\beta_{\text{quad}}$ with $\beta_{\text{quad}} - 1$ zeros and a final "1" and encode $\gamma_{\text{quad}}$ with $\gamma_{\text{quad}} - 1$ zeros and a final "1," respectively. If VLC is selected, similarly to the run encoding in the eight-direction structure-based scheme, encode $\beta_{\text{quad}}$ with a $(2 + \lfloor \log_2 \beta_{\text{quad}} \rfloor)$-bit VLC and encode $\gamma_{\text{quad}}$ with a $(2 + \lfloor \log_2 \gamma_{\text{quad}} \rfloor)$-bit VLC.
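Now we compare the performance of this quadrant structure with the traditional eight-direction one. If RLC is selected, for the quadrant structure, the bit rate for the symbol $(\alpha_{\text{quad}}, \beta_{\text{quad}}, \gamma_{\text{quad}})$, denoted by $r(\alpha_{\text{quad}}, \beta_{\text{quad}}, \gamma_{\text{quad}})$, is

$$r(\alpha_{\text{quad}}, \beta_{\text{quad}}, \gamma_{\text{quad}}) = 2 + (\beta_{\text{quad}} + \gamma_{\text{quad}} + 1) = 3 + \beta_{\text{quad}} + \gamma_{\text{quad}}, \tag{4}$$

while for the eight-direction structure, we have $\beta_{\text{8-dir}} = \max\{\beta_{\text{quad}}, \gamma_{\text{quad}}\}$ for a restricted edge, and thus the bit rate for the symbol $(\alpha_{\text{8-dir}}, \beta_{\text{8-dir}})$ is

$$r(\alpha_{\text{8-dir}}, \beta_{\text{8-dir}}) = 3 + \max\{\beta_{\text{quad}}, \gamma_{\text{quad}}\}. \tag{5}$$

Comparing Eq. (4) with Eq. (5), we have, for a restricted edge,

$$r(\alpha_{\text{quad}}, \beta_{\text{quad}}, \gamma_{\text{quad}}) - r(\alpha_{\text{8-dir}}, \beta_{\text{8-dir}}) = \min\{\beta_{\text{quad}}, \gamma_{\text{quad}}\} \ge 0.$$

Thus, although the quadrant structure can change the unrestricted edge rate from infinity to a finite value, it increases the diagonal direction edge rate from $3 + \beta_{\text{8-dir}}$ to $3 + 2\beta_{\text{8-dir}}$, an increment of $\beta_{\text{8-dir}}$ bits. Assume that an edge of run $\beta_{\text{8-dir}}$ approximates a contour segment of $\beta_{\text{8-dir}}$ pixels. Then, for a diagonal direction edge, the quadrant structure changes the code efficiency from $3/\beta_{\text{8-dir}} + 1$ to $3/\beta_{\text{8-dir}} + 2$, an increment of 1 bit per contour pixel. Such a large cost makes the quadrant structure infeasible in practice. With a similar comparison, we can draw the same conclusion for the VLC case.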
To illustrate this deficiency in the quadrant structure-based edge encoding scheme, reconsider the restricted edge $(-7, 7)$, as shown in Fig. 3(b). Its quadrant number, x-component magnitude, and y-component magnitude take the values of 2, 7, and 7, which are encoded with a 2-bit FLC, an 8-bit RLC (a 5-bit VLC), and a 7-bit RLC (a 4-bit VLC), resulting in 17 bits (11 bits) in total. This is 7 bits (4 bits) more than those used by the eight-direction structure.
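The totals of this example can be checked with the following Python sketch of the quadrant scheme. It uses the quadrant regions as defined in the text and, as an assumption of this sketch, shifts whichever component may be zero inside the quadrant so that both codes start at 1; the total RLC length does not depend on which component is shifted. The function names are hypothetical.

```python
def quadrant_symbol(dx: int, dy: int):
    """Return (quadrant, |dx|, |dy|) using the quadrant regions from the text."""
    if dx >= 1 and dy >= 0:
        quadrant = 0
    elif dx <= 0 and dy >= 1:
        quadrant = 1
    elif dx <= -1 and dy <= 0:
        quadrant = 2
    elif dx >= 0 and dy <= -1:
        quadrant = 3
    else:
        raise ValueError("zero-length edge has no quadrant")
    return quadrant, abs(dx), abs(dy)


def quadrant_rate(dx: int, dy: int, code: str = "RLC") -> int:
    """Bits for one edge: 2-bit FLC plus one code per component."""
    quadrant, x_mag, y_mag = quadrant_symbol(dx, dy)
    # Assumption: shift the component that can be zero in this quadrant.
    if quadrant % 2 == 0:
        y_mag += 1      # x_mag >= 1 is guaranteed, y_mag may be 0
    else:
        x_mag += 1      # y_mag >= 1 is guaranteed, x_mag may be 0
    if code == "RLC":
        return 2 + x_mag + y_mag
    if code == "VLC":
        # 2 + floor(log2(v)) bits per component, as in the eight-direction VLC
        return 2 + (2 + x_mag.bit_length() - 1) + (2 + y_mag.bit_length() - 1)
    raise ValueError("code must be 'RLC' or 'VLC'")
```

For $(-7, 7)$ the sketch gives 17 bits with RLC and 11 bits with VLC, matching the totals above, whereas an axis-aligned edge such as $(7, 0)$ costs the same 10 bits as in the eight-direction structure.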
The reason that the quadrant structure does not work well is that there is dependency between the x- and the y-component magnitudes, so the independence assumption is unreasonable. This motivates us to make good use of this dependency. To achieve this aim, we design an edge encoding structure consisting of a sector number $\alpha_{\text{8-sec}}$, a short component magnitude $\beta_{\text{8-sec}}$, and a long component magnitude $\gamma_{\text{8-sec}}$, which form the symbol $(\alpha_{\text{8-sec}}, \beta_{\text{8-sec}}, \gamma_{\text{8-sec}})$. From Ref. 2, we define the sector number $\alpha_{\text{8-sec}}$ as follows: sector 0 is the set of $(x, y)$ such that $0 \le y < x$; sector 1: $1 \le x \le y$; ...; and sector 7: $1 \le -y \le x$, as illustrated in Fig. 3(c). The short component is the smaller of the edge's x- and y-components, while the long component is the larger of them. Because the long component magnitude $\gamma_{\text{8-sec}}$ is not smaller than the short component magnitude $\beta_{\text{8-sec}}$, we can encode $\gamma_{\text{8-sec}}$ with a differential encoding method. In this way, we can make good use of the dependency between the x- and the y-component magnitudes and allow a further reduction in the number of bits used.
Let $\delta_{\text{8-sec}}$ denote the difference between $\beta_{\text{8-sec}}$ and $\gamma_{\text{8-sec}}$, i.e., $\delta_{\text{8-sec}} = \gamma_{\text{8-sec}} - \beta_{\text{8-sec}}$. In our practical implementation, we design the code under the hypothesis that the probability mass function of $(\alpha_{\text{8-sec}}, \beta_{\text{8-sec}}, \gamma_{\text{8-sec}})$ is separable and that $\alpha_{\text{8-sec}}$ is uniformly distributed over the eight sectors. Thus, $\alpha_{\text{8-sec}}$ can be separately and optimally encoded with a 3-bit FLC. Similarly to the assumption about the run in the eight-direction structure-based scheme, [20] we simply assume that both $\beta_{\text{8-sec}}$ and $\delta_{\text{8-sec}}$ are geometrically or piecewise uniformly distributed. Therefore, the optimal codes for both $\beta_{\text{8-sec}}$ and $\delta_{\text{8-sec}}$ are RLCs or VLCs. The procedure to implement the eight-sector structure-based edge encoding scheme is as follows:
1. Determine the sector number $\alpha_{\text{8-sec}}$ and encode it with a 3-bit FLC.
2. Determine the short component magnitude $\beta_{\text{8-sec}}$ and the long component magnitude $\gamma_{\text{8-sec}}$ according to $\alpha_{\text{8-sec}}$, and calculate the difference $\delta_{\text{8-sec}}$.
3. If $\alpha_{\text{8-sec}}$ is even, increase $\beta_{\text{8-sec}}$ by 1 so that the value of $\beta_{\text{8-sec}}$ ranges from 1; if $\alpha_{\text{8-sec}}$ is odd, increase $\delta_{\text{8-sec}}$ by 1 so that the value of $\delta_{\text{8-sec}}$ ranges from 1.
4. If RLC is selected, encode $\beta_{\text{8-sec}}$ with $\beta_{\text{8-sec}} - 1$ zeros and a final "1" and encode $\delta_{\text{8-sec}}$ with $\delta_{\text{8-sec}} - 1$ zeros and a final "1," respectively. If VLC is selected, similarly to the run encoding in the eight-direction structure-based scheme, the magnitudes of both the x- and y-components are restricted to a predefined number $\varepsilon$. Thus, we should design a series of code tables for the ranges from 1 to 1, 1 to 2, ..., 1 to $\varepsilon$ for both $\beta_{\text{8-sec}}$ and $\delta_{\text{8-sec}}$. A series of reference code tables with $\varepsilon = 15$ is shown in Table 1. Then, encode $\beta_{\text{8-sec}}$ according to the table for the range from 1 to $\varepsilon$ and encode $\delta_{\text{8-sec}}$ according to the table for the range from 1 to $\varepsilon - \beta_{\text{8-sec}} + 1$.
Here, we explain why this structure works better than the traditional eight-direction one. If RLC is selected, for the eight-sector structure, the bit rate for the symbol $(\alpha_{\text{8-sec}}, \beta_{\text{8-sec}}, \gamma_{\text{8-sec}})$, denoted by $r(\alpha_{\text{8-sec}}, \beta_{\text{8-sec}}, \gamma_{\text{8-sec}})$, is

$$r(\alpha_{\text{8-sec}}, \beta_{\text{8-sec}}, \gamma_{\text{8-sec}}) = 3 + (\beta_{\text{8-sec}} + \delta_{\text{8-sec}} + 1) = 4 + \gamma_{\text{8-sec}}, \tag{6}$$

where $\gamma_{\text{8-sec}} = \beta_{\text{8-dir}}$ when the encoding edge is in a restricted direction. Comparing Eq. (6) with Eq. (5), we can see that although the eight-sector structure increases the restricted edge rate by 1 bit, it changes the unrestricted edge rate from infinity to an acceptable finite value. Therefore, a large number of bits can be saved on average. With a similar comparison, we can draw the same conclusion for the VLC case.
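The eight-sector procedure with RLC can be sketched in Python as follows. The text defines only sectors 0, 1, and 7 explicitly; the sketch assumes the remaining sectors follow by rotating the first two in steps of $\pi/2$. The function names are hypothetical, and the VLC branch is omitted because it depends on Table 1.

```python
def eight_sector_symbol(dx: int, dy: int):
    """Return (sector, short magnitude, long magnitude) of an edge."""
    # Rotate the edge by a multiple of pi/2 so that u >= 1 and v >= 0.
    if dx >= 1 and dy >= 0:
        k, u, v = 0, dx, dy
    elif dx <= 0 and dy >= 1:
        k, u, v = 1, dy, -dx
    elif dx <= -1 and dy <= 0:
        k, u, v = 2, -dx, -dy
    elif dx >= 0 and dy <= -1:
        k, u, v = 3, -dy, dx
    else:
        raise ValueError("zero-length edge has no sector")
    if v < u:                      # even sector: 0 <= v < u
        return 2 * k, v, u
    return 2 * k + 1, u, v         # odd sector: 1 <= u <= v


def eight_sector_rlc_rate(dx: int, dy: int) -> int:
    """Bits for one edge: 3-bit FLC plus run-length codes of short and difference."""
    sector, short, long_ = eight_sector_symbol(dx, dy)
    diff = long_ - short
    if sector % 2 == 0:
        short += 1                 # short may be 0 in even sectors
    else:
        diff += 1                  # difference may be 0 in odd sectors
    return 3 + short + diff
```

Under these assumptions the restricted edge $(-7, 7)$ costs 11 bits, one more than with the eight-direction structure, while the unrestricted edge $(-3, 7)$ becomes codable at 11 bits.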

Sixteen-Sector Edge Encoding Structure
The above eight-sector structure defines the short component magnitude $\beta_{\text{8-sec}}$ and the long component magnitude $\gamma_{\text{8-sec}}$ as the smaller and the larger magnitudes of the edge's x- and y-components, and encodes $\beta_{\text{8-sec}}$ and the difference $\delta_{\text{8-sec}}$ separately. If RLC is selected, this structure requires $4 + \beta_{\text{8-sec}} + \delta_{\text{8-sec}}$ bits. Although it can basically keep the bit rate for restricted edges, it leads to a relatively larger number of bits for unrestricted edges, especially when both $\beta_{\text{8-sec}}$ and $\delta_{\text{8-sec}}$ take large values. The reason for this inefficiency is that there is dependency between $\beta_{\text{8-sec}}$ and $\delta_{\text{8-sec}}$, so the independence assumption is unreasonable. This motivates us to make good use of this dependency.
To achieve this aim, we design an edge encoding structure consisting of a sector number $\alpha_{\text{16-sec}}$, a short component magnitude $\beta_{\text{16-sec}}$, and a long component magnitude $\gamma_{\text{16-sec}}$, which form a symbol $(\alpha_{\text{16-sec}}, \beta_{\text{16-sec}}, \gamma_{\text{16-sec}})$. The sector number $\alpha_{\text{16-sec}}$ is defined as follows: sector 0 is the set of $(x, y)$ such that $0 \le 2y < x$; sector 1: $1 \le y < x \le 2y$; ...; and sector 15: $1 \le -2y \le x$, as illustrated in Fig. 3(d). The short component is the smaller of the edge's two octant direction components, which differ by $\pi/4$ in angle, while the long component is the larger of them. Because the long component magnitude $\gamma_{\text{16-sec}}$ is not smaller than the short component magnitude $\beta_{\text{16-sec}}$, we can encode $\gamma_{\text{16-sec}}$ with a differential encoding method. Comparing Fig. 3(d) with Fig. 3(c), we have $\beta_{\text{16-sec}} = \min\{\beta_{\text{8-sec}}, \delta_{\text{8-sec}}\}$ and $\gamma_{\text{16-sec}} = \max\{\beta_{\text{8-sec}}, \delta_{\text{8-sec}}\}$. Therefore, in this way, we can make good use of the dependency between the short and the long component magnitudes of the eight-sector structure and allow a further reduction in the number of bits used.
In our practical implementation, we use the same hypotheses and the corresponding optimal codes as in the eight-sector structure. The procedure to implement the 16-sector structure-based edge encoding scheme is as follows:
1. Determine the sector number $\alpha_{\text{16-sec}}$ and encode it with a 4-bit FLC.
2. Determine the short component magnitude $\beta_{\text{16-sec}}$ and the long component magnitude $\gamma_{\text{16-sec}}$ according to $\alpha_{\text{16-sec}}$, and calculate the difference $\delta_{\text{16-sec}}$.
3. If $\alpha_{\text{16-sec}}$ is even, increase $\beta_{\text{16-sec}}$ by 1 so that the value of $\beta_{\text{16-sec}}$ ranges from 1; if $\alpha_{\text{16-sec}}$ is odd, increase $\delta_{\text{16-sec}}$ by 1 so that the value of $\delta_{\text{16-sec}}$ ranges from 1.
4. If RLC is selected, encode $\beta_{\text{16-sec}}$ with $\beta_{\text{16-sec}} - 1$ zeros and a final "1" and encode $\delta_{\text{16-sec}}$ with $\delta_{\text{16-sec}} - 1$ zeros and a final "1." If VLC is selected, the magnitudes of both the x- and y-components are restricted to $\varepsilon$. Thus, we should design a series of code tables for the ranges from 1 to 1, 1 to 2, ..., 1 to $\varepsilon$ for both $\beta_{\text{16-sec}}$ and $\delta_{\text{16-sec}}$. Table 1 can also be used here.

Here, we explain why this structure with RLC can outperform the eight-sector one from a data compression perspective. For the 16-sector structure, the bit rate for the symbol $(\alpha_{\text{16-sec}}, \beta_{\text{16-sec}}, \gamma_{\text{16-sec}})$, denoted by $r(\alpha_{\text{16-sec}}, \beta_{\text{16-sec}}, \gamma_{\text{16-sec}})$, is

$$r(\alpha_{\text{16-sec}}, \beta_{\text{16-sec}}, \gamma_{\text{16-sec}}) = 4 + (\beta_{\text{16-sec}} + \delta_{\text{16-sec}} + 1) = 5 + \gamma_{\text{16-sec}}, \tag{7}$$

where $\beta_{\text{16-sec}} = \min\{\beta_{\text{8-sec}}, \delta_{\text{8-sec}}\}$ and $\gamma_{\text{16-sec}} = \max\{\beta_{\text{8-sec}}, \delta_{\text{8-sec}}\}$. Comparing Eq. (7) with Eq. (6), we can see that although the 16-sector structure increases the restricted edge rate by 1 bit, it changes the unrestricted edge rate from $4 + \beta_{\text{16-sec}} + \gamma_{\text{16-sec}}$ to $5 + \gamma_{\text{16-sec}}$, a decrement of $\beta_{\text{16-sec}} - 1$ bits. Assume that the edge direction is uniformly distributed. Then, from Fig. 3(d), we can see that the probability that $\beta_{\text{16-sec}} \ge 2$ is far greater than the probability that $\beta_{\text{16-sec}} = 0$. Therefore, a large number of bits can be saved on average.
To illustrate this advantage, consider an unrestricted edge $(-3, 7)$, as shown in Figs. 3(c) and 3(d). For the eight-sector structure, its sector number, short component magnitude, and long component magnitude take the values of 2, 3, and 7, which are encoded with a 3-bit FLC, a 4-bit RLC (a 4-bit VLC), and a 4-bit RLC (a 4-bit VLC), resulting in 11 bits in total. For the 16-sector structure, they take the values of 4, 3, and 4, which are encoded with a 4-bit FLC, a 4-bit RLC (a 3-bit VLC), and a 1-bit RLC (a 2-bit VLC), resulting in 9 bits in total. Thus, two bits are saved.
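The 16-sector decomposition used in this example can be sketched in Python as follows. The sketch decomposes the edge into an axial component and a diagonal component and takes the smaller and larger counts as the short and long magnitudes. As with the eight-sector case, it assumes that the sectors not listed in the text follow by rotation in steps of $\pi/2$; the function names are hypothetical and only the RLC case is covered.

```python
def sixteen_sector_symbol(dx: int, dy: int):
    """Return (sector, short magnitude, long magnitude) of an edge."""
    # Rotate the edge by a multiple of pi/2 so that u >= 1 and v >= 0.
    if dx >= 1 and dy >= 0:
        k, u, v = 0, dx, dy
    elif dx <= 0 and dy >= 1:
        k, u, v = 1, dy, -dx
    elif dx <= -1 and dy <= 0:
        k, u, v = 2, -dx, -dy
    elif dx >= 0 and dy <= -1:
        k, u, v = 3, -dy, dx
    else:
        raise ValueError("zero-length edge has no sector")
    if v < u:                                   # first octant of the quadrant
        diagonal, axial = v, u - v
        if diagonal < axial:                    # 0 <= 2v < u
            return 4 * k, diagonal, axial
        return 4 * k + 1, axial, diagonal       # 1 <= v < u <= 2v
    diagonal, axial = u, v - u                  # second octant of the quadrant
    if axial < diagonal:                        # u <= v < 2u
        return 4 * k + 2, axial, diagonal
    return 4 * k + 3, diagonal, axial           # v >= 2u


def sixteen_sector_rlc_rate(dx: int, dy: int) -> int:
    """Bits for one edge: 4-bit FLC plus run-length codes of short and difference."""
    sector, short, long_ = sixteen_sector_symbol(dx, dy)
    diff = long_ - short
    if sector % 2 == 0:
        short += 1                 # short may be 0 in even sectors
    else:
        diff += 1                  # difference may be 0 in odd sectors
    return 4 + short + diff
```

For the edge $(-3, 7)$ the sketch returns the symbol (4, 3, 4) and a rate of 9 bits, in agreement with the example.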
However, this bit reduction degrades the contour description ability. To look more closely at this phenomenon, we plot the isorate contours for both the eight- and the 16-sector structures in Fig. 4, i.e., the traces of the ending points of the encoding edges having the same starting point and rate. For simplicity, assume that an edge of long component magnitude $\gamma_{\text{8-sec}}$ approximates a contour segment of $\gamma_{\text{8-sec}}$ pixels. We can see that for the eight-sector structure, the edges of the same rate have the same $\gamma_{\text{8-sec}}$ and thus can approximate contour segments of the same length in pixels. Therefore, for the eight-sector structure, the codes for the edges of the same rate have the same efficiency in terms of bits per contour pixel. In contrast, for the 16-sector structure, the edges of the same rate have different $\gamma_{\text{8-sec}}$. To be more specific, the edges of directions $k \cdot \pi/2 \pm \arctan(1/2)$ ($k = 0, 1, 2, 3$) have the maximum length over all the directions (named preferential directions), while the edges of the restricted directions have the minimum length (named nonpreferential directions), as shown in Fig. 4(b). Thus, the edges of the same rate but of different directions approximate contour segments of different lengths in pixels. Therefore, for the 16-sector structure, the codes for the edges of the same rate have different efficiencies, i.e., the codes for edges in or near the preferential directions have relatively higher efficiencies, whereas the codes for the edges in or near the nonpreferential directions have relatively lower efficiencies. This may bias the edge selection toward the preferential directions via the ORD optimal framework, thereby giving false corners when contour segments are in or near the nonpreferential directions. We call this the direction bias effect.
To illustrate this effect, we take the approximation of the horizontal contour segment from point $a_{i,m}$ $(0, 0)$ to point $a_{j,n}$ $(0, 12)$ shown in Fig. 4(b) as an example. For the eight-sector structure, among all approximations, the one directly using edge $(a_{i,m}, a_{j,n})$ is the most efficient. However, for the 16-sector structure, according to Eq. (7), the nonpreferential direction edge $(a_{i,m}, a_{j,n})$ consumes 17 bits, whereas the two preferential direction edges $(a_{i,m}, a_{k,l})$ and $(a_{k,l}, a_{j,n})$ consume 8 bits each, resulting in 16 bits in total. Thus, the ORD optimal framework may select these two edges rather than edge $(a_{i,m}, a_{j,n})$ to approximate the horizontal contour segment, which gives a false corner $a_{k,l}$ $(3, 6)$. This effect may significantly affect the performance of related object-oriented applications, which will be further shown in the next section.
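The bit counts in this example can be verified with a compact Python sketch. It assumes the closed-form RLC rates $4 + \gamma_{\text{8-sec}}$ and $5 + \gamma_{\text{16-sec}}$ implied by the text, with $\gamma_{\text{16-sec}} = \max\{\beta_{\text{8-sec}}, \delta_{\text{8-sec}}\}$; the function and variable names are hypothetical.

```python
def eight_sector_rlc_rate(dx: int, dy: int) -> int:
    """Closed-form RLC rate of the eight-sector structure: 4 + long magnitude."""
    return 4 + max(abs(dx), abs(dy))


def sixteen_sector_rlc_rate(dx: int, dy: int) -> int:
    """Closed-form RLC rate of the 16-sector structure: 5 + long magnitude."""
    short8 = min(abs(dx), abs(dy))
    diff8 = max(abs(dx), abs(dy)) - short8
    return 5 + max(short8, diff8)


def path_rate(points, rate) -> int:
    """Sum of edge rates along a polygon given as a list of vertices."""
    return sum(rate(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


direct = [(0, 0), (0, 12)]            # single nonpreferential edge
detour = [(0, 0), (3, 6), (0, 12)]    # two preferential edges via a false corner

rates = {
    "8-sector direct": path_rate(direct, eight_sector_rlc_rate),
    "8-sector detour": path_rate(detour, eight_sector_rlc_rate),
    "16-sector direct": path_rate(direct, sixteen_sector_rlc_rate),
    "16-sector detour": path_rate(detour, sixteen_sector_rlc_rate),
}
```

With these formulas the detour is cheaper than the direct edge only for the 16-sector structure (16 versus 17 bits), whereas the eight-sector structure clearly prefers the direct edge (16 versus 20 bits).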

Experimental Results
To analyze the performance of the ORD optimal framework with our two arbitrary direction edge encoding structures in both the data compression and pattern recognition fields, various ORD optimal shape coding algorithms with different parameter configurations are applied to both shape coding and hand gesture recognition. The ADMSC is chosen for contour distortion measurement. [16] To clarify the nomenclature adopted, the following three-parameter notation is used: Admissible vertex band type, Edge encoding structure type, and Code table type.
Admissible vertex band (AVB) type refers to whether the AVB of width 1 pel is used; Edge encoding structure type refers to the choice of 8-direction, 8-sector, and 16-sector structure; and Code table type refers to the choice of RLC and VLC (the default type is RLC).Therefore, for instance, Basic-8-Direction-RLC means that the algorithm is based on the eight-direction structure with RLC where the AVB technique is not used.

Arbitrary Direction Edge Encoding Structures for Shape Coding
For the RD performance assessments, five MPEG-4 binary shape sequences, namely Weather.qcif, News.qcif, Stefan.sif, Children.sif, and Forman.cif, are used. These sequences have various spatial and temporal resolutions.
Figure 5(a) shows the cumulative RD curves generated with different ORD optimal algorithms. As expected, our two structures always outperform the traditional eight-direction structure, with up to 48.9% bit savings. The reason is that our structures make available many edges that cannot be selected under the eight-direction structure. These additional available edges give the ORD optimal framework more opportunities to find a shorter shortest path in a weighted DAG with far fewer infinite weights. That is why our structures can result in far fewer encoding bits than the eight-direction one, which is consistent with the analysis given in Sec. 2.3. It also indicates how important the direction relaxation is from a compression perspective.
Figure 5(a) also reveals that the coding gain varies with different parameter configurations. Here, we only point out the variations with two different configurations and give explanations as follows.
1. The coding gains of our two structures are almost the same when VLC is applied. This is because the bits saved by the 16-sector structure relative to the 8-sector structure come from reductions in the range and the value of the short and the difference components. According to Table 1, these reductions may save only a few bits on average, which merely offsets the one additional bit required by the sector number; thus, no further bits are saved on average.
2. The coding gains of our two structures are much better without AVB when $D_{\max} \geq 1$ pel. This is because, with the AVB configuration, vertices can be selected outside the original contour, which allows a large number of contour segments whose directions lie approximately along the restricted directions to be encoded by a single restricted edge. Thus, the eight-direction structure becomes much more efficient, and the coding gains of our two structures decrease.
For the description assessments, the Stefan object of the 100th frame of the Stefan sequence has been used.
Figure 6 shows the polygons generated with various ORD optimal algorithms. Considering the first column as an example, we can see that the traditional eight-direction structure needs 68 vertices for contour approximation, but both of our structures need only 28 vertices. As a result, the traditional one needs 453 bits, whereas our two proposals need 359 and 325 bits for contour encoding, resulting in 94 and 128 bit savings, respectively.
Moreover, the polygons approximated by our proposals are quite compact and reflect the characteristics of the original object contour well: a relatively straight contour segment is easier to approximate with a single edge, so the turns of the polygons are more likely to coincide with the corners of the object contour. This will benefit the subsequent object-oriented applications. In the next section, we will apply these polygons to hand gesture recognition.
However, even with the help of the Kinect camera, quantization error and contour noise are almost inevitable in shape acquisition because of the limited camera resolution and the complex monitoring environment. Since the ORD optimal algorithms with our two arbitrary direction edge encoding structures can approximate contour segments with arbitrary directions, quantization error, and contour noise, they are suitable for this recognition task.
Here, we only choose RLC to guarantee that there is no limitation on the encoding edge length. We approximate the hand shape contour by the polygons using our proposals. It is important to determine a proper $D_{\max}$. We let $D_{\max}$ be proportional to the radius of the maximal inscribed disk of the hand shape, so that $D_{\max}$ adapts to the size of the palm. Specifically, we set $D_{\max}$ to 30% of the radius, as this can well discriminate nonsalient fingers from large shape deformations. Only the negative turns of the polygons are considered, since it was reported that the negative corners are much more informative than the positive ones [26]. Supposing that there are $k$ negative turns, we classify a hand gesture as Rock if $k = 0$, Paper if $k \geq 3$, and Scissors otherwise.
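The decision rule above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: it assumes the polygon is closed, listed in counterclockwise order with the y-axis pointing up, and that a "negative turn" is a concave vertex (negative cross product of consecutive edges). All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def admissible_distortion(inscribed_radius: float, ratio: float = 0.3) -> float:
    """D_max set to 30% of the radius of the maximal inscribed disk."""
    return ratio * inscribed_radius


def count_negative_turns(polygon: List[Point]) -> int:
    """Count concave vertices of a closed, counterclockwise polygon (assumed convention)."""
    n = len(polygon)
    count = 0
    for i in range(n):
        ax, ay = polygon[i - 1]          # previous vertex (wraps around)
        bx, by = polygon[i]              # current vertex
        cx, cy = polygon[(i + 1) % n]    # next vertex (wraps around)
        # z-component of the cross product of the incoming and outgoing edges
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross < 0:
            count += 1
    return count


def classify_gesture(k: int) -> str:
    """Rock if k = 0, Paper if k >= 3, Scissors otherwise."""
    if k == 0:
        return "Rock"
    if k >= 3:
        return "Paper"
    return "Scissors"
```

With this convention, a convex polygon has no negative turns and is classified as Rock, while a polygon with a single notch between two fingers has one negative turn and is classified as Scissors.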
Table 2 provides the confusion matrices and the mean recognition accuracies for various ORD optimal algorithms with our two arbitrary direction structures. As expected, the mean accuracies of the eight-sector structure are much higher than those of the 16-sector structure due to the direction bias effect mentioned in Sec. 3.2.
To further reveal this direction bias effect, Fig. 8 shows some results selected from our collected dataset. The odd columns show the hand gestures correctly recognized by both of our structures, while the even columns show those correctly recognized by the eight-sector structure but wrongly recognized by the 16-sector structure, except the Paper results, which are wrongly recognized by the Basic-16-Sector algorithms. We take the second Rock gesture recognized by the Basic algorithms as an example. Although the 16-sector structure can reduce the total rate from 259 to 202 bits compared with the eight-sector structure, the bottom right contour segment, whose direction is near the restricted direction of zero, is approximated by two edges whose directions are near the preferential directions of $\mp\arctan(1/2)$ due to the direction bias effect, and therefore a false negative turn is generated. This demonstrates that the 16-sector structure does well in compression but is not good at description for object-oriented applications, which is the opposite of the eight-sector structure.

Table 2 Confusion matrices for Basic-8-Sector/Basic-16-Sector and AVB-8-Sector/AVB-16-Sector algorithms. The mean recognition accuracies are 93.33%/86% and 94%/76%, respectively. For each gesture category, the higher recognition accuracy between the 8-sector structure and the 16-sector structure is marked in bold. The 8-sector structure always results in higher recognition accuracy, endorsing the earlier comment that the 8-sector structure has stronger description ability than the 16-sector one.

Conclusions and Future Work
In this paper, we have presented two edge encoding structures, namely the 8- and 16-sector structures, for the ORD optimal framework, to relax the direction restriction of the traditional eight-direction structure. Both consist of the sector number, the short component magnitude, and the long component magnitude, which are encoded with the optimal code words under two probability distribution assumptions. Although these two structures are quite simple, experiments on both shape coding and hand gesture recognition validate that the ORD optimal framework with our proposals can achieve high coding gains and mean recognition rates, which makes our proposals potentially promising in applications in both the data compression and pattern recognition fields.
Several research directions based on our proposals are possible. RLC has no limitation on the encoding edge length but is of low efficiency; thus, it is applicable to object-oriented applications but not to storage and transmission, whereas VLC has the opposite characteristics. How to combine their advantages and avoid their disadvantages is one research direction, with some preliminary results shown in Ref. 27. Moreover, although our arbitrary direction structures avoid redundant short edges, they still cause noticeable edge errors. How to reduce them, so that the approximate polygon can be applied to the more challenging shape-based depth map coding [28,29] and shape-based vision tasks [30,31], is another research direction, with some preliminary results shown in Ref. 32.

Operational Rate-Distortion Optimal Shape Coding
Let the ordered set $C = \{c_0, c_1, \ldots, c_{N_C-1}\}$ denote the connected contour, where $c_i$ is the $i$th pixel of $C$, $N_C$ is the total number of pixels in $C$, and $c_0 = c_{N_C-1}$ for a closed contour. Let the ordered set $A = \{a_{0,0}, a_{1,0}, a_{1,1}, \ldots, a_{N_C-1,0}\}$ denote the admissible vertex set, where $a_{i,0} = c_i$, $a_{i,m}$ is the $m$th admissible vertex associated with $c_i$, and $N(i)$ is the number of admissible vertices in $A$ associated with $c_i$. Let $L(i)$ denote the length of the sliding window in pixels along the contour, starting at contour point $c_i$. Let the two-tuple set $E_A = \{(a_{i,m}, a_{j,n}) \in A^2 : \forall\, i < j \leq i + L(i)\}$ denote the candidate edges for polygonal approximation. Let the ordered set $P = \{p_0, p_1, \ldots, p_{N_P-1}\}$ denote the polygon used to approximate $C$, where $p_k$ is the $k$th vertex of $P$, $N_P$ is the total number of vertices in $P$, and $P \subseteq A$. Let $r(p_{k-1}, p_k)$ denote the bit rate of the edge $(p_{k-1}, p_k)$ and $R(P)$ the bit rate for the entire polygon. Let $d(p_{k-1}, p_k)$ denote the distortion of the edge $(p_{k-1}, p_k)$, $D(P)$ the polygon distortion, and $D_{\max}$ the admissible distortion. The ORD shape coding finds an optimal polygon $P$ in the admissible vertex set $A$ that can be encoded with the lowest bit rate for a given admissible distortion $D_{\max}$ [8]. It can be formulated as follows:
\[
\min_{P} R(P) \quad \text{s.t.} \quad D(P) \leq D_{\max}.
\]
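To make the formulation concrete, the following sketch solves a simplified instance of this problem as a shortest-path search over a DAG. It is an illustration under several assumptions that are not taken from the paper: vertices are restricted to the contour points themselves (no admissible vertex band), the window length is constant, the distortion of an edge is the maximum distance from the covered contour points to the edge (a stand-in for ADMSC), and `toy_edge_rate` is a hypothetical rate function rather than any of the code tables discussed here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]


def segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def edge_distortion(contour: Sequence[Point], i: int, j: int) -> float:
    """Assumed distortion: maximum distance of c_i..c_j to the edge (c_i, c_j)."""
    return max(segment_distance(contour[k], contour[i], contour[j])
               for k in range(i, j + 1))


def toy_edge_rate(a: Point, b: Point) -> float:
    """Hypothetical rate: 3 overhead bits plus bits for the two components."""
    dx, dy = abs(b[0] - a[0]), abs(b[1] - a[1])
    return 3 + math.ceil(math.log2(dx + 1)) + math.ceil(math.log2(dy + 1))


def ord_optimal_polygon(contour: Sequence[Point], d_max: float, window: int,
                        rate: Callable[[Point, Point], float] = toy_edge_rate
                        ) -> Tuple[List[Point], float]:
    """Minimum-rate polygon from c_0 to c_{N-1} with every edge within d_max."""
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(contour)
    best = [math.inf] * n   # best[j]: lowest rate of a polygon ending at c_j
    prev = [-1] * n         # predecessor index on the best path
    best[0] = 0.0
    for i in range(n - 1):
        for j in range(i + 1, min(i + window, n - 1) + 1):
            if edge_distortion(contour, i, j) > d_max:
                continue    # inadmissible edge, i.e., infinite weight
            cost = best[i] + rate(contour[i], contour[j])
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Backtrack; edges between neighboring pixels have zero distortion,
    # so c_{N-1} is always reachable.
    path = [n - 1]
    while path[-1] != 0:
        path.append(prev[path[-1]])
    path.reverse()
    return [contour[k] for k in path], best[n - 1]
```

Because every edge points forward along the contour, processing the start indices in increasing order visits the DAG in topological order, so a single pass suffices. For a straight 13-pixel contour with `d_max = 0.5` and `window = 12`, the sketch returns the two end points as the only vertices.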

Fig. 1 Comparison results for the Neck region of the 31st frame of the MissAmerica.qcif sequence with $D_{\max} = 1$ pel. Here, we select the upper left corners as the first vertices and use the run length codes to encode the edge. (a) Eight-direction structure with VN = 24, R = 173 bits; (b) zoom-in on the portions highlighted by the rectangle in (a); (c) eight-sector structure with VN = 12, R = 148 bits; (d) 16-sector structure with VN = 13, R = 141 bits. As we can see, the eight-direction structure produces a host of redundant short edges, which wastes bits; our two arbitrary direction structures avoid these redundant short edges, so they save a large number of bits and produce more compact results (Notations: $D_{\max}$ = admissible distortion; VN = vertex number; R = encoding bits. Legends: dashed line = original contour; solid line = approximated polygon; asterisk = vertex).

Fig. 2 Modeled DAG and the shortest path of Fig. 1(d) (Notation: arrow = candidate edge with weights given by the eight-direction/eight-sector structure).

Fig. 3 Illustration of (a) the angle and the run of the eight-direction structure, (b) the quadrant number, the $x$-component magnitude, and the $y$-component magnitude of the quadrant structure, (c) the sector number, the short component magnitude, and the long component magnitude of the eight-sector structure, and (d) the sector number, the short component magnitude, and the long component magnitude of the 16-sector structure (Legend: solid line = encoding edge; asterisk = vertex; long dashed line = edge component; dashed line in (a) = admissible positions for the next vertex given the current vertex at the origin; dashed line in (b) = quadrant boundary; dashed line in (c) and (d) = sector boundary).

Fig. 4 (a) Isorate contour of the eight-sector structure when R = 12 bits, (b) isorate contour of the 16-sector structure when R = 11 bits, where the direction bias effect happens when contour segments are in or near the nonpreferential directions (Legend: solid line = isorate contour; dashed line in (a) = restricted direction; long dashed line in (b) = preferential direction; dashed line in (b) = nonpreferential direction; dotted line in (b) = encoding edge for the 8-sector structure; dashed dotted line in (b) = encoding edge for the 16-sector structure; asterisk in (b) = vertex).

Fig. 5 Compression performance comparisons. (a) Cumulative RD results from five sequences calculated by various ORD optimal algorithms; (b) number of available edges calculated frame by frame on the Stefan sequence when $D_{\max} = 1.5$ pels.

Fig. 7 Illustration of our hand gesture dataset captured by the Kinect camera. The first and second columns are the color and depth maps in our dataset, and the third column shows the hand segmentations.
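1. If $\alpha_{16\text{-sec}}$ is even, encode $\beta_{16\text{-sec}}$ according to the table of the range from 1 to $\lfloor (\varepsilon + 1)/2 \rfloor$ and encode $\delta_{16\text{-sec}}$ according to the table of the range from 1 to $\varepsilon - 2\beta_{16\text{-sec}} + 2$.
2. If $\alpha_{16\text{-sec}}$ is odd, encode $\beta_{16\text{-sec}}$ according to the table of the range from 1 to $\lfloor \varepsilon/2 \rfloor$ and encode $\delta_{16\text{-sec}}$ according to the table of the range from 1 to $\varepsilon$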