The material of an object is significant information for understanding a scene. We usually interact with a wide variety of materials, and we continually assess their properties such as weight, size, and texture. Knowledge of such properties could be useful for a robot manipulator that has to handle an extensive assortment of objects. As an example, physical properties can help to distinguish breakable objects from robust ones. In this way, the gripper force of a manipulator can be tuned to avoid item damage.
The material information can also be useful in other applicative contexts such as robot localization and environmental mapping where three-dimensional (3-D) data are employed. In this regard, a better 3-D point cloud registration could be achieved by exploiting knowledge of the material type. As an example, a complex environment made of glass and highly reflective surfaces could be more challenging for registration algorithms. Therefore, a preliminary analysis aimed to identify the material type would be helpful for discarding some 3-D points from the registration method. In this way, only the stable points referring to nonchallenging materials are considered in the computation, implying an enhancement of registration accuracy.
Nevertheless, material recognition is currently a difficult challenge. Over the years, many works have been proposed to achieve material classification.1–6 Most of the common approaches are based on color analysis or textural appearance. In this regard, a method based on 3-D textons1 was introduced to recognize surfaces on the basis of their textural appearance. A vocabulary of tiny surface patches together with their local and photometric properties was built to characterize the local irradiant distribution.
Other textural representations2 based on fast Markovian statistics were proposed for recognizing natural materials. The proposed features are fast to compute and robust to illumination direction as well as invariant to brightness changes. A good predictive accuracy was achieved from the analysis of several natural materials acquired under varying viewpoints, illumination colors, and directions.
A rich set of local features3 exploiting the Kernel descriptor framework combined with large-margin nearest neighbor learning was empirically studied to accomplish the material recognition of real-world objects as well.
A method that exploits several features covering various aspects of material appearance was also proposed for material classification.4 The support vector machine (SVM) framework was employed for obtaining a recognition rate of 53.1%, much better than the predictive rate obtained by using the Bayesian inference framework.7
High predictive accuracies were also achieved by using the algorithm proposed in Ref. 5. Specifically, the material appearance is modeled as the joint probability distribution of responses extracted from filter bank and color values (in the hue-saturation-value space). SVM was then employed as a classifier. By considering image patches of resolution , a very high accuracy was found.
A framework called reflectance hashing6 was introduced to model the reflectance disk of a material surface acquired from a unique optical camera measuring technique. The high-dimensional reflectance is encoded with a compact binary code that efficiently reveals the material class.
Liu et al.7 and Sharan et al.8 computed different low- and middle-level features to assess the appearance of materials. Then an augmented latent Dirichlet allocation (aLDA) method based on a Bayesian framework was applied to combine such features.
Large-scale datasets combined with deep learning were also proposed for scene classification and object recognition.9,10 In detail, a convolutional neural network (CNN) was employed to classify the materials. A mean recognition rate of about 80% was obtained.
An alternative statistical approach was presented in Ref. 11, where the joint distribution of intensity values of single images was employed together with filter banks providing state-of-the-art classification rates.
Most of the techniques presented in the literature for material recognition exploit passive 2-D cameras. However, the reflectance properties of material, the kind of surface (smooth or rough), the illumination, and the view angle conditions could compromise material identification. These cameras always need illumination in dark environments, and lighting variations could significantly complicate a real-scene analysis.
Nevertheless, the recent design of 3-D range sensors has gained significant interest for a wide number of applications.12,13 In fact, most issues related to 2-D cameras in material recognition tasks can be overcome by using time-of-flight (ToF) cameras. As an example, it is not necessary to use an external light source because such acquisition systems are able to sense the neighboring environment by employing infrared (IR) light. Hence, useful information can be collected even when the objects to be examined are poorly lit. Moreover, the 3-D data returned by ToF sensors provide significant information about the geometry and the shape of objects located in a scene. Therefore, problems tied to both view angle conditions and roughness of surfaces are considerably reduced when a 3-D sensor is used instead of 2-D ones.
Although ToF cameras can be employed only in indoor environments where the sunlight cannot interfere, the other benefits gained by employing these sensors enable better investigation of the material properties to accomplish material classification. This paper will deal with methodology for material recognition by exploiting a ToF range camera.
Similar works that exploit 3-D sensors and noncontact active techniques14–16 were presented to evaluate the object material. Specifically, the geometric properties of a material were investigated through the analysis of the reflected pattern of IR light. Microstructural details of materials and other associated information, i.e., shape and color, were computed by utilizing a ToF camera. The patterns related to materials were then classified by a random forest (RF) classifier.
This paper presents an alternative technique for achieving material recognition exploiting the data given from a ToF camera. The basic idea is to analyze whether a correlation can be established between the type of material of an object and the alterations affecting measurements taken with a ToF sensor. For every tested material, a patch from the 3-D point cloud dataset is extracted. At this stage, features based on different domains of transform (e.g., discrete cosine transform, Fourier transform, Hilbert transform, and so on) are computed to characterize the material.
Several working conditions have been taken into account, such as the pose of the material with respect to the depth sensor and the shutter value of the photoreceivers. A decision tree (J48) has then been employed to classify the materials.
This paper is organized as follows: some important aspects regarding ToF sensors are discussed in Sec. 2, and the methodology employed for material recognition is presented in Sec. 3. Experimental results and related discussion are reported in Sec. 4. Final conclusions and remarks are in Sec. 5.
ToF Range Camera: Depth Measurement Errors
As mentioned, the aim of this work is to identify the material category of an item (e.g., wood, metal, plastic, glass, fabric, and so on) by analyzing the information given from a range camera. This sensor exploits the well-known principle of ToF to profile the surrounding environment. Therefore, our main idea is to investigate the materials by analyzing the alterations affecting the measurements over time. In this regard, some physical material properties such as reflectance, scattering, and absorption might affect the IR light that the ToF sensor emits and that strikes the surface, causing fluctuations in the returned information.
Since we take advantage of depth measurement alterations to accomplish material recognition, it is worth having a discussion about the unavoidable errors17,18 that might affect ToF sensors. In this regard, two main categories can be identified: systematic and nonsystematic depth errors. Typical systematic errors can be due to depth distortions (i.e., when an incorrect sinusoid is generated), lens distortions, integration time, operating temperature, overexposed reflected amplitudes, ambient light conditions, and so on. In contrast, the most common nonsystematic errors are due to multiple light reception, motion blurring, light scattering, signal-to-noise ratio (SNR) distortion, and so on.
Essentially, it is important to reduce the effect of these errors to ensure reliable material recognition. In fact, some precautions and compensation methods could be adopted to achieve our purpose.
Systematic depth measurement errors are mainly due to IR sinusoidal generators, which have limits in their modulation process. Such irregularities involve a phase perturbation caused by erroneous wrapping due to the presence of odd harmonics. Consequently, a change of depth value occurs, compromising the actual computation of distance. In the same way, other errors due to integration time and operating temperature can consistently affect the actual computation of the depth map as well. It is important to define a model of error to get more accurate and reliable depth measurements.19–21 In this regard, some details will be provided in Sec. 4, where the countermeasures adopted to limit these errors are explained.
Another important aspect that has to be faced is the effect of lens distortion. Such effects are mainly due to the curvature of lenses mounted by ToF cameras. Therefore, precautionary steps have to be performed to decrease the distortions that affect the depth image. More details concerning the rectification step will be provided in Sec. 3.1.
The proper functioning of ToF cameras is strictly linked to ambient light conditions as well. In this regard, external waves having a comparable wavelength to the light source employed by the sensors for scanning the environment can compromise the reliability of measurements. Therefore, such sensors could be used only in indoor environments, as stated. Further details about the ambient light conditions of our tests will be reported in Sec. 4.
Other nonsystematic errors that might negatively affect measurements are mainly due to multiple paths of the light source, low SNR, light scattering, and so on. Usually, these errors can be managed by employing filtering methods or suitable models. Specifically, many works have been presented to investigate the effect of scattering of surfaces. Most of them have presented models based on the bidirectional reflectance distribution function.22 A more accurate model of light scattering was introduced in Ref. 23, where the bidirectional scattering-surface reflectance distribution function was described. Nevertheless, this model can be employed only to measure the scattering properties of translucent materials.
In this regard, some experiments will be conducted to examine in depth how scattering, reflectance, and absorption affect our methodology in material recognition. In other words, this paper will mainly focus on the description of our approach along with related results. Therefore, all physical aspects behind our idea will be addressed in future work.
In this work, we have taken advantage of the reflectance and absorption of the material surfaces considering several working conditions. A ToF range camera has been employed to create the datasets for our experiments. Therefore, exploiting the 3-D information given by the sensor together with the related intensity values, a variety of materials has been investigated. Specifically, different features have been extracted and then evaluated by a decision tree to accomplish the material classification.
Several mathematical methods have been compared to obtain a reliable classification of material type, such as:
• fast Fourier transform (FOURIER);
• discrete Hilbert transform (HILBERT);
• discrete cosine transform (DCT);
• Karhunen–Loève transform (KLT);
• chirp-z transform (CHIRP).
These transforms are commonly employed in signal processing tasks. Each of them has particular properties or characteristics that can aid the material analysis process. Since the material to be recognized is examined by analyzing several sequences of frames over time, it is important to use transforms that are able to extract significant features from signals.
In our tests, we have taken advantage of the fast Fourier transform (FFT), i.e., a faster alternative to discrete Fourier transform (DFT). In general, the FFT is a powerful tool for pattern recognition. It is commonly employed to extract invariant features24 because of its important properties; for example, a shift in the time domain does not involve any change in the amplitude spectrum of the image. Good predictive accuracies in material recognition are expected since the frequency domain representation might provide more useful information than the time domain one. Moreover, a low processing time should be required for computing this transform.
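The shift property mentioned above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it uses a naive DFT from the Python standard library instead of an FFT (the values are the same, only the speed differs), and the eight fluctuation values are hypothetical. Note that the invariance is exact only for a circular shift of the sequence.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (an FFT returns the same values)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def amplitude_spectrum(signal):
    """Magnitude of every DFT coefficient."""
    return [abs(c) for c in dft(signal)]


def circular_shift(signal, lag):
    """Delay the sequence by `lag` samples, wrapping around at the end."""
    lag %= len(signal)
    return signal[-lag:] + signal[:-lag] if lag else list(signal)


# Hypothetical distance fluctuations (metres) of one pixel over eight frames.
fluctuations = [0.002, -0.001, 0.003, 0.000, -0.002, 0.001, -0.003, 0.000]
original = amplitude_spectrum(fluctuations)
shifted = amplitude_spectrum(circular_shift(fluctuations, 3))

# The two amplitude spectra agree up to floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(original, shifted))
```

Only the phase of the coefficients changes under the shift, which is why a feature built on the amplitude spectrum does not depend on when the acquisition of the frame sequence started.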
The Hilbert transform extends a differentiable real signal into the Gauss plane. This transform adds information to the Fourier analysis because it introduces the conjugate harmonic of a given signal. Usually, it is used to handle nonstationary processes or signals for which Fourier spectral analysis is often not suited. In our algorithm, the discrete version is exploited.
The DCT is similar to the DFT since they both decompose a discrete-time signal into a sum of scaled and shifted basis functions. However, the DCT uses only cosine functions as its kernel. The DCT is widely used in signal processing and data compression applications because of its high-compression degree of spectrum. One of the most relevant properties25,26 is the noise high-frequency isolation in a small number of coefficients compared to other transforms such as the DFT. Average recognition rates are expected since the high-compression level of this transform might negatively affect the informative content of input signals.
The KLT is a representation of orthogonal functions. It has different expansion bases that depend on the stochastic process, and their coefficients are random variables. The kernels employed in this representation are defined by the covariance function of the process. Although the KLT has a high computational complexity, it is suitable to obtain the best bases for linear decorrelation of signals and energy compression. Hence, good predictive rates should be achieved by employing this mathematical tool.
Finally, the last employed transform is the chirp-z-transform (CZT), which can be considered a generalized case of DFT. In fact, the CZT samples the Z-plane along spiral arcs, which correspond to straight lines in the Laplace-plane. The kernel of this transform is a complex number.
All listed mathematical domains have been used and compared to extract features suitable for the material recognition method. Different materials having different physical characteristics have been examined. Particularly, the material target under investigation has been fastened on a panel and then placed in front of the acquisition system. Several working conditions have been taken into account. For instance, the shutter value (or exposure time), the position (or distance), and the angle (or heading) of panels with respect to the sensor have been varied.
It is necessary to emphasize that only a portion of the scene has been considered. Specifically, a region of interest (RoI) of the panel has been extracted, as reported in Fig. 1. The target has been centered in the middle of the field of view (FoV) of the ToF sensor. In this way, the distortion effects due to the curvature of lenses have been consistently reduced. Nevertheless, a preliminary step aimed at rectifying the depth images has been performed to compensate for the distortion effects. In this regard, the camera calibration toolbox for MATLAB,27 along with the well-known notation introduced by Heikkilä,28 has been exploited to get a faithful reconstruction of scenes without distortions.
At this stage, a punctual analysis of the panel is performed. In other words, every pixel of the extracted RoI is evaluated over time. A sequence of frames is acquired for every material. Thus, each pixel has values of 3-D coordinates together with the related intensity levels.
Eq. (1) represents how the data linked to a pixel of given 2-D coordinates are arranged. In this regard, the vector contains both the distance variations of the pixel and the associated intensity values. Distance fluctuations are computed using Eq. (2): the distance at each time instant is evaluated by means of Eq. (3) from the 3-D coordinates of the pixel at that time, and the average value of all distances related to the pixel under investigation is subtracted from it.
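Since the typeset equations are not reproduced in this text, the following is only a plausible reconstruction of Eqs. (1) to (3) from the verbal description above. All symbol names (the pixel coordinates $(i,j)$, the number of frames $N$, $\Delta d_t$, $I_t$, and $\bar{d}$) are assumptions rather than the authors' notation, and the Euclidean form of Eq. (3) is assumed as well.

```latex
\begin{align}
\mathbf{p}(i,j) &= \left[\Delta d_{1},\dots,\Delta d_{N},\; I_{1},\dots,I_{N}\right] \tag{1}\\
\Delta d_{t} &= d_{t} - \bar{d}, \qquad \bar{d} = \frac{1}{N}\sum_{t=1}^{N} d_{t} \tag{2}\\
d_{t} &= \sqrt{x_{t}^{2} + y_{t}^{2} + z_{t}^{2}} \tag{3}
\end{align}
```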
Subtracting the average distance from the set of measures fulfills a precautionary step aimed at reducing the possibility that material recognition might be achieved by the classifier by considering the evaluated distance instead of its fluctuations. Artificial biases introduced by the experiment itself in the material recognition process are therefore drastically decreased.
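The construction of the per-pixel input vector can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `pixel_vector` is hypothetical, the distance is assumed to be the Euclidean norm of the 3-D coordinates, and the fluctuations are assumed to be placed before the intensity values.

```python
import math


def pixel_vector(points, intensities):
    """Build the per-pixel input vector: distance fluctuations, then intensities.

    points      -- list of (x, y, z) coordinates of one pixel, one entry per frame
    intensities -- list of intensity values of the same pixel, one per frame
    """
    if len(points) != len(intensities) or not points:
        raise ValueError("need one intensity per frame and at least one frame")
    # Assumed form of the distance: Euclidean norm of the 3-D coordinates.
    distances = [math.sqrt(x * x + y * y + z * z) for x, y, z in points]
    mean_distance = sum(distances) / len(distances)
    # Removing the mean keeps only the fluctuations, not the absolute range,
    # so the classifier cannot simply learn the panel distance.
    fluctuations = [d - mean_distance for d in distances]
    return fluctuations + list(intensities)


# Hypothetical three-frame acquisition of one pixel about 1 m from the sensor.
frames = [(0.0, 0.0, 1.000), (0.0, 0.0, 1.002), (0.0, 0.0, 0.998)]
vector = pixel_vector(frames, [100, 101, 99])
```

Because the mean is removed, the fluctuation part of the vector sums to zero whatever the distance of the panel.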
Once the input vector has been collected, the different transformation domains listed before have been computed. In this way, one or more features are associated to each pixel of the RoI.
Description of Decision Tree
As stated before, the aim of this paper is to provide a method able to discern the category of materials by analyzing the data returned from a ToF camera.
Material recognition is achieved by exploiting a decision tree as a model of classification. In brief, a decision tree is a predictive machine-learning model able to provide an output value by evaluating numerous attribute values of the available data. It can be considered a treelike graph that has nodes and branches. Specifically, the internal nodes denote the different attributes, whereas the branches between nodes specify the possible values these attributes can have. Finally, the terminal nodes identify the final value or the classification output of the dependent variable.
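The structure just described can be illustrated with a small hand-written tree. This is only a sketch: the attribute names (`c1`, `shutter`), the thresholds, and the class labels are invented for illustration and were not learned from the data of this paper, and the binary numeric splits are a simplification of what a C4.5-style learner can produce.

```python
def classify(node, sample):
    """Walk from the root to a leaf and return the class label of that leaf."""
    while isinstance(node, dict):
        # Internal node: test one attribute against a threshold.
        branch = "low" if sample[node["attribute"]] <= node["threshold"] else "high"
        node = node[branch]
    return node  # Terminal node: the predicted class.


# Hypothetical tree; names, thresholds, and labels are NOT taken from the paper.
toy_tree = {
    "attribute": "c1",
    "threshold": 0.5,
    "low": "plastic",
    "high": {
        "attribute": "shutter",
        "threshold": 20.0,
        "low": "plywood",
        "high": "glass",
    },
}

prediction = classify(toy_tree, {"c1": 0.9, "shutter": 40.0})
```

Each dictionary plays the role of an internal node, its `low` and `high` entries are the branches, and each string is a terminal node.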
Regarding this specific case, the attributes of the decision tree are the coefficients of the various computed features together with other parameters that characterize the performed experiment. Further information will be provided in Sec. 4.
In our tests, we employed the open source software Weka,29 a collection of machine-learning algorithms, to perform material recognition. Specifically, we chose the J48 classifier, a popular open-source implementation of the C4.5 decision tree classifier (Ref. 30) that is available in Weka and is a high-performance algorithm suitable for large datasets. The employed classifier supplies an easily interpretable model, and it is suitable for datasets that are heavily affected by noise. Although other classifiers (e.g., RF) provide predictive accuracies comparable to those of J48, they require more computation. Moreover, preference was given to deterministic algorithms for repeatability reasons.
Section 3.1 reports the main domains of analysis for associating features to a single pixel under examination. However, material recognition is achieved by considering the evaluation of a set of pixels, i.e., those pixels belonging to the considered RoI.
As previously stated, a decision tree is designed to provide the category of material under examination. Therefore, the data of interest have to be properly arranged to be managed by the decision graph for material classification. In other words, starting from the computed features for a set of materials and considering other parameters tied to the typology of experiments, a suitable representation has been obtained and stored for processing using Weka. Specifically, the file has been organized according to the attribute-relation file format (ARFF).
The header of an ARFF file is reported in Table 1. The coefficients of the features are arranged along the columns, then other parameters such as position, angle, active brightness, and shutter are organized in the same way. The position and the angle values represent the pose of the panel to be examined with respect to the ToF sensor. Further details will be provided in the following section. The active brightness is a boolean parameter of the camera that enables improvement of the quality of acquisition. In fact, when this parameter is disabled, all measurements returned by the photoreceivers are considered good, even if some reflected light is captured by the photosensors. Therefore, it is preferable to enable this modality to obtain more accurate measurements. The shutter value is the time duration in which the camera chip is exposed to IR light. This value is expressed in milliseconds (ms).
Table 1 ARFF file organization. The coefficients of the features, along with the other parameters, are arranged along the columns, whereas each pixel of the selected RoI is arranged along the rows.
| Transform 1: c1,…,cn | Transform N (optional): c1,…,cn | Position | Angle | Active brightness | Shutter | Output |
The last column represents the expected output value, which is identified by a cardinal number. In this regard, a specific number is assigned to represent each material to be classified (see Fig. 2 for more clarity).
Finally, it is worth noting that each row of the file refers to the single pixel belonging to the extracted RoI. Furthermore, as shown in Table 1, more than one feature transform can be employed. In this regard, some tests will show how the classification is affected.
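An ARFF header following the column layout of Table 1 could be generated as sketched below. The attribute names, the relation name, and the choice of declaring the output as a nominal attribute whose values are the material numbers are assumptions made for illustration; the actual names used in the experiments are not given in the text.

```python
def arff_header(relation, transforms, n_coefficients, material_ids):
    """Return the header of an ARFF file as a single string."""
    lines = ["@relation " + relation, ""]
    # One numeric attribute per coefficient of every selected transform.
    for transform in transforms:
        for index in range(1, n_coefficients + 1):
            lines.append("@attribute %s_c%d numeric" % (transform, index))
    # Parameters that characterize the experiment.
    lines.append("@attribute position numeric")
    lines.append("@attribute angle numeric")
    lines.append("@attribute active_brightness {0,1}")
    lines.append("@attribute shutter numeric")
    # Class attribute: one nominal value per material number (assumed encoding).
    lines.append("@attribute material {%s}" % ",".join(str(m) for m in material_ids))
    lines.extend(["", "@data"])
    return "\n".join(lines)


header = arff_header("materials", ["FOURIER", "KLT"], 2, [1, 2, 3])
```

Every line written after `@data` would then hold the comma-separated values of one RoI pixel, in the same order as the declared attributes.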
It is important to underline that the data given from the acquisitions have been split into two categories: the training and the validation set. Specifically, 20% of the entire dataset has been reserved to represent the validation set. In this way, the validity of the method has been tested by analyzing the predictive accuracies of recognition.
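A hold-out split with 20% of the rows reserved for validation can be sketched as follows. The text does not state how the rows were selected, so the shuffled split with a fixed seed used here is an assumption; the function name `split_rows` is hypothetical.

```python
import random


def split_rows(rows, validation_fraction=0.2, seed=0):
    """Return (training_rows, validation_rows) after a reproducible shuffle."""
    shuffled = list(rows)
    # A fixed seed makes the split repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical dataset of 100 rows, identified here by their index only.
training, validation = split_rows(range(100))
```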
This section will explain the obtained outcomes by considering the analysis of 10 different materials (refer to Fig. 2). Three sets of acquisitions have been separately performed:
• analysis of eight materials by considering fixed panel poses (see Sec. 4.1);
• analysis of four materials by changing both the orientation and the displacement of the panel with respect to the ToF sensor (see Sec. 4.2);
• analysis of eight materials by introducing another wooden panel in the dataset (see Sec. 4.3).
All data belonging to one class have been collected from one panel. Specifically, only the test described in Sec. 4.3 employs a dataset where two different types of wood are used to represent such a category.
Moreover, only planar surfaces have been analyzed to prove the validity of the method. Nonsolid materials, such as the dark and white fabric, have been fastened onto flat panels. Therefore, all reported outcomes refer only to planar materials. However, the presented methodology could be extended to the analysis of nonregular surfaces as well. In this regard, different patches having planar surface geometry can be detected from an object under investigation. Exploiting the definition of the shape index (SI) as in Ref. 15, it is possible to measure the surface shape of any point belonging to a patch of interest. In detail, convex surfaces are identified by large SI values, concave surfaces have small SI values, and planar surfaces have medium SI values. Consequently, the patches having a medium SI value can be employed by our method to classify the material of items.
For the sake of completeness, some relevant optical properties of the examined materials are reported in Table 2. In this regard, common materials having different reflectance and refractive indices have been investigated to evaluate our approach.
Table 2 Main optical characteristics of the employed materials at the wavelength of our ToF sensor (Fotonic E70, λ = 850 nm). The database available online31 has been used to derive such parameters.
| Material | Refractive index | Extinction coefficient | Reflectance |
In the following experiments, there are two kinds of wood, i.e., fir wood and plywood, which have different properties due to different fabrication methodologies and thicknesses. Here, the properties of fir wood are reported.
Two different-colored fabrics have been considered: dark and white. The optical properties will change since the absorption will depend on the material pigments. The physical properties reported in the table ignore the material color.
The experimental setup is reported in Fig. 3. At this stage, a brief discussion about ambient light conditions is due. In general, ToF sensors suffer from issues linked to background light. In fact, external light sources, such as sunlight or artificial lighting, could produce significant degradation of measurements. Therefore, outer optical band-pass filters could be adopted to selectively transmit only the rays having the expected wavelength, which will be input to the range camera. Hence, meaningless light is physically filtered out of the computation. In this regard, the employed ToF camera already mounts an internal optical filter. Nevertheless, to consistently reduce errors due to ambient light conditions, the experiments have been run in a completely controlled environment (without external and artificial lights).
The ToF camera used in our tests is the Fotonic E70,32 with a resolution of and a maximum measurement range of 7 m. The illumination unit emits modulated waves of near-infrared light (NIR), which are triggered by an internal reference signal, i.e., a sinusoid having a modulation frequency of 15 MHz and a wavelength of 850 nm. The phase of the incoming wave-front is estimated by means of the four-buckets algorithm, and the related distance is computed once the phase measurement is known.
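The phase-to-distance computation can be illustrated with a textbook form of the four-bucket algorithm. This is a sketch under assumptions, not the camera firmware: the four samples are modeled as an offset plus a cosine sampled 90 deg apart, and sign and ordering conventions differ between sensors. Only the 15 MHz modulation frequency is taken from the text.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s
MODULATION_FREQUENCY = 15e6     # Hz, as stated for the employed camera


def four_bucket_samples(distance, amplitude=1.0, offset=2.0,
                        f_mod=MODULATION_FREQUENCY):
    """Simulate four correlation samples taken 90 deg apart (assumed model)."""
    # The round trip of the light delays the modulation by this phase.
    phase = 4.0 * math.pi * f_mod * distance / SPEED_OF_LIGHT
    return [offset + amplitude * math.cos(phase + k * math.pi / 2.0)
            for k in range(4)]


def distance_from_samples(a0, a1, a2, a3, f_mod=MODULATION_FREQUENCY):
    """Recover the distance from the four samples of the model above."""
    # The differences cancel the offset: a3 - a1 ~ sin(phase), a0 - a2 ~ cos(phase).
    phase = math.atan2(a3 - a1, a0 - a2) % (2.0 * math.pi)
    return SPEED_OF_LIGHT * phase / (4.0 * math.pi * f_mod)


def unambiguous_range(f_mod=MODULATION_FREQUENCY):
    """Distance at which the measured phase wraps around."""
    return SPEED_OF_LIGHT / (2.0 * f_mod)


recovered = distance_from_samples(*four_bucket_samples(1.0))
```

With this model, the phase wraps at about 10 m for a 15 MHz modulation, so the 0.5 to 3.5 m operating range used in the tests lies well inside the unambiguous interval.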
It is worth highlighting that a bounded operating range of the ToF sensor has been employed to obtain more accurate measurements. In this regard, preliminary experiments not reported in this paper have shown that an average absolute distance error of 0.012 m is obtained by considering the distance range of 0.5 to 3.5 m. Conversely, the error increases significantly when higher distances are considered. Hence, the presented tests have been performed taking into account only distances in this range.
The range sensor has been fastened onto a translational stage and then linked to a laptop. The target panel, i.e., the material under investigation, has been placed in front of the camera. Moreover, a rotational stage mounted under the panel is responsible for measuring the tilts as well as providing the rotational movements. The actual distance between the camera and the material is given by means of a dot laser range finder (LRF)33 having an operating range of 0.1 to 10 m and a precision of 1 mm.
Eight Materials with Fixed Panel Poses
In this experiment, all materials except glass and fir wood have been considered. The position and the orientation of the panel on which the materials are fastened have been held fixed to 1 m and 0 deg, respectively. Furthermore, a sequence of 300 frames has been acquired.
Figure 4 shows the predictive accuracies given from the analysis of the eight materials. These accuracy rates have been obtained by considering the confusion matrices returned by Weka. Three different shutters have been tested. As shown by the bar graph, the shutter value heavily affects the likelihood of correctly recognizing the material type. In this regard, the DCT and HILBERT transforms seem less stable than the others. In contrast, the remaining transforms ensure higher recognition rates for all the reported shutter speeds.
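For reference, a predictive accuracy can be derived from a confusion matrix as the fraction of instances lying on its diagonal. The sketch below assumes rows are the actual classes and columns the predicted ones; the two-class matrix is hypothetical and does not reproduce any result of this paper.

```python
def accuracy(confusion):
    """Fraction of correctly classified instances in a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


# Hypothetical matrix: 50 + 35 correct predictions out of 100 instances.
example_accuracy = accuracy([[50, 10], [5, 35]])
```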
The FOURIER, CHIRP, and KL transforms provide more reliable classification of the presented materials. Although the related recognition rates are comparable, the FOURIER transform requires less time to be computed with respect to the others. This aspect should be taken into account when real-time requirements for material recognition need to be addressed for specific applications.
As the bar graph shows, the KLT ensures high recognition rates. Such a transform was commonly employed in previous works34,35 for its effectiveness in feature extraction. As already stated, this transform is a representation of a linear combination of orthogonal functions such as the Fourier series, but the number of coefficients is variable. The Karhunen–Loève expansion is better suited, even though it requires more time to be computed with respect to the chirp and Fourier transforms. Therefore, the Fourier-based feature represents a good compromise between classification accuracy and computational time required.
In contrast, the DCT-based feature does not provide stable predictive accuracies for the considered shutter speeds. In this regard, the high-frequency noise that affects a signal is often isolated in a small number of coefficients. Since in our approach we take advantage of both the distance fluctuations and the intensity information for extracting a feature, it is fairly probable that such informative content is not preserved when the DCT transform is computed. Therefore, loss of information might occur when this transform is used.
Nevertheless, the classification rates obtained employing a unique feature do not compare favorably against previous methods. Therefore, experiments have been performed by providing the J48-based classifier with features extracted simultaneously by several transforms.
Hence, all the possible pair combinations of the presented transforms have been considered to enhance the predictive accuracies. Similar to the case of unique transform previously discussed, the features employing the DCT transform appear to be more affected by shutter value variations. As a matter of fact, a consistent decrease of predictive rates is observable from Fig. 5.
Conversely, the features extracted by using combinations of the FOURIER, CHIRP, and KL transforms ensure good predictive accuracies. Such results are still coherent with the ones obtained by employing a unique transform for extracting a feature of interest. In this regard, such features are less affected by the analyzed different shutter values.
Table 3 reports some important metrics useful in measuring the performance of the predictor. Among the computed features, only the most relevant has been proposed in this table. Moreover, a shutter of only 20 ms has been taken into account since that seems to ensure higher recognition rates.
Table 3 The true positive (TP), true negative (TN), false positive (FP), false negative (FN), precision, recall, and F-measures are reported for each material and extracted feature combination.
| Material | Feature | TP | TN | FP | FN | Precision (%) | Recall (%) | F-measure (%) | Rank F-measure |
As shown by the F-measure (i.e., the harmonic mean between precision and recall), the materials made of plastic and reflective surfaces can be predicted with good performance by means of KL-based features and CHIRP–KL-based features. In contrast, the dark fabric sample presents the worst case of this classification task due to its high absorption property. Other materials such as aluminum and iron show average F-measure values since the considered shutter causes sensor saturation. Moreover, considering the average rank of the F-measure, the FOURIER–KLT combination and CHIRP–KLT show the best results. In general, if the KL score is low, most of the descriptors perform weakly.
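The per-class metrics discussed above follow directly from the TP, FP, and FN counts. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not taken from Table 3.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for one class, as fractions."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for one material.
precision, recall, f_measure = precision_recall_f(tp=90, fp=10, fn=30)
```

Multiplying the returned fractions by 100 gives the percentages used in the table.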
Some of the presented features enable recognition of the material type with good accuracy. However, the computational time required to calculate them has to be taken into account when dealing with real-time applications. For the sake of completeness, Table 4 reports the computational times needed to compute the features of interest.
Computational times obtained to compute the features of interest. All the transforms have been obtained using the software MATLAB. These outcomes refer to the analysis of one pixel belonging to plywood over time.
|Feature name||Elapsed time (ms)|
The values in bold indicate the elapsed time for computing the features that have shown the best results in terms of predictive accuracy. As observable, the FOURIER-based feature requires the least time to be computed, since the FFT has been employed to obtain it. In contrast, the features based on the KLT need more time to be extracted: their elapsed times are about one order of magnitude higher than those of the FOURIER- and CHIRP-based features, as evident from the table.
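The reported timings were obtained in MATLAB. Purely as an illustration of how a per-feature elapsed time for one pixel sequence might be measured, the following Python sketch times an arbitrary feature extractor. The helper `elapsed_ms` and the stand-in extractor `mean_abs_deviation` are hypothetical; the latter is not one of the transforms used in this work.

```python
import statistics
import time


def mean_abs_deviation(signal):
    """Stand-in feature extractor (not one of the paper's transforms)."""
    mean = sum(signal) / len(signal)
    return sum(abs(x - mean) for x in signal) / len(signal)


def elapsed_ms(extractor, signal, repeats=5):
    """Median wall-clock time, in milliseconds, of extractor(signal)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        extractor(signal)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # One pixel observed over 300 frames; placeholder values only.
    pixel_over_time = [1.0 + 0.01 * (i % 7) for i in range(300)]
    print("elapsed (ms):", elapsed_ms(mean_abs_deviation, pixel_over_time))
```

Taking the median over several runs reduces the influence of occasional scheduling delays on a single measurement.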
Four Materials with Different Panel Poses
In this case, a subset of considered materials has been examined since several panel poses have been taken into account, as shown in Table 5. This has been necessary for limiting the number of material/distance/orientation combinations to be acquired. Specifically, the plywood, the plastic panel, the reflective surface, and the glass surface have been analyzed since these materials are commonly present in indoor environments.
List of parameter values employed during our experiments.
|Position values (m)||Rotation values (deg)||Shutter values (ms)|
It is worth highlighting that the glass surface has been introduced in this experiment because, like the reflective surface, it is more challenging for the ToF camera to acquire. Therefore, its analysis could be of more interest than that of other material types.
Table 5 reports the different attribute values used during the experiment. In other words, several acquisitions have been performed considering the combinations of values listed in the table.
It is worth noting that the position and shutter values are related. When the panel is placed close to the ToF sensor, small shutter values are needed to obtain reliable depth measures and to avoid unwanted saturation of the photosensors. In contrast, the farther the panel is from the sensor, the larger the shutter value can be.
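The dependence between position and shutter can be expressed as a filter over the parameter grid, as in the following sketch. All numeric values and the distance threshold are hypothetical placeholders (the actual values are those of Table 5), and the rule "short distances allow only the smaller shutter values" is a simplified assumption.

```python
from itertools import product

# Hypothetical parameter values; the real ones are listed in Table 5.
POSITIONS_M = [0.5, 1.0, 1.5]
ROTATIONS_DEG = [-30, 0, 30]
SHUTTERS_MS = [3, 5, 10, 15, 20]


def max_shutter_ms(position_m, near_limit_m=0.5, near_max_ms=15, far_max_ms=20):
    """Largest shutter allowed at a given distance (illustrative rule)."""
    return near_max_ms if position_m <= near_limit_m else far_max_ms


def acquisition_plan(positions, rotations, shutters):
    """List the (position, rotation, shutter) triples that avoid saturation."""
    return [
        (p, r, s)
        for p, r, s in product(positions, rotations, shutters)
        if s <= max_shutter_ms(p)
    ]


if __name__ == "__main__":
    plan = acquisition_plan(POSITIONS_M, ROTATIONS_DEG, SHUTTERS_MS)
    print(len(plan), "acquisitions planned")
```

Filtering the full Cartesian product in this way keeps the number of material/distance/orientation combinations manageable while excluding settings expected to saturate the sensor.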
As shown in Fig. 6, an average increase of 10% in prediction accuracy has been obtained with respect to the results reported in Fig. 4, since fewer material types need to be considered. In this case, all features appear to be stable against shutter variations. The features based on FOURIER, CHIRP, and KL transforms show better predictive rates, although by a small margin.
By combining the transforms, an increase of predictive rates has been achieved, as shown in Fig. 7. Such outcomes prove once more that features extracted from multiple domains are better suited to be interpreted by the classifier.
This is consistent with the results obtained in the previous experiment. Moreover, a shutter value of 20 ms provides the highest predictive rates among those considered. In fact, this value seems to be the most suitable for the considered distance range and the materials used during the tests.
Table 6 reports the main metrics obtained by limiting the experiment to four material types only, using a shutter value of 20 ms. As highlighted, very high F-measure values have been achieved by using the FOURIER–CHIRP-based features and FOURIER–KL-based features. Conversely, lower predictive rates have been obtained when features based on unique transforms have been used. In fact, as already stated, the combination of features enables enhancement of the predictive rates for material classification.
The main metrics for evaluating the classifier have been reported for the case of analysis. A shutter value of 20 ms has been considered to collect the data presented here.
|Material||Feature||TP||TN||FP||FN||Precision (%)||Recall (%)||F-measure (%)||Rank F-measure|
The results of the second experiment, therefore, confirm that the descriptors based on FOURIER–KL and CHIRP–KL provide the most effective material classification.
Table 6 shows that the predictive accuracies of materials made of plastic are considerably lower than the others. Specifically, the reported data refer to a shutter value of 20 ms. In this regard, saturation of the photosensitive receivers occurs when the plastic panel is acquired with this exposure time. Therefore, meaningless measurements have been returned by the ToF camera. In contrast, the other materials have not involved any photosensor saturation, thus providing more reliable measurements. Additionally, the measurement fluctuations obtained by analyzing these materials appear to be more distinctive and meaningful to the classifier. As a consequence, they have been better classified, as shown in the table. Further experiments will be conducted to better investigate these material categories.
In Fig. 8, the recognition rates are shown, taking into account the position changes. As already highlighted, the features that ensure the highest scores are those based on a combination of FOURIER, CHIRP, and KL transforms.
The predictive accuracies related to short distances are slightly lower with respect to other positions, although by a small margin. This is probably due to the shutter values employed for these positions. Specifically, only integration times of 3, 5, 10, and 15 ms have been considered for such positions. However, it has been demonstrated that a shutter value of 20 ms ensures the highest recognition scores among those remaining. As a consequence, the scores of Fig. 8 are indirectly affected by the shutter value.
Similar to cases of analysis of shutter and position changes, the correlation between the angle variation and the predictive accuracy has been investigated. In this regard, Fig. 9 shows the obtained recognition rates by varying the angle value. It is important to underline that only the positive angles have been reported since the negative ones present very similar scores.
The most stable features against angle variations are still those obtained in previous experiments, namely the FOURIER–CHIRP, FOURIER–KL, and CHIRP–KL. High predictive accuracies have been achieved in this case as well.
In summary, it is possible to state that the combination of features enables enhancement of the prediction rates. Moreover, three of these features can be considered more stable than others in material classification, even when different shutter values and panel poses are taken into account.
Eight Materials by Introducing Another Wooden Panel in the Dataset
The presented experiments have shown the efficiency of the proposed methodology for recognizing the material type. In this paper, the most common material categories have been considered. However, several objects belonging to the same category can be found when an environment is explored. For example, there are several surface typologies made of iron, wood, plastic, or fabric having different surface smoothness or reflective properties.
Nevertheless, it is impractical to classify all the existing materials. In this regard, a large dataset should be used to enhance the material recognition rate. In fact, a wide set of objects made of the same material has to be considered during the training stage for each category.
In this experiment, another wooden panel made of fir, having different characteristics from the others (plywood), has been analyzed. In this regard, the fir panel has a lighter color with respect to plywood and is also less smooth.
At first, data related to the fir panel have not been included in the training step. Therefore, the material category should be identified by the built classifier using only the data related to the other eight panels (similar to tests reported in Sec. 4.1). However, a low-predictive accuracy () has been reached. As a consequence, the fir panel has not always been identified as wood. This likely happens because the diverse roughness of the two wooden panels could affect category recognition. Conversely, by considering the data of the fir panel in the training stage and classifying the panel as wood, a significant increase of the recognition rate is achieved.
Figure 10 reports the predictive accuracies by considering the fir panel in the computation as well. As observable, a slight increase of scores has been obtained in comparison with the outcomes reported in Fig. 4. The improvement of rates is probably due to a better representation of the class “wood.” In other words, the more data used in the training step, the more the classifier is able to represent the real world.
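The two training protocols compared above (fir panel excluded from, or included in, the training data) differ only in how the labeled set is assembled. The following sketch shows that bookkeeping under stated assumptions: the panel names, the panel-to-category map, and the toy feature vectors are hypothetical, and no classifier is trained here.

```python
# Panel-to-category map; "fir" and "plywood" share the category "wood".
PANEL_CATEGORY = {
    "plywood": "wood",
    "fir": "wood",
    "plastic": "plastic",
    "glass": "glass",
}


def to_labeled(samples, exclude_panels=()):
    """Map (panel, features) samples to (category, features) pairs.

    Samples from panels listed in `exclude_panels` are skipped, which
    reproduces the protocol where a panel is unseen during training.
    """
    return [
        (PANEL_CATEGORY[panel], features)
        for panel, features in samples
        if panel not in exclude_panels
    ]


def count_per_category(labeled):
    """Count the labeled samples available for each category."""
    counts = {}
    for category, _ in labeled:
        counts[category] = counts.get(category, 0) + 1
    return counts


if __name__ == "__main__":
    # Placeholder feature vectors, not measured data.
    samples = [
        ("plywood", [0.1, 0.2]),
        ("fir", [0.3, 0.1]),
        ("fir", [0.4, 0.2]),
        ("plastic", [0.9, 0.8]),
    ]
    print(count_per_category(to_labeled(samples, exclude_panels=("fir",))))
    print(count_per_category(to_labeled(samples)))
```

In the second call the class "wood" is represented by both panels, which corresponds to the richer class representation credited above for the improved scores.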
The combination of features enables higher recognition rates (see Fig. 11). The features based on the combination of FOURIER, CHIRP, and KL show more stable scores with respect to those that significantly decrease the predictive accuracies when the shutter changes its value. Hence, such outcomes prove once more that these features are sufficiently distinctive and reliable for achieving category classification.
The metrics related to this experiment are reported in Table 7. As observable from the table, the features computed from a combination of transforms ensure the highest classification rates. By comparing these results with those of Table 3, it is possible to note that similar prediction scores have been achieved. Nevertheless, the scores referring to the category of wood are considerably improved. Specifically, the F-measure of wood is increased by about 30% for all the considered features. This enhancement is due to a better representation of the class under investigation.
Here, the main metrics related to this test are shown. Very high recognition scores for materials made of wood have been achieved in comparison with Table 3. Other predictive accuracies related to other categories are still comparable with those of that table.
|Material||Feature||TP||TN||FP||FN||Precision (%)||Recall (%)||F-measure (%)||Rank F-measure|
Many works have been proposed in recent years for solving the problem of material recognition. Many of the proposed methodologies are essentially based on color and texture analysis of 2-D images. Very few works tackle this topic by exploiting 3-D information. In this regard, our method is able to classify the material typology by exploiting both the 3-D information and the intensity levels returned by the ToF camera.
Although the method proposed in this paper is directly comparable with only a few works, which moreover use different datasets, a comparison is shown in Table 8, with papers grouped according to the employed processing method.
Overall recognition rates of discussed papers.
|Category of method||Reference number||Type of features||Number of materials||Category of materials||Overall recognition rates (%)|
|Color analysis and textural appearance||1||Local geometric and photometric properties||20||Natural||95.6|
|Color analysis and textural appearance||3||Gradient orientation||10||Flickr database||54.0|
|Color analysis and textural appearance||4||Color and curvature||10||Flickr database||53.1|
|Color analysis and textural appearance||5||Texture and statistical distribution filter response||20||Building||90.8|
|Color analysis and textural appearance||6||Reflectance with angular gradient computation||20||Real world||92.3|
|Color analysis and textural appearance||7 and 8||Texture||10||Flickr database||45.3|
|Color analysis and textural appearance||9 and 10||Multiscale 2-D||23||Real world||79.5|
|Color analysis and textural appearance||11||Intensity distribution||61||CUReT database||24.1|
|3-D data and intensity level analysis||14–16||Intensity and depth information||4||Real world||86.7|
|Our method||-||3-D data alterations and intensity; variable shutter and fixed panel pose||8||Real world||61.9 (Sec. 4.1)|
|Our method||-||3-D data alterations and intensity; variable shutter and panel pose||4||Real world||82.9 (Sec. 4.2)|
|Our method||-||3-D data alterations and intensity; variable shutter and fixed panel pose||8||Real world||62.1 (Sec. 4.3)|
The works in Refs. 1, 5, and 6 employed high-resolution 2-D cameras to acquire images, whereas our 3-D ToF sensor has a low resolution. In Ref. 6, an acquisition system based on a concave parabolic mirror and a beam splitter has been used to obtain the reflectance disks of materials. Different material typologies have been analyzed. In this regard, only Ref. 6 investigated similar materials. In contrast, Refs. 1 and 5 employed natural and building materials, respectively.
The work in Ref. 2 presents average scores even though several materials are examined. However, its methodology is applied to recognize natural materials, while this work is more concerned with structured indoor environments.
Finally, Refs. 14, 15, and 16 refer to the analysis of 3-D information for achieving material recognition. Among the discussed works, these are the closest to our approach since they handled 3-D data to extract features. Furthermore, analogous material categories (wood, plastic, fabric, and paper) have been investigated. It is observable that a slightly lower accuracy rate has been achieved since in our case, challenging materials such as a reflective surface and glass have been considered.
Part of this difference might also depend on the difference in sensor performance. In fact, the absolute accuracy (or the maximum systematic error on the distance measurements) and the repeatability () of the SR4000 (Ref. 36) are equal to and 4 mm, respectively. Conversely, the absolute accuracy and the repeatability () of the Fotonic E70 are and 7 mm. Hence, our acquisition system used to collect the data is less accurate than the other one.
In this paper, exploiting the information given by a ToF depth camera, a group of features has been computed to accomplish the task of material classification. For each material, a suitable RoI is considered. Specifically, all pixels belonging to this RoI are separately examined over time. A sequence of 300 frames is acquired for each material placed in front of the ToF sensor. At this stage, exploiting different transforms such as Fourier, Karhunen–Loève, chirp-z, and so on, different features have been extracted. Both a training and a validation dataset have been created in order to train and test a decision tree (J48) for classifying the materials.
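The per-pixel processing summarized above can be sketched as follows: each pixel of the RoI is turned into a time series across the acquired frames, and a transform of that series provides the feature values. This is a minimal sketch, assuming that the first few Fourier magnitudes are kept as features; which quantities are actually retained from each transform is not specified here. A naive DFT is used to stay dependency-free (an FFT returns the same values faster), and all names are hypothetical.

```python
import cmath


def pixel_series(frames, roi):
    """Yield ((row, col), series) for every RoI pixel across all frames.

    `frames` is a list of 2-D lists of measurements; `roi` is
    (row_start, row_end, col_start, col_end) with exclusive end indices.
    """
    row_start, row_end, col_start, col_end = roi
    for r in range(row_start, row_end):
        for c in range(col_start, col_end):
            yield (r, c), [frame[r][c] for frame in frames]


def dft_magnitudes(series):
    """Magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(series)))
        for k in range(n)
    ]


def roi_features(frames, roi, keep=4):
    """Map each RoI pixel to the first `keep` DFT magnitudes (assumption)."""
    return {
        pixel: dft_magnitudes(series)[:keep]
        for pixel, series in pixel_series(frames, roi)
    }
```

Each resulting vector would be one sample for the classifier, labeled with the material placed in front of the sensor.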
Results have shown how the integration time (i.e., shutter value) affects the predictive accuracies of recognition when only a single transform domain is employed to classify materials. In this regard, the features based on the Fourier, chirp, and KL transforms seem more stable with respect to the shutter variations. By considering the combinations of transforms, a significant increase of recognition rates has been achieved as well. At the same time, by reducing the number of materials and introducing other information tied to the pose of the panel, predictive accuracies have slightly increased.
The efficiency of the presented methodology has also been proven by evaluating features with changes of the position and angle of panels. Good predictive rates have been achieved, confirming the stability of computed features against parameter variations.
Moreover, by enriching the training dataset with a new type of panel in the category of wood (fir wood), significant recognition rates have been obtained, proving once more the effectiveness of our approach.
Since these results look promising, further work will be done to prove the robustness of the proposed methodology, e.g., by considering a wider set of materials and reducing the number of frames acquired during an experiment. Moreover, other depth sensors such as Kinect and Swiss Ranger might be investigated in order to evaluate accuracy improvements. Finally, a method for patch extraction from objects will be considered for accomplishing material recognition in less controlled environments as well.
This work was funded within the CNR-ISSIA project “MASSIME-Mechatronic innovative safety systems (wired and wireless) for railway, aerospace and robotic applications.” The authors would like to thank Mr. Michele Attolico for technical support.
Fabio Martino obtained his bachelor’s degree in electronic engineering from the Polytechnic University of Bari in 2010. At the same university, he attained a master’s degree in electronic engineering in 2013 with a thesis titled “ToF range camera: analysis and evaluation.” Since May 2013, he has been cooperating with ISSIA-CNR. His principal research interests are 3-D data analysis and image processing.
Cosimo Patruno received his BS and MS degrees in automation engineering from the Polytechnic University of Bari, Bari, Italy, in 2010 and 2013, respectively. Since May 2013, he has been cooperating with the Institute of Intelligent Systems for Automation (ISSIA), National Research Council of Italy (CNR), Bari, as a research collaborator. His main research interests include 3-D data analysis, signal and image processing, computer vision, and robotics.
Nicola Mosca received his degree (cum laude) in computer science from the University of Bari, Bari, Italy, in 2004, and his MPhil degree in transport systems engineering from the University of South Australia, Adelaide, Australia, in 2012. He has been cooperating with the Institute of Intelligent Systems for Automation (ISSIA) of the National Research Council since 2004. His research interests include image processing, computer vision, robotics, high-performance computing, and software design and development.
Ettore Stella received his degree (cum laude) in computer science from the University of Bari, Bari, Italy, in 1984. He is the scientific chief of several research projects. He is a coauthor of more than 100 papers in international journals and proceedings of conferences, book chapters, and international patents. From a professional point of view, he has certified experience in industrial automation, robotics, computer vision, high-performance computing, and software design and development.