In this paper, a method is proposed for estimating the orientation of industrial parts. Classical 2D images of the part are used to train a deep neural network that infers the part pose as a quaternion. Another innovative point of this work is the use of synthetic data, generated on the fly during network training from a textured CAD model placed in a virtual scene. This approach overcomes the difficulty of obtaining pose ground truth from real images. At the same time, rendering the CAD model under several lighting conditions and material reflectances makes it possible to anticipate challenging industrial situations. As a first step of the method, the part is separated from the background by a semantic segmentation network. Then a depth image of the part is produced by an encoder-decoder network with skip connections. Finally, the depth map is combined with the local pixel coordinates to estimate the part orientation with a fully connected network trained with an SO(3) metric loss function. The method can estimate the part pose from real images with visually convincing results, suitable as input for any pose refinement process.
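One common SO(3) metric over quaternions is the geodesic (rotation-angle) distance; the sketch below shows this standard form, not necessarily the exact loss used in the paper. It accounts for the fact that q and -q encode the same rotation.

```python
import numpy as np

def geodesic_loss(q_pred, q_true):
    """Geodesic distance on SO(3) between two unit quaternions.

    Antipodal quaternions (q and -q) represent the same rotation, so the
    absolute value of the dot product is taken before the arccos.
    Returns the rotation angle in radians, in [0, pi].
    """
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_true = q_true / np.linalg.norm(q_true)
    dot = np.clip(abs(np.dot(q_pred, q_true)), 0.0, 1.0)
    return 2.0 * np.arccos(dot)
```

Identical or antipodal quaternions give a loss of zero, and a half-turn about any axis gives pi, which makes the measure suitable as a training objective for orientation regression.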
The work reported in this paper concerns the problem of mathematical expression recognition, a task
known to be very hard. We propose to alleviate the difficulty by taking into account two complementary
modalities: handwriting and audio. To combine the signals coming from both modalities, various fusion
methods are explored. Performance evaluated on the HAMEX dataset shows a significant improvement over
a single-modality (handwriting-only) system.
Currently, structural pattern recognizer evaluations compare graphs of detected structure to target structures
(i.e. ground truth) using recognition rates, recall and precision for object segmentation, classification and
relationships. In document recognition, these target objects (e.g. symbols) are frequently composed of multiple
primitives (e.g. connected components, or strokes for online handwritten data), but current metrics do not
characterize errors at the primitive level, from which object-level structure is obtained. Primitive label graphs
are directed graphs defined over primitives and primitive pairs. We define new metrics obtained by Hamming
distances over label graphs, which allow classification, segmentation and parsing errors to be characterized
separately, or using a single measure. Recall and precision for detected objects may also be computed directly
from label graphs. We illustrate the new metrics by comparing a new primitive-level evaluation to the symbol-level
evaluation performed for the CROHME 2012 handwritten math recognition competition. A Python-based
set of utilities for evaluating, visualizing and translating label graphs is publicly available.
Online handwritten data, produced with Tablet PCs or digital pens, consists of sequences of points (x, y). As
the amount of data available in this form increases, algorithms for retrieval of online data are needed. Word
spotting is a common approach used for the retrieval of handwriting. However, from an information retrieval
(IR) perspective, word spotting is a primitive, keyword-based matching and retrieval strategy. We propose a
framework for handwriting retrieval where an arbitrary word spotting method is used, and then a manifold
ranking algorithm is applied on the initial retrieval scores. Experimental results on a database of more than
2,000 handwritten newswires show that our method can improve the performance of a state-of-the-art word
spotting system by more than 10%.
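The re-ranking step can be sketched with the classic manifold-ranking iteration F <- alpha*S*F + (1-alpha)*y over a similarity graph; the affinity construction and the parameter values below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def manifold_rank(W, y, alpha=0.85, iters=50):
    """Propagate initial retrieval scores y over a similarity graph.

    W is a symmetric affinity matrix with zero diagonal; S is its
    symmetrically normalized version D^{-1/2} W D^{-1/2}. The iteration
    converges because alpha < 1 bounds the spectral radius of alpha*S.
    """
    d = W.sum(axis=1)
    d[d == 0] = 1.0                      # guard isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    f = y.astype(float).copy()
    for _ in range(iters):
        f = alpha * (S @ f) + (1 - alpha) * y
    return f
```

On a small path graph with a single seeded document, the scores diffuse so that documents close to the seed on the manifold are promoted even if their initial word-spotting scores were zero.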
To model a handwritten graphical language, spatial relations describe how the strokes are positioned in the 2-dimensional space. Most existing handwriting recognition systems use a set of predefined spatial relations. However, for a complex graphical language, it is hard to enumerate all the spatial relations manually. An alternative is to use a clustering technique to discover the spatial relations. In this paper, we discuss how to build a relational graph between strokes (nodes) labeled with graphemes in a graphical language. We then vectorize the spatial relations (edges) for clustering and quantization. As the targeted application, we extract the repetitive sub-graphs (graphical symbols) composed of graphemes and learned spatial relations. On two handwriting databases, a simple mathematical expression database and a complex flowchart database, the unsupervised spatial relations outperform the predefined ones. In addition, we visualize the frequent patterns on two text lines containing Chinese characters.
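The vectorization step could look like the sketch below: each stroke pair is mapped to a small descriptor that a clustering algorithm (e.g. k-means) can then quantize into unsupervised relation labels. The particular descriptor (bounding-box offsets plus a size ratio) is an illustrative assumption, not the paper's exact feature set.

```python
import numpy as np

def relation_vector(box_a, box_b):
    """Hypothetical spatial-relation descriptor for a stroke pair.

    Boxes are (x_min, y_min, x_max, y_max). The vector holds the offset
    of b's center relative to a's center, normalized by a's size, plus
    the relative size of b.
    """
    ax = (box_a[0] + box_a[2]) / 2.0
    ay = (box_a[1] + box_a[3]) / 2.0
    bx = (box_b[0] + box_b[2]) / 2.0
    by = (box_b[1] + box_b[3]) / 2.0
    scale = max(box_a[2] - box_a[0], box_a[3] - box_a[1], 1e-6)
    size_b = max(box_b[2] - box_b[0], box_b[3] - box_b[1])
    return np.array([(bx - ax) / scale, (by - ay) / scale, size_b / scale])
```

Clustering these vectors over a corpus yields relation categories ("right-of", "below", "inside", ...) without naming them in advance, which is the point of replacing predefined relations with learned ones.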
We propose in this paper a new online handwritten flowchart database and report first experiments that establish a
baseline benchmark on it. The collected database consists of 419 flowcharts labeled at the stroke and symbol
levels. In addition, an isolated database of graphical and text symbols was extracted from these collected flowcharts.
Then, we tackle the problem of online handwritten flowchart recognition from two different points of view. Firstly, we
consider that flowcharts are correctly segmented, and we propose different classifiers to perform two tasks, text/non-text
separation and graphical symbol recognition. Tested on the extracted isolated test database, we achieve up to 90% and
98% in text/non-text separation and up to 93.5% in graphical symbol recognition. Secondly, we propose a global
approach to flowchart segmentation and recognition. For the latter, we adopt a global learning scheme and a
recognition architecture that performs segmentation and recognition simultaneously. The global architecture is trained and
tested directly on flowcharts. Results show the interest of such a global approach, but given the complexity of the
flowchart segmentation problem, there is still much room to improve the global learning and recognition methods.
In this paper we propose a hybrid symbol classifier within a global framework for online handwritten mathematical
expression recognition. The proposed architecture aims at handling mathematical expression recognition as a
simultaneous optimization of symbol segmentation, symbol recognition, and 2D structure recognition under the
restriction of a mathematical expression grammar. To deal with the junk problem encountered when a segmentation
graph approach is used, we consider a two level classifier. A symbol classifier cooperates with a second classifier
specialized to accept or reject a segmentation hypothesis. The proposed system is trained with a set of synthetic online
handwritten mathematical expressions. When tested on a set of real complex expressions, the system achieves promising
results at both symbol and expression interpretation levels.
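The two-level idea can be sketched as a gate in front of the symbol classifier: a binary classifier first accepts or rejects each segmentation hypothesis, and only accepted stroke groups are scored. The classifier interfaces, threshold, and score combination below are hypothetical.

```python
def score_hypothesis(strokes, symbol_clf, junk_clf, junk_threshold=0.5):
    """Two-level scoring of a segmentation hypothesis.

    junk_clf(strokes) returns the probability that the stroke group forms
    a valid symbol; symbol_clf(strokes) returns (label, probability).
    Rejected groups return None so the search never expands them.
    """
    p_valid = junk_clf(strokes)
    if p_valid < junk_threshold:
        return None                       # reject: not a symbol candidate
    label, p_label = symbol_clf(strokes)  # best class and its probability
    return label, p_valid * p_label       # combined confidence
```

Filtering junk hypotheses before classification keeps the segmentation graph from being flooded with confidently mislabeled non-symbols, which is the failure mode the rejection classifier targets.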
Proc. SPIE. 7534, Document Recognition and Retrieval XVII
In this work, we propose to combine two quite different approaches for retrieving handwritten documents. Our
hypothesis is that different retrieval algorithms should retrieve different sets of documents for the same query.
Therefore, significant improvements in retrieval performances can be expected. The first approach is based on
information retrieval techniques carried out on the noisy texts obtained through handwriting recognition, while
the second approach is recognition-free, using a word spotting algorithm. Results show that for texts with
a word error rate (WER) below 23%, the performance obtained with the combined system is close to
that obtained on clean digital texts. In addition, for poorly recognized texts (WER > 52%), an
improvement of nearly 17% can be observed with respect to the best available baseline method.
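A simple way to combine a recognition-based IR system with a recognition-free word spotter is late fusion of their per-document scores; the min-max normalization and weighted sum below are illustrative assumptions, not the paper's exact combination rule.

```python
def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Late fusion of two retrieval systems over the same document list.

    Each score list is min-max normalized to [0, 1] so the two systems
    are comparable, then combined by a weighted sum.
    """
    def norm(s):
        lo, hi = min(s), max(s)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in s]
    na, nb = norm(scores_a), norm(scores_b)
    return [alpha * a + (1 - alpha) * b for a, b in zip(na, nb)]
```

The fusion helps precisely because the two systems err differently: a document missed by noisy recognition may still score well under word spotting, and vice versa.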
Writer identification is a topic of much renewed interest today because of its importance in applications such as writer
adaptation, routing of documents and forensic document analysis. Various algorithms have been proposed to handle such
tasks. Of particular interest are the approaches that use allographic features [1-3] to perform a comparison of the
documents in question. The allographic features are used to define prototypes that model the unique handwriting styles
of the individual writers. This paper investigates a novel perspective that takes alphabetic information into consideration
when the allographic features are clustered into prototypes at the character level. We hypothesize that alphabetic
information provides additional clues which help in the clustering of allographic prototypes. An alphabet information
coefficient (AIC) has been introduced in our study and the effect of this coefficient is presented. Our experiments
showed an increase of writer identification accuracy from 66.0% to 87.0% when alphabetic information was used in
conjunction with allographic features on a database of 200 reference writers.
Proc. SPIE. 7247, Document Recognition and Retrieval XVI
As new innovative devices that accept or produce on-line documents emerge, management facilities for these
kinds of documents, such as topic spotting, are required. This means that we should be able to perform text
categorization of on-line documents. The textual data available in on-line documents can be extracted through on-line
recognition, a process which introduces noise, i.e. errors, into the resulting text. This work reports experiments
on categorization of on-line handwritten documents based on their textual contents. We analyze the effect of the
word recognition rate on the categorization performance, by comparing the performance of a categorization
system over the texts obtained through on-line handwriting recognition and the same texts available as ground
truth. Two categorization algorithms (kNN and SVM) are compared in this work. A subset of the Reuters-21578
corpus consisting of more than 2000 handwritten documents has been collected for this study. Results show that
accuracy loss is not significant, and precision loss is only significant for recall values of 60%-80% depending on
the noise levels.
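The kNN branch of such a system can be sketched in a few lines: documents become bag-of-words vectors, and the majority label among the k most cosine-similar training documents is returned. This toy version (no TF-IDF weighting, no stop-word handling) is an illustrative assumption, not the experimental setup of the paper.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def knn_categorize(doc, labeled_docs, k=3):
    """Majority vote among the k training docs closest to doc.

    labeled_docs is a list of (text, label) pairs; doc may contain
    recognition errors, which kNN tolerates as long as enough correct
    words survive.
    """
    q = Counter(doc.split())
    ranked = sorted(labeled_docs,
                    key=lambda dl: cosine(q, Counter(dl[0].split())),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the vote depends on overall term overlap rather than any single word, a moderate word error rate degrades the similarity scores gradually rather than flipping the decision, which is consistent with the small accuracy loss reported above.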
Writer identification is a process which aims to identify the writer of a given handwritten document. Its implementation
is needed in applications such as forensic document analysis and document retrieval, which involve the use of offline
handwritten documents. With the recent advances of technology, the invention of digital pen and paper has extended the
field of writer identification to cover online handwritten documents. In this communication, a methodology is proposed
to solve the problem of text-independent writer identification using online handwritten documents. The proposed
methodology would strive to identify the writer of a given handwritten document regardless of its text contents by
comparing his or her handwritings with those stored in a reference database. The output of this process would be a
ranked list of the writers whose handwritings are stored in the reference database. The main idea is to use the distance
measurement between the distributions of reference patterns defined at the character level. Very few, if any, attempts
have been made at this character level. Two handwritten document databases, each with 82 online documents contributed by 82 subjects, were used in the
experiments. The reported result was a Top-1 accuracy of 95%. Only four writers were wrongly identified, their correct
identities being returned at ranks 2, 4, 5 and 12.
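The ranked-list output can be sketched as a nearest-distribution search: each writer is summarized by a histogram over character-level reference patterns, and candidates are sorted by distance to the query document's histogram. The chi-square distance used below is one common choice for comparing such distributions; the paper's exact measure is not specified here.

```python
def chi2_distance(h1, h2):
    """Chi-square distance between two normalized frequency histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def identify_writer(query_hist, reference_hists):
    """Rank reference writers by increasing distance to the query.

    reference_hists maps writer id -> histogram over character-level
    prototype frequencies; the first id in the result is the Top-1
    identification.
    """
    return sorted(reference_hists,
                  key=lambda w: chi2_distance(query_hist,
                                              reference_hists[w]))
```

Returning the full ranking, rather than only the best match, is what allows reporting results like "wrong at Top 1 but correct at rank 2", as in the experiments above.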