Images are a much more powerful medium of expression than text, as the adage says: "One picture is worth a
thousand words." This is because, compared with text consisting of an array of words, an image has more degrees
of freedom and therefore a more complicated structure. However, this less constrained structure presents
researchers in the computer vision community with the tough task of teaching machines to understand and organize
images, especially when only a limited number of learning examples and little background knowledge are given.
The advance of Internet and web technology in the past decade has changed the way humans gain knowledge.
People can now exchange knowledge with others by discussing and contributing information on the web. As
a result, the web pages on the Internet have become a living and growing source of information. One is therefore
tempted to wonder whether machines can learn from this web knowledge base as well. Indeed, it is possible to
make computers learn from the Internet and provide humans with more meaningful knowledge.
In this work, we explore this novel possibility for image understanding, applied to semantic image search. We
exploit web resources to obtain two tools: links from images to keywords, and a semantic ontology that captures
general human knowledge. The former maps visual content to related text, in contrast to the traditional way of
associating images with surrounding text; the latter provides relations between concepts so that machines can
understand to what extent and in what sense an image is close to the image search query.
With the aid of these two tools, the resulting image search system is content-based and, moreover,
organized. The returned images are ranked and organized such that semantically similar images are grouped
together and given a rank based on their semantic closeness to the input query. The novelty of the system is
twofold: first, images are retrieved based not only on text cues but also on their actual content; second, the
grouping differs from pure visual similarity clustering. More specifically, the inferred concepts of each image
in a group are examined in the context of a large concept ontology to determine their true relations with what
people have in mind when performing an image search.