Content Based Search Engine for Historical Calligraphy Images

Xiafen Zhang, Vijayan Sugumaran
Copyright: © 2014 | Pages: 18
DOI: 10.4018/ijiit.2014070101

Abstract

Paper collections of historical calligraphy objects in libraries and museums are scanned into document images to serve the academic community. However, these digitized collections are in image format, and the technology to search them by image content has been lacking. This paper proposes a search engine for calligraphy image content. First, 2,503 page images are segmented into characters and components. Second, the characters are interactively labeled and features are extracted to build a calligraphy database. When an image search query is submitted, coarse features are extracted first and used to prune the long list of calligraphy characters into a shorter one; fine shape features are then employed to determine the most similar characters. iDistance and NB-Tree are used to build the high-dimensional index. The efficiency of the algorithm has been demonstrated through experiments with 110,737 individual calligraphic character images. This research demonstrates the potential of calligraphy content search on the web.
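To make the retrieval flow concrete, the following is a minimal Python sketch of the coarse-to-fine matching described above. The feature vectors, database sizes and brute-force distance filtering are illustrative assumptions; the actual system relies on the shape features, iDistance and NB-Tree indexes reported in the paper, which are not reproduced here.

```python
# Minimal sketch of the coarse-to-fine search flow described in the abstract.
# The feature extractors and brute-force filtering below are illustrative
# stand-ins for the paper's shape features and iDistance/NB-Tree indexes.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical database: one coarse (low-dimensional) and one fine
# (high-dimensional) feature vector per segmented calligraphy character.
N_CHARS = 10_000
coarse_db = rng.random((N_CHARS, 8))    # e.g. global layout statistics
fine_db = rng.random((N_CHARS, 128))    # e.g. detailed shape descriptors

def search(query_coarse, query_fine, shortlist_size=200, top_k=10):
    """Two-stage search: prune with coarse features, rank with fine features."""
    # Stage 1: coarse pruning keeps only the closest `shortlist_size` characters.
    coarse_dist = np.linalg.norm(coarse_db - query_coarse, axis=1)
    shortlist = np.argsort(coarse_dist)[:shortlist_size]

    # Stage 2: fine shape matching re-ranks the shortlist only.
    fine_dist = np.linalg.norm(fine_db[shortlist] - query_fine, axis=1)
    order = np.argsort(fine_dist)[:top_k]
    return shortlist[order], fine_dist[order]

# Example query: in the real system these vectors would come from the
# segmentation and feature-extraction steps applied to the query image.
ids, dists = search(rng.random(8), rng.random(128))
print(ids, dists)
```

The two-stage design keeps the expensive fine-grained shape comparison restricted to a small shortlist, which is what allows the method to remain efficient over a collection of more than one hundred thousand character images.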
Article Preview

1. Introduction

Though printed characters are everywhere, the art of penmanship is still taught in elementary school in many countries. Calligraphy means beautiful writing, and the basic way of learning it is by copying, including direct copying, imagery copying and creative copying of outstanding calligraphy works. Outstanding historical calligraphy works, written on substrates such as stone, bamboo slips, silk scrolls and rice paper, are treasured in museums and libraries and are inaccessible to the public. Before digitizing technology emerged, many original works were copied on paper and organized into books. These copies can be classified into two types: regular script and stone rubbing. A regular-script copy is written directly on paper, while a stone rubbing is transferred from a stone inscription.

With the fast development of digitizing and storage technology, these calligraphy books, together with other paper books in libraries, are now being scanned into images to enable wide public access. One such effort is the China Academic Digital Associative Library (CADAL) (http://www.cadal.cn/zydt/index1402.htm), whose scanned collection includes 236,581 ancient books, and among these ancient documents are calligraphy works. Figure 1(a) shows the beginning (on the right) and end (on the left) of a regular-script work, with the seals of successive owners clearly visible. Similarly, Figure 1(b) shows the beginning (on the right) and end (on the left) of a stone rubbing work, with its footprints visible.

Figure 1. Scanned images of calligraphy works

These scanned documents are page images of characters. Printed character images can be converted into text by Optical Character Recognition (OCR) technology, but the same OCR technology cannot convert calligraphy characters into text: unlike printed characters, each instance of a calligraphy character is written differently. Generally, the problem of searching printed character images is handled by turning it into a text search, an approach widely used in existing search engines such as Google, Yahoo and Baidu. These engines currently search by text, not by image content, and searching Chinese calligraphy by image content is still in its infancy.
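For contrast, a sketch of the conventional printed-document pipeline, OCR followed by plain text search, is shown below. The Tesseract engine (used via pytesseract), the file name and the query string are assumptions introduced for illustration only; the point is that this pipeline presupposes reliable character transcription, which is exactly what fails for calligraphy.

```python
# Conventional pipeline for printed documents: OCR the page image into text,
# then search the text. Requires a local Tesseract install with the chi_sim
# model; the file path and query are hypothetical examples.
from PIL import Image
import pytesseract

page_text = pytesseract.image_to_string(
    Image.open("printed_page.png"),  # hypothetical scanned page
    lang="chi_sim",                  # simplified Chinese recognition model
)

query = "书法"                        # "calligraphy"
if query in page_text:
    print("query found via OCR + text search")
```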

Our goal is to provide support for content based calligraphy search at different levels, and the main objective of this paper is to develop a search platform for calligraphy images that cannot be converted to text by OCR technology. Our earlier work proposed a dynamic shape matching method for ranking similar Chinese calligraphy characters (Zhang & Zhuang, 2012). However, that work lacked a practical system for digital library and museum applications of calligraphy content searching at different levels.

The remainder of this paper is organized as follows: Section 2 discusses related research and Section 3 provides an overview of the architecture of the proposed search system. Section 4 describes calligraphy segmentation and feature data creation. Section 5 presents the search process and Section 6 discusses the implementation of the system. Experimental evaluation is presented in Section 7. Conclusions and future work are provided in the final section (Section 8).
