1. Introduction
Multimedia Information Retrieval (MIR) (Datta, Joshi, Li, & Wang, 2008; Lew, 2012; Lew, Sebe, Djeraba, & Jain, 2006; Yoshitaka & Ichikawa, 1999) refers to the research endeavor that centers on searching for knowledge in multimedia data. In recent decades, substantial progress has been made in different areas of MIR research, such as multimedia feature extraction (Hu, Xie, Li, Zeng, & Maybank, 2011; Tuytelaars & Mikolajczyk, 2008), learning and semantics (Atrey, Hossain, El Saddik, & Kankanhalli, 2010; Clinchant, Ah-Pine, & Csurka, 2011; Wang & Hua, 2011), and high-performance indexing and querying (Moise, Shestakov, Gudmundsson, & Amsaleg, 2013; Scherp & Mezaris, 2013; Shestakov, Moise, Gudmundsson, & Amsaleg, 2013; Mohamed & Marchand Maillet, 2012). As shown by recent surveys (Datta et al., 2008; Lew, 2012; Lew et al., 2006; Yoshitaka & Ichikawa, 1999), since the year 2000 MIR research efforts have grown tremendously, both in the number of researchers and practitioners involved and in the number of research papers published. As a result of this progress in MIR research and applications, many related software packages, libraries, and systems have been developed and evaluated using a wide range of multimedia data. Some prominent examples include GIFT (the GNU Image-Finding Tool) (CVML, 2007), FIRE (the Flexible Image Retrieval Engine) (Deselaers et al., 2010), Caliph & Emir (Lux, 2009), LIRE (Lucene Image Retrieval) (Savvas & Chatzichristofis, 2008), and ImageTerrier and OpenIMAJ (Hare, Samangooei, Dupplaw, & Lewis, 2011). Although significant progress has been made in both MIR research and software development, in practice code reuse and system composition for MIR research remain very difficult. New systems built on top of existing MIR implementations are not optimized for efficiency and cannot be easily adapted for parallelization, which is essential for handling large multimedia data sets.
In addition, researchers often face a steep learning curve in understanding and appropriately using existing frameworks and packages, which serve a wide range of MIR purposes, before they can write a single line of code. Sometimes the learning cost is so high that researchers give up and write their own software packages instead; such practices cumulatively worsen the situation. Moreover, many components of MIR software libraries are sequential programs designed to run on shared-memory computer architectures. MIR experiments on large data sets are time-consuming and resource-intensive; they often take hours to days to complete, and some may even fail after exhausting main memory.