Introduction
A Web-warehouse combines two technologies: Web technology and data warehouse technology (Tan, Yen, & Fang, 2003; Ng et al., 1998). More fully, Web-warehousing is an approach to building systems whose primary objective is to identify, catalog, retrieve, store and analyze data available as text, graphics, images, sound, video and other multimedia forms, using web technologies, so that users can find and analyze information effectively (Martinez et al., 2008; Tan et al., 2003). Web technology is essentially Internet technology. Modern society lives in an era of digital information (Singh et al., 2017a), in which the Internet is the prominent source of information. This era has led to exponential and dynamic growth of data on the Internet, which now offers voluminous data (https://www.crcpress.com/Mining-Multimedia-Documents/Karaa-ey/p/book/9781138031722) for individual use, decision support and research. Consequently, most of the data used for inference, decision support systems and data analytics is retrieved from the Internet, transformed and stored.
Although the Internet is a prominent source of information and an open platform for sharing and retrieving data, the data available on the web is not properly structured, so a traditional data warehouse cannot accept it. Data analytics, however, needs more data for decision support, and this pushes the traditional data warehouse to evolve into a web warehouse (Zhu & Buchmann, 2002; Ng et al., 1998; Singh et al., 2017b).
To design a web-warehouse, the architect has to tackle many challenges that arise from the strict nature of the warehouse (Inmon, 2005; Ponniah, 2001) and the open nature of the Web. Web data is dynamic and complex, and there are millions of web sources available. Finding relevant and coherent data on the web is therefore like searching for a needle in a haystack. Thus, the very first task of the web-warehousing approach is to identify the relevant web sources that will serve as external data sources for the warehouse. To ascertain their relevance, web sources are assessed using numerous features. These features have been classified into three categories: web source stability, web data quality and contextual issues of web data (Zhu & Buchmann, 2002). Multi-criteria decision making (MCDM) (Velasquez & Hester, 2013; Triantaphyllou et al., 1998) is an approach for finding the best among a set of alternatives using multiple features. Before MCDM is explained in detail, the rest of this section describes the three feature categories a little further.
The first category, web source stability, reflects the fact that, alongside the large number of available web sources, web data changes randomly and frequently, and new web sources are continually being added to the web. An existing web source may therefore change or disappear (Zhu & Buchmann, 2002).
The second category concerns the quality of web data. Because the web is an independent and open platform, a large share of the data is not checked carefully before it is made available. As a result, inconsistent, ill-structured, incomplete and incorrect data is often found on the web (Zhu & Buchmann, 2002).
The third category concerns the context of the data. Data available on the web is oriented toward browsing, not analytics. Here, context covers not only the challenges related to the relevance of the data, but also how easily the data and its metadata, such as data definitions and data derivations, can be extracted (Zhu & Buchmann, 2002).
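To make the idea of ranking web sources on several features concrete, the following is a minimal sketch of one simple MCDM technique, the weighted sum model. It is an illustration only: this section does not state which MCDM method is used, and the source names, the per-category scores and the weights below are all hypothetical values chosen for the example.

```python
from typing import Dict, List

# Hypothetical weights for the three feature categories (assumed; they sum to 1).
WEIGHTS: Dict[str, float] = {"stability": 0.3, "quality": 0.5, "context": 0.2}

# Hypothetical candidate web sources, each scored in [0, 1] per category,
# where a higher score is better. These numbers are invented for illustration.
SOURCES: Dict[str, Dict[str, float]] = {
    "source_a": {"stability": 0.9, "quality": 0.6, "context": 0.7},
    "source_b": {"stability": 0.5, "quality": 0.9, "context": 0.8},
    "source_c": {"stability": 0.7, "quality": 0.4, "context": 0.9},
}


def weighted_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate one source's per-category scores into a single value."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_sources(
    sources: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Return source names ordered from highest to lowest aggregate score."""
    return sorted(
        sources,
        key=lambda name: weighted_sum(sources[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_sources(SOURCES, WEIGHTS):
        print(name, round(weighted_sum(SOURCES[name], WEIGHTS), 2))
```

With these assumed numbers, source_b ranks first because data quality carries the largest weight. Changing the weights changes the ranking, which is why the choice of weights is itself part of the decision problem.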