1. Introduction
The World Wide Web has become a huge, rapidly growing, diverse, dynamic and mostly unstructured data repository. It offers the user community a large amount of information related to business, education, healthcare and other day-to-day activities. In such a scenario, web users want an effective search mechanism that finds relevant information easily and precisely. However, much of the information returned by existing search engines and systems is not relevant to the user's query. In web services, service providers want to predict the behavior of users and to personalize information. This helps to reduce the number of queries and to design web sites suited to various groups of users. Application users who search the web want effective techniques that study the needs of common users and consumers. Most of them expect efficient techniques that help satisfy their search demands. Therefore, web mining has become an important and challenging research area, since it helps to retrieve relevant information from the web.
Web mining is the process of applying existing data mining techniques to automatically discover and extract useful information from web documents. The unstructured representation of web data poses a challenge for the data mining community, since it makes web mining more complex. Web mining research combines statistics, database management systems, information retrieval, artificial intelligence and psychology.
Although information on the web is distributed and decentralized, the WWW can logically be viewed as a single virtual document collection. In this scenario, fundamental techniques from traditional information retrieval (IR) research, such as term weighting and query expansion, are relevant to web document retrieval (Zahia & Mohamed, 2014). The important findings of traditional IR research, however, are not always applicable in the web environment. Web documents are so massive in size and so diverse in content, format, purpose and quality that they challenge the validity of previous research results, which are based on relatively small and homogeneous collections of text data. Moreover, some existing IR approaches that are applicable in theory are not well suited to implementation in the web environment. For instance, the size, distribution and dynamic nature of web information make it extremely difficult to construct a complete and effective data representation model for web data.
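As an illustration of the term weighting mentioned above, the following minimal sketch computes TF-IDF weights over a tiny document collection. TF-IDF is used here only as one common weighting scheme; the source does not state which scheme is meant, and the function name `tf_idf`, the whitespace tokenization and the sample documents are assumptions made for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Illustrative documents, not taken from the article.
docs = [
    "web mining extracts information from web documents",
    "information retrieval ranks documents for a query",
    "web personalization adapts pages to user interests",
]
weights = tf_idf(docs)
```

In this sketch a term that occurs in only one document, such as "mining", receives a higher weight than a term shared by several documents, such as "information". On a collection as large and heterogeneous as the web, document frequencies are much harder to estimate, which is one reason such findings do not transfer directly.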
On the internet, web pages are personalized based on user behavior and interests. Web personalization is very important in the area of information extraction. Before the arrival of the web, searching for information on a particular topic meant searching books in libraries and papers in printed form. In such a scenario, articles were classified manually and searching for relevant information was tedious. Since the arrival of the internet and the web, an enormous amount of information is provided to users for each query. More recently, web logs are maintained for registered users, which makes web personalization possible. Personalization techniques are used in a variety of applications to provide effective recommendations.
Web personalization can be performed based on user interests and on the social categories of web users, including country, region, religion, economic status and community. These attributes influence user interests in a few applications. For example, the purchase behaviour and food habits of individuals mostly depend on the above-mentioned attributes. When a user visits particular types of web pages, the user’s interests can be deduced from the pages viewed. In social networks, groups are formed based on user interests. Extracting relevant information has become a challenging task in recent years. In the formation of interest groups, web personalization helps to provide similar content to all participating group members.
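The idea of deducing a user's interests from the pages viewed can be sketched as a simple frequency count over a view log. This is only an illustrative assumption, not the method of the article: the function `infer_interests`, the page-to-category mapping and the sample log are all hypothetical.

```python
from collections import Counter

def infer_interests(viewed_pages, page_categories, top_n=2):
    """Rank categories by how often the user viewed pages in them."""
    counts = Counter(
        page_categories[page]
        for page in viewed_pages
        if page in page_categories  # skip pages with no known category
    )
    return [category for category, _ in counts.most_common(top_n)]

# Hypothetical page-to-category mapping and one user's view log.
page_categories = {
    "/recipes/pasta": "food",
    "/recipes/curry": "food",
    "/shop/laptops": "electronics",
    "/news/election": "politics",
}
viewed = [
    "/recipes/pasta",
    "/shop/laptops",
    "/recipes/curry",
    "/recipes/pasta",
    "/shop/laptops",
]
interests = infer_interests(viewed, page_categories)
```

Users whose inferred interests overlap could then be placed in the same interest group and served similar content. A practical system would likely also weigh recency, dwell time and the social attributes discussed above.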