Introduction
A wide variety of research has aimed to improve the relevance of the results returned by Web search systems. However, because queries are short and many words are homonyms, a user who wants to download the latest version of the Eclipse Integrated Development Environment (IDE) may be served content about the lunar eclipse or about the Eclipse movie, which had a large following for several seasons.
The browsing behavior of a user may change from one day to the next and from one period of time to another, depending on several parameters that together constitute what is called the context. Two of these parameters, the recent history and the real-time event, are exploited in this work with the goal of identifying the user's information need. The recent browsing history not only provides information about the user's interests (Asfari, Doan, Bourda, & Sansonnet, 2012; Mareth, Streicher, Bauer & Roller, 2010; Shen, Tan & Zhai, 2005); it can also be very useful for identifying the user's domain of study or expertise, which is referred to in the rest of this paper as the technical domain of interest.
Because the user's information need is not always satisfied at the first attempt, the user tries to express it again by reformulating the first query, taking into account the results that query returned. If those results were relevant, the user will probably seek to deepen his knowledge using keywords similar to those of the first search. If the search engine failed to meet the need, the user instead tries different keywords in order to get closer to his goal and to avoid being shown the previous pages again. In both scenarios, the user conducts more than one search about the same information need.
In their study of query repetition over time, Teevan, Adar, Jones and Potts (2007) found that 33% of search engine queries had been issued before by the same user. In the same research area, Sanderson et al. (2007) found that repeated queries represent a little over 50% of all submitted queries.
Google Insights for Search is a Google service that reports the search frequency of a given term on the Google search engine for a specific period of time and a particular geographic area. The frequency values of a list of keywords are always scaled so that the index one hundred corresponds to the maximum reached by the keyword with the largest traffic. According to this service, in June-July 2010 the keywords “world cup” topped the search list, owing to the particular event held during that period, the soccer World Cup 2010. In that period, the Google search engine, which is the most popular search engine on the Internet, recorded a very high submission frequency for queries like “world cup 2011”, “fifa world cup” and “soccer world cup”. Similarly, in June-July 2011, searches related to the keyword “potter”, such as “harry”, “harry potter online”, “harry potter 7”, “harry potter movie” and “download harry potter”, achieved high scores in July, the month in which the last movie in the Harry Potter series was released, as illustrated in Figure 1. For this reason, we assumed that the detection of a real-time event increases the probability that the user is searching for information about it. The real-time event has therefore been used in our work as a contextual dimension.
Figure 1. Users' search behaviors and real-time events
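The scaling of keyword frequencies described above, where the index one hundred is assigned to the maximum reached by the keyword with the largest traffic, can be illustrated with a minimal sketch. The function name `scale_to_index` and the raw counts are hypothetical and chosen only for illustration; this is not the service's actual procedure, only the proportional rescaling that the description implies.

```python
def scale_to_index(series_by_keyword):
    """Rescale raw counts so that the overall peak across all keywords is 100.

    Assumes every keyword maps to a non-empty list of non-negative counts.
    """
    # The peak is the largest value reached by any keyword in any period.
    peak = max(max(values) for values in series_by_keyword.values())
    if peak == 0:
        # No traffic at all: every index stays at zero.
        return {kw: [0] * len(values) for kw, values in series_by_keyword.items()}
    return {
        kw: [round(100 * value / peak) for value in values]
        for kw, values in series_by_keyword.items()
    }


# Made-up counts for three consecutive periods (illustrative only).
raw = {
    "world cup": [200, 800, 400],
    "potter": [80, 160, 80],
}
scaled = scale_to_index(raw)
print(scaled)
```

With these invented counts, the busiest period of the busiest keyword receives the index 100 and every other value is expressed relative to it, which is why a spike caused by a real-time event stands out against the other keywords.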