Identifying Polarized Wikipedia Articles


Nikos Kirtsis (Patras University, Greece), Paraskevi Tzekou (Patras University, Greece), Jeries Besharat (Patras University, Greece) and Sofia Stamou (Patras University, Greece & Ionian University, Greece)
DOI: 10.4018/978-1-4666-2494-8.ch014


Wikipedia is one of the most successful worldwide collaborative efforts to put together user-generated content in a meaningfully organized and intuitive manner. Currently, Wikipedia hosts millions of articles on a variety of topics, supplied by thousands of contributors. A critical factor in Wikipedia’s success is its open nature, which enables everyone to edit, revise, and/or question (via talk pages) the article contents. Considering the phenomenal growth of Wikipedia and the lack of a peer review process for its contents, it becomes evident that both editors and administrators have difficulty in validating its quality on a systematic and coordinated basis. This difficulty has motivated several research works on how to assess the quality of Wikipedia articles. In this chapter, the authors propose the exploitation of a novel indicator for the Wikipedia articles’ quality, namely information credibility. In this respect, the authors describe a method that captures the polarized (i.e., biased) information across the article contents in an attempt to infer the amount of credible (i.e., objective) information every article communicates. This approach relies on the intuition that an article offering non-polarized information about its topic is more credible and of better quality compared to an article that discusses the editors’ (subjective) opinions on that topic.
Chapter Preview


Wikipedia is one of the most popular social media websites that enable users to create content collaboratively. As Wikipedia grows in both size and popularity, there is an urgent need for effective quality assessment methods that can vouch for the value of its contents. This need stems primarily from Wikipedia’s open nature, which allows everyone to contribute new content or modify existing content on a variety of topics, with no prerequisite that insertions and/or modifications undergo a peer review process. Wikipedia’s open nature has led to its remarkable growth, but at the same time it has raised skepticism about the quality of its contents, considering that anyone can become a Wikipedia editor. In light of the above, numerous researchers have attempted over the last few years to design methods and techniques that capture the article features that signify quality and can thus quantify the overall quality of Wikipedia (Stvilia, et al., 2005b; Blumenstock, 2008a).

Most existing Wikipedia quality assessment efforts estimate the articles’ value by studying their internal characteristics, such as their contextual elements (Stvilia, et al., 2005a), their linkage in the Wikipedia graph (Kamps & Koolen, 2009), their length (Blumenstock, 2008b), their factual accuracy (Giles, 2005), the formality of their language (Emigh & Herring, 2005), and many more. Additionally, over the last couple of years researchers have proposed methods for the automatic identification of controversial or vandalized Wikipedia articles (Vuong, et al., 2009; Potthast, et al., 2008), in an attempt to relieve administrators of the laborious process of manually removing malicious content from the Wikipedia collection and, at the same time, to help readers discriminate between commonly accepted and disputed content.

In this chapter, we build upon existing Wikipedia quality assessment efforts and propose a novel method for automatically identifying articles that need to undergo revision and/or repair in order for their contents to reach good quality levels. Our method applies text mining and lexical analysis to the Wikipedia article contents in order to first capture highly polarized content in the articles’ body and then deduce the credibility of the information that Wikipedia articles communicate to readers. In our work, the distinction between credible and polarized articles is defined as follows. A credible article is one that contains unbiased and objective information on the topic being discussed, whereas a polarized article is one that presents the personal viewpoints of its editors about the topic under discussion. Considering that Wikipedia is more than a Web 2.0 information source, we believe that the articles it hosts should communicate reliable and solid information and not serve as a forum for misleading and disputable content. Therefore, through our proposed technique we aspire to help Wikipedia administrators detect articles with subjective content and either repair them or flag them as polarized.
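
The general idea of flagging polarized articles through lexical analysis can be illustrated with a minimal sketch. This is not the chapter's method, whose details are not given in this preview; it assumes a simple lexicon lookup, and the word list `SUBJECTIVE_TERMS`, the function names, and the `threshold` value are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical, deliberately tiny list of opinion-bearing words (an assumption,
# not a resource named in the chapter). A realistic lexicon would be far larger.
SUBJECTIVE_TERMS = {
    "best", "worst", "terrible", "amazing", "clearly",
    "obviously", "shameful", "brilliant", "awful", "undoubtedly",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarization_score(text):
    """Return the fraction of tokens that appear in the opinion lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in SUBJECTIVE_TERMS)
    return hits / len(tokens)


def flag_polarized(articles, threshold=0.05):
    """Return titles whose score exceeds the (arbitrary, assumed) threshold.

    `articles` maps an article title to its body text.
    """
    return [
        title
        for title, body in articles.items()
        if polarization_score(body) > threshold
    ]
```

Under these assumptions, an article body such as "This is clearly the worst policy" would score well above the threshold, while a purely factual sentence with no lexicon words would score zero. A score of this kind only approximates subjectivity; it says nothing about factual accuracy.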
