Mining Electoral Data for Effective Campaigns and E-Participation: A Case Study in Venezuela

Marlene Goncalves (Universidad Simón Bolívar, Venezuela), Francisco Castro (Universidad Central de Venezuela, Venezuela), Luis Alberto Vidal (Universidad Simón Bolívar, Venezuela), Maribel Acosta (Universidad Simón Bolívar, Venezuela & Karlsruhe Institute of Technology, Germany) and Maria-Esther Vidal (Universidad Simón Bolívar, Venezuela)
E-Democracy and E-Participation are sub-areas of E-Government that use Information and Communication Technologies (ICT) to strengthen democracy and to let ordinary people participate in defining the policies that affect their lives. In particular, general elections and the selection of presidential candidates are electoral events where ICT can facilitate constituency participation and provide a resource to influence how such events are carried out. The authors propose data mining and ranking techniques that analyze historical voting data and, based on citizens’ participation patterns in previous elections, identify regions where electoral campaigns need to be intensified. They illustrate the quality of their approach on Venezuelan electoral data and compare it with the results of an independent baseline study. Experimental results suggest that the authors’ techniques are able to predict the classification given by the baseline study, while being simpler and easily reproducible.
The original theoretical notion of democracy comprises universal concepts and theories that give people the power to participate in governmental decisions that affect their lives. Although the concept of democracy goes back to ancient Greece, its birth as a political system is a phenomenon of Western societies and modernity, a space in which Democracy, like Science, came to stand as a new way of life. But the breakdown of modern paradigms in the context of what has been called the “information age” has led governmental agencies and institutes to rethink their processes, highlighting the ongoing nature of change and, especially, the role played by ordinary citizens in political decision-making. This is conditioning the whole way of thinking about political systems in the West. As proposed by Castells (2000), the final realization of the potential productivity contained in the mature industrial economy has accelerated exponentially due to the fast growth of technological models based on universal information. These models have affected, and continue to affect, the way in which we perceive our environment. New challenges arise in exploiting the opportunities of globalization and the information age; thus, we must learn to master change, valuing citizens’ ideas and opinions.

Furthermore, globalization integrates social and political aspects into democracy today. The integration of electronics, information technology and telecommunications, known as Information and Communication Technologies (ICT), enhances the collection, storage, analysis and transformation of data into information. Additionally, communication through text, voice, images, and so on strengthens relationships between people regardless of their physical location. Today, although the Internet has become an emblem of ICT, two historic challenges remain. The first is to strengthen a logic system that describes dynamic real-world events, i.e., formalisms that facilitate the understanding of dynamic or static entities. The second is to develop tools to analyze the vortex of data being produced. In keeping with policies of transparency, these tools should be available to both political elites and citizens. Nevertheless, they are useless without mechanisms to analyze, interpret and manipulate data in a timely manner. Consequently, implementing a modern democracy raises a pressing concern of our time: the need for scalable mining tools able to analyze very large volumes of data.

In this paper we focus on the second challenge, i.e., the definition of ICT that support democracy; the proposed techniques provide the basis for implementing a transparent system that facilitates citizens’ participation. It is important to highlight that ICT go beyond the Internet and make available tools to move from the Information Society to the Knowledge Society. On the one hand, the Information Society is characterized by the ability of its actors (citizens, businesses and the State) to obtain and share any information instantly from anywhere. On the other hand, the Knowledge Society is reached when data and information are integrated into an approach that allows an efficient and effective use of valuable strategic knowledge to make decisions. Thus, ICT facilitate the implementation of approaches that improve the quality of processes as well as their transparency and simplicity.

We propose the construction of a multi-dimensional dynamic model that is able to explore large volumes of data (millions of records) very quickly (answers in seconds), with no limitations on data patterns or sources of information, and using very light hardware (a personal computer). In particular, we use this tool to identify citizens’ electoral patterns in historical voting data. The identified patterns are used to discover electoral regions where a given candidate has the potential to turn intended votes into actual votes. These selected regions are then considered during the design of effective electoral campaigns. Due to the simplicity of the model, the proposed techniques are easily applicable in countries where candidates have reduced budgets and resources have to be allocated effectively to the regions with the highest potential electoral value. We study the quality of the proposed techniques on two datasets that describe the results of two Venezuelan elections, and we compare our results with those of an independent study. Results suggest that our techniques identify up to 85% of the relevant electoral locations identified by the baseline study, and that the rankings produced by the two approaches are similar.
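To make the comparison against a baseline concrete, the following is a minimal sketch of how two ranked lists of electoral regions could be compared. It is an illustration only: the text does not state which measures were used, so both the overlap fraction and the Spearman rank correlation are assumptions, as are the function names and the region labels.

```python
def overlap_fraction(ours, baseline):
    """Fraction of the baseline's regions that also appear in our list."""
    if not baseline:
        raise ValueError("baseline must not be empty")
    ours_set = set(ours)
    return sum(1 for region in baseline if region in ours_set) / len(baseline)


def spearman_on_shared(ours, baseline):
    """Spearman rank correlation over regions present in both rankings.

    Assumes each list is ordered from most to least relevant and has no
    repeated regions (so there are no ties). Regions missing from either
    list are dropped, and the remaining ones are re-ranked from 0.
    """
    baseline_set = set(baseline)
    shared = [region for region in ours if region in baseline_set]
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared regions")

    rank_ours = {region: i for i, region in enumerate(shared)}
    shared_in_baseline_order = [r for r in baseline if r in rank_ours]
    rank_base = {region: i for i, region in enumerate(shared_in_baseline_order)}

    # Classic no-ties formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
    d_squared = sum((rank_ours[r] - rank_base[r]) ** 2 for r in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical region labels, not taken from the Venezuelan datasets.
ours = ["A", "B", "C", "D"]
baseline = ["A", "C", "B", "E"]

print(overlap_fraction(ours, baseline))    # share of baseline regions recovered
print(spearman_on_shared(ours, baseline))  # agreement of the two orderings
```

A value of the first measure close to 1 would mean that most of the baseline's regions were recovered, and a value of the second close to 1 would mean that the two approaches order the shared regions similarly.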

We summarize our contributions as follows:

