A Data Mining Approach to Advance Knowledge in Public Government: Profiling Households

Paola Annoni (University of Milan, Italy), Pieralda Ferrari (University of Milan, Italy) and Silvia Salini (University of Milan, Italy)
DOI: 10.4018/978-1-60566-230-5.ch007


Data mining is the process of ‘mining’ large quantities of data to extract useful information. It comprises a broad set of techniques that originated in different application fields to solve various types of problems. In this chapter the data mining approach is proposed for the characterization of household consumption in Italy. Household expenditure in Italy is a complex system. Every year the Italian National Bureau of Statistics (ISTAT) carries out a survey on the expenditure behaviour of Italian families. The survey covers household expenditures on durable and daily goods and on various services. The goal here is twofold: first, to describe the most important characteristics of family behaviour with respect to expenditures on goods and the usage of different services; second, to highlight possible relationships among these behaviours and explain them through the socio-demographic features of families. To this purpose, a series of statistical techniques is used in sequence, and the potential of the selected methods for addressing these kinds of issues is pinpointed. The study concludes that further investigation is needed to focus properly on service usage, characterizing, for example, the nature of the services investigated (private or public) and, above all, their supply and effectiveness across the national territory. Even so, the study may be considered an example of an operational, concrete approach to managing large data sets in the socio-economic sciences, from the definition of goals to the evaluation of results.
Chapter Preview


Over the last years, due to the development and diffusion of Information and Communication Technology, public administrations have been rapidly acquiring large quantities of data about the services they provide (e.g. education, health, public utilities) as well as about the characteristics of citizens: who they are, what they get, do, need, think and so on. These data can be fruitfully used by government to analyze, assess and improve interventions and services. One can think, for example, of the growing interest in the evaluation and comparison of public utility services, driven in part by privatization processes that have been carried out or are still in progress in many countries. In this case, strategic choices about how, when and at what cost to provide a public service, customer expectations regarding service availability and price, and workers' expectations regarding employment levels and pay are all crucial elements that have to be taken into account in policy decision making.

This shows that decision making in public management is undoubtedly complex and needs to be properly founded on scientific bases, which requires a good knowledge of statistical and computational methods. Data mining techniques therefore seem especially suitable for this kind of analysis, provided that data mining is understood as a process that, by mining large quantities of data, allows the discovery of new information that can be translated into decisions.

Born in the private sector as a tool to improve business processes, data mining soon became a competitive weapon that can help ensure the profitability of a company. On this point, the Nobel Prize winner Dr. Penzias commented in 1999 (see for example Groth, 2000): “Data mining will become much more important, and companies will throw away nothing about their customers because it will be so valuable. If you’re not doing this, you’re out of business”.

The most significant applications of data mining are in customer relationship management, quality control and forecasting in financial marketing, but the fields in which data mining techniques are now used successfully are numerous.

In spite of, or perhaps because of, its origin in the private sector, in recent years a need has been felt to extend this approach to the public sector. Nevertheless, there are some differences between the private and public settings. The most important is that public management has to take specific variables into account in the evaluation process: variables connected with social utility, and with risk and benefit for the community, are of primary importance and should be included in the analysis. This makes the analysis more complex, for example with regard to the public services mentioned above. These services are often intangible; customer satisfaction is not easy to quantify; the definition of specific targets may be difficult; and the comparison between different public services may not be simple, owing to different geographic locations, the heterogeneity of populations, the public nature of the services and so on. For this reason, the problems of public management are “intrinsically” multi-way, and standard techniques may fail to take advantage of this “richness” of data, whereas ‘ad hoc’ techniques and simultaneous models might be more suitable.

Furthermore, communication problems are also substantial in government management. Particular attention must be paid to reconciling the need for complex and sophisticated statistical methods with results that are easy to communicate and understand, because they must be shared by service providers, decision makers, operators, users and citizens.

Despite the relevance of decision support in the public sector, the measurement process has not yet been fully explored. This chapter is a proposal in this direction. It is devoted to demonstrating how data mining tools can be used to obtain information useful to government for making appropriate decisions in line with given aims of economic or social policy. The focus is on the analysis of Italian household expenditures following CRISP-DM, the CRoss Industry Standard Process for Data Mining (Perner, 2002). As is well known, the microeconomic theory of consumption tries to explain how a consumer spends his own income on goods and services and how his behaviour is oriented towards maximising his total utility (for a review see for example Blundell, 1988). Here the approach is completely different, since it is purely data driven and not based on specific econometric models. In this respect, the authors apply statistical models to make the ‘data speak’, with the final aim of exploring the data from different points of view and summarizing them into useful information.
