In this article we will present an introduction to issues relevant to database security and statistical database security. We will briefly cover various security models, elaborate on how data analysis in data warehouses (DWH) might compromise an individual’s privacy, and explain which safeguards can be used to prevent attacks. In most companies, databases are an essential part of IT infrastructure since they store critical business data. In the last two decades, databases have been used to process increasing amounts of transactional data, such as, a complete account of a person’s purchases from a retailer or connection data from calls made on a cell phone. As soon as this data became available from transactional databases and online transactional processing (OLTP) became well established, the next logical step was to use the knowledge contained in the vast amounts of data. Today, data warehouses (DWH) store aggregated data in an optimal way to serve queries related to business analysis. In recent years, most people have begun to focus their attention on security. Early OLTP applications were mainly concerned with integrity of data during transactions; today privacy and secrecy are more important as databases store an increasing amount of information about individuals, and data from different systems can be aggregated. Thuraisingham (2002) summarizes the requirements briefly as “However, we do not want the information to be used in an incorrect manner.” All security requirements stem from one of three basic requirements: confidentiality (aka secrecy), integrity, and availability (CIA). Confidentiality refers to the requirement that only authorized subjects, that is, people or processes should be permitted to read data. Integrity means that unauthorized modifications must not be permitted. This includes both modifications by unauthorized people and incorrect modification by authorized users. To correctly perform the services requested, the system needs to remain available; a denial-of-service compromises the requirement of availability. Other security requirements may include privacy, non-repudiation, and separation of duties. These requirements are, however, composite requirements that can be traced back to one of the three basic requirements. Privacy, for instance, is the non-disclosure (=confidentiality) of personal data; non-repudiation refers to the integrity of transaction logs and integrity of origin. Throughout this article we will focus only on technical attacks and safeguards and not on social engineering. Social engineering is often the easiest and, in many cases, a very successful attack vector. For an in-depth coverage of social engineering we recommend (Böck, 2007). In Section 2 we cover the most relevant access control models; in Section 3 we provide an overview of security in statistical databases. Finally, in Section 4 we highlight the essentials of securing not only the transactional and the statistical databases but the entire system.
Access Control is the most important technique or mechanism for implementing the requirements of confidentiality and integrity. Since databases were among the first large-scale systems in military applications, there is a long history of security models, dating back to the 1960s. The basic principle in all access control models is that a subject is or is not permitted to perform a certain operation on an object. This process is described by the triplet (s, op, o). A security policy specifies who is authorized to do what. A security mechanism allows enforcement of a chosen security policy. (Figure 1)
Security control models (Adam, 1989). Restriction-based protection either gives the correct answer or no answer to a query (top); data may be modified (perturbated) before it is stored in the data warehouse or the statistical database (middle); online perturbation modifies the answers for each query (bottom).