Introduction
Due to the increasing diversity of web applications, systems, and components, system administrators and developers nowadays have the opportunity to select the software that best fits their needs based on quality attributes such as performance, usability, and security (Barbacci, 2003). However, as more types and brands of operating systems, web servers, database management systems (DBMS), and other classes of applications become available, selecting the most appropriate one(s) becomes less and less trivial. Consequently, interest in methods to accomplish fair and representative comparisons among software and systems with respect to these attributes has grown considerably (Bondavalli, 2009).
A benchmark is a standard procedure for assessing and comparing systems or components according to specific characteristics (e.g., performance, availability, security) (Gray, 1993). The computer industry maintains a well-established infrastructure for performance evaluation, in which the Transaction Processing Performance Council (TPC) (http://www.tpc.org) benchmarks are recognized as one of the most successful benchmarking initiatives. Furthermore, the concept of dependability benchmarking has gained ground in recent years, having already led to the proposal of dependability benchmarks for operating systems, web servers, databases, and transactional systems in general (Kanoun & Spainhower, 2005). Security, however, has been largely absent from previous efforts, in clear disparity to performance and dependability. In theory, a security benchmark would provide a metric (or small set of metrics) able to characterize the degree to which security goals are met in the system under testing (Payne, 2006), allowing developers and administrators to compare alternatives and make informed decisions. No clear methodology for accomplishing this has been proposed so far.
Traditional security metrics are hard to define and compute (Torgerson, 2007), as they involve making isolated estimations about the ability of an unknown individual (e.g., a hacker) to discover and maliciously exploit an unknown system characteristic (e.g., a vulnerability). While techniques to find, correct and prevent actual vulnerabilities flourish in the research community (Zanero, Carettoni, & Zanchetta, 2005), the lack of accurate and representative security metrics makes the conception of security benchmarking an extremely difficult task (Bondavalli, 2009).
An alternative way to tackle this problem is to look for metrics that systematize and summarize the trustworthiness that can be justifiably placed in a system or application. Instead of quantifying absolute security factors, trust-based metrics are grounded on the idea of quantifying the evidence available regarding the trustworthiness of the assessed application. However, as trust does not necessarily provide guarantees, security benchmarking can only be accomplished as a twofold process, with trustworthiness being the metric used for selecting among non-obviously flawed alternatives. In other words, a reliable benchmarking approach should provide a set of security guarantees by forcing the systems under evaluation to pass a set of basic security assessments before considering the trustworthiness aspect to support the final selection (e.g., in a web application benchmarking campaign, no application should present actual vulnerabilities detectable during testing; the applications that present no vulnerabilities are then ranked using a process like the one proposed in this paper).
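The twofold process described above can be sketched in a few lines of code. The sketch below is purely illustrative, not the benchmarking procedure proposed in this paper: the class name, fields, and scoring values are hypothetical, and it assumes that a prior vulnerability-detection phase and a trust-based scoring phase have already produced their results for each candidate.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A web application under evaluation (hypothetical structure)."""
    name: str
    vulnerabilities_found: int  # output of a prior vulnerability-detection phase
    trustworthiness: float      # trust-based score summarizing security evidence

def benchmark(candidates):
    """Twofold benchmarking: (1) discard any candidate with detectable
    vulnerabilities, then (2) rank the survivors by trustworthiness."""
    qualified = [c for c in candidates if c.vulnerabilities_found == 0]
    return sorted(qualified, key=lambda c: c.trustworthiness, reverse=True)

apps = [
    Candidate("app-A", vulnerabilities_found=0, trustworthiness=0.82),
    Candidate("app-B", vulnerabilities_found=2, trustworthiness=0.95),
    Candidate("app-C", vulnerabilities_found=0, trustworthiness=0.67),
]
ranking = benchmark(apps)
# app-B is excluded despite its high trust score, because trust-based
# ranking only applies to candidates that passed the security assessments.
```

Note that the filtering step dominates: a high trust score cannot compensate for an actual vulnerability detected during testing.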
Trust-based metrics allow characterizing "the degree to which security goals are met in the given system or component" by summarizing the amount of protection it provides in terms of security mechanisms, processes, configurations, procedures, and behaviors. In the web context, these metrics can actually be used in several scenarios, including: