Utilizing Sentence Embedding for Dangerous Permissions Detection in Android Apps' Privacy Policies

Rawan Baalous, Ronald Poet
Copyright: © 2021 | Pages: 17
DOI: 10.4018/IJISP.2021010109

Abstract

Privacy policy analysis relies on understanding sentence meaning in order to identify sentences of interest to privacy-related applications. In this paper, the authors investigate the strengths and limitations of sentence embeddings for detecting dangerous permissions in Android apps' privacy policies. A Sent2Vec sentence embedding model was trained on 130,000 Android app privacy policies. The terminology extracted by the sentence embedding model was then compared with a gold standard on a dataset of 564 privacy policies. This work seeks to provide answers for researchers and developers interested in extracting privacy-related information from privacy policies using sentence embedding models. It may also help regulators interested in deploying sentence embedding models to check privacy policies' compliance with government regulations and to identify points of inconsistency or violation.

Introduction

Android apps may collect, use, and share users' personal information for several purposes. To support users' privacy, Google requires apps that access users' personal information to post a privacy policy disclosing how the app handles that information and for what purposes (Google, 2017). Such policies are also intended to fulfill legal requirements to protect users' privacy (Wang et al., 2019). Privacy policies support users in privacy decision making by answering questions such as: What information will be collected from users? What will the collected information be used for? Which parties will the information be shared with? How long will the information be stored? When users accept a privacy policy, they agree to release their data under the conditions it specifies (Costante, Sun, Petković, & den Hartog, 2012).

Although privacy policies are the main source of information about companies' data handling practices, most users do not read them before using a service (Furnell & Phippen, 2012). The findings appear contradictory: studies show that users are concerned about their privacy, yet they often do not read privacy policies. One possible explanation is the complexity of reading policies. Although users would like to protect their privacy in principle, they feel that doing so is difficult in practice, and so they give up trying to retain control over their privacy. Moreover, actually reading every privacy policy one encounters looks like an impossible task (Steinfeld, 2016).

Automatic analysis of privacy policy documents can greatly aid the extraction of specific privacy information relevant to users' queries. However, such automatic analysis relies on understanding sentence meaning in order to identify sentences of interest to users' queries or privacy-related applications. Moreover, privacy policies are written in natural language and hence use a wide range of expressions to describe the information types they collect, use, and share. In contrast, Android Application Programming Interface (API) methods use limited terminology to describe the collected personal information (Hosseini, Qin, Wang, & Niu, 2018).

Figure 1 illustrates the variability of natural language expressions in Android apps' privacy policies. In the first sentence, the dangerous permissions that allow the app to access the user's phone number and address are combined into a more generalized data type, "contact information." In the second sentence, the same data type ("contact information") denotes the app's ability to access the user's address book, which is another dangerous permission. In the last sentence, the data type "contact information" encompasses the same two dangerous permissions as in the first sentence, in addition to a third dangerous permission that allows the app to access the user's accounts on the phone, such as social networking accounts. Privacy policies commonly use hypernyms (more general phrases that have subordinates) to describe their data practices (Bhatia, Evans, Wadkar, & Breaux, 2016). Using this relation throughout a privacy policy can cause multiple interpretations of the same data practice.

Figure 1. Variability of natural language expressions in Android apps' privacy policies
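To make the matching task concrete, the sketch below shows how a trained Sent2Vec model could link a policy sentence like those in Figure 1 to candidate dangerous permissions via embedding similarity. This is a minimal illustration, not the authors' exact pipeline: the model path, the permission phrase list, and the 0.6 threshold are assumptions, and the `sent2vec` calls follow the public EPFL Python wrapper.

```python
import numpy as np
import sent2vec  # EPFL sent2vec wrapper (pip install from github.com/epfml/sent2vec)

# Hypothetical artifact for illustration; the paper trains on 130,000 policies.
MODEL_PATH = "policies_sent2vec.bin"

# Plain-language descriptions of a few Android dangerous permissions
# (illustrative phrasings, not an official mapping).
PERMISSION_PHRASES = {
    "READ_CONTACTS": "read the user's address book and contacts",
    "GET_ACCOUNTS": "access the accounts registered on the device",
    "READ_PHONE_STATE": "read the phone number and device identifiers",
    "ACCESS_FINE_LOCATION": "access the user's precise location",
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

model = sent2vec.Sent2vecModel()
model.load_model(MODEL_PATH)

def match_permissions(policy_sentence: str, threshold: float = 0.6):
    """Return permissions whose description lies close to the policy
    sentence in embedding space; embed_sentence returns shape (1, dim)."""
    sent_vec = model.embed_sentence(policy_sentence.lower())[0]
    hits = []
    for perm, phrase in PERMISSION_PHRASES.items():
        score = cosine(sent_vec, model.embed_sentence(phrase)[0])
        if score >= threshold:
            hits.append((perm, round(score, 3)))
    return sorted(hits, key=lambda x: -x[1])

print(match_permissions("we may collect your contact information"))
```

Because policies favor hypernyms such as "contact information," a single sentence may clear the threshold for several permissions at once, which mirrors the multiple-interpretation problem noted above.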

Back in 1955, the project on artificial intelligence (AI) was introduced on the assumption that aspects of learning, or other features of intelligence, can be described so precisely that a machine can simulate them (McCarthy, Minsky, Rochester, & Shannon, 2006). Since then, numerous efforts have been made to build machines that work like humans and solve complex problems. A fundamental aspect of being human is the ability to compare things and discover how related they are. To this end, various machine learning models have been developed to compare semantic entities such as words and sentences (Harispe, Ranwez, Janaqi, & Montmain, 2015).
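Such models typically quantify relatedness by embedding each entity as a vector and comparing the vectors, most commonly with cosine similarity, as in the sketch above. A toy example on made-up three-dimensional vectors (real sentence embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Relatedness of two embedding vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fabricated vectors purely for illustration of the comparison step.
address_book = np.array([0.9, 0.1, 0.2])
contact_info = np.array([0.8, 0.2, 0.3])
location     = np.array([0.1, 0.9, 0.4])

print(cosine(address_book, contact_info))  # high: closely related concepts
print(cosine(address_book, location))      # lower: weakly related concepts
```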
