Privacy Protection in Enterprise Social Networks Using a Hybrid De-Identification System

Mohamed Abdou Souidi, Noria Taghezout

Source Title: International Journal of Information Security and Privacy (IJISP) 15(1)

DOI: 10.4018/IJISP.2021010107

Abstract

Enterprise social networks (ESNs) are widely used within organizations as a communication infrastructure that lets employees collaborate and share files and documents. These shared documents may contain a large amount of sensitive personal information, such as phone numbers, that must be protected against disclosure and unauthorized access. In this study, the authors propose a hybrid de-identification system that extracts sensitive information from textual documents shared in ESNs. The system combines a machine learning classifier with a rule-based classifier. The machine learning classifier uses the gradient boosted trees (GBT) algorithm; experiments on a modified CoNLL 2003 dataset show that it achieves an F1-score of 95%. The rule-based classifier, which complements the machine learning classifier, consists of regular expressions and gazetteers. The sensitive information extracted by the two classifiers is then merged and encrypted using a format-preserving encryption (FPE) method.
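
The last two steps of this pipeline, merging the outputs of the two classifiers and encrypting the merged spans, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: detected entities are represented as hypothetical `(start, end, label)` character spans, overlapping spans are merged by union, and the cipher is a toy unbalanced Feistel network over decimal digits in the spirit of NIST FF1. It is not FF1 itself, it is not the FPE method used in the paper (the abstract does not name one), and it should not be used to protect real data. It only handles digit-based identifiers such as phone numbers; letters would need an alphabet-aware variant.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def merge_spans(ml_spans, rule_spans):
    """Union two lists of (start, end, label) character spans.

    Overlapping spans are coalesced; the label of the span that starts first
    is kept (an illustrative rule, not necessarily the paper's).
    """
    merged = []
    for start, end, label in sorted(ml_spans + rule_spans):
        if merged and start < merged[-1][1]:
            first_start, first_end, first_label = merged[-1]
            merged[-1] = (first_start, max(first_end, end), first_label)
        else:
            merged.append((start, end, label))
    return merged


def _prf(key, rnd, half, width):
    """Round function: HMAC-SHA256 over (round, half), reduced to `width` digits."""
    digest = hmac.new(key, f"{rnd}|{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def _feistel(key, digits, rounds, decrypt):
    """Toy unbalanced Feistel network on a decimal string (not NIST FF1)."""
    if len(digits) < 2 or rounds % 2:
        raise ValueError("need at least two digits and an even number of rounds")
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in (reversed(range(rounds)) if decrypt else range(rounds)):
        m = u if i % 2 == 0 else v  # width of the half updated in round i
        if decrypt:
            c = (int(b) - _prf(key, i, a, m)) % (10 ** m)
            a, b = str(c).zfill(m), a
        else:
            c = (int(a) + _prf(key, i, b, m)) % (10 ** m)
            a, b = b, str(c).zfill(m)
    return a + b


def fpe_digits(key, text, decrypt=False, rounds=10):
    """Encrypt (or decrypt) only the ASCII digits of `text`; every other
    character stays in place, so the visible format is preserved."""
    digits = "".join(ch for ch in text if ch in DIGITS)
    out = iter(_feistel(key, digits, rounds, decrypt))
    return "".join(next(out) if ch in DIGITS else ch for ch in text)


if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # hypothetical key
    phone = "+1 (555) 014-2367"
    token = fpe_digits(key, phone)
    print(token)  # same layout as `phone`
    assert fpe_digits(key, token, decrypt=True) == phone
    print(merge_spans([(0, 5, "PER")], [(3, 8, "PHONE"), (10, 12, "LOC")]))
```

Keeping the non-digit characters in place is what makes the output format-preserving: an encrypted phone number still looks like a phone number, so downstream software that validates formats keeps working, while the key holder can recover the original with `decrypt=True`.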


Introduction

Social networks (SNs) have become an indispensable part of people's daily lives. In 2018, of the 4 billion internet users worldwide, more than 3 billion were active on social networks (We are social, 2018).

Enterprises, for their part, are adopting social networks to foster collaboration and information sharing among employees. An enterprise social network (ESN) is a platform for exchanges within collaborative environments in a professional context. Over the last decade, many platforms dedicated to this professional form of social networking have emerged.

However, employees share many types of documents, bills, and records on the enterprise social network, and these files contain a considerable amount of sensitive information and regulated data, such as credit card numbers, Social Security numbers, driver's license information, names, and nationalities.

Sensitive information is data that requires protection against any unauthorized disclosure or access. It comes in several types, such as Protected Health Information (HIPAA Journal, 2019), Personal Information (Identity Theft Protection Act, 2005), and Customer Record Information (Privacy of Consumer Financial Information, 2016).

Shared files that contain unprotected sensitive information raise several privacy concerns. Personal data can be traced back to an individual, which may result in identity theft or in the disclosure of information that the individual wishes to keep private. De-identification has emerged as an approach to protect the privacy of individuals and companies against these risks.

De-identification is the process of removing personally identifiable information from shared, generated, or archived data so that the remaining data is very difficult to trace back to an individual. It is not a single, simple method but a set of tools, techniques, and algorithms applied to different types of data. It protects the privacy of individuals and organizations while minimizing the risk of data exposure, and can thereby help organizations use information more effectively (Garfinkel, 2015).
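
To make the definition concrete, the following Python sketch applies one simple de-identification transformation: each detected span is replaced by a numbered category tag. This is pseudonymization by suppression, a different technique from the format-preserving encryption used in the proposed system. The `(start, end, label)` span format and the `[LABEL_n]` tag style are assumptions, and the spans are taken as given from some upstream detector.

```python
def pseudonymize(text, spans):
    """Replace each (start, end, label) span with a numbered tag such as [PER_1].

    The same surface string with the same label always receives the same tag,
    so co-reference survives while the identifier itself is removed.
    Spans are assumed to be non-overlapping.
    """
    tags, counters = {}, {}
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        key = (label, text[start:end])
        if key not in tags:
            counters[label] = counters.get(label, 0) + 1
            tags[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(tags[key])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "Alice emailed Bob; Alice then called 555-0142."
    spans = [(0, 5, "PER"), (14, 17, "PER"), (19, 24, "PER"), (37, 45, "PHONE")]
    print(pseudonymize(text, spans))
```

Reusing the same tag for the same string keeps the text readable (a reader can still tell that both mentions of Alice refer to one person), but that linkage is itself information an attacker could exploit; a stricter variant would assign a fresh tag to every mention.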

De-identification is therefore a powerful privacy protection tool that applies to many areas, such as big data, data mining, communication, and social networks, and in particular to textual data. Existing text de-identification applications employ two main groups of methods: pattern matching and machine learning (Meystre et al., 2014); some works combine both. Pattern-matching applications usually rely on human-defined patterns (regular expressions and gazetteers); they are easy to implement, tune, and use, and they require no training (tagged) data.
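
A pattern-matching detector of this kind can be sketched in Python with the standard `re` module. The patterns and gazetteer entries below are illustrative assumptions, not the rules used in the paper; real deployments need locale-specific, carefully tested expressions. When two rules match overlapping text, the sketch keeps the earliest, then longest, then highest-priority match.

```python
import re

# Illustrative patterns (assumptions); earlier entries win ties on overlap.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\+?\d[\d ().-]{6,}\d")),
]

# Hypothetical gazetteer: lower-cased entries mapped to entity labels.
GAZETTEER = {"acme corp": "ORG", "new york": "LOC", "paris": "LOC"}


def gazetteer_regex(entries):
    """One case-insensitive alternation; longer entries are tried first so a
    multi-word entry wins over a shorter entry that is its prefix."""
    alternatives = sorted((re.escape(e) for e in entries), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def rule_based_spans(text, patterns=PATTERNS, gazetteer=GAZETTEER):
    """Return non-overlapping (start, end, label) spans found by the rules."""
    candidates = []  # (start, end, priority, label)
    for priority, (label, regex) in enumerate(patterns):
        for m in regex.finditer(text):
            candidates.append((m.start(), m.end(), priority, label))
    if gazetteer:
        for m in gazetteer_regex(gazetteer).finditer(text):
            label = gazetteer[m.group().lower()]
            candidates.append((m.start(), m.end(), len(patterns), label))
    # Earliest start first, then the longest match, then the rule priority.
    candidates.sort(key=lambda c: (c[0], c[0] - c[1], c[2]))
    spans, last_end = [], 0
    for start, end, _priority, label in candidates:
        if start >= last_end:
            spans.append((start, end, label))
            last_end = end
    return spans


if __name__ == "__main__":
    text = ("Contact John at ACME Corp, Paris: john.doe@acme.com, "
            "+33 1 23 45 67 89, SSN 123-45-6789.")
    for start, end, label in rule_based_spans(text):
        print(label, repr(text[start:end]))
```

In the example sentence, "John" is not detected because it is not in the gazetteer; open-ended categories such as person names are where a trained classifier is needed.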

Machine learning (ML) applications, on the other hand, train a classifier on a labelled dataset to obtain a model that classifies each word of a text or document as either sensitive or non-sensitive. Such applications can be regarded as Named Entity Recognition (NER) applications, since many of the de-identified words are named entities such as names, places, and organizations. However, they need a large, high-quality annotated corpus to perform well.
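
The paper's machine learning classifier is a gradient boosted trees (GBT) model; the abstract does not specify its implementation or features. As a self-contained illustration only, the Python sketch below implements gradient boosting with depth-1 trees (stumps) under the logistic loss and applies it to word-level sensitive/non-sensitive labels. The three binary token features, the hyperparameters, and the toy sentences are all assumptions; in practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used, with richer features and deeper trees.

```python
import math


def token_features(tokens, i):
    """Hypothetical binary features for tokens[i]; not the paper's feature set."""
    tok = tokens[i]
    return [
        int(i > 0 and tok[:1].isupper()),      # capitalized, not sentence-initial
        int(any(ch.isdigit() for ch in tok)),  # contains a digit
        int("@" in tok),                       # e-mail-like token
    ]


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StumpGBT:
    """Gradient boosting with depth-1 trees (stumps) on 0/1 features,
    minimizing logistic loss with Newton-step leaf values."""

    def __init__(self, n_rounds=20, learning_rate=0.5):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps = []  # (feature index, score if x[j] == 0, score if x[j] == 1)

    def fit(self, X, y):
        rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base = math.log(rate / (1 - rate))  # log-odds of the positive class
        scores = [self.base] * len(y)
        for _ in range(self.n_rounds):
            probs = [_sigmoid(s) for s in scores]
            resid = [t - p for t, p in zip(y, probs)]  # negative gradient
            hess = [p * (1 - p) for p in probs]       # second derivative
            best_gain, best = -1.0, None
            for j in range(len(X[0])):
                r_sum, h_sum, count = [0.0, 0.0], [0.0, 0.0], [0, 0]
                for row, r, h in zip(X, resid, hess):
                    r_sum[row[j]] += r
                    h_sum[row[j]] += h
                    count[row[j]] += 1
                if 0 in count:
                    continue  # feature is constant on the training data
                # Squared-error reduction of the split, up to a constant term.
                gain = r_sum[0] ** 2 / count[0] + r_sum[1] ** 2 / count[1]
                if gain > best_gain:
                    leaves = [self.learning_rate * r_sum[s] / max(h_sum[s], 1e-12)
                              for s in (0, 1)]
                    best_gain, best = gain, (j, leaves[0], leaves[1])
            if best is None:
                break
            self.stumps.append(best)
            j, v0, v1 = best
            scores = [s + (v1 if row[j] else v0) for s, row in zip(scores, X)]
        return self

    def predict_proba(self, x):
        """Probability that a token with feature vector x is sensitive."""
        score = self.base + sum(v1 if x[j] else v0 for j, v0, v1 in self.stumps)
        return _sigmoid(score)


if __name__ == "__main__":
    sentences = [["Call", "Alice", "at", "5550142", "."],
                 ["Send", "it", "to", "bob@corp.example", "."]]
    labels = [[0, 1, 0, 1, 0], [0, 0, 0, 1, 0]]
    X = [token_features(s, i) for s in sentences for i in range(len(s))]
    y = [t for sent in labels for t in sent]
    model = StumpGBT().fit(X, y)
    print(model.predict_proba(token_features(["Ask", "Omar", "today"], 1)))
```

Each round fits one stump to the current residuals (the negative gradient of the logistic loss) and sets each leaf with a Newton step, the sum of residuals divided by the sum of p(1 - p); the learning rate shrinks every step so that later stumps can correct earlier ones.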

Over the years, many annotated corpora have appeared: the Conference on Computational Natural Language Learning (CoNLL) 2003 shared task dataset (Tjong Kim Sang & De Meulder, 2003); the Informatics for Integrating Biology and the Bedside (i2b2) 2009 dataset (Uzuner, Solti, & Cadag, 2010) and 2014 Track 1 dataset (Stubbs, Kotfila, & Uzuner, 2015); the ShARe/CLEF eHealth Evaluation Lab 2013 dataset (Suominen et al., 2013); and the Semantic Evaluation (SemEval) 2014 Task 7 (Pradhan, Elhadad, Chapman, Manandhar, & Savova, 2015) and 2016 Task 12 datasets (Bethard et al., 2016). These corpora have contributed to the development of text de-identification systems.
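
Corpora such as CoNLL 2003 store one token per line (token, part-of-speech tag, chunk tag, NER tag), with blank lines between sentences and `-DOCSTART-` lines between documents. The Python sketch below reads that layout, collapses NER tags into binary sensitive/non-sensitive word labels, and computes entity-level precision, recall, and F1 with exact span-and-type matching. The binary mapping is a hypothetical example, not the modification the authors applied to CoNLL 2003, and the abstract does not say whether its reported F1 is computed per token or per entity.

```python
def read_conll2003(lines):
    """Parse CoNLL-2003-style lines into sentences of (token, ner_tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:  # a blank or document line closes the open sentence
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))  # token is first, NER tag last
    if current:
        sentences.append(current)
    return sentences


def to_binary(sentences, sensitive_types=("PER", "LOC", "ORG", "MISC")):
    """Hypothetical relabeling: 1 if the token belongs to a sensitive entity type."""
    return [
        [(tok, int(tag != "O" and tag.split("-", 1)[-1] in sensitive_types))
         for tok, tag in sent]
        for sent in sentences
    ]


def entity_spans(tags):
    """(start, end, type) token spans from IOB1 or IOB2 tags."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last entity
        prefix, _, t = tag.partition("-")
        if tag == "O" or prefix == "B" or t != etype:
            if etype is not None:
                spans.append((start, i, etype))
            start, etype = (None, None) if tag == "O" else (i, t)
    return spans


def prf1(gold_sents, pred_sents):
    """Entity-level precision, recall and F1 over lists of tag sequences."""
    gold = {(n, *s) for n, tags in enumerate(gold_sents) for s in entity_spans(tags)}
    pred = {(n, *s) for n, tags in enumerate(pred_sents) for s in entity_spans(tags)}
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sample = ["-DOCSTART- -X- -X- O", "",
              "EU NNP B-NP B-ORG", "rejects VBZ B-VP O",
              "German JJ B-NP B-MISC", "call NN I-NP O", ". . O O", ""]
    sents = read_conll2003(sample)
    print(to_binary(sents))
    gold = [[tag for _, tag in s] for s in sents]
    print(prf1(gold, [["B-ORG", "O", "O", "O", "O"]]))  # misses the MISC entity
```

Exact matching means a prediction that gets the boundaries right but the type wrong (ORG instead of LOC, say) counts as both a false positive and a false negative, which is why entity-level scores are usually lower than token-level ones on the same output.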
