A Method of Identifying User Identity Based on Username Features

Shailesh Pancham Khapre (SRM Institute of Science and Technology, Chennai, India), Vishakha Chauhan (SRM Institute of Science and Technology, Chennai, India), Vinam Tomar (SRM Institute of Science and Technology, Chennai, India), R P. Mahapatra (SRM Institute of Science and Technology, Chennai, India) and Nidhi Bansal (SRM Institute of Science and Technology, Chennai, India)
Copyright: © 2017 |Pages: 22
DOI: 10.4018/IJHCR.2017100101

Abstract

Users typically register with multiple web applications, so there are many duplicate user identities on the Internet. Identifying and integrating these duplicate identities is of great importance in both the commercial and network security fields. Because of privacy considerations, the personal information provided on the Internet is usually incomplete or partially false. Considering that a username can reflect the user's personality or habits, is easy to obtain, and does not raise privacy problems, this article proposes a method to determine the identity of a user by the username alone. First, the user identity problem is described formally. The username features are then divided into two categories, intuitive features and contrastive features, and the probability distribution of these features is analysed quantitatively. Next, the article proposes a method for determining whether given usernames belong to the same user. Finally, a method is proposed to retrieve, from a candidate set of usernames, other usernames that may belong to the same user when a single username is given.

1. Introduction

With the rapid development of the Internet, the number of Internet users keeps growing. As of December 2016, the global number of Internet users had reached 3.5 billion, or 46.1% of the world's population. At the same time, network applications have penetrated all aspects of life, and the number of registered users has grown rapidly. For example, as of the third quarter of 2016, the microblogging service Twitter averaged 317 million monthly active users, while Facebook had 1.79 billion monthly active users. On the Internet, people usually register with several different network applications for information access, chat, and other activities. Research shows (Bartunoy et al., 2012) that about 84% of Internet users have more than one social networking site account. As a result, there is a large number of duplicate user identities on the Internet, and integrating these duplicate identities has gradually become a meaningful research topic.

From a commercial point of view, identifying and integrating a user's duplicated information across multiple network applications helps network service providers gain a comprehensive understanding of their users and their identity characteristics, and thus provide better personalized service.

From a network security point of view, discriminating between duplicated users' information can help network security managers discover false or illegitimate identities, reducing the security risks caused by user anonymity and protecting legitimate users' rights and interests. In addition, this line of study has a wide range of applications in collaborative recommendation, information retrieval, and other fields.

However, because single sign-on (SSO) technology is not widely used and individual websites store their own user account information separately, researchers have to find various ways to obtain user account information. In general, when users register on a site, they need to provide some personal information such as a username, password, mailbox, age, and address. Intuitively, user accounts with similar registration information in different network applications may belong to the same person. But because of commercial interests and privacy protection requirements, obtaining detailed and accurate registration data is very difficult. In addition, for privacy and security reasons, the registration information that users provide is mostly incomplete or partly false, which adds to the difficulty of identifying users.

A username usually consists of English letters, digits, and special characters such as “-” and “_”, with a length between 4 and 26 characters. Compared with other registration information, the username is easier to obtain and does not reveal private user data. In addition, a username usually reflects the owner's identity or personality characteristics, such as name, origin, birth date, or preferences, and different usernames of the same user usually share similar characteristics, such as length and character combination. Therefore, this paper defines the problem of re-identifying user identity across multiple websites and proposes a method to determine user identity based on username characteristics.
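The surface characteristics mentioned above (length and character combination) can be illustrated with a minimal sketch. The function name `username_features`, the returned keys, and the restriction of special characters to “-” and “_” are assumptions made for this example; the sketch is not the feature set defined later in the paper.

```python
import string

# Assumption: only the two special characters named in the text are counted.
ALLOWED_SPECIALS = "-_"


def username_features(username: str) -> dict:
    """Compute simple per-username descriptors (illustrative only)."""
    length = len(username)
    letters = sum(ch in string.ascii_letters for ch in username)
    digits = sum(ch in string.digits for ch in username)
    specials = sum(ch in ALLOWED_SPECIALS for ch in username)
    return {
        "length": length,
        "letters": letters,
        "digits": digits,
        "specials": specials,
        # Characters outside the three classes described in the text.
        "other": length - letters - digits - specials,
        "digit_ratio": digits / length if length else 0.0,
        # The text reports typical lengths between 4 and 26 characters.
        "in_typical_length_range": 4 <= length <= 26,
    }


if __name__ == "__main__":
    print(username_features("john_smith88"))
```

Descriptors of this kind are computed from a single username; comparing them across two usernames is a separate step.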

1.1. The Main Contributions of this Paper are as Follows

  • 1. The problem of user identity identification is formally described, and a solution framework based on username features is proposed;

  • 2. The implicit features of usernames are extracted from several perspectives and classified into two categories, intuitive features and contrastive features, and the statistical distribution of these features is analysed quantitatively;

  • 3. A classification method is proposed for determining whether given usernames belong to the same user. Furthermore, when a single username is given, a method is proposed to find other potential usernames of that user in a username candidate set;

  • 4. The validity of the proposed method is verified on a large-scale real data set (48 million username pairs).
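To make the pairwise decision and the candidate retrieval task more concrete, the sketch below compares two usernames and ranks a candidate set against a query username. It is only an illustration under stated assumptions: the helper names `pair_features` and `rank_candidates` are hypothetical, and the similarity score from Python's standard `difflib.SequenceMatcher` is a stand-in for the paper's own features and classifier, which are not described in this section.

```python
from difflib import SequenceMatcher


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that two strings have in common."""
    count = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        count += 1
    return count


def pair_features(a: str, b: str) -> dict:
    """Pairwise descriptors for two usernames (hypothetical feature set)."""
    a_low, b_low = a.lower(), b.lower()
    return {
        "length_diff": abs(len(a) - len(b)),
        "shared_prefix": shared_prefix_length(a_low, b_low),
        # Stand-in similarity in [0, 1]; not the paper's measure.
        "similarity": SequenceMatcher(None, a_low, b_low).ratio(),
    }


def rank_candidates(query: str, candidates: list, top_k: int = 3) -> list:
    """Return the top_k candidates most similar to the query username."""
    scored = [(c, pair_features(query, c)["similarity"]) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    pool = ["jsmith1988", "alice_w", "john.smith"]
    for name, score in rank_candidates("john_smith88", pool):
        print(f"{name}: {score:.3f}")
```

In a real setting, pairwise descriptors like these would feed a trained classifier rather than a single similarity score; the ranking step here only shows the shape of the retrieval task.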

Section 2 introduces related research work. Section 3 gives a formal description of the user identity identification problem. Section 4 analyses the implicit features of usernames in detail, together with the statistical probability distribution of these features. Section 6 verifies the validity of the username features and the identity decision algorithm through a series of experiments on a real data set. Finally, Section 7 summarizes the paper.
