1. Introduction
In recent years, big data technologies have emerged alongside the rapid growth of network and information technology. Through data mining algorithms, people can discover the laws and knowledge hidden in huge amounts of data, which is important for industrial development, social services, and many other fields (Chen et al., 2023). However, if data is provided directly to a third party, personal privacy information may be leaked, posing a serious threat to personal and property security. In addition, if the data miner cannot provide sufficient privacy protection, some users will refuse to provide data due to a lack of trust, and the data miner will be unable to mine accurate information due to a lack of data. Therefore, it is necessary to design a secure privacy protection scheme.
Many privacy protection schemes already exist, such as data anonymization, data encryption, and data perturbation (Cormode et al., 2021). Differential Privacy is a privacy protection technique based on data perturbation that has become a hot research topic thanks to its rigorous mathematical proofs and quantifiable privacy protection capability (Zhang et al., 2023; Duan et al., 2022; Ren et al., 2022; Qian et al., 2022). In the big data environment, to prevent privacy attacks by untrustworthy third parties and by attackers with arbitrary background knowledge, sensitive information needs more comprehensive protection, and the Local Differential Privacy (LDP) (Duchi et al., 2013) technique has emerged. Locality refers to randomly perturbing user data before it leaves a smart device, such as a cell phone, and subsequently sending it to a third-party data collector; that is, the data collector obtains only part of the true data, while the data still retains a certain utility. Since it was formally proposed in 2013, LDP technology has been greatly developed and improved and is widely deployed in practical applications: companies such as Microsoft, Google, and Apple have embedded LDP in their products (Arcolezi et al., 2023).
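The local perturbation described above is commonly realized with a randomized response mechanism. As a minimal sketch (a standard textbook construction, not the personalized scheme proposed in this paper), generalized randomized response over a discrete domain of size k keeps the true value with probability e^ε / (e^ε + k − 1) and otherwise reports one of the other values uniformly at random, which satisfies ε-LDP:

```python
import math
import random

def grr_perturb(value, domain, epsilon):
    """Generalized randomized response: report the true value with
    probability p = e^eps / (e^eps + k - 1); otherwise report a
    uniformly chosen different value. This satisfies epsilon-LDP,
    since any two inputs produce any given output with probability
    ratio at most e^eps."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return value
    # Report a value drawn uniformly from the rest of the domain.
    others = [v for v in domain if v != value]
    return random.choice(others)
```

Because the perturbation probabilities are known, the data collector can debias the aggregated reports to estimate true frequencies, which is why the perturbed data retains utility even though no individual report can be trusted.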
With the frequent occurrence of privacy leakage incidents, users' awareness of privacy protection is increasing, and the demand for personalized privacy protection is growing; scholars have therefore proposed many personalized privacy protection methods (Niu et al., 2021; Ma et al., 2022; Li et al., 2022; Qian et al., 2022). Among the existing solutions, GR-PPFM (Guo et al., 2021) is the most relevant, as it guarantees the availability of perturbed data while providing personalized privacy protection for users. However, it ignores the fact that different privacy protection needs also exist among a user's data attributes and attribute values. For example, a home address requires a higher level of privacy protection than gender, and an infectious disease (HIV) requires a higher level of privacy protection than common diseases (flu, fever), so GR-PPFM has some limitations. Existing data collection schemes apply the same level of privacy protection to all sensitive data, ignoring both users' personalized trade-offs between data security and usability and the personalized differences among the attributes and attribute values of the data itself. To solve this problem, this paper designs a personalized random response algorithm based on local differential privacy: it determines the sensitivity level of user data through a scoring strategy, introduces the concept of sensitive weight for adaptive allocation of the privacy budget, and realizes personalized privacy protection of sensitive attributes and attribute values, ensuring data availability while meeting users' personalized needs. The main contributions of this paper are as follows: