A Big Data Text Coverless Information Hiding Based on Topic Distribution and TF-IDF

Jiaohua Qin, Zhuo Zhou, Yun Tan, Xuyu Xiang, Zhibin He
Copyright: © 2021 | Pages: 17
DOI: 10.4018/IJDCF.20210701.oa4

Abstract

Coverless information hiding has become a hot topic in recent years. Because coverless steganography makes no modification to the carrier, existing steganalysis tools are ineffective against it. However, text-based coverless hiding has a relatively low hiding capacity. To address this, this paper proposes a big data text coverless information hiding method based on LDA (latent Dirichlet allocation) topic distribution and keyword TF-IDF (term frequency-inverse document frequency). First, the sender and receiver build a codebook, which involves word segmentation, computing word frequency and TF-IDF features, and LDA topic model clustering. The sender then splits the secret information into segments, converts them into keyword IDs through the keywords-index table, and searches for texts containing the secret information keywords. Second, the retrieved text is taken as the index tag according to the topic distribution and TF-IDF features. At the same time, random numbers are introduced to control the keyword order of the secret information.
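
The codebook and retrieval steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer stands in for real word segmentation, the TF-IDF variant (term frequency times log(N/df)) is one common definition that may differ from the paper's, and the LDA topic clustering, index tags, and random-number ordering are omitted entirely. The names `build_codebook` and `find_carrier` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for real word segmentation.
    return text.lower().split()


def build_codebook(corpus):
    """Return (keyword_to_id, tfidf) for a list of document strings.

    keyword_to_id maps every word in the corpus to an integer ID
    (a toy stand-in for the keywords-index table).
    tfidf is a list with one {word: score} dict per document.
    """
    docs = [tokenize(d) for d in corpus]
    vocab = sorted({w for doc in docs for w in doc})
    keyword_to_id = {w: i for i, w in enumerate(vocab)}

    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(w for doc in docs for w in set(doc))

    tfidf = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            w: (c / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        tfidf.append(scores)
    return keyword_to_id, tfidf


def find_carrier(keyword, tfidf):
    """Return the index of the document where keyword scores highest.

    Ties go to the earliest document. Returns None if no document
    contains the keyword.
    """
    best_index, best_score = None, -1.0
    for i, scores in enumerate(tfidf):
        if keyword in scores and scores[keyword] > best_score:
            best_index, best_score = i, scores[keyword]
    return best_index


# Hypothetical toy corpus shared by sender and receiver.
corpus = [
    "the river bank flooded after heavy rain",
    "the bank approved the loan",
    "rain fell on the quiet village",
]
keyword_to_id, tfidf = build_codebook(corpus)
carrier_for_bank = find_carrier("bank", tfidf)
```

In this sketch the carrier texts are only looked up, never modified, which is the property that makes the scheme coverless; how the selected text is tagged and how keyword order is recovered are left to the full method.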

1. Introduction

Information hiding technology, an important branch of information security, mainly exploits the perceptual redundancy of human sensory organs with respect to digital information to hide secret information in another information carrier, so that the carrier still shows its original characteristics. This information carrier can be any type of data, such as text, image, video, or audio (Cox, 2002, p. 225). Although the external features of the carrier are retained, part of the carrier's information must still be changed (Zhang, 2016, p. 475), which leaves such methods unable to effectively resist replay attacks, OCR technology, statistical analysis, and other steganographic detection tools.

Because existing information hiding technology needs to change the carrier information, scholars have in recent years proposed the concept of coverless information hiding. The main idea of this method is that it does not modify the carrier information; instead, it uses specific characteristic information in an existing open carrier to hide the secret information (Zhou Z, 2015, p. 123). Because it makes no modification to the carrier, it resists detection by various steganographic detection tools well. At present, research on coverless information hiding mainly focuses on two aspects: coverless information hiding based on images and on text (Qin J, 2019, p. 171373).

In terms of images, Zhou et al. (2016) proposed a coverless information hiding method based on the image bag-of-words model (p. 527), which used the bag-of-words model to extract visual keywords from each image and constructed a mapping relation library between the keywords of the text information and the visual keywords to hide information. Luo et al. (2020a; 2020b) introduced deep learning into coverless image steganography and used semantic features and image segmentation based on Mask R-CNN to hide information, which improved the robustness of the method. Liu et al. (2020) filtered images based on image retrieval of DenseNet features and used DWT to generate hash sequences of images (p. 105376), which improved the performance of steganography and expanded its application scope. Liu et al. (2018) combined coverless hiding with Generative Adversarial Networks (p. 371): they replaced the category tag in the Generative Adversarial Network with the secret information, used it to drive the generation of a classified image, and extracted the secret information from the classified image through the discriminator, thereby realizing coverless information hiding through image generation.

In terms of text, Zhang et al. (2017a; 2017b; 2018) proposed a coverless information hiding method based on a rank map. This method used the word rank map and word frequency as a distance measure to retrieve ordinary text containing the secret information from a text database, thereby realizing coverless information hiding. However, this method has a low hiding capacity: only one Chinese character can be hidden in each natural text. Chen et al. (2015, p. 133) proposed a coverless information hiding technology based on the mathematical expressions of Chinese characters (Sun, 2002, p. 707). This method first extracted the secret information vector from the secret information and then retrieved a text containing that vector from the big data text, so as to hide the secret information without any modification to the text. Zhou et al. (2016) proposed a coverless information hiding method based on multi-keywords to improve the capacity of hidden information (p. 39). The main idea is to hide the number of keywords in the text that carries the keywords. Although this method improved the hiding capacity to some extent, it did not make efficient use of the text when indexing the text database. Liu and Wu (2017a, 2017b) extracted all the component parts of Chinese characters and used part of speech to hide the number of keywords, improving the hiding capacity. Long et al. (2018) proposed a text coverless information hiding method based on word2vec (p. 463). This method used word2vec to obtain similar keywords; when text retrieval fails, a similar keyword can be substituted for the original keyword, so that the hiding success rate reaches 100% and the hiding capacity increases slightly. Lu et al. (2018) proposed a coverless information hiding method combining indirect transmission and a random codebook (p. 331) to solve the problem that coverless information hiding methods had a small hiding capacity and needed a large sample database. Although the methods in the above references have improved the hiding capacity, it remains relatively small and is difficult to reconcile with actual demand.
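
The similar-keyword fallback attributed above to Long et al. (2018) can be illustrated with a small Python sketch. This is a reading of the one-sentence description in the text, not their algorithm: the two-dimensional vectors are hand-made stand-ins for trained word2vec embeddings, the function name `nearest_available` is hypothetical, and the sketch does not address how the receiver would recover the original keyword.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_available(keyword, embeddings, available):
    """Return keyword if a carrier text exists for it; otherwise return
    the most similar keyword that does have a carrier text.

    Returns None when the keyword has no embedding or no candidate exists.
    """
    if keyword in available:
        return keyword
    if keyword not in embeddings:
        return None
    candidates = [w for w in available if w in embeddings]
    if not candidates:
        return None
    target = embeddings[keyword]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


# Hand-made toy vectors; real word2vec embeddings would be learned from a corpus.
embeddings = {
    "car": [0.9, 0.1],
    "automobile": [0.85, 0.15],
    "banana": [0.1, 0.9],
}
# Keywords for which retrieval from the text database succeeded (assumed).
available = {"automobile", "banana"}

substitute = nearest_available("car", embeddings, available)
```

Here retrieval for "car" is assumed to have failed, so the sketch falls back to the closest keyword by cosine similarity that does have a carrier text.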
