Outsourcing data storage to cloud infrastructure has become a popular solution for organizations and individual users because it offers numerous advantages over traditional on-premises storage. Encrypting data before outsourcing it is a common strategy for safeguarding confidentiality. However, searching for specified keywords in encrypted datasets is challenging in cloud computing settings, and downloading all the data from the cloud to decrypt it locally is impractical. Current search techniques focus on exact matches and simple pattern matching, which produce incomplete or irrelevant results. The proposed approach uses 4D hyperchaotic mapping and a deoxyribonucleic acid (DNA) encryption mechanism to make it very difficult to decrypt the encrypted data without the proper key, which helps create an effective and secure encryption scheme. Global vector (GloVe) word embedding is used to generate semantically aware search results in a searchable encryption technique that supports semantic top-k multi-keyword retrieval.
1. Introduction
As cloud computing technology develops, more and more documents are transferred to the cloud for both convenience and economic reasons, and they are encrypted beforehand because security and privacy concerns are emerging in the cloud environment. Keyword search over encrypted data is one possible method for providing secure information retrieval without compromising data privacy. The existing search algorithms, however, do not take the semantics of users' queries into account and are therefore unable to fully satisfy users' search intentions.
Global vector (GloVe) word embedding captures the semantic and syntactic connections between words and incorporates both local and global information into the word vector representation. Each word is represented as a vector of real numbers that reflects its meaning and context. The vectors are intended to be far apart for unrelated terms and close together for words that have similar meanings and are used in similar contexts. The embedding is learned by analyzing the statistics of word-to-word co-occurrences in a corpus (i.e., a collection of documents), and it is used to identify entities and determine similarity. Instead of only matching keywords, semantic search seeks to return items or information relevant to a query's meaning. The query is first converted into an embedding, and the embeddings of the words in the documents or information to be retrieved are then compared with it. The most appropriate results for the query in terms of its meaning are obtained by ranking the documents or information by their cosine similarity scores with the query embedding.
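The ranking step described above can be illustrated with a minimal sketch. It works on plaintext vectors only and ignores the encryption layer. The toy three-dimensional vectors, the averaging rule used to embed a list of words, and the names `embed`, `top_k`, `TOY_VECTORS`, and `DOCS` are all assumptions made for illustration, not part of the proposed scheme or real GloVe output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def embed(words, table):
    """Average the vectors of the known words (an assumed pooling rule)."""
    vectors = [table[w] for w in words if w in table]
    dim = len(next(iter(table.values())))
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

def top_k(query_words, documents, table, k):
    """Return the ids of the k documents most similar to the query."""
    query_vec = embed(query_words, table)
    scored = [
        (cosine_similarity(query_vec, embed(words, table)), doc_id)
        for doc_id, words in documents.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors invented for illustration (not real GloVe output).
TOY_VECTORS = {
    "cloud": [0.9, 0.1, 0.0],
    "server": [0.8, 0.2, 0.1],
    "storage": [0.85, 0.15, 0.05],
    "gene": [0.0, 0.9, 0.3],
    "protein": [0.1, 0.8, 0.4],
}
DOCS = {
    "doc_a": ["server", "storage"],
    "doc_b": ["gene", "protein"],
}

print(top_k(["cloud"], DOCS, TOY_VECTORS, 1))
```

In this toy example the query "cloud" ranks `doc_a` first even though the word "cloud" does not appear in it, which is the behavior that separates semantic search from exact keyword matching.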
The word embedding matrix is factorized using Singular Value Decomposition (SVD). Factorization reduces the dimensionality of the matrix and the sparseness of the embedding space. By factorizing the matrix with SVD, GloVe can produce high-quality word embeddings that capture the underlying relationships between words. The purpose of dimensionality reduction is to highlight the relationships between words and to reduce the impact of frequently repeated words.
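In standard notation, the factorization and the dimensionality reduction it enables can be written as follows. The symbols are assumptions introduced here for illustration: X denotes the matrix being factorized, and k is the target dimension.

```latex
X = U \Sigma V^{\top},
\qquad
X_k = U_k \Sigma_k V_k^{\top},
\qquad k < \operatorname{rank}(X)
```

Here U_k and V_k keep the first k columns of U and V, and Σ_k keeps the k largest singular values. By the Eckart-Young theorem, X_k is the best rank-k approximation of X in the Frobenius norm, and the rows of U_k Σ_k can serve as k-dimensional word vectors.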
Deoxyribonucleic acid (DNA) encryption converts each letter of the alphabet into a combination of the four bases that make up DNA, the molecule that carries genetic information: adenine (A), cytosine (C), guanine (G), and thymine (T). In DNA computing cryptography, information is concealed by first converting it to ASCII code (decimal format) and then to binary format. The binary sequence is then divided into groups of two digits. Finally, these groups are transformed into DNA code, with A representing 00, G representing 01, C representing 10, and T representing 11.
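The encoding chain described above (text to ASCII, ASCII to binary, binary to two-bit groups, groups to bases) can be sketched as follows. The function names `text_to_dna` and `dna_to_text` are hypothetical, and the sketch covers only the encoding step with the single mapping given in the text. On its own it provides no secrecy.

```python
# Mapping stated in the text: A = 00, G = 01, C = 10, T = 11.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def text_to_dna(text):
    """ASCII text -> 8-bit binary string -> 2-bit groups -> DNA bases."""
    bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_text(dna):
    """Inverse of text_to_dna: DNA bases -> bits -> bytes -> ASCII text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")

# "H" is 72 = 01001000 -> 01 00 10 00 -> GACA
# "i" is 105 = 01101001 -> 01 10 10 01 -> GCCG
print(text_to_dna("Hi"))               # GACAGCCG
print(dna_to_text(text_to_dna("Hi")))  # Hi
```

Each ASCII character becomes exactly four bases, so the DNA sequence is four times as long as the input text.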
For each session, the plaintext is divided into two equal parts and translated to DNA sequences using a different set of encoding tables. The ciphertext is generated after all steps of the technique have been applied. During encryption, any type of digital data is first binarized and then transformed into DNA through sequencing, reshaping, encrypting, crossing over, mutating, and finally reshaping again. The primary steps of DNA encryption are repeated three or more times. The encrypted data are transmitted as text files.
By applying genetic operators such as reshaping, crossover, and mutation, DNA encryption can produce a very large number of potential DNA sequences, making it challenging for attackers to break the encryption and recover the original data. The reshaping process creates the chromosomal population sequence. Crossover and mutation create new DNA sequences from existing ones. Two crossover operations are considered: rotate crossover, in which the operator combines the DNA sequences of two parents through a series of rotation and crossover operations to produce a new offspring sequence, and single-point crossover. Mutation alters one or more nucleotides of a DNA sequence at random. In the first form of mutation, bits are flipped from 0 to 1 or vice versa. In the second form, the DNA base tables are also changed arbitrarily.
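The crossover and bit-flip mutation operators can be sketched as below. The text does not specify the exact rotate crossover procedure, so `rotate_crossover` shows only one assumed reading (rotate both parents cyclically, then apply single-point crossover). All function names and parameters are hypothetical. The positions and shifts are passed in explicitly because a decryptor would need to reproduce them, for example by deriving them from the key.

```python
def rotate_left(seq, shift):
    """Cyclically rotate a sequence to the left by `shift` positions."""
    if not seq:
        return seq
    shift %= len(seq)
    return seq[shift:] + seq[:shift]

def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length parents at index `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def rotate_crossover(parent_a, parent_b, shift, point):
    """Assumed reading of rotate crossover: rotate both parents, then cross."""
    return single_point_crossover(
        rotate_left(parent_a, shift), rotate_left(parent_b, shift), point
    )

def flip_bits(bits, positions):
    """First mutation form: invert the bits at the given positions."""
    chars = list(bits)
    for i in positions:
        chars[i] = "1" if chars[i] == "0" else "0"
    return "".join(chars)

print(single_point_crossover("AAAA", "TTTT", 1))  # ('ATTT', 'TAAA')
print(rotate_crossover("ACGT", "TTTT", 1, 2))     # ('CGTT', 'TTTA')
print(flip_bits("0101", [0, 3]))                  # 1100
```

Each operator is invertible when its parameters are known: applying `flip_bits` twice with the same positions restores the input, and single-point crossover applied again at the same point restores the parents.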