1. Introduction
Security is a major concern for enterprise IT infrastructures. Understanding its importance becomes critical as we process and analyze the massive amounts of data termed Big Data. This starts with understanding the data and the security policies that an organization associates with it. Security is particularly challenging when organizations move to the cloud to deploy their data.
Cloud computing allows end users to use resources such as hardware, software, and servers on demand, unlike grid and cluster computing, which are the traditional approaches to resource utilization. Enormous amounts of data now flood the Internet, and relational technologies have proved inadequate for storing and accessing data at this scale. To store huge volumes of data, most organizations, particularly social networking and e-commerce sites, are moving to the cloud to deploy their applications, but at increased security risk. These growing amounts of data, which are too big and complex to capture, store, process, and interpret, are referred to as Big Data. “It is characterized by the 4V’s, such as Volume, Velocity, Veracity and Variety”. The storage and analysis of such data can be made effective using NoSQL databases.
Much of today's data takes the form of Word documents, PDF files, and audio and video content. Relational databases may not be suitable for serving such data, and using them for scalable applications imposes heavy costs, which makes them less attractive for deploying large-scale applications in a cloud. An alternative approach is to use the emerging NoSQL databases, which typically are not fully ACID-compliant (ACID stands for Atomicity, Consistency, Isolation, and Durability). Atomicity requires that actions (reads and writes) either complete fully or not happen at all. Consistency ensures that only valid data is stored in the database. Isolation ensures that concurrent execution of actions results in the same system state that would be obtained if the actions were executed serially. Durability ensures that committed actions remain committed in the event of system failures. In contrast to relational databases, NoSQL supports structured, unstructured, and semi-structured storage of massive data on the order of petabytes (Gupta and Narsimha, 2015).
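To make the atomicity property concrete, the following is a minimal sketch using Python's built-in sqlite3 module, a relational engine chosen here only because it ships with Python. The accounts table, the names alice and bob, and the transfer helper are hypothetical illustrations, not part of the work described in this paper.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                # Validity rule (hypothetical): no negative balances.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical in-memory data set for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()
```

Calling transfer(conn, "alice", "bob", 500) fails the balance check after the first update has already run, so the rollback undoes that partial debit and the stored balances remain as they were before the call.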
Each NoSQL database is suited to a particular task, and it is essential to note that not every available type can handle all data formats and consistency models. For example, the column-oriented data store HBase is suitable for analyzing mission-critical data that relies on strict consistency, where changes made on one node must be reflected immediately on all the other nodes in a cluster. On the other hand, the key-value data store Cassandra is suitable for less critical data where eventual consistency is sufficient to make appropriate decisions, as changes made on one node need not propagate instantly to all the other nodes in a cluster. For example, if a Facebook user has purchased a brand new four-wheeler, it is not important for this message to reach everyone on the user's friends list immediately. Some NoSQL data stores fall under more than one category; Cassandra, for instance, is both a key-value and a column-oriented store. Hence the classification is taken to be a fuzzy one.
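The difference between the two consistency models can be illustrated with a toy in-memory simulation. This is only a sketch under simplifying assumptions: the Cluster class, its strict flag, and the sync method are hypothetical names invented for this illustration, and the code does not reflect how HBase or Cassandra actually replicate data.

```python
class Cluster:
    """Toy cluster of replicas, each modeled as a plain dictionary."""

    def __init__(self, size, strict):
        self.replicas = [{} for _ in range(size)]
        self.strict = strict
        # Updates accepted by one node but not yet applied elsewhere.
        self.pending = []

    def write(self, key, value, node=0):
        """Apply a write on one node, then replicate per the consistency model."""
        self.replicas[node][key] = value
        for index, replica in enumerate(self.replicas):
            if index == node:
                continue
            if self.strict:
                # Strict consistency: every node sees the change at once.
                replica[key] = value
            else:
                # Eventual consistency: the change is queued for later.
                self.pending.append((index, key, value))

    def read(self, key, node):
        """Read a key from one specific node; None if that node lacks it."""
        return self.replicas[node].get(key)

    def sync(self):
        """Deliver all queued updates, after which the replicas converge."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()
```

With strict=True, a value written on node 0 is readable from any other node straight away. With strict=False, a read from another node returns None until sync() has run, which mirrors the case of a status update that need not reach every friend instantly.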
In NoSQL databases, security is currently weak: authentication and encryption are almost nonexistent. Authentication, where it exists, is not enabled by default in most NoSQL data stores. External encryption tools cannot be used, and the data stores are vulnerable to SQL injection attacks. Based on user requirements for securing the chosen NoSQL data stores, this paper examines the levels at which security can be provisioned and sheds light on the security limitations, which can motivate the design of solutions to overcome them. As part of the proposed work, a framework is designed to secure web crawler applications for the selected NoSQL and relational data stores, and results are presented to show the effectiveness of the proposal, using an appropriate algorithm to preserve confidentiality, integrity, and availability. In this work, the NoSQL database Cassandra is chosen to store and analyze data from social networking services, such as Twitter streaming data, and to demonstrate the security aspect using a web crawler application.