What Is Open Source Software (OSS) and What Is Big Data?

Richard S. Segall (Arkansas State University, USA)
DOI: 10.4018/978-1-7998-2768-9.ch001

Abstract

This chapter discusses what Open Source Software is, how it relates to Big Data, and how it and its development cycle differ from other types of software. Open source software (OSS) is a type of computer software whose source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose. Big Data are data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big Data can be discrete or a continuous data stream, and are accessible from many types of computing devices, ranging from supercomputers and personal workstations to mobile devices and tablets. The chapter discusses how fog computing can be combined with cloud computing for visualization of Big Data, and presents a summary of additional web-based Big Data visualization software.

Introduction: How Open Source Software, Free Software, And Freeware Differ

Open Source Software (OSS)

Open-Source Software (OSS) is a type of computer software whose source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose (Wikipedia, 2019a).

For software to be considered “Open Source”, it must meet ten conditions defined by the Open Source Initiative (OSI). The first three of these conditions are at the core of Open Source and differentiate it from other software. According to the Open Source Initiative (2007), they are:

  1. Free Redistribution: The software can be freely given away or sold.
  2. Source Code: The source code must either be included or freely obtainable.
  3. Derived Works: Redistribution of modifications must be allowed.

The other conditions are (Open Source Initiative, 2007):

  4. Integrity of the Author's Source Code: Licenses may require that modifications are redistributed only as patches.
  5. No Discrimination against Persons or Groups: No one can be locked out.
  6. No Discrimination against Fields of Endeavor: Commercial users cannot be excluded.
  7. Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed, without the need for execution of an additional license by those parties.
  8. License Must Not Be Specific to a Product: The program cannot be licensed only as part of a larger distribution.
  9. License Must Not Restrict Other Software: The license cannot insist that any other software it is distributed with must also be open source.
  10. License Must Be Technology-Neutral: The license must not require click-wrap or other medium-specific ways of accepting it.

Macaulay (2017) discussed benefits of open source software that are summarized in Figure 1 below.

Figure 1. Benefits of Open Source Software (OSS) (derived from Macaulay, 2017)

Key Terms in this Chapter

Heatmap: A heat map (or heatmap) is a graphical representation of data where the individual values contained in a matrix are represented as colors (Heatmap, 2017).

Circle Packing Graph: Circle packing is the study of the arrangement of circles (of equal or varying sizes) on a given surface such that no overlapping occurs and so that all circles touch one another (Circle Packing Graph, 2017).

Streaming Data: Data that is generated, collected, processed, or delivered continuously over time.

Data Measurement: The unit of measurement used to indicate the volume of data in modern computer storage devices.

Machine Learning: A process that gives a machine the ability to learn without being explicitly programmed.

Sunburst Chart: A ring chart, also known as a sunburst chart or a multilevel pie chart, is used to visualize hierarchical data, depicted by concentric circles. The circle in the center represents the root node, with the hierarchy moving outward from the center (Sunburst Graph, 2017).

Treemap: Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches (Treemap, 2017).

Streammap: A streamgraph, or stream graph, is a type of stacked area graph, which is displaced around a central axis, resulting in a flowing, organic shape (Streammap, 2017).

Open Source Software (OSS): A type of computer software whose source code is released under a license in which the copyright holder grants users the rights to study, change, and distribute the software to anyone and for any purpose (Wikipedia, 2019c).

Schema on Read: A data analysis strategy in which new data is stored without a predefined format, and a plan or schema is applied to it only when it is read.

Yottabyte: The yottabyte is a multiple of the unit byte for digital information. The prefix yotta indicates multiplication by the eighth power of 1000, or 10^24, in the International System of Units (SI), and therefore one yottabyte is one septillion (one long scale quadrillion) bytes. The unit symbol for the yottabyte is YB. 1 YB = 1000^8 bytes = 10^24 bytes = 1,000,000,000,000,000,000,000,000 bytes = 1000 zettabytes = 1 trillion terabytes (Yottabyte, 2017).

Free and Open Source Software (FOSS): Software that can be classified as both free software and open source software (Wikipedia, 2019a).

Big Data: Data that exceeds the capacity of commonly used hardware and software tools to capture, store, and analyze it within a tolerable elapsed time. The three main characteristics of big data are volume, variety, and velocity.

Parallel Coordinate Plot: A parallel coordinate plot maps each row in the data table as a line, or profile. Each attribute of a row is represented by a point on the line. This makes parallel coordinate plots similar in appearance to line charts, but the way data is translated into a plot is substantially different (Tibco, 2017).

Schema on Write: A data analysis strategy in which new data is transferred to a structured, predefined format as it is written.
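
The contrast between the Schema on Write and Schema on Read entries can be illustrated with a short sketch. It is only a minimal illustration using the Python standard library, not a method described in the chapter: the sample records, the field names (`user`, `clicks`, `device`), the `SCHEMA` mapping, and all function names are hypothetical and chosen for this example.

```python
import json

# Hypothetical raw records, as they might arrive from different sources.
RAW_RECORDS = [
    '{"user": "ana", "clicks": "3"}',
    '{"user": "bo", "clicks": 5, "device": "tablet"}',
    '{"user": "cy"}',
]

# Hypothetical predefined schema: field name -> type converter.
SCHEMA = {"user": str, "clicks": int}


def write_with_schema(raw_lines):
    """Schema on write: convert each record to the fixed format before storing it."""
    store, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            store.append(
                {name: convert(record[name]) for name, convert in SCHEMA.items()}
            )
        except (KeyError, ValueError, TypeError):
            rejected.append(line)  # record does not fit the predefined format
    return store, rejected


def write_raw(raw_lines):
    """Schema on read, storage step: keep every record exactly as received."""
    return list(raw_lines)


def read_with_schema(raw_store, field, convert, default=None):
    """Schema on read, query step: interpret the raw data only when it is read."""
    values = []
    for line in raw_store:
        record = json.loads(line)
        values.append(convert(record[field]) if field in record else default)
    return values


structured, rejected = write_with_schema(RAW_RECORDS)
raw_store = write_raw(RAW_RECORDS)
clicks = read_with_schema(raw_store, "clicks", int, default=0)

print(structured)     # records that fit the schema, extra fields dropped
print(len(rejected))  # records refused at write time
print(clicks)         # values interpreted at read time
```

In this sketch the schema-on-write path refuses the record that lacks a `clicks` field and discards the extra `device` field at storage time, while the schema-on-read path keeps all three records untouched and decides how to interpret them only when a query is made.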
