Exploring Cryptocurrency Sentiments With Clustering Text Mining on Social Media

Jiwen Fang, Dickson K. W. Chiu, Kevin K. W. Ho
Copyright: © 2021 | Pages: 15
DOI: 10.4018/978-1-7998-4963-6.ch007

Social media has become a popular communication platform that aggregates mass information suitable for sentiment analysis. As cryptocurrency has become a hot topic worldwide in recent years, this chapter explores individuals' behavior in sharing Bitcoin information. First, Python was used to extract about one month of Twitter data, yielding a dataset of 11,674 comments posted during a month of substantial increase in the Bitcoin price. The dataset was cleansed and analyzed with the Process Documents operator of RapidMiner, and a word-cloud visualization of the dataset was generated. Next, the clustering operator of RapidMiner was used to analyze the similarity of words and the underlying meaning of the comments in different clusters. The clustering results show that 85% of the Bitcoin-related tweets were positive comments on investment, while 15% were negative comments concerning security. The results reflect the generally bullish environment of the cryptocurrency market and general user satisfaction during the period concerned.
Chapter Preview


Since Bitcoin (BTC) was released in 2008 and first transacted in 2009 as the first cryptocurrency (see explanations of cryptocurrency and Bitcoin in the Terms and Definition section), the world has witnessed revolutionary blockchain technology spread worldwide. Recently, Bitcoin has drawn big crowds into the market. The Bitcoin price rose meteorically from around US$1,000 at the beginning of 2017 to almost US$20,000 at the end of the same year (see: http://coindesk.com). This rise has made Bitcoin one of the most discussed topics globally, and nearly 80% of Americans have heard of Bitcoin (De, 2018). The craze has boosted the market for cryptocurrencies and Initial Coin Offerings (ICOs). People and institutions have turned their attention and investment from traditional financial instruments to the crypto market, which has fueled hype around these cryptocurrencies. The CoinMarketCap website lists 1,845 cryptocurrencies already in the market, but “less than 1% of the world’s population” has invested in cryptocurrencies (Palipea, 2018).

Based on these phenomena, this market is likely to keep attracting substantial investment and to undergo further technological development, which calls for more research and attention in this field. The emergence of Bitcoin may fundamentally change people’s lives by removing the need for centralized monetary authorities, a prospect that has already drawn the attention of the accounting profession in different parts of the world (Tsuji & Hiraiwa, 2018). Bitcoin, developed by Satoshi Nakamoto, may be regarded as the first major cryptocurrency, and its decentralized blockchain design gave rise to other blockchain technologies (Brunton, 2019). Therefore, this chapter takes Bitcoin, with the largest market value and user base, as the case study to represent cryptocurrency in social media.

Furthermore, for the public, social media is one of the primary channels for searching for and acquiring the latest news from the blockchain market (Choi et al., 2020). Information search behaviors have changed drastically over the past couple of decades with the spread of pervasive Internet applications (Hong et al., 2007). This trend has brought social media to dominance: the means of communication and interaction have been completely altered, and many people use social media platforms to exchange information simultaneously (Lam et al., 2019; Fong et al., 2020). Lin and Lee (2004) claimed that, within the Internet context, people could get customized information with minimal effort and cost and obtain information that facilitates efficient decision-making. As social media provides “two-way conversations” between companies and clients to speed up market responsiveness and co-create value (See-To & Ho, 2014), companies from different industries have realized the importance of social media as an excellent way to communicate and interact with their customers (Biederman, 2015). Twitter, as a world-scale social media platform, has a significant impact on individuals’ daily lives, with monthly active users of up to 336 million in the first quarter of 2019 (Clement, 2019). Twitter ranks highly worldwide: not only individuals but also companies and governments use it as a platform to expand their influence, improve public relations, provide services, and attract potential customers (Himelboim et al., 2014).

As a result, Twitter has generally been considered a powerful tool for observing cryptocurrency information-seeking behavior, and much evidence supports this view. First, users place a high value on cryptocurrency information on Twitter and contribute to the growing community by creating millions of pieces of content (Kraaijeveld & De Smedt, 2020). Such content can be shared across different networking platforms at breathtaking speed, leading to heated debates about cryptocurrency and considerable public concern on a global scale. The crypto market cannot do without social media, which increases awareness and improves transparency for the public, particularly regarding ICOs and marketing strategy. Social media also features a high volume of cryptocurrency information and never-ending conversations, which are highly valuable to users who need cryptocurrency information for financial decisions (Almatrafi et al., 2018).

Key Terms in this Chapter

RapidMiner: Is an integrated data science software platform for data preparation, machine learning, deep learning, text mining, predictive analytics, result visualization, model validation, and optimization for a wide range of disciplines and applications, such as business, research, education, and application development (see: https://rapidminer.com/).

Cluster Analysis: Is used to classify objects or cases into groups called clusters, where no prior knowledge of cluster membership is required. Clustering procedures may be hierarchical or non-hierarchical; the non-hierarchical methods are often known as K-means clustering (employed in this research). Such procedures typically include problem formulation, distance measure selection, clustering procedure determination, choosing the number of clusters, cluster interpretation, and result validity assessment.

Bitcoin: Is the first decentralized peer-to-peer cryptocurrency, invented under the name Satoshi Nakamoto in 2008, with an open-source software implementation and without support from any central bank or authority or any need for intermediaries. Transactions are recorded with blockchain technologies in a public, distributed manner and verified by network nodes through cryptography.

Cryptocurrency: Is a digital medium of exchange (normally without physical forms) in which strong cryptography is used to secure the database of transaction records, to record and control the coin ownership and creation, and to verify coin transfer. Cryptocurrencies typically have no central control and use blockchain as the technology to maintain public transactions in a peer-to-peer manner.

Text Mining: Attempts to discover new, high-quality information from text collected from various sources, such as websites, social media, emails, publications, and reviews, by detecting patterns or trends. Text mining typically requires pre-processing text input into structured data, detecting patterns in the structured data, and interpreting the results.

Social Media: Encompass a wide range of websites, apps, and interactive digital technologies that let users quickly create and share content through virtual communities and networks. Such content includes text comments, digital photos, videos, user profiles, and usage data generated through web interactions. Some platforms, like Twitter, specialize in sharing short written messages and links.

Sentiment Analysis (Opinion Mining): Uses natural language processing and machine learning to analyze and mine text in order to interpret and classify emotions and subjective information (i.e., sentiment) from information sources. Sentiment analysis is widely applied to detect sentiments in social networks, news, websites, and other online conversations to estimate brand reputation, customer views, and different public opinions.
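
The Text Mining and Cluster Analysis entries above describe a pipeline that can be illustrated in a few lines of Python. The following is only a minimal sketch under stated assumptions, not the chapter's method: the chapter performs these steps with RapidMiner operators, whereas here the stop-word list, the four sample tweets, the term-frequency weighting, the farthest-first choice of starting centroids, and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "on", "for"}

def tokenize(text):
    """Pre-process raw text: lowercase, drop URLs, keep alphabetic tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOP_WORDS]

def vectorize(docs):
    """Turn token lists into L2-normalized term-frequency vectors."""
    vocab = sorted({t for doc in docs for t in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def squared_distance(a, b):
    """Squared Euclidean distance, the distance measure assumed here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(vectors, k):
    """Deterministic farthest-first start (an assumption, not from the chapter)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids

def kmeans(vectors, k, iterations=20):
    """Plain K-means: alternate assignment and centroid-update steps."""
    centroids = init_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up sample tweets; they are not taken from the chapter's dataset.
tweets = [
    "Bitcoin price is going up, great investment!",
    "Buying more bitcoin, the price keeps going up",
    "Worried about exchange security after the hack",
    "Another hack, exchange security is a joke http://example.com",
]
docs = [tokenize(t) for t in tweets]
vocab, vectors = vectorize(docs)
labels, _ = kmeans(vectors, k=2)
print(labels)
```

In this toy input, the two investment-related tweets end up in one cluster and the two security-related tweets in the other. Interpreting what each cluster means, for example reading one as positive comments on investment, remains a manual step, as the Cluster Analysis entry notes.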
