A Big Data Analysis of the Factors Influencing Movie Box Office in China

Wentao Gao, Ka Man Lam, Dickson K. W. Chiu, Kevin K. W. Ho
Copyright: © 2021 | Pages: 18
DOI: 10.4018/978-1-7998-4963-6.ch011

Abstract

A movie's revenue comes mainly from its box office, and the factors that influence the box office are numerous and complex. This research explores the factors influencing China's commercial movie box office by analyzing the top 100 box office movies released in Mainland China in each year from 2013 to 2016, a total of 400 movies. The authors analyzed the collected data using correlation analysis and decision tree analysis in RapidMiner. Based on the analysis results, they put forward suggestions for improving the box office of the movie industry.

Literature Review

Nowadays, the cultural industry is developing rapidly, and the boom in the movie industry is a prominent example. Research on the movie industry has therefore attracted growing attention from scholars. Because prior studies mainly sought practical guidance at the micro level by identifying factors that affect the movie box office, it is meaningful for this research to study the growth trend of movie box office revenue and related prediction models at the macro level.

In the Big Data era, movie metadata and box office figures can be readily retrieved for correlation analysis (Barbosu, 2016), providing useful guidance for the development of China’s domestic movie industry. The micro-level study of the factors affecting a movie's box office revenue can be divided into three aspects: first, the impact of movie evaluation information; second, the impact of movie content; and third, the impact of the movie's release time (Einav, 2007).

Gallup (1992) summarized the factors affecting movie audience behavior as story content, actors, previews, marketing, and movie titles. Audience feedback on the Internet also affects the movie box office, and its effect lasts from well before a movie's theatrical release until the movie leaves the screen (Chen, Liu, & Zhang, 2012). Jo and Choi (2015) used a database of 41 movie stars (famous actors and actresses) and their appearances in 467 movies to analyze whether famous actors and actresses influence the movie box office. Word of mouth (WOM) is one of the most critical factors in judging movie quality. For example, Kim, Park, and Park (2013) suggested that online WOM and expert reviews play a critical role in moviegoers’ consumption behavior in the age of the Internet and social media. They also found that only the frequency of online WOM was a significant factor in international markets. Since WOM can also be posted by third parties as endorsements and quality signals, its impact on consumers is likely to depend on the consumer type and frequency of media choices (Koschat, 2012).

Key Terms in this Chapter

SPSS: A software platform that offers advanced statistical analysis, a vast library of machine learning algorithms, text analysis, open-source extensibility, integration with big data, and seamless deployment into applications (see: https://www.ibm.com/hk-en/analytics/spss-statistics-software ).

Random Forest “Mod & Exa”: Mod is a port that delivers the model built by the operator, while exa is an example set input/output port (see: https://docs.rapidminer.com/9.4/studio/getting-started/important-terms.html ).

RapidMiner: An integrated data science software platform for data preparation, machine learning, deep learning, text mining, predictive analytics, result visualization, model validation, and optimization for a wide range of disciplines and applications, such as business, research, education, and application development (see: https://rapidminer.com/ ).

Decision Tree: Builds classification models in the form of a tree structure by breaking down a dataset into smaller and smaller subsets, while at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes (having two or more branches) and leaf nodes (representing a classification or decision).
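
The splitting process described in the Decision Tree entry can be illustrated with a minimal sketch in plain Python. This is not the chapter's RapidMiner workflow: the Gini impurity criterion, the function names, and the tiny data set (a rating and a count of star actors per movie, labeled "high" or "low" box office) are all assumptions made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must put examples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively break the data into smaller subsets; leaves hold a class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf node: majority class
    f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow decision nodes until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical data: [audience rating, number of star actors] per movie.
rows = [[8.5, 2], [7.9, 1], [6.0, 0], [5.5, 1], [8.8, 0], [4.9, 2]]
labels = ["high", "high", "low", "low", "high", "low"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [7.0, 0]))
```

Each dictionary in the result is a decision node with two branches, and each plain string is a leaf node. Production tools typically add pruning and other split criteria, which this sketch leaves out.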

Pearson Correlation: (ρ) A statistic that measures the linear correlation between two variables. It has a value between +1 and -1. A value of +1 is a total positive linear correlation, 0 is no linear correlation, and -1 is a total negative linear correlation.
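
As a rough illustration of the Pearson Correlation entry, the coefficient can be computed as the covariance of the two variables divided by the product of their standard deviations. The sketch below uses only the Python standard library; the function name and the sample figures (ratings and box office values) are hypothetical and are not taken from the chapter's data.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the square roots of the summed squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (dev_x * dev_y)

# Hypothetical sample: audience ratings and box office (in arbitrary units).
ratings = [6.1, 7.4, 8.0, 5.2, 8.9]
box_office = [120.0, 310.0, 450.0, 90.0, 520.0]

coefficient = pearson_correlation(ratings, box_office)
print(round(coefficient, 3))
```

A value near +1 for this made-up sample would only indicate a strong linear association between the two columns; it says nothing about causation.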

Random Forest “Res”: The result set: the distance or similarity between examples of the request set and the reference set (see: https://docs.rapidminer.com/9.4/studio/getting-started/important-terms.html ).
