A Shallow Approach to Gradient Boosting (XGBoost_s) for Prediction of the Box Office Revenue of a Movie

Dutta, Sujan; Dasgupta, Kousik

doi:10.1007/978-981-16-4301-9_16

Part of the book series: Studies in Autonomic, Data-driven and Industrial Computing (SADIC)

Abstract

In the recent past, machine learning paradigms such as ensemble approaches have been used effectively to predict revenue from large volumes of sales data, supporting the decision-making process in many businesses. This paper proposes a modified ensemble approach to predict the box office revenue of upcoming movies. A shallow version of gradient boosting (XGBoost_s) is proposed to predict the box office revenue of a movie from several primary and derived features related to that movie. Further study shows that features such as budget, runtime, and budget year ratio can be considered among the more important estimators of box office revenue. These features, along with some others, are used as input to the proposed model to make significantly good predictions of a movie's box office collection. The results are reported by testing and forecasting based on simulation on a standard data set. The predictive performance of the model is tested using popular metrics such as R² and MSLE. The reported results show the efficacy of the proposed approach, which can be further used in other business models.
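The abstract names R² and MSLE as evaluation metrics without defining them. The following is a minimal sketch of their standard textbook definitions, not the authors' implementation. It assumes the common log(1 + x) form of MSLE; the function names and the sample revenue figures are hypothetical and chosen only for illustration.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every true value is identical.
        raise ValueError("R² is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x) so zero values are allowed."""
    squared_log_errors = [
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)


# Hypothetical revenues in USD (illustrative values, not from the chapter).
actual = [1_000_000.0, 50_000_000.0, 250_000_000.0]
predicted = [1_200_000.0, 45_000_000.0, 300_000_000.0]

print("R2:", r2_score(actual, predicted))
print("MSLE:", msle(actual, predicted))
```

Because MSLE compares logarithms, it penalises relative rather than absolute error, which suits revenue figures that span several orders of magnitude.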

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Kalyani Government Engineering College, Kalyani, India
Sujan Dutta & Kousik Dasgupta

Editor information

Editors and Affiliations

University of Kalyani, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Assam University, Silchar, Assam, India
Somnath Mukhopadhyay
Stanford University, Palo Alto, CA, USA
Aynur Unal
Guru Nanak Institute of Technology, Kolkata, West Bengal, India
Santanu Kumar Sen

About this paper

Cite this paper

Dutta, S., Dasgupta, K. (2021). A Shallow Approach to Gradient Boosting (XGBoost_s) for Prediction of the Box Office Revenue of a Movie. In: Mandal, J.K., Mukhopadhyay, S., Unal, A., Sen, S.K. (eds) Proceedings of International Conference on Innovations in Software Architecture and Computational Systems. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-16-4301-9_16

DOI: https://doi.org/10.1007/978-981-16-4301-9_16
Published: 12 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4300-2
Online ISBN: 978-981-16-4301-9
eBook Packages: Computer Science, Computer Science (R0)
