Abstract
In the recent past, machine learning paradigms such as ensemble approaches have been used effectively to predict revenue from large volumes of sales data, supporting the decision-making process in many businesses. This paper proposes a modified ensemble approach to predict the box office revenue of upcoming movies. A shallow version of gradient boosting (XGBoost) is used to predict the box office revenue of a movie from several primary and derived features of that movie. Studies have found that features such as budget, runtime, and budget-year ratio can be considered among the more important estimators of box office revenue. These features, along with some others, are used as input to the proposed model to predict the box office collection of a movie. The results are obtained by testing and forecasting through simulation on a standard data set. The accuracy of the model is evaluated using popular metrics such as R² and MSLE. The reported results demonstrate the efficacy of the proposed approach, which can be further applied to other business models.
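The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions: R² as one minus the ratio of residual to total sum of squares, and MSLE (mean squared logarithmic error) computed on log(1 + x). The abstract does not state which exact variants the authors used, and the function names and revenue figures below are hypothetical, chosen only for illustration.

```python
import math

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + x) so zero values are allowed."""
    errors = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)

# Hypothetical revenues (illustrative numbers only, not from the paper's data set).
actual = [1_000_000.0, 25_000_000.0, 150_000_000.0]
predicted = [1_200_000.0, 20_000_000.0, 160_000_000.0]
print(r2_score(actual, predicted), msle(actual, predicted))
```

Because MSLE works on logarithms, it penalizes relative rather than absolute error, which may be why it suits revenue figures that span several orders of magnitude.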
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dutta, S., Dasgupta, K. (2021). A Shallow Approach to Gradient Boosting (XGBoosts) for Prediction of the Box Office Revenue of a Movie. In: Mandal, J.K., Mukhopadhyay, S., Unal, A., Sen, S.K. (eds) Proceedings of International Conference on Innovations in Software Architecture and Computational Systems. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-16-4301-9_16
DOI: https://doi.org/10.1007/978-981-16-4301-9_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4300-2
Online ISBN: 978-981-16-4301-9
eBook Packages: Computer Science, Computer Science (R0)