Abstract
Big data refers to volumes of data so large that new technologies are required to process them quickly; traditional technologies cannot process such volumes effectively. Big data gives organizations a business advantage and enables better service delivery, and it is changing the decision-making process of many business organizations. It also poses many challenges related to the 5Vs: Volume, Velocity, Variety, Veracity and Value. Hadoop is a big data tool used to process large amounts of data. It has many subcomponents that work together to achieve faster processing. Apache Hive and Apache Pig are tools used to access data in different ways in the Hadoop ecosystem: Apache Hive relies on SQL-like queries, while Apache Pig uses scripts. Both tools use either the MapReduce or the Apache Tez framework to access data. In this paper we analyze how these two frameworks use the Hadoop Distributed File System (HDFS) by comparing them both theoretically and empirically.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Singh, R., Kaur, P.J. (2016). Theoretical and Empirical Analysis of Usage of MapReduce and Apache Tez in Big Data. In: Satapathy, S., Das, S. (eds) Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems: Volume 2. Smart Innovation, Systems and Technologies, vol 51. Springer, Cham. https://doi.org/10.1007/978-3-319-30927-9_52
DOI: https://doi.org/10.1007/978-3-319-30927-9_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30926-2
Online ISBN: 978-3-319-30927-9
eBook Packages: Engineering, Engineering (R0)