Theoretical and Empirical Analysis of Usage of MapReduce and Apache Tez in Big Data

Singh, Rupinder; Kaur, Puneet Jai

doi:10.1007/978-3-319-30927-9_52

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 51)

Abstract

Big data refers to data sets so large that they require new technologies to process quickly; traditional systems cannot handle such volumes effectively. Big data gives businesses a competitive advantage, enables better service delivery, and is changing the decision-making process in many organizations. It also poses many challenges related to the 5Vs: Volume, Velocity, Variety, Veracity and Value. Hadoop is a big data tool used to process large amounts of data. It has many subcomponents that work together to achieve faster processing. Apache Hive and Apache Pig are tools in the Hadoop ecosystem that access data in different ways: Apache Hive relies on SQL-like queries, while Apache Pig uses scripts. Both tools use either the MapReduce or the Apache Tez framework to access data. In this paper we analyze how these two frameworks use the Hadoop Distributed File System (HDFS), comparing them both theoretically and empirically.

Author information

Authors and Affiliations

Department of I.T, U.I.E.T, Panjab University, Chandigarh, India
Rupinder Singh & Puneet Jai Kaur

Corresponding author

Correspondence to Rupinder Singh.

Editor information

Editors and Affiliations

Department of CSE, Anil Neerukonda Ins. of Tech. & Sci., Vishakapatnam, India
Suresh Chandra Satapathy
Indian Statistical Institute, Jadavpur University, Kolkata, India
Swagatam Das

About this paper

Cite this paper

Singh, R., Kaur, P.J. (2016). Theoretical and Empirical Analysis of Usage of MapReduce and Apache Tez in Big Data. In: Satapathy, S., Das, S. (eds) Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems: Volume 2. Smart Innovation, Systems and Technologies, vol 51. Springer, Cham. https://doi.org/10.1007/978-3-319-30927-9_52

DOI: https://doi.org/10.1007/978-3-319-30927-9_52
Published: 04 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30926-2
Online ISBN: 978-3-319-30927-9
eBook Packages: Engineering, Engineering (R0)
