Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 51))

Abstract

Big data refers to data sets so large that new technologies are required to process them quickly. Processing such volumes with traditional tools is ineffective. Big data gives businesses a competitive advantage and enables better service delivery, and it is changing the decision-making process of many business organizations. Big data poses many challenges related to the 5Vs: Volume, Velocity, Variety, Veracity and Value. Hadoop is a big data tool used to process large amounts of data. It has many subcomponents that work together to achieve faster processing. Apache Hive and Apache Pig are tools in the Hadoop ecosystem that access data in different ways: Apache Hive relies on SQL-like queries, while Apache Pig uses scripts. Both tools use either the MapReduce or the Apache Tez framework to access data. In this paper we analyze how these two frameworks use the Hadoop Distributed File System (HDFS) by comparing them both theoretically and empirically.
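
As an illustration of the MapReduce model named in the abstract, a grouped count (the kind of job a Hive query or a Pig script is typically compiled into) can be sketched as a map, a shuffle and a reduce stage. The sketch below runs in plain Python on a single machine and does not use any Hadoop, Hive or Pig API; the function names and the toy `visits` data are assumptions made up for this example, not material from the paper.

```python
from collections import defaultdict


def map_phase(records):
    """Map stage: emit one (key, 1) pair per input record."""
    for page, _user in records:
        yield page, 1


def shuffle(pairs):
    """Shuffle stage: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce stage: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_key(records):
    """Chain the three stages into one single-process 'job'."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical input: (page, user) visit records.
visits = [("home", "u1"), ("about", "u2"), ("home", "u3")]
print(count_by_key(visits))  # prints {'home': 2, 'about': 1}
```

In a real cluster the map and reduce stages run in parallel over blocks stored in HDFS, and the framework performs the shuffle between them; this sketch only shows the data flow.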

Author information

Corresponding author

Correspondence to Rupinder Singh.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Singh, R., Kaur, P.J. (2016). Theoretical and Empirical Analysis of Usage of MapReduce and Apache Tez in Big Data. In: Satapathy, S., Das, S. (eds) Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems: Volume 2. Smart Innovation, Systems and Technologies, vol 51. Springer, Cham. https://doi.org/10.1007/978-3-319-30927-9_52

  • DOI: https://doi.org/10.1007/978-3-319-30927-9_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30926-2

  • Online ISBN: 978-3-319-30927-9

  • eBook Packages: Engineering, Engineering (R0)
