Duration: 1 day
Apache Hadoop is an open-source framework designed to help solve some of the storage and analysis challenges of Big Data. This hands-on workshop continues from COMP1630 and assumes prior knowledge of industry standards in data modeling, relational database design, and SQL programming. It is aimed at a broad audience, including administrators, data analysts, and managers. Participants build on their existing database skills to work with larger and more complex data sets and to gain an overview of Hadoop and Big Data. Starting with the basic concepts and components of Hadoop, students use Hive to query data stored in Hadoop with an SQL-like query language. Lectures and labs introduce typical use of a Hadoop system via the Cloudera QuickStart virtual machine. Homework and exercises focus on loading data into the Hadoop Distributed File System (HDFS), basic file operations, and running queries on existing data. Upon successful completion of this course, participants will be able to define Big Data, identify the basic components of Hadoop, and run queries on Big Data using SQL on Hive.
Learning Objectives
• Define Big Data.
• Describe why Hadoop was developed.
• Identify the basic components of a Hadoop system.
• Describe how files are stored in the Hadoop Distributed File System (HDFS).
• Identify the steps taken in a typical MapReduce analysis.
• Run queries on data using Hive.
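The MapReduce steps mentioned in the objectives above can be illustrated with a minimal, single-process Python sketch of a word count, the canonical MapReduce example. The function names here are illustrative only and are not Hadoop APIs; a real Hadoop job distributes these phases across a cluster.

```python
# Minimal single-process sketch of the map -> shuffle -> reduce phases
# of a MapReduce word count. For illustration only: a real Hadoop job
# runs these phases in parallel across many nodes.
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key (the word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the grouped counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big ideas", "big clusters"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])  # 3
```

In Hive, the same kind of aggregation is expressed declaratively (e.g. a `GROUP BY` query), and Hive translates it into jobs that follow these phases behind the scenes.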
Method of Delivery
• Onsite/live class instruction or online web conference
• Open discussion
• Case studies