Big Data using Hadoop and Spark

Course Description

1. Introduction to Big Data and Hadoop

  • Characteristics of Big Data Technology
  • Introduction to Hadoop
  • Hadoop Configuration
  • Hadoop Core Components – HDFS and MapReduce
  • HDFS Architecture
  • Data Types in Hadoop
  • Hadoop MapReduce – Features and Processes
  • Advanced MapReduce and HDFS
  • Introduction to Pig

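As a rough illustration of the MapReduce processes listed in this module, the sketch below simulates the three phases of a word count in plain Python: map emits (word, 1) pairs, shuffle groups them by key, and reduce sums each group. It is a single-process teaching sketch and does not use Hadoop. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one hypothetical word-count job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data with hadoop", "big data with spark"]
    print(word_count(docs))
```

On a real cluster, the map and reduce steps would run as separate tasks over HDFS blocks, and the framework would perform the shuffle across machines.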
2. Hive, HBase and Hadoop Ecosystem Components

  • Introduction to Hive
  • Hive – Characteristics
  • Hive Query Language
  • Data Models and Data Types in Hive
  • Introduction to HBase
  • Characteristics and Architecture of HBase
  • Cloudera – Introduction
  • Cloudera Distribution and Cloudera Manager
  • Comparison of Pig, Hive and MapReduce
  • Introduction to ZooKeeper, Sqoop and Oozie

3. Big Data via Python and R

  • PySpark: Spark with Python
  • Loading Data in the PySpark Shell
  • Python Functional Programming using filter() and map()
  • Programming with PySpark RDDs
  • PySpark SQL and DataFrames
  • Machine Learning with PySpark MLlib
  • Introduction to Spark in R using sparklyr
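The functional-programming topic can be illustrated with Python's built-in filter() and map(). The sketch below keeps the even numbers from a list and squares them. The sample data and the helper name even_squares are made up for illustration.

```python
from typing import List


def even_squares(numbers: List[int]) -> List[int]:
    """Keep the even numbers, then square each one (hypothetical helper)."""
    evens = filter(lambda n: n % 2 == 0, numbers)  # lazy iterator of even values
    squares = map(lambda n: n * n, evens)          # lazy iterator of squared values
    return list(squares)                           # force evaluation into a list


if __name__ == "__main__":
    print(even_squares([1, 2, 3, 4, 5, 6]))
```

PySpark RDDs offer transformations with the same names, which also evaluate lazily. Assuming an existing SparkContext `sc`, a comparable pipeline would be `sc.parallelize(numbers).filter(lambda n: n % 2 == 0).map(lambda n: n * n).collect()`.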



Reach Us

Call or use the form to request a free initial consultation.

Office 1.05, 1st Floor, Building 2, Croxley Business Park, Watford, WD18 8YA
