Bigdata Apache Hadoop Spark Scala Training Syllabus:
  • Linux (Ubuntu/CentOS) – Tips and Tricks
  • Basic (core) Java Programming Concepts – OOP
  • Learning Objectives: In this module, you will learn what Big Data is, where traditional solutions fall short on Big Data problems, and how Hadoop addresses them. The module also covers the Hadoop ecosystem, Hadoop architecture, HDFS, the anatomy of file reads and writes, and how MapReduce works.
  • Topics:
    • Introduction to Big Data & Big Data Challenges
    • Limitations & Solutions of Big Data Architecture
    • Hadoop & its Features
    • Hadoop Ecosystem
    • Hadoop 2.x Core Components
    • Hadoop Storage: HDFS (Hadoop Distributed File System)
    • Hadoop Processing: MapReduce Framework
    • Different Hadoop Distributions
  •  
  • Hadoop 2.x Architecture
  • Typical workflow
  • HDFS Commands
  • Writing files to HDFS
  • Reading files from HDFS
  • Rack awareness
  • Hadoop daemons
  •  
  • Before MapReduce
  • MapReduce overview
  • Word count problem
  • Word count flow and solution
  • MapReduce flow
  •  
  • Data Types
  • File Formats
  • Walkthrough of the Driver, Mapper and Reducer code
  • Configuring development environment – Eclipse
  • Writing unit tests
  • Running locally
  • Running on cluster
  • Hands-on exercises
  •  
  • Anatomy of MapReduce job run
  • Job submission
  • Job initialization
  • Task assignment
  • Job completion
  • Job scheduling
  • Job failures
  • Shuffle and sort
  • Hands-on exercises
  •  
  • File Formats – Sequence Files
  • Compression Techniques
  • Input Formats – Input splits & records, text input, binary input
  • Output Formats – text output, binary output, lazy output
  • Hands-on exercises
  •  
  • Counters
  • Side data distribution
  • MapReduce combiner
  • MapReduce partitioner
  • MapReduce distributed cache
  • Hands-on exercises
  •  
  • Hive Architecture
  • Types of Metastore
  • Hive Data Types
  • HiveQL
  • File Formats – Parquet, ORC, Sequence and Avro Files Comparison
  • Partitioning & Bucketing
  • Hive JDBC Client
  • Hive UDFs
  • Hive SerDes
  • Hive on Tez
  • Hands-on exercises
  • Integration with Tableau
  •  
  • Flume Architecture
  • Flume Agent Setup
  • Types of sources, channels and sinks
  • Multi-Agent Flow
  • Hands-on exercises
  • Introduction to Apache Pig
  • MapReduce vs Pig
  • Pig Components & Pig Execution
  • Pig Data Types & Data Models in Pig
  • Pig Latin Programs
  • Shell and Utility Commands
  • Pig UDF & Pig Streaming
  • Testing Pig scripts with PigUnit
  • Aviation use case in Pig
  • Pig Demo of Healthcare Dataset
  •  
  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Hive Data Loading Techniques
  • Apache ZooKeeper Introduction
  • ZooKeeper Data Model
  • ZooKeeper Service
  • HBase Bulk Loading
  • Getting and Inserting Data
  • HBase Filters
  •  
  • Sqoop Architecture
  • Sqoop Import Command Arguments, Incremental Import
  • Sqoop Export
  • Sqoop Jobs
  • Hands-on exercises
  •  
  • Spark Basics
  • What is Apache Spark?
  • Spark Installation
  • Spark Configuration
  • Spark Context
  • Using Spark Shell
  • Resilient Distributed Datasets (RDDs) – Features, Partitions, Tuning Parallelism
  • Functional Programming with Spark
  •  
  • Oozie
  • Oozie Components
  • Oozie Workflow
  • Scheduling Jobs with Oozie Scheduler
  • Demo of Oozie Workflow
  • Oozie Coordinator
  • Oozie Commands
  • Oozie Web Console
  • Oozie for MapReduce
  • Combining flow of MapReduce Jobs
  • Hive in Oozie
  • Hadoop Project Demo
  • Hadoop Talend Integration
  •  
  • Log File Analysis covering Flume, HDFS, MR/Pig, Hive, Tableau
  • Crime Data Analysis covering Oozie, Sqoop, HDFS, Hive, HBase, RESTful Client
  • Hadoop Use Cases in Insurance Domain
  • Hadoop Use Cases in Retail Domain

Bigdata Apache Hadoop Training

Next Batch Date: 05-Oct-2020


©  2001-2020 SMEClabs.  All Rights Reserved.