Type: Classroom Training

Big Data Hadoop Certification

This is a comprehensive Big Data and Hadoop training course, designed by industry experts around current job requirements, that provides in-depth coverage of big data and the Hadoop modules. This industry-recognized Big Data certification course combines the Hadoop developer, Hadoop administrator, Hadoop testing, and analytics training courses. This Cloudera Hadoop training will prepare you to clear the big data certification exams.

Objectives

  • Master the fundamentals of Hadoop 2.7 and YARN, and write applications using them
  • Set up pseudo-distributed (single-node) and multi-node clusters on Amazon EC2
  • Master HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Flume, ZooKeeper, and HBase
  • Learn Spark, Spark RDDs, GraphX, and MLlib, and write Spark applications
  • Master Hadoop administration activities such as cluster management, monitoring, administration, and troubleshooting
  • Configure ETL tools such as Pentaho and Talend to work with MapReduce, Hive, Pig, etc.
  • Gain a detailed understanding of Big Data analytics
  • Test Hadoop applications using MRUnit and other automation tools
  • Work with the Avro data format
  • Practice on real-life projects using Hadoop and Apache Spark
  • Be equipped to clear the Big Data Hadoop certification

Intended Audience

  • Software developers and system administrators
  • Experienced working professionals and project managers
  • Big Data Hadoop developers eager to learn other verticals such as testing, analytics, and administration
  • Mainframe professionals, architects, and testing professionals
  • Business intelligence, data warehousing, and analytics professionals
  • Graduates and undergraduates eager to learn the latest Big Data technology through this Big Data Hadoop certification online training

Prerequisites

  • There are no prerequisites for this Big Data training or for mastering Hadoop, but a basic knowledge of UNIX, SQL, and Java is helpful. At Intellipaat, we provide complimentary UNIX and Java courses with our Big Data certification training so that you can brush up on the required skills for your Hadoop learning path.

Course Outline (Duration: 2 Days)

Introduction to Big Data, Hadoop and Its Ecosystem, MapReduce and HDFS

What is Big Data?, Where does Hadoop fit in?, Hadoop Distributed File System (HDFS): Replication, Block Size, Secondary NameNode, High Availability; Understanding YARN: ResourceManager, NodeManager; Differences between Hadoop 1.x and 2.x
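To make the replication and block-size topics concrete, here is a minimal plain-Python sketch (not Hadoop API code) of the arithmetic HDFS applies when storing a file. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the function name `hdfs_footprint` is hypothetical.

```python
import math

# Assumed Hadoop 2.x defaults: dfs.blocksize = 128 MB, dfs.replication = 3.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw storage in MB) for a single file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different DataNodes.
    replicas = blocks * replication
    # A partial last block only uses as much disk as the data it holds,
    # so raw usage is file size x replication, not blocks x block size.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


print(hdfs_footprint(1000))  # prints (8, 24, 3000)
```

The key point the sketch encodes is that block size drives the number of blocks (and so NameNode metadata and map tasks), while replication multiplies disk usage.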

Hadoop Installation & Setup

Hadoop 2.x Cluster Architecture, Federation and High Availability, A Typical Production Cluster Setup, Hadoop Cluster Modes, Common Hadoop Shell Commands, Hadoop 2.x Configuration Files, Cloudera Single-Node Cluster

Deep Dive into MapReduce

How MapReduce Works, How the Reducer Works, How the Driver Works, Combiners, Partitioners, Input Formats, Output Formats, Shuffle and Sort, Map-Side Joins, Reduce-Side Joins, MRUnit, Distributed Cache
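A map-side join avoids the shuffle by loading the smaller data set into memory in every mapper (in Hadoop, typically shipped through the Distributed Cache) and joining each record of the large input as it streams past. The following is a plain-Python simulation of that idea, not Hadoop code; the CSV layouts and the function names `load_small_table` and `map_side_join` are hypothetical.

```python
def load_small_table(lines):
    """Build the in-memory lookup a mapper would load once, at task setup."""
    lookup = {}
    for line in lines:
        dept_id, dept_name = line.split(",")
        lookup[dept_id] = dept_name
    return lookup


def map_side_join(large_records, lookup):
    """Inner-join each large-side record against the lookup; no reducer needed."""
    for record in large_records:
        emp_id, name, dept_id = record.split(",")
        dept_name = lookup.get(dept_id)
        if dept_name is not None:  # inner join: drop rows with no match
            yield emp_id, name, dept_name


departments = ["10,Sales", "20,HR"]           # small side (hypothetical data)
employees = ["1,Asha,10", "2,Ravi,20", "3,Mei,30"]  # large side
joined = list(map_side_join(employees, load_small_table(departments)))
```

This only works when the small side fits in each mapper's memory; otherwise a reduce-side join, which groups both inputs by the join key during shuffle and sort, is the usual fallback.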

Lab exercises:

Working with HDFS, Writing a WordCount Program, Writing a Custom Partitioner, MapReduce with a Combiner, Map-Side Joins, Reduce-Side Joins, Unit Testing MapReduce, Running MapReduce in LocalJobRunner Mode
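The WordCount, combiner, and custom-partitioner labs fit together as shown in this plain-Python simulation of the MapReduce data flow (map, combine, partition, shuffle and sort, reduce). It is not the Hadoop Java API. Hadoop's default partitioner is hash-based; the first-letter `partitioner` below is a hypothetical custom rule chosen only to show where a custom partitioner plugs in.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one map task's output locally (same logic as the reducer)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partitioner(key, num_reducers):
    """Hypothetical custom partitioner: route each word by its first letter."""
    return ord(key[0]) % num_reducers


def reducer(key, counts):
    return key, sum(counts)


def word_count(splits, num_reducers=2):
    # One bucket per reducer, each mapping key -> list of values (the shuffle).
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each input split is handled by one map task
        map_output = [kv for line in split for kv in mapper(line)]
        for word, count in combiner(map_output):
            partitions[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for part in partitions:
        for key in sorted(part):  # each reducer sees its keys in sorted order
            word, total = reducer(key, part[key])
            result[word] = total
    return result


counts = word_count([["the cat sat", "the dog"], ["The cat ran"]])
```

The combiner cuts the number of records crossing the network, and the partitioner decides which reducer receives each key; every occurrence of a given word must land on the same reducer for the totals to be correct.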

Graph Problem Solving

What is a Graph?, Graph Representation, Breadth-First Search Algorithm, Graph Representation in MapReduce, How to Implement a Graph Algorithm, Example of a Graph MapReduce Job
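Breadth-first search is commonly expressed in MapReduce as an iterative job: each round, every node with a known distance emits distance + 1 to its neighbours, and the reducer keeps the minimum. The sketch below simulates those rounds in plain Python (it is not Hadoop code); the record layout `(distance, adjacency list)` and the function names are assumptions for illustration.

```python
import math
from collections import defaultdict

INF = math.inf


def bfs_map(node, dist, neighbors):
    # Re-emit the adjacency list so the reducer can rebuild the graph record.
    yield node, ("graph", neighbors)
    yield node, ("dist", dist)
    if dist != INF:
        for n in neighbors:
            yield n, ("dist", dist + 1)


def bfs_reduce(node, values):
    neighbors, best = [], INF
    for kind, value in values:
        if kind == "graph":
            neighbors = value
        else:
            best = min(best, value)  # keep the shortest distance seen
    return node, best, neighbors


def mapreduce_bfs(adjacency, source):
    state = {n: (0 if n == source else INF, nbrs) for n, nbrs in adjacency.items()}
    while True:  # one loop iteration = one MapReduce job
        grouped = defaultdict(list)  # shuffle: group map output by node
        for node, (dist, nbrs) in state.items():
            for key, value in bfs_map(node, dist, nbrs):
                grouped[key].append(value)
        new_state = {}
        for node, values in grouped.items():
            n, dist, nbrs = bfs_reduce(node, values)
            new_state[n] = (dist, nbrs)
        if new_state == state:  # converged: no distance changed this round
            return {n: d for n, (d, _) in new_state.items()}
        state = new_state


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": ["A"]}
distances = mapreduce_bfs(graph, "A")
```

The number of jobs grows with the graph's diameter, which is one reason iterative graph work is often moved to Spark/GraphX, covered later in the course.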

    Exercise 1, Exercise 2, Exercise 3

Detailed understanding of Pig

A. Introduction to Pig

Understanding Apache Pig, its features and various uses, and learning to interact with Pig

B. Deploying Pig for data analysis

Pig Latin syntax, the various definitions, sorting and filtering data, data types, deploying Pig for ETL, data loading, viewing schemas, field definitions, and commonly used functions.

C. Pig for complex data processing

Various data types, including nested and complex types, processing data with Pig, iterating over grouped data, practical exercise

D. Performing multi-dataset operations

Joining data sets, splitting data sets, various methods for combining data sets, set operations, hands-on exercise

E. Extending Pig

Understanding user-defined functions (UDFs), performing data processing with other languages, imports and macros, using streaming and UDFs to extend Pig, practical exercises

F. Pig Jobs

Working with real data sets, using Walmart and Electronic Arts as case studies

Detailed understanding of Hive

A. Hive Introduction

Understanding Hive, comparing Hive with traditional databases, comparing Pig and Hive, storing data in Hive and the Hive schema, interacting with Hive, and various use cases of Hive

B. Hive for relational data analysis

Understanding HiveQL, basic syntax, the various tables and databases, data types, joining data sets, various built-in functions, and running Hive queries from scripts, the shell, and Hue.

C. Data management with Hive

The various databases, creating databases, data formats in Hive, data modeling, Hive-managed tables, self-managed (external) tables, data loading, altering databases and tables, simplifying queries with views, storing query results, data access control, managing data with Hive, the Hive Metastore and the Thrift server.

D. Optimization of Hive

Understanding query performance, data indexing, partitioning, and bucketing
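Partitioning and bucketing both decide where a row is physically stored. The plain-Python sketch below mimics that layout: Hive stores each partition in a `column=value` subdirectory, and a filter on the partition column lets the query skip the other directories (partition pruning). The bucket file names and the `value mod num_buckets` hash are simplified stand-ins assumed for illustration, not Hive's exact file naming or hash function, and `layout`/`prune` are hypothetical names.

```python
from collections import defaultdict


def layout(rows, partition_col, bucket_col, num_buckets, table="sales"):
    """Return {path: [rows]} mimicking a partitioned, bucketed table."""
    files = defaultdict(list)
    for row in rows:
        # One directory per partition value, using Hive's column=value naming.
        part_dir = f"{table}/{partition_col}={row[partition_col]}"
        # Simplified bucket choice; assumes an integer bucketing column.
        bucket = row[bucket_col] % num_buckets
        files[f"{part_dir}/bucket_{bucket:05d}"].append(row)
    return dict(files)


def prune(files, partition_col, value):
    """Partition pruning: keep only files under the matching partition directory."""
    marker = f"/{partition_col}={value}/"
    return {path: rows for path, rows in files.items() if marker in path}


rows = [
    {"id": 1, "country": "IN", "amt": 10},
    {"id": 2, "country": "US", "amt": 5},
    {"id": 5, "country": "IN", "amt": 7},
]
files = layout(rows, "country", "id", 4)
in_only = prune(files, "country", "IN")
```

Partitioning pays off for low-cardinality filter columns (dates, regions); bucketing spreads a high-cardinality column across a fixed number of files, which helps sampling and bucketed joins.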

E. Extending Hive

Deploying user-defined functions to extend Hive

F. Hands-On Exercises: Working with Large Data Sets and Extensive Querying

Deploying Hive for huge volumes of data and large numbers of queries

G. UDFs and Query Optimization

Working extensively with user-defined functions, learning how to optimize queries, and various methods of performance tuning.

Impala

A. Introduction to Impala

What is Impala?, How Impala Differs from Hive and Pig, How Impala Differs from Relational Databases, Limitations and Future Directions, Using the Impala Shell

B. Choosing the Best Tool (Hive, Pig, Impala)

C. Modeling and Managing Data with Impala and Hive

Data Storage Overview, Creating Databases and Tables, Loading Data into Tables, HCatalog, Impala Metadata Caching

D. Data Partitioning

Partitioning Overview, Partitioning in Impala and Hive

Data Formats (Avro)

Selecting a File Format, Tool Support for File Formats, Avro Schemas, Using Avro with Hive and Sqoop, Avro Schema Evolution, Compression
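Avro schema evolution works because data is always read with both the writer's schema and the reader's schema. The plain-Python sketch below models a simplified subset of Avro's record resolution rules (it does not use the `avro` library and ignores type promotion and aliases): writer fields absent from the reader are ignored, reader fields absent from the writer take the reader's default, and a missing field without a default is an error. The function name `resolve` and the example schemas are hypothetical.

```python
def resolve(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = record[name]        # present in the data: keep it
        elif "default" in field:
            result[name] = field["default"]    # new field: use reader default
        else:
            raise ValueError(f"field {name!r} has no writer value and no default")
    # Writer-only fields (e.g. a dropped column) are simply not copied.
    return result


writer_v1 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "legacy", "type": "boolean"},
]}
reader_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]}
upgraded = resolve({"id": 1, "name": "Asha", "legacy": True}, writer_v1, reader_v2)
```

This is why adding a field is only a backward-compatible change when the new field carries a default value.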

Introduction to HBase Architecture

What is HBase?, Where does it fit?, What is NoSQL?

Apache Spark

A. Why Spark? Working with Spark and Hadoop Distributed File System

What is Spark, Comparison between Spark and Hadoop, Components of Spark

B. Spark Components, Common Spark Algorithms: Iterative Algorithms, Graph Analysis, Machine Learning

Apache Spark: Introduction, Consistency, Availability, Partition, Spark's Unified Stack, Spark Components, Scalding Example, Mahout, Storm, Graph

C. Running Spark on a Cluster, Writing Spark Applications using Python, Java, Scala

A Python example, installing Spark, the driver program, SparkContext with an example, weakly typed variables, combining Scala and Java seamlessly, concurrency and distribution, what a trait is, higher-order functions with an example, the OFI scheduler, advantages of Spark, an example of lambdas in Spark, MapReduce with an example

Hadoop Cluster Setup and Running MapReduce Jobs

Multi-Node Cluster Setup Using Amazon EC2: Creating a 4-Node Cluster, Running MapReduce Jobs on the Cluster

Major Project: Putting It All Together and Connecting the Dots

Putting it all together and connecting the dots, working with large data sets, steps involved in analyzing large data sets

ETL Connectivity with Hadoop Ecosystem

How ETL tools work in the big data industry, connecting to HDFS from an ETL tool and moving data from the local system to HDFS, moving data from a DBMS to HDFS, working with Hive from an ETL tool, creating a MapReduce job in an ETL tool, and an end-to-end ETL PoC showing big data integration with an ETL tool.

Cluster Configuration

Configuration overview and important configuration files, configuration parameters and values, HDFS parameters, MapReduce parameters, Hadoop environment setup, 'Include' and 'Exclude' configuration files, Lab: MapReduce Performance Tuning

Administration and Maintenance

NameNode/DataNode directory structures and files, File system image and edit log, The checkpoint procedure, NameNode failure and recovery procedure, Safe Mode, Metadata and data backup, Potential problems and solutions / what to look for, Adding and removing nodes, Lab: MapReduce File System Recovery
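The checkpoint procedure exists because the NameNode records namespace changes in an edit log rather than rewriting the file system image (fsimage) on every change. Periodically, the Secondary NameNode (or the Standby NameNode in an HA setup) merges the edits into a fresh fsimage so the log does not grow without bound. The toy model below illustrates that merge in plain Python; the class, its operations, and the path-to-replication mapping are hypothetical simplifications, not HDFS internals.

```python
class NameNodeMetadata:
    """Toy model of NameNode metadata: an fsimage snapshot plus an edit log."""

    def __init__(self):
        self.fsimage = {}  # path -> replication factor as of the last checkpoint
        self.edits = []    # operations logged since the last checkpoint

    def log(self, op, path, replication=None):
        # Every namespace change is appended to the edit log first.
        self.edits.append((op, path, replication))

    @staticmethod
    def replay(image, edits):
        image = dict(image)  # never mutate the saved snapshot
        for op, path, replication in edits:
            if op in ("create", "setrep"):
                image[path] = replication
            elif op == "delete":
                image.pop(path, None)
        return image

    def current_namespace(self):
        # What a restarting NameNode rebuilds: load fsimage, then replay edits.
        return self.replay(self.fsimage, self.edits)

    def checkpoint(self):
        # Merge the edits into a new fsimage and start an empty edit log.
        self.fsimage = self.current_namespace()
        self.edits = []


meta = NameNodeMetadata()
meta.log("create", "/a", 3)
meta.log("create", "/b", 3)
meta.log("setrep", "/a", 2)
meta.log("delete", "/b")
```

Without regular checkpoints a NameNode restart has to replay a very long edit log, which is why checkpointing also matters for the failure and recovery procedure listed above.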

Monitoring and Troubleshooting

Best practices for monitoring a cluster, using logs and stack traces for monitoring and troubleshooting, using open-source tools to monitor the cluster

Job Scheduler: MapReduce Job Submission Flow

How to schedule jobs on the same cluster, FIFO Scheduler, Fair Scheduler and its configuration
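The practical difference between FIFO and fair scheduling shows up when several jobs compete for the same cluster capacity. The plain-Python sketch below contrasts strict submission-order allocation with a max-min fair split (small demands are met in full, and leftover capacity is shared equally among the rest). It is a simplified model assumed for illustration; the real Fair Scheduler also handles queues, weights, minimum shares, and preemption, and allocates whole containers.

```python
def fifo_allocate(capacity, demands):
    """FIFO: jobs are served strictly in submission order."""
    alloc = []
    for demand in demands:
        give = min(demand, capacity)
        alloc.append(give)
        capacity -= give
    return alloc


def fair_allocate(capacity, demands):
    """Max-min fair share via progressive filling, smallest demand first."""
    alloc = [0.0] * len(demands)
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining = capacity
    for k, i in enumerate(order):
        # Equal share of what is left among the jobs not yet served.
        share = remaining / (len(order) - k)
        alloc[i] = min(demands[i], share)
        remaining -= alloc[i]
    return alloc


slots = 10
demands = [2, 8, 8]  # hypothetical container demands of three jobs
fifo = fifo_allocate(slots, demands)
fair = fair_allocate(slots, demands)
```

Under FIFO the third job gets nothing until the first two finish; under the fair split every job makes progress, which is why fair sharing suits shared multi-user clusters.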

ZooKeeper

ZooKeeper introduction, ZooKeeper use cases, ZooKeeper services, ZooKeeper data model, Znodes and their types, Znode operations, Znode watches, Znode reads and writes, Consistency guarantees, Cluster management, Leader election, Distributed exclusive lock, Important points
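ZooKeeper's standard leader-election recipe relies on ephemeral sequential znodes: every candidate creates one under an election parent, the candidate owning the lowest sequence number is the leader, and each other candidate watches only the znode just before its own. The sketch below simulates that recipe in memory; it does not talk to a ZooKeeper server, and the class and method names are hypothetical. The 10-digit zero-padded suffix matches how ZooKeeper names sequential znodes.

```python
class ElectionSimulator:
    """In-memory stand-in for an election parent with EPHEMERAL_SEQUENTIAL children."""

    def __init__(self):
        self.counter = 0
        self.znodes = {}  # znode name -> candidate id

    def join(self, candidate):
        # ZooKeeper appends a monotonically increasing, 10-digit padded counter.
        name = f"n_{self.counter:010d}"
        self.counter += 1
        self.znodes[name] = candidate
        return name

    def leader(self):
        # The lowest sequence number wins (zero padding makes string order numeric).
        return self.znodes[min(self.znodes)] if self.znodes else None

    def watched_by(self, name):
        """Each candidate watches only its predecessor, avoiding a herd effect."""
        ordered = sorted(self.znodes)
        idx = ordered.index(name)
        return ordered[idx - 1] if idx > 0 else None

    def session_expired(self, name):
        # Ephemeral znodes are deleted when the owning session ends.
        self.znodes.pop(name, None)


election = ElectionSimulator()
a = election.join("a")
b = election.join("b")
c = election.join("c")
```

When the leader's session expires, only the next candidate's watch fires, so exactly one node re-checks and takes over instead of every node racing at once.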

Advanced Oozie

Why Oozie?, Installing Oozie, Running an example, Oozie workflow engine, Example M/R action, Word count example, Workflow application, Workflow submission, Workflow state transitions, Oozie job processing, Oozie security, Why Oozie security?, Job submission, Multi-tenancy and scalability, Timeline of an Oozie job, Coordinator, Bundle, Layers of abstraction, Architecture, Use Case 1: time triggers, Use Case 2: data and time triggers, Use Case 3: rolling window

Advanced Flume

Overview of Apache Flume, Physically distributed data sources, Changing structure of data, A closer look, Anatomy of Flume, Core concepts, Events, Clients, Agents, Sources, Channels, Sinks, Interceptors, Channel selectors, Sink processors, Data ingest, Agent pipeline, Transactional data exchange, Routing and replicating, Why channels?, Use case: log aggregation, Adding a Flume agent, Handling a server farm, Data volume per agent, Example describing a single-node Flume deployment

Advanced HUE

HUE introduction, HUE ecosystem, What is HUE?, HUE real-world view, Advantages of HUE, How to upload data in the File Browser, Viewing the content, Integrating users, Integrating HDFS, Fundamentals of the HUE front end

Advanced Impala

Impala overview: goals, User view of Impala: overview, User view of Impala: SQL, User view of Impala: Apache HBase, Impala architecture, Impala statestore, Impala catalog service, Query execution phases, Comparing Impala to Hive

Hadoop Application Testing

Why testing is important, Unit testing, Integration testing, Performance testing, Diagnostics, Nightly QA tests, Benchmark and end-to-end tests, Functional testing, Release certification testing, Security testing, Scalability testing, Testing the commissioning and decommissioning of DataNodes, Reliability testing, Release testing

Roles and Responsibilities of Hadoop Testing Professional

Understanding the requirements, preparing the testing estimation, test cases, test data, test bed creation, test execution, defect reporting, defect retesting, daily status report delivery, test completion; ETL testing at every stage (HDFS, Hive, HBase) while loading the input (logs, files, records, etc.) using Sqoop/Flume, including but not limited to data verification and reconciliation; user authorization and authentication testing (groups, users, privileges, etc.); reporting defects to the development team or manager and driving them to closure; consolidating all defects and creating defect reports; validating new features and issues in core Hadoop.
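Reconciliation, one of the ETL testing tasks above, means checking that what landed in HDFS, Hive, or HBase matches what was extracted from the source. A minimal plain-Python sketch, assuming both sides can be exported as rows of tuples, compares counts and reports rows that are missing or unexpected; duplicates are counted rather than collapsed. The function name `reconcile` and the result keys are hypothetical; at real data volumes the same idea is usually applied to aggregates or checksums per partition rather than to full row sets.

```python
from collections import Counter


def reconcile(source_rows, target_rows):
    """Compare source and target row multisets and summarize the differences."""
    src, tgt = Counter(source_rows), Counter(target_rows)
    missing = src - tgt      # rows extracted but never loaded
    unexpected = tgt - src   # rows loaded that the source does not contain
    return {
        "source_count": sum(src.values()),
        "target_count": sum(tgt.values()),
        "missing_in_target": sorted(missing.elements()),
        "unexpected_in_target": sorted(unexpected.elements()),
        "passed": not missing and not unexpected,
    }


report = reconcile(
    [(1, "a"), (2, "b"), (3, "c")],   # e.g. rows exported from the DBMS
    [(1, "a"), (3, "c"), (4, "d")],   # e.g. rows read back from Hive
)
```

Note that matching row counts alone would have passed this load; the row-level comparison is what exposes the dropped and the spurious record.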

MRUnit Framework for Testing MapReduce Programs

Using the MRUnit testing framework to test MapReduce programs, reporting defects to the development team or manager and driving them to closure, and consolidating all defects into defect reports.

Unit Testing

Automation testing using Oozie, data validation using the QuerySurge tool.

Test Execution

Test plan for an HDFS upgrade, test automation and results

Test Plan Strategy and Writing Test Cases for Testing a Hadoop Application

How to test installation and configuration

Job and Certification Support

Cloudera certification tips and guidance, mock interview preparation, practical development tips and techniques

Please write to us at info@itstechschool.com or call +91-9870480053 for the course price, certification cost, schedule, and location.

This training course is designed to help you clear both the Cloudera Spark and Hadoop Developer Certification (CCA175) exam and the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam. The course content is aligned with these two certification programs, helping you clear the exams and compete for jobs at top MNCs.

As part of this training, you will work on real-time projects and assignments modeled on real-world industry scenarios, helping you fast-track your career.

At the end of this training program, there will be quizzes that closely reflect the type of questions asked in the respective certification exams, helping you score better in the certification exams.

The ITS Course Completion Certificate will be awarded upon completion of the project work (after expert review) and upon scoring at least 60% in the quiz. Intellipaat certification is well recognized in 80+ top MNCs such as Ericsson, Cisco, Cognizant, Sony, Mu Sigma, Saint-Gobain, Standard Chartered, TCS, Genpact, and Hexaware.

For more information, please contact us.

