- 36 Hours of Case Studies on Real-life Scenarios
- 12 Sessions of 3 hours each on weekends
- 18 Sessions of 2 hours each on weekdays
- Practical Assignments
- Lifetime Access to Learning Management System
- 24x7 Expert Support
- Course Completion Certificate
- Online Forum for Discussions
Available Course Delivery Formats
This course is available in the following formats:
This course trains learners in the Big Data Hadoop ecosystem. It teaches how to develop Spark applications in Scala, explains how Spark differs from Hadoop, and covers tools and techniques for improving application efficiency. It also shows how Spark RDDs increase processing speed and how to customize Spark using Scala.
- Introduction to Big Data, Hadoop, Hadoop Distributed File System (HDFS), and Yet Another Resource Negotiator (YARN)
- Familiarize learners with the limitations of MapReduce and the role of Spark in overcoming them
- Teach the fundamental concepts and features of Scala
- Establish proficiency in Spark ecosystem tools such as Spark SQL, Sqoop, Kafka, Spark MLlib, Flume, and Spark Streaming
- Train in ingesting data into HDFS using Sqoop and Flume
- Teach how to manage real-time data through a publish-subscribe messaging system such as Kafka
- Acquaint learners with Spark ML programming and GraphX programming
- Demonstrated expertise in the Big Data Hadoop ecosystem
- Better remuneration as an Apache Spark specialist
- Recognition as an expert with specialized Apache Spark skills
- Basic knowledge of databases, SQL, and query languages is beneficial
Who should take this course?
- Business Intelligence Professionals
- ETL and DW Professionals
- Senior IT Professionals
- Testing Professionals
- Mainframe Professionals
- Big Data Professionals
- Software Architects and Engineers
- Software Developers
- Data Scientists
- Analytics Professionals
- What is Big Data?
- Big Data Customer Scenarios
- Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
- How Hadoop Solves the Big Data Problem
- What is Hadoop?
- Hadoop's Key Characteristics
- Hadoop Ecosystem and HDFS
- Hadoop Core Components
- Rack Awareness and Block Replication
- YARN and Its Advantages
- Hadoop Cluster and its Architecture
- Hadoop: Different Cluster Modes
- Big Data Analytics with Batch & Real-time Processing
- Why is Spark Needed?
- What is Spark?
- How Spark Differs from Other Frameworks
- Spark at Yahoo!
- What is Scala?
- Why Scala for Spark?
- Scala in other Frameworks
- Introduction to Scala REPL
- Basic Scala Operations
- Variable Types in Scala
- Control Structures in Scala
- Foreach loop, Functions and Procedures
- Collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more
- Functional Programming
- Higher Order Functions
- Anonymous Functions
- Class in Scala
- Getters and Setters
- Custom Getters and Setters
- Properties with only Getters
- Auxiliary Constructor and Primary Constructor
- Extending a Class
- Overriding Methods
- Traits as Interfaces and Layered Traits
- Spark's Place in Hadoop Ecosystem
- Spark Components & its Architecture
- Spark Deployment Modes
- Introduction to Spark Shell
- Writing your first Spark Job Using SBT
- Submitting Spark Job
- Spark Web UI
- Data Ingestion using Sqoop
- Challenges in Existing Computing Methods
- Probable Solution & How RDD Solves the Problem
- What is an RDD? Its Operations, Transformations & Actions
- Data Loading and Saving Through RDDs
- Key-Value Pair RDDs
- Other Pair RDDs, Two Pair RDDs
- RDD Lineage
- RDD Persistence
- WordCount Program Using RDD Concepts
- RDD Partitioning & How It Helps Achieve Parallelization
- Passing Functions to Spark
- Need for Spark SQL
- What is Spark SQL?
- Spark SQL Architecture
- SQL Context in Spark SQL
- User Defined Functions
- Data Frames & Datasets
- Interoperating with RDDs
- JSON and Parquet File Formats
- Loading Data through Different Sources
- Spark-Hive Integration
- Why Machine Learning?
- What is Machine Learning?
- Where is Machine Learning Used?
- Use Case: Face Detection
- Different Types of Machine Learning Techniques
- Introduction to MLlib
- Features of MLlib and MLlib Tools
- Various ML algorithms supported by MLlib
- Supervised Learning - Linear Regression, Logistic Regression, Decision Tree, Random Forest
- Unsupervised Learning - K-Means Clustering & How It Works with MLlib
- Analysis on US Election Data using MLlib (K-Means)
- Need for Kafka
- What is Kafka?
- Core Concepts of Kafka
- Kafka Architecture
- Where is Kafka Used?
- Understanding the Components of Kafka Cluster
- Configuring Kafka Cluster
- Kafka Producer and Consumer Java API
- Need for Apache Flume
- What is Apache Flume?
- Basic Flume Architecture
- Flume Sources
- Flume Sinks
- Flume Channels
- Flume Configuration
- Integrating Apache Flume and Apache Kafka
- Drawbacks in Existing Computing Methods
- Why is Streaming Necessary?
- What is Spark Streaming?
- Spark Streaming Features
- Spark Streaming Workflow
- How Uber Uses Streaming Data
- Streaming Context & DStreams
- Transformations on DStreams
- Windowed Operators and Why They Are Useful
- Important Windowed Operators
- Slice, Window and ReduceByWindow Operators
- Stateful Operators
- Apache Spark Streaming: Data Sources
- Streaming Data Source Overview
- Apache Flume and Apache Kafka Data Sources
- Example: Using a Kafka Direct Data Source
- Perform Twitter Sentiment Analysis Using Spark Streaming
- Work on an end-to-end Financial domain project covering all the major concepts of Spark taught during the course
- Learn the key concepts of Spark GraphX programming and operations, along with different GraphX algorithms and their implementations