BIG DATA PROCESSING
Apache Hadoop
Introduction
Apache Hadoop: Introduction
Hadoop HDFS Commands
Examples of Test Questions
Apache Spark
Apache Spark: Introduction
Apache Spark: Resilient Distributed Dataset (RDD)
Apache Spark: RDD Transformations
Apache Spark: RDD Actions
Apache Spark: SparkSQL & SparkSession
Apache Spark: Working with DataFrames
Read JSON data to DataFrame
Read CSV file to DataFrame
Create DataFrame by Defining the Schema
Apache Spark: Machine Learning with MLlib
Frequent Pattern Mining
Spark MLlib: FP-Growth
Classification: Logistic Regression
Train and Evaluate Logistic Regression Model
Feature Extraction and Data Pre-processing
Collaborative Filtering / Recommendation Systems
Movie Recommendation System with ALS
Load MovieLens Data to SparkSQL
Train ALS Model for Movie Recommendation System
MLlib Pipelines
ML Transformers
ML Estimators
ML Evaluator
ML CrossValidator
Pipeline Example
Apache Spark: Streaming Basic Concepts
Discretized Streams (DStream)
Linking, Streaming Context
Spark Twitter Streaming Example
Save Twitter Stream as JSON files
Apache Spark: Graph Analysis via GraphX
Linking & GraphX RDDs
Property Graph
PageRank
Apache Zeppelin: Web-based Spark Engine
Deploy Zeppelin with Spark and Hadoop on Windows
Set Up Zeppelin Interpreter Environment
Practical Task
Practice 1: Basic Spark RDD Operations
Practice 2: SparkSQL, DataFrames
Practice 3: Spark MLlib - Market Basket Analysis via FPGrowth
Practice 4: Spark MLlib - Logistic Regression
Practice 5: Spark Streaming - Real-Time Twitter Analysis
Practice 6: Spark MLlib - Movie Recommendation System
Practice 7: Spark MLlib - Analyze Uber Data
Practice 8: Spark GraphX - Analyze Bike Routes Data
Tutorials
Develop Apache Spark Apps with IntelliJ IDEA
Deploy Hadoop Cluster on Windows OS
Deploy Spark Cluster on Windows OS
Twitter Credential Setup Guide
Test Questions and Materials
Examples of Test Questions
Apache Spark Use Case Projects
San Francisco Crime Classification
House Prices: Advanced Regression Techniques
Airbnb New User Bookings
What's Cooking?