
Learning Spark: Lightning-Fast Big Data Analysis, 1st Edition, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia (ISBN 1449358624, 9781449358624)

SKU: EB-15610


Authors: Holden Karau; Andy Konwinski; Patrick Wendell; Matei Zaharia
Series: IT & Computer [213]
Author sort: Karau, Holden & Konwinski, Andy & Wendell, Patrick & Zaharia, Matei
Identifiers: mobi-asin:fa545bc4-f0fc-40ff-a0c2-fe79e2542e40
Language: English
Published: Jan 2015
Publisher: O’Reilly Media

Category: eBook PDF
Tags: Andy Konwinski, Holden Karau, Matei Zaharia, Patrick Wendell, Spark Lightning Fast

Description

Learning Spark: Lightning-Fast Big Data Analysis, 1st Edition, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. Ebook PDF, available for instant download after payment. ISBN 1449358624, 9781449358624.

Product details:
ISBN 10: 1449358624
ISBN 13: 9781449358624
Authors: Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
Learn how to deploy interactive, batch, and streaming applications
Connect to data sources including HDFS, Hive, JSON, and S3
Master advanced topics like data partitioning and shared variables

Table of contents of Learning Spark: Lightning-Fast Big Data Analysis, 1st Edition:

Chapter 1: Introduction to Apache Spark

What is Apache Spark?
The Spark Ecosystem: Components and Use Cases
Key Features of Spark
Spark vs. Hadoop
Getting Started with Spark
Running Spark on Your Local Machine

Chapter 2: Spark Basics

Introduction to Spark Programming
RDDs (Resilient Distributed Datasets) and DataFrames
Creating RDDs and DataFrames
Transformations and Actions in Spark
Lazy Evaluation
Caching in Spark

Chapter 3: Spark DataFrames and SQL

Introduction to Spark DataFrames
Operations on DataFrames
Using SQL with Spark
Using the Spark SQL API
Connecting to External Databases
Managing DataFrames and Tables

Chapter 4: Working with Data

Loading Data into Spark
Reading from Various Data Sources (CSV, JSON, Parquet, etc.)
Data Transformation and Cleansing
Working with Structured Data
Joining DataFrames and RDDs

Chapter 5: Spark’s Machine Learning Library (MLlib)

Introduction to MLlib
Classification, Regression, and Clustering
Building Machine Learning Pipelines in Spark
Evaluating Models in Spark
Feature Engineering and Feature Selection
Tuning Hyperparameters in Spark MLlib

Chapter 6: Spark Streaming

What is Spark Streaming?
Processing Real-Time Data with Spark Streaming
DStreams and Transformations
Integrating with Kafka and Other Streaming Sources
Stateful Transformations
Windowed Operations and Time-based Processing

Chapter 7: Spark on the Cloud

Running Spark on Cloud Platforms (AWS, Azure, Google Cloud)
Setting up Spark on EC2 and other Cloud Services
Using Spark on Databricks
Managing Spark Clusters on the Cloud
Performance Tuning for Cloud Environments

Chapter 8: Performance Tuning and Optimization

Understanding Spark’s Performance Characteristics
Best Practices for Tuning Spark Jobs
Partitioning and Shuffling Data
Configuring Spark for Maximum Efficiency
Monitoring and Debugging Spark Jobs

Chapter 9: Advanced Spark Features

Spark SQL Optimization Techniques
Broadcast Variables and Accumulators
Custom Transformations and Actions
Using GraphX for Graph Processing
Working with SparkR and PySpark

Chapter 10: Case Studies and Real-World Applications

Case Study 1: Building a Data Pipeline with Spark
Case Study 2: Machine Learning with Spark MLlib
Case Study 3: Real-Time Data Analysis with Spark Streaming
Performance Tuning for Large-Scale Data Processing

Appendices

A: Installing Spark and Setting Up the Environment
B: Using the Spark Shell and REPL
C: Resources for Learning More about Apache Spark
D: Spark and Hadoop Integration
