Doing Data Science: Straight Talk From the Frontline, 1st Edition, by Cathy O’Neil and Rachel Schutt (ISBN 9781449363901, 1449363903)

SKU: EB-15654
Availability: In stock

Original price: $50.00. Current price: $25.00.

Authors: Cathy O’Neil; Rachel Schutt
Series: IT & Computer [247]
Tags: Computers; Data Science; General; Data Analytics; Programming; Algorithms; Mathematical & Statistical Software; Mathematics; Probability & Statistics; Bayesian Analysis; Regression Analysis; Stochastic Processes; Time Series
Author sort: O’Neil, Cathy & Schutt, Rachel
Ids: Google; 9781449363895
Languages: eng
Published: Oct 2013
Publisher: O’Reilly Media, Inc.

Category: eBook PDF
Tags: Cathy O’Neil, Data Science Straight, Frontline, Rachel Schutt

Description

Doing Data Science: Straight Talk From the Frontline, 1st Edition, by Cathy O’Neil and Rachel Schutt. Ebook PDF, available for instant download after payment.

Product details:
ISBN 10: 1449363903
ISBN 13: 9781449363901
Authors: Cathy O’Neil, Rachel Schutt

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:
Statistical inference, exploratory data analysis, and the data science process
Algorithms
Spam filters, Naive Bayes, and data wrangling
Logistic regression
Financial modeling
Recommendation engines and causality
Data visualization
Social networks and data journalism
Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Doing Data Science: Straight Talk From the Frontline, 1st Edition. Table of contents:

Chapter 1. Introduction: What Is Data Science?

Big Data and Data Science Hype

Getting Past the Hype

Why Now?

Datafication

The Current Landscape (with a Little History)

Data Science Jobs

A Data Science Profile

Thought Experiment: Meta-Definition

OK, So What Is a Data Scientist, Really?

In Academia

In Industry

Chapter 2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process

Statistical Thinking in the Age of Big Data

Statistical Inference

Populations and Samples

Populations and Samples of Big Data

Big Data Can Mean Big Assumptions

Modeling

Exploratory Data Analysis

Philosophy of Exploratory Data Analysis

Exercise: EDA

The Data Science Process

A Data Scientist’s Role in This Process

Thought Experiment: How Would You Simulate Chaos?

Case Study: RealDirect

How Does RealDirect Make Money?

Exercise: RealDirect Data Strategy

Chapter 3. Algorithms

Machine Learning Algorithms

Three Basic Algorithms

Linear Regression

k-Nearest Neighbors (k-NN)

k-means

Exercise: Basic Machine Learning Algorithms

Solutions

Summing It All Up

Thought Experiment: Automated Statistician

Chapter 4. Spam Filters, Naive Bayes, and Wrangling

Thought Experiment: Learning by Example

Why Won’t Linear Regression Work for Filtering Spam?

How About k-nearest Neighbors?

Naive Bayes

Bayes Law

A Spam Filter for Individual Words

A Spam Filter That Combines Words: Naive Bayes

Fancy It Up: Laplace Smoothing

Comparing Naive Bayes to k-NN

Sample Code in bash

Scraping the Web: APIs and Other Tools

Jake’s Exercise: Naive Bayes for Article Classification

Sample R Code for Dealing with the NYT API

Chapter 5. Logistic Regression

Thought Experiments

Classifiers

Runtime

You

Interpretability

Scalability

M6D Logistic Regression Case Study

Click Models

The Underlying Math

Estimating α and β

Newton’s Method

Stochastic Gradient Descent

Implementation

Evaluation

Media 6 Degrees Exercise

Sample R Code

Chapter 6. Time Stamps and Financial Modeling

Kyle Teague and GetGlue

Timestamps

Exploratory Data Analysis (EDA)

Metrics and New Variables or Features

What’s Next?

Cathy O’Neil

Thought Experiment

Financial Modeling

In-Sample, Out-of-Sample, and Causality

Preparing Financial Data

Log Returns

Example: The S&P Index

Working out a Volatility Measurement

Exponential Downweighting

The Financial Modeling Feedback Loop

Why Regression?

Adding Priors

A Baby Model

Exercise: GetGlue and Timestamped Event Data

Exercise: Financial Data

Chapter 7. Extracting Meaning from Data

William Cukierski

Background: Data Science Competitions

Background: Crowdsourcing

The Kaggle Model

A Single Contestant

Their Customers

Thought Experiment: What Are the Ethical Implications of a Robo-Grader?

Feature Selection

Example: User Retention

Filters

Wrappers

Embedded Methods: Decision Trees

Entropy

The Decision Tree Algorithm

Handling Continuous Variables in Decision Trees

Random Forests

User Retention: Interpretability Versus Predictive Power

David Huffaker: Google’s Hybrid Approach to Social Research

Moving from Descriptive to Predictive

Social at Google

Privacy

Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?

Chapter 8. Recommendation Engines: Building a User-Facing Data Product at Scale

A Real-World Recommendation Engine

Nearest Neighbor Algorithm Review

Some Problems with Nearest Neighbors

Beyond Nearest Neighbor: Machine Learning Classification

The Dimensionality Problem

Singular Value Decomposition (SVD)

Important Properties of SVD

Principal Component Analysis (PCA)

Alternating Least Squares

Fix V and Update U

Last Thoughts on These Algorithms

Thought Experiment: Filter Bubbles

Exercise: Build Your Own Recommendation System

Sample Code in Python

Chapter 9. Data Visualization and Fraud Detection

Data Visualization History

Gabriel Tarde

Mark’s Thought Experiment

What Is Data Science, Redux?

Processing

Franco Moretti

A Sample of Data Visualization Projects

Mark’s Data Visualization Projects

New York Times Lobby: Moveable Type

Project Cascade: Lives on a Screen

Cronkite Plaza

eBay Transactions and Books

Public Theater Shakespeare Machine

Goals of These Exhibits

Data Science and Risk

About Square

The Risk Challenge

The Trouble with Performance Estimation

Model Building Tips

Data Visualization at Square

Ian’s Thought Experiment

Data Visualization for the Rest of Us

Data Visualization Exercise

Chapter 10. Social Networks and Data Journalism

Social Network Analysis at Morningside Analytics

Case-Attribute Data versus Social Network Data

Social Network Analysis

Terminology from Social Networks

Centrality Measures

The Industry of Centrality Measures

Thought Experiment

Morningside Analytics

How Visualizations Help Us Find Schools of Fish

More Background on Social Network Analysis from a Statistical Point of View

Representations of Networks and Eigenvalue Centrality

A First Example of Random Graphs: The Erdos-Renyi Model

A Second Example of Random Graphs: The Exponential Random Graph Model

Data Journalism

A Bit of History on Data Journalism

Writing Technical Journalism: Advice from an Expert

Chapter 11. Causality

Correlation Doesn’t Imply Causation

Asking Causal Questions

Confounders: A Dating Example

OK Cupid’s Attempt

The Gold Standard: Randomized Clinical Trials

A/B Tests

Second Best: Observational Studies

Simpson’s Paradox

The Rubin Causal Model

Visualizing Causality

Definition: The Causal Effect

Three Pieces of Advice

Chapter 12. Epidemiology

Madigan’s Background

Thought Experiment

Modern Academic Statistics

Medical Literature and Observational Studies

Stratification Does Not Solve the Confounder Problem

What Do People Do About Confounding Things in Practice?

Is There a Better Way?

Research Experiment (Observational Medical Outcomes Partnership)

Closing Thought Experiment

Chapter 13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation

Claudia’s Data Scientist Profile

The Life of a Chief Data Scientist

On Being a Female Data Scientist

Data Mining Competitions

How to Be a Good Modeler

Data Leakage

Market Predictions

Amazon Case Study: Big Spenders

A Jewelry Sampling Problem

IBM Customer Targeting

Breast Cancer Detection

Pneumonia Prediction

How to Avoid Leakage

Evaluating Models

Accuracy: Meh

Probabilities Matter, Not 0s and 1s

Choosing an Algorithm

A Final Example

Parting Thoughts

Chapter 14. Data Engineering: MapReduce, Pregel, and Hadoop

About David Crawshaw

Thought Experiment

MapReduce

Word Frequency Problem

Enter MapReduce

Other Examples of MapReduce

What Can’t MapReduce Do?

Pregel

About Josh Wills

Thought Experiment

On Being a Data Scientist

Data Abundance Versus Data Scarcity

Designing Models

Economic Interlude: Hadoop

A Brief Introduction to Hadoop

Cloudera

Back to Josh: Workflow

So How to Get Started with Hadoop?

Chapter 15. The Students Speak

Process Thinking

Naive No Longer

Helping Hands

Your Mileage May Vary

Bridging Tunnels

Some of Our Work

Chapter 16. Next-Generation Data Scientists, Hubris, and Ethics

What Just Happened?

What Is Data Science (Again)?

What Are Next-Gen Data Scientists?

Being Problem Solvers

Cultivating Soft Skills

Being Question Askers

Being an Ethical Data Scientist

Career Advice

Index
