Holdings Information
Doing data science / Rachel Schutt and Cathy O'Neil.
Bibliographic Record Display
-
Title: Doing data science / Rachel Schutt and Cathy O'Neil.
-
Author/Creator: Schutt, Rachel, 1976- author.
-
Other Contributors/Collections: O'Neil, Cathy, author.
ebrary, Inc.
-
Published/Created: Sebastopol, CA : O'Reilly Media, 2013.
-
Holdings
Holdings Record Display
-
Location: ONLINE
-
Call Number: QA76.9.D343
-
Number of Items: 0
- Status: No information available
-
Library of Congress Subjects: Big data.
Data mining.
Information science.
Data structures (Computer science)
Database management.
Cyberinfrastructure.
-
Subject(s): Electronic books.
-
Edition:First edition.
-
Description: 1 online resource.
-
Notes: Includes index.
Description based on online resource; title from digital title page (viewed on Dec. 16, 2013).
-
ISBN: 1449358659
9781449358655
9781449363895 (electronic bk.)
144936389X (electronic bk.)
-
Contents: Machine generated contents note: Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
Current Landscape (with a Little History)
Data Science Jobs
Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
Statistical Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
Data Science Process
Data Scientist's Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors (k-NN)
k-means
Exercise: Basic Machine Learning Algorithms
Solutions
Summing It All Up
Thought Experiment: Automated Statistician
Thought Experiment: Learning by Example
Why Won't Linear Regression Work for Filtering Spam?
How About k-nearest Neighbors?
Naive Bayes
Bayes Law
Spam Filter for Individual Words
Spam Filter That Combines Words: Naive Bayes
Fancy It Up: Laplace Smoothing
Comparing Naive Bayes to k-NN
Sample Code in bash
Scraping the Web: APIs and Other Tools
Jake's Exercise: Naive Bayes for Article Classification
Sample R Code for Dealing with the NYT API
Thought Experiments
Classifiers
Runtime
You
Interpretability
Scalability
M6D Logistic Regression Case Study
Click Models
Underlying Math
Estimating α and β
Newton's Method
Stochastic Gradient Descent
Implementation
Evaluation
Media 6 Degrees Exercise
Sample R Code
Kyle Teague and GetGlue
Timestamps
Exploratory Data Analysis (EDA)
Metrics and New Variables or Features
What's Next?
Cathy O'Neil
Thought Experiment
Financial Modeling
In-Sample, Out-of-Sample, and Causality
Preparing Financial Data
Log Returns
Example: The S&P Index
Working out a Volatility Measurement
Exponential Downweighting
Financial Modeling Feedback Loop
Why Regression?
Adding Priors
Baby Model
Exercise: GetGlue and Timestamped Event Data
Exercise: Financial Data
William Cukierski
Background: Data Science Competitions
Background: Crowdsourcing
Kaggle Model
Single Contestant
Their Customers
Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
Feature Selection
Example: User Retention
Filters
Wrappers
Embedded Methods: Decision Trees
Entropy
Decision Tree Algorithm
Handling Continuous Variables in Decision Trees
Random Forests
User Retention: Interpretability Versus Predictive Power
David Huffaker: Google's Hybrid Approach to Social Research
Moving from Descriptive to Predictive
Social at Google
Privacy
Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
Real-World Recommendation Engine
Nearest Neighbor Algorithm Review
Some Problems with Nearest Neighbors
Beyond Nearest Neighbor: Machine Learning Classification
Dimensionality Problem
Singular Value Decomposition (SVD)
Important Properties of SVD
Principal Component Analysis (PCA)
Alternating Least Squares
Fix V and Update U
Last Thoughts on These Algorithms
Thought Experiment: Filter Bubbles
Exercise: Build Your Own Recommendation System
Sample Code in Python
Data Visualization History
Gabriel Tarde
Mark's Thought Experiment
What Is Data Science, Redux?
Processing
Franco Moretti
Sample of Data Visualization Projects
Mark's Data Visualization Projects
New York Times Lobby: Moveable Type
Project Cascade: Lives on a Screen
Cronkite Plaza
eBay Transactions and Books
Public Theater Shakespeare Machine
Goals of These Exhibits
Data Science and Risk
About Square
Risk Challenge
Trouble with Performance Estimation
Model Building Tips
Data Visualization at Square
Ian's Thought Experiment
Data Visualization for the Rest of Us
Data Visualization Exercise
Social Network Analysis at Morning Analytics
Case-Attribute Data versus Social Network Data
Social Network Analysis
Terminology from Social Networks
Centrality Measures
Industry of Centrality Measures
Thought Experiment
Morningside Analytics
How Visualizations Help Us Find Schools of Fish
More Background on Social Network Analysis from a Statistical Point of View
Representations of Networks and Eigenvalue Centrality
First Example of Random Graphs: The Erdos-Renyi Model
Second Example of Random Graphs: The Exponential Random Graph Model
Data Journalism
Bit of History on Data Journalism
Writing Technical Journalism: Advice from an Expert
Correlation Doesn't Imply Causation
Asking Causal Questions
Confounders: A Dating Example
OK Cupid's Attempt
Gold Standard: Randomized Clinical Trials
A/B Tests
Second Best: Observational Studies
Simpson's Paradox
Rubin Causal Model
Visualizing Causality
Definition: The Causal Effect
Three Pieces of Advice
Madigan's Background
Thought Experiment
Modern Academic Statistics
Medical Literature and Observational Studies
Stratification Does Not Solve the Confounder Problem
What Do People Do About Confounding Things in Practice?
Is There a Better Way?
Research Experiment (Observational Medical Outcomes Partnership)
Closing Thought Experiment
Claudia's Data Scientist Profile
Life of a Chief Data Scientist
On Being a Female Data Scientist
Data Mining Competitions
How to Be a Good Modeler
Data Leakage
Market Predictions
Amazon Case Study: Big Spenders
Jewelry Sampling Problem
IBM Customer Targeting
Breast Cancer Detection
Pneumonia Prediction
How to Avoid Leakage
Evaluating Models
Accuracy: Meh
Probabilities Matter, Not 0s and 1s
Choosing an Algorithm
Final Example
Parting Thoughts
About David Crawshaw
Thought Experiment
MapReduce
Word Frequency Problem
Enter MapReduce
Other Examples of MapReduce
What Can't MapReduce Do?
Pregel
About Josh Wills
Thought Experiment
On Being a Data Scientist
Data Abundance Versus Data Scarcity
Designing Models
Economic Interlude: Hadoop
Brief Introduction to Hadoop
Cloudera
Back to Josh: Workflow
So How to Get Started with Hadoop?
Process Thinking
Naive No Longer
Helping Hands
Your Mileage May Vary
Bridging Tunnels
Some of Our Work
What Just Happened?
What Is Data Science (Again)?
What Are Next-Gen Data Scientists?
Being Problem Solvers
Cultivating Soft Skills
Being Question Askers
Being an Ethical Data Scientist
Career Advice.