Holdings Information

    Doing data science / Rachel Schutt and Cathy O'Neil.

    • Title:Doing data science / Rachel Schutt and Cathy O'Neil.
    • Author/Creator:Schutt, Rachel, 1976- author.
    • Other Contributors/Collections:O'Neil, Cathy, author.
      ebrary, Inc.
    • Published/Created:Sebastopol, CA : O'Reilly Media, 2013.
    • Holdings

      • Location:ONLINE
      • Call Number: QA76.9.D343
      • Number of Items:
        0
      • Status:No information available 
       
    • Library of Congress Subjects:Big data.
      Data mining.
      Information science.
      Data structures (Computer science)
      Database management.
      Cyberinfrastructure.
    • Subject(s):Electronic books.
    • Edition:First edition.
    • Description:1 online resource.
    • Notes:Includes index.
      Description based on online resource; title from digital title page (viewed on Dec. 16, 2013).
    • ISBN:1449358659
      9781449358655
      9781449363895 (electronic bk.)
      144936389X (electronic bk.)
    • Contents:Machine generated contents note: Big Data and Data Science Hype
      Getting Past the Hype
      Why Now?
      Datafication
      Current Landscape (with a Little History)
      Data Science Jobs
      Data Science Profile
      Thought Experiment: Meta-Definition
      OK, So What Is a Data Scientist, Really?
      In Academia
      In Industry
      Statistical Thinking in the Age of Big Data
      Statistical Inference
      Populations and Samples
      Populations and Samples of Big Data
      Big Data Can Mean Big Assumptions
      Modeling
      Exploratory Data Analysis
      Philosophy of Exploratory Data Analysis
      Exercise: EDA
      Data Science Process
      Data Scientist's Role in This Process
      Thought Experiment: How Would You Simulate Chaos?
      Case Study: RealDirect
      How Does RealDirect Make Money?
      Exercise: RealDirect Data Strategy
      Machine Learning Algorithms
      Three Basic Algorithms
      Linear Regression
      k-Nearest Neighbors (k-NN)
      k-means
      Exercise: Basic Machine Learning Algorithms
      Solutions
      Summing It All Up
      Thought Experiment: Automated Statistician
      Thought Experiment: Learning by Example
      Why Won't Linear Regression Work for Filtering Spam?
      How About k-nearest Neighbors?
      Naive Bayes
      Bayes Law
      Spam Filter for Individual Words
      Spam Filter That Combines Words: Naive Bayes
      Fancy It Up: Laplace Smoothing
      Comparing Naive Bayes to k-NN
      Sample Code in bash
      Scraping the Web: APIs and Other Tools
      Jake's Exercise: Naive Bayes for Article Classification
      Sample R Code for Dealing with the NYT API
      Thought Experiments
      Classifiers
      Runtime
      You
      Interpretability
      Scalability
      M6D Logistic Regression Case Study
      Click Models
      Underlying Math
      Estimating α and β
      Newton's Method
      Stochastic Gradient Descent
      Implementation
      Evaluation
      Media 6 Degrees Exercise
      Sample R Code
      Kyle Teague and GetGlue
      Timestamps
      Exploratory Data Analysis (EDA)
      Metrics and New Variables or Features
      What's Next?
      Cathy O'Neil
      Thought Experiment
      Financial Modeling
      In-Sample, Out-of-Sample, and Causality
      Preparing Financial Data
      Log Returns
      Example: The S&P Index
      Working out a Volatility Measurement
      Exponential Downweighting
      Financial Modeling Feedback Loop
      Why Regression?
      Adding Priors
      Baby Model
      Exercise: GetGlue and Timestamped Event Data
      Exercise: Financial Data
      William Cukierski
      Background: Data Science Competitions
      Background: Crowdsourcing
      Kaggle Model
      Single Contestant
      Their Customers
      Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
      Feature Selection
      Example: User Retention
      Filters
      Wrappers
      Embedded Methods: Decision Trees
      Entropy
      Decision Tree Algorithm
      Handling Continuous Variables in Decision Trees
      Random Forests
      User Retention: Interpretability Versus Predictive Power
      David Huffaker: Google's Hybrid Approach to Social Research
      Moving from Descriptive to Predictive
      Social at Google
      Privacy
      Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
      Real-World Recommendation Engine
      Nearest Neighbor Algorithm Review
      Some Problems with Nearest Neighbors
      Beyond Nearest Neighbor: Machine Learning Classification
      Dimensionality Problem
      Singular Value Decomposition (SVD)
      Important Properties of SVD
      Principal Component Analysis (PCA)
      Alternating Least Squares
      Fix V and Update U
      Last Thoughts on These Algorithms
      Thought Experiment: Filter Bubbles
      Exercise: Build Your Own Recommendation System
      Sample Code in Python
      Data Visualization History
      Gabriel Tarde
      Mark's Thought Experiment
      What Is Data Science, Redux?
      Processing
      Franco Moretti
      Sample of Data Visualization Projects
      Mark's Data Visualization Projects
      New York Times Lobby: Moveable Type
      Project Cascade: Lives on a Screen
      Cronkite Plaza
      eBay Transactions and Books
      Public Theater Shakespeare Machine
      Goals of These Exhibits
      Data Science and Risk
      About Square
      Risk Challenge
      Trouble with Performance Estimation
      Model Building Tips
      Data Visualization at Square
      Ian's Thought Experiment
      Data Visualization for the Rest of Us
      Data Visualization Exercise
      Social Network Analysis at Morningside Analytics
      Case-Attribute Data versus Social Network Data
      Social Network Analysis
      Terminology from Social Networks
      Centrality Measures
      Industry of Centrality Measures
      Thought Experiment
      Morningside Analytics
      How Visualizations Help Us Find Schools of Fish
      More Background on Social Network Analysis from a Statistical Point of View
      Representations of Networks and Eigenvalue Centrality
      First Example of Random Graphs: The Erdos-Renyi Model
      Second Example of Random Graphs: The Exponential Random Graph Model
      Data Journalism
      Bit of History on Data Journalism
      Writing Technical Journalism: Advice from an Expert
      Correlation Doesn't Imply Causation
      Asking Causal Questions
      Confounders: A Dating Example
      OK Cupid's Attempt
      Gold Standard: Randomized Clinical Trials
      A/B Tests
      Second Best: Observational Studies
      Simpson's Paradox
      Rubin Causal Model
      Visualizing Causality
      Definition: The Causal Effect
      Three Pieces of Advice
      Madigan's Background
      Thought Experiment
      Modern Academic Statistics
      Medical Literature and Observational Studies
      Stratification Does Not Solve the Confounder Problem
      What Do People Do About Confounding Things in Practice?
      Is There a Better Way?
      Research Experiment (Observational Medical Outcomes Partnership)
      Closing Thought Experiment
      Claudia's Data Scientist Profile
      Life of a Chief Data Scientist
      On Being a Female Data Scientist
      Data Mining Competitions
      How to Be a Good Modeler
      Data Leakage
      Market Predictions
      Amazon Case Study: Big Spenders
      Jewelry Sampling Problem
      IBM Customer Targeting
      Breast Cancer Detection
      Pneumonia Prediction
      How to Avoid Leakage
      Evaluating Models
      Accuracy: Meh
      Probabilities Matter, Not 0s and 1s
      Choosing an Algorithm
      Final Example
      Parting Thoughts
      About David Crawshaw
      Thought Experiment
      MapReduce
      Word Frequency Problem
      Enter MapReduce
      Other Examples of MapReduce
      What Can't MapReduce Do?
      Pregel
      About Josh Wills
      Thought Experiment
      On Being a Data Scientist
      Data Abundance Versus Data Scarcity
      Designing Models
      Economic Interlude: Hadoop
      Brief Introduction to Hadoop
      Cloudera
      Back to Josh: Workflow
      So How to Get Started with Hadoop?
      Process Thinking
      Naive No Longer
      Helping Hands
      Your Mileage May Vary
      Bridging Tunnels
      Some of Our Work
      What Just Happened?
      What Is Data Science (Again)?
      What Are Next-Gen Data Scientists?
      Being Problem Solvers
      Cultivating Soft Skills
      Being Question Askers
      Being an Ethical Data Scientist
      Career Advice.