Channel: Paul Long's Blog » big data

MRIA National Conference 2014 — Understanding Predictive Analytics

The following notes were live-blogged from the “Understanding Predictive Analytics” session given by Chuck Chakrapani (Leger Marketing) on June 10, 2014. Only minimal editing was done, so the post may contain typos.

The presenter is interested in technology-enabled predictive analytics (as opposed to technology-driven).

What is Data Analysis:

  • big data
  • machine learning
  • data mining
  • predictive analytics
  • text mining
  • etc.

Everything is predictive:

  • Do we want to go to this session or another?
  • Do I take this job offer?
  • Will my stocks go up as well?

Business

  • Will this new product succeed?
  • Can I increase the price?
  • Who will be my target audience?

Steps:

What will happen: either A or B will happen, and each outcome has its own consequences.

Google Fusion

  • Enables you to pull information from the web
  • This means we have access to a vast amount of secondary data

The New Science of Data Science

Data science is the study of the generalizable extraction of knowledge from data. It builds on techniques and theories from many fields:

  • signal processing
  • probability
  • etc

What is big data?

  • A large amount of data?
  • More data than your desktop could handle?
  • One zettabyte of data?
  • No agreed-upon definition
  • A tentative framework
  • From the data universe that is infinite and constantly in flux

Big Data and the Flu

  • Google analyzes searches and conversations about the flu to predict infection rates. So big data is great when it works. The problem with big data is that it gives you only correlations

Machine learning

  • Example: Amazon tells me what I should read based on what I am reading now
  • The machine learns and predicts

What Happens When You Use Gmail

  • Google shows ads based on the content of your emails

Two Functions of Predictive Analytics

  • Classification
  • Prediction

The objectives haven’t changed, but we now have:

  • Lower costs
  • Better predictability
  • Faster turnaround

Example

  • 25 years ago, a single cluster analysis of 600 respondents on 30 variables would run for 24 hours on a PC
  • Today you can run 100 cluster analyses of 1000 respondents on 30 variables in one afternoon

How does that help?

Then:

  • One respondent was picked at random to represent a segment
  • Everyone close to that respondent was assigned to the segment
  • There was nothing to indicate whether the result was reasonable
  • There was no way of validating your segments
  • A holdout sample is better than nothing, but not good enough
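
To make the older approach concrete, here is a minimal sketch (not the presenter's code) of seed-based segmentation: one respondent per segment is picked at random, and everyone else goes to the nearest seed. The data, the 1 to 5 rating scale, the choice of four segments, and the function name `assign_to_random_seeds` are all assumptions made for illustration.

```python
import math
import random

def assign_to_random_seeds(respondents, n_segments, rng):
    """Pick one respondent at random per segment, then assign
    every respondent to the segment whose seed is closest."""
    seeds = rng.sample(respondents, n_segments)
    labels = []
    for r in respondents:
        # Euclidean distance from this respondent to each seed
        distances = [math.dist(r, s) for s in seeds]
        labels.append(distances.index(min(distances)))
    return seeds, labels

rng = random.Random(0)
# Hypothetical survey: 600 respondents, 30 variables rated 1 to 5
respondents = [tuple(rng.randint(1, 5) for _ in range(30)) for _ in range(600)]
seeds, labels = assign_to_random_seeds(respondents, n_segments=4, rng=rng)
```

The result depends entirely on which respondents happen to be drawn as seeds, and nothing in the procedure says whether the segments are reasonable, which is the weakness the notes point to.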

Now:

  • We can have larger samples which help us split the sample into a Training set and Test set
  • We can run hundreds of cluster analyses on the same data
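
The two points above can be sketched roughly as follows: split the sample into a training set and a test set, run many cluster analyses on the training set, keep the best one, and check how it fits the held-out data. This is only an illustration under stated assumptions. The simple k-means routine, the random data, the 70/30 split, `k=4`, and the helper names (`nearest`, `kmeans`, `within_ss`) are not from the talk.

```python
import math
import random

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, rng, n_iter=5):
    """A basic k-means: random starting centers, a fixed number of updates."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of its group; keep the old center if the group is empty
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def within_ss(points, centers):
    """Total squared distance from each point to its nearest center."""
    return sum(math.dist(p, centers[nearest(p, centers)]) ** 2 for p in points)

rng = random.Random(42)
# Hypothetical survey: 1000 respondents, 30 variables
data = [tuple(rng.uniform(1, 5) for _ in range(30)) for _ in range(1000)]
rng.shuffle(data)
train, test = data[:700], data[700:]

# Run 100 cluster analyses with different random starts on the training set
runs = [kmeans(train, k=4, rng=rng) for _ in range(100)]
best = min(runs, key=lambda c: within_ss(train, c))

# Compare average fit on the training set with fit on the unseen test set
print(within_ss(train, best) / len(train), within_ss(test, best) / len(test))
```

If the per-respondent fit on the test set is much worse than on the training set, the segments are probably not stable, which is the kind of validation the older approach could not offer.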

Message:

Do not think of big data as everything. Unless you combine data with analysis, the whole thing is useless. You need to have objectives.

 

 

