Using Data Mining and Machine Learning in Retail




Omeid Seide, Senior Manager, Big Data Solutions, Sears Holdings
Bharat Prasad, Big Data Solution Architect, Sears Holdings

Over a Century of Innovation
A Fortune 100 company with nearly $40 billion in annual revenue.
The nation's fourth largest broad line retailer, with almost 2,500 full-line and specialty retail stores in the US and Canada.
A front runner in Big Data efforts, including driving personalized marketing and generating savings from legacy migration.
Runs one of the biggest rewards programs, which captures and analyzes a very large number of customer transactions quickly.
Challenges: growing data volumes; shortened processing windows; tight IT budgets; latency in data; escalating costs; hitting scalability ceilings; ETL complexity; demanding business requirements.

What is Big Data?
Big Data can no longer be defined by the amount of data, but by the type, speed, and storage capacity needed to compute and analyze that data.

Data, Data, and More Data
We are creating so much data, so quickly, that 90% of the data in the world today has been created in the last 2 years.

The Problem with Large Scale Data Processing
With traditional computer processing, it can be difficult to compute everything because of limits on storage space, processing time, and cost. This typically leads to incomplete computations, data latency, and an overall lack of quality analysis.
Hadoop brings horizontal scalability, very large storage capacity, and fast parallel data processing.

Enter Hadoop
Apache Hadoop is a framework which:
Runs applications on a large cluster built of commodity hardware.
Provides reliability and data motion to applications.
Implements a computational paradigm named MapReduce, in which an application is divided into small fragments of work, each of which may be executed or re-executed on any node in the cluster.
Provides a distributed file system (HDFS) that stores data on the compute nodes, resulting in high aggregate bandwidth across the cluster.
Both MapReduce and the distributed file system are designed so that the framework handles node failures automatically.
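To make the MapReduce paradigm concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the sales records, the function names, and the in-memory shuffle are all assumptions made for illustration. On a real cluster the map and reduce calls would run as separate tasks on different nodes.

```python
from collections import defaultdict

# Hypothetical input: (store_id, product, amount) sales records.
SALES = [
    ("store-1", "diapers", 12.0),
    ("store-2", "beer", 8.5),
    ("store-1", "beer", 8.5),
    ("store-2", "diapers", 12.0),
    ("store-1", "diapers", 12.0),
]

def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    _store, product, amount = record
    yield product, amount

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

totals = run_job(SALES)
print(totals)
```

Because each map call sees only one record and each reduce call sees only one key, either kind of task can be rerun on another node after a failure without affecting the rest of the job.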

Why Use Hadoop?
Scalability: Hadoop is horizontally scalable. It stores and processes petabytes of data simply by adding hardware.
Economical: Uses commodity hardware.
Efficient: Extremely powerful parallel processing ability.
Reliability: Data is replicated (three copies by default in HDFS) in different locations; failed tasks are rerun.
Storage space & capacity: Central repository; keep everything forever.

Big Data Analytics in Retail
How can I better manage my inventory?
How can I better understand my customers' buying habits?
How can I detect fraudulent activity?
How can I create better targeted interactions with my customers?
How do I get customers to purchase more products?

The Evolution of Data Analysis

What is Mahout?
Top-level Apache Software Foundation project
Uses scalable machine learning algorithms
Collection of pre-built data mining libraries
Primary focus on collaborative filtering, clustering, and classification
Includes a Java-based library of common math operations
Uses the MapReduce paradigm

Examples of Data Mining & Machine Learning

3 Primary Algorithms
Clustering
Recommendation Systems
Market Basket Analysis

Clustering
A process of grouping things so that each item is placed with the items it most closely resembles.
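As a rough illustration of clustering, the sketch below implements a bare-bones k-means in plain Python. The talk does not say which clustering algorithm is used, so k-means is an assumption here, as are the customer features (visits per month, average basket value) and the naive choice of the first k points as starting centroids.

```python
import math

# Hypothetical customers: (visits per month, average basket value in dollars).
CUSTOMERS = [(1, 20), (9, 150), (2, 25), (10, 160), (1.5, 22), (8, 140)]

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly reassigning and re-averaging."""
    # Naive deterministic start: use the first k points as centroids.
    centroids = [tuple(points[i]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

centroids, assignments = kmeans(CUSTOMERS, k=2)
print(assignments)
print(centroids)
```

In practice the features would usually be scaled first, since here the basket value dominates the distance simply because its numbers are larger.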

Motivation behind Clustering
Why use clustering?
To better understand a customer's buying behavior
To develop targeted marketing campaigns
To understand interest, motivation, and lifestyle, in order to move merchandise in and out of stores more effectively

Recommendation Systems
An information filtering system that is used to predict a user's rating or preference, typically using a collaborative, content-based, or hybrid approach to recommendations.

Collaborative Filtering
A framework that filters and recommends items based on users' behavior, preferences, and activities, and on their similarity to other users.
Recommenders: user based and item based
Online and offline support
Can utilize Hadoop
Uses numerous similarity measures, such as cosine, log-likelihood ratio (LLR), Tanimoto, Pearson, and more.
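Two of the similarity measures named above, cosine and Tanimoto, can be sketched in a few lines, together with a simple user-based recommender built on them. This is plain Python, not Mahout's API; the ratings data, the function names, and the scoring rule (sum of similarity times rating) are assumptions for illustration only.

```python
import math

# Hypothetical user -> {item: rating} data.
RATINGS = {
    "alice": {"drill": 5, "saw": 4, "hammer": 1},
    "bob": {"drill": 4, "saw": 5, "ladder": 4},
    "carol": {"hammer": 5, "paint": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity: shared items over all items, ignoring ratings."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

def recommend(user, ratings, similarity=cosine):
    """User-based recommender: score unseen items by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

recs = recommend("alice", RATINGS)
print(recs)
```

Here "ladder" ranks above "paint" for alice because bob, who rated the ladder, shares far more of her tastes than carol does.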

Content-Based Filtering
Looks at the features of the items and at the user's preferences in order to provide a recommendation.
Allows for highly precise recommendations.
Has difficulty making recommendations across different categories of service, as in cross-selling.
[Diagram: the user's ratings and the feature values of content used in the past (A, B, C) form the user profile. The user profile is matched against the content profile, which holds the feature values of the available contents (X, Y, Z). Content with similar feature values is recommended.]
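The matching of a user profile against content profiles can be sketched as follows. This is an illustrative example in plain Python: the item names, the three feature dimensions, and the choice of a rating-weighted average for the user profile are all assumptions, not details from the talk.

```python
import math

# Hypothetical content profiles: feature values (power_tool, outdoor, kitchen).
ITEM_FEATURES = {
    "drill": (1.0, 0.0, 0.0),
    "sander": (1.0, 0.0, 0.0),
    "grill": (0.0, 1.0, 1.0),
    "blender": (0.0, 0.0, 1.0),
}

# Hypothetical ratings for content the user has used in the past.
USER_RATINGS = {"drill": 5, "grill": 1}

def build_profile(ratings, features):
    """User profile: rating-weighted average of the features of rated items."""
    dims = len(next(iter(features.values())))
    total = sum(ratings.values())
    return tuple(
        sum(r * features[item][d] for item, r in ratings.items()) / total
        for d in range(dims)
    )

def cosine(u, v):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def recommend(ratings, features):
    """Rank unseen items by similarity between their features and the user profile."""
    profile = build_profile(ratings, features)
    unseen = [item for item in features if item not in ratings]
    return sorted(unseen, key=lambda item: cosine(profile, features[item]), reverse=True)

ranked = recommend(USER_RATINGS, ITEM_FEATURES)
print(ranked)
```

The example also shows the cross-selling weakness noted above: a user who has only rated power tools highly is steered toward more power tools, not toward other categories.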

Market-Basket Model
A model used to describe a common form of many-to-many relationship between two kinds of objects: items and baskets.
Items: anything that is purchased
Basket: a set of items
The number of items in a basket is typically small, and the number of baskets is typically large.

Market Basket Models
A list of purchasers
  Additional purchaser data can be useful (but is not needed)
A list of transactions
Seek to identify purchasing patterns
  What items are normally purchased together?
  What is the purchasing sequence?
  Is there a seasonality effect to purchasing?
Categorize buying behavior
Translate buying behavior into actionable insight
  Targeted promotions
  Inventory placement
  Store layout
  Cross-selling

Frequent Itemsets
Any set of items that appears regularly within multiple baskets
Originally used to analyze a physical supermarket basket
Best used to link pairs of items that are commonly bought together but often have no obvious relationship to each other
Example: Diapers & Beer
A major store chain discovered that diapers and beer were regularly appearing in baskets together. The theory was that if you buy diapers you are likely to have a baby at home; with a baby at home you are less likely to go to a bar to drink, and more likely to have a beer at home.
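Finding frequent pairs comes down to counting how many baskets contain each pair and keeping those above a support threshold. The sketch below does this by brute force in plain Python; the baskets, the threshold, and the function names are assumptions for illustration, and a production system would use a scalable algorithm over far more baskets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each basket is a set of items.
BASKETS = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
    {"milk", "diapers"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing antecedent that also contain consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = frequent_pairs(BASKETS, min_support=3)
print(pairs)
print(confidence(BASKETS, "diapers", "beer"))
```

Support says how often a pair occurs at all, while confidence says how often the second item follows given the first; both are usually checked before acting on a pattern.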

Applying Market Basket Models
Retail stores
Showroom floor planning
Catalog layout
Cross-selling
Fraud analysis

Big Data Stack
[Diagram: stack layers Consumption, Semantic, Computation/Access, Storage, Integration, and Security, shown for both on-premises and cloud deployments. Labeled components: data visualization & reporting, data analytics, data mining, advanced query, Hive/Pig, HDFS storage, NoSQL DB, metadata, data governance & integration (ETL/ELT). Frequency: on demand, real-time, streaming, time series.]

Open vs Closed Stack
[Diagram: two stacks side by side, each with the layers Distribution, Consumption, Semantic, Computation/Access, Storage/NoSQL DB, Security, Integration, and Source.]

Questions?