1 Using Data Mining and Machine Learning in Retail. Omeid Seide, Senior Manager, Big Data Solutions, Sears Holdings. Bharat Prasad, Big Data Solution Architect, Sears Holdings.
2 Over a Century of Innovation. A Fortune 100 company with nearly $40 billion in annual revenue. The nation's fourth largest broad-line retailer, with almost 2,500 full-line and specialty retail stores in the US and Canada. A front runner in Big Data efforts, including driving personalized marketing and generating savings from legacy migration. Running one of the biggest rewards programs, which captures and analyzes a very large number of customer transactions quickly. Challenges: growing data volumes, shortened processing windows, tight IT budgets, latency in data, escalating costs, hitting scalability ceilings, ETL complexity, and demanding business requirements.
3 What is Big Data? Big Data can no longer be defined by the amount of data, but by the type, speed, and storage capacity needed to compute and analyze that data.
4 Data, Data, and More Data. We are creating so much data, so quickly, that 90% of the data in the world today has been created in the last 2 years.
5 The Problem with Large-Scale Data Processing. With traditional computer processing, it can be difficult to compute everything because of limits on storage space, processing time, and cost. This typically leads to incomplete computations, data latency, and an overall lack of quality analysis. Hadoop addresses this with horizontal scalability, very large storage capacity, and fast parallel data processing.
6 Enter Hadoop. Apache Hadoop is a framework which: runs applications on a large cluster built of commodity hardware; provides reliability and data motion to applications; implements a computational paradigm named MapReduce, in which an application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster; and provides a distributed file system (HDFS) that stores data on the compute nodes, resulting in high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that the framework automatically handles node failures.
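The MapReduce paradigm named on this slide can be illustrated with a small single-process sketch. This is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample transactions are assumptions made up for illustration. It only shows how work splits into independent map fragments whose output is grouped by key and then reduced.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (item, quantity) pairs for one transaction record."""
    for item, qty in record["items"]:
        yield item, qty

def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single total."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical sample data: units sold per item in three transactions.
transactions = [
    {"id": 1, "items": [("diapers", 2), ("beer", 6)]},
    {"id": 2, "items": [("beer", 12)]},
    {"id": 3, "items": [("diapers", 1), ("wipes", 3)]},
]
totals = run_job(transactions)
print(totals)
```

Because each call to `map_phase` depends only on its own record, a real cluster can run those calls on any node and re-run them after a failure, which is the property the slide describes.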
7 Why Use Hadoop? Scalability: Hadoop is horizontally scalable; it stores and processes petabytes of data by adding hardware. Economical: uses commodity hardware. Efficient: powerful parallel processing. Reliability: data is replicated 3 times by default, in different locations; failed tasks are rerun. Storage space and capacity: a central repository; keep everything forever.
8 Big Data Analytics in Retail. How can I better manage my inventory? How can I better understand my customers' buying habits? How can I detect fraudulent activity? How can I create better-targeted interactions with my customers? How do I get customers to purchase more products?
9 The Evolution of Data Analysis
10 What is Mahout? A top-level Apache Software Foundation project. Provides scalable machine learning algorithms. A collection of pre-built data-mining libraries. Primary focus on collaborative filtering, clustering, and classification. Includes a Java-based math library for common math operations. Uses the MapReduce paradigm.
13 Clustering. The process of grouping things so that each item is placed with the other items it most closely resembles.
14 Motivation behind Clustering. Why use clustering? To better understand a customer's buying behavior. To develop targeted marketing campaigns. To understand interest, motivation, and lifestyle, in order to move merchandise in and out of stores more effectively.
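Grouping customers by buying behavior can be sketched with k-means, one common clustering algorithm. The slides do not say which algorithm was used, so the choice of k-means is an assumption, as are the function name `kmeans`, the two features (store visits per month, average basket value in dollars), and the sample customers.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    Returns the final centroids and the cluster assignment made in the last
    iteration. An empty cluster keeps its previous centroid.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value in dollars).
customers = [(1, 20), (2, 25), (1, 22), (10, 200), (12, 210), (11, 190)]

# Starting centroids are chosen by hand here so the run is deterministic.
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids)
print(clusters)
```

In practice the features would usually be scaled to comparable ranges first, since the dollar values here dominate the distance calculation.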
15 Recommendation Systems. An information filtering system used to predict a user's rating of, or preference for, an item, typically using a collaborative, content-based, or hybrid approach.
16 Collaborative Filtering. A framework that filters and recommends items based on a user's behavior, preferences, and activities, and on that user's similarity to other users. Recommenders: user-based and item-based. Online and offline support. Can utilize Hadoop. Uses numerous similarity measures, such as cosine, log-likelihood ratio (LLR), Tanimoto, Pearson, and more.
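Three of the similarity measures listed on this slide, and a very small user-based recommender built on one of them, can be sketched as follows. This is plain Python rather than Mahout's API; the function names, the nearest-single-neighbor strategy, and the sample ratings are all assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def tanimoto(set_a, set_b):
    """Tanimoto (Jaccard) coefficient of two non-empty item sets."""
    return len(set_a & set_b) / len(set_a | set_b)

def pearson(a, b):
    """Pearson correlation of two equal-length vectors with non-zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def recommend(user, ratings):
    """User-based sketch: find the most similar other user (cosine over
    co-rated items) and suggest that user's items the target has not rated,
    highest rated first."""
    mine = ratings[user]

    def similarity(other):
        common = sorted(mine.keys() & ratings[other].keys())
        if not common:
            return 0.0
        return cosine([mine[i] for i in common], [ratings[other][i] for i in common])

    neighbor = max((u for u in ratings if u != user), key=similarity)
    unseen = {i: r for i, r in ratings[neighbor].items() if i not in mine}
    return sorted(unseen, key=unseen.get, reverse=True)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"drill": 5, "saw": 4, "hammer": 1},
    "bob": {"drill": 4, "saw": 5, "hammer": 2, "sander": 5},
    "carol": {"drill": 1, "saw": 2, "hammer": 5, "paint": 4},
}
print(recommend("alice", ratings))
```

Production recommenders typically weight several neighbors rather than copying a single one; the single-neighbor version is used here only to keep the example short.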
17 Content-Based Filtering. Looks at the features of the items and the user's preferences, and provides a recommendation. Allows for highly precise recommendations. Has difficulty making recommendations across different categories of service, for example when used for cross-selling. [Diagram: user ratings of content used in the past build a user profile of feature values (A, B, C); contents build a content profile of feature values (X, Y, Z); matching the two profiles recommends content with similar feature values.]
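The profile-matching idea on this slide can be sketched in a few lines. The feature encoding, the rating-weighted average used to build the user profile, and the use of cosine similarity for matching are assumptions chosen for illustration; the slide does not specify them. All item names and feature values are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_user_profile(ratings, item_features):
    """User profile: rating-weighted average of the feature vectors of rated items."""
    total = sum(ratings.values())
    n_features = len(next(iter(item_features.values())))
    profile = [0.0] * n_features
    for item, rating in ratings.items():
        for i, value in enumerate(item_features[item]):
            profile[i] += rating * value / total
    return profile

def recommend(ratings, item_features):
    """Rank unrated items by similarity between their features and the user profile."""
    profile = build_user_profile(ratings, item_features)
    candidates = [item for item in item_features if item not in ratings]
    return sorted(candidates,
                  key=lambda item: cosine(profile, item_features[item]),
                  reverse=True)

# Hypothetical content profiles: (power tool, garden, cordless).
item_features = {
    "cordless drill": (1, 0, 1),
    "hedge trimmer": (1, 1, 1),
    "garden hose": (0, 1, 0),
    "circular saw": (1, 0, 0),
}
# Hypothetical ratings of content used in the past (1-5 scale).
user_ratings = {"cordless drill": 5, "garden hose": 1}

ranking = recommend(user_ratings, item_features)
print(ranking)
```

Because the ranking depends only on feature overlap with what the user already liked, items from an unrelated category score poorly, which is one way to see the cross-selling limitation the slide mentions.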
18 Market-Basket Model. A model used to describe a common form of many-to-many relationship between two kinds of objects. Items: anything that is purchased. Basket: a set of items. The number of items in a basket is typically small, and the number of baskets is typically large.
19 Market Basket Models. A list of purchasers (additional purchaser data can be useful, but is not needed). A list of transactions. Seek to identify purchasing patterns: What items are normally purchased together? What is the purchasing sequence? Is there a seasonality effect to purchasing? Categorize buying behavior. Translate buying behavior into actionable insight: targeted promotions, inventory placement, store layout, cross-selling.
20 Frequent Itemsets. Any set of items that appears together in many baskets. Originally used to analyze physical supermarket baskets. Best used to link pairs that are commonly bought together but often have no obvious relationship to each other. Example: diapers and beer. A major store chain discovered that diapers and beer were regularly appearing in baskets together. The theory was that if you buy diapers you are likely to have a baby at home; with a baby at home you are less likely to go to a bar to drink, and more likely to have a beer at home.
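Finding frequent pairs such as diapers and beer can be sketched by counting how many baskets contain each pair and keeping those at or above a support threshold. The function names, the threshold, and the baskets below are made up for illustration, and this brute-force count is only practical for small data; it is not the method the slides attribute to any particular tool.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Count baskets containing each item pair; keep pairs seen at least min_support times."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair has one canonical form per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical baskets.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
    ["bread", "chips"],
]

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
print(confidence(baskets, "diapers", "beer"))
```

Support answers "how often do these items appear together", while confidence answers "given one item, how often is the other present"; both are usually checked before acting on a pair.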
22 Big Data Stack. [Diagram: parallel on-premises and cloud stacks with the layers Consumption, Semantic, Computation/Access, Storage, Integration, and Security. Components shown: Data Visualization & Reporting, Data Analytics, Data Mining, Hive/Pig, Advanced Query, HDFS storage, NoSQL DB, Metadata, and Data Governance & Integration (ETL/ELT). Frequency: on demand, real-time, streaming, time series.]
23 Open vs Closed Stack. [Diagram: two distributions compared layer by layer: Consumption, Semantic, Computation/Access, Storage/NoSQL DB, Security, Integration, and Source.]