Big Data Analytics. Analysis of high-volume and unstructured Data




Big Data Analytics: Analysis of high-volume and unstructured Data
Stefan Weingaertner, DYMATRIX CONSULTING GROUP
KNIME Meetup Italia, 10th October 2013

Agenda
1 Company Introduction
2 Big Data - an Introduction
3 Big Data Analytics on high-volume Data
4 Big Data Analytics on unstructured Data
5 Live Demo: Advanced Email Classification
6 Q & A

Company Introduction

DYMATRIX: The analytical CRM Company
» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics
» Consulting, development and implementation know-how, based on more than 900 projects with mid- and large-cap companies across industries
» Goal- and client-oriented project execution based on award-winning, established solutions
» Owner-managed and independent

Our Consulting Competence Centers
Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling
» Planning & Forecasting
» Balanced Scorecard
Advanced Analytics
» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
Campaign Management
» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes
E-commerce insight
» Web Tracking
» Web Controlling
» Web Mining
» Real Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics
» Big Data Analytics
Analysis of client-oriented processes
» Initial situation analysis
» Conception of processes for customer retention and its optimization, customer reactivation and new customer activation
» Benchmarking against industry leaders

Solution Portfolio: The Customer Insight Suite
DynaCampaign
» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered realtime campaigning
DynaMine
» End-to-end automation of data mining processes
» Intelligent model management for automation of preprocessing, training & scoring of models
DynaCision
» Realtime decision management platform
» Design & execution of complex embedded decision processes
DynaSocial
» Social CRM platform to listen, track, identify and quantify customer needs and sentiments

Our KNIME Solution Nodes & KNIME Consulting Services
PMML2SQL / PMML2SAS Converter
» Convert PMML to executable SQL Code for In-Database-Scoring
» Convert PMML to executable SAS Code for Model Scoring within SAS
Big Data Integration
» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines
Uplift Modeling
» Predictive Modeling Nodes to predict the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities
Interactive Scorecard Builder
» Interactive Scorecard Building Nodes for Design of Credit or Marketing Scorecards
KNIME Consulting Services
» Business Consulting + Analytical Consulting + Technical Consulting + Trainings
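To make "incremental response" concrete: uplift is the difference between the response rate of customers who received a marketing action and the response rate of a comparable control group who did not. The following is only a minimal sketch of that quantity per customer segment, not the DYMATRIX uplift nodes; the function name, the segments and the campaign data are all hypothetical.

```python
from collections import defaultdict

def uplift_by_segment(rows):
    """Return the response-rate difference (treated minus control) per segment.

    rows: iterable of (segment, treated, responded) tuples with boolean flags.
    """
    # counts[segment][treated] = [number of customers, number of responders]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in rows:
        counts[segment][treated][0] += 1
        counts[segment][treated][1] += int(responded)

    uplift = {}
    for segment, groups in counts.items():
        n_treated, r_treated = groups[True]
        n_control, r_control = groups[False]
        if n_treated and n_control:  # both groups are needed for a comparison
            uplift[segment] = r_treated / n_treated - r_control / n_control
    return uplift

# Hypothetical campaign data: (segment, received the offer, responded)
rows = (
    [("young", True, True)] * 3 + [("young", True, False)] * 1
    + [("young", False, True)] * 1 + [("young", False, False)] * 3
    + [("senior", True, True)] * 2 + [("senior", True, False)] * 2
    + [("senior", False, True)] * 2 + [("senior", False, False)] * 2
)
result = uplift_by_segment(rows)
print(result)
```

In this made-up data the offer lifts the response rate in the "young" segment but makes no difference for "senior", which is the kind of distinction an uplift model is meant to predict before a campaign is sent.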

References: Telecommunication; Travel, Transportation; Retail, Service Provider
[Client logos]

References: Banks, Insurances; Media; Utilities, Industries, Public
[Client logos, including Schwäbisch Hall]

Big Data - an Introduction

A Characterization of Big Data
[Figure: Big Data characterized along three dimensions: from structured to structured & unstructured data, from batch to streaming, and from terabyte to zettabyte volume]
Source: Understanding Big Data (Zikopoulos et al.), 2012

Challenge: Big Data Collection & Integration
[Figure: customer journey stages: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase]
Source: Phil Winters, 2011

Big Data Analytics: Learn, Target & Influence!
[Figure: customer journey stages: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase]
Source: Phil Winters, 2011

Big Data Analytics on high-volume Data

Big Data Access
[Figure: architecture layers]
» Big Data Sources
» Hadoop Core: MapReduce, Hadoop Distributed File System (HDFS)
» Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
» Analytic Applications

Big Data Analytics
[Figure: architecture layers]
» Big Data Sources
» Hadoop Core: MapReduce, Hadoop Distributed File System (HDFS)
» Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
» PMML2SQL Converter
» Analytic Applications
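The idea behind converting a model into SQL for in-database scoring can be illustrated with a small decision tree that is rendered as a nested SQL CASE expression. This is only a sketch under stated assumptions: the nested dictionary is a hypothetical stand-in for a parsed PMML tree model, the table and column names are invented, and nothing here reflects how the PMML2SQL Converter is actually implemented.

```python
def tree_to_sql(node, indent=0):
    """Recursively render a decision tree as a nested SQL CASE expression."""
    pad = "  " * indent
    if "score" in node:  # leaf: emit the predicted class as a string literal
        return f"{pad}'{node['score']}'"
    condition = f"{node['field']} {node['op']} {node['value']}"
    return (
        f"{pad}CASE WHEN {condition} THEN\n"
        f"{tree_to_sql(node['then'], indent + 1)}\n"
        f"{pad}ELSE\n"
        f"{tree_to_sql(node['else'], indent + 1)}\n"
        f"{pad}END"
    )

# Hypothetical churn tree; in practice the structure would come from a PMML file.
tree = {
    "field": "tenure_months", "op": "<", "value": 12,
    "then": {
        "field": "complaints", "op": ">", "value": 2,
        "then": {"score": "churn"},
        "else": {"score": "stay"},
    },
    "else": {"score": "stay"},
}

sql = f"SELECT customer_id,\n{tree_to_sql(tree, 1)} AS prediction\nFROM customers"
print(sql)
```

Because the generated statement is plain SQL, the scoring runs where the data already lives (for example in Hive) instead of moving the data to the model.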

Big Data Analytics on unstructured Data

Big Data is not just about structured data
» 80% of the world's data is unstructured.
» Unstructured data is growing at 15 times the rate of structured data.
Source: Google Trends, April 6, 2012

Imagine
» to classify all customer-related text messages by
  - Source / Origin
  - Sentiment
  - Product or Service
  - Business Transaction
  - Context
  - etc.
» to identify unknown trends
» to identify cause-and-effect relations
» to react to that information, e.g.
  - Technical Problems
  - Needs
  - Usability
  - Competition
  - etc.
The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!

Deutsche Telekom: Social Earthquake
[Figure: Facebook posts & comments over time, March & April 2013, split into negative, neutral and positive sentiment; y-axis from 0 to 1000]
» First rumours: limitation of bandwidth (21.3. to 23.3.)
» "DSL-Drossel": official press release on limitation of bandwidth leads to a social earthquake (22.4. to 27.4.)

DYMATRIX Text Mining Process

DYMATRIX Text Mining Process (KNIME Text Processing)
Text Datasources
» Datasources: Facebook, Twitter, Emails, Data Provider like GNIP, Datasift etc., Crawled Data etc.
» For Machine Learning: Provide Training Data for Classification (e.g. Sentiment)
Text Enrichment
» Language Detection: English, German, many more
» Language individual NLP POS Tagging: Penn Treebank Tagger, STTS Tagger
» Text Cleansing: Stop Words, Punctuations, Stemming
» Sentiment Amplifier: Matching of Sentiment- & Emoticon-Dictionaries
Subject Matching
» Text Tagging with any Subjects: Products, Brands, Business Transactions, Service, Complaints, Requests etc.
» Fuzzy Matching with Dictionary Tagger: Matching of Subject-Dictionaries
Sentiment Classification
» Text Vectorization: Creation of text predictors to predict sentiments
» Machine Learning: Classification with Predictive Analytics (e.g. Decision Tree)
» Retraining Interface: Adjustment of misclassified messages for permanent optimization of classification
Information Delivery
» Text Data Mart: Make information available in central Text Data Mart for visualization, alerting etc.
» Fields of Application: Email-Routing, Event triggered Campaign Management etc.

DYMATRIX Text Mining Process: Datasources
Access any text datasource to start the text mining process
» Facebook
» Twitter
» Emails
» Crawler
» Data Provider like GNIP, Datasift etc.
[Figure: example post on the Vodafone UK Facebook fan page]

DYMATRIX Text Mining Process: Text Enrichment
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Sentiment Amplifier
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
Penn Treebank POS Tagger (English Messages)
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS] !!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]
Removal of Stop Words & Punctuations
sort[VBG] signal[VBP] issues[VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]
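The removal of stop words and punctuation shown above can be sketched in a few lines of plain Python. This is an illustration only and not the KNIME Text Processing nodes: the stop-word list is a small hypothetical sample, and the output is not meant to reproduce the slide's result token for token.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"why", "not", "your", "out", "of", "new", "but", "yet", "it"}

def cleanse(message):
    """Drop punctuation and stop words, keeping the remaining tokens in order."""
    tokens = re.findall(r"[A-Za-z0-9]+", message)  # punctuation never matches
    return [t for t in tokens if t.lower() not in STOP_WORDS]

message = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
tokens = cleanse(message)
print(tokens)
```

What survives is the content-bearing vocabulary ("signal", "issues", "crap", "contract", "Vodafone"), which is what the later matching and classification steps work on.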

DYMATRIX Text Mining Process: Subject Matching
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Subject Matching (Fuzzy Matching)
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].
Matched subjects
» BUSINESS TRANSACTION: Complaint
» NETWORK: No Signal
» PRODUCT: Nokia Lumia 925
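Fuzzy matching against a subject dictionary can be approximated with the standard library's difflib, which tolerates the misspellings common in social media posts. The sketch below is an assumption-laden illustration, not the KNIME Dictionary Tagger: the dictionary entries, the tag names and the similarity cutoff of 0.8 are all hypothetical.

```python
import difflib
import re

# Hypothetical subject dictionary: dictionary term -> subject tag
SUBJECT_DICTIONARY = {
    "signal": "NETWORK",
    "coverage": "NETWORK",
    "contract": "CONTRACT",
    "invoice": "BILLING",
}

def match_subjects(message, cutoff=0.8):
    """Return the set of subject tags whose dictionary terms fuzzily match a token."""
    terms = list(SUBJECT_DICTIONARY)
    tags = set()
    for token in re.findall(r"[a-z]+", message.lower()):
        # get_close_matches returns the best dictionary terms above the cutoff
        hits = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        if hits:
            tags.add(SUBJECT_DICTIONARY[hits[0]])
    return tags

exact = match_subjects("Wk 3 of crap signal but yet paying FULL monthly contract!")
misspelled = match_subjects("Wk 3 of crap signl but paying full contrct")
print(exact, misspelled)
```

Both the correctly spelled and the misspelled message yield the same tags, which is the point of matching fuzzily rather than by exact lookup.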

DYMATRIX Text Mining Process: Sentiment Classification
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Text Classification with Decision Tree
» Output from Text Enrichment
» Text Vectorization (Transformation)
» Predictors relevant for Text Classification, e.g.
  - Emoticons positive/negative
  - Fragments positive/negative
  - Words positive/negative
  - Author-related Inputs
  - Length of message
  - Likes
  - Comments
  - Other linguistic Inputs
» Resulting Classification
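Text vectorization turns a message into numeric predictors such as counts of positive and negative words, message length, likes and comments; a decision tree then classifies on those predictors. The following sketch shows the shape of that pipeline under loud assumptions: the word lists are hypothetical, and the tree is hand-written for illustration, whereas in the process described here it would be learned from labelled training messages.

```python
import re

# Hypothetical sentiment dictionaries
POSITIVE_WORDS = {"great", "thanks", "love", "good"}
NEGATIVE_WORDS = {"crap", "issues", "bad", "terrible"}

def vectorize(message, likes=0, comments=0):
    """Transform a message into a dictionary of numeric predictors."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return {
        "positive_words": sum(t in POSITIVE_WORDS for t in tokens),
        "negative_words": sum(t in NEGATIVE_WORDS for t in tokens),
        "exclamations": message.count("!"),
        "length": len(tokens),
        "likes": likes,
        "comments": comments,
    }

def classify(features):
    """Hand-written stand-in for a trained decision tree (illustrative only)."""
    if features["negative_words"] > features["positive_words"]:
        return "negative"
    if features["positive_words"] > 0:
        return "positive"
    return "neutral"

message = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
features = vectorize(message)
label = classify(features)
print(features, label)
```

Keeping vectorization separate from classification means the same predictors can feed a retrained tree later, which matches the retraining interface mentioned in the process overview.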

DYMATRIX Text Mining Process: Information Delivery
» Make information available in central Text Data Mart
» Visualization in DynaSocial
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
+ Sentiment + Business Transaction + Product + Relevance + Network
Other Fields of Application
» Subject-oriented Email-Classification & Email-Routing

DYMATRIX Text Mining Process: KNIME Workflow

Benefits

KNIME Server: Develop once, deploy everywhere!
» Text Enrichment & Classification Workflows can be used to classify any electronic text message (e.g. Social Content, Blogs, Emails).
» KNIME Server-based Text Enrichment & Classification Workflows can be deployed as a web service and called easily from any other application.
Benefits
» Uniform sentiment and classification handling for all customer-related messages.
» Batch or realtime execution from any application.
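Calling a classification workflow "from any other application" amounts to sending the message text to a web service and reading back the classification. The sketch below only builds such a request with the Python standard library and does not send it. The endpoint URL and the JSON field names are hypothetical placeholders, not the actual KNIME Server interface, which is not described in this deck.

```python
import json
import urllib.request

# Hypothetical endpoint; the real service address and contract are deployment-specific.
KNIME_ENDPOINT = "https://knime-server.example/classify"

def build_classification_request(text, source):
    """Build (but do not send) an HTTP POST carrying the message to classify."""
    payload = json.dumps({"text": text, "source": source}).encode("utf-8")
    return urllib.request.Request(
        KNIME_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_classification_request("Vodafone sort it.", "facebook")
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
print(request.get_method(), request.full_url)
```

The same request shape serves batch and realtime use: a batch job loops over stored messages, while an interactive application sends one message at a time.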

Application Integration I: DynaSocial
Social Media Monitoring & Analytics

DynaSocial Social Media Excellence Architecture
[Figure: architecture diagram. Labels: Social Media Analytics Content Extractor (Facebook, Twitter, Social Media Data Provider); Advanced Social Media Analytics (Text Mining & Network Mining; Text Enrichment & Classification; Network Insights); Social Media Analytics Data Management; Social Media Analytics Dashboard; Social Service Platforms; Generic Big Data Model; Client individual Sources; Social Engagement; Emails; Integrated Social Inbox including all Social Touchpoints; DynaSocial Configuration Center (Data Sources; Sentiments & Classifications; Reports & Dashboard)]

DynaSocial Management Dashboard
» Activities
» Platform Distribution
» Overall Sentiments
» Sentiment Ratio
» Trends compared to competition (Share of Voice)
» Top Keywords
» Key Influencer
» Geographic Distribution
» Flexible Selection of Time Windows

DynaSocial Management Dashboard (Project Example)

Application Integration II: Advanced Email-Classification
Multidimensional realtime Email-Classification

Email Classification: MS Exchange Connector
1 Incoming email.
2 Call .NET procedure and transfer email contents to KNIME Server via web service call.
3 Call KNIME Text Enrichment & Classification Workflows and return classification results.
4 Classification results are returned to Exchange Server and are saved persistently with object categories.
5 Any clients having access to Exchange Server get the same classification.
Components: .NET Batch, KNIME Server, Microsoft Exchange Webservice
Clients: Microsoft Outlook, Microsoft Outlook Web Access, other email clients
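The design choice in this flow is that the classification is stored once on the server side, so every client reads the same categories. The sketch below simulates that with in-memory stand-ins. All class and function names are hypothetical; the keyword rules merely replace the web service call to the classification workflows, and no Exchange or KNIME API is used.

```python
class Mailbox:
    """In-memory stand-in for the mail server's persistent category storage."""

    def __init__(self):
        self._categories = {}

    def save_categories(self, email_id, categories):
        self._categories[email_id] = list(categories)

    def read_categories(self, email_id):
        return list(self._categories.get(email_id, []))

def classify_email(body):
    """Placeholder for the web service call to the classification workflows."""
    text = body.lower()
    categories = []
    if "signal" in text:
        categories.append("NETWORK")
    if "invoice" in text:
        categories.append("BILLING")
    return categories or ["UNCLASSIFIED"]

def process_incoming(mailbox, email_id, body):
    """Classify an incoming email and store the result with the message."""
    categories = classify_email(body)
    mailbox.save_categories(email_id, categories)
    return categories

mailbox = Mailbox()
process_incoming(mailbox, "msg-1", "No signal for three weeks, and the invoice is wrong.")

# Two different clients read from the same server-side store.
desktop_view = mailbox.read_categories("msg-1")
web_view = mailbox.read_categories("msg-1")
print(desktop_view, web_view)
```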

Live Demo: Realtime Email-Classification

Q & A

Contact
DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré, Lautenschlagerstrasse 2
D-70173 Stuttgart
Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88-12
Fax: +49.711.22.007.88-88
E-Mail: s.weingaertner@dymatrix.de
Web: www.dymatrix.de
Thank you for your attention. We are happy to answer any of your questions!