Big Data Analytics: Analysis of High-Volume and Unstructured Data
Stefan Weingaertner, DYMATRIX CONSULTING GROUP
KNIME Meetup Italia, 10th October 2013
Agenda
1 Company Introduction
2 Big Data - an Introduction
3 Big Data Analytics on high-volume Data
4 Big Data Analytics on unstructured Data
5 Live Demo: Advanced Email Classification
6 Q & A
Company Introduction
DYMATRIX: The Analytical CRM Company
» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics
» Consulting, development and implementation know-how, based on more than 900 projects with mid- and large-cap companies across industries
» Goal- and client-oriented project execution based on award-winning, established solutions
» Owner-managed and independent
Our Consulting Competence Centers
Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling
» Planning & Forecasting
» Balanced Scorecard
Advanced Analytics
» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
Campaign Management
» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes
E-commerce Insight
» Web Tracking
» Web Controlling
» Web Mining
» Real-Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics
» Big Data Analytics
Analysis of client-oriented processes
» Analysis of the initial situation
» Conception of processes for customer retention and its optimization
» Customer reactivation and new customer activation
» Benchmarking against industry leaders
Solution Portfolio: The Customer Insight Suite
DynaCampaign
» Intelligent multi-touchpoint campaign management platform
» Planning, target group selection, execution and response measurement of campaigns
» Event-triggered real-time campaigning
DynaMine
» End-to-end automation of data mining processes
» Intelligent model management for automation of preprocessing, training & scoring of models
DynaCision
» Real-time decision management platform
» Design & execution of complex embedded decision processes
DynaSocial
» Social CRM platform to listen, track, identify and quantify customer needs and sentiments
Our KNIME Solution Nodes & KNIME Consulting Services
PMML2SQL / PMML2SAS Converter
» Convert PMML to executable SQL code for in-database scoring
» Convert PMML to executable SAS code for model scoring within SAS
Big Data Integration
» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME
» Efficiently distribute large amounts of data & preprocessing across a set of machines
Uplift Modeling
» Predictive modeling nodes to predict the incremental response to marketing actions
» For up-sell, cross-sell, churn and retention activities
Interactive Scorecard Builder
» Interactive scorecard building nodes for the design of credit or marketing scorecards
KNIME Consulting Services
» Business Consulting + Analytical Consulting + Technical Consulting + Trainings
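Sketch: what converting PMML to SQL for in-database scoring can look like. This is a minimal illustration, not the DYMATRIX converter: it reads a small PMML-style TreeModel fragment and emits a SQL CASE expression. The model, the field names (complaints, tenure_months) and the output alias are hypothetical, and real PMML files carry a namespace and many more element types that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PMML TreeModel fragment (no namespace, SimplePredicate only).
PMML_TREE = """
<TreeModel functionName="classification">
  <Node score="stay">
    <True/>
    <Node score="churn">
      <SimplePredicate field="complaints" operator="greaterThan" value="2"/>
    </Node>
    <Node score="stay">
      <SimplePredicate field="complaints" operator="lessOrEqual" value="2"/>
      <Node score="churn">
        <SimplePredicate field="tenure_months" operator="lessThan" value="6"/>
      </Node>
      <Node score="stay">
        <SimplePredicate field="tenure_months" operator="greaterOrEqual" value="6"/>
      </Node>
    </Node>
  </Node>
</TreeModel>
"""

# PMML comparison operators mapped to their SQL counterparts.
SQL_OPERATORS = {
    "lessThan": "<", "lessOrEqual": "<=",
    "greaterThan": ">", "greaterOrEqual": ">=",
    "equal": "=", "notEqual": "<>",
}


def predicate_sql(node):
    """Return the SQL condition of a tree node, or None for the <True/> root."""
    pred = node.find("SimplePredicate")
    if pred is None:
        return None
    return f'{pred.get("field")} {SQL_OPERATORS[pred.get("operator")]} {pred.get("value")}'


def leaf_rules(node, conditions=()):
    """Yield (conditions, score) for every leaf, collecting conditions along the path."""
    cond = predicate_sql(node)
    if cond is not None:
        conditions = conditions + (cond,)
    children = node.findall("Node")
    if not children:
        yield conditions, node.get("score")
        return
    for child in children:
        yield from leaf_rules(child, conditions)


def tree_to_sql(pmml_text, alias="prediction"):
    """Translate the tree into one CASE expression with a WHEN branch per leaf."""
    root = ET.fromstring(pmml_text.strip()).find("Node")
    whens = [
        f"  WHEN {' AND '.join(conds) or '1 = 1'} THEN '{score}'"
        for conds, score in leaf_rules(root)
    ]
    return "CASE\n" + "\n".join(whens) + f"\nEND AS {alias}"


sql = tree_to_sql(PMML_TREE)
print(sql)
```

The resulting expression can be placed in the SELECT list of a scoring query, so the model is evaluated where the data lives instead of exporting the data to the modeling tool.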
References
» Telecommunication
» Travel, Transportation
» Retail, Service Provider
References
» Banks, Insurances
» Media
» Utilities, Industries, Public
Schwäbisch Hall
Big Data - an Introduction
A Characterization of Big Data
[Figure: Big Data characterization: Structured vs. Structured & Unstructured; Batch vs. Streaming; Volume from Terabyte to Zettabyte]
Source: Understanding Big Data (Zikopoulos et al.), 2012
Challenge: Big Data Collection & Integration
[Figure labels: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase]
Source: Phil Winters, 2011
Big Data Analytics: Learn, Target & Influence!
[Figure labels: Needs, Remember, Possibilities, Service & Support, Decisions, Usage, Approach, Delivery, Purchase]
Source: Phil Winters, 2011
Big Data Analytics on high-volume Data
[Figure: Big Data characterization: Structured vs. Structured & Unstructured; Batch vs. Streaming; Volume from Terabyte to Zettabyte]
Big Data Access
[Diagram: Hadoop stack]
» Big Data Sources
» Hadoop Core: MapReduce, Hadoop Distributed File System (HDFS)
» Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
» Analytic Applications
Big Data Analytics
[Diagram: Hadoop stack]
» Big Data Sources
» Hadoop Core: MapReduce, Hadoop Distributed File System (HDFS)
» Hadoop Extensions: Hive, HBase, MapReduce Routines, Mahout
» PMML2SQL Converter
» Analytic Applications
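Sketch: the MapReduce processing model named in the stack above, shown as a single-process Python illustration. It does not use any Hadoop API; the three sample records are made up, and on a real cluster the map and reduce steps would run in parallel on the machines holding the data.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Made-up sample records standing in for lines of a large file in HDFS.
counts = map_reduce(["signal lost again", "no signal", "signal ok"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, the same logic can be distributed across a set of machines without changing the map and reduce functions.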
Big Data Analytics on unstructured Data
[Figure: Big Data characterization: Structured vs. Structured & Unstructured; Batch vs. Streaming; Volume from Terabyte to Zettabyte]
Big Data is not just about structured data
» 80% of the world's data is unstructured.
» Unstructured data is growing at 15 times the rate of structured data.
Source: Google Trends, April 6, 2012
Imagine
» to classify all customer-related text messages by Source / Origin, Sentiment, Product or Service, Business Transaction, Context etc.
» to identify unknown trends
» to identify cause-and-effect relations
» to react to that information, e.g. Technical Problems, Needs, Usability, Competition etc.
The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!
Deutsche Telekom: Social Earthquake
[Chart: Facebook posts & comments, March & April 2013, split into negative, neutral and positive; y-axis 0 to 1000, x-axis 1 March to 26 April]
» First rumours: limitation of bandwidth (21.3. - 23.3.)
» "DSL-Drossel" (DSL throttle): the official press release on the limitation of bandwidth leads to a social earthquake (22.4. - 27.4.)
DYMATRIX Text Mining Process
DYMATRIX Text Mining Process (KNIME Text Processing)
Process steps: Text Datasources, Text Enrichment, Subject Matching, Sentiment Classification, Information Delivery
Text Datasources
» Datasources: Facebook, Twitter, Emails, Data Providers like GNIP, Datasift etc., Crawled Data etc.
» For Machine Learning: provide training data for classification (e.g. Sentiment)
Text Enrichment
» Language Detection: English, German, many more
» Language-specific NLP
» POS Tagging: Penn Treebank Tagger, STTS Tagger
» Text Cleansing: Stop Words, Punctuation, Stemming
» Sentiment Amplifier: matching of sentiment and emoticon dictionaries
Subject Matching
» Text Tagging with any Subjects: Products, Brands, Business Transactions, Service, Complaints, Requests etc.
» Fuzzy Matching with Dictionary Tagger: matching of subject dictionaries
Sentiment Classification
» Text Vectorization: creation of text predictors to predict sentiments
» Machine Learning: classification with Predictive Analytics (e.g. Decision Tree)
» Retraining Interface: adjustment of misclassified messages for permanent optimization of classification
Information Delivery
» Text Data Mart: make information available in a central Text Data Mart for visualization, alerting etc.
» Fields of Application: Email Routing, Event-triggered Campaign Management etc.
DYMATRIX Text Mining Process: Datasources
Access any text datasource to start the text mining process
» Facebook
» Twitter
» Emails
» Crawler
» Data Providers like GNIP, Datasift etc.
[Figure: example post on the Vodafone UK Facebook fan page]
DYMATRIX Text Mining Process: Text Enrichment
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Sentiment Amplifier
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
Penn Treebank POS Tagger (English Messages)
Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues[VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS] !!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]
Removal of Stop Words & Punctuation
sort[VBG] signal[VBP] issues[VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]
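Sketch: two of the enrichment steps above (sentiment amplifier, removal of stop words and punctuation) in plain Python. This is an illustration only; the slides use KNIME Text Processing nodes. The stop word list and the sentiment dictionary are hypothetical, so the output differs slightly from the slide, and POS tagging is left out.

```python
import re

MESSAGE = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)

# Hypothetical dictionaries; a real setup would load much larger, language-specific lists.
STOP_WORDS = {"why", "not", "your", "out", "of", "but", "yet", "it", "new", "full"}
SENTIMENT_MARKERS = {"crap": "----"}  # term -> amplifier marker appended after the term


def amplify(text):
    """Append the sentiment marker after every dictionary term found in the text."""
    def mark(match):
        word = match.group(0)
        marker = SENTIMENT_MARKERS.get(word.lower())
        return f"{word} [{marker}]" if marker else word
    return re.sub(r"\w+", mark, text)


def tokenize(text):
    """Split into word tokens and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def cleanse(tokens):
    """Drop punctuation tokens and stop words (case-insensitive)."""
    return [t for t in tokens if t.isalnum() and t.lower() not in STOP_WORDS]


amplified = amplify(MESSAGE)
cleansed = cleanse(tokenize(MESSAGE))
print(amplified)
print(cleansed)
```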
DYMATRIX Text Mining Process: Subject Matching
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Subject Matching (Fuzzy Matching)
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].
» BUSINESS TRANSACTION: Complaint
» NETWORK: No Signal
» PRODUCT: Nokia Lumia 925
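Sketch: fuzzy matching of message tokens against a subject dictionary, using Python's standard difflib instead of the KNIME Dictionary Tagger shown on the slide. The dictionary entries, the subject labels and the similarity cutoff of 0.8 are assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical subject dictionary: term -> subject tag.
SUBJECT_DICTIONARY = {
    "signal": "NETWORK",
    "coverage": "NETWORK",
    "contract": "CONTRACT",
    "invoice": "BILLING",
}


def match_subjects(text, cutoff=0.8):
    """Return {subject: set of matched tokens}; near-misses such as typos still match."""
    terms = list(SUBJECT_DICTIONARY)
    subjects = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        hits = difflib.get_close_matches(token, terms, n=1, cutoff=cutoff)
        if hits:
            subjects.setdefault(SUBJECT_DICTIONARY[hits[0]], set()).add(token)
    return subjects


message = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
print(match_subjects(message))
print(match_subjects("no signl at all"))  # misspelling still tagged as NETWORK
```

A lower cutoff catches more misspellings but also produces more false matches, so the threshold is worth tuning per dictionary.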
DYMATRIX Text Mining Process: Sentiment Classification
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
Text Classification with Decision Tree
» Output from Text Enrichment
» Text Vectorization (Transformation)
» Resulting Classification
Predictors relevant for Text Classification, e.g.
- Emoticons positive/negative
- Fragments positive/negative
- Words positive/negative
- Author-related Inputs
- Length of message
- Likes
- Comments
- Other linguistic Inputs
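Sketch: turning a message into a few of the predictors listed above. The word and emoticon lists are hypothetical, and the classify function is a hand-written stand-in for a trained decision tree, included only to show how the predictors feed a classifier.

```python
import re

# Hypothetical sentiment dictionaries.
POSITIVE_WORDS = {"great", "thanks", "love"}
NEGATIVE_WORDS = {"crap", "issues", "problem"}
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def vectorize(message, likes=0, comments=0):
    """Build a predictor dictionary (one row of the training table) for a message."""
    words = re.findall(r"[a-z']+", message.lower())
    return {
        "words_positive": sum(w in POSITIVE_WORDS for w in words),
        "words_negative": sum(w in NEGATIVE_WORDS for w in words),
        "emoticons_positive": sum(message.count(e) for e in POSITIVE_EMOTICONS),
        "emoticons_negative": sum(message.count(e) for e in NEGATIVE_EMOTICONS),
        "length": len(message),
        "likes": likes,
        "comments": comments,
    }


def classify(features):
    """Stand-in for a trained decision tree: compare negative and positive evidence."""
    negative = features["words_negative"] + features["emoticons_negative"]
    positive = features["words_positive"] + features["emoticons_positive"]
    if negative > positive:
        return "negative"
    if positive > negative:
        return "positive"
    return "neutral"


message = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)
features = vectorize(message)
print(features, classify(features))
```

In practice the rules would be learned from labeled training messages rather than written by hand, and misclassified messages would flow back in through the retraining interface.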
DYMATRIX Text Mining Process: Information Delivery
Make information available in central Text Data Mart
Visualization in DynaSocial
Original Facebook Message
Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.
+ Sentiment, Business Transaction, Product, Relevance, Network
Other Fields of Application
» Subject-oriented Email Classification & Email Routing
DYMATRIX Text Mining Process: KNIME Workflow
Benefits
KNIME Server: Develop once, deploy everywhere!
» Text Enrichment & Classification workflows can be used to classify any electronic text message (e.g. social content, blogs, emails).
» KNIME Server-based Text Enrichment & Classification workflows can be deployed as a web service and called easily from any other application.
Benefits
» Uniform sentiment and classification handling for all customer-related messages.
» Batch or real-time execution from any application.
Application Integration I: DynaSocial
Social Media Monitoring & Analytics
DynaSocial Social Media Excellence Architecture
[Architecture diagram; components as labeled]
» Social Media Analytics Content Extractor: Facebook, Twitter, Social Media Data Provider
» Advanced Social Media Analytics (Text Mining & Network Mining): Text Enrichment & Classification, Network Insights
» Social Media Analytics Data Management
» Social Media Analytics Dashboard
» Social Service Platforms
» Generic Big Data Model
» Client individual Sources
» Social Engagement
» Emails
» Integrated Social Inbox including all Social Touchpoints
» DynaSocial Configuration Center: Data Sources, Sentiments & Classifications, Reports & Dashboard
DynaSocial Management Dashboard
» Activities
» Platform Distribution
» Overall Sentiments
» Sentiment Ratio
» Trends compared to competition (Share of Voice)
» Top Keywords
» Key Influencer
» Geographic Distribution
» Flexible Selection of Time Windows
DynaSocial Management Dashboard (Project Example)
Application Integration II: Advanced Email Classification
Multidimensional real-time Email Classification
Email Classification: MS Exchange Connector
Components: Microsoft Exchange Webservice, .NET Batch, KNIME Server
1 Incoming email.
2 Call the .NET procedure and transfer the email contents to KNIME Server via a web service call.
3 Call the KNIME Text Enrichment & Classification workflows and return the classification results.
4 Classification results are returned to Exchange Server and are saved persistently with object categories.
5 Any client with access to Exchange Server gets the same classification: Microsoft Outlook, Microsoft Outlook Web Access, other email clients.
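Sketch: the control flow of the five steps above, written in Python with in-memory stand-ins. Nothing here calls Microsoft Exchange or KNIME Server; the Email class, the keyword rules and the function names are hypothetical, and in the real setup the classify argument would be the web service call to the classification workflows.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    """In-memory stand-in for a mailbox item; categories mimic the stored object categories."""
    subject: str
    body: str
    categories: list = field(default_factory=list)


def fake_classification_service(text):
    """Stand-in for the web service call (steps 2 and 3); simple keyword rules only."""
    text = text.lower()
    labels = []
    if "signal" in text:
        labels.append("NETWORK")
    if "invoice" in text:
        labels.append("BILLING")
    return labels or ["UNCLASSIFIED"]


def process_inbox(inbox, classify):
    """Classify every email that has no categories yet and store the result on the item."""
    for email in inbox:
        if email.categories:
            continue  # already classified; every client sees the stored categories
        email.categories = classify(f"{email.subject}\n{email.body}")  # step 4
    return inbox


inbox = [
    Email("No signal for 3 weeks", "Still paying the full monthly contract."),
    Email("Question", "Where can I download my invoice?"),
    Email("Hello", "Just saying hi."),
]
process_inbox(inbox, fake_classification_service)
for email in inbox:
    print(email.subject, email.categories)
```

Storing the result on the mailbox item itself, rather than in each client, is what lets every email client show the same classification.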
Live Demo: Real-time Email Classification
Q & A
Contact
DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart
Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88-12
Fax: +49.711.22.007.88-88
E-Mail: s.weingaertner@dymatrix.de
Web: www.dymatrix.de
Thank you for your attention. We are happy to answer any of your questions!