What does the future hold for predictive analytics?
"It's tough to make predictions, especially about the future" (Yogi Berra)
Einat Shimoni, EVP and senior analysts, STKI IT Knowledge Integrators
galit@stki.info, einat@stki.info
Analytics is always a HOT topic
[Chart: project areas that started in your organization in 2013 / are planned for 2014]
Source: STKI inquiry barometer, 2014
Evolution of analytics
From: IT focus, structured data only, reports. To: business focus, unstructured data, insights.
Passive (Classic DW): Pull-only model (reports need to be extracted from it). IT is doing most of the BI work. Classic DW model (single version of the truth), updated about once a day. Structured data only.
Proactive (Classic DW): BI insights linked to operational processes (e.g., marketing lists to call service agents; risk analysis leads to an operational process). Classic DW, structured data only. IT is doing most of the BI work.
Self Service and Discoveries: Business users gaining control over BI (use of self-service tools). DW updated more frequently but still in the classical model.
Analytics & Insights: More use of predictive and analysis tools by business users. Some analysis of unstructured data in an external, big-data style data mart.
Cognitive Insights: Deep use of semantics, text analytics, NLP and machine learning to provide new wisdom. Real-time analysis.
Diagram labels: Advanced Visualization; Letting go; Enabling experiments.
The data sandbox
A data sandbox, in the context of big data, is a standalone data mart: a scalable development platform used to explore an organization's rich information sets through interaction and collaboration. A data sandbox is primarily explored by data science teams. Data sandbox platforms provide the computing power data scientists need to tackle typically complex analytical workloads.
"What are we looking for?" "I don't know, but it's going to be amazing!"
Data Warehouse architecture, Phase 1: Co-existence
Information repository: internal transactional data.
Analytic platform for external, unstructured data: external data, text analysis, data science.
Insights from external data. Bureaus that analyze and track social media can provide this as an external service.
New capabilities: pattern spotting, events detection, proactive.
Data Warehouse architecture, Phase 2: Virtual DW / Hybrid BI
Information repository.
The virtual Data Warehouse: metadata, permissions, caching. Part of the data can be kept here.
Analytic platform for external, unstructured data: external data, text analysis, data science.
Insights from external data.
Data Warehouse architecture, Phase 3: OLTP + OLAP
Information repository: the same database for both analytical and transactional data.
The virtual Data Warehouse: metadata, semantic layer.
Analytic platform for external, unstructured data: external data, text analysis, data science.
Insights from external data.
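The Phase 3 idea of one database serving both transactional and analytical work can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for convenience; the `orders` table, its columns, and the sample rows are hypothetical, and an embedded database is only a stand-in for the kind of platform the slide has in mind.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)


def record_order(customer, amount):
    """Transactional (OLTP-style) write: one small insert, committed at once."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )


def revenue_by_customer():
    """Analytical (OLAP-style) read: an aggregate over the same live table."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)


record_order("acme", 120.0)
record_order("acme", 30.0)
record_order("globex", 75.5)

totals = revenue_by_customer()
print(totals)
```

The point of the sketch is that the aggregate query sees each order as soon as it is committed, with no nightly load into a separate warehouse.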
Small data = the new big data
The 4 V's (volume, velocity, variety, veracity)
Source: IBM
Veracity
Big data veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed? Inderpal feels veracity in data analysis is the biggest challenge when compared to things like volume and velocity.
Source: http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/
You don't know the value of your data until you use it or reach a discovery.
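As one small illustration of checking for "abnormality" before mining, the sketch below flags values that sit far from the rest of a sample using the median absolute deviation. This is one common rule of thumb, not a method named in the source; the function name, the sample readings, and the 3.5 threshold are assumptions chosen for the example.

```python
from statistics import median


def flag_abnormal(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cutoff is a conventional default.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical sensor readings with one obviously suspicious entry.
readings = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9, 10.2]
suspect = flag_abnormal(readings)
print(suspect)
```

A flagged value is only a candidate for review: whether it is noise or a real event is a judgment about the problem being analyzed, which is the veracity question the slide raises.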
Wanted: Data Scientist
"Data Scientist: The Hottest Job You Haven't Heard Of"
Salary: $140K - $200K
Major staff shortage:
McKinsey: By 2018, the U.S. alone could face a shortage of 140,000-190,000 people (2008-2018: a 10-year cycle for next-generation graduates).
Gartner: By 2015, big data demand will generate 1 million jobs in the G1000, but only one-third of those jobs will be filled.
InformationWeek: 18% of big data-focused companies want to increase staff by 30% in the next two years; 53% expect it will be hard.
Data Scientist
Skills (cross-disciplinary): structured and unstructured data (also from real-time streams), Java programming, statistics, machine-learning algorithms, NLP, business concepts (MBAs).
Disciplines: Computer Science, Statistics, MBA.
Kaggle: outsourcing data science work via competitions
Thousands of experts from 100 countries and 200 universities.
Einat Shimoni's work, Copyright@2013. Do not remove source or attribution from any slide, graph or portion of graph.
Wisdom is the application of Knowledge
"To attain knowledge, add things every day. To attain wisdom, remove things every day." (Laozi)
Wisdom: applied knowledge.
Knowledge: organized information.
Information: linked elements with concepts.
Data: discrete elements like words, numbers, names.
"What's the difference between information and knowledge? It's like the difference between knowing Julia Roberts' phone number and knowing Julia Roberts." (Woody Allen)
Galit Fein and Einat Shimoni's work, Copyright@2014
New analytics category: pattern spotting, events detection, proactive
Do you know this artist?
David McCandless, infographic artist: "My pet-hate is pie charts. Love pie. Hate pie-charts."
His works of art: http://www.informationisbeautiful.net/
Why do we care so much about sentiment?
Text analytics
Automatic categorization / content analysis: IBM ICA, Vivisimo; solutions from integrators and BI players (e.g., Opisoft, Matrix, Taldor, Ness).
Sentiment analysis: Radian6 (Salesforce), FocalInfo, SAP, SAS, Tracx (Israeli startup), new social listening in Microsoft Dynamics CRM.
Search players: Attivio, Melingo, HP (Autonomy).
Several projects in financial organizations and the defense sector.
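To make "sentiment analysis" concrete, here is a minimal lexicon-based sketch. It is not how any of the products listed above work; the word lists, the negation rule, and the function names are hypothetical and far too small for real use.

```python
import re

# Tiny hypothetical lexicon, for illustration only.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"bad", "hate", "slow", "broken", "rude"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum +1 / -1 per lexicon word, flipping a word right after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in [
    "I love the new app, support was helpful",
    "The service was not helpful and the site is slow",
    "I opened a ticket yesterday",
]:
    print(label(post), "|", post)
```

Even this toy version shows why text analytics is hard: sarcasm, domain vocabulary, and languages other than English all defeat a fixed word list.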
Thank you!