From Big Data to Smart Data
Marin Dimitrov - CTO
May 2013
About Ontotext
- Provides products and services for creating, managing and exploiting semantic data
- Founded in 2000
- Offices in Bulgaria, USA and UK
- Major clients and industries
  - Media & Publishing (BBC, Press Association, EuroMoney, NDP Nieuwsmedia)
  - HCLS (AstraZeneca, UCB, NIBIO)
  - Cultural Heritage (The British Museum, The National Archives, Polish National Museum, Dutch Public Library)
  - Government (UK Parliament, United Nations FAO, LMI)
From Big Data to Smart Data (Semantic Days 2013), May 2013
Contents
- The Problem with Big Data for BI
- From Big Data to Smart Data
- Success Stories by Ontotext
BIG DATA FOR BUSINESS INTELLIGENCE
The Problem with Big Data for BI
- It's not only about Volume, Velocity & Variety
- Too much focus on processing speed & storage volume
- Brute force approaches increase the amount of data processed
  - But not necessarily the Value & insight derived from data
  - May lead to even more data quality & inconsistency problems
  - Problems with data visualisation & exploration
  - Often do not lead to better decision making
The Problem with Big Data for BI
- BI success is not measured by Volume, Velocity & Variety, but by more derived Value
- Organisations should learn how to better utilise their small data before targeting Big Data
  - Quality over quantity
  - Better understanding of the data leads to better decision making
  - Avoid "needle in a haystack" situations
Smart Data for Better BI
- Efficiently analyse unstructured data
  - Most enterprise data is still unstructured
  - Even structured & transactional data sources contain a lot of embedded unstructured data
  - This unstructured data is poorly analysed (if at all)
  - => a lot of potential value remains locked (sometimes even within semantic / Linked Data of insufficient granularity)
Smart Data for Better BI
- Focus on metadata first, Big Data later (as opposed to: Big Data first, metadata later)
  - Enrich data
  - Interlink data
- Provide a common metadata layer
  - Break legacy silos
  - Align heterogeneous metadata if necessary
- Better analysis of the data, better insight
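Aligning heterogeneous metadata across silos can be sketched as a simple field mapping onto a shared schema. This is only an illustration: the silo names (`crm`, `billing`), the field names, and the `to_common` helper are hypothetical and are not taken from any Ontotext product.

```python
# Hypothetical per-silo mappings from local field names to a common schema.
MAPPINGS = {
    "crm": {"cust_name": "name", "cust_country": "country"},
    "billing": {"client": "name", "ctry": "country"},
}


def to_common(source, record):
    """Rename the fields of one record to the common schema.

    Fields without a mapping are dropped; the originating silo is kept
    in a "source" field so that provenance is not lost.
    """
    mapping = MAPPINGS[source]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    common["source"] = source
    return common


# Two records describing the same customer in different silos.
crm_record = to_common("crm", {"cust_name": "ACME Ltd", "cust_country": "UK"})
billing_record = to_common("billing", {"client": "ACME Ltd", "ctry": "UK", "vat": "n/a"})
```

Once both records share the same field names, they can be compared, interlinked and analysed together, which is the point of a common metadata layer.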
SUCCESS STORIES
UK Job Market Intelligence
- Comprehensive recruitment database for the UK
  - 4 million job ads / vacancies (dynamic)
  - 220,000 company websites & 700 job boards monitored
- Questions we can answer
  - What skills are in demand at present?
  - Which are the top job boards in a region?
  - Which is the right job board for your industry sector?
  - Which are the most active job advertisers / employers?
  - Which agencies and employers do not advertise on your job board?
UK Job Market Intelligence
- Technology stack
  - Web mining & focussed crawling
  - KB construction from open & proprietary data sources
  - Skills taxonomy (based on DISCO)
  - Text mining & semantic enrichment
  - Reconciliation & interlinking
  - BI reporting & dashboards
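The enrichment and reporting steps of such a stack can be sketched in a few lines: tag each job ad with skills from a taxonomy, then count how often each skill is requested. This is a toy illustration under stated assumptions. The `SKILLS` dictionary is a hypothetical stand-in for a real skills taxonomy such as DISCO, the sample ads are invented, and simple pattern matching stands in for a real text mining pipeline.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy: surface form -> canonical skill label.
SKILLS = {
    "java": "Java",
    "sql": "SQL",
    "project management": "Project management",
    "pm": "Project management",
}


def tag_skills(text):
    """Return the set of canonical skills mentioned in one job ad."""
    lowered = text.lower()
    found = set()
    for surface, canonical in SKILLS.items():
        # Whole-word match, so that "pm" does not match inside other words.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(canonical)
    return found


def skill_demand(ads):
    """Count in how many ads each canonical skill appears."""
    counts = Counter()
    for ad in ads:
        counts.update(tag_skills(ad))
    return counts


# Invented sample ads, for illustration only.
ADS = [
    "Senior Java developer with strong SQL skills",
    "PM needed; project management certification preferred",
    "Data analyst, SQL and reporting",
]

demand = skill_demand(ADS)
```

Because different surface forms ("PM", "project management") map to one canonical label, `demand.most_common()` gives a view of which skills are in demand that plain keyword counts would not.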
Asset Recovery Intelligence System (ARIS)
- Supports Financial Intelligence Units (FIUs) in tracking stolen assets and fighting corruption & money laundering
- Questions we can answer
  - What are the reported activities related to a person?
  - What is the person's personal/professional network?
  - What corruption cases are reported in regional news?
- Data sources
  - News feeds from major news agencies
  - Dow Jones data & news feeds
  - SARs to the FIU
  - Open data (people & companies, Wikipedia)
Asset Recovery Intelligence System (ARIS)
- Technology stack
  - Web mining
  - Text mining & semantic enrichment (KIM)
  - ARIS ontology
    - People, companies, assets, relations, financial transactions,
  - Reconciliation & interlinking
  - Triplestore (OWLIM)
  - Semantic search & exploration UX
  - BI reporting / factsheets / alerts
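A question such as "what is the person's network?" becomes a graph traversal once people, companies and assets are interlinked as triples. The sketch below uses plain Python tuples instead of a real triplestore. All identifiers and predicates (`person:A`, `directorOf`, and so on) are hypothetical and do not come from the actual ARIS ontology.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements.
TRIPLES = [
    ("person:A", "directorOf", "company:X"),
    ("person:B", "shareholderOf", "company:X"),
    ("person:B", "relatedTo", "person:C"),
    ("company:X", "owns", "asset:Yacht1"),
]


def build_graph(triples):
    """Index the triples as an undirected adjacency map."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph


def network(triples, start, max_hops=2):
    """Return every entity reachable from `start` within `max_hops` links."""
    graph = build_graph(triples)
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for other in graph[node]:
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    seen.discard(start)
    return seen


two_hops = network(TRIPLES, "person:A", max_hops=2)
three_hops = network(TRIPLES, "person:A", max_hops=3)
```

In this toy graph, `person:A` reaches `person:B` only through the shared company, which is the kind of indirect link that interlinked data makes visible. A production system would express the same traversal as a query against the triplestore.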
Semantic Information Integration & Enrichment
Q & A
Thank you!
@ontotext