Applying Semantics to Unstructured Data (Big and Getting Bigger)




Wednesday, November 30, 2012, 4:00-5:00
Bryan Bell, Vice President, Enterprise Solutions, Expert System
Lynda Moulton, Analyst & Consultant, LWM Technology Services
Peter O'Kelly, Principal Analyst, O'Kelly Associates

Overall Session Agenda Introduction and context-setting "Big Data" 101 for Business Semantics and the Big Data Opportunity

Big Data 101 Agenda
- Big data in context
- Recap
- Risks
- Recommendations

Big Data in Context
What is big data?
- Unhelpfully, both big data and NoSQL, generally considered a key part of the big data wave, are defined more in terms of what they aren't than what they are
- A typical big data definition (Wikipedia): "[...] data sets that grow so large that they become awkward to work with using on-hand database management tools"
- Often associated with Gartner's volume, variety (and complexity), and velocity model
- Also value and veracity considerations

Big Data in Context
Why is big data a big deal now?
- Commoditized hardware, software, and networking
- Capability and price/performance curves that continue to defy all economic laws
- Cloud services with radical new capability/cost equations
- Maturation and uptake of related open source software, especially Hadoop
- Powerful and often no- or low-cost

Big Data in Context
Why is big data a big deal now (continued)?
- Market enthusiasm for NoSQL systems
- Useful and often open source/public domain data sources and services
- Mainstreaming of semantic tools and techniques

A Prime Minicomputer, c. 1982

Fast-Forward to 2012

Google BigQuery

Hadoop
- Hadoop is often considered central to big data
- Originating with Google's MapReduce architecture, Apache Hadoop is an open source architecture for distributed processing on networks of commodity hardware
- From Wikipedia:
  - Map step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes
  - Reduce step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve
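The map and reduce steps described above can be illustrated with a small single-process Python sketch. It is not Hadoop code and does not use any Hadoop API; the function names (map_step, shuffle, reduce_step, word_count) and the sample input are assumptions made for illustration, and the "worker nodes" are simulated by ordinary function calls.

```python
from collections import defaultdict


def map_step(chunk):
    """Map: turn one chunk of input into (word, 1) pairs.

    In a real cluster each chunk would be sent to a worker node;
    here it is just a function call (illustrative assumption).
    """
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group the intermediate values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: combine all partial answers for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of input chunks."""
    mapped = [pair for chunk in chunks for pair in map_step(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


# Hypothetical input, already divided into two sub-problems.
chunks = ["big data big deal", "data about data"]
counts = word_count(chunks)
print(counts)
```

The point of the split is that map_step needs no knowledge of other chunks, which is what lets a framework such as Hadoop spread the work across commodity machines.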

Hadoop
Commercial application domains include (from Wikipedia)
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance

Hadoop
- Hadoop is popular and rapidly evolving
- Most leading information management vendors have embraced Hadoop
- There is now a Hadoop ecosystem

Meanwhile, Back in the Googleplex
- Dremel, BigQuery, Spanner, and other really big data projects


Google Now

A NoSQL Taxonomy (from the NoSQL Wikipedia article)

A View of the NoSQL Landscape

Another NoSQL Landscape View

NoSQL Perspectives
The NoSQL meme confusingly conflates
- Document database requirements
  - Best served by XML DBMS (XDBMS)
- Physical database model decisions on which only DBAs and systems architects should focus
  - And which are more complementary than competitive with DBMS
- Object databases, which have floundered for decades
  - But with which some application developers are nonetheless enamored, for minimized impedance mismatch, despite significant information management compromises
- Semantic (e.g., RDF) models
  - Also more complementary than competitive with RDBMS/XDBMS
Also consider: the traditional DBMS players can leverage the same underlying technology power curves

Data as a Service
- The (single source of) truth is out there?...
- High-quality data sources are being commoditized
- Value is shifting to the ability to discern and leverage conceptual connections, not just to manage big databases
- Some resources and developments to explore
  - Social networking graphs and activities
  - Data.com (Salesforce.com)
  - Data.gov
  - Google Knowledge Graph
  - Linked Data
  - Microsoft Windows Azure Data Marketplace
  - Wikidata.org
  - Wolfram Alpha

Mainstreaming Semantics
- Tools and techniques applied in search of more meaning, e.g.,
  - Vocabulary management
  - Disambiguation and auto-categorization
  - Text mining and analysis
  - Context and relationship analysis
- It's still ideal to help people capture and apply data and metadata in context
  - Semantic tools/techniques are complementary

Mainstreaming Semantics
- The Semantic Web is still more vision than reality
- But Google, Microsoft, Yahoo, and Yandex, for example, are improving Web searches by capturing and applying more metadata and relationships via schema.org schemas in Web pages
- And Google's Knowledge Graph is about "things, not strings," with, as of mid-2012, 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects
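One way to picture "things, not strings" is to store facts as subject-predicate-object triples, the shape used by semantic (e.g., RDF) models, and then look up an object's facts and relationships. The Python sketch below is a toy in-memory illustration only: the predicate names, the helper functions facts_about and related_to, and the handful of sample triples are assumptions for this example and do not reflect how the Knowledge Graph or schema.org is actually implemented.

```python
# Toy triple store: each fact is (subject, predicate, object).
# The predicate names are invented for illustration.
triples = [
    ("Apache Hadoop", "is_a", "open source architecture"),
    ("Apache Hadoop", "implements", "MapReduce"),
    ("MapReduce", "originated_at", "Google"),
    ("BigQuery", "developed_by", "Google"),
]


def facts_about(entity, store):
    """Return (predicate, object) pairs for every fact whose subject is entity."""
    return [(p, o) for s, p, o in store if s == entity]


def related_to(entity, store):
    """Return every object linked to entity, in either direction."""
    linked = set()
    for s, _p, o in store:
        if s == entity:
            linked.add(o)
        elif o == entity:
            linked.add(s)
    return linked


print(facts_about("Apache Hadoop", triples))
print(related_to("Google", triples))
```

Because "Google" here is an object with relationships rather than a matched string, a query about it can surface MapReduce and BigQuery even though neither name contains the word.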

Recap
- Commoditization and cloud: Very significant new opportunities
- Hadoop and related frameworks: Complementary to RDBMS and XDBMS
- NoSQL: Likely headed for meme-bust
- Data services: Game-changing potential
- Semantic tools and techniques: Rapidly gaining momentum

Risks
- The potential for an ever-expanding set of information silos
  - Focus on minimized redundancy and optimized integration
- GIGO (garbage in, garbage out) at super-scale
  - New opportunities for unprecedented self-inflicted damage, for organizations that don't model or query effectively
- Cognitive overreach
  - The potential for information workers to create and act on nonsensical queries based on poorly designed and/or misunderstood information models
- Skills gaps can create competitive disadvantages
  - Modeling, query formulation, and data analysis
  - Critical thinking and information literacy

Recommendations
- Aim high: big data is in many respects just getting started
  - A lot of technology recycling but also significant and disruptive innovation
- Work to build consensus among stakeholders on the opportunities and risks
- Focus on human skills, e.g., critical thinking and information literacy
  - For now, an instance of the most creative and powerful type of semantic big data processor we know of is between your ears