Computer-Based Text- and Data Analysis Technologies and Applications. Mark Cieliebak 9.6.2015

Similar documents
Text Mining - Scope and Applications

GRAPHICAL USER INTERFACE, ACCESS, SEARCH AND REPORTING

Manjula Ambur NASA Langley Research Center April 2014

MACHINE LEARNING BASICS WITH R

Social Media Implementations

Tackling Challenging Problems in Academia & Industry - An Interdisciplinary Approach

E6895 Advanced Big Data Analytics Lecture 3:! Spark and Data Analytics

Search and Data Mining: Techniques. Introduction Anna Yarygina Boris Novikov

TEXT ANALYTICS INTEGRATION

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

itunes 1.6 Cognitive - The Building Blocks for Smarter Apps

How To Make Sense Of Data With Altilia

Direct-to-Company Feedback Implementations

Digital Collections as Big Data. Leslie Johnston, Library of Congress Digital Preservation 2012

Hexaware E-book on Predictive Analytics

Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis

What is Artificial Intelligence?

Fast and Easy Delivery of Data Mining Insights to Reporting Systems

Monitoring Replication

Web and Social Media Analytics

Multichannel Customer Listening and Social Media Analytics

Text Analysis for Big Data. Magnus Sahlgren

Sentiment analysis on tweets in a financial domain

Semantic Search in E-Discovery. David Graus & Zhaochun Ren

Knowledge Discovery from patents using KMX Text Analytics

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

Bridging CAQDAS with text mining: Text analyst s toolbox for Big Data: Science in the Media Project

An Ontology Based Text Analytics on Social Media

Hoover s Quick Reference Sheet

Combination Chart Extensible Visualizations. Product: IBM Cognos Business Intelligence Area of Interest: Reporting

DEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you.

Welcome to the new SAP Predictive Analytics 2.0!

A Statistical Text Mining Method for Patent Analysis

Databricks. A Primer

Brochure More information from

The Text Analytics Market(s)

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Sense Making in an IOT World: Sensor Data Analysis with Deep Learning

Publishing Stories to Lumira Cloud

14:30 Watson applicaties bouwen met IBM Bluemix

Tech Presentation 2016

Information Management course

Databricks. A Primer

Search and Information Retrieval

Applications of Deep Learning to the GEOINT mission. June 2015

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

Understanding Data: A Comparison of Information Visualization Tools and Techniques

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Sunnie Chung. Cleveland State University

Joint Research Centre

How To Learn From The Revolution

Getting to Know Big Data

Get the most value from your surveys with text analysis

SAP Predictive Analysis Installation

CSE 517A MACHINE LEARNING INTRODUCTION

Text Analytics with Ambiverse. Text to Knowledge.

2015 Workshops for Professors

Visualizing Big Data. Activity 1: Volume, Variety, Velocity

How To Use Spagobi Suite

A Framework of User-Driven Data Analytics in the Cloud for Course Management

An interdisciplinary model for analytics education

HSD. W Business Analytics (M.Sc.) IT in Business Analytics. IT Applications in Business Analytics SS2016 / 01 Introduction Thomas Zeutschler

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Data Scientist for 1h

White Paper. Version 1.2 May 2015 RAID Incorporated

UNDERSTANDING WATSON ANALYTICS

Introduction. A. Bellaachia Page: 1

Big Data: Image & Video Analytics

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

How To Perform Predictive Analysis On Your Web Analytics Data In R 2.5

Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot

Advanced analytics at your hands

Extensible Visualizations. Product: IBM Cognos Business Intelligence Area of Interest: Reporting

Predictive Analytics. Noam Zeigerson, CTO

BIG DATA What it is and how to use?

Course 10977A: Updating Your SQL Server Skills to Microsoft SQL Server 2014

IBM SPSS Modeler Premium

Challenges of Cloud Scale Natural Language Processing

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Corporate Presentation

CONCEPTCLASSIFIER FOR SHAREPOINT

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS November 7, Machine Learning Group

The First Online 3D Epigraphic Library: The University of Florida Digital Epigraphy and Archaeology Project

Advanced Analytics & IoT Architectures

IBM Announces Eight Universities Contributing to the Watson Computing System's Development

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-4, Issue-4) Abstract-

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Séminaire du LaDHUL. Periklis Andritsos, «Big Data Challenges, Opportunities and Avenues of Research, or How did I grow up talking to data»

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

Sentiment Analysis on Big Data

FROM WORDS TO INSIGHTS: RETHINKING CONTENT AND BIG DATA

Big Data and Analytics: Challenges and Opportunities

CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015

COMP 590: Artificial Intelligence

Jobsket ATS. Empowering your recruitment process

Azure Machine Learning, SQL Data Mining and R

Machine Learning and Predictive Analytics Foster Growth [1]

Transcription:

Computer-Based Text- and Data Analysis Technologies and Applications Mark Cieliebak 9.6.2015

Data Scientist analyze Data Library use 2

About Me Mark Cieliebak + Software Engineer & Data Scientist + PhD in Computer Science + CIO at Netbreeze (now Microsoft) + >30 scientific publications Lecturer at CEO of SpinningBytes AG 3

"Classical" Data Analysis Uses IT to retrieve, store, search, count, calculate, compare, visualize Peptide Sequencing data. 4

Computer-Based Data Analytics Started with Artificial intelligence in the 1960's Analyzes huge amounts of data Uses Machine Learning for Pattern Detection Image and Text Classification Predictive Analysis Data Clustering and many more! 5

Applications of Data Analytics Internet Search IBM Watson (wins Jeopardy in 2011) Spelling Correction Deep Blue (beats Kasparow 1997) DATA ANALYTICS Voice Recognition Recommendation Systems Machine Translation 6 Selfdriving Cars Email Spam Detection

Foundation of Data Analytics 1960 2015 Memory: 64'000 Bytes 8'589'934'592 Bytes Performance: 500'000 FLOPS 33'863'000'000'000'000 FLOPS Cost per GFLOP: (1984) $42'780'000.00 $0.08 7

Foundation of Data Analytics 8

Foundations of Data Analytics Deep Learning Faster Computers + More Data + Better Algorithms 9

Application Training Machine Learning in a Nutshell Predicted Label 10

Application: Social Media Analysis Sentiment Analysis Topic Extraction Trend Detection Alerting 11

Twitter Facts Data Access Free Access with Search API Commercial Access to Firehose Stream (all or 10%) 12

Sentiment Analysis on Twitter #StackOverflow names #Apple #Swift the world's most loved #programming language http://www.bloomberg.com 13

Sentiment Analysis: Performance of Commercial Tools (F1-Score) 0,7 0,6 0,5 0,4 0,3 0,2 Average of All Tools Best Tool per Corpus Overall Best Tool (Sentigem): 0,1 0 Experimantal Setup: 9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts. 14

Take-Home Lesson Sentiment Tools on short texts achieve on average an F1-Score of 51%. 15

Application: Newspaper Segmentation 16

Application: Sales Prediction 17

More Applications Expert Match Face/Image Recognition Foundation Register 18 Speaker Detection

What do Data Scientists Need? 19

Data Sources Experimental Data Reference Datasets Dictionaires, Ontologies, Thesaurus Scientific Papers Social Media Media Archives: Newspapers, Magazines Websites Live Streams: Twitter, Online News, Product Reviews Videos/Movies/Pictures Books: Belletristic, Technical Literature Wikipedia Hand-crafted Datasets etc. etc. 20

Data Provisioning Data Collections: Linguistic Data Consortium European Language Ressource Association NISt TIMIT etc. Access, Licensing Open Data: Open Governmental Data OGD Open Research Data ORD Linked Open Data Participation, Guidelines 21

Data Scientist analyze support Data Library use 22

Digitizing Make Information Accessible! Extract Text, Images, Charts, Tables Categorize Search Browse (Summarize) 23

Data Integration Combining data from different sources is very time consuming! 24

SODES: Automatic Data Integration Data enters SODES via Linking to, e.g., CKAN API Crawling User upload Semantic Context Comprehension Matching of columns Data Quality Improvements Export to various standard formats Enables analytics in specialized tools of choice Automatic Data Intake Search Preview Download Integration Content based search on Full text of data Full text of meta data Descriptions of data sets Easy data exploration enabled by All integrable search results in one table Statistical standard plots & measures 25

Talk in Short Text and Data Analytics is successful Researcher need access to data Data Scientists have powerful tools 26

Thank You! Mark Cieliebak Institute of Applied Information Technology (InIT) Winterthur, Switzerland Email: ciel@zhaw.ch, Website: www.zhaw.ch/~ciel 27