Speaking of Data: Computational Language Analysis

Speaking of Data: Computational Language Analysis (In About Twenty Five Minutes)
Stephan Oepen
Universitetet i Oslo, Department of Informatics
oe@ifi.uio.no

[Figure: a phrase-structure tree for "it had traded over-the-counter" and a semantic representation with relations for "dog" and "bark"]

So, What Actually is Language Technology?
(2001: A Space Odyssey; HAL 9000; 1968)

So, What Actually is Language Technology?
(IBM Watson beats long-time Jeopardy! champions; 2011)

So, What Actually is Language Technology?
(young) interdisciplinary science: language, computing, cognition;
(again) culturally and commercially relevant for knowledge society.

What Makes Natural Language a Hard Problem?
< Den andre veien mot Bergen er kort x 30 x 25 = 25
> The other path towards Bergen is short. {0.58} (1:1:0).
> The other road towards Bergen is short. {0.56} (1:0:0).
> The second road towards Bergen is short. {0.55} (2:0:0).
> That other path towards Bergen is a card. {0.54} (0:1:0).
> That other road towards Bergen is a card. {0.54} (0:0:0).
> The second path towards Bergen is short. {0.51} (2:1:0).
> The other road against Bergen is short. {0.48} (1:2:0).
> The second road against Bergen is short. {0.48} (2:2:0).
> Short is the other street towards Bergen. {0.33} (1:4:0).
> Short is the second street towards Bergen. {0.33} (2:4:0).

What Makes Natural Language a Hard Problem?
Scraped Off the Internet
The other way to Bergen is short.
the road to the other bergen is short.
Den other roads against Boron Gene are short.
Other one autobahn against Mountains am abrupt.

Some Examples: The Google Books Corpus (1 of 2)
We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of culturomics, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology.
[Jean-Baptiste Michel et al., 2010; Science]

Some Examples: The Google Books Corpus (2 of 2)
The corpus cannot be read by a human. If you tried to read only the entries from the year 2000 alone, at the reasonable pace of 200 words per minute, without interruptions for food or sleep, it would take eighty years. The sequence of letters is a thousand times longer than the human genome.
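The reading-time figure can be turned around to see how many words it implies. The following is only a back-of-the-envelope sketch: the function name is hypothetical, and the year length of 365.25 days is an assumption not stated on the slide.

```python
MINUTES_PER_YEAR = 60 * 24 * 365.25  # assumed year length, reading without breaks


def implied_word_count(words_per_minute, years):
    """Number of words read at a constant pace over the given number of years."""
    return words_per_minute * MINUTES_PER_YEAR * years


# 200 words per minute for eighty years, the figures quoted on the slide.
words = implied_word_count(200, 80)
print(f"{words:,.0f} words")
```

Under these assumptions, the quoted pace and duration correspond to roughly 8.4 billion words for the year 2000 alone.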

Some Examples: Statistical Machine Translation
Rundt 200 forskere og informatikere fra hele verden kommer til Universitetet i Oslo og Ole Johan Dahls hus for å delta på foredrag og workshops om tungregning i forskningssammenheng. Konferansen har høy profil, og etter to dager med innledende workshops åpnes selve konferansen av viserektor Doris Jorde 26. mai.
Some 200 researchers and computer scientist from all over the world stream to Oslo and the new Computer Science Building (Ole Johan Dahls hus) to participate in presentations and workshops on high-performance computing. The conference keeps a top-tier international profile. Following two days of introductory workshops, the main conference is opened on May 26 by vice dean Doris Jorde.

Some Examples: Statistical Machine Translation
Purely statistical approach: based on probabilities of
(a) word-to-word or phrase-to-phrase translations and
(b) strings of just the target language words (n-grams).
Statistics estimated from:
(a) many millions of words of parallel text and
(b) trillions of words of target language text.
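The two probability sources named on this slide can be combined in a toy scorer. This is a minimal sketch and not the system behind the slides: all probabilities below are invented for illustration, the word-by-word alignment is a simplifying assumption, and the names TRANSLATION, BIGRAM, UNSEEN, score and best_translation are hypothetical.

```python
import math

# Hypothetical translation probabilities P(target word | source word); toy numbers.
TRANSLATION = {
    "veien": {"road": 0.6, "path": 0.4},
    "er": {"is": 1.0},
    "kort": {"short": 0.5, "card": 0.5},
}

# Hypothetical bigram probabilities P(word | previous word) for the target language.
BIGRAM = {
    ("<s>", "road"): 0.02, ("<s>", "path"): 0.01,
    ("road", "is"): 0.3, ("path", "is"): 0.3,
    ("is", "short"): 0.05, ("is", "card"): 0.0001,
}
UNSEEN = 1e-6  # crude floor for unseen events (an assumption, not real smoothing)


def score(source, target):
    """Log-probability of a word-by-word translation: translation model plus bigram model."""
    total = 0.0
    previous = "<s>"
    for src, tgt in zip(source, target):
        total += math.log(TRANSLATION[src].get(tgt, UNSEEN))
        total += math.log(BIGRAM.get((previous, tgt), UNSEEN))
        previous = tgt
    return total


def best_translation(source):
    """Enumerate every word-by-word candidate and return the highest-scoring one."""
    candidates = [[]]
    for src in source:
        candidates = [c + [tgt] for c in candidates for tgt in TRANSLATION[src]]
    return max(candidates, key=lambda c: score(source, c))


print(best_translation(["veien", "er", "kort"]))
```

In this toy setup the translation table alone cannot choose between "short" and "card" for "kort"; the target-language bigram statistics break the tie, which is the role of component (b) on the slide.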

Closer to Home: Web-Scale N-Grams
Google Web 1T N-Gram Corpus
Frequency counts for sequences of between one and five words;
reflecting one trillion tokens of English Web content (around 2006);
royalty-free for research use; available for a dozen languages;
about to become an important component in translation, parsing, et al.
Relatively typical use pattern: indexing and searching large, unstructured or structured data; predominantly random access.
Technological Requirements
25 gbytes compressed (300 gbytes uncompressed); 14 million keys;
hundreds to tens of thousands of queries in analyzing one sentence.
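The random-access query pattern can be illustrated with a small in-memory table. This is a sketch under stated assumptions only: the counts are invented, the sorted-list index with binary search is one possible layout rather than the storage format of the actual corpus, and the names ngrams and SortedCounts are hypothetical.

```python
from bisect import bisect_left


def ngrams(tokens, max_order=5):
    """Yield every contiguous sequence of 1 to max_order tokens."""
    for order in range(1, max_order + 1):
        for start in range(len(tokens) - order + 1):
            yield tuple(tokens[start:start + order])


class SortedCounts:
    """Read-only n-gram table kept as sorted keys and queried by binary search."""

    def __init__(self, counts):
        items = sorted(counts.items())
        self.keys = [key for key, _ in items]
        self.values = [value for _, value in items]

    def lookup(self, ngram):
        """Return the stored count for an n-gram, or 0 if it is not in the table."""
        i = bisect_left(self.keys, ngram)
        if i < len(self.keys) and self.keys[i] == ngram:
            return self.values[i]
        return 0


# Toy counts, invented for illustration.
table = SortedCounts({("the",): 1000, ("road",): 40, ("the", "road"): 25})

sentence = "the other road towards Bergen is short".split()
queries = list(ngrams(sentence))
counts = {q: table.lookup(q) for q in queries}
print(len(queries), "lookups for one seven-word sentence")
```

Even this single short sentence produces 25 independent lookups when every sequence of one to five words is queried once; a system that scores many alternative analyses or translations repeats this for each alternative, which is consistent with the query volumes quoted above.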

Our Current (Modest) Utilization of NorStore
Bring on-line a collection of standard language resources;
used in teaching and research, across disciplines and faculties;
should in principle be shared with other (Norwegian) universities.
(To date, only textual resources; audio and video coming.)

Language Analysis: Some High-Level Predictions
A Family of Disciplines Going Computational
Imminent paradigm shift in linguistics, lexicography, philologies, et al.;
no scientific computing tradition: diversity in backgrounds and needs.
Storage and Processing Requirements
Emerging on-line repositories of language data (including metadata);
standardize services for user identity and access rights management;
batch processing: refinement of (mostly unstructured) language data;
interactive processing: spontaneous search and browsing services;
needs tighter coupling of storage and (scalable) processing services.
Language Analysis is an inherently data-intensive science.

More Concretely: Short-Term Initiatives
Language Analysis e-infrastructure User Group
UniNett Sigma (Σ) looking to establish discipline-specific user groups;
group-internal functions: exchange experience, coordinate activities;
interface function to Σ: give feedback on user experience and needs;
maybe one annual meeting; contact if interested.
Birds of a Feather Meeting Later Today
Informal exchange of experiences, among language analysis users:
13:30-15:00: meeting room AWK (third floor).

Credits
NoTur and NorStore (via UniNett Sigma);
The UiO Scientific Computation Group;
The Norwegian Taxpayer.


More information

How to transform data into dollars this is always about Business Intelligence

How to transform data into dollars this is always about Business Intelligence Swiss BI Day - 03/04/2014! How to transform data into dollars this is always about Business Intelligence! Philippe Nieuwbourg philippe.nieuwbourg@decideo.com A lot of things in common between oil and

More information

Attitudes towards English in Norwegian newspaper discourse. Anne-Line Graedler

Attitudes towards English in Norwegian newspaper discourse. Anne-Line Graedler Attitudes towards English in Norwegian newspaper discourse Anne-Line Graedler Outline 1. Background 2. Aim 3. Data 4. Analysis Context What are the texts about? How is English referred to in the texts?

More information

Whitepaper. Leveraging Social Media Analytics for Competitive Advantage

Whitepaper. Leveraging Social Media Analytics for Competitive Advantage Whitepaper Leveraging Social Media Analytics for Competitive Advantage May 2012 Overview - Social Media and Vertica From the Internet s earliest days computer scientists and programmers have worked to

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

FP7-ICT-2013-11-4.2. Scalable Data Analytics. Deadline: 16 April 2013 at 17:00:00 (Brussels local time)

FP7-ICT-2013-11-4.2. Scalable Data Analytics. Deadline: 16 April 2013 at 17:00:00 (Brussels local time) Scalable Data Analytics Deadline: 16 April 2013 at 17:00:00 (Brussels local time) Agenda Time 14H30 Programme Overview of Objective 4.2 Scalable Data Analytics By Carola Carstens, European Commission,

More information

Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow s Computers illustrated with Today s Examples

Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow s Computers illustrated with Today s Examples 15 April 2015, COST ACROSS Workshop, Würzburg Who needs humans to run computers? Role of Big Data and Analytics in running Tomorrow s Computers illustrated with Today s Examples Maris van Sprang, 2015

More information

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS! The Bloor Group IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS VENDOR PROFILE The IBM Big Data Landscape IBM can legitimately claim to have been involved in Big Data and to have a much broader

More information

Sanjeev Kumar. contribute

Sanjeev Kumar. contribute RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a

More information

DEFINITE AND INDEFINITE FORM

DEFINITE AND INDEFINITE FORM DEFINITE AND INDEFINITE FORM In Norwegian, a noun can appear either in the indefinite form or in the definite form. There are some absolute rules that determine which form is correct, but three important

More information

Streaming Big Data Performance Benchmark for Real-time Log Analytics in an Industry Environment

Streaming Big Data Performance Benchmark for Real-time Log Analytics in an Industry Environment Streaming Big Data Performance Benchmark for Real-time Log Analytics in an Industry Environment SQLstream s-server The Streaming Big Data Engine for Machine Data Intelligence 2 SQLstream proves 15x faster

More information

Gør dine big data klar til analyse på en nem måde med Hadoop og SAS Data Loader for Hadoop. Jens Dahl Mikkelsen SAS Institute

Gør dine big data klar til analyse på en nem måde med Hadoop og SAS Data Loader for Hadoop. Jens Dahl Mikkelsen SAS Institute Gør dine big data klar til analyse på en nem måde med Hadoop og SAS Data Loader for Hadoop Jens Dahl Mikkelsen SAS Institute Indhold Udfordringer for analytikeren Hvordan kan SAS Data Loader for Hadoop

More information

Topics in basic DBMS course

Topics in basic DBMS course Topics in basic DBMS course Database design Transaction processing Relational query languages (SQL), calculus, and algebra DBMS APIs Database tuning (physical database design) Basic query processing (ch

More information

Research Data Alliance: Current Activities and Expected Impact. SGBD Workshop, May 2014 Herman Stehouwer

Research Data Alliance: Current Activities and Expected Impact. SGBD Workshop, May 2014 Herman Stehouwer Research Data Alliance: Current Activities and Expected Impact SGBD Workshop, May 2014 Herman Stehouwer The Vision 2 Researchers and innovators openly share data across technologies, disciplines, and countries

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Breaking News! Big Data is Solved. What Is In-Memory Computing and What Does It Mean to U.S. Leaders? EXECUTIVE WHITE PAPER

Breaking News! Big Data is Solved. What Is In-Memory Computing and What Does It Mean to U.S. Leaders? EXECUTIVE WHITE PAPER Breaking News! Big Data is Solved. What Is In-Memory Computing and What Does It Mean to U.S. Leaders? EXECUTIVE WHITE PAPER There is a revolution happening in information technology, and it s not just

More information

Streaming Big Data Performance Benchmark. for

Streaming Big Data Performance Benchmark. for Streaming Big Data Performance Benchmark for 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner Static Big Data is a

More information

M3039 MPEG 97/ January 1998

M3039 MPEG 97/ January 1998 INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039

More information

Streaming multimedia les from relational database

Streaming multimedia les from relational database Streaming multimedia les from relational database Tomasz Rybak Applied Systems Division Software Departament Faculty of Computer Science Bialystok Technical University rybak@ii.pb.bialystok.pl Tomasz Rybak

More information

Lean Development A team approach to Software Application Development

Lean Development A team approach to Software Application Development Lean Development A team approach to Software Application Development By P. Nallasenapathi Vice President, Saksoft Date: March 2006 India Phone: +91 44 2461 4501 Email: info@saksoft.com USA Phone: +1 212

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

QlikView Overview. Inge Nyheim Partner Sales Manager Inge.nyheim@qlikview.com

QlikView Overview. Inge Nyheim Partner Sales Manager Inge.nyheim@qlikview.com QlikView Overview Inge Nyheim Partner Sales Manager Inge.nyheim@qlikview.com Legal Disclaimer This Presentation contains forward-looking statements, including, but not limited to, statements regarding

More information

ECS 165A: Introduction to Database Systems

ECS 165A: Introduction to Database Systems ECS 165A: Introduction to Database Systems Todd J. Green based on material and slides by Michael Gertz and Bertram Ludäscher Winter 2011 Dept. of Computer Science UC Davis ECS-165A WQ 11 1 1. Introduction

More information

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst

IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst ESG Brief IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst Abstract: Many enterprise organizations claim that they already

More information

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011 DATA MINING CONCEPTS AND TECHNIQUES Marek Maurizio E-commerce, winter 2011 INTRODUCTION Overview of data mining Emphasis is placed on basic data mining concepts Techniques for uncovering interesting data

More information

Senter for avbildning. Sverre Holm

Senter for avbildning. Sverre Holm Senter for avbildning Sverre Holm Sonar Ultrasound Imaging University of Oslo Faculty of Mathematics and Natural Sciences Department of Informatics Group for Digital Signal Processing and Image Analysis:

More information

Rational Asset Analyzer Technology Preview

Rational Asset Analyzer Technology Preview IBM Software Group Rational Asset Analyzer Technology Preview Richard Szulewski Rational Product Manager System z rszulews@us.ibm.com 2010 IBM Corporation Rational Asset Analyzer Technology Preview Rational

More information

Lær at spille din kort rigtigt, og få erhvervserfaring gennem AIESEC. Hvad er AIESEC? Vil du vide mere? Hvilke erfaringer kan du opnå i AIESEC?

Lær at spille din kort rigtigt, og få erhvervserfaring gennem AIESEC. Hvad er AIESEC? Vil du vide mere? Hvilke erfaringer kan du opnå i AIESEC? c Lær at spille din kort rigtigt, og få erhvervserfaring gennem AIESEC. Hvad er AIESEC? -International frivillig studenterorganisation -Hjælper studerende over hele verden med at finde globale udvekslingsmuligheder,

More information

Workshop Series on Open Source Research Methodology in Support of Non-Proliferation

Workshop Series on Open Source Research Methodology in Support of Non-Proliferation The International Centre for Security Analysis The Policy Institute at King s King s College London Workshop Series on Open Source Research Methodology in Support of Non-Proliferation Workshop 1: Exploiting

More information

The Challenges of Integrating Structured and Unstructured Data

The Challenges of Integrating Structured and Unstructured Data LANDMARK TECHNICAL PAPER 1 LANDMARK TECHNICAL PAPER The Challenges of Integrating Structured and Unstructured Data By Jeffrey W. Pferd, PhD, Sr. Vice President Strategic Consulting Practice at Petris Presented

More information

BIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA

BIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA BIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA Agenda Promise Definition Drivers of and for Big Data Increase revenue using Big Data Power Optimize operations and decrease costs Discover new revenue

More information

-84- Svein Lie Institutt for nordisk språk og Universitetet i Oslo

-84- Svein Lie Institutt for nordisk språk og Universitetet i Oslo -84- COMBINATORY COORDINATION IN NORWEGIAN Svein Lie Institutt for nordisk språk og Universitetet i Oslo litteratur It has been claimed that 1. Ola og Kan er gift Ola and Kan are married is ambigous. The

More information