Speaking of Data: Computational Language Analysis
- Scot Dorsey
- 8 years ago
Transcription
1 [Title-slide figures: a phrase-structure tree for "it had traded over-the-counter" and a semantic representation with relations for "dog" and "bark"; the layout is not recoverable from the transcription.] Speaking of Data: Computational Language Analysis (In About Twenty Five Minutes). Stephan Oepen, Universitetet i Oslo, Department of Informatics, oe@ifi.uio.no
2 So, What Actually is Language Technology? (2001: A Space Odyssey; HAL 9000; 1968)
4 So, What Actually is Language Technology? (IBM Watson beats long-time Jeopardy! champions; 2011)
5 So, What Actually is Language Technology? A (young) interdisciplinary science: language, computing, cognition; (again) culturally and commercially relevant for the knowledge society.
6 What Makes Natural Language a Hard Problem?
< Den andre veien mot Bergen er kort x 30 x 25 = 25
> The other path towards Bergen is short. {0.58} (1:1:0).
> The other road towards Bergen is short. {0.56} (1:0:0).
> The second road towards Bergen is short. {0.55} (2:0:0).
> That other path towards Bergen is a card. {0.54} (0:1:0).
> That other road towards Bergen is a card. {0.54} (0:0:0).
> The second path towards Bergen is short. {0.51} (2:1:0).
> The other road against Bergen is short. {0.48} (1:2:0).
> The second road against Bergen is short. {0.48} (2:2:0).
> Short is the other street towards Bergen. {0.33} (1:4:0).
> Short is the second street towards Bergen. {0.33} (2:4:0).
7 What Makes Natural Language a Hard Problem? Scraped Off the Internet (translations of the same sentence):
The other way to Bergen is short.
the road to the other bergen is short.
Den other roads against Boron Gene are short.
Other one autobahn against Mountains am abrupt.
8 Some Examples: The Google Books Corpus (1 of 2) We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of culturomics, focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. [Jean-Baptiste Michel et al., 2010; Science]
9 Some Examples: The Google Books Corpus (2 of 2)
10 Some Examples: The Google Books Corpus (2 of 2) The corpus cannot be read by a human. If you tried to read only the entries from the year 2000 alone, at the reasonable pace of 200 words per minute, without interruptions for food or sleep, it would take eighty years. The sequence of letters is a thousand times longer than the human genome.
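The reading-time claim above can be turned into a rough word count. The following is a minimal sketch, assuming 365-day years and exactly the pace and duration quoted on the slide; the function name `words_read` is made up for this illustration.

```python
# Back-of-the-envelope check of the reading-time figures quoted on the slide.
WORDS_PER_MINUTE = 200             # reading pace quoted on the slide
MINUTES_PER_YEAR = 60 * 24 * 365   # non-stop reading; leap days ignored (assumption)
YEARS = 80                         # duration quoted on the slide


def words_read(years: int, words_per_minute: int = WORDS_PER_MINUTE) -> int:
    """Number of words read when reading non-stop for the given number of years."""
    return years * MINUTES_PER_YEAR * words_per_minute


implied_words = words_read(YEARS)
print(f"{implied_words:,} words")
```

Under these assumptions, eighty years of uninterrupted reading corresponds to roughly 8.4 billion words for the entries from the year 2000 alone.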
11 Some Examples: Statistical Machine Translation
Rundt 200 forskere og informatikere fra hele verden kommer til Universitetet i Oslo og Ole Johan Dahls hus for å delta på foredrag og workshops om tungregning i forskningssammenheng. Konferansen har høy profil, og etter to dager med innledende workshops åpnes selve konferansen av viserektor Doris Jorde 26. mai.
Some 200 researchers and computer scientist from all over the world stream to Oslo and the new Computer Science Building (Ole Johan Dahls hus) to participate in presentations and workshops on high-performance computing. The conference keeps a top-tier international profile. Following two days of introductory workshops, the main conference is opened on May 26 by vice dean Doris Jorde.
12 Some Examples: Statistical Machine Translation
Purely statistical approach: based on probabilities of (a) word-to-word or phrase-to-phrase translations and (b) strings of just the target language words (n-grams).
Statistics estimated from: (a) many millions of words of parallel text and (b) trillions of words of target language text.
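The two ingredients named on this slide, translation probabilities and target-language n-gram probabilities, can be combined in a toy scorer. This is only a sketch under loud assumptions: the probability table, the tiny target-language sample, word-for-word monotone translation, and add-one smoothing are all invented for illustration and are not the system shown in the talk.

```python
import itertools
import math
from collections import Counter

# (a) Toy word-to-word translation probabilities; all values are made up.
TRANSLATION = {
    "veien": {"road": 0.5, "path": 0.5},
    "er": {"is": 1.0},
    "kort": {"short": 0.5, "card": 0.5},
}

# (b) Tiny target-language sample standing in for the large monolingual text.
TARGET_TEXT = "the road is short . the path is short . the card is red ."


def train_bigram_lm(text):
    """Return a function giving add-one smoothed bigram probabilities."""
    tokens = ["<s>"] + text.split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob


def best_translation(source, lm):
    """Score every word-for-word candidate; return (log score, target words)."""
    options = [TRANSLATION[word].items() for word in source.split()]
    scored = []
    for choice in itertools.product(*options):
        words = [target for target, _ in choice]
        # Translation model: sum of log translation probabilities.
        score = sum(math.log(p) for _, p in choice)
        # Language model: sum of log bigram probabilities over the candidate.
        prev = "<s>"
        for word in words:
            score += math.log(lm(prev, word))
            prev = word
        scored.append((score, words))
    return max(scored, key=lambda item: item[0])


lm = train_bigram_lm(TARGET_TEXT)
best_score, best_words = best_translation("veien er kort", lm)
print(" ".join(best_words), round(best_score, 3))
```

In this toy setup both readings of "kort" are equally likely according to the translation table, so it is the target-language bigram statistics that prefer "is short" over "is card".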
13 Closer to Home: Web-Scale N-Grams
Google Web 1T N-Gram Corpus: frequency counts for sequences of between one and five words; reflecting one trillion tokens of English Web content (around 2006); royalty-free for research use; available for a dozen languages; about to become an important component in translation, parsing, et al.
Technological Requirements: 25 gbytes compressed (300 gbytes uncompressed); 14 million keys; hundreds to tens of thousands of queries in analysing one sentence.
14 Closer to Home: Web-Scale N-Grams
Relatively typical use pattern: indexing and searching large, unstructured or structured data; predominantly random access.
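To illustrate what "frequency counts for sequences of between one and five words" means and why analysing one sentence triggers many lookups, here is a minimal sketch. It uses an in-memory `Counter` over two example sentences as a stand-in for the real corpus; the helper names `ngrams` and `count_ngrams` are hypothetical and do not reflect how the Web 1T data is actually stored or accessed.

```python
from collections import Counter


def ngrams(tokens, max_n=5):
    """Yield every n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield tuple(tokens[start:start + n])


def count_ngrams(sentences, max_n=5):
    """Build a frequency table over all n-grams in whitespace-tokenized text."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), max_n))
    return counts


# Two candidate translations from the earlier slide serve as a toy corpus.
corpus = [
    "the other road towards Bergen is short",
    "the other path towards Bergen is short",
]
counts = count_ngrams(corpus)

# Each n-gram of a new sentence is one key lookup (random access) in the table.
queries = list(ngrams("the second road towards Bergen is short".split()))
hits = sum(1 for q in queries if counts[q] > 0)
print(len(queries), "lookups,", hits, "found")
```

A single seven-word candidate already needs 25 lookups; scoring many alternative candidates per sentence multiplies that number, which is consistent with the query volumes quoted on the previous slide.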
15 Our Current (Modest) Utilization of NorStore
16 Our Current (Modest) Utilization of NorStore
Bring on-line a collection of standard language resources; used in teaching and research, across disciplines and faculties; should in principle be shared with other (Norwegian) universities. (To date, only textual resources; audio and video coming.)
17 Language Analysis: Some High-Level Predictions
A Family of Disciplines Going Computational: imminent paradigm shift in linguistics, lexicography, philologies, et al.; no scientific computing tradition: diversity in backgrounds and needs.
Storage and Processing Requirements: emerging on-line repositories of language data (including metadata); standardize services for user identity and access rights management; batch processing: refinement of (mostly unstructured) language data; interactive processing: spontaneous search and browsing services; needs tighter coupling of storage and (scalable) processing services.
18 Language Analysis: Some High-Level Predictions
Language Analysis is an inherently data-intensive science.
19 More Concretely: Short-Term Initiatives
Language Analysis e-Infrastructure User Group: UniNett Sigma (Σ) looking to establish discipline-specific user groups; group-internal functions: exchange experience, coordinate activities; interface function to Σ: give feedback on user experience and needs; maybe one annual meeting; contact if interested.
Birds of a Feather Meeting Later Today: informal exchange of experiences among language analysis users; 13:30-15:00, meeting room AWK (third floor).
20 Credits NoTur and NorStore (via UniNett Sigma); The UiO Scientific Computation Group; The Norwegian Taxpayer.
More informationM3039 MPEG 97/ January 1998
INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND ASSOCIATED AUDIO INFORMATION ISO/IEC JTC1/SC29/WG11 M3039
More informationStreaming multimedia les from relational database
Streaming multimedia les from relational database Tomasz Rybak Applied Systems Division Software Departament Faculty of Computer Science Bialystok Technical University rybak@ii.pb.bialystok.pl Tomasz Rybak
More informationLean Development A team approach to Software Application Development
Lean Development A team approach to Software Application Development By P. Nallasenapathi Vice President, Saksoft Date: March 2006 India Phone: +91 44 2461 4501 Email: info@saksoft.com USA Phone: +1 212
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationQlikView Overview. Inge Nyheim Partner Sales Manager Inge.nyheim@qlikview.com
QlikView Overview Inge Nyheim Partner Sales Manager Inge.nyheim@qlikview.com Legal Disclaimer This Presentation contains forward-looking statements, including, but not limited to, statements regarding
More informationECS 165A: Introduction to Database Systems
ECS 165A: Introduction to Database Systems Todd J. Green based on material and slides by Michael Gertz and Bertram Ludäscher Winter 2011 Dept. of Computer Science UC Davis ECS-165A WQ 11 1 1. Introduction
More informationIBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst
ESG Brief IBM: An Early Leader across the Big Data Security Analytics Continuum Date: June 2013 Author: Jon Oltsik, Senior Principal Analyst Abstract: Many enterprise organizations claim that they already
More informationDATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011
DATA MINING CONCEPTS AND TECHNIQUES Marek Maurizio E-commerce, winter 2011 INTRODUCTION Overview of data mining Emphasis is placed on basic data mining concepts Techniques for uncovering interesting data
More informationSenter for avbildning. Sverre Holm
Senter for avbildning Sverre Holm Sonar Ultrasound Imaging University of Oslo Faculty of Mathematics and Natural Sciences Department of Informatics Group for Digital Signal Processing and Image Analysis:
More informationRational Asset Analyzer Technology Preview
IBM Software Group Rational Asset Analyzer Technology Preview Richard Szulewski Rational Product Manager System z rszulews@us.ibm.com 2010 IBM Corporation Rational Asset Analyzer Technology Preview Rational
More informationLær at spille din kort rigtigt, og få erhvervserfaring gennem AIESEC. Hvad er AIESEC? Vil du vide mere? Hvilke erfaringer kan du opnå i AIESEC?
c Lær at spille din kort rigtigt, og få erhvervserfaring gennem AIESEC. Hvad er AIESEC? -International frivillig studenterorganisation -Hjælper studerende over hele verden med at finde globale udvekslingsmuligheder,
More informationWorkshop Series on Open Source Research Methodology in Support of Non-Proliferation
The International Centre for Security Analysis The Policy Institute at King s King s College London Workshop Series on Open Source Research Methodology in Support of Non-Proliferation Workshop 1: Exploiting
More informationThe Challenges of Integrating Structured and Unstructured Data
LANDMARK TECHNICAL PAPER 1 LANDMARK TECHNICAL PAPER The Challenges of Integrating Structured and Unstructured Data By Jeffrey W. Pferd, PhD, Sr. Vice President Strategic Consulting Practice at Petris Presented
More informationBIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA
BIG DATA: PROMISE, POWER AND PITFALLS NISHANT MEHTA Agenda Promise Definition Drivers of and for Big Data Increase revenue using Big Data Power Optimize operations and decrease costs Discover new revenue
More information-84- Svein Lie Institutt for nordisk språk og Universitetet i Oslo
-84- COMBINATORY COORDINATION IN NORWEGIAN Svein Lie Institutt for nordisk språk og Universitetet i Oslo litteratur It has been claimed that 1. Ola og Kan er gift Ola and Kan are married is ambigous. The
More information