Text Analytics with Ambiverse. Text to Knowledge.


Version 1.0, February

Contents

1 Ambiverse: Text to Knowledge
  1.1 Text is all Around
  1.2 Ambiverse: Leading research to industry
  1.3 Text to Knowledge
2 Named Entity Disambiguation
  2.1 What is it?
  2.2 Why is it Important?
  2.3 Why is it Challenging?
  2.4 Ambiverse Gives Meaning to Text
  2.5 Ambiverse & YAGO, a Powerful Combination
  2.6 Integrating Domain-specific Knowledge
  2.7 Ambiverse Text Analytics in Facts
3 Applications
  3.1 Ambiverse Search
  3.2 Ambiverse Analyze
  3.3 Ambiverse Write
  3.4 Personalized Text Analytics


1. Ambiverse: Text to Knowledge

1.1 Text is all Around

Most of the information produced by people, organizations, and public institutions is in the form of text. In 2014, 300 million new websites were created. [1] Every year, 2 million blog posts are written, [2] thousands of news sites around the globe publish articles, and millions of new updates are generated in social networks. In fact, most human interaction takes place via unstructured data (e.g., articles, reports, social network posts, ads, comments, and reviews). Companies and public institutions also regularly produce large quantities of internal documents. This vast amount of text goes beyond what is commonly understood as big data.

Textual information is not easy to interpret because it lacks a well-defined structure. To make use of it, machines need text understanding capabilities so that these huge collections of documents can be computationally analyzed and transformed into useful data.

It is increasingly understood that text analytics gives companies, individuals, and public institutions considerable leverage. The text analytics market is expected to grow at an average rate of 25% per year. [3] By 2013, only 1% of companies were processing their textual information; by % will do (Figure 1.1). [4] In domains such as news, advertising, finance, and insurance, companies are starting to make sense of their textual data as a means of adding value to their businesses.

newsletter_turning_dark_data_into_smart_data.pdf

Figure 1.1: The use of text analytics will increase dramatically in the coming years (axis: % of companies using text analytics).

1.2 Ambiverse: Leading research to industry

Ambiverse, a spin-off of the Max Planck Institute for Informatics, joins the new world of text analytics. Ambiverse develops technology to automatically understand, analyze, and manage large collections of textual data. Ambiverse is built on years of state-of-the-art research in text analytics. In 2015, Ambiverse received an EXIST Transfer of Research grant from the German Federal Ministry for Economic Affairs and the European Union.

1.3 Text to Knowledge

Our technology focuses on the recognition and disambiguation of named entities in text. It relies on years of scientific work at the Max Planck Institute for Informatics, a world-leading institution in automatic text understanding. Our technology for named entity disambiguation was named the best named entity disambiguation system by IBM, [5] and our corresponding scientific publications are among the most cited in the international automatic text understanding community. [6] [7] This cutting-edge technology gives Ambiverse an advantage in the text analytics world, allowing the development of a new generation of text analytics tools that transform textual information into machine-understandable knowledge.

[5] D. A. Ferrucci (2012). Introduction to "This is Watson". IBM Journal of Research and Development.
[6] J. Hoffart et al. (2011). Robust Disambiguation of Named Entities in Text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
[7] J. Hoffart et al. (2013). YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Artificial Intelligence.

2. Named Entity Disambiguation

2.1 What is it?

A named entity, or simply entity, is a real-world object such as a person, an organization, a location, or a product. Named entity disambiguation is the task of automatically recognizing the names of these objects in text and identifying their real-world reference. For instance, in the sentence "Page played the hit Kashmir on his uniquely tuned Les Paul", our disambiguation system recognizes that the mention "Page" refers to the famous rock guitarist Jimmy Page and not to Larry Page, founder of Google, and that "Les Paul" refers to the guitar and not its designer (see Figure 2.1).

Figure 2.1: Selecting the correct entity for each mention: Jimmy Page, the song Kashmir, and a Les Paul guitar.

2.2 Why is it Important?

Ambiguous entities are all around us. The variety of names is much smaller than one may think; there are more entities than names. Places are named after people, and people after other people. Places also tend to share names, as do people and products. In this context, knowing the real-world object behind a reference produces significant gains in text understanding.

If we want to select or analyze documents mentioning the city of Paris in France, we first have to make sure that the mentions of "Paris" refer to the entity we are interested in and not, for instance, to the city of Paris in Texas. If we want to search efficiently for information about Larry Page, we have to exclude documents about Jimmy Page, another famous Page. Moreover, if companies want to analyze customer opinions about cars, they need to understand whether a tweet refers to the Jeep Wrangler or to Wrangler jeans ("I bought a Wrangler, and it is very comfortable", "I sell my brand new Wranglers"; Figure 2.2). Knowing the correct meaning of a name makes it possible to analyze and search large text collections more efficiently. Ambiverse has developed state-of-the-art technology to disambiguate entities, and a set of applications around it for smart text analytics.

Figure 2.2: Ambiverse Text Analytics helps to identify the real enthusiastic fans. Image from flickr (zombieite), CC-BY 2.0.

2.3 Why is it Challenging?

Named entity mentions can be very ambiguous. The name "Page" alone can refer to hundreds of entities; for more ambiguous names like "John", the potential candidates likely number in the thousands. A machine needs to resolve the meanings of all names in a single text while ensuring coherence among the entities (e.g., it is reasonable that "Paris" and "France" are simultaneously assigned to the French capital and the European country).

Naive approaches that simply enumerate all possible combinations quickly hit a brick wall. Even for a single sentence with three or four moderately ambiguous names, the number of combinations exceeds 100,000. For full documents, enumeration becomes infeasible even for the fastest machines. Solving such a problem requires smart technologies such as the one we provide in Ambiverse Text Analytics.
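The growth described above can be checked with a few lines of Python. This is only a sketch of the counting argument, not of the Ambiverse system: the per-mention candidate counts (500, 50, and 5) are the ones used in the "Page played the hit Kashmir" example, while the function name and the tiny `small` dictionary are illustrative assumptions.

```python
from itertools import product
from math import prod

# Candidate counts per mention, taken from the "Page / Kashmir / Les Paul" example.
candidates_per_mention = {"Page": 500, "Kashmir": 50, "Les Paul": 5}


def count_joint_assignments(counts):
    """Number of ways to pick exactly one candidate entity for every mention."""
    return prod(counts.values())


total = count_joint_assignments(candidates_per_mention)
print(total)  # 125000

# Brute-force enumeration is only practical for tiny inputs.
# The candidate lists below are hypothetical and deliberately short.
small = {
    "Paris": ["Paris (France)", "Paris (Texas)"],
    "Page": ["Jimmy Page", "Larry Page"],
}
assignments = list(product(*small.values()))
print(len(assignments))  # 4
```

Because the count is a product over mentions, every additional ambiguous name multiplies the search space, which is why exhaustive enumeration stops being an option for full documents.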

Page played the hit Kashmir on his uniquely tuned Les Paul.
500 x 50 x 5 = 125,000 possible candidate combinations

Figure 2.3: There are 500 possible Pages, 50 possible Kashmirs, and 5 possible Les Pauls, leading to 125,000 possible entity combinations.

2.4 Ambiverse Gives Meaning to Text

Ambiverse Text Analytics opens up a wide range of possibilities to manage and understand large text collections. Its main characteristic is the ability to understand the meaning of objects, detaching them from their textual representations. For instance, in the sentences "Page played Kashmir.", "Jimmy rocked the show at Knebworth!", and "James Patrick Page is one of the greatest guitarists of all time.", Ambiverse Text Analytics understands that "Jimmy", "Page", and "James Patrick Page" all refer to the same person (Figure 2.4). It understands real-world concepts in text regardless of how they are actually mentioned. This allows Ambiverse to develop a set of applications around the named entity disambiguation technology, changing the way in which text is stored, searched, analyzed, and produced.

Figure 2.4: Ambiverse Text Analytics understands that all sentences refer to the same Jimmy Page.

2.5 Ambiverse & YAGO, a Powerful Combination

All entities, such as Jimmy Page, Larry Page, Les Paul (the person), and the guitar named after him, are present in our YAGO knowledge base [Hof+13]. YAGO, which is derived from Wikipedia, can be thought of as a very large collection of entities. YAGO also contains accurate characterizations of all entities. It knows that Larry Page is a computer scientist, a corporate director, and a billionaire, that Google is a U.S. company, and that Jimmy Page is a guitarist and a musician. These characteristics of the entities are called categories or classes, and they are the key to developing useful applications around named entity disambiguation technology. An example of YAGO is shown in Figure 2.5.

Figure 2.5: Example of the knowledge stored in YAGO: the entities, their classes (e.g., artifact, song, musician, guitar), and the relations between them (e.g., created, plays, was played at, happened in).

2.6 Integrating Domain-specific Knowledge

The flexible architecture of Ambiverse Text Analytics allows the use of additional domain-specific entities. Other knowledge bases (e.g., a company-specific knowledge base or a product catalog) can be easily integrated into our system, or a user can concentrate on a specific slice of YAGO. This enables companies to focus on the entities of importance to them, such as their products or customers, and allows Ambiverse Text Analytics to be fully customized to the specific needs of our customers.

2.7 Ambiverse Text Analytics in Facts

Performance

The following numbers correspond to average-length news articles processed on a compute instance with 16 CPU cores and 32 GB of memory.

Documents per hour with high accuracy:
Documents per hour with highest accuracy:

The exact accuracy depends on the nature of the documents. An experimental evaluation on a large set of newswire documents [Hof+11] showed 80% accuracy for the high accuracy setting and 83% accuracy for the highest accuracy setting.

Languages

We currently support English and German; prototype support at the research stage covers Arabic, Chinese, Italian, and Spanish.

Knowledge Base

A brief comparison of the size of YAGO and other prominent openly available knowledge bases shows that YAGO is among the most comprehensive and precise. YAGO's distinct advantages are the clear semantic modelling of entities and especially the specific class hierarchy, ranging from very general categories like "person" to highly specific ones like "British rhythm and blues boom musicians". Also, YAGO is the only knowledge base that has been evaluated in terms of accuracy [Hof+13].

                                 Entities       Classes        Accuracy
English YAGO3                    3.5 million    550 thousand   > 95%
Combined YAGO3 (10 languages)    4.6 million    570 thousand   > 95%
English DBpedia                  4.8 million    735            not evaluated
Combined DBpedia                 38.3 million   735            not evaluated

Table 2.1: Facts about the YAGO knowledge base.

More details about YAGO are available at
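To make the idea of entities, classes, and a class hierarchy more concrete, the following toy sketch stores a handful of entities the way Sections 2.5 and 2.7 describe them. It is not YAGO's actual schema or API: the dictionary names, the entity identifiers, and the assumption that "guitarist" is a subclass of "musician" are all illustrative.

```python
# Toy class hierarchy: child class -> parent class (assumed for illustration).
SUBCLASS_OF = {
    "guitarist": "musician",
    "musician": "person",
    "computer scientist": "person",
    "song": "artifact",
    "guitar": "artifact",
}

# Entity -> directly assigned classes (hypothetical identifiers).
TYPES = {
    "Jimmy Page": {"guitarist"},
    "Larry Page": {"computer scientist"},
    "Kashmir (song)": {"song"},
    "Gibson Les Paul": {"guitar"},
}

# Relations between entities as (subject, relation, object) triples.
FACTS = [
    ("Jimmy Page", "plays", "Gibson Les Paul"),
    ("Jimmy Page", "created", "Kashmir (song)"),
]


def all_classes(entity):
    """Return the direct classes of an entity plus all their ancestors."""
    seen = set()
    stack = list(TYPES.get(entity, ()))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            parent = SUBCLASS_OF.get(cls)
            if parent is not None:
                stack.append(parent)
    return seen


def entities_of_class(cls):
    """Return all entities that belong to a class, directly or via subclasses."""
    return {entity for entity in TYPES if cls in all_classes(entity)}


print(sorted(all_classes("Jimmy Page")))
print(sorted(entities_of_class("person")))
```

Walking up the hierarchy is what lets a general class such as "person" cover both guitarists and computer scientists, while a specific class stays narrow.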


3. Applications

Ambiverse's cutting-edge text analysis technology allows the development of a whole range of next-generation applications to manage, search, analyze, and produce text.

3.1 Ambiverse Search

Searching for Entities

Traditional search engines take words or phrases as input and return a set of documents in which these words or phrases appear to be most relevant. They have a limited understanding of the user's intent in the sense that they do not give meaning to the input words; they only understand their form. For instance, they cannot tell whether the input word "Paris" refers to the city in France, to Paris Hilton, or to the mythological Greek character. Searching for "Paris" in a regular search engine will return documents where the word "Paris" appears, without distinguishing which Paris it is. Documents referring to the city of Paris in France will probably be ranked at the top, since it is the most popular entity. Users searching for less common Paris references must refine their input (e.g., "Paris Greece Troy"), which forces them to express their intention by incorporating extra knowledge (which is sometimes unavailable) into the input.

However, if the documents are first processed via Ambiverse Text Analytics (meaning that all entities in all documents have been identified beforehand), the user can search for the entities themselves, independently of how they are mentioned in the text and without any additional background knowledge. The user's intent is fully described by the input entity itself. For instance, the user can directly search for Paris Hilton, and no matter how she is referred to (e.g., "Paris", "Paris Hilton", "Hilton's granddaughter"), all documents in which she is mentioned will be retrieved (and properly ranked). All other documents containing other Paris occurrences (Paris, France; the Greek character; Paris, Texas) will be excluded. This type of ambiguity is more common than one may think, and it results in highly imprecise search results.

Ambiverse Search gives the user the capability to search for meaning or concepts in huge text collections, reaching more precise results by better interpreting the user's intent and abstracting meaning from textual forms. Out of the box, we provide search for 4.6 million entities, to which customer-specific entities can easily be added (see Section 2.6). Figures 3.1 and 3.2 provide an example of regular and smart search.

Figure 3.1: Searching for the word Prada is imprecise due to its ambiguity.

Figure 3.2: Searching for the company Prada gives precise results: ambiguities have been resolved.

Try your own examples in the prototype of Ambiverse Search at
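The difference between searching for a word and searching for an entity can be sketched with two small inverted indexes. This is an illustration under stated assumptions rather than how Ambiverse Search is implemented: the annotation tuples, document identifiers, and entity identifiers are hypothetical, and the word index is built from mention strings only instead of full document text.

```python
from collections import defaultdict

# Hypothetical output of a disambiguation step: (document id, mention, entity id).
ANNOTATIONS = [
    ("doc1", "Paris", "Paris_Hilton"),
    ("doc2", "Paris", "Paris_(France)"),
    ("doc3", "Hilton's granddaughter", "Paris_Hilton"),
    ("doc4", "Paris", "Paris_(Texas)"),
]


def build_indexes(annotations):
    """Build a word -> documents index and an entity -> documents index."""
    by_word = defaultdict(set)
    by_entity = defaultdict(set)
    for doc_id, mention, entity in annotations:
        for word in mention.lower().split():
            by_word[word].add(doc_id)
        by_entity[entity].add(doc_id)
    return by_word, by_entity


by_word, by_entity = build_indexes(ANNOTATIONS)

# Keyword search: every document containing the word, whichever Paris is meant.
keyword_hits = sorted(by_word["paris"])

# Entity search: only documents about Paris Hilton, however she is mentioned.
entity_hits = sorted(by_entity["Paris_Hilton"])

print(keyword_hits)
print(entity_hits)
```

In this toy data the keyword query returns the France and Texas documents and misses the one that says "Hilton's granddaughter", while the entity query returns exactly the two documents about Paris Hilton.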

Searching for Categories: the Power of the YAGO Knowledge Base

As mentioned before, YAGO contains information about the categories of each entity. This allows us to add a new abstraction layer to our search, something impossible in traditional search engines. Instead of searching for a given entity, we can directly search for a category, so that a whole set of entities is covered by one query. For instance, we can directly search for fashion labels, and all the documents mentioning a fashion label (e.g., Prada, Gucci, Chanel) will be retrieved. We can also search for documents containing German soccer players (e.g., Schweinsteiger, Thomas Müller, Mesut Özil), Harvard alumni (e.g., Barack Obama, Ban Ki-Moon, Natalie Portman, Robert Solow), or any other category available in our knowledge base. The secret here is that Ambiverse Text Analytics is capable of identifying the entities in the text, and our knowledge base knows the categories of those entities. Our knowledge base contains more than 570k categories.

Figure 3.3: Searching for the category high fashion brands finds documents on all fashion labels.

3.2 Ambiverse Analyze

Understanding entities in text enables a whole new range of text analytics tools. For instance, one can visualize the correlation over time between two companies, or even the correlation between a company and its sector. Ambiverse Analyze helps you understand how mentions of the fashion label Prada correlate with mentions of all fashion labels (Figure 3.4).

Try the prototype of Ambiverse Analyze at
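As a rough sketch of the kind of analysis described above, the following Python code counts monthly mentions of one entity and of its whole category, then computes the Pearson correlation between the two series. The mention data, the category mapping, and the function names are invented for illustration; the source does not state which correlation measure Ambiverse Analyze uses, so Pearson correlation is an assumption here.

```python
from collections import Counter
from math import sqrt

# Hypothetical entity -> category mapping, as a knowledge base might provide.
CATEGORY_OF = {"Prada": "fashion label", "Gucci": "fashion label", "Chanel": "fashion label"}

# Hypothetical disambiguated mentions: (month, entity).
MENTIONS = [
    ("2015-01", "Prada"), ("2015-01", "Gucci"),
    ("2015-02", "Prada"), ("2015-02", "Prada"), ("2015-02", "Gucci"), ("2015-02", "Chanel"),
    ("2015-03", "Prada"), ("2015-03", "Prada"), ("2015-03", "Prada"),
    ("2015-03", "Gucci"), ("2015-03", "Chanel"), ("2015-03", "Chanel"),
]
MONTHS = ["2015-01", "2015-02", "2015-03"]


def monthly_counts(mentions, months, keep):
    """Count mentions per month for entities accepted by the predicate `keep`."""
    counts = Counter(month for month, entity in mentions if keep(entity))
    return [counts[month] for month in months]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


prada = monthly_counts(MENTIONS, MONTHS, lambda e: e == "Prada")
labels = monthly_counts(MENTIONS, MONTHS, lambda e: CATEGORY_OF.get(e) == "fashion label")
r = pearson(prada, labels)
print(prada, labels, r)
```

The category series is only possible because each mention has already been resolved to an entity whose category is known; a plain keyword count could not group Prada, Gucci, and Chanel together.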

Figure 3.4: Ambiverse Analyze plots the trends of Prada against all other fashion labels.

3.3 Ambiverse Write

Understanding entities is also a key element in the production of intelligent texts. We developed Ambiverse Write, a smart authoring platform for intelligent text production: while the author is typing, entities are automatically recognized, relevant entities are suggested, and background information is provided on the fly. An author writing about fashion topics will get suggestions about fashion brands or designers, and background information about them, directly while typing.

Figure 3.5: Ambiverse Write allows authors to write texts and link entities at the same time.

Once the writing process has been completed, the text is ready for smart publishing: it is annotated with the correct entities and can be immediately integrated into Ambiverse Search and Analyze. This integration also enables Ambiverse to continuously improve the quality of its technology by incorporating user-specific annotations.

In the example shown in Figure 3.5, authors can get a deeper understanding of the entities they are writing about without ever leaving the editor. Additionally, the links improve the reading experience for all readers, adding value to the article, making them stay longer, and encouraging them to use the article as a prominent reference.

Contact us for a demonstration of the prototype.

3.4 Personalized Text Analytics

Companies or even individual users usually have their own knowledge base or want to add their own customization to YAGO (e.g., they may be interested in only a part of it, or want to modify some entities or categories). We developed a framework that allows users to add their own entities to their specific knowledge base, making our disambiguation technology fully customizable to each particular user or organization. Ambiverse Text Analytics will then focus on the entities of interest to the user or adapt to the setting that the user considers most appropriate. The tool for augmenting an existing knowledge base is intuitive and simple to use. Users have several ways to generate their customized knowledge base without specific knowledge of our technology.

Contact us for a demonstration of the prototype.
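The two kinds of customization mentioned above, restricting the knowledge base to a slice and adding one's own entities, can be sketched as a single function over an entity-to-classes mapping. This is a hypothetical illustration, not the Ambiverse framework: the function `customize`, its parameters, and the product name are assumptions.

```python
def customize(base, additions=None, keep_classes=None):
    """Return a new entity -> classes mapping.

    base:         entity -> set of classes (e.g., a slice exported from a knowledge base)
    additions:    user-supplied entity -> set of classes to merge in
    keep_classes: if given, keep only base entities with at least one of these classes
    """
    customized = {
        entity: set(classes)
        for entity, classes in base.items()
        if keep_classes is None or classes & keep_classes
    }
    for entity, classes in (additions or {}).items():
        customized.setdefault(entity, set()).update(classes)
    return customized


# Hypothetical base knowledge and a company-specific product catalog entry.
base_kb = {
    "Prada": {"fashion label"},
    "Jimmy Page": {"guitarist"},
}
catalog = {"Acme Trail Jacket": {"product"}}

custom_kb = customize(base_kb, additions=catalog, keep_classes={"fashion label"})
print(custom_kb)
```

The function copies the class sets instead of modifying them in place, so the shared base knowledge stays unchanged while each user or organization gets its own view.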


References

[Hof+11] Johannes Hoffart et al. Robust Disambiguation of Named Entities in Text. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2011.
[Hof+13] Johannes Hoffart et al. YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. In: Artificial Intelligence 194 (2013).

Demos & Further Readings

Ambiverse Search: The prototype is available at
Ambiverse Analyze: Explore the prototype at
YAGO: More details are available at

Ambiverse
Max Planck Institute for Informatics
Campus E Saarbrücken
Germany
Phone:
Fax:
[email protected]


More information

MAN VS. MACHINE. How IBM Built a Jeopardy! Champion. 15.071x The Analytics Edge

MAN VS. MACHINE. How IBM Built a Jeopardy! Champion. 15.071x The Analytics Edge MAN VS. MACHINE How IBM Built a Jeopardy! Champion 15.071x The Analytics Edge A Grand Challenge In 2004, IBM Vice President Charles Lickel and coworkers were having dinner at a restaurant All of a sudden,

More information

Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora

Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora SAP Brief SAP Technology SAP HANA Vora Objectives Gain Contextual Awareness for a Smarter Digital Enterprise with SAP HANA Vora Bridge the divide between enterprise data and Big Data Bridge the divide

More information

SAP BusinessObjects Edge BI, Standard Package Preferred Business Intelligence Choice for Growing Companies

SAP BusinessObjects Edge BI, Standard Package Preferred Business Intelligence Choice for Growing Companies SAP Solutions for Small Businesses and Midsize Companies SAP BusinessObjects Edge BI, Standard Package Preferred Business Intelligence Choice for Growing Companies SAP BusinessObjects Edge BI, Standard

More information

White Paper: Leveraging Web Intelligence to Enhance Cyber Security

White Paper: Leveraging Web Intelligence to Enhance Cyber Security White Paper: Leveraging Web Intelligence to Enhance Cyber Security October 2013 Inside: New context on Web Intelligence The need for external data in enterprise context Making better use of web intelligence

More information

JamiQ Social Media Monitoring Software

JamiQ Social Media Monitoring Software JamiQ Social Media Monitoring Software JamiQ's multilingual social media monitoring software helps businesses listen, measure, and gain insights from conversations taking place online. JamiQ makes cutting-edge

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

Cloud Computing Capacity Planning. Maximizing Cloud Value. Authors: Jose Vargas, Clint Sherwood. Organization: IBM Cloud Labs

Cloud Computing Capacity Planning. Maximizing Cloud Value. Authors: Jose Vargas, Clint Sherwood. Organization: IBM Cloud Labs Cloud Computing Capacity Planning Authors: Jose Vargas, Clint Sherwood Organization: IBM Cloud Labs Web address: ibm.com/websphere/developer/zones/hipods Date: 3 November 2010 Status: Version 1.0 Abstract:

More information

Chapter 11. Managing Knowledge

Chapter 11. Managing Knowledge Chapter 11 Managing Knowledge VIDEO CASES Video Case 1: How IBM s Watson Became a Jeopardy Champion. Video Case 2: Tour: Alfresco: Open Source Document Management System Video Case 3: L'Oréal: Knowledge

More information

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study Amar-Djalil Mezaour 1, Julien Law-To 1, Robert Isele 3, Thomas Schandl 2, and Gerd Zechmeister

More information

Big Impacts from Big Data UNION SQUARE ADVISORS LLC

Big Impacts from Big Data UNION SQUARE ADVISORS LLC Big Impacts from Big Data Solid Fundamental Drivers for the Big Data Analytics Market Massive Data Growth The Digital Universe - Data Growth (1) 7,910 exabytes Impacts of Analytics Will Be Felt Across

More information

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu

Domain Adaptive Relation Extraction for Big Text Data Analytics. Feiyu Xu Domain Adaptive Relation Extraction for Big Text Data Analytics Feiyu Xu Outline! Introduction to relation extraction and its applications! Motivation of domain adaptation in big text data analytics! Solutions!

More information

Using In-Memory Data Fabric Architecture from SAP to Create Your Data Advantage

Using In-Memory Data Fabric Architecture from SAP to Create Your Data Advantage SAP HANA Using In-Memory Data Fabric Architecture from SAP to Create Your Data Advantage Deep analysis of data is making businesses like yours more competitive every day. We ve all heard the reasons: the

More information

Big Data: What defines it and why you may have a problem leveraging it DISCUSSION PAPER

Big Data: What defines it and why you may have a problem leveraging it DISCUSSION PAPER DISCUSSION PAPER 1. Enterprise data revolution One of the key trends in the enterprise technology world at the moment - and one that has been steadily growing in influence and importance in the past few

More information

Ridiculously Good Outsourcing. The Monetization of Big Data: Made Possible By Humans. www.taskus.com [email protected] (888) 400 - TASK

Ridiculously Good Outsourcing. The Monetization of Big Data: Made Possible By Humans. www.taskus.com info@taskus.com (888) 400 - TASK From The TaskUs Library The Monetization of Big Data: Made Possible By Humans Ridiculously Good Outsourcing www.taskus.com [email protected] (888) 400 - TASK The Monetization of Big Data: Made Possible by

More information

Why Semantic Analysis is Better than Sentiment Analysis. A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights

Why Semantic Analysis is Better than Sentiment Analysis. A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights Why Semantic Analysis is Better than Sentiment Analysis A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights Why semantic analysis is better than sentiment analysis I like it, I don t

More information

Big Data and Scripting. (lecture, computer science, bachelor/master/phd)

Big Data and Scripting. (lecture, computer science, bachelor/master/phd) Big Data and Scripting (lecture, computer science, bachelor/master/phd) Big Data and Scripting - abstract/organization abstract introduction to Big Data and involved techniques lecture (2+2) practical

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Text Analytics Beginner s Guide. Extracting Meaning from Unstructured Data

Text Analytics Beginner s Guide. Extracting Meaning from Unstructured Data Text Analytics Beginner s Guide Extracting Meaning from Unstructured Data Contents Text Analytics 3 Use Cases 7 Terms 9 Trends 14 Scenario 15 Resources 24 2 2013 Angoss Software Corporation. All rights

More information

Overview of MT techniques. Malek Boualem (FT)

Overview of MT techniques. Malek Boualem (FT) Overview of MT techniques Malek Boualem (FT) This section presents an standard overview of general aspects related to machine translation with a description of different techniques: bilingual, transfer,

More information

IBM SPSS Text Analytics for Surveys

IBM SPSS Text Analytics for Surveys IBM SPSS Text Analytics for Surveys IBM SPSS Text Analytics for Surveys Easily make your survey text responses usable in quantitative analysis Highlights With IBM SPSS Text Analytics for Surveys you can:

More information

Beyond listening Driving better decisions with business intelligence from social sources

Beyond listening Driving better decisions with business intelligence from social sources Beyond listening Driving better decisions with business intelligence from social sources From insight to action with IBM Social Media Analytics State of the Union Opinions prevail on the Internet Social

More information

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD.

How the Computer Translates. Svetlana Sokolova President and CEO of PROMT, PhD. Svetlana Sokolova President and CEO of PROMT, PhD. How the Computer Translates Machine translation is a special field of computer application where almost everyone believes that he/she is a specialist.

More information

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum Big Data Analysis John Domingue (STI International and The Open University) Project co-funded by the European Commission within the 7th Framework Program (Grant Agreement No. 257943) 1 The Data landscape

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

An Esri White Paper April 2011 Esri Business Analyst Server System Design Strategies

An Esri White Paper April 2011 Esri Business Analyst Server System Design Strategies An Esri White Paper April 2011 Esri Business Analyst Server System Design Strategies Esri, 380 New York St., Redlands, CA 92373-8100 USA TEL 909-793-2853 FAX 909-793-5953 E-MAIL [email protected] WEB esri.com

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU

ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES

More information

Optimization of Image Search from Photo Sharing Websites Using Personal Data

Optimization of Image Search from Photo Sharing Websites Using Personal Data Optimization of Image Search from Photo Sharing Websites Using Personal Data Mr. Naeem Naik Walchand Institute of Technology, Solapur, India Abstract The present research aims at optimizing the image search

More information

How To Create A Text Classification System For Spam Filtering

How To Create A Text Classification System For Spam Filtering Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar

More information

Requirements Analysis Concepts & Principles. Instructor: Dr. Jerry Gao

Requirements Analysis Concepts & Principles. Instructor: Dr. Jerry Gao Requirements Analysis Concepts & Principles Instructor: Dr. Jerry Gao Requirements Analysis Concepts and Principles - Requirements Analysis - Communication Techniques - Initiating the Process - Facilitated

More information