A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks




Text Analytics World, Boston, 2013
Lars Hard, CTO

Agenda
- Difficult text analytics tasks
- Feature extraction
- Bio-inspired computational models
- Systemic AI
- Feature extraction in a broad sense

Definitions
- Systemic refers to something that is spread throughout a system, affecting a group or system as a whole, such as a body, economy, market or society.
- Artificial Intelligence (AI) is used here in a broad sense, spanning machine learning, big data and bio-inspired computing (neural networks, evolutionary computing). "Computational intelligence" is often the more appropriate term.

- Large volumes of unstructured data
- Low-quality metadata makes supervised training hard (non-existent, contradictions, duplicates, errors, manipulation, etc.)
- Dynamic content (constant change)
- Keyword search phrases = sparse text analytics problem

- Many text analytics problems can be translated into optimization problems (at both the systemic and the local model level)
- In the end, it is all about separation
- There is no ONE model, regardless of whether the task is feature extraction, classification, etc.
- Verticalization is always good but requires connections between multiple models
- Hard optimization problems are best approached with bio-inspired models

Verticalization

Unstructured Data Feature Extraction
- AI-powered, problem-specific feature extraction arrays
- Rapid modeling combined with evolution
- Dynamic organization allowing inference-driven feature extraction based on actual data

Numerical features from any data
- Whole framework for effective pre-processing of data for rapid extraction of numerical features
- Features are easy to process computationally; one example is treating texts as a matrix of numbers with multiple factors describing the texts
- Features can be transformed easily
- Features can be normalized
- Features can be executed dynamically (even driven by inference) as they are needed or when more information is available
- Features can be optimized by an evolutionary process, to adapt to certain types of problems or difficulties
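The idea of treating texts as a matrix of numbers that can then be normalized can be sketched as follows. This is a minimal illustration, not the framework from the talk: the function names, the two extra "factors" (token count and mean token length), the min-max scaling and the sample texts are all assumptions chosen for the example.

```python
from collections import Counter

def build_vocabulary(texts):
    """Collect the sorted set of lowercase tokens across all texts."""
    return sorted({token for text in texts for token in text.lower().split()})

def text_to_features(text, vocabulary):
    """Return one matrix row: term counts plus two simple global factors."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    row = [float(counts[term]) for term in vocabulary]
    row.append(float(len(tokens)))  # assumed factor: text length in tokens
    mean_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    row.append(mean_len)            # assumed factor: mean token length
    return row

def normalize_columns(matrix):
    """Min-max scale every column to [0, 1]; constant columns become 0."""
    scaled_columns = []
    for column in zip(*matrix):
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical keyword-style search phrases (short, sparse texts).
texts = ["cheap flights to boston", "boston hotel deals", "cheap cheap hotel"]
vocab = build_vocabulary(texts)
matrix = [text_to_features(t, vocab) for t in texts]
normalized = normalize_columns(matrix)
```

Each row of `matrix` describes one text; further transformations or an evolutionary search over which columns to keep would operate on the same representation.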

Probabilistic decision tree: one typical model
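The slide itself shows only a figure. One common reading of a probabilistic decision tree is a tree whose leaves hold class probabilities rather than a single label; the sketch below assumes that reading. The `Node` structure, thresholds, class names and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

@dataclass
class Node:
    """Internal nodes split on one feature; leaves carry a distribution."""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    distribution: Optional[Dict[str, float]] = None  # set only on leaves

def predict_proba(node: Node, features: Sequence[float]) -> Dict[str, float]:
    """Walk from the root to a leaf and return that leaf's class probabilities."""
    while node.distribution is None:
        go_left = features[node.feature] <= node.threshold
        node = node.left if go_left else node.right
    return node.distribution

# Hypothetical two-level tree over two numerical text features.
tree = Node(
    feature=0, threshold=0.5,
    left=Node(distribution={"relevant": 0.1, "irrelevant": 0.9}),
    right=Node(
        feature=1, threshold=0.3,
        left=Node(distribution={"relevant": 0.6, "irrelevant": 0.4}),
        right=Node(distribution={"relevant": 0.95, "irrelevant": 0.05}),
    ),
)

probabilities = predict_proba(tree, [0.8, 0.7])
```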

More metadata


Objectives
- Data mining
- Recommendations
- Discovery / exploration
- Diagnosis
- Estimation
- Classification
- Etc.

Multiple AI Modules
- Diagnostics (troubleshooting, medical diagnosis: deterministic, probabilistic and hybrid)
- Optimization (finding the best solution for highly complex problems)
- Recommendation (product recommendation based on soft parameters, weight systems, feedback, filters, etc.)
- Estimation (provide numerical predictions based on artificial neural networks)
- Image recognition (specialized domains and/or hierarchical recognition)
- Graphs
- Density & distance calculations
- Configuration (combine multiple components)
- Text classification (automated metadata extraction, hierarchical classification)

- Multiple sources
- Analytics, transformations and domain modelling
- Automated feature extraction
- Semi-automatic model designed and trained manually


A systemic model

Desktop AI tools approach for insights and modeling

Computational Intelligence (CI)
- Evolution as a model-free approach to AI
- Brain as an inspiration
- Genetic Algorithms, Genetic Programming, Cellular Automata, Gene Expression Programming, etc.
- Bio-inspired computing
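To make the genetic-algorithm entry concrete, here is a minimal sketch of a genetic algorithm that evolves a bit mask selecting features. It is a generic textbook-style loop (truncation selection, one-point crossover, bit-flip mutation, elitism), not the method used in the talk; the fitness function, the `usefulness` scores and all parameter values are hypothetical.

```python
import random

def fitness(mask, usefulness, cost_per_feature=0.5):
    """Hypothetical objective: reward useful features, penalize mask size."""
    gain = sum(u for bit, u in zip(mask, usefulness) if bit)
    return gain - cost_per_feature * sum(mask)

def evolve(usefulness, population_size=20, generations=40,
           mutation_rate=0.1, seed=0):
    """Evolve a 0/1 feature mask; the fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    n = len(usefulness)

    def score(mask):
        return fitness(mask, usefulness)

    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: population_size // 2]  # truncation selection
        children = [parents[0][:]]                    # elitism: keep the best
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]                # bit-flip mutation
            children.append(child)
        population = children
    return max(population, key=score)

# Hypothetical per-feature usefulness scores for six candidate features.
usefulness = [2.0, 0.1, 1.5, 0.2, 0.9, 0.0]
best = evolve(usefulness)
```

No model of the problem is built here: the loop only needs a way to score candidates, which is the sense in which evolution is model-free.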

Multiple scoring models for factor importance

After Evolutionary-Fuzzy Complexity Reduction
Reducing 100K entries down to a rule set of 6 simple rules using only four dimensions. This rule set achieves 96% correct separation.
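How a small rule set is scored for separation can be sketched as below. This is an assumption-laden illustration: it uses crisp interval rules rather than fuzzy ones, and the rules, entries and labels are invented for the example (they are not the 6 rules or the 100K entries from the talk).

```python
def classify(entry, rules, default="negative"):
    """Return the label of the first rule whose interval conditions all hold."""
    for conditions, label in rules:
        if all(low <= entry[dim] <= high
               for dim, (low, high) in conditions.items()):
            return label
    return default

def separation_rate(entries, labels, rules):
    """Fraction of entries that the rule set assigns to the correct class."""
    correct = sum(classify(e, rules) == y for e, y in zip(entries, labels))
    return correct / len(entries)

# Hypothetical rule set: each rule maps dimension index -> (low, high) interval.
rules = [
    ({0: (0.7, 1.0)}, "positive"),
    ({1: (0.0, 0.2), 2: (0.5, 1.0)}, "positive"),
]

# Hypothetical entries described by four numerical dimensions.
entries = [
    (0.9, 0.5, 0.1, 0.3),
    (0.2, 0.1, 0.8, 0.6),
    (0.1, 0.9, 0.9, 0.2),
    (0.3, 0.1, 0.2, 0.7),
    (0.8, 0.4, 0.4, 0.9),
]
labels = ["positive", "positive", "negative", "negative", "negative"]

rate = separation_rate(entries, labels, rules)  # 4 of 5 entries correct: 0.8
```

A score like `rate` is the kind of quantity an evolutionary search could maximize while also keeping the number of rules and dimensions small.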

[Architecture diagram. Labeled components: publishing servers (machine learning), data store, sync, ExpertMaker AI/CI high-speed processing, log store, load balancers, data mining servers, API, model design, admin interface, monitoring, application back-end platform, client application.]

Summary
- Rapid approach, quickly generate test models
- Multiple attack points and multiple solutions
- No advanced NLP
- Manual training sets (supervised) can often be derived from the modeling process
- Ontologies are good if we need to understand, but most problems cannot be understood given many features (= high dimensionality)