A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
Text Analytics World, Boston, 2013
Lars Hard, CTO

Agenda
- Difficult text analytics tasks
- Feature extraction
- Bio-inspired computational models
- Systemic AI
- Feature extraction in a broad sense

Definitions
- Systemic: spread throughout a whole system; affecting a group or system (a body, economy, market or society) as a whole.
- Artificial Intelligence (AI): used here in a broad sense, spanning machine learning, big data and bio-inspired computing (neural networks, evolutionary computing).
- "Computational intelligence" is often the more appropriate term.

- Large volumes of unstructured data
- Low-quality metadata makes supervised training hard:
  - Non-existent
  - Contradictions
  - Duplicates
  - Errors
  - Manipulation
  - Etc.
- Dynamic content (constant change)
- Keyword search phrases = sparse text analytics problem

- Many text analytics problems can be translated into optimization problems (at both the systemic and the local model level)
- In the end, it is all about separation
- There is no ONE model, whether for feature extraction, classification, etc.
- Verticalization is always good but requires connections between multiple models
- Hard optimization problems are best approached with bio-inspired models

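To make the "optimization" and "separation" points concrete, here is a minimal sketch of one possible separation objective for a single numerical feature and two classes. The Fisher-style ratio and the name `separation_score` are illustrative assumptions, not something taken from the slides; any optimizer (bio-inspired or otherwise) could try to maximize a score of this kind.

```python
from statistics import mean, pvariance
from typing import Sequence


def separation_score(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Hypothetical objective: squared gap between class means over summed variances.

    Higher values mean the feature separates the two classes more cleanly.
    """
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Both classes are constant: perfectly separated if the means differ.
        return float("inf") if gap else 0.0
    return gap / spread


# Toy feature values for two classes (made-up numbers).
well_separated = separation_score([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
overlapping = separation_score([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this toy data the first feature scores higher than the second, which is the property an optimizer would exploit when choosing or tuning features.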
Verticalization
Unstructured Data Feature Extraction
- AI-powered, problem-specific feature extraction arrays
- Rapid modeling combined with evolution
- Dynamic organization allowing inference-driven feature extraction based on actual data

Numerical features from any data
- A whole framework for effective pre-processing of data and rapid extraction of numerical features
- Features are easy to process computationally; one example is treating texts as a matrix of numbers, with multiple factors describing each text
- Features can be transformed easily
- Features can be normalized
- Features can be executed dynamically (even driven by inference), as they are needed or when more information is available
- Features can be optimized by an evolutionary process to adapt to certain types of problems or difficulties

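The idea of treating texts as a matrix of numbers can be sketched as follows. The four factors (token count, mean token length, digit ratio, uppercase ratio) and the function names are hypothetical choices made for illustration; the slides do not say which factors the framework uses.

```python
from typing import List


def text_features(text: str) -> List[float]:
    """Map one text to a fixed-length numeric vector (hypothetical factors)."""
    tokens = text.split()
    n_chars = max(len(text), 1)
    return [
        float(len(tokens)),                                 # token count
        sum(len(t) for t in tokens) / max(len(tokens), 1),  # mean token length
        sum(c.isdigit() for c in text) / n_chars,           # digit ratio
        sum(c.isupper() for c in text) / n_chars,           # uppercase ratio
    ]


def min_max_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Rescale every column to [0, 1]; constant columns become 0.0."""
    if not matrix:
        return []
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in matrix
    ]


# One row per text, one column per factor (made-up example texts).
texts = ["cheap flights Boston", "AI", "Order 66 shipped to NYC in 2013"]
matrix = [text_features(t) for t in texts]
normalized = min_max_normalize(matrix)
```

Once texts are rows of numbers, transformation and normalization become ordinary column operations, which is what makes such features cheap to process.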
Probabilistic decision tree: one typical model
More metadata
Objectives
- Data mining
- Recommendations
- Discovery / exploration
- Diagnosis
- Estimation
- Classification
- Etc.

Multiple AI Modules
- Diagnostics (troubleshooting, medical diagnosis: deterministic, probabilistic and hybrid)
- Optimization (finding the best solution for highly complex problems)
- Recommendation (product recommendation based on soft parameters, weight systems, feedback, filters, etc.)
- Estimation (numerical predictions based on artificial neural networks)
- Image recognition (specialized domains and/or hierarchical recognition)
- Graphs
- Density & distance calculations
- Configuration (combining multiple components)
- Text classification (automated metadata extraction, hierarchical classification)

- Multiple sources
- Analytics, transformations and domain modelling
- Automated feature extraction
- Semi-automatic model designed and trained manually

A systemic model
Desktop AI tools approach for insights and modeling
Computational Intelligence (CI)
- Evolution as a model-free approach to AI
- Brain as an inspiration
- Genetic Algorithms
- Genetic Programming
- Cellular Automata
- Gene Expression Programming
- Etc.
- Bio-inspired computing

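As a rough illustration of the genetic-algorithm family listed above, the sketch below evolves a bit string (for example, a mask that switches features on or off) with tournament selection, one-point crossover, bit-flip mutation and elitism. All names, parameter values and the toy fitness function are assumptions made for this example; the slides do not describe a specific algorithm or configuration.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def evolve(fitness: Callable[[Genome], float], n_bits: int, pop_size: int = 30,
           generations: int = 40, mutation_rate: float = 0.05,
           seed: int = 0) -> Tuple[Genome, List[float]]:
    """Minimal genetic algorithm over bit strings (requires n_bits >= 2).

    Returns the best genome found and the best fitness seen per generation.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: the current best always survives
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness: number of positions matching a made-up target mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def match_count(genome: Genome) -> float:
    return float(sum(g == t for g, t in zip(genome, TARGET)))


best, history = evolve(match_count, n_bits=len(TARGET))
```

Because the best individual is copied unchanged into each new generation, the recorded best fitness never decreases. In a real setting the toy fitness would be replaced by a problem-specific score, such as how well a candidate feature set separates the data.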
Multiple scoring models for factor importance
After Evolutionary-Fuzzy Complexity Reduction
- Reduced 100K entries down to a rule-set of 6 simple rules using only four dimensions
- This rule-set achieves 96% correct separation

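A separation figure like the one above can be measured by applying an ordered rule-set to labeled entries and counting correct assignments. The sketch below shows that measurement on made-up data; the two rules, the thresholds, the labels and the five four-dimensional entries are all hypothetical and are not the six rules from the slide.

```python
from typing import Callable, List, Sequence, Tuple

Rule = Tuple[Callable[[Sequence[float]], bool], str]


def classify(x: Sequence[float], rules: List[Rule], default: str) -> str:
    """Return the label of the first rule whose condition holds, else the default."""
    for condition, label in rules:
        if condition(x):
            return label
    return default


def separation_rate(data: List[Tuple[Sequence[float], str]],
                    rules: List[Rule], default: str) -> float:
    """Fraction of labeled entries that the rule-set assigns to the correct class."""
    if not data:
        return 0.0
    hits = sum(classify(x, rules, default) == y for x, y in data)
    return hits / len(data)


# Hypothetical rules over four dimensions x[0]..x[3].
rules: List[Rule] = [
    (lambda x: x[0] > 0.8, "A"),
    (lambda x: x[1] < 0.2 and x[2] > 0.5, "B"),
]

# Made-up labeled entries; the fourth one is misclassified by the first rule.
data = [
    ([0.9, 0.5, 0.1, 0.3], "A"),
    ([0.1, 0.1, 0.7, 0.2], "B"),
    ([0.3, 0.6, 0.4, 0.9], "C"),
    ([0.85, 0.1, 0.9, 0.5], "B"),
    ([0.2, 0.15, 0.6, 0.1], "B"),
]

rate = separation_rate(data, rules, default="C")  # 4 of 5 toy entries correct
```

A small rule-set of this shape is easy to inspect by hand, which is part of the appeal of reducing a large data set to a few rules.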
[Figure: platform architecture diagram with numbered callouts ❶ to ❼. Labeled components: Publishing Server (Machine Learning), Data Store, Sync, ExpertMaker AI/CI High-Speed Processing, Log Store, Load Balancer, Data Mining Servers, API, Model design, Admin Interface, Monitoring, Application back-end, Platform, Client Application.]

Summary
- Rapid approach: quickly generate test models
- Multiple attack points and multiple solutions
- No advanced NLP
- Manual training sets (supervised) can often be derived from the modeling process
- Ontologies are good if we need to understand, but most problems cannot be understood given many features (= high dimensionality)