A Technical Review of TIBCO Patterns Search




TABLE OF CONTENTS
Summary
Architectural Overview
How Does TIBCO Patterns Search Work?
Eliminate the Need for Rules
Loading and Synchronizing Data

Summary

Since man first tried to decipher poorly formed characters on papyrus, humans have used their innate ability to see past errors and inconsistencies in data and recognize the underlying similarities. In more recent history, a number of automated techniques have been developed to deal with poor-quality data. Each technique has problems and limitations compared to the way humans work, and most work only on names. They also require a significant amount of computing resources and often make errors. Unfortunately, as enterprise and agency databases continue to grow, organizations have had little choice but to depend on these inadequate approaches, despite their limitations.

TIBCO has approached the problem from a different angle: mathematical pattern recognition that does not need to know the semantic or phonetic representation of data. The TIBCO Patterns algorithms look at the position of characters and groups of characters (tokens) and their positional relationship to other data. There have been several similar attempts over the years, but those methods all failed because of the immense computational needs this approach normally entails. In other words, they were impractical for today's huge data sets and sub-second performance requirements.

The TIBCO patented approach uniquely scales to provide real-time responsiveness on very large databases about many different types of entities, in any language, with no specialized rules. As a result, it is now possible to deploy a powerful matching technology more rapidly, one that delivers significantly higher accuracy than other available methods.

Architectural Overview

TIBCO Patterns Search is an in-memory database search system that can be attached to virtually any data source, including Oracle, SQL Server, DB2, and MySQL. The search functions are integrated into applications via standard APIs using any common programming language, including Java, .NET, Python, and C/C++. Native integration with TIBCO BusinessWorks, TIBCO BusinessEvents, and TIBCO ActiveMatrix is also provided.

The engine is natively supported under Linux, various UNIX platforms, and Windows, all on 32- and 64-bit processors. The engine can sustainably provide real-time, highly accurate search capabilities for small, medium, large, and extra-large databases. Its architecture allows all requests to be load-balanced across multiple instances and partitioned to handle databases of any size with sub-second latency. Multi-threaded, federated queries are possible, enabling you to take advantage of a wide range of server environments, data schemas, and business application needs. Because the engine is not limited by data volume or throughput, any commodity hardware will do. One of our largest implementations has 700 million records and processes 25 queries per second around the clock on a relatively modest blade server infrastructure. The highly compact engine is contained within a single executable of about 1 MB, which makes deployment easy on a platform of any size.

Try the Live Demos

To see how this works, please visit: www.netrics.com/demo. While viewing, note that the demo data has had no preprocessing, cleaning, matching rules, scrubbing, or normalizing whatsoever. The demos run on one commodity two-socket Dell PowerEdge server with Intel CPUs running Red Hat Linux.
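The partitioned, federated query pattern described above can be sketched in a few lines. This is a minimal illustration, not the engine's implementation: the Hit and FederatedSearchSketch classes are hypothetical, and each partition is modelled as a plain function from query text to scored hits rather than a real engine instance.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical scored hit; not a TIBCO class. */
final class Hit {
    final String key;
    final double score;

    Hit(String key, double score) {
        this.key = key;
        this.score = score;
    }
}

final class FederatedSearchSketch {

    /**
     * Sends the same query to every partition in parallel, merges the
     * partial result sets, and keeps the topK highest-scoring hits.
     */
    static List<Hit> search(String query,
                            List<Function<String, List<Hit>>> partitions,
                            int topK) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Function<String, List<Hit>> partition : partitions) {
                Callable<List<Hit>> task = () -> partition.apply(query);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // wait for each partition's hits
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("partition query failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Function<String, List<Hit>>> partitions = new ArrayList<>();
        partitions.add(q -> List.of(new Hit("rec-17", 0.91), new Hit("rec-03", 0.42)));
        partitions.add(q -> List.of(new Hit("rec-88", 0.77)));
        for (Hit h : search("jasoz fitgerlad", partitions, 2)) {
            System.out.println(h.key + " " + h.score);
        }
    }
}
```

Because each partition scores its own records independently, merging is just a sort on score; the same structure would apply whether the partitions are threads in one process or separate engine instances behind a load balancer.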

How Does TIBCO Patterns Search Work?

The search engine uses advanced mathematical modeling and bipartite graph-based algorithms to calculate similarity scores. The clever (patented) part is how it processes an extremely large number of records in a very short amount of time, all on standard hardware. The result is a powerful matching engine that can distinguish between patterns of data in ways that strict SQL-type search and other types of matching solutions cannot.

The engine is completely agnostic as to the type of data, domain, and language. It makes no assumptions about whether your data is name and address, products, medical records, or double-byte characters representing supplier names, and languages can be intermixed. Its cultural and domain independence allows you to deploy the engine within hours, without prior knowledge of the type, structure, or state of your data.

This means that you don't have to build rules, perform data profiling, or normalize in order to find and capture meaningful information from your data sources. Connecting the engine to your existing applications requires as few as 25 to 30 lines of code.

Sample Java Implementation Code

The following is a sample matching request in Java. It defines the connection to the matching engine, matches data (defined by the query) within and across field boundaries (defined by the field names), and returns a Java object result set for interpretation.

    import java.io.IOException;
    import com.tibco.likeit.TIBCOException;
    import com.tibco.likeit.TIBCOQuery;
    import com.tibco.likeit.TIBCOSearchCfg;
    import com.tibco.likeit.TIBCOSearchOpts;
    import com.tibco.likeit.TIBCOSearchResponse;
    import com.tibco.likeit.TIBCOSearchResult;
    import com.tibco.likeit.TIBCOServerInterface;

    public class test {
        // Note: this sample hard-codes the host, port, table, and query text;
        // the method parameters are not used.
        public static String query(String host, String port, String table, String query)
                throws IOException, TIBCOException {
            // defines the connection to the engine
            TIBCOServerInterface si = new TIBCOServerInterface("localhost", 5051, false, false);
            TIBCOSearchResponse resp = null;
            String[] fieldNames = { "first", "middle", "last", "street", "city", "zip", "state" };
            TIBCOSearchOpts opts = new TIBCOSearchOpts();
            TIBCOSearchCfg[] tblcfgs = new TIBCOSearchCfg[1];
            tblcfgs[0] = new TIBCOSearchCfg("people_table");
            // defines a cross-field query of the query string against all fields in the table
            tblcfgs[0].setTIBCOQuery(
                    TIBCOQuery.Simple("jasoz fitgerlad klassen st paul mn 551", fieldNames, null));
            resp = si.search(tblcfgs, opts); // perform the query
            // extract result set as a string
            String s = "";
            TIBCOSearchResult[] res = resp.getSearchResults();
            for (int i = 0; i < res.length; i++) {
                s += Double.toString(res[i].getMatchScore());
                for (int j = 0; j < res[i].getFields().length; j++) {
                    s += Integer.toString(i) + ":" + res[i].getFields()[j];
                }
                s += "\n";
            }
            return s;
        }
    }
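To give a feel for what a bipartite, token-level similarity score looks like, here is a deliberately simple sketch. It is not the patented TIBCO algorithm and makes no claim about how the engine works internally: the class name, the character-bigram Dice coefficient used as the edge weight, and the greedy one-to-one assignment are all assumptions chosen for brevity.

```java
import java.util.HashSet;
import java.util.Set;

final class TokenMatchSketch {

    /** Character bigrams of a token, e.g. "paul" gives {pa, au, ul}. */
    static Set<String> bigrams(String token) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 1 < token.length(); i++) {
            grams.add(token.substring(i, i + 2));
        }
        return grams;
    }

    /** Dice coefficient on bigram sets: 1.0 for identical tokens, 0.0 for no overlap. */
    static double tokenSimilarity(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        Set<String> ga = bigrams(a);
        Set<String> gb = bigrams(b);
        if (ga.isEmpty() || gb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    /**
     * Treats query tokens and record tokens as the two sides of a bipartite
     * graph. Repeatedly picks the heaviest remaining edge (greedy one-to-one
     * matching) and averages the matched weights over the query tokens.
     */
    static double score(String query, String record) {
        String[] q = query.toLowerCase().trim().split("\\s+");
        String[] r = record.toLowerCase().trim().split("\\s+");
        boolean[] qUsed = new boolean[q.length];
        boolean[] rUsed = new boolean[r.length];
        double total = 0.0;
        for (int round = 0; round < Math.min(q.length, r.length); round++) {
            int bestQ = -1;
            int bestR = -1;
            double best = 0.0;
            for (int i = 0; i < q.length; i++) {
                if (qUsed[i]) continue;
                for (int j = 0; j < r.length; j++) {
                    if (rUsed[j]) continue;
                    double s = tokenSimilarity(q[i], r[j]);
                    if (s > best) {
                        best = s;
                        bestQ = i;
                        bestR = j;
                    }
                }
            }
            if (bestQ < 0) break; // no remaining edge with positive weight
            qUsed[bestQ] = true;
            rUsed[bestR] = true;
            total += best;
        }
        return total / q.length;
    }

    public static void main(String[] args) {
        System.out.println(score("jasoz fitgerlad", "jason fitzgerald"));
        System.out.println(score("jasoz fitgerlad", "maria gonzalez"));
    }
}
```

Even this toy version tolerates misspellings and token reordering without any rules: the misspelled query scores well against the intended record and zero against an unrelated one. A production engine would additionally have to avoid comparing the query against every record, which is the scaling problem the paper refers to.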

Eliminate the Need for Rules

With TIBCO Patterns Search, rules do not matter. If required, the engine does provide for cross-token, cross-field matching, back and forth across field boundaries in any way you choose. You also have fine-grained control over which fields are used for which part of the query, including how they are combined and how they relate to each other. You can also define the individual field or token sensitivity, weighting, and many other parameters that control the matching process. For most applications, very little tuning is required to obtain extremely accurate results.

For every query, the engine returns a result set. Each result is ranked and scored according to its similarity to the search text. The engine delivers not only a per-record score across all fields; it can also provide individual scoring at the field and character level. A standard feature provides HTML embedded in the result record data that visually highlights which portions of the data records, at the field and character level, contributed to the match and to what degree.

Typical Deployment

TIBCO Patterns Search is invoked through an API that internally uses a TCP socket interface, enabling horizontal scalability as well as flexible load-balancing and failover options. Client libraries are provided, which allow the application to access the full functionality of the engine.
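The character-level highlighting described above can be pictured with a small sketch. The engine's actual markup is not shown in this paper, so everything here is an assumption: the HighlightSketch class is hypothetical, the per-character match information is reduced to a boolean mask, and the bold tag stands in for whatever markup the engine emits.

```java
final class HighlightSketch {

    /** Escapes the characters that are significant in HTML text. */
    static String escape(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            default:  return String.valueOf(c);
        }
    }

    /**
     * Wraps each run of matched characters in a bold tag.
     * matched[i] says whether text.charAt(i) contributed to the match.
     */
    static String highlight(String text, boolean[] matched) {
        if (matched.length != text.length()) {
            throw new IllegalArgumentException("mask length must equal text length");
        }
        StringBuilder out = new StringBuilder();
        boolean open = false;
        for (int i = 0; i < text.length(); i++) {
            if (matched[i] && !open) {
                out.append("<b>");
                open = true;
            } else if (!matched[i] && open) {
                out.append("</b>");
                open = false;
            }
            out.append(escape(text.charAt(i)));
        }
        if (open) {
            out.append("</b>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        boolean[] mask = new boolean[10];
        mask[0] = mask[1] = mask[4] = mask[5] = true;
        // prints <b>fi</b>tz<b>ge</b>rald
        System.out.println(highlight("fitzgerald", mask));
    }
}
```

A mask is the simplest possible representation; conveying "to what degree" each character contributed would need a numeric weight per character and, for example, graded styling instead of a single tag.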

Loading and Synchronizing Data

Loading and synchronizing data involves two steps:

1. Initial loading of the data into the engine
2. Synchronizing with updates of data and the engine index in near-real time

Loading Data

Static or Dynamic Data Source

With a static data source, the data is loaded after a batch update to the data source (for example, after the nightly update of product information). This is typically implemented by a cursor that iterates through the source table in the RDBMS and, for each record or set of records, invokes the TIBCO API to insert the records into the TIBCO Patterns Search table.

In some situations, an initial load of the data has to be performed from an RDBMS while it is undergoing live changes. The challenge is to ensure that a consistent set of data is loaded that also provides a well-defined entry point (timestamp) for the dynamic ongoing updates. The dynamic update then processes all changes from that timestamp moving forward.

Synchronizing with Updates to the Underlying Tables
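The load-then-synchronize sequence described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not TIBCO's loader: Row, SearchTable, and the method names are hypothetical, an Iterator stands in for the database cursor, and an in-memory map stands in for the TIBCO Patterns Search table that a real loader would fill through the engine's insert API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LoadAndSyncSketch {

    /** Hypothetical source row: key, payload, and last-modified time. */
    static final class Row {
        final String key;
        final String data;
        final long modifiedAt;

        Row(String key, String data, long modifiedAt) {
            this.key = key;
            this.data = data;
            this.modifiedAt = modifiedAt;
        }
    }

    /** Stand-in for the search table; counts batch calls for illustration. */
    static final class SearchTable {
        final Map<String, String> records = new LinkedHashMap<>();
        int batchCalls = 0;

        void upsertBatch(List<Row> batch) {
            batchCalls++;
            for (Row r : batch) {
                records.put(r.key, r.data);
            }
        }
    }

    /** Initial load: walk the cursor and insert rows in fixed-size batches. */
    static void initialLoad(Iterator<Row> cursor, SearchTable table, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Row> batch = new ArrayList<>(batchSize);
        while (cursor.hasNext()) {
            batch.add(cursor.next());
            if (batch.size() == batchSize) {
                table.upsertBatch(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            table.upsertBatch(batch); // flush the final partial batch
        }
    }

    /**
     * Dynamic update: applies every change made at or after the snapshot
     * timestamp and returns the new high-water mark for the next pass.
     */
    static long applyChangesSince(List<Row> changeLog, long snapshotTime, SearchTable table) {
        List<Row> pending = new ArrayList<>();
        long watermark = snapshotTime;
        for (Row change : changeLog) {
            if (change.modifiedAt >= snapshotTime) {
                pending.add(change);
                watermark = Math.max(watermark, change.modifiedAt);
            }
        }
        if (!pending.isEmpty()) {
            table.upsertBatch(pending);
        }
        return watermark;
    }
}
```

Using "at or after" the snapshot timestamp means a change made exactly at the snapshot instant may be applied twice; with upsert semantics that is harmless, whereas a strict "after" comparison could silently lose it. Deletes are omitted from the sketch and would need their own change type.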

TIBCO Software Inc. (NASDAQ: TIBX) technology digitized Wall Street in the '80s with its event-driven Information Bus software, which helped make real-time business a strategic differentiator in the '90s. Today, TIBCO's infrastructure software gives customers the ability to constantly innovate by connecting applications and data in a service-oriented architecture, streamlining activities through business process management, and giving people the information and intelligence tools they need to make faster and smarter decisions, what we call The Power of Now. TIBCO serves more than 4,000 customers around the world with offices in more than 20 countries and an ecosystem of over 200 partners. Learn more at www.tibco.com.

Global Headquarters
3303 Hillview Avenue
Palo Alto, CA 94304
Tel: +1 650-846-1000, +1 800-420-8450
Fax: +1 650-846-1005
www.tibco.com

© 2010, TIBCO Software Inc. All rights reserved. TIBCO, the TIBCO logo, The Power of Now, and TIBCO Software are trademarks or registered trademarks of TIBCO Software Inc. in the United States and/or other countries. All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only.