Data Sheet

In today's information economy, organizations rely heavily on the availability and quality of data at all levels to support investments in new business capabilities, business intelligence and compliance, and to create information-driven strategic advantage. This requires moving away from largely siloed, vertically focused data management solutions toward architectures that offer a unified data layer through virtualization, for the agile delivery of data services across an increasing number of physical data sources and to multiple applications and users.

Denodo Platform 5.5 responds to the need to deal with more data types, ubiquitous data and bigger data volumes, while reducing data latency and data replication and delivering more agility and value than traditional technologies. The platform is an integrated Data Virtualization platform that creates, manages and delivers virtualized data services at both project and enterprise levels with speed, agility and high performance.

Denodo Platform 5.5 connects to data sources, whether structured, semi-structured or unstructured, internal or external, and combines them into logical / virtual data services that provide unified access and integrated delivery through a single virtual data layer. It publishes these services to consuming applications in real time (right-time). It includes key capabilities for real-time query optimization supported by intelligent caching and scheduled data orchestration, unified data governance and quality, and the ability to deliver data services in multiple formats with managed security and service levels.

Denodo Platform 5.5 is an integrated software platform that delivers these fundamental capabilities required for world-class Data Virtualization:

Universal Data Access: Read/write from any source or data type, including Legacy, NoSQL and Cloud. Web Automation and Indexing automatically navigates and extracts data from Web and unstructured sources into structured views for integration.
Unified Virtual Data Layer: Builds powerful transformations and relationships using an integrated modeling and execution environment to normalize, transform, improve the quality of and relate data across heterogeneous source types using common metadata and semantics. An extended relational data model allows disparate data types to be represented natively in the virtual layer, minimizing effort and maximizing performance.
Universal Data Provisioning: Exposes the combined information to consuming applications as reusable Linked Data Services in multiple formats (SQL, SOAP WS, REST WS, fully RESTful Linked Data, XML, JSON, XHTML, etc.) and supports hybrid delivery modes (virtual real-time, cache, batch, etc.).
Agile High Performance: Advanced real-time optimization supplemented by intelligent caching and scheduled batch for flexible mixed workloads. Supports read/write access with enterprise-class reliability and scalability.
Unified Data Governance: Enterprise-wide single entry point for data and metadata management, security, audit, logging and monitoring, enabled through built-in tools and instrumentation as well as integration with external data management tools.
Agile Development: Rapid ways to deliver pervasive, self-service data services using a graphical, wizard-driven UI and discovery tools. Hides complexity from application developers and business users; decouples consuming applications from data sources; allows easy creation, extension and use of data services.

Benefits

Denodo Platform 5.5 provides key benefits and advantages when compared to custom coding or traditional data integration solutions by delivering "more for less": more data access and provisioning in less time and at lower cost.

All data leveraged: Combine disparate data (Enterprise, Web, Big Data, NoSQL, Unstructured) to create meaningful new business data services.
Right-time integration: Fresher information from the source of truth, balanced with performance.
Cost saving: Allows physical replication, but only as needed. Saves hardware, software and people costs.
Faster time to market: A library of high-quality data services speeds development on new projects.
Data Services reuse: Data is leveraged efficiently across transactional, analytical and informational applications to deliver the benefits of virtualization.
Unified ease of use: A unified, easy and intuitive access platform to graphically model all data sources and to publish and manage data services.
Scalable performance: Advanced automatic query optimization with the option of manual override control, load balancing, and modular clustered scalability.

Features

[Figure: Denodo Platform data virtualization layers, from SOURCES at the bottom to CONSUMERS at the top, with MODEL and DISCOVERY (global search, linked services, meta exploration) shown alongside.
CONSUMERS: Enterprise applications, ESB; reporting, BI, portals; mobile, Web, users.
PUBLISH: Expose entities as data services. Data services support multiple standard protocols including JDBC, ODBC, REST and SOAP/XML WS, etc.
COMBINE: Library of functions for transformation, normalization and data cleansing. Graphical drag-and-drop tool for combining views using relational algebra. Extended relational model supports hierarchical data sources.
CONNECT: Normalized views of disparate data.
SOURCES (from more structured to less structured): Databases and warehouses; enterprise applications; cloud / SaaS applications; XML, Excel, flat files; Big Data, NoSQL; collaboration, Web 2.0; PDF, docs, index, emails.]

Data Access

Point-and-click adapters to Enterprise, Web / Semi-structured and Unstructured data in any format or location. High-performance optimized adapters for all the main sources. Bidirectional read/write.

Relational databases such as Oracle, DB2, Sybase, MS SQL Server, MySQL, PostgreSQL, Informix, MS Access and Apache Derby, including graphical introspection of tables, views and stored procedures.
In-memory relational databases such as SAP HANA and Oracle TimesTen.
Parallel databases and appliances such as Teradata, Netezza, Oracle Exadata, Sybase IQ, ParAccel and Greenplum.
Multidimensional OLAP engines such as SAP BW, MS SQL Server Analysis Services, Mondrian and Essbase.
Enterprise Applications such as Oracle E-Business Suite, SAP R3 / ECC, Siebel, PeopleSoft and Salesforce.
Big Data / NoSQL databases such as Hadoop, MongoDB, CouchDB, Neo4j and MarkLogic. For Hadoop, Denodo is certified with multiple vendors (Cloudera, Hortonworks, etc.), including Kerberos-secured clusters, and with multiple technologies (Hive, Impala, sequence/map/key-value/Avro files, HDFS, Map/Reduce, HBase).
SOAP / REST Web Services and data feeds, including support for XML, RSS, ATOM, JSON and CSV formats.
Flat and binary files: CSV, pipe-delimited, MS Excel (xls and xlsx), MS Access, XML, JSON. Regular expression-parsed flat files. All files can be locally accessible or in remote filesystems, through FTP/SFTP/FTPS, and in clear, zipped and/or encrypted format.
LDAP and Active Directory services can be connected to and introspected as source data (in addition to their use for security access).
Cloud and SaaS sources including Salesforce, Google, Amazon, LinkedIn, Facebook and Twitter via APIs with simplified OAuth integration (1.0, 1.0a and 2.0); any Website, form or Web application via browser automation.
Semantic repositories in Triple Stores / RDF accessed through SPARQL endpoints.
Mainframe / Legacy connectivity through third-party adapters.
Sophisticated tools to expose Web, semi-structured and unstructured data as virtual relational data/services.
The Denodo Platform also provides extensibility through a custom connector API that allows users to create connections to other information systems not accessible out of the box.

Web, Semi/Unstructured Data Integration

Automates extraction and integration of less-structured data from Web sites, forms, applications, PDF and MS Word documents.

Flexible Automation: Web integration processes are modeled using a library of pre-built templates and components for workflow, navigation, extraction, and structuring of Web and semi-structured data.
Automatic Web Navigation: Transparently handles dynamic sites with AJAX, JavaScript, authentication, secure servers, sessions, cookies, popup windows or sequences.
Data Extraction: Example-based heuristic extraction of semi-structured content into a defined schema from dynamic Web 2.0 content, PDF forms and documents.
High Performance: Parallel execution using MS IE or Denodo browser pools; smart browsers load only optimal navigation sequences to minimize memory usage.
Content Integration: CMS, file systems, SharePoint, email servers, knowledge bases, indexes, ontologies.
Search/Index: Built-in tools expose unstructured data as inverted indexes.
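
As a generic illustration of the inverted-index idea mentioned under Search/Index, the following minimal Python sketch maps each token to the documents that contain it and answers multi-word queries by intersecting those sets. It is not Denodo's implementation; the function names, the tokenization rule and the sample documents are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (an assumed, simplistic rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)


# Hypothetical unstructured sources, keyed by an illustrative document id.
DOCS = {
    "email-1": "Quarterly sales report attached",
    "pdf-7": "Sales forecast for the next quarter",
    "doc-3": "Meeting notes: report on cloud migration",
}
INDEX = build_inverted_index(DOCS)
```

With this structure, `search(INDEX, "sales report")` only has to intersect two small id lists instead of scanning every document, which is the reason unstructured content is typically indexed this way before being queried.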

Data Integration

Transform, clean and combine disparate information into virtual canonical business views.

Extended Relational Model: Natively represents and seamlessly combines relational, hierarchical, NoSQL and semantic representations of data into an abstracted relational model; takes into account source capabilities and constraints for query optimization.
Metadata-driven Integration: Graphical wizard-driven UI and tools to introspect siloed data, create unified logical data views, transform, cleanse, group, aggregate, modify output, define data workflow, etc.
Top-Down and Bottom-Up Modeling: Interface Views allow specifying a contract-first schema for high-level business entities, which can later be linked to bottom-up implementation views and become fully executable. Models can use views created from sources, other virtual views, or views imported from enterprise modeling tools or industry data models.
Linked Data / Associations: Establishes relationships between virtual data entities using referential constraints (primary keys, foreign keys, varying levels of multiplicity, conditionals). Associations can be introspected from external metadata/modeling tools (ERWin, ERStudio, Rational...) for governance, browsed/traversed manually by users, or traversed programmatically by applications using navigational queries.
Global Search: Data discovery and exploration are enhanced through data and metadata searches across all enterprise data sources, enabling cross-silo search from an intuitive graphical interface that exploits the linked data services capabilities offered by the Denodo Platform.
Transformation, Quality, Matching: Large library of built-in functions for SQL, XSLT, XQuery, XPath, Java and semantic transformations, with the ability to plug in external tools or custom functions at every step of the query execution lifecycle. Built-in data workflow tool for complex transformation / data quality processes.
Support for arbitrary numeric precision for scenarios that require maximum accuracy of results.
Semantic Integration: Transform, relate and merge unstructured data with structured data using text mining, taxonomy filters and semantic tools such as textual similarity.
Data Modeling to shape output: flatten or create new hierarchical data structures to match the target schema.
Bidirectional integration supporting read, write and transactions (two-phase commit and XA transactions).

Data Provisioning

Deliver virtualized data as SQL views, data services or portlets/widgets to suit every need.

Bespoke, optimized SQL views of the unified virtual data layer, accessed via ODBC, JDBC and ADO.NET.
Linked Data Services: A fully RESTful interface exposes each enterprise data asset as a unique URI, accessible via search, browse or query drill-down using standard Web protocols, interfaces and HTTP verbs (GET, POST, PUT, DELETE); output in XHTML, XML or JSON for human and/or machine consumption.
Data Services for SOA: Publish SOAP web services that conform to a contract-first schema; support for XSLT tools, WS-Security, ESB/JMS access and SOA catalogs.
Publish SharePoint WebParts, Java portlets, AJAX widgets and RSS for use in major portal/mashup servers.
Publish and subscribe to data via JMS (with JSON support), including MQSeries, SonicMQ and ActiveMQ.
Deliver data using semantic formats: Ability to answer SPARQL queries returning RDF (via D2R mappings).
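
To make the Linked Data Services idea more concrete, here is a minimal, generic Python sketch of how HTTP verbs can map onto operations over entities addressed by URI, with JSON output. It uses an in-memory dictionary and no network access. The class name, the URI shape (`/customer/42`) and the status-code choices are assumptions for illustration only and do not describe the Denodo Platform's actual REST API.

```python
import json


class ResourceStore:
    """In-memory stand-in for data entities that are each addressed by a unique URI."""

    def __init__(self):
        self._items = {}

    def handle(self, verb, uri, body=None):
        """Dispatch an HTTP-style verb and return a (status_code, json_text) pair."""
        if verb == "GET":  # read an entity
            if uri not in self._items:
                return 404, json.dumps({"error": "not found"})
            return 200, json.dumps(self._items[uri], sort_keys=True)
        if verb == "POST":  # create a new entity, refuse to overwrite
            if uri in self._items:
                return 409, json.dumps({"error": "already exists"})
            self._items[uri] = body
            return 201, json.dumps(body, sort_keys=True)
        if verb == "PUT":  # create or replace an entity
            status = 200 if uri in self._items else 201
            self._items[uri] = body
            return status, json.dumps(body, sort_keys=True)
        if verb == "DELETE":  # remove an entity
            if uri not in self._items:
                return 404, json.dumps({"error": "not found"})
            del self._items[uri]
            return 204, ""
        return 405, json.dumps({"error": "method not allowed"})


# Hypothetical usage: the URI and payload are invented for this example.
store = ResourceStore()
created = store.handle("POST", "/customer/42", {"name": "Acme"})
fetched = store.handle("GET", "/customer/42")
```

The point of the sketch is the uniform interface: a consumer only needs a URI and a standard verb, and the same entity could equally be rendered as XML or XHTML by swapping the serializer.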

Scheduler & Data Orchestration

Enable complex, hybrid integration processes, integrating ETL within a broader Data Virtualization approach.

Delivers reliable, high-performance virtual data services through balanced orchestration of real-time, cached, scheduled batch or hybrid execution modes.
Materialize unified data views by exporting to databases, warehouses, flat files, Excel, XML, etc. using the built-in Denodo Scheduler or external ETL tools.
Support for persistent tasks through continuation of a query after restart and transparent retries in case of failures, intermittent Web access or human intervention.
Task dependencies allow linked tasks to start only when others have finished successfully.

Data Governance

Data and data model assurance to ensure consistent, meaningful data services to users.

Metadata Repository with multiple visualizations (tree view, linked data, attribute origin, source impact, catalog search, etc.); includes a metadata API, model export and introspection from external systems.
Discover, introspect and transform source metadata. Refresh or propagate source metadata when it changes. Flatten or create new hierarchical data structures.
Contract-first/top-down and bottom-up modeling for greater flexibility, parallel work and change governance.
Data lineage, change impact and dependency analysis, metadata migration tools, version control and granular, policy-based tiered security deliver a controlled data virtualization and enterprise data services capability.

Development

Easy-to-use, enterprise-class tools.

Graphical, wizard-driven UI for all functions and modules, plus documented scripting for advanced users.
Platform extensions enabled via an Eclipse IDE plug-in for developing, testing, debugging and deploying custom functions, connectors and stored procedures.
Integration with Version Control Systems (checkout, commit and update of virtual entities) with automatic dependency control directly from within Denodo.
Graphical support for lifecycle process management (development, staging, production) or geographically dispersed environments.
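
The task-dependency behavior described under Scheduler & Data Orchestration (a linked task starts only when the tasks it depends on have finished successfully) can be sketched generically with a topological ordering. The Python example below uses the standard library's `graphlib` module (Python 3.9 or later). It is an illustration under assumptions, not the Denodo Scheduler's implementation; the task names and the `run_tasks` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_tasks(dependencies, actions):
    """Run tasks in dependency order, skipping tasks whose prerequisites did not succeed.

    dependencies maps a task name to the set of task names it depends on.
    actions maps a task name to a zero-argument callable that returns True on success.
    Returns three lists in processing order: (completed, failed, skipped).
    """
    completed, failed, skipped = [], [], []
    not_ok = set()  # tasks that failed or were skipped
    for task in TopologicalSorter(dependencies).static_order():
        if any(dep in not_ok for dep in dependencies.get(task, ())):
            skipped.append(task)  # a prerequisite did not finish successfully
            not_ok.add(task)
        elif actions[task]():
            completed.append(task)
        else:
            failed.append(task)
            not_ok.add(task)
    return completed, failed, skipped


# Hypothetical job chain: extract, then refresh the cache, then export a report.
DEPENDENCIES = {
    "extract_orders": set(),
    "refresh_cache": {"extract_orders"},
    "export_report": {"refresh_cache"},
}
ALL_OK = {name: (lambda: True) for name in DEPENDENCIES}
CACHE_FAILS = dict(ALL_OK, refresh_cache=lambda: False)

ok_run = run_tasks(DEPENDENCIES, ALL_OK)
failed_run = run_tasks(DEPENDENCIES, CACHE_FAILS)
```

In the second run the export is never started because the cache refresh it depends on failed. A production scheduler would add retries and persistence on top of this ordering, as the section above notes.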

[Figure: Denodo Platform architecture. CONSUMERS (enterprise applications, ESB; reporting, BI, portals; mobile, Web, users) sit above the data virtualization layer, and SOURCES (databases and warehouses; enterprise applications; cloud / SaaS applications; XML, Excel, flat files; Big Data, NoSQL; collaboration, Web 2.0; PDF, docs, index, emails; ordered from more structured to less structured) sit below it. Labels on the data virtualization layer: Performance and Scalability, Optimizer, High Availability and Clustering, Scheduler, Cache, Distributed Transaction Manager, Enterprise-Grade Management, Governance, Deployment Lifecycle, Monitoring and Auditing, Security.]

Performance, Scalability, Reliability

Integrated query optimization with cache and ETL support for agile high performance. Flexibility to scale in steps to enterprise-class needs. Workload management.

Query Optimization: Intelligent and automated techniques (cost-based and rule-based, using information from source introspection, query capabilities and constraints), including asynchronous delivery, query delegation, automatic parallel access to data sources, support for high-performance sub-queries, automated data movement between sources, access to data sources using native high-performance protocols, and automatic query rewriting to restructure SQL statements into a more optimal form.
Tunable Optimization: Visual trace to inspect all details of a query before, during and after execution, by view and by source, with manual plan override of the automatic strategy selections.
Advanced Multi-mode Cache: Configurable view by view with full, partial or incremental loads, with Intelligent Query Matching to answer queries directly from the cache. Supports push-down of complex queries to the cache for high performance. Cache refresh can be based on schedule, event trigger or expiration time. Disk and in-memory databases are supported for the cache.
Built-in Scheduler and external ETL support for data prefetch to the cache or to a materialized source improves query delegation and balances source latencies.
Database indexes and primary keys are introspected from sources and exposed to consuming applications and reporting tools for high performance. They can also be exposed in virtual and cached views.
Web Services and Cloud optimizations include automatic unfolding of single queries into multiple parallel accesses, advanced session-reuse mechanisms, etc.
High Availability: Supports clustering, load balancers and federated deployments to distribute server workload and share metadata across the virtualization layer.
Robustness: Automated swapping, support for XA-compliant endpoints, fully transactional catalog storage with support for different levels of locking, and custom policies for workload management.
Support for horizontal and vertical partitioning of data. Queries issued against partitioned data sets are split into subqueries; the execution engine then determines which subqueries must be executed and which can be ignored, retrieving only the required data for optimal performance.

Security & Management

Secure and differentiated role-based access control to data services and sources; enterprise tools for management and monitoring dashboards.

Role-based authentication and authorization using LDAP, Active Directory and/or the built-in user directory.
Security and workload management using custom policies to restrict or constrain service levels and workload based on external factors (source load, network, time); integrates with external systems for security, access control and SLA policy management.
User-defined memory consumption limits at the view level for improved resource management.
Support for fine-grained security at the virtual data view, column or row level, and for masking sensitive data. Also supports pass-through authentication to leverage the security infrastructure of the data sources.
Authentication of published Denodo Web Services with HTTP, NTLM and WS-Security protocols.
Support for importing and exporting encrypted data; communication between modules can also be encrypted and authenticated using SSL.
Firewall support: All components can be distributed across different network segments.
Denodo Dashboard with multiple tools for lifecycle management of servers, clusters and HA configurations. Metadata migration tools.
Denodo Monitor to view real-time queries and historic audit data/logs; supports the JMX, SNMP and WS-Management standards to integrate with leading external monitoring tools including HP OpenView, IBM Tivoli, Microsoft RM, Nagios, etc.

Denodo Platform Requirements

Denodo Platform is a complete data virtualization and enterprise data services solution that runs as a stand-alone server and includes several components: design tools, connectors, virtualization server, Eclipse IDE, administration tools, monitoring dashboard, etc. Denodo Platform Control Center provides a single point of management for the various components.

Operating System:
Microsoft Windows (32-bit and 64-bit platforms): Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server 2003, Windows Server 2008, Windows Server 2012
Linux (32-bit and 64-bit platforms): Ubuntu Linux, CentOS Linux, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES)
UNIX (64-bit platforms): Sun Solaris
Directory Services: LDAP v3; Microsoft Active Directory 2003, 2008; Denodo embedded service.
Browsers: Internet Explorer 10.x, 11.x; Denodo Browser.

Deployment Patterns

Denodo Platform can be deployed in the data center or private cloud (on either physical or virtual servers) or in the public cloud (e.g. Amazon EC2). Strong Web Services support and available automation scripts make Denodo very cloud-friendly.

Multiple configurations are supported, including:
Basic single-server configuration;
Basic server with proxy configuration for firewalls;
High Availability clusters with a load balancer in either active-passive (hot standby) or active-active (horizontal scaling) configurations;
Clusters with shared cache or distributed local cache;
Geographically distributed server environments;
Multiple Denodo instances in peer-to-peer or multi-layered environments.

Denodo Data Virtualization can also be embedded in other product architectures (for example Reporting & Dashboards, Single-View applications, Information Services delivery platforms, BPM/Workflow).

Denodo In Your Enterprise

Most organizations have siloed data sources that multiple users and applications are trying to access. This has led to multiple point-to-point integrations and the acquisition of multiple kinds of integration middleware: ETL for data warehousing and business intelligence, BPM/ESB for process integration, portals and file sharing systems for collaboration and teamwork, and search and indexing for unstructured content. Below is a high-level view of how Data Virtualization fits into your enterprise information strategies and the tactical and strategic roles it plays.

At the tactical level, Denodo adds to your integration toolkit the capability for agile, real-time data integration.
At a strategic level, Denodo enables a truly Unified Data Layer that abstracts and virtualizes all enterprise data assets into canonical business views that can be exposed as Linked Data Services or bespoke SQL. In this way it integrates existing middleware and data tools (modeling, data quality, etc.) to create a high-performance and agile information architecture to serve all of your applications.

Denodo is the leader in Data Virtualization. The company provides unmatched performance and unified access to the broadest range of enterprise, Big Data, cloud and unstructured sources. Denodo offers the most agile data services provisioning and governance solutions at less than half the cost of traditional data integration offerings. With reference customers in every major industry, Denodo users have gained significant business agility and ROI by creating a unified virtual data layer that serves strategic enterprise-wide information needs for agile BI, Big Data analytics, web and cloud integration, single-view applications, and SOA data services. Founded in 1999, Denodo is privately held.

Visit www.denodo.com
Email info@denodo.com
twitter.com/denodo
North America & APAC: (+1) 877 556 2531
EMEA: (+44) (0) 20 7869 8053
Iberia & Latin America: (+34) 912 77 58 55