Business Intelligence on a Budget: Open Source BI
Paul O'Rorke
Goals
- provide background & motivation
- discuss business models & licenses
- survey open source BI
- compare open versus closed BI
- identify trends
- provide conclusion & suggest next step
Background & Motivation: Why OSBI?
Viva Software Libre!
- it's free
  - low cost
- it's free:
  - provides greater control over the code
  - more knowledge (e.g., can examine source)
  - can adapt, include, and use
  - can redistribute
Background & Motivation: Why Now?
OSBI is at a tipping point:
- open source is an integral part of most companies' software and well trusted now
- OSBI has reached a level of maturity comparable to many of the earlier open source successes
- OSBI is mega-trend number one of nine listed by Intelligent Enterprise for 2009
- economic hard times will encourage or force many companies to save money

OS is achieving world domination: according to Gartner, 85% of enterprises use it and the remaining 15% plan to do so in the next year.
Talend: performance, usability, avoid lock-in, no licensing costs, source code access
OSBI Business Models
- give away free version, eliminate costly software licenses
- and sell (typically by subscription)...
  - advanced features / pro & enterprise editions
  - consulting, services, support
  - maintenance
  - access to hosting & platforms
Open Source Licenses
I AM NOT A LAWYER
Goal: provide a quick overview of OS and OSBI licenses
- What are the (major) different licenses?
- How are they different?
- What are the important issues?
OS Licenses: Key Concepts
- Reciprocal: give freedom to licensee
- Copyleft: but also bind the licensee (usually with the goal of preserving and propagating freedom)
  - strong: all code that is based on, adapts, or links with open source must be open
  - weak: adaptations must be open but linked software need not be open

Richard Stallman's Free Software Foundation's copyleft
OS Licenses: +
Viva Software Libre!
- increase developers' and licensees' freedom to study, adapt, and redistribute
- flexible: allow modifications
- support purchases and subscriptions
OS Licenses: Examples
- Restricted: GNU (GPL)
  - Examples: Emacs, Linux, & MySQL (GPLv2)
- Less Restricted: Mozilla (MPL), Eclipse (EPL)
  - Examples: Eclipse (EPL), Firefox, Java (MPL)
- Free: Apache, Berkeley (BSD)
  - Examples: Apache, PostgreSQL (BSD)

Whether a license is less or more restricted depends on POV: licensor versus licensee.
strong copyleft / GPL: Good if building something with other GPL software. Good for one-off projects if client doesn't care if software is made open. Good for world domination.
weak copyleft / LGPL: Good if you want to build a proprietary system on an open library.
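The strong versus weak copyleft distinction from the Key Concepts slide can be sketched as a small lookup. This is a deliberately simplified illustration, not legal guidance: the names `Copyleft`, `Use`, `LICENSE_COPYLEFT`, and `must_open` are hypothetical, and placing MPL and EPL in the weak-copyleft bucket is an assumption (the slides only call them "less restricted").

```python
from enum import Enum


class Copyleft(Enum):
    STRONG = "strong"  # e.g., GPL
    WEAK = "weak"      # e.g., LGPL
    NONE = "none"      # e.g., Apache, BSD


class Use(Enum):
    ADAPT = "adapt"  # your code is based on or modifies the open source code
    LINK = "link"    # your code only links with the open source code


# Illustrative mapping of the license families named on the slides.
# MPL and EPL as "weak" is an assumption made for this sketch.
LICENSE_COPYLEFT = {
    "GPL": Copyleft.STRONG,
    "LGPL": Copyleft.WEAK,
    "MPL": Copyleft.WEAK,
    "EPL": Copyleft.WEAK,
    "Apache": Copyleft.NONE,
    "BSD": Copyleft.NONE,
}


def must_open(license_name: str, use: Use) -> bool:
    """Return True if, under the simplified slide rules, your own code must be opened."""
    level = LICENSE_COPYLEFT[license_name]
    if level is Copyleft.STRONG:
        # strong: adapting or linking both require opening your code
        return True
    if level is Copyleft.WEAK:
        # weak: only adaptations must be open; linked software need not be
        return use is Use.ADAPT
    # no copyleft: no obligation to open your own code
    return False


if __name__ == "__main__":
    for name in LICENSE_COPYLEFT:
        print(name, {u.value: must_open(name, u) for u in Use})
```

Under these assumptions, building a proprietary system that only links to an LGPL library returns `False`, which matches the "weak copyleft / LGPL" note on the slide.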
OS Licenses: -
Many companies insist on or prefer:
- indemnification
- SLA
- security
- support
- the right to include OSS without having to open their own code
Goals
- provide background & motivation
- discuss business models & licenses
- survey open source BI
- compare open versus closed BI
- identify trends
- provide conclusions & suggest next steps
Survey OSBI
- reporting & dashboards
- ETL & integration
- databases and data-warehouses
- OLAP
- analysis languages & tools
- data mining
- suites
- CRM, sales automation

add or subtract?
Reporting & Dashboards
BI Reporting Tools:
- BIRT (Actuate, Eclipse)
- JasperReports (JasperSoft)
- JReport (JInfonet)
- Palo (Jedox AG)
- OpenI (originally Loyalty Matrix, now OpenI)
- Pentaho Reporting (includes JFreeReport, Pentaho)

OpenI: visualize data from OLAP, RDBMS, and data mining tools, and intuitively build and publish interactive reports, analyses, and dashboards.
Most of the pure dashboard companies on the web appear to be very small and do consulting and custom design mostly for small companies.
Excluded:
- DataVision: Java-based but uses JRuby for formulas. Seems too slow, small & not enterprise level.
- RLIB: C based although it has bindings for Java, PHP, etc.
ETL & Integration
- Jitterbit
  - easy to use
  - SaaS and SOA oriented
- Snaplogic
  - really simple OSDI for SaaS
  - Web-based, including IDE
  - Python
- Talend (also offered by JasperSoft)
  - Eclipse-based IDE (open studio)
  - generates Java or Perl

Snaplogic - good example: mashup of LinkedIn and Salesforce.
According to a Talend survey of 1000 enterprises reported by ComputerWorld UK in InfoWeek 2/6/2009, 31.2 percent of respondents use open source tools in combination with commercial applications for data integration... The key drivers for using open source tools were ease of use (59 percent), performance (53.9 percent), and no vendor lock-in (42.5 percent), followed by licensing costs with only 42.1 percent of respondents.
performance, usability, avoid lock-in, no licensing costs, source code access
Databases & Data Warehouses
Column-oriented databases:
- LucidDB (LucidEra)
- Infobright
- MonetDB
OLAP
- Julian Hyde's Mondrian, now part of Pentaho
Analysis Tools
- Query and Reporting Tools
- OLAP based
- Languages for custom analysis: R
  - replacing S
  - in use at Facebook, Google, Spotfire (TIBCO)
Data Mining
- Weka (now part of Pentaho)
  - commonly used machine learning algorithms (e.g., classification rule & tree learning, clustering, Bayes nets)
  - Java
- Mahout
  - Java, on the Hadoop MapReduce platform

Weka rhymes with Mecca.
Trying to schedule someone from Mahout (Jeff Eastman) for later in the year.
Suites
- JasperSoft
- Palo
- Pentaho
- SpagoBI

Ignoring Palo as it is relatively small: started with reporting and expanded to OLAP server.
Are there more?
Suites: JasperSoft
Components:
- JasperReports - for Java developers
- JasperStudio (aka/fka JReport)
- JasperServer - query & reporting server for end user
- JasperAnalysis - includes OLAP data analysis
- JasperETL (Talend) - for DBAs & developers
Suites: Pentaho
Components:
- Reporting
- Data Integration (Kettle)
- OLAP Server (Mondrian)
- Data Mining (Weka)
Platform...
- engine: core, security, & services
- repository, & UI foundation
Suites: SpagoBI
by Italian IT company Engineering Ingegneria Informatica (6k employees)
Components:
- JasperReports
- Mondrian
- Talend
Customer Relationship Mgmt. & Sales Automation
369 Sourceforge CRM projects!!!
- SugarCRM (30M downloads, PHP)
- Splendid CRM (.Net)
- Hipergate (Java)
- Compiere (Java & Javascript)
- Concursive (backed by Intel Capital, Java)

Top 5 as of December 2008 according to InsideCRM / Sourceforge.
Database support differs, for example Compiere uses PL/SQL and Oracle. Some others (e.g., Hipergate) are database independent.
OSBI Licenses
- GPL:
  - Pentaho (Platform v2 - GPLv2)
  - SugarCRM (Community ed. - GPLv3)
- LGPL: SpagoBI
- MPL: Jitterbit (JPL ~ MPL), OpenI (MPL1.1)

Pentaho claims they went to GPLv2 rather than GPLv3 in their v2 because they wanted to make it easier for others to embed other GPLv2 software like MySQL with Pentaho.
Closed versus Open BI
- CRM: Salesforce.com
- Reporting: BO/SAP Crystal, TIBCO Spotfire
- ETL, Integration: Datastage, Informatica
- column oriented databases: Sybase IQ (first), Vertica
  - IBM, Oracle, MS expected to follow
- Data Mining: SGI?, IBM?, Oracle, SAS, Fair Isaac
Pentaho is a good example of a well developed OSBI platform while Salesforce's force.com is a good example of a CBI platform.

Trends
- Self service / Simplification
- Rich Internet Applications (RIAs)
- Expansion (to Suite; to Platform)
- Platforms
- Cloud Computing

Everyone is trying to simplify and move their apps from developers to end-users (e.g., Snaplogic).
Trend toward RIAs has been underway for at least six years. Spotfire is a good example.
Many OSBI vendors start with a single offering (e.g., for reporting) and expand out to cover more of the BI spectrum. On the CBI side, Salesforce started with CRM & sales force automation and is expanding out.
Conclusion / Next Step
- OSBI is bigger and better and may be a good choice for your next project
- consider OSBI alternatives to custom development or closed source BI
References & Sources
- 2006, Open Source BI, Ventana Research
- 2008, Fine But Not Fine-Tuned Yet, InformationWeek.com
- 2009, Nine BI Megatrends for 2009, Intelligent Enterprise, http://www.intelligententerprise.com/showarticle.jhtml?articleid=212700482
- Free Software Foundation: http://www.gnu.org/copyleft/
- Wikipedia: Copyleft, FSF, GPL, LGPL
Resources
SDForum archives contain presentations by or on:
- Actuate
- Jaspersoft
- LucidEra
- Mondrian
- Snaplogic
- SugarCRM
OSBI Companies
- Actuate
- Infobright
- JasperSoft
- Jedox AG
- JInfonet
- Jitterbit
- LucidEra
- Kickfire
- Pentaho
- Snaplogic
- SugarCRM
- Talend
Acknowledgments
- thanks to Sonja London for her contribution to the title
- thanks to Richard Taylor for contributing info about databases and licenses