Open is as Open Does: Lessons from Running a Professional Open Source Company

Leon Rozenblit, JD, PhD
Founder and CEO at Prometheus Research, LLC
email: Leon@PrometheusResearch.com
twitter: @leon_rozenblit
Financial Interest Disclosures

- Founder and CEO of Prometheus Research, LLC
Prometheus: Professional Open Source for Research Data Management

Battle cry: Empower the analyst!
Battle: Empower analysts to use relational databases to organize, use, and repurpose complex data.

We care about:
- Efficiency
- Adaptability
- Openness
- Effective communication
- Knowing what we don't know

What kind of company are we?
- A services company that builds products for internal use (to reduce costs of service delivery)
- We hope to become a product company that builds products for others
- Our business model looks like a consulting firm, except we invest in R&D
- About 50 employees, headquartered in New Haven, CT
Prometheus Customers / Solutions

Databases for Research

Customers:
- Academic Research Centers/Labs
- Academic Research Hospitals
- Professional Medical Societies
- Funders of Biomedical Research

Solutions:
- Dynamic Research Data Repositories
- Data Integration Pipelines
- Procedure/Disease Registries
- Scientific Asset Management Systems
- Data Collection and Curation Tools
- Data-marts and analytics
Serving..
How we fund OS development

- Direct feature funding from customers
- Grants
- $$ (formerly known as profit!)

[Diagram nodes: satisfied customer, Professional Services, R&D]
[Diagram arrows: pays for service; funds feature request; delivers product enhancement; accelerates value delivery]

Professional services deliver value to customers; features cost money, but increase value and accelerate delivery.
Integrated Data Management w/ Open-Source RexDB
High-level Architecture for Institutional Data Integration
Easier Configuration and Interaction

- Instant RESTful web APIs
- Query and report builders
- Configurable screens
- Configurable data models
- Customizable web applications
- Efficient data transformations

RexDB: Simplified Navigational Representation
- Navigation via links
- Transformation via data flows
- Configuration via compositional ontologies

[Diagram labels: HTSQL, YAML, Relational Models, Structured (Tabular) Data]
YAML: YAML Ain't Markup Language
http://yaml.org/

YAML is a human friendly data serialization standard for all programming languages.

- Developed by Clark C. Evans (partner at Prometheus) starting in 2001
- Large community: tens of thousands of installs, 1.6M Google Search results, tens of millions of end users
- License (for LibYAML*): MIT

*Written and maintained by Kirill Simonov, Prometheus Research, LLC
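To make "human friendly" concrete, here is a minimal sketch that prints the same record once in YAML block style and once as JSON. The `to_yaml` helper is hypothetical and hand-rolled for this illustration; it is not LibYAML or any published YAML library, and it handles only dictionaries whose values are plain scalars, lists of plain scalars, or nested dictionaries, with no quoting or escaping. The study name, site, and instrument names in the record are made up. Real code should use a proper YAML library.

```python
import json


def to_yaml(mapping, indent=0):
    """Render a dict as YAML block-style lines (illustration only).

    Assumes every scalar can be written unquoted, so values that YAML
    would read as something else (for example "yes" or "1.0") are out
    of scope for this sketch.
    """
    pad = "  " * indent
    lines = []
    for key, item in mapping.items():
        if isinstance(item, dict):
            # Nested mapping: key on its own line, children indented.
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(item, indent + 1))
        elif isinstance(item, list):
            # Sequence of scalars: one "- entry" line per element.
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {entry}" for entry in item)
        else:
            lines.append(f"{pad}{key}: {item}")
    return lines


# Hypothetical record; none of these values come from a real database.
record = {
    "study": "Sleep Cohort",
    "site": "New Haven",
    "instruments": ["actigraphy", "questionnaire"],
    "contact": {"name": "charles", "email": "charles@rexdb.com"},
}

yaml_text = "\n".join(to_yaml(record))
json_text = json.dumps(record)

print(yaml_text)
print(json_text)
```

The YAML rendering relies on indentation and line breaks rather than braces and quotation marks, which is what makes it easier for analysts to read and edit by hand.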
HTSQL: A Navigational Query Language
http://htsql.org/

HTSQL is a comprehensive navigational query language for relational databases.

- Developed by Clark C. Evans and Kirill Simonov (both at Prometheus); current implementation started in 2010
- Small community: hundreds of installs, 15.5K Google Search results, thousands of end users
- License: dual licensed under AGPLv3 (OSI compliant) and a Permissive Use license that is like a BSD license, but only for use with open source RDBMS (interesting: NOT OSI compliant)

HTSQL is designed for data analysts and other accidental programmers who have complex business inquiries to solve and need a productive tool to write and share database queries. HTSQL is free and open source software.

Documentation:
- Overview: a gentle introduction to HTSQL
- Handbook: install, configure, and use HTSQL
- Tutorial: learn HTSQL by example
- Reference: HTSQL syntax, types, and functions

Features:
- Advanced Query Language: HTSQL is a complete query language featuring automated linking, aggregation, projections, filters, macros, a compositional syntax, and a full set of data types and functions.
- Web Service Integration: HTSQL is a web service that accepts queries as URLs, returning results formatted as HTML, JSON, CSV, or XML. With HTSQL, databases can be accessed, secured, cached, and integrated using standard web technologies.
- Development Environment: HTSQL includes a command line and web based query editor with syntax highlighting, context-sensitive completion, and error messages with tips and suggestions.
- Relational Database Gateway: HTSQL requests are translated to efficient SQL queries. HTSQL supports different SQL dialects including SQLite, PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.
- Embedded Reporting: HTSQL is a backend framework supporting visual dashboard and reporting tools. HTSQL can be included in client-side JavaScript or server-side Python applications. HTSQL plugins can provide domain specific customizations.
- Communication Tool: HTSQL is used for collaboration among business users, data analysts, and application developers. HTSQL queries can be emailed, embedded in reports, and included in feature requests.
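Because HTSQL accepts queries as URLs, a query that can be emailed is also a query that a program can request. The sketch below only builds such a URL; it does not contact a server. Several things here are assumptions: the host `http://localhost:8080` is a placeholder, the helper name `htsql_url` is hypothetical, the `school` and `department` tables are borrowed from the HTSQL tutorial schema and may not exist in a given database, and the `/:json` suffix follows the formatter syntax described in the HTSQL documentation, which should be checked against the HTSQL version in use. The sketch also assumes the server percent-decodes the request before parsing it.

```python
from urllib.parse import quote, unquote

# Hypothetical HTSQL endpoint; replace with a real server address.
BASE_URL = "http://localhost:8080"


def htsql_url(query, fmt=None, base_url=BASE_URL):
    """Return a request URL for an HTSQL query (no request is sent).

    fmt, if given, is appended as an output formatter such as "json"
    or "csv".
    """
    if not query.startswith("/"):
        raise ValueError("an HTSQL query starts with '/'")
    if fmt is not None:
        query = f"{query}/:{fmt}"
    # Keep the characters that structure the URL ("/", "?", "=", ":")
    # and percent-encode everything else that is not URL-safe.
    return base_url + quote(query, safe="/?=:")


# Schools on the 'old' campus with a count of their departments.
query = "/school{name, count(department)}?campus='old'"
url = htsql_url(query, fmt="json")
print(url)

# Decoding the path recovers the original, human-readable query.
print(unquote(url[len(BASE_URL):]))
```

The readable form is what people exchange; the encoded form is what an HTTP client would send.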
RexDB: Research Exchange Database
https://bitbucket.org/prometheus/rexdb

A platform that allows analysts to configure relational databases and powerful web applications to manage biomedical research data.

- Very new: current implementation publicly released Mar 31!
- Tiny community: dozens of installs, 4.6K Google Search results, hundreds of end users
- License: AGPLv3 (OSI compliant)

RexDB Demo Installation Directions

General Installation

Install App

Install rex.setup:
    pip install rex.setup

Install rex.rexdb from a package server:
    pip install rexdb==3.1.0
or install rex.rexdb from source:
    hg clone ssh://hg@bitbucket.org/prometheus/rexdb
    pip install -e rexdb

Choose whether you want to start an instance of RexStudy, RexRegistry, or RexSurvey, then follow the chosen path below.

RexStudy Specific Application

Deploy DB:
    rex rdoma-deploy rex.study --set=db=pgsql:<db_name>
    rex rdoma-deploy rex.explore --set=db=pgsql:<db_name>
    rex rdoma-deploy rex.formbuilder --set=db=pgsql:<db_name>
    rex rdoma-deploy rex.roads --set=db=pgsql:<db_name>
    rex rdoma-deploy rex.study_demo --set=db=pgsql:<db_name>  # optional

Add User

Use HTSQL to add a user (example: user rexdb.charles with email charles@rexdb.com):
    htsql-ctl shell pgsql:<db_name> -e tweak.etl

Run the following HTSQL commands:
    /merge(user_group:={name:='rexdb',module:=[['rex_study'].core]})
    /merge(user:={user_group:=['rexdb'],name:='charles',email:='charles@rexdb.com'})
The Right License

Permissive (BSD/MIT/Apache)
- What: Do whatever you want with the code; don't sue us.
- Why: Maximum adoption
- Why not: Competitors can do whatever they want with your code

Copyleft (GPL/AGPL)
- What: Use and distribute the code however you want; but if you make a derivative work you can ONLY distribute it under a similar license
- Why: Some protection against a competitor licensing your work; may provide incentive to contribute back
- Why not: Reduces adoption; need to worry about copyright assignment for contributors; risk of getting forked (if you have proprietary extensions)
Community Norms for OS Projects

- Public code repository, usually on GitHub or Bitbucket
- OSI-blessed license
- Public issue tracker
- Mailing list + IRC channel on freenode.net
- Project website; documentation and installation instructions
- Extra credit: get added to a standard Linux distribution (RedHat, Debian)

Special Challenges of Copyleft Licenses

- Copyright assignment by external contributors (for copyleft only, and only if you plan proprietary extensions)
- Issues solved by contributor covenants
- Can you have an open-source base product but keep the sweet features in a proprietary version?
- Open core is frowned upon: it creates perverse incentives and increases the probability of getting forked
Real Openness: Benefits and Costs

Benefits:
- Lowers barriers to adoption
- Improves credibility
- Aligns community
- Helps resolve IP issues with customers
- Improves code quality
- Hire better developers? Attracts external contributors?

Costs:
- Your dirty laundry is always on full public display
- Managing the community: don't get forked
- Quality costs money
- Must support external collaborators
- Your competitors can monitor everything you're doing and can use it at any time
Would we do it again?

Depends on the kind of software:
- Small utility libraries that are broadly useful: absolutely
- Large application frameworks: maybe
- DevOps automation: maybe
- Front ends specialized to specific domains: only for a large market

f(#users, #contributors, clarity of problem definition)

[Chart: Potential Users vs Potential Contributors. Both axes are logarithmic; x-axis: Potential Users, y-axis: Potential Contributors. Categories plotted: Utility Library, App Framework, Domain-Specific App. License regions: BSD/MIT, GPL, Proprietary.]
Thank You!

- The entire Prometheus team
- Clark Evans: my personal open source guru
- Charles Tirrell: Product Manager for RexDB
- Frank Farach: Product Manager for RexMart
- David Voccola
- Owen McGettrick