The KPMG-NL Big Data team 16 March 2015




Core analysis tools:
SQL
Anaconda
SciPy
Matplotlib
CERN C++ for advanced data science
Statistical tools widely used in social sciences

[Diagram: ETL of raw data. Source types shown: XML, CSV, web, video, audio.]

Service deployment model

[Diagram labels: KPMG development; already existing open source; code repository (Git server, e.g. GitHub for OSS components); package repository (e.g. Apache server); add-on KPMG services + tools + libs; add-on open-source services; Ambari core services; Ambari, the open-source component adopted as installation platform. Components shown: Gitlabs, LDAP, Hadoop, core analytics, MongoDB, JBoss, Storm, TWiki, StormSD, Archiva, Jenkins, Apache, Sonar, Hive, Ganglia, future tools.]

KAVE gathers together a toolkit of pre-existing third-party open-source software components. These components are governed by their own licenses, which the KAVE installer does not modify or supersede; please consult the originating authors. Together, the components carry a mixture of the following licenses: Apache 2.0, GPL 2.0, AGPL, LGPL, ZPL, MIT, PSF, BSD and some BSD-like simple licenses. For scipy and ipython see: http://docs.continuum.io/anaconda/licenses.html.

Topic: Insertion of malicious code by a malicious third party
Impact: High
Chance: Zero
Mitigation: Select open-source software with a wide user base or a security-critical function; this will then have been scrutinized by thousands of people, experts in their field. Do not initially permit contributions to our software directly without our own review process. In principle this is much harder to do in OSS than in proprietary software.

Topic: Reputational risk if software fails to perform adequately
Impact: Medium
Chance: Low
Mitigation: Legal aspects can be handled with explicit limited-liability licensing and explicit contracts should engagements revolve around KAVE. We use this software ourselves, for our own engagements, and at each stage we use our professional judgment about the performance of the tools included, so we would be the first to notice shortfalls in functionality. Additionally, we can gain reputation by contributing to existing open-source products with bug reports and feature requests.

Topic: Reputational risk by association with other open-source providers
Impact: Low
Chance: Low
Mitigation: Installing an open-source product does not in and of itself associate us with any individual or entity that contributed to that product. However, it is necessary to consider carefully any current reputation of organizations so associated, and we use our professional judgment based on known software quality and company history. Additionally, we can gain reputation by becoming part of the community.

Topic: Risk of withdrawal or lack of maintenance of baseline product
Impact: Medium
Chance: Low
Mitigation: The KAVE understands that the tools needed for Big Data will evolve with time. Should a more widely used alternative come along at a later date, we will adopt it. For now we choose tools which are considered mainstream and in heavy use, with an active user base and active contributions. In our opinion, the withdrawal risk of OSS has historically been smaller than that of proprietary software.

Topic: Risk of using as-is limited software if it includes infringing content
Impact: High
Chance: Low
Mitigation: The Apache foundation has very strict rules for becoming an Apache product, which include checking for existing conflicts such as copyright infringement. We base our installer on Ambari, which was adopted as an Apache product, and prefer Apache products over others if there is a possible choice. However, we recognize that, historically speaking, organizations have sued individuals for unknown, unintentional or debatable infringement. Again, by selecting products which are already in use by large companies we can be assured that the risk here is minimal.

Topic: Reverse reputational risk by generating revenue from an open-source product without contributing
Impact: High
Chance: High (if we don't release the software)
Mitigation: Ethically speaking, if KPMG is using an open-source product and generating revenue off others' work, our team feels we are obliged to contribute to the community in some way, and so we intend to release our platform as open source, under an Apache-2.0 license.
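The impact and chance ratings above can be turned into a rough ranking. The sketch below is one possible way to do that in Python. The numeric scale and the impact-times-chance score are assumptions made for illustration and are not part of the slide; the topic names are shortened.

```python
from dataclasses import dataclass

# Assumed ordinal scale: the slide gives only the labels, not numbers.
LEVELS = {"Zero": 0, "Low": 1, "Medium": 2, "High": 3}


@dataclass(frozen=True)
class Risk:
    topic: str
    impact: str
    chance: str

    @property
    def score(self) -> int:
        # Impact multiplied by chance: a common convention, assumed here.
        return LEVELS[self.impact] * LEVELS[self.chance]


# Ratings copied from the risk overview; topic names are abbreviated.
RISKS = [
    Risk("Insertion of malicious code", "High", "Zero"),
    Risk("Software fails to perform adequately", "Medium", "Low"),
    Risk("Association with other open-source providers", "Low", "Low"),
    Risk("Withdrawal or lack of maintenance", "Medium", "Low"),
    Risk("Infringing content in as-is software", "High", "Low"),
    # Chance is "High" only if the software is not released.
    Risk("Generating revenue without contributing", "High", "High"),
]


def ranked(risks):
    """Return the risks ordered from highest to lowest assumed score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


for risk in ranked(RISKS):
    print(f"{risk.score}  {risk.topic}")
```

Under this assumed scoring, the one risk rated high on both axes comes out on top, which is consistent with the stated intention to release the platform as open source.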

Services we add:
Branch-based development
Test-driven development
Feature-based releases
Merging by central authority
Updating paths
Integration testing
Self-hosting
Agile development
Prioritization by users

[Diagram labels: Hadoop; Storm; Mesos; web servers; 150 projects; 835 committers; Facebook; Twitter; Yahoo; Google; billions of end users of their products; named project leaders; defined project structure; strict consensus-driven project management. Project lifecycle: establishment, candidate, acceptance, podling engagement, project; rejection, termination, rejection.]

Boilerplate for each file includes copyright owner:

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Full license available at http://www.apache.org/licenses/LICENSE-2.0

Apache License, Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
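The per-file boilerplate quoted above lends itself to automation. The following is a minimal Python sketch, assuming source files that use a single-line comment prefix such as `#`; the function names (`render_header`, `has_header`, `add_header`) are hypothetical helpers and not part of any KAVE or Apache tooling. The license URL is written with the capitalization used on apache.org.

```python
APACHE_HEADER = """Copyright {year} {owner}

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License."""


def render_header(year, owner, comment_prefix="#"):
    """Fill in the copyright line and turn every line into a comment."""
    text = APACHE_HEADER.format(year=year, owner=owner)
    lines = [f"{comment_prefix} {line}".rstrip() for line in text.splitlines()]
    return "\n".join(lines) + "\n"


def has_header(source):
    """Crude check: look for the sentence that opens the license notice."""
    return "Licensed under the Apache License, Version 2.0" in source


def add_header(source, year, owner, comment_prefix="#"):
    """Prepend the boilerplate unless the file already carries it."""
    if has_header(source):
        return source
    return render_header(year, owner, comment_prefix) + "\n" + source


print(add_header("print('hello')\n", 2015, "Example Owner"))
```

The sketch does not handle shebang lines or block-comment languages; a real script would need to place the header after a leading `#!` line.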

The development line

PLANNING
Added to backlog: bug report, improve feature, new feature
Categorize: trivial or not trivial
Review with team
Schedule in sprint
Consider TDD
Define priority
Identify dependencies
New features:
1. Exploratory install
2. Product demo
3. Assess against KAVE principles

SPRINTING
Stages: develop, review, integration test, merge

Develop on most appropriate branch
1. Implementation
2. Developer tests
3. Automated tests
a. Fast review stage
b. Fast (unit) testing stage
c. Trivial merge

Development loop
Diverge a feature-specific branch (don't ever develop on the master)
1. Implementation
2. Developer tests
3. Automated tests
a. Code review by different person
b. Reviewer tests
c. Automated tests

RELEASING
Product demo of changes
Release of new version
Packaging and release
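The rule never to develop on the master can be enforced mechanically. Below is a hedged Python sketch of a check that could be called from a Git pre-commit hook. It assumes the protected branch is literally named `master`; `current_branch`, `commit_allowed` and `main` are hypothetical helper names. The only Git command used is `git rev-parse --abbrev-ref HEAD`, which prints the name of the checked-out branch.

```python
import subprocess

# Assumption: only "master" is protected; extend the set as needed.
PROTECTED = {"master"}


def current_branch(repo_dir="."):
    """Ask Git for the name of the currently checked-out branch."""
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def commit_allowed(branch, protected=PROTECTED):
    """Commits are allowed on any branch that is not protected."""
    return branch not in protected


def main(repo_dir="."):
    """Return 0 if committing is fine, 1 if the branch is protected."""
    branch = current_branch(repo_dir)
    if not commit_allowed(branch):
        print(f"Refusing to commit on '{branch}': "
              "diverge a feature-specific branch first.")
        return 1
    return 0
```

To use it as a hook, a script in `.git/hooks/pre-commit` would call `main()` and exit with its return value; a non-zero exit status makes Git abort the commit.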

© 2015 KPMG Advisory N.V., registered with the trade register in the Netherlands under number 33263682, a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative ("KPMG International"), a Swiss entity. All rights reserved. Printed in the Netherlands. The KPMG name, logo and "cutting through complexity" are registered trademarks of KPMG International. Produced by Create Graphics. Document number CRT039089.