Educational Collaborative Develops Big Data Solution with MongoDB

CASE STUDY OVERVIEW

INDUSTRIES: Education, Nonprofit
LOCATION: Durham, NC
PROJECT LENGTH: 1 year, 5 months
APPLICATION SUPPORTED: Data-driven software providing tools to enhance the quality of education

"Education is the most powerful weapon which you can use to change the world." - Nelson Mandela

The Project

In early 2012, Mammoth Data began working with a non-profit educational collaborative. The organization was eager to develop a robust application that would facilitate more fluid collection and analysis of data for educators. The project consisted of a set of services designed to provide access to a unified database of student data across the United States.

Initially, Mammoth Data provided project management services on several of the smaller projects within the overall development umbrella. Following these successful contracts, Mammoth Data was asked to join the customer's new in-house development team in taking over development of the product. The product consisted of an extensive RESTful API developed in Java/Spring and backed by a MongoDB data store, with management, data access, data manipulation, and other applications implemented in Java, Ruby, FreeMarker, and other languages. Because Mammoth Data was not involved at the beginning of the project, some time was spent bringing the existing codebase up to Mammoth Data's standards in terms of architecture, code quality, build infrastructure, and documentation.

This project ran on industry-standard two-week iterations. Mammoth Data managed one scrum team and worked closely with the client as they brought some of the work in-house and created an internal scrum team. Mammoth Data diligently followed the established development practices and maintained clear communication during that transition and throughout the duration of the project.

TECHNOLOGIES & TECHNIQUES

- Test-driven development
- Behavior-driven development with Cucumber
- Continuous integration with Jenkins
- Peer-reviewed code with Review Board and Crucible
- Issue tracking with Rally and JIRA
- Agile/Scrum sprints
- Revision control with Git

ACCOMPLISHMENTS

Instructions and Installation

Mammoth Data refined and fixed the installation scripts while writing new instructions for setting up the system's components, which included multiple Java application servers, Rails servers, a JMS broker, a MongoDB database, and a Liferay installation. New developers who joined the project were able to cut their onboarding and development-environment setup time from two days to three hours by using Mammoth Data's new scripts and instructions. These included Bash scripts that downloaded and installed portions of the environment, as well as scripts that automated routine tasks such as emptying the test database and re-importing a clean set of data when switching between tasks.

When Mammoth Data began working on this project, there was no documentation on how to build or test the system. The Mammoth Data team worked out the steps required to bring a whole system up and running and documented them, along with other steps useful for maintaining a working local system. As part of this, a Markdown file was created to serve as the documentation for building the system; it was later released publicly along with the code when the system was open sourced.

Language Updates

We updated the project from Ruby 1.9.2 to Ruby 1.9.3 and then to Ruby 2.0.0, and from Java 1.6 to Java 1.7, identifying and correcting compatibility issues in project code, tests, and the included libraries along the way.

Security

Through code analysis and extended regression tests, we discovered multiple defects of varying severity that allowed application clients to execute functions against privileged and security-related entities that should not have been permitted. We designed, implemented, and tested appropriate solutions to close these issues. Additionally, we found that improperly sanitized API parameters allowed clients to corrupt the database by wiping out entity metadata with blank POSTs; we disallowed such posts in all of the services we modified. To prevent these issues from recurring, we implemented new regression tests that attempted all of the forbidden posts and accesses, verified that the system did not permit them, and confirmed that events were logged appropriately (a sketch of one such test follows below).

To expose security events through the REST API, Mammoth Data created a new resource (controller) and a new service to handle capturing the data from MongoDB (also sketched below). Because the data was in a format similar to the rest of the entities, it was easy to extend existing code to handle the connections to the database, which also simplified retrieving representative objects. A new right was added, using the existing security infrastructure, to verify that only approved users could view the security data.
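As an illustration of those regression tests, here is a minimal sketch in JUnit with Spring's MockMvc. The endpoint path, entity, and test context file are hypothetical stand-ins rather than the project's actual API.

```java
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.test.context.web.WebAppConfiguration;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.setup.MockMvcBuilders;
import org.springframework.web.context.WebApplicationContext;

@RunWith(SpringJUnit4ClassRunner.class)
@WebAppConfiguration
@ContextConfiguration("classpath:test-api-context.xml") // hypothetical test context
public class BlankPostRegressionTest {

    @Autowired
    private WebApplicationContext context;

    private MockMvc mockMvc;

    @Before
    public void setUp() {
        mockMvc = MockMvcBuilders.webAppContextSetup(context).build();
    }

    @Test
    public void blankPostMustNotWipeEntityMetadata() throws Exception {
        // A POST with an empty JSON body previously blanked out entity
        // metadata; the fixed API must reject it rather than accept it.
        mockMvc.perform(post("/api/rest/v1/students/{id}", "hypothetical-id")
                .contentType(MediaType.APPLICATION_JSON)
                .content("{}"))
            .andExpect(status().isBadRequest());
    }
}
```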
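The security-events resource might have looked like the following sketch: a Spring MVC controller backed by a MongoTemplate query. The path, collection, and field names are assumptions for illustration, and the right check itself (handled by the project's existing security infrastructure) is noted but omitted.

```java
import java.util.Date;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.annotation.Id;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/api/rest/v1/securityEvents") // hypothetical path
public class SecurityEventController {

    // Minimal model for events as stored in MongoDB; fields are illustrative.
    public static class SecurityEvent {
        @Id
        public String id;
        public String user;
        public String action;
        public Date timestamp;
    }

    @Autowired
    private MongoTemplate mongoTemplate;

    @RequestMapping(method = RequestMethod.GET)
    @ResponseBody
    public List<SecurityEvent> listEvents() {
        // The real resource sat behind a dedicated right so that only
        // approved users could read security data; that check is omitted here.
        Query latestFirst = new Query().with(new Sort(Sort.Direction.DESC, "timestamp"));
        return mongoTemplate.find(latestFirst, SecurityEvent.class, "securityEvent");
    }
}
```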

REST API

We performed various upgrades to the REST API to expose new entities and MongoDB collections for client consumption, including data import job status reports and security event logs. These implementations augmented the project's use of Spring Data for MongoDB.

"This project was a unique opportunity to secure the importance of big data in our nation's future, while also providing a seamless solution to increase the quality of our education." - Drew Nelson, Consultant, Mammoth Data

We added API endpoints for ingestion data (ingestion being the process by which data is imported into the system from another source). In the system, API and Ingestion were separate modules, and each module had implemented its own communication with MongoDB. We created new Spring Data MongoDB DAOs, services, and controllers to provide better access to the ingestion data, which unified the data access layer; the only component that could be reused was the model. This change also required a new right to be implemented. A sketch of such a DAO layer follows below.
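Below is a minimal sketch of that DAO layer, assuming a hypothetical collection and field names. Spring Data MongoDB derives the query implementation from the repository method name, which is what allows a single repository to replace module-specific Mongo plumbing.

```java
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;

// Hypothetical model for an ingestion job status report.
@Document(collection = "ingestionJob")
class IngestionJob {
    @Id
    private String id;
    private String status;          // e.g. "RUNNING", "COMPLETED", "FAILED"
    private long recordsProcessed;
    private long recordsFailed;
    // getters/setters omitted for brevity
}

// Spring Data MongoDB generates the implementation at runtime; the service
// and controller layers can share this single data-access point.
interface IngestionJobRepository extends MongoRepository<IngestionJob, String> {
    List<IngestionJob> findByStatus(String status);
}
```

A thin service and controller can then expose the repository's results as the new REST resource, guarded by the new right.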
Databrowser

We adapted a Ruby on Rails application that provided a raw view of the data to summarize it and present an administrator-focused view with numerous UI refinements. The administrator-focused view required extensive visual and functional changes. We designed our extension of the tool to serve this new audience while maintaining its original function of presenting the REST API and Mongo data in visual form.

A fair number of the changes were UI cleanup: mostly text changes that explained what was going on behind the scenes and provided more appropriate data where required. For example, the columns in tables of displayed data were inconsistent across different data types, and even across the same data type displayed in different locations. Mammoth Data made the column headers uniform across the entities being viewed in the system.

Pagination was another addition. Previously, a maximum of 50 entities would ever be displayed for a particular entity type, with no way to see any entities beyond that point. A new pagination system was built into the Databrowser to synchronize the display with the retrieval calls to the API, allowing the display to page through all of the entities the API made available while returning only the specified number of entities per call for performance reasons. This was all made configurable through YAML files used throughout the Databrowser product.

Various tables and counts were also added to the Databrowser UI. These tables gave statistical information about the entities associated with the specific entity being viewed. This data only made sense for certain entities, so controls were inserted to ensure the tables were displayed only in appropriate locations. Counts were also placed next to various links throughout the system to show the number of entities residing below those links.

Data Schema Analysis

We identified inefficiencies in the data model that were an artifact of the project's prior transition from an RDBMS to MongoDB: the existing implementation used associative tables between collections instead of sub-documents, which are more performant for NoSQL. The client's API exposed these association tables, introducing additional dependency on them. Mammoth Data recommended changes to the data model to bring it in line with standard best practices for a document database, and implemented an interim solution at the client level to fulfill our feature requirements. The sketch below contrasts the two modeling styles.
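For illustration only, this sketch contrasts the two styles with hypothetical student/section entities; the names do not come from the project.

```java
import java.util.List;

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;

// RDBMS carry-over: a separate collection acts as an association table,
// so assembling a student's sections costs additional queries, and any
// API exposure of this collection creates dependencies on its shape.
@Document(collection = "studentSectionAssociation")
class StudentSectionAssociation {
    @Id
    private String id;
    private String studentId;
    private String sectionId;
}

// Document-oriented alternative: embed the association as sub-documents,
// so a student's enrollments arrive in a single read of one document.
@Document(collection = "student")
class Student {
    @Id
    private String id;
    private String name;
    private List<SectionEnrollment> enrollments; // embedded sub-documents

    static class SectionEnrollment {
        private String sectionId;
        private String grade;
    }
}
```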
Test Coverage

All new code was covered by unit and/or integration tests, and we uncovered and corrected broken or skipped tests. At each step along the way, new tests were created and existing tests were modified to evaluate the new functionality being added. This was done using JUnit and Cucumber/Gherkin (a step-definition sketch follows at the end of this section). Each requirement in the work being done was tested to verify its viability within the system.

We developed a process in which each relevant piece of code had appropriate tests designed and run locally. Once each test passed locally, a module-specific test run verified that there was no impact on the module and other relevant sections of code. After everything passed locally, a full suite of integration tests was run in a Jenkins environment.

While Release Candidate testing took place, the modified code was placed into a review system for peer review. Changes at this stage were generally about making sure appropriate comments were in place and that there were no glaring issues in the code that would hurt performance or create bugs.

Upon successful completion of the Jenkins tests and code review, the code was pushed to a Release Candidate environment that housed the system on multiple servers matching production as closely as possible. At this point, another full suite of tests was administered. While the code was in the Release Candidate environment, a demo was presented to the client to show what had been accomplished and to verify that the functionality and presentation met their expectations.

Only after all of these steps were completed, and the code had passed the necessary tests while meeting the client's requirements, would the code be merged into the master branch. After the merge, a full set of Jenkins tests ran on the master branch to verify that no issues had been introduced. Additionally, at the end of every sprint cycle, master was pushed to the Release Candidate environment and the full suite of tests was run there to ensure that master was ready for release. The release process was then handled by the operations and release management team.
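To show how the Cucumber/Gherkin side fits together, the sketch below gives Java step definitions for a hypothetical scenario about the security event log. The ApiClient stub is included only so the example is self-contained; it is not the project's test harness.

```java
import static org.junit.Assert.assertEquals;

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

// Backs a Gherkin scenario such as:
//   Given I am authenticated as a user without the security-event right
//   When I request the security event log
//   Then the request is denied with status 403
public class SecurityEventSteps {

    private final ApiClient api = new ApiClient();
    private int responseStatus;

    @Given("I am authenticated as a user without the security-event right")
    public void unprivilegedUser() {
        api.loginAs("plain-user");
    }

    @When("I request the security event log")
    public void requestSecurityEvents() {
        responseStatus = api.get("/api/rest/v1/securityEvents");
    }

    @Then("the request is denied with status {int}")
    public void requestDenied(int expected) {
        assertEquals(expected, responseStatus);
    }

    // Minimal hypothetical stub so the sketch compiles; a real suite would
    // wrap an HTTP client pointed at the deployed API.
    static class ApiClient {
        private String user;

        void loginAs(String user) { this.user = user; }

        int get(String path) {
            // Unprivileged users are denied access to the security log.
            return "plain-user".equals(user) ? 403 : 200;
        }
    }
}
```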

ABOUT US

Mammoth Data is a Big Data consulting firm specializing in Hadoop, NoSQL databases, and designing modern data architectures that enable companies to become data-driven. By combining cutting-edge technologies with a high-level strategy, we craft systems that capture, organize, and turn unstructured information into real business intelligence.

Mammoth Data was founded as Open Software Integrators in 2008 by open source software developer, evangelist, and now president Andrew C. Oliver. Mammoth Data is headquartered in downtown Durham, North Carolina.

MAIN OFFICE
345 W. Main St., Suite 201
Durham, NC 27701
(919) 321-0119
info@mammothdata.com
@mammothdataco
mammothdata.com