Evolution of Connectivity in LabKey Server
Karl Lum
Partner, LabKey Software
klum@labkey.com

Connecting Data to LabKey Server
- Lowering the barrier to connecting scientific data to LabKey Server
- Increased flexibility in routing data
- Cooperating with other systems
- Giving users more options with their data
- Historical look at LabKey Server connectivity
- Focus on some recent changes (REDCap, FreezerPro, ETL)
- Future directions

2003-2005: Data Pipelines
- LabKey Software focused on proteomics
  - CPAS server processing MS2 runs through the data pipeline
  - Data uploaded through the browser and results saved to the database
  - Pipeline tasks could parse specific data formats
- Analysis of flow cytometry data
  - FCS files processed via the pipeline
- Data entered into LabKey Server tables through web forms
- LabKey Server was the database of record

2005 Connectivity Summary
(Diagram: data pipeline, form entry, and Java modules connecting to LabKey Server)

2006: Study and Assay Data
- Collaboration with SCHARP on the Atlas Portal
  - Many data types associated with HIV/AIDS research
  - Large volumes of study and assay data
  - CRF and specimen data imported through the pipeline
  - Assay data consisted of machine-generated data files
- Assay framework and GPAT (General Purpose Assay Type)
  - Imports data from spreadsheets or tab-separated text files
  - No built-in specialized analysis or visualizations
  - Appropriate for both raw and analyzed results
  - Tool to infer fields from the first file

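The slide mentions a tool that infers fields from the first file. The Python sketch below shows one plausible way such inference could work on tab-separated text; it is not LabKey's implementation, and the function name `infer_fields`, the type labels, and the ISO-date-only rule are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def _parses(value: str, convert) -> bool:
    """Return True if convert(value) succeeds without a ValueError."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


def _iso_date(value: str) -> datetime:
    # Assumption: only ISO-style dates (YYYY-MM-DD) are recognized.
    return datetime.strptime(value, "%Y-%m-%d")


def infer_fields(tsv_text: str) -> dict[str, str]:
    """Guess a field type for each column of a tab-separated file.

    Hypothetical sketch: each column gets the most specific type that fits
    every non-blank value in the file, falling back to "Text".
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    rows = list(reader)
    fields: dict[str, str] = {}
    for i, name in enumerate(header):
        # Collect the non-blank values of column i (short rows are skipped).
        values = [row[i].strip() for row in rows if i < len(row) and row[i].strip()]
        if not values:
            fields[name] = "Text"
        elif all(_parses(v, int) for v in values):
            fields[name] = "Integer"
        elif all(_parses(v, float) for v in values):
            fields[name] = "Number"
        elif all(_parses(v, _iso_date) for v in values):
            fields[name] = "Date"
        else:
            fields[name] = "Text"
    return fields
```

For example, a file with columns `ParticipantId`, `VisitDate`, `ViralLoad`, and `Result` and a row such as `P001`, `2006-03-01`, `1200`, `3.5` would be typed Text, Date, Integer, and Number. Testing the narrowest type first keeps whole-number columns from being widened to decimals.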
2007-2008: APIs and Simple Modules
- Atlas needed many custom applications
  - Java modules were complex to build and maintain
  - Build custom applications without the Java module overhead
- LabKey APIs and simple modules
  - Lowered the extensibility barrier
  - Insert, update, and delete data programmatically
  - Module-based assays allowed easy entry into the assay framework
- Lists
  - Create tables in LabKey Server and integrate them with existing data
  - Easily import file-based data through the browser
  - Tools to infer fields from files

2008 Connectivity Summary
(Diagram: client API, data pipeline, form entry, file upload, Java modules, and simple modules connecting to LabKey Server)

2008-2009: External Schemas
- Support for connecting to data sources outside the LabKey Server schema
- Data no longer needs to be relocated
- LabKey Server security can be applied
- Editing of external tables through the LabKey Server UI can be enabled
- Supported data sources: SAS, PostgreSQL, Microsoft SQL Server, Oracle, MySQL

2010-2012: APIs and Remote Connections
- LabKey Software continued to refine its APIs
  - Additional language bindings for Perl and Python
  - Polished module-based tools
- Remote connections
  - A LabKey Server used as an external data source
  - Connectivity through the LabKey Server API
  - Folder-level granularity

2012 Connectivity Summary
(Diagram: client API, external SQL data sources, data pipeline, form entry, file upload, Java modules, simple modules, and remote servers connecting to LabKey Server)

2013-2014: External Application Integration
- REDCap
  - Web application for building and managing online surveys and databases
  - Developed and distributed by Vanderbilt University
  - Popular in the academic and research community for designing clinical and translational research databases

- International Center of Excellence for Malaria Research (ICEMR) at the University of Washington
  - Demographic and clinical data stored in REDCap
  - Wanted their REDCap data integrated into their LabKey Server for:
    - Visualizations
    - Queries
    - Integration with experimental data

- Data needed to be synchronized from REDCap to LabKey Server
  - The REDCap API allowed secure, programmatic access to the projects of interest
  - Data is extracted and saved in a format that can be imported into a LabKey Server study
  - Import runs automatically on a schedule

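As a rough sketch of the extract-and-save step, the Python below requests records from a REDCap project and writes selected fields as tab-separated text, a format LabKey Server can import. The request parameters (`token`, `content=record`, `format=json`, `type=flat`) follow REDCap's Export Records API method; the function names, the REDCap field names, and the LabKey column names in the mapping are hypothetical, and a real study import may require columns not shown here.

```python
import csv
import io
import json
import urllib.parse
import urllib.request


def export_params(token: str) -> dict[str, str]:
    """Form fields for REDCap's Export Records method (flat JSON)."""
    return {"token": token, "content": "record", "format": "json", "type": "flat"}


def fetch_records(api_url: str, token: str) -> list[dict[str, str]]:
    """POST to the REDCap API endpoint and decode the JSON record list."""
    body = urllib.parse.urlencode(export_params(token)).encode("utf-8")
    # Supplying `data` makes urlopen issue a POST request.
    with urllib.request.urlopen(api_url, data=body) as response:
        return json.loads(response.read().decode("utf-8"))


def records_to_tsv(records: list[dict[str, str]], field_map: dict[str, str]) -> str:
    """Write selected REDCap fields as TSV, renaming columns via field_map.

    field_map maps a REDCap field name to the column name used on import
    (hypothetical names); fields missing from a record become empty cells.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(list(field_map.values()))
    for record in records:
        writer.writerow([record.get(src, "") for src in field_map])
    return out.getvalue()
```

A scheduled job could call `fetch_records("https://redcap.example.org/api/", token)` (placeholder host) and pass the result to `records_to_tsv`. Keeping the network call separate from the conversion makes the conversion easy to test without a REDCap server.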
- FreezerPro
  - Commercial web application for frozen specimen inventory management
  - Supports various sample types
  - Tracks location and availability of specimens
  - Allows user-defined fields
  - Users can create custom reports and export data

- Novo Nordisk Type 1 Diabetes Research Center
  - Uses FreezerPro to manage its research specimens
  - Needed its specimen inventory integrated into LabKey Server to:
    - Combine it with experimental data
    - Run queries
    - Build visualizations

- API access to the remote FreezerPro server
  - LabKey Server uses secure storage to encrypt the FreezerPro credentials
  - Inventory information is imported directly into LabKey Server
    - Uses the data pipeline
    - Imported into the study specimen repository
  - Users control field mapping, filtering, and the synchronization schedule

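The user-controlled field mapping and filtering on this slide can be pictured as a small configuration applied to each inventory row before import. This Python sketch is illustrative only: it does not call the FreezerPro API, and `SyncConfig`, `map_inventory`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


def _keep_all(row: Row) -> bool:
    """Default filter: import every row."""
    return True


@dataclass
class SyncConfig:
    # Maps a FreezerPro field name to a specimen repository column (hypothetical names).
    field_map: dict[str, str]
    # User-defined filter; rows for which it returns False are not imported.
    include: Callable[[Row], bool] = _keep_all


def map_inventory(rows: list[Row], config: SyncConfig) -> list[Row]:
    """Filter inventory rows, then rename and project their fields."""
    imported = []
    for row in rows:
        if not config.include(row):
            continue
        # Unmapped fields are dropped; mapped fields missing from the row become None.
        imported.append({dst: row.get(src) for src, dst in config.field_map.items()})
    return imported
```

Filtering before mapping lets a filter refer to source fields (such as a storage box) that are never imported.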
2013-2014: ETL Framework
- ETL stands for extract, transform, and load
- Developed as part of HIDRA (Hutch Integrated Data Repository & Archive)
- Goals of building a LabKey Server ETL framework:
  - Provenance: understanding the origin of the data, knowing when and how it got there
  - Auditing
  - Security: integration into the LabKey Server security model
  - A flexible data integration strategy

- Built on top of the data pipeline
- Functionality:
  - Query-based ETLs
  - Stored procedures
  - Remote sources
  - Checkers (identify whether there is work to be done)
  - Scheduling
  - Logging output

- ETLs are module-based
- An ETL consists of a set of transform steps
- Key components of a transform:
  - Source table or query
  - Destination table
  - Filter strategy: identifies which rows to transform and whether there is work to do
  - Schedule

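To make the relationship between an ETL, its transform steps, and their components concrete, here is a small in-memory Python model. In LabKey Server, ETLs are defined inside modules rather than written as Python; `TransformStep`, `Etl`, `run_etl`, and the dictionary of tables standing in for the database are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class TransformStep:
    source: str       # source table or query name
    destination: str  # destination table name
    # Filter strategy: returns the rows to move (empty list means no work to do).
    select_rows: Callable[[list[Row]], list[Row]]


@dataclass
class Etl:
    name: str
    steps: list[TransformStep]
    schedule: str  # e.g. a polling interval or cron expression; not interpreted here


def run_etl(etl: Etl, tables: dict[str, list[Row]]) -> int:
    """Run each step in order and return the total number of rows moved."""
    moved = 0
    for step in etl.steps:
        rows = step.select_rows(tables.get(step.source, []))
        if not rows:
            continue  # checker: nothing to do for this step
        tables.setdefault(step.destination, []).extend(rows)
        moved += len(rows)
    return moved
```

Because steps run in order, a later step can read a table written by an earlier one, which is the situation the run filter strategy is meant for.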
ETL Filter Strategies
- Choose which rows to move to the target table
- Select all: get all the data, every time
- Last modified: rows whose date/time column is newer than at the last run; records the most recent value
- Run filter: checks a specified column, typically an incrementing integer column
  - Any rows with a higher value than at the last run are transformed
  - Useful for rows written by previous ETLs

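The three filter strategies can be sketched in Python as objects that pick rows from a source table. This is an illustrative model, not LabKey's code: the class names are hypothetical, the recorded value lives in memory instead of being persisted between runs, and rows that exactly equal the recorded value are treated as already transformed. "Last modified" and "run filter" share one class here because both track a high-water mark and differ only in the column type.

```python
from datetime import datetime
from typing import Optional, Union

Row = dict[str, object]
Mark = Union[int, datetime]


class SelectAll:
    """Select all: return every source row on every run."""

    def select(self, rows: list[Row]) -> list[Row]:
        return list(rows)


class HighWaterMarkFilter:
    """Return only rows whose `column` exceeds the largest value seen so far.

    With a date/time column this behaves like "last modified"; with an
    incrementing integer column it behaves like a run filter.
    """

    def __init__(self, column: str) -> None:
        self.column = column
        self.high_water: Optional[Mark] = None  # recorded after each run

    def select(self, rows: list[Row]) -> list[Row]:
        picked = [r for r in rows
                  if self.high_water is None or r[self.column] > self.high_water]
        if picked:
            # Record the most recent value so the next run skips these rows.
            self.high_water = max(r[self.column] for r in picked)
        return picked
```

An empty result doubles as the answer to whether there is work to do.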
ETL Target Options
- How to add data to the target table
- Truncate: delete all rows, then add the selected ones
- Append: add new rows to the target table; fails on duplicate primary keys
- Merge: update or insert, matching on primary keys

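Each target option can be modeled as a different way of writing the selected rows into a target table keyed by a primary key. In this hypothetical Python sketch the table is a list of dictionaries, `load` and `DuplicateKeyError` are invented names, and merge updates only the columns supplied in the incoming row; a database would perform these operations in SQL, usually inside a transaction.

```python
Row = dict[str, object]


class DuplicateKeyError(Exception):
    """Raised when an append would create a duplicate primary key."""


def load(target: list[Row], rows: list[Row], key: str, mode: str) -> None:
    """Write rows into target using 'truncate', 'append', or 'merge'."""
    if mode == "truncate":
        # Delete all existing rows, then add the selected ones.
        target.clear()
        target.extend(rows)
    elif mode == "append":
        # Validate every key first so a failure leaves the target unchanged.
        seen = {r[key] for r in target}
        for r in rows:
            if r[key] in seen:
                raise DuplicateKeyError(f"duplicate primary key: {r[key]!r}")
            seen.add(r[key])
        target.extend(rows)
    elif mode == "merge":
        # Update rows whose key already exists; insert the rest.
        position = {r[key]: i for i, r in enumerate(target)}
        for r in rows:
            if r[key] in position:
                i = position[r[key]]
                target[i] = {**target[i], **r}
            else:
                position[r[key]] = len(target)
                target.append(r)
    else:
        raise ValueError(f"unknown target option: {mode}")
```

Append is the cheapest option but only safe when the filter strategy guarantees new keys; merge tolerates rows that were already loaded.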
ETL Schedule Options
- When to run the transform
- Poll: check at a defined interval
- Cron: can be used to check at a particular time of day

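The difference between the two schedule options comes down to how the next run time is computed. The Python below is a simplified, hypothetical sketch: poll mode adds a fixed interval to the last run, and the cron-style function handles only the "particular time of day" case from the slide instead of parsing full cron expressions. Time zones and daylight-saving changes are ignored.

```python
from datetime import datetime, time, timedelta


def next_poll_run(last_run: datetime, interval: timedelta) -> datetime:
    """Poll: run again once the fixed interval has elapsed since the last run."""
    return last_run + interval


def next_daily_run(now: datetime, at: time) -> datetime:
    """Cron-style daily schedule: next occurrence of a time of day, strictly after now."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```

Polling suits sources that change throughout the day, since a run with no new rows is cheap when the filter strategy finds no work; a fixed time of day suits nightly bulk loads.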
Connectivity Summary
(Diagram: client API, external SQL data sources, data pipeline, form entry, file upload, Java modules, simple modules, ETL, remote servers, and external systems connecting to LabKey Server)

Future Directions
- Other connection strategies LabKey is investigating:
  - DatStat: online data and study management software
  - i2b2: an informatics framework that will enable clinical researchers to use existing clinical data for discovery research
  - Caisis: an open source cancer data management system

Any questions?
Karl Lum
klum@labkey.com