iDASH: integrating Data for Analysis, Anonymization, and SHaring
iDASH Infrastructure to Host Sensitive Data: HIPAA Cloud Storage and Compute
Claudiu Farcas, Antonios Koures
Outline
  Infrastructure Overview
  Typical Scientific Cloud Challenges
  iDASH Cloud & SHADE
  Repeatable Results
  Status and Future Plans
9/25/2014 Supported by the NIH Grant U54 HL108460 to the University of California, San Diego
iDASH Environments
[Diagram labels: PHI, Website, PHI Repo., Miconcur icons, Enterprise, Non-PHI, Non-PHI Repo., NLP, Privacy, Virtualization, Hardware, Cloud, Proj.1, Proj.2, Proj.3, Proj.4]
HIPAA
  Firewalls
  Separate VPN pool
  Physical separation
  Redundancy
  Two Factor Authentication
  Encryption at rest/in transit
  Centralized logging
  Intrusion detection
  Proxies and filters
  Hardened (secured) system configurations
  Remote Backups/DR
Typical Scientific Analysis
[Diagram labels: Short reads, Call Deleterious SNPs, Complex stuff]
SaaS: Biomedical researchers, Clinicians, Other end-users. Examples: Google Docs, Office 365
PaaS: Bioinformatics researchers, Front-end developers. Examples: Heroku, Google App Engine
IaaS: Algorithm developers, Bioinformatics researchers, Sysadmin. Examples: Amazon EC2, Microsoft Azure
To Cloud or Not to Cloud?
  Typical bioinformatics applications are NOT cloud aware
  Almost nothing at PaaS: this is not web development
  Most published cloud papers use public Amazon VMs
  Privacy & Security are an afterthought
  Data is still moved around over unencrypted FTP
  End-to-end analyses need serious work
  This is a young field of science; practitioners have limited IT skills.
iDASH Cloud & SHADE Overview
CLOUD: Compute & storage
  Elastic, HIPAA-compliant
  On-demand
  User-friendly
  Data analysis environment
  AUTOMATED: Compute nodes, Memory, Disk storage, Networking
  Powered by VMware
  Compute request; direct upload & download of proprietary data, tool, recipe to CLOUD
SHADE: Safe HIPAA-compliant Data deposit box Environment
  Data, Tools, Recipes: upload & download to SHADE
  HIPAA and non-public data
  Public data, tools, recipes
  Powered by MIDAS
Middleware and HIPAA security developed by iDASH
Repeatable Results
[Diagram labels: iDASH, OS, Short reads, Call Deleterious SNPs, MyDATA, iDASH Cloud + SHADE]
Context Blueprint: Reference DB, Test data, Configuration, Helper tools
Context Instance: Reference DB, Input, Test data, Results, Configuration, Helper tools
Context Bookshelf: Context Blueprints (each with Reference DB, Test data, Configuration, Helper tools, and an OS running Short reads / Call Deleterious SNPs)
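The blueprint-to-instance relationship on this slide can be sketched as a small data model. This is a minimal illustration, not iDASH's actual implementation: the class names, field names, and the `fingerprint` helper are hypothetical, and hashing the blueprint is an assumed way to check that two runs used the same context.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ContextBlueprint:
    """Everything an analysis depends on except the user's input (hypothetical model)."""
    os_image: str
    reference_db: str
    test_data: str
    configuration: tuple  # (key, value) pairs; a tuple keeps the blueprint immutable
    helper_tools: tuple

    def fingerprint(self) -> str:
        """Stable SHA-256 hash of the blueprint; equal hashes mean identical contexts."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ContextInstance:
    """A blueprint plus one concrete input and the results produced from it."""
    blueprint: ContextBlueprint
    input_data: str
    results: list = field(default_factory=list)


def instantiate(blueprint: ContextBlueprint, input_data: str) -> ContextInstance:
    """Create a fresh instance; the blueprint itself is shared and never modified."""
    return ContextInstance(blueprint=blueprint, input_data=input_data)


# Illustrative values only; none of these names come from the slides.
snp_blueprint = ContextBlueprint(
    os_image="example-linux-image",
    reference_db="example-reference-db-v1",
    test_data="example-test-reads",
    configuration=(("min_quality", "20"),),
    helper_tools=("example-aligner", "example-caller"),
)

run_a = instantiate(snp_blueprint, "patient_reads_A")
run_b = instantiate(snp_blueprint, "patient_reads_B")
```

Both instances carry the same blueprint, so comparing `run_a.blueprint.fingerprint()` with `run_b.blueprint.fingerprint()` is one way to confirm that two analyses differ only in their input.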
Improvements in Y4
Ordered and installed additional hardware to increase cloud capacity and provide tiered services:
  180TB Dell Compellent tiered storage (SSD, 15K, 7.2K)
  2 Dell R920 servers with 1TB RAM, 4 Intel E7-4870 v2 CPUs (15 cores each)
Software and Security Improvements
  Implemented Data Replication for DR
  Upgraded to vCloud 6.0
  Improved VM provisioning automation
  Improved user portal
  Improved automation of storage tiering
iDASH Cloud
  3 computation tiers
  3 storage tiers
  10GbE throughout
  Full redundancy
  RSA Two Factor Auth.
  Remote data replication
  800+ cores
  7TB+ RAM
  600TB+ storage
Future plans
  Improve User Experience and Management
  Improve collaborative environment (SocialCast, SHADE)
  Implement seamless vMotion of VMs between physically separate datacenters
  Experiment with VMware EVO:RAIL with iDASH Cloud -> Cloud in a Box
  Implement ITBM (IT Business Management)
Thank you! (More) Questions?
Secure VM Templates
  Full disk encryption
  Built-in Firewall
  Secure shared memory
  No root SSH
  Protected su
  Harden sysctl networking
  Disabled Open DNS Recursion
  IP Spoofing protection
  Hardened PHP for webapps
  Apache application firewall - ModSecurity
  ModEvasive protection of webapps from DDoS attacks
  Automatic log scanning and banning of suspicious hosts - DenyHosts and Fail2Ban
  Intrusion Detection - PSAD
  Periodic checking for RootKits - RKHunter and CHKRootKit
  Autoscan for open Ports - Nmap
  Analysis of system log files - LogWatch
  SELinux / AppArmor application boundary enforcement
  System security auditing with Tiger
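A few of the template properties above can be spot-checked from configuration text. The sketch below is a hypothetical audit helper, not part of the iDASH templates: it assumes "No root SSH" corresponds to `PermitRootLogin no` in `sshd_config` and that "IP Spoofing protection" corresponds to reverse-path filtering (`net.ipv4.conf.all.rp_filter = 1`) in the sysctl settings. The function names are illustrative, and `Match` blocks in `sshd_config` are ignored for brevity.

```python
def parse_sshd_config(text: str) -> dict:
    """Parse 'Keyword value' lines; sshd uses the first value given for a keyword."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings.setdefault(parts[0].lower(), parts[1].strip().lower())
    return settings


def parse_sysctl_conf(text: str) -> dict:
    """Parse 'key = value' lines; a later assignment overrides an earlier one."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def audit_template(sshd_text: str, sysctl_text: str) -> dict:
    """Return a pass/fail flag per check; a missing setting counts as a failure."""
    sshd = parse_sshd_config(sshd_text)
    sysctl = parse_sysctl_conf(sysctl_text)
    return {
        "no_root_ssh": sshd.get("permitrootlogin") == "no",
        "ip_spoofing_protection": sysctl.get("net.ipv4.conf.all.rp_filter") == "1",
    }


# Example configuration text (assumed, for illustration only).
hardened_sshd = "# example sshd_config\nPermitRootLogin no\nPasswordAuthentication no\n"
hardened_sysctl = "# example sysctl.conf\nnet.ipv4.conf.all.rp_filter = 1\n"
report = audit_template(hardened_sshd, hardened_sysctl)
```

Working on text rather than on live files keeps the check easy to run against a template's configuration before the VM is ever deployed.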