HPC infrastructure at King's College London and Genomics England

1 HPC infrastructure at King's College London and Genomics England. Tim, King's College London, King's Health Partners, Genomics England, Wellcome Trust Sanger Institute. Farr-ADRN-MB e-infrastructure workshop, 16th January 2015

2 King's College London: anchor tenant at Infinity Data Centre in Slough. Existing HPC recently procured for KCL will be relocated alongside new HPC procured for two BRCs to create a large research facility.

3 HPC 3. KCL's existing HPC environment consists of: Beowulf Linux cluster; InfiniBand network; Grid Engine; 1464 CPU cores, 8 GPU cores; 4.2TB RAM; 87TB usable Lustre storage.

4 KCL collaborates with two other members of King's Health Partners to form NIHR-funded Biomedical Research Centres (BRC): the Biomedical Research Centre for Mental Health, with the South London and Maudsley NHS Foundation Trust; and the Comprehensive Biomedical Research Centre, with Guy's and St Thomas' NHS Foundation Trust.

5 Both BRCs have a significant (and similar) research computing requirement to process biomedical imaging, -omics and clinical record data. They have recently been awarded grants from Maudsley Charity and Guy's and St Thomas' Charity, totalling around £2.5 million, to expand research computing capacity. They are joining forces with KCL HPC 3 to create a flexible research computing environment for KHP in the new Infinity SDC...

6 Hardware. InfiniBand compute, in addition to HPC 3: 64 x Haswell (2 x 10 core) nodes, 128GB RAM; 32 x Haswell (2 x 10 core), 256GB RAM. 10GbE compute: existing BRC-MH 15 x 16-core, 256GB RAM HP blade servers; 32 x Haswell (2 x 10 core), 256GB RAM. Tier 1 storage (Lustre): ~500TB scratch. Tier 2 storage (Ceph): ~3PB usable storage. Off-site backup storage.
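
The node counts on this slide can be added up to give a rough size for the new compute. The sketch below is only an illustration: it assumes the RAM figures are per node and that the node groups split between the InfiniBand and 10GbE fabrics as the slide text suggests. The class and variable names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    label: str
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int  # assumption: the slide's RAM figure is per node

    @property
    def cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node


# Figures copied from the slide; the grouping by fabric is an assumption.
INFINIBAND = [
    NodeGroup("Haswell 2x10 core, 128GB", 64, 20, 128),
    NodeGroup("Haswell 2x10 core, 256GB", 32, 20, 256),
]
TEN_GBE = [
    NodeGroup("Existing BRC-MH HP blades", 15, 16, 256),
    NodeGroup("Haswell 2x10 core, 256GB", 32, 20, 256),
]


def totals(groups):
    """Return (total cores, total RAM in GB) for a list of node groups."""
    return sum(g.cores for g in groups), sum(g.ram_gb for g in groups)


if __name__ == "__main__":
    for name, groups in (("InfiniBand", INFINIBAND), ("10GbE", TEN_GBE)):
        cores, ram_gb = totals(groups)
        print(f"{name}: {cores} cores, {ram_gb} GB RAM")
```

Under those assumptions the new InfiniBand partition adds 1920 cores to the 1464 cores of the existing cluster.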

7 Research Computing Service Platform. Grid Engine compute cluster: InfiniBand network, Lustre scratch. OpenStack cloud: 10GbE network; cloud nodes may be used to temporarily expand the Grid Engine cluster. iRODS research data store: built on Ceph object store. Hardware.

8 Public Domain: OpenStack, iRODS Store. Research Domain: Grid Engine, OpenStack, iRODS Store. NHS Domain: Grid Engine, OpenStack, iRODS Store.

9 Genomics England

10 Genomics England: proposed data flows. Diagram labels: Sequencing Centres; Sample repository; Refreshable identifiable Clinical Data, linked to anonymised Whole Genome Sequence; Annotation; Apps; Sample; Patient Consent; EHR; Primary Care; Hospital episodes; Clinical Report; Clinical Genetics, Cancer & Public Health; NHS Trusts, Patients & Public. Pilots: Selected Centres, CRUK, BRCs. Firewall: patient data stays on NHS side; only processed results pass outside. Safe haven: anonymised clinical data and DNA sequence. Clinicians & Academics; GeCIP; Industry. Main Program: Genomic Medicine Centres.

11 MRC infrastructure award m Skyscape 5Pb storage, tape through NSSA tendering Rental of CPUs m Full procurement

12 Data Sharing. Open to all: Human Genome Projects where subject consented (HapMap, 1000 Genomes); repository: GenBank, ENA, DDBJ (INSDC). Managed distribution (must be bona fide researcher): genetic data for disease cohorts, with phenotypes; repository: dbGaP, EGA (encrypted distributions etc.). Managed access, no redistribution: Genomics England datasets; repository: GeL Datacentre.

13 A future with closed datasets Multiple sets of Hospital/National datasets with no redistribution policies Value for research in generating statistics across this global set

14 Global Alliance for Genomics and Health

15 Global Alliance for Genomics and Health

16 Developing the UK infrastructure for e-health research John Ainsworth Deputy Director, HeRC 16 January 2015 UCL Workshop

17 From Big Data to Big Scale. Data: vast data volume, velocity, variety (TSUNAMI). Methods & models: supra-linear growth in papers & tools (BLIZZARD). Expertise: similar number of analysts (DROUGHT). Three Big Health E-Research Challenges: 1. Assist hypothesis formation with data. 2. Weed out non-reproducible findings early. 3. Couple data-intensive healthcare and research.

18 Who is Farr? "Diseases are more easily prevented than cured and the first step to their prevention is the discovery of their exciting causes." William Farr, 30 November 1807 to 14 April 1883

19 What is Farr? A distributed research institute that will integrate and scale, at the UK level, the work of four Health Informatics Research Centres (HIRCs)

20 History. In August 2012, ten UK funding agencies awarded four Centres of Excellence in e-health informatics research. The four HIRCs aim to optimize the use of health records in research and address the UK's capacity building requirements to support a sustainable health informatics research base.

21 Health Informatics Research Centres. Scotland: Dundee, Glasgow, Edinburgh, St Andrews, Aberdeen, Strathclyde, MRC HGU, NHS NSS. HeRC: Manchester, York, Lancaster, Liverpool, Sheffield, AHSNs. CIPHER: Swansea, Bristol, Cardiff, Exeter, Leicester, Sussex, NWIS, Public Health Wales. UCL Partners: UCL, LSHTM, Queen Mary, Public Health England. Map Source:

22 More History. In 2013, the Farr Institute was created to support the HIRCs' collective work: Farr CIPHER, Farr HeRC, Farr Scotland, Farr UCL Partners. Together, they bring a total of 20 academic institutions and two MRC units. Farr will act as the nexus of the UK Health Informatics Research Network.

23 Aims of the Farr. Create a physical and electronic infrastructure to support and accelerate the Centres' collaborative work. Support partnerships by providing a physical structure to co-locate NHS organisations, industry, and other UK academic centres. Facilitate collaboration, the sharing of datasets, and the adoption of common standards. Develop new opportunities for future data linkage at scale.

24 UK Health Informatics Research Network. Farr will lead the UK Health Informatics Research Network. Farr will develop the Network's 5-year strategy plan and provide a blueprint for its activities. The Network aims to strengthen the UK's capability in health informatics research by harnessing the expertise in the Farr and the wider UK research community. The Network is open to all members of the research community. Prof Carole Goble

25 HeRC elab Based at Vaughan House IGTK L2 ISO27001:2013 in process Initial Farr Investment Labs Safe Haven HPC Devices VC Additional MRC CRI funding Clinical Proteomics Centre UK Dementia Platform Single Cell Genomics Secure file storage Secure file exchange Secure file transfer across NHS N3 Secure file transfer across public networks elab data management services via web interface Data linkage Data repository Research data extracts Data analysis software and compute Virtual machine service from remote locations Virtual machine service from secure data analysis environment Dataset inventory Personal health data repository HPC remote access

26 N3 NHS User N8 HPC Janet HAN HeRC Safe Haven : ISO27001 ISMS Phase 10 Research Repository Single Sign On Transient Repository Applications & Compute Remote Desktop elab 2 factor auth 2 factor auth Researchers HeRC Governance Board HeRC NHS : NHS IGTK Remote Repository AAAI NHS Pseudo Data Repository Data Transfer NHS elab Patients & Devices Dataset Catalogue

27 Big Data funding for health, medical and administrative data. MRC: £20M for the four Farr Institute nodes, for e-infrastructure and buildings, June 2013. ESRC: £34M for four Administrative Data Research Centres (ADRC) and Administrative Data Service, Nov 2013. MRC: £39M for six Medical Bioinformatics Initiative projects, Feb 2014.

28 The safe share project. Background: there is significant investment in medical research trying to unlock the value of data collected by the NHS and the wider government, in order to further knowledge of disease and ill-health and improve medical treatments. The project builds on the recent development of the MRC- and partner-funded Farr Institute, Medical Bioinformatics Initiative and Administrative Data Research Network, and their infrastructure requirements. Challenges: health data is very personal and sensitive, and there is rightly public concern about any real or perceived inappropriate access. There are significant numbers of ethical, consensual and practical hurdles to making use of the data for research.

29 Meeting the Big Data challenge: being able to access data securely; being able to share data safely; being able to work together collaboratively. Solve the problem once for everyone, with potential solutions at scale, and give public confidence that data is appropriately protected. The project is to be run in two parts, each with a set of pilots: 1. Secure connectivity, higher assurance network (HAN). 2. Authentication, Authorisation and Accounting Infrastructure (AAAI).

30 Secure Connectivity Use Cases. Inter-Farr: initial trial between the Farr centre at Manchester and the N8 HPC at Leeds, but will extend to the other Farr centres (Swansea, London and Dundee). Intra-Farr: to securely link the Swansea Farr centre with one of its collaborative projects with Bristol (ALSPAC). ADRC / Farr Pod to Data Centre: connectivity between accredited secure rooms that can be connected to ADRC data centres for remote working.

31 Authentication, Authorisation and Accounting Infrastructure (AAAI) Use Cases. Dementia study by Oxford University: the objective is to demonstrate researchers using home institution credentials and a generic user request model to authenticate access to a set of relevant national and study-specific datasets. HeRC, N8 HPC, DiRAC: access between these facilities using home institution credentials. eMedLab: partners will be able to analyse human genome data, medical images, and clinical, psychological and social data; the aim is to demonstrate using a common AAAI with access via a common credential. Swansea University Health Informatics Group: investigating whether Moonshot can provide an authentication mechanism, allowing use of home institution credentials.

32 Partners. The project is funded and managed by Jisc, working in partnership with wider initiatives: the Farr Institute; the MRC Medical Bioinformatics Initiative; the Administrative Data Research Network. Incorporating organisations involved in the pilots: University of Manchester, UCL, Swansea University, University of Dundee, Francis Crick Institute, University of Oxford, University of Leeds, University of Sheffield, University of Southampton, University of Bristol, HSCIC.

33 Timetable Agreement on requirements and use cases - complete Funding approval - complete Detailed project planning in progress Detailed design and architecture of infrastructure in progress Operational standards, development controls Q Infrastructure deployment, installation and commissioning Q Initial operational and testing with customers Q Customer trials begin Q External certification ISO27001 process Q Recommendations Q2 2016

34 The 3Rs of data science: Repeat, Reproduce, Reuse

35 The 3Rs of data science: Repeat, Reproduce, Reuse The 1T of data science: Transparency

36 Reproducibility A principle of the scientific method Evidence to test and justify claims Comparison of results and methods Peer review Prof Carole Goble

37 Defining drug exposure: 192 different datasets. Decision nodes: 1. Selecting stop date. 2. Handling missing stop date. 3. Overlapping prescriptions. 4. Small treatment gaps.

38 A Data Science Commons: publish, discover and reuse data science artefacts as Research Objects. Rules: 1. Each unique research object placed into the Commons must have a unique identifier. 2. That unique identifier must allow the research object to be found, shared and attributed. 3. Attribution requires associated provenance that, minimally, identifies the creator(s) of the unique research object, those that have subsequently modified it, and how it was modified. More at
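
The three rules can be illustrated with a small sketch. This is not the Commons implementation. `ResearchObject`, `Commons` and the `urn:uuid:` identifier scheme are hypothetical names chosen for the example, and an in-memory dictionary stands in for whatever registry a real Commons would use.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> str:
    """UTC timestamp for provenance entries."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ResearchObject:
    title: str
    creators: list
    # Rule 1: every object gets its own identifier (a UUID is an assumption).
    identifier: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    provenance: list = field(default_factory=list)

    def __post_init__(self):
        if not self.creators:
            raise ValueError("a research object needs at least one creator")
        # Rule 3: provenance records who created the object.
        for creator in self.creators:
            self.provenance.append((_now(), creator, "created", self.title))

    def record_modification(self, agent: str, how: str) -> None:
        # Rule 3: provenance records who modified the object and how.
        self.provenance.append((_now(), agent, "modified", how))


class Commons:
    """Hypothetical in-memory registry keyed by identifier."""

    def __init__(self):
        self._objects = {}

    def publish(self, obj: ResearchObject) -> str:
        if obj.identifier in self._objects:
            raise ValueError(f"identifier already in use: {obj.identifier}")
        self._objects[obj.identifier] = obj
        return obj.identifier

    def find(self, identifier: str) -> ResearchObject:
        # Rule 2: the identifier is enough to find the object again.
        return self._objects[identifier]


if __name__ == "__main__":
    commons = Commons()
    ro = ResearchObject("Drug exposure algorithm", ["A. Researcher"])
    rid = commons.publish(ro)
    ro.record_modification("B. Analyst", "changed handling of missing stop dates")
    for entry in commons.find(rid).provenance:
        print(entry)
```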

39 Farr ADRN Medical Bioinformatics e-infrastructure Workshop Simon Thompson The Swansea University version

40 Health Informatics Group, Swansea University FARR ADRC Swansea Bio-Info (SAIL)

41 FARR: based on SAIL Databank. Linked routine data; internationally recognised data linkage system; 4.7 million people; 9 billion rows of data; over 20 core national datasets, 200+ project-specific datasets. Datasets: GP (primary care); inpatient & outpatients (secondary care); A&E, emergency care; pathology & LIMS; births & deaths; child health & perinatal screening; screening (breast, cervical); cancer registries; WCB, CARIS, WCISU; education data. Central repository / warehouse; 300 users; > 70m research induced; NHS Wales connectivity (DAWN2-N3); infrastructure inside NHS core data centres.

42 Based on the split-file principle. Supplier data, linkage (ID, Name, Address, BP, Diag): 56, Fred Bloggs, The Big house, 120/80, G; 78, Jim Jones, 87 peterson rd, 135/45, P; 45, Harry Lucas, 19 meirwen, 125/75, G77.. File 1, demographics + link key (ID, Name, Address): 56, Fred Bloggs, The Big house; 78, Jim Jones, 87 peterson rd; 45, Harry Lucas, 19 meirwen. File 2, clinical data + link key (ID, BP, Diag): 56, 120/80, G; 78, 135/45, P; 45, 125/75, G77.. File 3 (ID, ALF, Conf). Load into SAIL (ALF_E, BP, Diag): 120/80, G; 135/45, P; 125/75, G77..
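
The split-file idea on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the SAIL implementation. The keyed hash used to derive an ALF is a stand-in for the real matching step, and the function names, the key and the 16-character ALF length are all hypothetical. The diagnosis codes are truncated in the transcript and are left out of the sample rows. The "Conf" column of File 3 is not modelled.

```python
import hashlib
import hmac

# Sample rows taken from the slide (diagnosis codes omitted, see above).
SUPPLIER_ROWS = [
    {"id": 56, "name": "Fred Bloggs", "address": "The Big house", "bp": "120/80"},
    {"id": 78, "name": "Jim Jones", "address": "87 peterson rd", "bp": "135/45"},
    {"id": 45, "name": "Harry Lucas", "address": "19 meirwen", "bp": "125/75"},
]


def split_file(rows):
    """Split supplier data into File 1 (demographics) and File 2 (clinical).

    Both files keep the link key so they can be re-joined later.
    """
    demographics = [
        {"id": r["id"], "name": r["name"], "address": r["address"]} for r in rows
    ]
    clinical = [{"id": r["id"], "bp": r["bp"]} for r in rows]
    return demographics, clinical


def assign_alf(demographics, secret_key: bytes):
    """Build File 3: link key -> ALF, using demographics only.

    Assumption: an HMAC of name and address stands in for real matching.
    """
    file3 = {}
    for d in demographics:
        message = f"{d['name'].lower()}|{d['address'].lower()}".encode("utf-8")
        digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
        file3[d["id"]] = digest[:16]
    return file3


def load_into_databank(clinical, file3):
    """Join File 2 with File 3 and drop the supplier's link key."""
    return [{"alf": file3[c["id"]], "bp": c["bp"]} for c in clinical]


if __name__ == "__main__":
    file1, file2 = split_file(SUPPLIER_ROWS)
    file3 = assign_alf(file1, secret_key=b"example-key-not-for-real-use")
    for row in load_into_databank(file2, file3):
        print(row)
```

The point of the design is that no single step handles both identifying details and clinical values: the ALF is derived from demographics alone, and the loaded rows carry only the ALF and the clinical fields.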

43 FARR Evolution. Remote Desktop (VDI) technology; Single Sign-on (Active Directory); Shared Security Model / Provisioning (v3); two-factor authentication; introduction of additional services (Secure Filestore, Wiki, Helpdesk, Training); anonymising of GIS datasets (residencies and geo data). Diagram labels: Active Directory; pooled standard config; VMware View Security Server (VPN) (x3); VMware View Connection Broker; dedicated configurable Data Warehouse; Two Factor Authentication Server; specialist / custom config.

44 FARR Evolution. Building on initiatives: data / dataset documentation; data quality measurement; automation of processes / self service; new probabilistic matching engine; natural language processing. New technologies: SQL Server 2014 cluster, Hadoop, R cluster. Local & remote capabilities: Data Appliance; UKSeRP; white labelling SAIL infrastructure; Security Model v3 and provisioning v3 (some federation); choice of two-factor authentication platform; geo restrictions; project-level encryption.

45 National Research Data Appliance (NRDA): simplistic viewpoint. NRDA1, NRDA2, NRDA3. User interface for dataset management; matching and linkage; data loader; data quality; data catalogue; pluggable architecture. 1st deployment to NHS Trust this month.

46 UK Secure Research Platform (UKSeRP) Simplistic view:- PORTAL Virtual Desktops NRDA Security Probabilistic Linkage Data Catalogue, Documentation, Metrics, Quality T1 T2 T3 IBM DB2 MP-DB SQL 2014 Cluster PostgreSQL + Post GIS ARCGIS HADOOP Cluster Virtualisation Stack IBM ICA HPC / Specialist Shared Filestore Doc / Community Support

47 UKSeRP uses NRDA User Portal ServiceDesk Data Appliance Security v3 Provisioning Capabilities Permissions People DataSets Data Loading Data management Data Documentation Data Quality Versioning Data Catalogue Probabilistic Linkage Transport / Sharing Anonymisation Trusted 3 rd Party Probabilistic Linkage Data Catalogue NLP Shared Infrastructure Data OLAP D.M. IBM DB2 Data OLAP BI SQL Server Hadoop PIG HDFS Cloudera Hadoop 1 12 Files DFS Webdav Filestore SAS SPSS VDI VDI Templates VDI Templates Templates Vmware View VDI VDI Virtual Templates Templates Servers SCVMM DB2 IBM C.A. EDMS CliniThink VMware HyperV Backup, Recovery and DR Core Active Directory Accounts DHCP DNS WSUS

48 Data Science Building New building solely for MRC / ESRC Whole building considerably more secure/controlled than any existing building on campus. SEAP Level 4 area on top floor incorporating a server room and safe setting.

49 The tour so Farr!!

50 Health Informatics Group, Swansea University FARR ADRC Swansea Bio-Info (SAIL)

51 ADRC: like FARR, but for administrative data. Use of previous investment in systems / knowledge / development. Very similar to FARR at the 1000-foot view, lots of differences in detail. Lots of time spent on perfecting the design of the wheel: there must be a better design than square??? These datasets have not been shared at scale before, so there is lots of nervousness. NRDA, UKSeRP.

52 New world for these data suppliers. Not a repository model: compile dataset, do research, publish, destroy. Data is transitory and specific to a project. Data linkage: new linkage capabilities in NRDA required; possible encryption at source, with linkage based on encrypted demographics.

53 New world for these data suppliers. Security: much higher security requirements. Hoping for shared infrastructure, ADRC on UKSeRP. All researchers must have Safe Researcher Training / Cert. System admins / developers security cleared. Safe settings: physical locations, with datasets locked to these. Remote locations: Cardiff, Bristol (link back to FARR).

54 New world for these data suppliers. Security: much higher security requirements; hoping for shared infrastructure, ADRC on UKSeRP. All researchers must have Safe Researcher Training / Cert (link back to FARR). System admins / developers security cleared. Safe setting linking to Cardiff, Bristol. NRDA and new linkage. Encryption at source linkage.

55 The joining up of efforts and re-use is absolutely critical Routine Data Free Text Remote NRDA Systems Systems Free Text Routine Data TTP NRDA Bespoke Data Compute Cluster NRDA Doc/Meta Data Devices Medical Images Research Image Rep. Image NRDA Structured Data UKSeRP Research Platform Anon. Images CLIMB System Bio-Info NRDA

56-63 The joining up of efforts and re-use is absolutely critical (the same diagram repeated on each slide, adding in turn the labels SAIL, FARR, ADRC, MS Platform, Biobank Project, CLIMB, UKDP and SAFE Share)

64 The tour of routine data ends here!!

65 ADRC-Scotland & Farr Institute - Scotland Dr Stephen Pavis NHS Scotland

66 History in Scotland. NHS National Services Scotland has been linking data for over 20 yrs. Scottish Health Informatics Programme: empirical research; infrastructural design; public engagement; law and subsequent Guiding Principles; computing infrastructure (with separation of function). Data Linkage Framework (Scottish Government). Funding from ESRC (ADRC-S), MRC and 9 others (Farr and HIRC), Scottish Government (Data Linkage and Sharing Service).

67 The Scottish Model. Facilitating research that is in the public interest whilst protecting individuals' privacy. Avoiding large data warehouses but ensuring data can be brought together efficiently to answer important research questions. Creating partnerships and networks across sectors (academia, public and commercial sectors), but not selling data or allowing commercial companies direct access to individuals' personal information. Sharing resources and expertise to create efficient public services (Campbell Christie report).

68 Farr Scotland and ADRC-S data resources. A first timeline from BIRTH to DEATH: Neonatal Record, GP consultations, Mental Health, Substance misuse, Community care, Dental, Out patients, Hospital Admissions, Maternity, Prescribing, A&E, Screening, Suicide, Cancer registrations, Child health surveillance, Immunisation, Imaging, Laboratory. A second timeline from BIRTH to DEATH: Education, Looked after children, Marriage, Community care, Care homes, HMRC, DWP, Census (Scotland & UK).

69 IT Security Assurance. NHS requires a System Security Protocol approved by the IT Security Officer within National Services Scotland. ADRC-S data suppliers require UK Government security classification. ADRN have agreed that: project data will not exceed the Official Sensitive category; each ADRC will provide an environment which is able to process data at the Official Sensitive level.

70 Scottish Informatics and Linkage Collaboration (SILC): shared services for research initiatives that process sensitive data. Farr Institute (MRC); Administrative Data Research Centre (ESRC); Urban Big Data Centre? Shared computing resources at University of Edinburgh; eDRIS research coordination and advice (NSS); shared TTP linkage service at NRS; shared office space at BioQuarter (UoD and UoE).

71

72 eData Research & Innovation Service: a named person from start to finish; single point of entry for health research; support projects from start to finish; help with study design; facilitate completion of required permissions; build relationship between data suppliers and customers; provide expert advice on coding, terminology, meta data and study feasibility; liaison with data suppliers to secure data; agree deliverables and timelines; liaison with technical infrastructure (safe havens); provide analyses, interpretation and intelligence about data (where required).

73 Researcher requires access to linked data. Diagram labels: ADS Essex (advice / data request); eDRIS co-ordinator refers data request to sources (advice and guidance); Training & Researcher Approval; Data Sources (e.g. NHS, Social Services, Police or local datasets); Project IDs 1, Personal IDs 1; Project IDs 2, Personal IDs 2; TTP; Project IDs Mapping; Linking Service; Project IDs 2 with payload data 2; Project IDs 1 with payload data 2; de-identified dataset within safe haven. Once trained and approvals for linkage are in place, the researcher can access the linked dataset within the safe haven.
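
One way to read the separation of functions in this diagram is sketched below. This is an interpretation, not the documented eDRIS or NRS process. It assumes each data source issues its own random project IDs, that the linking service sees only the ID indexes and never the payload, and that the safe haven sees only project IDs and payload. All function names and sample values are hypothetical.

```python
import secrets


def issue_project_ids(personal_ids):
    """Data source step: map each personal ID to a random project ID."""
    return {pid: secrets.token_hex(8) for pid in personal_ids}


def build_mapping(index_1, index_2):
    """Linking service step: map source 1 project IDs to source 2 project IDs.

    Works on ID indexes only; no payload data is passed in.
    """
    shared = index_1.keys() & index_2.keys()
    return {index_1[person]: index_2[person] for person in shared}


def link_payloads(payload_1, payload_2, mapping):
    """Safe haven step: join payloads using project IDs only."""
    linked = []
    for project_id_1, project_id_2 in mapping.items():
        if project_id_1 in payload_1 and project_id_2 in payload_2:
            linked.append({**payload_1[project_id_1], **payload_2[project_id_2]})
    return linked


if __name__ == "__main__":
    # Hypothetical personal identifiers held by two data sources.
    health_index = issue_project_ids(["P001", "P002"])
    education_index = issue_project_ids(["P002", "P003"])

    # Payloads are keyed by project ID and carry no personal identifiers.
    health_payload = {health_index["P002"]: {"admissions": 3}}
    education_payload = {education_index["P002"]: {"exam_passes": 5}}

    mapping = build_mapping(health_index, education_index)
    print(link_payloads(health_payload, education_payload, mapping))
```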

74 Challenges. Software: various packages with different pricing mechanisms. Can we negotiate once for ADRC and Farr UK-wide? Being clear for researchers on the role of ADS and eDRIS. Different funding and charging models for ADRC and Farr Scotland.

75 Thank you for listening Stephen Pavis

76 CLIMB Simon Thompson Research Computing Team University of Birmingham

77 CLIMB Project. Funded by the Medical Research Council (MRC). Four partner universities: Birmingham, Cardiff, Swansea, Warwick. ~£8m (~$13M) grant. Private cloud, running 1000 VMs over 4 sites, for microbial bioinformatics.

78 The CLIMB Consortium Professor Mark Pallen (Warwick) and Dr Sam Sheppard (Swansea) Joint PIs Professor Mark Achtman (Warwick), Professor Steve Busby FRS (Birmingham), Dr Tom Connor (Cardiff)*, Professor Tim Walsh (Cardiff), Dr Robin Howe (Public Health Wales) Co-Is Dr Nick Loman (Birmingham)* and Dr Chris Quince (Warwick) ; MRC Research Fellows * Principal bioinformaticians architecting and designing the system

79 And Marius Bakke (University of Warwick, CLIMB) Since January 2015 Simon Thompson (University of Birmingham) Matthew Ismail (University of Warwick) Simon Thompson (Swansea University)

80 CLIMB. Separate OpenStack region per site; federated single gateway to access. Local high-performance GPFS, ~0.5PB per site. Ceph storage cluster replicated across sites, for archive of VMs. Between 2-5PB total usable over 4 sites.

81 Where are we? - OpenStack. Birmingham kit delivered for OpenStack; proof of concept running (with real users). Cardiff, Swansea and Warwick awaiting deployment with OCF (NSSA mini tender). Collaborating with IBM GPFS development team on OpenStack issues.

82 Where are we? - Ceph. Mini tender under NSSA, awarded to Dell. Ceph cluster orders placed with Dell. Inktank/Red Hat engaged to provide architecture and services assistance.

83 What is eMedLab? Jacky Pallas, UCL; David Fergusson, Crick

84 eMedLab is: a joint project with 6 institutions (UCL, QMUL, LSHTM, Crick, Sanger, EBI). Clinical, imaging and genomics data. Cancer, cardiovascular and rare diseases. Linked to KCL, Farr London and Genomics England. Shared infrastructure in off-site datacentre: minimum 9,000 cores and 4PB data; colocation costs, networking.

85 What is eMedLab?

86 Benefits Data/compute architecture designed for medical bioinformatics Shared expertise and training 4 junior group leaders funded Farr/eMedLab Training Academy

87 Biomedical compute requirement. Bags of memory. Not so much about compute power: lots of low-power cores for throughput. More storage, MORE, MORE! Not just storage volume: data complexity, heterogeneity.

88 Data First Design? compute STORAGE

89 Logical Architecture for eMedLab

90 Technology Highlights. x86 (6000 cores). High-capacity 40Gb Mellanox networking. Chubby nodes: ~500GB RAM per node. OpenStack / Enterprise Red Hat. GPFS storage (9PB raw).

91 iRODS (digression). Data management is critical, but enforcing systems in research is difficult. iRODS (Integrated Rule-Oriented Data System): DICE team, UNC, San Diego. Federated system: different zones, administrative domains. Project workflows: micro-services (rules/policies) triggered by specific events to implement workflows. Each group can implement workflows to suit their needs. Federated instances for large data management. Wide-area instances have been implemented.
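
The idea of rules triggered by specific events can be shown with a toy dispatcher. This is plain Python and deliberately does not use the iRODS rule language or any iRODS API. The event name `after_put`, the `RuleEngine` class and the two example rules are hypothetical.

```python
import hashlib
from collections import defaultdict


class RuleEngine:
    """Minimal event -> rules dispatcher, for illustration only."""

    def __init__(self):
        self._rules = defaultdict(list)

    def on(self, event):
        """Decorator that registers a rule for an event name."""
        def register(rule):
            self._rules[event].append(rule)
            return rule
        return register

    def fire(self, event, data_object):
        """Run every rule registered for the event, in registration order."""
        for rule in self._rules.get(event, []):
            rule(data_object)
        return data_object


engine = RuleEngine()


@engine.on("after_put")
def add_checksum(obj):
    # Policy: every stored object carries a checksum of its content.
    obj["metadata"]["sha256"] = hashlib.sha256(obj["content"]).hexdigest()


@engine.on("after_put")
def tag_project(obj):
    # Policy: the first path component names the owning project.
    obj["metadata"]["project"] = obj["path"].split("/")[1]


if __name__ == "__main__":
    stored = {"path": "/projectA/run1/sample.raw", "content": b"abc", "metadata": {}}
    engine.fire("after_put", stored)
    print(stored["metadata"])
```

Each group could register its own rules against the same events, which is the sense in which workflows can be tailored per project.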

92 Shared Co-location. Janet framework: any research organisation can contract with the supplier without a full OJEU process. Anchor tenants: UCL, King's, LSE, QMUL, Crick, Sanger. Interested: Bristol, Cancer Research Institute, Imperial; Genomics England? Physically co-locating large data sets to allow secure shared computation across them.

93 Offsite Data Centre Community Cloud Model LRI LRI UCL NIMR Clinical data sharing private networks through lightpaths? Others The Crick King s King s College College SANGER IMPERIAL Others UK JANet pilot projects expected this year. ELIXIR/CSC (Finland) have come to the same technical solutions independently. Hope to collaborate between UK and Finland to extend the connections.

94 Collaborative Space: Life Science Hub, eMedLab & beyond (?). Promote skills development (systems, informatics). Prototyping and deploying standards across multiple entities (Global Alliance). Promotes collaboration (at both IT and informatics levels: faster development, less duplication of effort, de-facto standards). Produce real-world infrastructure tools (production use across collaborating partners). Provide sandboxes (testing, development). Attractive to industry partners (hardware evaluations, new technology deployment). Prototype public cloud techniques in a private setting (safe environment). Safe haven for sensitive data that should not move to public cloud. Provide easier access to larger data sets. Pooled resources maximise capital investment benefits for small and large users.

95 WHO? MRC Medical Informatics Project UK MED-BIO: aggregation, integration, visualisation and analysis of large, complex data. Dr Sarah Butcher, Head, Bioinformatics Support Service. Applicant: Prof. Paul Elliott. Co-Is: Nicholson, Glen, Guo. Partner institutions: Imperial; Institute of Cancer Research (ICR, Ashworth); European Bioinformatics Institute (EMBL-EBI, Steinbeck); Centre for the Improvement of Population Health through E-health Research (CIPHER, Lyons); MRC Clinical Sciences Centre (CSC, Petretto); MRC Human Nutrition Research (MRC-HNR, Griffin). Industrial partners: Waters Corp., Bruker Biospin, Huawei Technologies Co. Ltd., Thomson Reuters, AstraZeneca. Award later than others (April 2014) BUT same deadlines. Science case: the Exposome. Data. The exposome concept. Strategy for knowledge generation by UK MED-BIO. Main primary data volume producer is the Phenome Centre = metabolomics. Also: NGS (exomes, genomes, targeted); proteomics (mass spec); transcriptomics and methylation-based; gut metagenomics and meta-transcriptomics; genome-wide association studies. So need to support primary data analyses AND integration and intelligent data-mining of large, heterogeneous, high-dimensional datasets (from all of the above).

96 Metabolomics Data. A single UPLC-MS profile (figure: abundance against m/z) is ~8 GB. Maximum annual throughput is 50k samples: ~2 PB of data. Intermediate data modelling will inflate this further. Raw data is copied straight to archive, with maybe two re-uses in 5 years for methods validation. De-noising can shave 15-40% off data sizes. Peak picking will extract ~1MB of data from each profile. Proprietary formats are rife; open formats are possible but tend to compress less. Pre-grant Starting Point - Storage. No central storage, and limited back-up and archiving for research data, not linked directly to the HPC centre. Phenome centre has its own limited storage capacity (250TB) and managed backups, and is projected to need a multi-petabyte raw data archive. Bioinformatics service underpins some groups but has limited (old, full) storage (~200TB) and back-up. Several crucial data management solutions sit in different places, e.g. Phenome centre LIMS server, IC Healthcare Tissue Bank database. Very little physical data centre space, on one College site only. Pressing need for a centralised tiered storage system with archiving. Pre-grant Starting Point - Compute. Heterogeneous job profiles. Heavy use of cluster and cache-coherent memory systems in a piecemeal way. Sequence-based analyses mainly on bioinformatics servers (max. 128GB RAM per server). Windows desktops for some non-scaling analyses. No shared compute environment, software stack, job scheduling or storage between all groups. Already a significant compute bottleneck for large jobs, in number of processors but particularly for jobs requiring large RAM. Some jobs already require >1 TB RAM for extended periods, and are getting larger. Requirement for sand-boxed development environments. Requirement to centrally host non-HPC services. Challenges. Make the system fit for purpose when the purpose will change over the project lifetime. Big unknowns in user requirements: new groups, new fellowships, emerging technologies, software, methods, partners. Heterogeneous user profiles. Emerging codebase, e.g. metabolomics feature extraction currently running on commercial Windows software, moving towards open source solutions on cluster (or even GPU eventually); Matlab / R code being ported to C++. Little central infrastructure to build on. Limited central knowledgebase for parallel file systems, iRODS etc. tranSMART & eTRIKS integration not specifically funded.
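
The storage figures quoted for metabolomics can be worked through as a quick sizing exercise. The sketch below uses only the per-profile size, throughput, de-noising range and peak-table size stated on the slide, with decimal units (1 TB = 1000 GB). Reading the "~2 PB" figure as roughly five years of raw data at maximum throughput is an assumption and is not stated in the source.

```python
GB_PER_PROFILE = 8               # one UPLC-MS profile, from the slide
SAMPLES_PER_YEAR = 50_000        # maximum annual throughput, from the slide
DENOISE_SAVING_PERCENT = (15, 40)  # de-noising range, from the slide
MB_PER_PEAK_TABLE = 1            # peak-picked data per profile, from the slide


def raw_tb_per_year(samples=SAMPLES_PER_YEAR, gb_per_profile=GB_PER_PROFILE):
    """Raw data volume per year in TB (decimal units)."""
    return samples * gb_per_profile / 1000


def denoised_tb_range(raw_tb, saving_percent=DENOISE_SAVING_PERCENT):
    """Return (smallest, largest) volume in TB after de-noising."""
    low_saving, high_saving = saving_percent
    return (
        raw_tb * (100 - high_saving) / 100,
        raw_tb * (100 - low_saving) / 100,
    )


def peak_tables_gb_per_year(samples=SAMPLES_PER_YEAR, mb_each=MB_PER_PEAK_TABLE):
    """Volume of peak-picked data per year in GB."""
    return samples * mb_each / 1000


def years_to_fill(capacity_tb, tb_per_year):
    """How many years of raw data fit into a given capacity."""
    return capacity_tb / tb_per_year


if __name__ == "__main__":
    raw = raw_tb_per_year()
    smallest, largest = denoised_tb_range(raw)
    print(f"raw data: {raw:.0f} TB per year")
    print(f"after de-noising: {smallest:.0f} to {largest:.0f} TB per year")
    print(f"peak tables: {peak_tables_gb_per_year():.0f} GB per year")
    print(f"years of raw data in 2 PB: {years_to_fill(2000, raw):.1f}")
```

The contrast between hundreds of terabytes of raw profiles and tens of gigabytes of peak tables is what motivates the tiered storage with archiving that the slide asks for.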

97 Location, Location, Location. South Kensington data centre: cluster nodes; SGI cache-coherent memory nodes; tiered storage; tape archive. Video wall, touch overlay for meeting centre. Tiered storage duplication site; tape archive duplication site; high memory servers. System Summary. Cluster nodes: PowerEdge C6000/C6220, Xeon E5-2660v2 2.2GHz, total 3040 cores already. High memory servers: PowerEdge R920, 7 with 1TB RAM each, 40 cores, 16TB fast internal storage, 20TB local array and InfiniBand to tier 1. Cache-coherent memory nodes: SGI UV cores, 8 TB RAM, 350TB usable locally attached scratch. Tiered storage from DDN on each of 2 sites: 350 TB usable tier 1 GPFS; 2 petabyte tier 2 WOS. TSM tape archive on Spectra T950 (2 petabyte LTO6 capacity). Asynchronous replication between sites. Layout. Where Are We Now? Unpacking, racking, installing; in use.

98 Challenges/questions
- All hardware set-up
- Existing data transferred, tiering rules configured
- Establish standardised software environment for compute
- Data flow established
- User grouping established
- Data flow outwards with partners
- THEN iRODS?? (have test setup to configure)
- Data sharing environment?
- Interaction with patient data systems
- tranSMART/eTRIKS?
- BUSINESS MODEL

Operations Group
- Full time sysadmin - TBC - being recruited
- Bioinformatician / data manager - James Abbott (Bioinformatics Support Service) + TBC
- Sarah Butcher (ops chair) - Bioinformatics Support Service
- Steve Lawlor - ICT Data Centre Manager
- Simon Burbidge - ICT HPC Manager
- Jake Pearce - NIHR/MRC Phenome Centre Data Manager

99 UVRI/MRC Medical Informatics Centre (UMIC)
PIs: Pontiano Kaleebu (MRC Uganda), Manj Sandhu (Sanger)
Budget: ~£2.9m funded by MRC
- ~£900k capital equipment
- ~£2m resource budget (staff, network connectivity, ...)
Capital spend all committed as of 12/2014
- Physical infrastructure (£280k)
  - Host building funded by Wellcome Trust (£0)
  - Existing DR building (£0)
  - Contributions to electrical upgrade onsite (£60k)
  - Data Centre and DR upgrades (£220k)
- IT equipment (£620k)
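The budget breakdown above is internally consistent, which a short script can confirm. This is only an illustrative check; the dictionary keys are labels chosen for this sketch, and all amounts are in thousands (k) as quoted on the slide.

```python
# Consistency check of the UMIC budget lines (amounts in thousands, as on the slide).

PHYSICAL_INFRASTRUCTURE = {
    "host building (Wellcome Trust funded)": 0,
    "existing DR building": 0,
    "electrical upgrade contribution": 60,
    "data centre and DR upgrades": 220,
}
IT_EQUIPMENT = 620
RESOURCE_BUDGET = 2000  # staff, network connectivity, etc.


def physical_total() -> int:
    """Sum of the physical infrastructure lines."""
    return sum(PHYSICAL_INFRASTRUCTURE.values())


def capital_total() -> int:
    """Physical infrastructure plus IT equipment."""
    return physical_total() + IT_EQUIPMENT


def budget_total() -> int:
    """Capital plus resource budget."""
    return capital_total() + RESOURCE_BUDGET


if __name__ == "__main__":
    print("physical:", physical_total())  # slide quotes 280k
    print("capital:", capital_total())    # slide quotes ~900k
    print("total:", budget_total())       # slide quotes ~2.9m
```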

100 UMIC Location
Uganda Virus Research Institute (UVRI) Campus, Entebbe, Uganda

101 UMIC Physical Infrastructure
- Offices: 30 m²
- Data Centre: 32 m²

102 UMIC Compute & Storage
Compute equipment
- 4x HP BLc7000 blade enclosures
  - Main compute resource
  - 512 cores each (AMD CPUs)
  - 4TB RAM each (8GB/core)
  - 2x 10 GbE per enclosure
- 4x HP DL380p servers
  - Virtual machine hosts (infrastructure)
  - 20 cores each
  - 256GB RAM each
Storage equipment
- 2x high speed scratch storage filesystems
  - Intel Enterprise Edition Lustre
  - 2 MDT/MGS servers (HA)
  - 4 OSS servers each
  - 256TB usable on each filesystem
- 2x long-term reliable (aka "slow") storage
  - HP SL4540 tray node servers
  - 348TB replicated across two servers (one in DR building)
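The per-enclosure figures above imply the following totals. The sketch is a simple tally, assuming binary units for RAM (1 TB = 1024 GB), under which the quoted 8GB/core figure works out exactly; the constant and function names are illustrative only.

```python
# Tally of UMIC compute and scratch capacity from the per-unit figures on the slide.
# Assumption: RAM is counted in binary units (1 TB = 1024 GB).

BLADE_ENCLOSURES = 4
CORES_PER_ENCLOSURE = 512
RAM_TB_PER_ENCLOSURE = 4

VM_HOSTS = 4
CORES_PER_VM_HOST = 20
RAM_GB_PER_VM_HOST = 256

SCRATCH_FILESYSTEMS = 2
SCRATCH_TB_PER_FILESYSTEM = 256


def compute_cores() -> int:
    """Total cores across the blade enclosures."""
    return BLADE_ENCLOSURES * CORES_PER_ENCLOSURE


def compute_ram_tb() -> int:
    """Total RAM across the blade enclosures, in TB."""
    return BLADE_ENCLOSURES * RAM_TB_PER_ENCLOSURE


def ram_gb_per_core() -> float:
    """RAM per core within one enclosure, in GB."""
    return RAM_TB_PER_ENCLOSURE * 1024 / CORES_PER_ENCLOSURE


def vm_host_cores() -> int:
    """Total cores across the virtual machine hosts."""
    return VM_HOSTS * CORES_PER_VM_HOST


def scratch_tb() -> int:
    """Total usable Lustre scratch across both filesystems, in TB."""
    return SCRATCH_FILESYSTEMS * SCRATCH_TB_PER_FILESYSTEM


if __name__ == "__main__":
    print(f"compute: {compute_cores()} cores, {compute_ram_tb()} TB RAM, "
          f"{ram_gb_per_core():.0f} GB/core")
    print(f"VM hosts: {vm_host_cores()} cores, "
          f"{VM_HOSTS * RAM_GB_PER_VM_HOST} GB RAM")
    print(f"scratch: {scratch_tb()} TB usable")
```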

103 UMIC Networking
Network equipment
- Juniper MX104 router HA pair
- Juniper SRX3400 firewall HA pair
- 5x Juniper EX4300 1GbE switches
- 3x Juniper EX GbE switches
- 3x Aruba Instant 115 wireless access points
Connectivity
- Google installing 2x (redundant) 1Gb fibre links
- Regional connectivity at up to 1Gbps via RENUnet
- Overseas connectivity initially at 10Mbps
- Resource spend will stay constant over time --> bandwidth will increase
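To give a feel for what the initial 10Mbps overseas link means for data movement, the sketch below estimates transfer times at the two quoted link speeds. The 1 TB dataset size and the `efficiency` parameter are hypothetical choices for illustration; the estimate ignores protocol overhead and link sharing unless `efficiency` is lowered.

```python
# Rough transfer-time estimate for the link speeds quoted on the slide.
# Assumptions: decimal units (1 TB = 1e12 bytes, 1 Mbps = 1e6 bits/s) and
# a hypothetical `efficiency` factor for the usable share of the link.

SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link of link_mbps."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, mbps in (("overseas, 10 Mbps", 10), ("regional, 1 Gbps", 1000)):
        days = transfer_days(1.0, mbps)
        print(f"1 TB over {label}: {days:.2f} days ({days * 24:.1f} hours)")
```

At line rate, 1 TB takes a little over nine days on the 10Mbps link and a little over two hours at 1Gbps, which is why the expected growth in overseas bandwidth matters for sharing data with partners.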

104 Management & Personnel
- Technical Infrastructure Working Group
- Scientific Working Group
- Support staff
  - Project Administrator (hired)
- Informatics staff
  - 1x Senior Bioinformatician (recruitment ongoing)
- Technical staff
  - 1x Systems Manager (hired; currently training in UK)
  - 3x Other Systems posts (recruitment in 2015 Q2)

105 MRC Medical Bioinformatics Centre, ESRC Consumer Data Research Centre, Integrated Research Campus
V1, David Golding, Tom Fleming, January 2015, University of Leeds

106 Organisational design
- Joint Projects Board (LTHT & University)
- Leeds Institute of Data Analytics (LIDA)
- MRC Medical Bioinformatics Centre (MBC)
  - Researchers, example specialisms: Clinical, Data Scientist, Statistician, Epidemiologist, Health Economist
  - Centre Director (MBC)
  - Centre Manager (MBC)
  - Research Operations (MBC): Centre Operations Team for MBC
- ESRC Consumer Data Research Centre (CDRC)
  - Researchers, example specialisms: Data Scientist, Geographer, Statistician
  - Centre Director (CDRC)
  - Centre Manager (CDRC)
  - Research Operations (CDRC): Centre Operations Team for CDRC
- Integrated Research Campus (IRC) Team
  - IT Director
  - IRC Lead
  - IRC Development Manager
  - IRC Developer (and steady state)
- University IT Operations Teams
  - Head of Service Management
  - Service Support team
  - HPC team
  - Servers and Storage Team
  - Networking Team
  - Datacentre Team
  - Desktop Support Team
  - Desktop Development Team
  - Security Team

107 Service design
Research services (Centre / IRC staff)
- Centre Operations Teams: Centre Manager (reports to Principal Investigators), Research Operations
  - Data administration
  - Data profiling
  - Data linkage
  - Data cleaning
  - Data analysis
  - Job obfuscation
  - Security
  - Data aggregation / abstraction
  - Audit
- IRC Data Services Team: IRC Development Manager (reports to UoL IT Head of Development), IRC Developer
  - Data transfer management
  - Application & environment management
  - Logical storage management
    - Storage areas and access controls for research groups
    - Storage areas and access controls for data administration services
    - Storage areas and access controls for data deposit / gateway
Technology services (IT Operations staff, UoL IT Operations)
- Desktop Development Team: physical desktops build
- Desktop Support Team: virtual desktops, applications support, operating systems
- Servers and Storage Team: virtual servers (e.g. SQL, Achiever), applications, operating systems, virtual desktop platform, virtualisation hypervisor, physical servers, physical storage
- HPC Team: High Performance Computing, applications on HPC, operating systems for HPC, storage for HPC
- Networking Team: network
- Datacentre Team: power and cooling, racks

108 Platform design
[Diagram: Integrated Research Campus platform zones and data flows; labels grouped below]
People
- Data providers and data consumers
- Researchers / data scientists outside the physical Centre (Worsley L11)
- Researchers / data scientists inside the physical Centre (Worsley L11)
- IRC Data Services Team
- Data deposited under Data Transfer Agreements; projects have ethical and governance approval
External gateway zone
- Deposit gateway: landing areas (A, B), data profiling, security, audit of datasets arrived from provider
- Publishing gateway: publishing area, security, audit
Data Services Zone
- VDI Control Service
- Data Services Working Area
  - Data Services servers: database servers, working areas, risk-profiling tools, analysis tools, linking tools
  - Data Services virtual desktops: data profiling, data cleaning, data linkage, data analysis, security, audit
- Data Services Store, with Data Services Store Manager: data controller, code controller, security, audit, storage areas, version control
  - Provided datasets, master linking table, published datasets, linked risk-profiled data sets
Research zone
- VDI Control Service
- Research Working Area
  - Research-group servers: database servers, shared-licence apps, working areas, statistics tools, analysis tools, research group sharing, collaboration tools
  - Researcher virtual desktops: data profiling, data cleaning, data linkage, data analysis, data visualisation, data exploration, security, audit
- Research Working Store, with Working Store Manager: data controller, code controller, security, audit, storage areas, version control
  - Working copies, outputs
Internal gateway zone
- Analysis gateway: analysis transit area (A, B), HPC job obfuscation, data aggregation / abstraction, security, audit
Data releases
- Internal release of datasets for risk-profiling and linking
- Internal release of linked, risk-profiled datasets
- Internal release of working copies for daily working (check-out, check-in)
- Candidates for publishing: internal release for risk profiling
- Internal release of published datasets after risk profiling / data sharing assurance / etc
- Internal release of obfuscated / anonymous data for analysis and then storage of output
- External release of obfuscated / anonymous data for analysis and then storage of output
- External release of authorised, risk-profiled datasets
System administration zone (IRC Administration)
- Virtual desktop administration
- Audit monitoring and alerting
- Identity and security management
- System update management, system update services
- System monitoring and alerting
- Application monitoring and alerting
- Virus scan management
- Directory service
- Teams: IRC Data Services Team, IT Operations Servers & Storage Team
University core systems
- Corporate system update services
- Corporate directory
- Identity and security management
High Performance Compute (IT Operations HPC team)
- LICAP
- MBC HPC (Cluster)
- BCGene
- MBC HPC (SGI UV2)
- Farr HPC (SGI UV2), HMR only
- Storage and workload labels shown: home, nobackup, scratch, data analysis
- Shared network and deployment hosts?


More information

A Cloud WHERE PHYSICAL ARE TOGETHER AT LAST

A Cloud WHERE PHYSICAL ARE TOGETHER AT LAST A Cloud WHERE PHYSICAL AND VIRTUAL STORAGE ARE TOGETHER AT LAST Not all Cloud solutions are the same so how do you know which one is right for your business now and in the future? NTT Communications ICT

More information

ArcGIS for Server: In the Cloud

ArcGIS for Server: In the Cloud DevSummit DC February 11, 2015 Washington, DC ArcGIS for Server: In the Cloud Bonnie Stayer, Esri Session Outline Cloud Overview - Benefits - Types of clouds ArcGIS in AWS - Cloud Builder - Maintenance

More information

ACANO SOLUTION VIRTUALIZED DEPLOYMENTS. White Paper. Simon Evans, Acano Chief Scientist

ACANO SOLUTION VIRTUALIZED DEPLOYMENTS. White Paper. Simon Evans, Acano Chief Scientist ACANO SOLUTION VIRTUALIZED DEPLOYMENTS White Paper Simon Evans, Acano Chief Scientist Updated April 2015 CONTENTS Introduction... 3 Host Requirements... 5 Sizing a VM... 6 Call Bridge VM... 7 Acano Edge

More information

The Impact of PaaS on Business Transformation

The Impact of PaaS on Business Transformation The Impact of PaaS on Business Transformation September 2014 Chris McCarthy Sr. Vice President Information Technology 1 Legacy Technology Silos Opportunities Business units Infrastructure Provisioning

More information

Scotland s Digital Future: Scottish Public Sector Data Centre Virtualisation Guidance

Scotland s Digital Future: Scottish Public Sector Data Centre Virtualisation Guidance Scotland s Digital Future: Scottish Public Sector Data Centre Virtualisation Guidance 1 Contents Introduction... 3 Who is the guidance aimed at?... 3 Wider strategic principles... 4 What is virtualisation?...

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

Virtual Server Hosting Service Definition. SD021 v1.8 Issue Date 20 December 10

Virtual Server Hosting Service Definition. SD021 v1.8 Issue Date 20 December 10 Virtual Server Hosting Service Definition SD021 v1.8 Issue Date 20 December 10 10 Service Overview Virtual Server Hosting is InTechnology s hosted managed service for virtual servers. Our virtualisation

More information

Hosted SharePoint: Questions every provider should answer

Hosted SharePoint: Questions every provider should answer Hosted SharePoint: Questions every provider should answer Deciding to host your SharePoint environment in the Cloud is a game-changer for your company. The potential savings surrounding your time and money

More information

Ubuntu OpenStack on VMware vsphere: A reference architecture for deploying OpenStack while limiting changes to existing infrastructure

Ubuntu OpenStack on VMware vsphere: A reference architecture for deploying OpenStack while limiting changes to existing infrastructure TECHNICAL WHITE PAPER Ubuntu OpenStack on VMware vsphere: A reference architecture for deploying OpenStack while limiting changes to existing infrastructure A collaboration between Canonical and VMware

More information

Dell Reference Configuration for Hortonworks Data Platform

Dell Reference Configuration for Hortonworks Data Platform Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution

More information

Guardian365. Managed IT Support Services Suite

Guardian365. Managed IT Support Services Suite Guardian365 Managed IT Support Services Suite What will you get from us? Award Winning Team Deloitte Best Managed Company in 2015. Ranked in the Top 3 globally for Best Managed Service Desk by the Service

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

NVIDIA GPUs in the Cloud

NVIDIA GPUs in the Cloud NVIDIA GPUs in the Cloud 4 EVOLVING CLOUD REQUIREMENTS On premises Off premises Hybrid Cloud Connecting clouds New workloads Components to disrupt 5 GLOBAL CLOUD PLATFORM Unified architecture enabled by

More information

Microsoft Analytics Platform System. Solution Brief

Microsoft Analytics Platform System. Solution Brief Microsoft Analytics Platform System Solution Brief Contents 4 Introduction 4 Microsoft Analytics Platform System 5 Enterprise-ready Big Data 7 Next-generation performance at scale 10 Engineered for optimal

More information

The Research Capability Programme. Peter Knight, Group Programme Director

The Research Capability Programme. Peter Knight, Group Programme Director The Research Capability Programme Peter Knight, Group Programme Director 11/03/2010 RESEARCH FOR PATIENT BENEFIT WORKING PARTY FINAL REPORT For us, science and research constitute a front-line service,

More information

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS data analysis José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS Cluster definition: A computer cluster is a group of linked computers, working

More information

The cross-disciplinary Roots of the British collaboration between scholars in humanities and

The cross-disciplinary Roots of the British collaboration between scholars in humanities and HALOGEN RESEARCH DATA MANAGEMENT BENEFITS CASE STUDY 1. BACKGROUND The cross-disciplinary Roots of the British collaboration between scholars in humanities and genetics at the University of Leicester (Wellcome

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations

Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations Symantec Endpoint Protection 11.0 Architecture, Sizing, and Performance Recommendations Technical Product Management Team Endpoint Security Copyright 2007 All Rights Reserved Revision 6 Introduction This

More information

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT Building Bioinformatics Capacity in Africa Nicky Mulder CBIO Group, UCT Outline What is bioinformatics? Why do we need IT infrastructure? What e-infrastructure does it require? How we are developing this

More information

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest

More information

CloudDesk - Security in the Cloud INFORMATION

CloudDesk - Security in the Cloud INFORMATION CloudDesk - Security in the Cloud INFORMATION INFORMATION CloudDesk SECURITY IN THE CLOUD 3 GOVERNANCE AND INFORMATION SECURITY 3 DATA CENTRES 3 DATA RESILIENCE 3 DATA BACKUP 4 ELECTRONIC ACCESS TO SERVICES

More information

Microsoft Hyper-V chose a Primary Server Virtualization Platform

Microsoft Hyper-V chose a Primary Server Virtualization Platform Roger Shupert, Integration Specialist } Lake Michigan College has been using Microsoft Hyper-V as it s primary server virtualization platform since 2008, in this presentation we will discuss the following;

More information

WebFOCUS Cloud Express. The WebFOCUS Cloud Express service is delivered as a managed G-Cloud service by Amtex Solutions Ltd.

WebFOCUS Cloud Express. The WebFOCUS Cloud Express service is delivered as a managed G-Cloud service by Amtex Solutions Ltd. Service Definition The name of the Service is: WebFOCUS Cloud Express An overview of WebFOCUS Cloud Express The WebFOCUS Cloud Express service is delivered as a managed G-Cloud service by Amtex Solutions

More information

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems

Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File

More information

[Type text] SERVICE CATALOGUE

[Type text] SERVICE CATALOGUE [Type text] SERVICE CATALOGUE IT Services 1 IT Support and Management Services SERVICE AREA: SERVICE DESK Users can contact the Service Desk via the phone or an online web form for all their ICT service

More information

How To Use A Vmware View For A Patient Care System

How To Use A Vmware View For A Patient Care System Delivering Epic Hyperspace Through VMware View Using Kiosk Mode and Zero Clients Reference Implementation for a VMware Point-of-Care Solution WHITE PAPER About VMware Reference Implementations VMware Reference

More information

THE UNIVERSITY OF MANCHESTER PARTICULARS OF APPOINTMENT FACULTY OF MEDICAL & HUMAN SCIENCES INSTITUTE OF POPULATION HEALTH

THE UNIVERSITY OF MANCHESTER PARTICULARS OF APPOINTMENT FACULTY OF MEDICAL & HUMAN SCIENCES INSTITUTE OF POPULATION HEALTH THE UNIVERSITY OF MANCHESTER PARTICULARS OF APPOINTMENT FACULTY OF MEDICAL & HUMAN SCIENCES INSTITUTE OF POPULATION HEALTH CENTRE FOR HEALTH INFORMATICS HEALTH DATA SCIENTIST Vacancy ref: M&HS-07879 Salary:

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information