DATA SCIENCE @ ED EDINBURGH DATA SCIENCE AND MANAGING NATIONAL DATA SERVICES AT EDINBURGH PROF MARK PARSONS



Similar documents
HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect

Data management challenges in todays Healthcare and Life Sciences ecosystems

Citrix XenApp 6.5 Advanced Administration (CXA-301)

CNS-207 Implementing Citrix NetScaler 10.5 for App and Desktop Solutions

Big Data for health. Farr Institute, Administrative Data Research Centres, Medical Bioinformatics. 9 July Jacky Pallas, UCL

Honeywell Industrial Cyber Security Overview and Managed Industrial Cyber Security Services Honeywell Process Solutions (HPS) June 4, 2014

Outgoing VDI Gateways:

GE Measurement & Control. Cyber Security for Industrial Controls

Amazon Web Services Annual ALGIM Conference. Tim Dacombe-Bird Regional Sales Manager Amazon Web Services New Zealand

Virtualization Essentials

SA Citrix Virtual Desktop Infrastructure (VDI) Configuration Guide

Citrix XenDesktop Modular Reference Architecture Version 2.0. Prepared by: Worldwide Consulting Solutions

"Charting the Course... Implementing Citrix NetScaler 11 for App and Desktop Solutions CNS-207 Course Summary

CNS Implementing NetScaler 11.0 For App and Desktop Solutions

Experience of Data Transfer to the Tier-1 from a DIRAC Perspective

Get Success in Passing Your Certification Exam at first attempt!

Solutions for Web. Citrix NetScaler

CTS Advisory Council. CTS Disaster Recovery (DR) Project. August 6, 2014

The business value of improved backup and recovery

Communication Ports Used by Citrix Technologies. April 2011 Version 1.5

SA Citrix Virtual Desktop Infrastructure (VDI) Configuration Guide

OBSERVEIT DEPLOYMENT SIZING GUIDE

ICT budget and staffing trends in the UK

Payment Card Industry and Citrix XenApp and XenDesktop Deployment Scenarios

Summary of Technical Information Security for Information Systems and Services Managed by NUIT (Newcastle University IT Service)

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

Quantum StorNext. Product Brief: Distributed LAN Client

Owner of the content within this article is Written by Marc Grote

Planning and Designing Microsoft Virtualization Solutions

ExamPDF. Higher Quality,Better service!

Course 50273B: Planning and Designing Microsoft Virtualization Solutions. Level: 300. About this Course

Boas Betzler. Planet. Globally Distributed IaaS Platform Examples AWS and SoftLayer. November 9, IBM Corporation

Extending FactoryTalk View Site Edition with Microsoft's Remote Desktop Services

Hyper-V Replica Broker Configuration Lab By Yung Chou, Microsoft Platform Evangelist,

Communication ports used by Citrix Technologies. July 2011 Version 1.5

International High Performance Computing. Troels Haugbølle Centre for Star and Planet Formation Niels Bohr Institute PRACE User Forum

CLOUD COMPUTING FOR THE ENTERPRISE AND GLOBAL COMPANIES Steve Midgley Head of AWS EMEA

Seagate Lustre Update. Peter Bojanic

IBM Grid Medical Archive Solution

Interwise Connect. Working with Reverse Proxy Version 7.x

Customer Cloud Architecture for Mobile.

MedBroker A DICOM and HL7 Integration Product. Whitepaper

Citrix Netscaler Advanced guide for SMS PASSCODE SMS PASSCODE 2014

Security Frameworks. An Enterprise Approach to Security. Robert Belka Frazier, CISSP

ICT budget and staffing trends in Healthcare

USING GENIE REMOTELY

OnX Big Data Reference Architecture

Disaster Recovery Checklist Disaster Recovery Plan for <System One>

Mississippi State University High Performance Computing Collaboratory Brief Overview. Trey Breckenridge Director, HPC

How To Build A Call Center From Scratch

Websense Support Webinar: Questions and Answers

Benefits of Image-Enabling the EHR

Tim Tharratt, Technical Design Lead Neil Burton, Citrix Consultant

Microsoft Azure Cloud oplossing als een extensie op mijn datacenter? Frederik Baert Solution Advisor

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

JASMIN/CEMS Services and Facilities

VESZPROG ANTI-MALWARE TEST BATTERY

This course is intended for IT professionals who are responsible for the Exchange Server messaging environment in an enterprise.

TECHNOLOGY LEADER IN GLOBAL REAL-TIME TWO-FACTOR AUTHENTICATION

7i Imaging on Demand PACS Solution FAQ s

IAN MASSINGHAM. Technical Evangelist Amazon Web Services

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

End-to-End Infrastructure Solutions

Unprecedented Malware Growth

WebSphere Portal 8 Using GPFS file sharing in a Portal Farm. Test Configuration

Amazon Web Services. For Government, Education, and Nonprofit Organizations. Jakob Huhn. Partner Manager Benelux, Public Sector

Reclaiming Primary Storage with Managed Server HSM

Deploying Exchange Server 2007 SP1 on Windows Server 2008

Virtualized Disaster Recovery (VDR) Overview Detailed Description... 3

Business Process Desktop: Acronis backup & Recovery 11.5 Deployment Guide

Advanced Hosting Solutions from Interoute

PROPALMS TSE 6.0 March 2008

Symantec Backup Exec 2014

Availability Acceleration Access Virtualization - Consolidation

icrosoft TMG Replacement with NetScaler

BACKUP BEST PRACTICES FOR A XENSERVER & XENDESKTOP ENVIRONMENT

CVE-401/CVA-500 FastTrack

Cray s Storage History and Outlook Lustre+ Jason Goodman, Cray LUG Denver

Transcription:

Data Science @ Edinburgh 1 DATA SCIENCE @ ED EDINBURGH DATA SCIENCE AND MANAGING NATIONAL DATA SERVICES AT EDINBURGH PROF MARK PARSONS EPCC Executive Director Associate Dean for e-research

Data Science @ Edinburgh 2 Edinburgh Data Science An initiative spanning all of the University to bring together all of the world-class research on, and using, Data Science techniques Goals for next 3 years Set the pace for strategic activities Be at the hub of the UK Data Science network Develop a safe haven for unique data assets Be deeply integrated in pursuit of bold goals Be recognised as a world centre

Data Science @ Edinburgh 3 Accumulating success Alan Turing Institute Edinburgh Genomics 2.0 Higgs Centre April 2015 Data Lab CDT in Data Science Administrative Data Research Centre Farr Institute Digital Health Institute April 2014

Data Science @ Edinburgh 4 EDS networks and I don t mean wiring Turing Farr ADRC ICs ICT Labs Networks hold > 150M UK funding in next 4 years Will influence distribution of further funding

Data Science @ Edinburgh 5 Example of EDS stimulating research integration Bold New Foundations for medicine Modernise healthcare Transformational research Molecular Pathology Precision Medicine Real-time analytics Telehealth Telecare + Appropriate computational methods Machine learning Natural language Data cleaning Image analysis + Key data with unique curation/linkage PACS Generation Scotland UK Biobank Edinburgh Genomics

Physical Presence Data Science @ Edinburgh 6 Building 9 BioQuarter Healthcare/administrative data Higgs Innovation Centre Physics/engineering data Data Technology Institute Advanced Computing Facility HPC and Big Data Easter Bush Innovation Centre Bio/genomics data

Executive Data Science @ Edinburgh 7 What EDS is and isn t An amplifier An integrator between centres A promoter of Data Science Quick to act Lightweight EDS is Research Education Innovation Architecture Policy EDS isn t A source of direct funding An integrator within centres A replacement for existing efforts A source of frequent committees Rich!

Data Science @ Edinburgh 8 Advanced Computing Facility The ACF Opened 2005 Purpose built, secure, world-class facility Houses wide variety of leading edge systems and infrastructures National services ARCHER 118,080 cores (Cray XC30) DiRAC 98,304 cores (IBM BlueGene/Q) RDF (25Pb Disk / 50Pb Tape) Local services INDY industry machine ULTRA SGI UV2000 HYDRA research system 12m expansion in 2013: 6MW, 850m 2 plant room, 550m 2 machine

Data Science @ Edinburgh 9 ARCHER Cray XC30 National HPC service contract managed by EPSRC 26 frames 118,080 cores Comes with large 5Pb work filesystem Directly linked to 25Pb Research Data Facility for storage of simulation results EPCC won Service Provision contract in August 2013 service opened in November 2013 for ~3,800 users

Data Science @ Edinburgh 10 Data management is very challenging Our ability to store data is out-stripping our ability to manage data Large supercomputers today are always slightly broken but managing this is well understood It s less clear with regard to data All EPCC s system challenges today are data related Disk and enclosure failures Unexpectedly poor performance Difficulties with software tools But...

Data Science @ Edinburgh 11 Data services On the RDF we guarantee to look after your data for a minimum of 10 years We have a 50Pb tape-based disaster recover system at KB But how should we provide data access? Today the RDF is a large data store Limited data services on top Big growth area will be proper user focussed - data services But everyone is struggling with this

Data Science @ Edinburgh 12 Example: Farr Institute Scotland has a nicely sized population ~5.5m Has a good history of archiving data E.g. 25m image sets from PACS system Joining unique resources for medical and social research data sets from Farr Institute Administrative Data Research Centre Urban Big Data Centre Pseudonymised unconsented public data for research

Internet/JA NET Firewall U1: Approved Researcher U2: Data Provider U4: Indexer U5: Research Coordinator U6: Internet User DMZ U3: Tech Support U7: Vendor Support PACS Caching Service MFT Proxy Netscaler SSL Gateway IDZ V10 PACS copy DICOM Cache Live Catalogue MFT Server Citrix Storefront Server Active Directory Server DICOM Quarantine DLE Staging Catalogue HPS Server (CIFS) SMS Radius Server Identifier DB Logging Database Management Citrix VMs VDI Nodes (Mgmt) SAFE Server For Analysis DB Private Tags DB Structured Report Text DB U3: Tech Support U7: Vendor Support PACS DICOM Archive U8: Linkage Team GPFS Project Work Spaces ANZ Analysis Desktop Citrix VDI VMs Nodes (Analysis) VDZ Data Science @ Edinburgh U3: Tech Support 13 U7: Vendor Support DR? Archive? SMZ

Data Science @ Edinburgh 14 Summary Edinburgh Data Science EPCC and its ACF data centre Data management is really hard Edinburgh has a unique set of resources and skills to work in the Big Data area Questions?