2015 THE ANALYTICS OF HIGH DIMENSIONAL DATA
The British Library, 9th June 2015
A joint venture between
Welcome to the Big Data Analytics Conference Series
Hosted by Cancer Research UK and sponsored by Winton

On behalf of Cancer Research UK and Winton we welcome you to the second annual Big Data Analytics Conference, entitled The Analytics of High Dimensional Data.

The amount of data being collected across the world has been increasing at a staggering rate. This so-called big data movement has brought with it numerous opportunities; such data are expected to become a key source of innovation, growth and knowledge. Along with these opportunities come significant challenges relating to the storage, manipulation, sharing, analysis, visualisation and interpretation of data.

There are many conferences on big data, but the defining characteristic of the Big Data Analytics Conference Series is that it aims to provide an interdisciplinary forum within which researchers in big data methodology from a wide range of disciplines, including science, healthcare, defence and e-commerce, can share ideas and solutions.

This meeting, the second in the series, will focus on the challenges in the analytics of high dimensional data. This is an area in which, in recent years, major theoretical and practical advances have been made in domains including bioinformatics, image processing and signal processing. The 2015 Big Data Analytics Conference will bring some of these diverse areas together, with the aim of cross-fertilising the advances.

To encourage interaction between delegates, speakers and panellists, we are holding a workshop session before lunch to provide opportunities to discuss key challenges, highlighted by you, along with the potential solutions. The conference has also been organised with a number of breaks throughout the day, which we hope you will use as an opportunity to engage with others from different disciplines who face similar analytic questions.
We hope that you will find this to be a stimulating meeting, with informative discussions and the potential for new collaborative opportunities.

Conference Organising Committee

Dr Jamie Meredith, Head of Programme Funding, Cancer Research UK
Professor David J Hand, Chief Scientific Advisor, Winton; Senior Research Investigator and Emeritus Professor, Imperial College London
Dr Barry Leventhal, Founder, BarryAnalytics Ltd; Chair of the Market Research Society's Census & Geodemographics Group
Professor Nick Luscombe, Winton Group Leader, Francis Crick Institute; Chair in Computational Biology, University College London
Professor Simon Tavaré, Professor of Cancer Research (Bioinformatics), Department of Oncology; Director, Cancer Research UK Cambridge Institute
Mr Giles Pavey, Chief Data Scientist, Dunnhumby
Conference Agenda

Welcome
Iain Foulkes, Executive Director of Strategy and Research Funding, Cancer Research UK

Introduction to the Conference
Professor David Hand, Chief Scientific Advisor, Winton

SESSION 1: Bioinformatics
Dr Dana Pe'er, Columbia University
Dr Wolfgang Huber, European Molecular Biology Laboratory

PANEL SESSION
Dr Dana Pe'er, Columbia University
Dr Wolfgang Huber, European Molecular Biology Laboratory
Professor David Neal, Cancer Research UK Cambridge Institute and Elsevier
Dr Francesca Buffa, University of Oxford

NETWORKING BREAK

BIG DATA SOLUTIONS WORKSHOP

LUNCH

SESSION 2: Image Processing
Professor Patrick Wolfe, University College London
Professor David Hawkes, University College London

PANEL SESSION
Professor Patrick Wolfe, University College London
Professor David Hawkes, University College London
Professor Kenneth Young, Royal Surrey County Hospital
Dr Stuart Gibson, University of Kent

NETWORKING BREAK

SESSION 3: Signal Processing
Dr Tuncer Aysal, Winton
Professor Adrian Hilton, University of Surrey

PANEL SESSION
Dr Tuncer Aysal, Winton
Professor Adrian Hilton, University of Surrey
Professor Krikor Ozanyan, University of Manchester
Professor Sofia Olhede, University College London

KEYNOTE LECTURE
Professor Dame Wendy Hall, University of Southampton

CLOSE FOLLOWED BY DRINKS RECEPTION

See the Big Data Solutions Workshop section for the list of topics to be pursued by each Solutions Workshop group.
Welcome to the Big Data Analytics Conference Series

Dr Iain Foulkes
Executive Director, Strategy & Research Funding at Cancer Research UK

Dr Foulkes was appointed Executive Director, Strategy and Research Funding in August. He is responsible for helping to shape the long-term direction of Cancer Research UK and for ensuring that we fund research of the highest quality that will have the greatest impact on cancer.

Dr Foulkes began his career as a research scientist, completing his PhD at the Cancer Research UK Beatson Institute, before becoming a science writer and medical journalist. He joined the Imperial Cancer Research Fund (ICRF) in 1999, where he worked closely with Sir Paul Nurse, who at that time was Director General, helping to drive through the merger of ICRF and the Cancer Research Campaign to become Cancer Research UK.

From January 2003, he was Director of Strategic Development in Fundraising and Supporter Marketing, and he has supported the early-stage development of The Francis Crick Institute, working with Sir Paul Nurse. Dr Foulkes now looks after the research funding teams and the strategy functions of Cancer Research UK.
SESSION 1: Bioinformatics

Dr Dana Pe'er
Columbia University
Title: Data Driven Approach to Biology

Dr Dana Pe'er is an associate professor in the Departments of Biological Sciences and Computer Science. Her lab endeavours to understand the organisation, function and evolution of molecular networks, particularly how variation in DNA sequence alters regulatory networks and leads to the vivid phenotypic diversity of life. Her team develops computational methods that integrate diverse high-throughput data to provide a holistic, systems-level view of molecular networks.

She is particularly interested in exploring how systems biology can be used to personalise care for people with cancer. By developing models that can predict how individual tumours will respond to certain drugs and drug combinations, she aims to find ways to determine the best drug regime for each patient. Her interest is not only in understanding which molecular components go wrong in cancer cells, but also in using this information to improve cancer therapeutics.

Dr Pe'er is the recipient of the 2014 Overton Prize and has been recognized with the Burroughs Wellcome Fund Career Award, an NIH Director's New Innovator Award, an NSF CAREER Award and a Stand Up To Cancer Innovative Research Grant. She was also named a Packard Fellow in Science and Engineering.

Dr Wolfgang Huber
European Molecular Biology Laboratory
Title: Distributed and Collaborative Methods and Software Development for Big Data Analysis in Genomics

Dr Wolfgang Huber leads a computational biology research group in the European Molecular Biology Laboratory (EMBL). He has appointments at the Genome Biology Unit in Heidelberg and at the European Bioinformatics Institute (EBI), one of the world's largest bioinformatics service providers, in Cambridge. He is a co-founder of the Bioconductor Project (www.bioconductor.org), one of the world's largest bioinformatics software projects.
It is a world-leading platform for the development and publication of software tools for functional genomics data analysis and modelling. In his research, Dr Huber develops bioinformatics and statistical methods for multi-omics.

Dr Huber's research group at EMBL aims to understand inter-individual differences by large-scale statistical modelling and by integrating multiple levels of genomic and molecular information from individuals with their phenotypic variation in health and disease. The group applies its work to precision oncology by working with clinical researchers to help them develop predictive assays and algorithms, which allow them to convert biological data into medical action.

Dr Huber's research is driven by new technologies, and he employs data from high-throughput sequencing (ChIP-Seq, RNA-Seq), tiling microarrays, large-scale cell-based assays and automated microscopy, as well as the most advanced methods of computational statistics.
PANEL SESSION: Bioinformatics

This session will be an interactive discussion focused on the challenges associated with big data and bioinformatics. The panel members will each have five minutes to describe particular bioinformatics challenges and the next big breakthroughs in their own domain before discussion is opened to the floor. We hope that this discussion will spark debate and promote interaction between all members of the audience.

Professor David Neal
Cancer Research UK Cambridge Institute and Elsevier

Professor David Neal is a urological surgeon and translational researcher by training and background. He spent over 11 years at the University of Cambridge developing clinical urological services and forming a new translational research group. More recently, Professor Neal has moved to a new role at Elsevier, where he will be supporting efforts to discover novel ways of supporting individual researchers. Professor Neal was awarded the Commander of the Order of the British Empire (CBE) for services to surgery in The Queen's 2014 New Year's Honours List. He was named as one of the leading UK surgeons in The Times in

Dr Francesca Buffa
University of Oxford

Dr Francesca Buffa leads the Applied Computational Genomics Group. Her work focuses on integrative functional and clinical genomics applied to molecular and radiation oncology. Dr Buffa took up her current post as Career Development Fellow in Oxford in 2013, where she leads a research team working at the interface between bioinformatics, computational genomics and biomarker research. In addition to her research programme, she teaches at national and international training courses and provides bioinformatics and biostatistics expertise for genomics and clinical research. She has authored or co-authored over 50 publications, several in high-impact journals, and has been invited to present her work at national and international conferences.
SESSION 2: Image Processing

Professor Patrick Wolfe
UCL Department of Statistical Science
Title: Big Data and Imaging: Challenges and Opportunities

Patrick Wolfe is Professor of Statistics and Honorary Professor of Computer Science at University College London, where he is a member of the Department's Senior Management Team and a Royal Society and EPSRC Established Career Research Fellow in the Mathematical Sciences. Professor Wolfe currently serves as Executive Director of the UCL Big Data Institute. Outside UCL, he serves on the editorial board of the Proceedings of the Royal Society A (Mathematical, Physical & Engineering Sciences), the Research Section Committee of the Royal Statistical Society and the Program Committee of the 2015 Joint Statistical Meetings, and as an organizer of the 2016 Newton Institute program on Theoretical Foundations for Statistical Network Analysis.

Professor David Hawkes
UCL Centre for Medical Image Computing
Title: TBA

Professor David Hawkes' research interests are focussed on both fundamental research in medical image computing and the transfer of advanced computational imaging technologies across the whole spectrum of patient management, from screening to diagnosis, therapy planning, image guided interventions and treatment monitoring. His specific interests encompass image matching, data fusion, visualisation, shape representation, surface geometry and modelling tissue deformation, promoting medical imaging as an accurate measurement tool, and image guided interventions.

Most of Professor Hawkes' translational work is done with some commercial sponsorship, and he is a strong advocate of the need for researchers to work with both healthcare providers and industry. He is co-founder of IXICO Ltd (www.ixico.com), a university spin-out that provides imaging solutions to the pharmaceutical industry, and was Director for several years.
He is on the Scientific Advisory Board of several small start-ups and research organisations. Since 2010 he has been Director of the EPSRC Programme "Intelligent Imaging: Motion, Form and Function across Scale" and co-director of the CR-UK/EPSRC joint UCL KCL Cancer Imaging Centre.
PANEL SESSION: Image Processing

This session will be an interactive discussion focused on the specific challenges associated with big data and image processing. The panel members will each have five minutes to describe particular image processing challenges and the next big breakthroughs in their own domain before discussion is opened to the floor. We hope that this discussion will spark debate and promote interaction between all members of the audience.

Professor Kenneth Young
Royal Surrey County Hospital

Professor Kenneth Young has been the Consultant Physicist in charge of the National Coordinating Centre for the Physics of Mammography (NCCPM) at the Royal Surrey County Hospital since 1990, and Visiting Professor of Medical Physics at the University of Surrey since From 2007 to 2014 he was also the Director of Research at the Royal Surrey County Hospital. He has played a leading role in developing the technical standards for mammography in the UK and across Europe and has published widely on the physics of breast cancer imaging with X-rays.

Professor Young's specific research interests include optimisation of mammographic image quality and radiation dose, standards and performance of digital mammography systems, objective assessment and clinical relevance of image quality, and simulation of the mammographic imaging process.

Dr Stuart Gibson
University of Kent

Dr Stuart Gibson is Lecturer in Forensic Science within the School of Physical Sciences at the University of Kent, as well as the Strategy Planning Society (SPS) Director of Innovation & Enterprise. He is the co-inventor of the EFIT-V facial composite system, which is currently used by the majority of UK police constabularies and in numerous other countries.
SESSION 3: Signal Processing

Dr Tuncer Aysal
Winton
Title: Signal Processing in Financial Applications

Dr. Tuncer Can Aysal joined Winton from Cornell University, where he worked on theoretical and practical aspects of signal processing and machine learning algorithms. His current research interests centre around return and risk prediction in financial markets.

Professor Adrian Hilton
University of Surrey
Title: Big Visual Data Processing for Entertainment & Health

In January 2012 Professor Adrian Hilton became Director of the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey. CVSSP is one of the largest UK research groups in computer vision, pattern recognition, signal processing and multimedia communication. Professor Hilton also leads Visual Media Research (V-Lab) in CVSSP, which is conducting research in video analysis, computer vision and graphics for next generation communication and entertainment applications. From 2008 to 2012 he was supported by a Royal Society Industry Fellowship to conduct research with leading visual-effects company Framestore, investigating 4D technologies for Digital Doubles in film production.

The goal of Professor Hilton's research is to bridge the gap between real and computer generated imagery. His research combines the fields of computer vision, graphics and animation to investigate new methods for reconstruction, modelling and understanding of the real world from images and video. Professor Hilton's current research is focused on video-based measurement in sports, multiple camera systems in film and TV production, and 3D video for highly realistic animation of people and faces. Research is conducted in collaboration with UK companies in the creative industries.
PANEL SESSION: Signal Processing

This session will be an interactive discussion focused on the specific challenges associated with big data and signal processing. The panel members will each have five minutes to describe particular signal processing challenges and the next big breakthroughs in their own domain before discussion is opened to the floor. We hope that this discussion will spark debate and promote interaction between all members of the audience.

Professor Krikor Ozanyan
University of Manchester

After several posts as researcher and academic, Professor Krikor Ozanyan joined the School of Electrical and Electronic Engineering at The University of Manchester in 1998, where he is currently Professor of Photonics Sensors and Systems. Professor Ozanyan is currently Head of the Sensors, Imaging and Signal Processing research group. His main interests are in photonic materials, devices and systems, and their application across the near UV, visible, infrared and THz spectrum. Among his achievements are the shortest wavelength ZnS/CdZnS heterostructure laser (1995), the first surface-phase diagram of InP (1997) and the pioneering of Guided Path Tomography (2005). He was appointed by the IEEE Sensors Council as Distinguished Lecturer of IEEE from 2008 to 2011 and, during that period, delivered 20 invited lectures in Sensing and Tomography.

Professor Ozanyan was the recipient of the technical award at the 2013 World Congress of Industrial Process Tomography for his presentation of a new imaging system for Photonic Guided-Path Tomography. He has edited several Special Issues on sensor topics and is currently Editor-in-Chief of the IEEE Sensors Journal.

Professor Sofia Olhede
University College London

Professor Sofia Olhede is Professor of Statistics at University College London and holds an honorary chair in the UCL Computer Science department. Professor Olhede's research interests are big data, networks, non-stationary and non-linear time series and random fields, and time-scale and time-frequency inference, with applications in ecology, finance and oceanography. She is currently associate editor for IEEE Transactions on Signal Processing. She is the Royal Statistical Society Isaac Newton Institute correspondent and a member of the ICMS programme committee.
KEYNOTE SPEECH: The opportunities and challenges of interdisciplinary big data analytics

Professor Dame Wendy Hall
University of Southampton

Dame Wendy Hall, DBE, FRS, FREng is a Professor of Computer Science at the University of Southampton, and is the Executive Director of the Web Science Institute. She was Dean of the Faculty of Physical Sciences and Engineering from 2010 to One of the first computer scientists to undertake serious research in multimedia and hypermedia, she has been at its forefront ever since. The influence of her work has been significant in many areas including digital libraries, the development of the Semantic Web, and the emerging research discipline of Web Science. Her current research includes applications of the Semantic Web and exploring the interface between the life sciences and the physical sciences.

With Sir Tim Berners-Lee and Sir Nigel Shadbolt she co-founded the Web Science Research Initiative in 2006, and she is currently a Director of the Web Science Trust, which has a global mission to support the development of research, education and thought leadership in Web Science.

In addition to playing a prominent role in the development of her subject, she also helps shape science and engineering policy and education. Through her leadership roles on national and international bodies, she has shattered many glass ceilings, readily using her position to promote the role of women in SET and acting as an important role model for others. She became a Dame Commander of the British Empire in the 2009 UK New Year's Honours list, and is a Fellow of the Royal Society.
She has previously been Senior Vice President of the Royal Academy of Engineering and a member of the Prime Minister's Council for Science and Technology, and was a founding member of the European Research Council and Chair of the European Commission's ISTAG. She was elected President of the Association for Computing Machinery (ACM) in July 2008, and was the first person from outside North America to hold this position. She is currently a member of the World Economic Forum's Global Council on Artificial Intelligence and Robotics, and is a member of the Global Commission on Internet Governance. She is also Chair of the British Council's Education Advisory Group.
Big Data Solutions Workshop

The aim of this session is to provide an interactive environment in which to discuss solutions to the major challenges that we all face in working with big data. The topics for discussion have been selected from the responses provided through the conference registration, and you will have the opportunity to sign up to the one that is of most interest to you. This session will consist of several topic discussion groups, running simultaneously, with the aim of coming up with the top 3 challenges and the top 3 directions of focus that could benefit from a collaborative approach.

The topics are as follows:

Integration of Datasets
A growing topic of importance in the face of heterogeneous data structures. Including data from multiple sources as well as different types of data (e.g. clinical, genomic, etc.)

Data Quality
Always important, and especially so for large data sets which may conceal unseen and unexpected quality issues. Including (but not limited to) data access, collection, annotation and infrastructure.

Data Processing
Assessing the issue of computational constraints vs optimality. Simple or complex models; which are more useful, and possibly more realistic, models in the face of massive data sets?

Data Analysis
A broad but important topic encompassing issues such as the complexity of analysis (e.g. analysis of graphs and networks) and novel inferential challenges arising from large data sets.

Data Ethics
To use big data without appropriate safeguards may be unethical, but to fail to use available data to shed light on human problems may also be unethical. An important topic discussing the ethical considerations associated with using big data.

Administrative Data
Incorporating novel statistical challenges (e.g. sampling variation is irrelevant).

All feedback from the sessions will be available online at after the conference.
The Francis Crick Institute
The venue for the 2016 Big Data Analytics Conference on Ethical Issues in Big Data

The Francis Crick Institute, named after the visionary British scientist who played a key role in discovering the structure of DNA, will be a groundbreaking inter-disciplinary centre for research into the biology of human health. Based in King's Cross, the institute is a collaboration between six of the world's most highly regarded medical research organisations (Cancer Research UK, the UK's Medical Research Council, The Wellcome Trust, UCL (University College London), King's College London and Imperial College London), bringing together 1,500 world-leading scientists from different disciplines under one roof.
The institute will focus on the major diseases that affect our global population, including cancer, cardiovascular disease, neurodegenerative diseases and infectious diseases. The emphasis at the institute will be on collaborative approaches to unravelling the interacting networks of genes and molecules, and the environment underpinning living processes, in order to tackle the root causes of these diseases.

The institute will be led by Nobel Laureate Sir Paul Nurse, who brings a combination of ambitious scientific vision and experience in leading some of the world's most renowned research organisations. Its unrivalled leadership, state-of-the-art facilities and opportunities for novel collaboration will make the institute a magnet for the brightest scientists from around the world, creating a hub of world-class scientists working at the peak of their creativity.

Following today's conference a Drinks Reception will be held in the British Library's Terrace Restaurant, which boasts unique views of the Francis Crick Institute. Please join us to learn more about the vision for this exciting new institute.

For more information contact Cancer Research UK
Citizen Science Initiative

At Cancer Research UK we're always looking for new ways to accelerate our progress towards our vision: to bring forward the day when all cancers are cured. We believe crowd sourcing can help us to do this, and our Citizen Science programme has already given us encouraging results to suggest there is great potential for this approach.

Our first Citizen Science project, Cell Slider, transformed tumour tissue microarray data into an interactive website. The public can view images of breast cancer cells and click to cure from anywhere. We've now had over two million classifications of data that ultimately will help scientists to predict how well a patient will respond to treatment. These results were achieved in a fraction of the time it would have taken a team of scientists, saving more time for our researchers to answer the really big questions about cancer.

Play to Cure: Genes in Space is the world's first mobile phone game designed to speed up the identification of crucial genetic clues about cancer. Genes in Space was launched in February 2014 and had tens of thousands of downloads within the first 24 hours. It is already producing analysed datasets of DNA microarray data, which we hope will improve upon current analysis methods to provide more accurate information about genetic faults in breast cancer.

The Big Data Analytics Conference Series is an excellent forum to discuss what other data challenges crowd sourcing could unlock. For more information or to contact the programme team,

About the Conference Organisers

Cancer Research UK

Cancer Research UK (CRUK) is the largest independent cancer research charity in the world. It was established in 2002 after the merger of the Cancer Research Campaign and the Imperial Cancer Research Fund. CRUK supports the work of over 4,000 scientists, doctors and nurses across the UK.
Every year CRUK spends hundreds of millions of pounds on a wide range of research areas including cancer prevention, basic research that will improve our fundamental understanding of cancer, drug discovery and development, and clinical trials. Cancer Research UK is excited to host the second annual Big Data Analytics Conference alongside Winton to promote the advantages of utilising Big Data in a variety of specialities. At CRUK, we recognise the increasing relevance and importance of big data in cancer research, and are looking forward to the opportunities that will arise from this conference series.

Winton

Winton is a global investment management company specialising in mathematical analysis of financial time series. Winton has assets under management in excess of US$30 billion. Winton has over 400 employees based in offices in London, Hong Kong, New York, Oxford, Shanghai, Sydney and Zurich.
2016 ETHICAL ISSUES IN BIG DATA
Next Year's Conference

The big data promise is that it will revolutionise science and boost economic success. But all advanced technologies are bound by ethical considerations. Big data, especially data referring to customers, patients or taxpayers (in short, data referring to humans), raise issues of confidentiality and privacy, as well as appropriate use. These issues are bound up with data protection acts, freedom of information acts and other legal constraints, as well as technical questions about whether anonymity is actually possible in the modern world of social networks and data linkage. To use big data without appropriate safeguards may be unethical, but to fail to use available data to shed light on human problems may also be unethical.

The Ethical Issues in Big Data Conference will be the first meeting of the Big Data Analytics Conference Series to be held at the Francis Crick Institute, shortly after the institute opens in 2016.