BILS Annual Report 2013 Contents




Annual Report

Contents

Foreword
Introduction
Organisation
Board
Scientific Advisory Board
Reference group
Director
Technical coordinator
Administration and Economy
Staff 2013
BILS Activities 2013
Growth of BILS
Open call
Infrastructure
System Mosler for high security compute and storage
Web site and Contact routes
Biosupport
Data publication service
Coordination with Science for Life Laboratory
Coordination with other research infrastructures
Consultancy
Highlights from consultancy projects
Identification of biomarkers important to ALS
Training activities
Major courses organised by BILS
Other courses where BILS staff has been engaged in lecturing/training
Outreach activities
Annual Symposium and User Meeting
Site Visits and All Hands Meetings
Weekly chats
Meeting with Scientific Advisory Board
ELIXIR
Human Protein Atlas
Other international activities
Economic report
Outlook for 2014
Appendix 1 BILS staff 2013
    Dag Ahrén, Lund University
    Andrey Alexeyenko, Karolinska Institutet
    Magnus Alm Rosenblad, Gothenburg University
    Eva-Britt Berglund, Linköping University, Economy
    Ann-Charlotte Berglund Sonnhammer, KTH
    Jorrit Boekel, Karolinska Institutet
    Mikael Borg, Stockholm University, Technical coordinator
    Ino de Bruijn, KTH
    Moritz Buck, Uppsala University
    Joakim Bygdell, Umeå University
    Luciano Fernandez, Gothenburg University
    Eva Freyhult, Uppsala University
    Pontus Freyhult, Uppsala University, UPPMAX
    David Gomez-Cabrero, Karolinska Institutet
    Jonas Hagberg, Stockholm University
    Marc Hoeppner, Uppsala University
    Lukasz Huminiecki, Stockholm University
    Yvonne Kallberg, Karolinska Institutet
    Diarmuid Kenny, Gothenburg University
    Samuel Lampa, Uppsala University, UPPMAX
    Henrik Lantz, Uppsala University
    Malin Larsson, Linköping University
    Fredrik Levander, Lund University
    Sara Light, Stockholm University
    Jessica Lindvall, Karolinska Huddinge
    Daniel Lundin, KTH
    Henrik Lysell, SLU Uppsala
    Jia Mi, Uppsala University
    Intawat Nookaew, Chalmers
    Johan Nylander, NRM (Swedish Museum of Natural History)
    Bengt Persson, Uppsala University, Director
    Rui Climaco Pinto, Umeå University
    Katarina Truvé, Gothenburg University
    Jeanette Tångrot, Umeå University
    Mats Töpel, Gothenburg University
    Estelle Wera, SLU Alnarp
    Victoria Westling, Linköping University, Administration
    Kristin Wiberg, Linköping University, Administration
Appendix 2 List of projects 2013
Appendix 3 Publications 2013
Appendix 4 Abbreviations

Foreword

During 2013, BILS has continued to grow, and it is with great pleasure that we can provide bioinformatics support to many projects nationwide. Our staff now consists of 39 skilled experts who have contributed to over 400 projects. Still, the demand for bioinformatics analyses is increasing, and we will continue to recruit the necessary expertise.

In the autumn, we launched the BILS Annotation Platform, which provides advanced support in annotating complete genomes. We believe that this is an area where specialised expertise will facilitate the annotation process. The platform provides both expertise and annotation software, and the projects are run in close collaboration with the customers. We have also initiated the acquisition of a highly secure system for analysis and storage of sensitive data, in close collaboration with the SNIC centre UPPMAX.

In December, the European infrastructure for biological information, ELIXIR, was formally inaugurated. ELIXIR is now a legal entity, and the process of setting up the nodes across Europe is ongoing.

In this report, you will find more details about our activities during 2013. We also provide regularly updated information at the BILS web site http://bils.se.

Finally, I would like to thank all the skilled and devoted co-workers in BILS and all our new and returning customers for valuable and interesting collaborations. I would also like to thank our engaged Board, Reference Group and Scientific Advisory Board for all their important contributions.

Linköping/Stockholm, 15 February 2014
Bengt Persson, Director of BILS

Introduction

BILS (Bioinformatics Infrastructure for Life Sciences) is a distributed national research infrastructure supported by the Swedish Research Council. BILS is hosted by Linköping University. The aim of BILS is to provide bioinformatics infrastructure and support for life science researchers in Sweden, at both local and national levels. The organisational structure allows for changes in services over time as new techniques are developed and utilised. BILS also matches international efforts, such as the new European infrastructure for biological information, ELIXIR, where BILS is the Swedish node and coordinates the Swedish contributions to the ELIXIR infrastructure.

There is a need for a coordinated national bioinformatics infrastructure for life sciences for several reasons:
- to match international efforts, of which ELIXIR is the most prominent, and make it possible for Swedish researchers to utilise these resources efficiently, including coordination of Swedish contributions within ELIXIR.
- to coordinate the activities at various sites in order to provide efficient and front-line research-based support to Swedish life science groups.
- to organise and provide sustainable large-scale storage of data from high-throughput sequencing, genomics and proteomics analyses, in close collaboration with SNIC (Swedish National Infrastructure for Computing) centres.
- to provide computational resources within bioinformatics and neighbouring disciplines for large-scale data processing, in close collaboration with SNIC centres.
- to provide a network between bioinformatics sites in Sweden.

Organisation

BILS is organised as a distributed national research infrastructure with a number of nodes. The nodes provide support for bioinformatics issues, e.g. consultancy in experiment planning, analysis and interpretation of results, bioinformatics calculations, and training. Each node is responsible for one or several areas of expertise. Depending on user needs, some areas are represented at more than one node.

During 2013, the need for additional coordinators in major support areas has increased as the BILS staff has grown in number. We now have coordinators in the areas of technical matters, genomics, proteomics, and systems development. During 2014, we also plan to appoint a training coordinator.

Figure 1 Distributed BILS infrastructure

Board
- Niklas Blomberg, AstraZeneca, chairman and board member until 30 April 2013
- Siv Andersson, Uppsala University, chairman from 1 May 2013
- Peter James, Lund University
- Kerstin Johannesson, Gothenburg University
- Inge Jonassen, University of Bergen
- Anders Krogh, Copenhagen University

Scientific Advisory Board
- Amos Bairoch, Switzerland
- Ewan Birney, United Kingdom
- Jaap Heringa, the Netherlands
- Michal Linial, Israel
- Kathryn Lilley, United Kingdom
- Lars Malmström, Switzerland
- Torben Falck Ørntoft, Denmark

Reference group
- Anders Blomberg, University of Gothenburg
- Erik Bongcam-Rudloff, SLU
- Arne Elofsson, Stockholm University
- Juha Kere, Karolinska Institutet
- Jan Komorowski, Uppsala University
- Jan-Eric Litton, Karolinska Institutet
- Joakim Lundeberg, Royal Institute of Technology (KTH)
- Jens Nielsen, Chalmers
- Uwe Sauer, Umeå University
- Erik Sonnhammer, Stockholm University
- Jesper Tegnér, Karolinska Institutet
- Johan Trygg, Umeå University
- Anders Tunlid, Lund University
- Björn Wallner, Linköping University

Director
Bengt Persson, Linköping University/Uppsala University

Technical coordinator
Mikael Borg, Stockholm University

Figure 2 Organisation of BILS

Administration and Economy
Administration and economy have been handled by Victoria Westling and Eva-Britt Berglund at Linköping University.

Staff 2013

During 2013, BILS increased its staff from 26 to 39 persons. BILS has staff located at all six sites: Umeå, Uppsala, Stockholm, Linköping, Gothenburg and Lund. All staff are listed in the table below.

Figure 3 Group picture of 28 BILS staff in Umeå February 2014

Activity: Director 50%; Technical coordinator 100%; Genomics coordinator 25%; Proteomics coordinator 25%; System development coordinator 25%; Genomics (Large-scale sequencing); Genome annotation platform; Proteomics; Systems biology; Metabolomics/Metabolic modelling; Databases, phenomics; Phylogenomics; General bioinformatics SLU; Protein bioinformatics and sequence analysis; Biostatistics; System development; ELIXIR integration of HPA; Economy; Administration

Name: Bengt Persson; Mikael Borg; Magnus Alm Rosenblad; Fredrik Levander; Jonas Hagberg; Dag Ahrén, LU; Estelle Wera, SLU (20%); Ino de Bruijn, KTH; Jeanette Tångrot, UmU; Jessica Lindvall, KI (50%); Katarina Truvé, GU; Lukasz Huminiecki, UU; Magnus Alm Rosenblad, GU; Malin Larsson, LiU; Mats Töpel, GU; Moritz Buck, UU; Marc Höppner, UU; Henrik Lantz, UU; Diarmuid Kenny, GU; Fredrik Levander, LU; Jia Mi, UU; Joakim Bygdell, UmU; Jorrit Boekel, KI; Andrey Alexeyenko, KI; Intawat Nookaew, Chalmers; Rui Pinto, UmU; Luciano Fernandez, GU; Johan Nylander, NRM; Henrik Lysell, SLU; Yvonne Kallberg, KI; Sara Light, SU; Eva Freyhult, UU; Samuel Lampa, UPPMAX (50%); Pontus Freyhult, UPPMAX (50%); Jonas Hagberg, SU; Kalle von Feilitzen, KTH; Pär Oksvold, KTH; Eva-Britt Berglund, LiU (5%); Victoria Westling, LiU (15%)

More detailed descriptions of the staff and their expertise are given in Appendix 1.

BILS Activities 2013

BILS activities are, in line with our strategy plan, predominantly infrastructure and consultancy. Consultancy means that BILS staff help with bioinformatics issues in research projects. Infrastructure comprises work on large-scale sequencing data storage, mass spectrometry proteomics data storage, setting up analytical pipelines, work on the BILS web pages, and similar tasks. BILS staff have also been engaged in several training activities. Furthermore, some time has been allotted to the staff's own education and to meetings, both internal ones and external ones where we present BILS activities and advertise our services.

During 2013, BILS has handled 324 consultancy projects and 124 infrastructure projects, an increase from 2012, when the corresponding numbers were 184 and 76, respectively. A total of 268 PIs have been served, of which 190 are new and 78 are returning. For comparison, 134 PIs were served in 2012.

BILS works on a truly national basis, as shown by the distribution of PIs (Figure 4) and project time (Figure 5) among all major Swedish universities. Support requests are distributed by the area coordinators according to available expertise, so that all experts take on projects from all of Sweden.

Figure 4 Distribution of PIs between Swedish organisations (chart values: FOI 1; LTH 1; Norrlands Universitetssjukhus 1; Polismyndigheten 1; SVA 1; WABI 1; ÖrU 1; AstraZeneca 1; NGI 2; Sahlgrenska University Hospital 3; KI 38; Chalmers 6; KTH 9; SLU 11; SU 17; NRM 22; UU 37; LiU 22; GU 28; UmU 23; LU 28)

Figure 5 Distribution of project time between Swedish organisations (chart labels: Norrlands Univ.sjukh., UPPNEX, LnU, ÖrU, CLiC, SVA, UPPMAX, LTH, NGI, Sahlgrenska Univ Hosp, FOI, SciLifeLab, KTH, Chalmers, Sahlgrenska Core Facility, LiU, Karolinska University Hospital, AstraZeneca, Other, UU, SLU, UmU, LU, GU, NRM, SU, KI)

Growth of BILS

The utilisation of BILS, as measured by the number of projects and the number of PIs, has increased considerably every year since 2011, reflecting the growing demand for bioinformatics analyses. This growth has occurred even as more and more research groups recruit and train their own bioinformatics expertise. BILS meets the growing demand by increasing its staff. It seems likely that the demand will continue to grow as high-throughput techniques become more and more widely used. Projections indicate that by 2018 our staff will need to number around 100 in order to keep pace with user requirements.

Figure 6 Growth of staff, projects and PIs 2011-2013, with future trends indicated by dotted lines (series: Staff, Projects, PI; vertical axis 0 to 1000)
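The report does not state how the 2018 staffing projection was derived. As a rough plausibility check only, the sketch below extends a straight line through the two staff counts given in this report (26 before and 39 after the growth during 2013). Treating those counts as end-of-2012 and end-of-2013 values, and assuming linear growth, are both assumptions made here for illustration; the function name is hypothetical.

```python
def linear_projection(year_a, value_a, year_b, value_b, target_year):
    """Extend the straight line through two (year, value) points to target_year."""
    slope = (value_b - value_a) / (year_b - year_a)
    return value_b + slope * (target_year - year_b)


# Staff counts from the report: 26 at the start and 39 at the end of 2013.
# Mapping them to the years 2012 and 2013 is an assumption of this sketch.
staff_2018 = linear_projection(2012, 26, 2013, 39, 2018)
print(f"Straight-line staff estimate for 2018: {staff_2018:.0f}")
```

Under these assumptions the estimate is of the same order as the figure of around 100 quoted above, although the actual projection may well rest on project and PI trends rather than on staff numbers alone.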

In order to meet the increased requirements, BILS acts along three lines:
- increased consultancy to help users
- increased training activities to educate users (predominantly PhD students and post-docs)
- provision of more user-friendly infrastructure (tools and databases), enabling researchers to perform more bioinformatics analyses on their own

Open call

There was an open call in September 2013 for suggestions for additional activities. A total of 13 applications were submitted, of which the board supported 7, deferred 3 for consideration during 2015, and found that 3 would fit within the normal BILS support.

Infrastructure

One important part of BILS is the formation of a sustainable bioinformatics infrastructure for life sciences. The BILS infrastructure is typically constructed as domain-specific supporting layers that utilise SNIC resources for the computational needs. These computational and storage needs are set up in close collaboration with SNIC. In this respect, BILS is the content provider while SNIC is the hardware provider.

Figure 7 Fido user friendly front-end to bioinformatics tools

Methods and software developed within BILS are made publicly available to the scientific community. We will also facilitate publication of Open Access Data. One important area is data storage, where BILS allocates resources to implement the technical solution iRODS (Integrated Rule Oriented Data System) for storage of large-scale sequence data. Two software developers at UPPMAX are co-funded by BILS. BILS has previously also created a data repository for mass-spectrometry proteomics data, which did not exist in Sweden before BILS. In collaboration with the SNIC centre NSC, we have also developed a secure and robust user interface named Fido. It has been further developed during 2013, but at a slower pace than planned due to lack of staff. Two recruitments will take place in early 2014 in order to increase our systems development resources.

Figure 8 UPPMAX graphical client

As the main part of the computing done in academic life sciences is carried out at UPPMAX, BILS maintains an especially strong collaboration and coordination with the SNIC centre UPPMAX and has co-funded computational and storage resources as well as staff during 2013.

One interesting project carried out by this staff is the introduction of a remote desktop service with traditional and web-based clients, allowing new users to connect and start working from a web browser. Work has also been done on the storage services, allowing a speed-up of up to ten times for data-intensive jobs. Additionally, efforts have been made to develop the Bioclipse workbench platform to allow easier interaction with HPC resources, and to support investigations of the optimal use of cloud systems and the Hadoop framework for distributed computing in life science.

Figure 9 UPPMAX graphical client

System Mosler for high security compute and storage

During 2013, as stated in our strategy plan, BILS has investigated and planned for a high-security system for compute and storage, named Mosler. The hardware was procured during late 2013, and the system will be set up in early 2014 together with the SNIC centre UPPMAX. Mosler is a secure infrastructure to enable research on sensitive data, based upon the Norwegian system TSD 2.0, developed by Gard Thomassen and co-workers at the University Centre for IT (USIT) at the University of Oslo (UiO). All projects on the system will have their own set of virtual machines (VMs). Users access their VMs via a secured graphical terminal using two-factor authentication, and all communication is encrypted. In this way, the data is controlled and secured in the central secure vault, and the risk of losing data is minimised. The environment is secured by a strict firewall, and no internet access is allowed within the environment.

Web site and Contact routes

BILS puts essential information at http://bils.se, where we present BILS and our staff, activities and support information. We have a support queue (support@bils.se), which creates tickets in our internal project management system. Furthermore, we are part of the http://biosupport.se system (cf. below). However, the most common contact route is personal contact with BILS staff. At the BILS web page, we also provide links to Swedish bioinformatics tools and to useful data resources. We have initiated a wiki site (http://wiki.bils.se) where we publish information and practical guidelines that will be useful both for BILS staff and for our users.

Biosupport

We continue to support the Biosupport platform, launched in 2012 as a joint effort between BILS, SciLifeLab, UPPNEX and SLU. The aim is to provide a single unified front for bioinformatics support in Sweden, to avoid confusing users with multiple initiatives. Martin Dahlö at SciLifeLab Uppsala, together with BILS and UPPNEX staff, has been instrumental in this initiative.

In the Biosupport forum, users can ask and answer questions, and comment and vote on questions posed by others and on their answers. The forum is moderated by its members and by support representatives from the involved parties. The support representatives have a rolling weekly support duty schedule. The person on support duty is responsible for assigning open questions to experts.

Figure 10 Biosupport web interface

Data publication service

BILS is committed to helping Swedish scientists publish their data so that the data is searchable, citable and reliably stored. To this end, BILS collaborates with SND (Swedish National Data Service), through which BILS can mint DataCite Digital Object Identifiers (DOIs), an international standard for permanent identifiers of research data. The data publication activities will be integrated with an iRODS-based data management system that is being developed in collaboration with SNIC-UPPMAX and SNIC-NSC. BILS has also initiated contacts with ECDS (Environment Climate Data Sweden) regarding best practices for publishing data. During 2014, we expect an increase in the number of issued DOIs, and we also plan to automate these steps next year. Furthermore, BILS assists researchers in creating data publication plans, including provision of suitable template documents, which are available at our wiki pages.

Coordination with Science for Life Laboratory

Science for Life Laboratory (SciLifeLab) is an important establishment for high-throughput analysis. BILS has several staff placed at SciLifeLab, and these positions are co-funded by SciLifeLab. Our collaboration with WABI (Wallenberg Advanced Bioinformatics Infrastructure) staff at SciLifeLab has increased considerably during 2013, with common meetings and exchange of knowledge. BILS and WABI collaborate on outreach activities, at information meetings as well as on the respective web sites. Furthermore, the organisations frequently refer support requests to each other.

Figure 11 Current steering (diagram labels: BILS board; SciLife NRK; Not yet assigned; BILS; WABI; SciLife Bioinformatics platform, incl. UPPNEX)

In order to further increase the coordination, BILS, WABI and SciLifeLab made a joint proposal to VR in September 2013 that the BILS board should also function as a board for WABI and the SciLifeLab Bioinformatics Platform. We expect that the extended functions of the BILS board will start during 2014.

Figure 12 Future steering (diagram labels: Extended BILS board; BILS; WABI; SciLife Bioinformatics platform, incl. UPPNEX)

Coordination with other research infrastructures

BILS has regular meetings with SILS, since the two infrastructures work in neighbouring and partly overlapping fields, in order to coordinate infrastructure investments and staff hiring by BILS. During 2013, the BILS Director has had regular meetings with the SNIC Director. Furthermore, during 2013 there has been one meeting with all BMS infrastructure directors for mutual information exchange and coordination.

Consultancy

During 2013, BILS has continued to provide expertise in the areas of large-scale sequencing (NGS), metagenomics, biostatistics, phylogenomics, metabolomics, systems biology, protein bioinformatics, and mass-spectrometry proteomics. We have also recruited staff in the areas of genome annotation, marine genomics, and software development. A full list of all staff is given in Appendix 1.

Highlights from consultancy projects

Publications from 2013 in which BILS staff are co-authors or are acknowledged are listed in Appendix 3. In addition, BILS staff have contributed shorter consultancies that do not warrant co-authorship on the manuscript. Finally, much of the work performed during 2013 has not yet been compiled into a publication. An example of a project to which BILS staff have contributed is described below.

Identification of biomarkers important to ALS

Background: The search for useful biomarkers in patient samples is central to many areas of neurological research, diagnostics and drug discovery. Challenged to characterise complex neurological processes, investigators desire high-performance tools that offer simultaneous measurements of several low-abundance proteins in limited sample volumes. Multiplex solid phase proximity ligation assay (multiplex SP-PLA) has been shown to provide sensitive multiplex measurements of protein levels with maintained target selectivity.

Objectives: To measure levels of 47 proteins by multiplex SP-PLA in samples from neurologically healthy controls and from patients with amyotrophic lateral sclerosis (ALS), in order to identify ALS disease and treatment biomarker candidates.

Methods: Samples were collected from 20 ALS patients and 20 age-matched controls. Levels of 47 proteins were analysed in all samples using multiplex SP-PLA. The protein levels are measured as Ct values (cycle threshold, i.e. the number of PCR cycles required to reach a threshold) and can be translated to a concentration value by the use of a biomarker-specific standard curve. This procedure involves outlier detection, curve fitting and determination of the limits of detection. The resulting concentration values are used to determine the concentration level in the samples, but the Ct values are used in the statistical analysis. Measurements for biomarker discovery were compared univariately using the non-parametric Mann-Whitney U test. Using the 0.05 significance level and Bonferroni multiple test correction, four biomarkers with a significant difference between patients and controls were identified. In an attempt to find combinations of biomarkers that could distinguish between patients and controls better than a single biomarker, multivariate classification models were built using random forests. The models were built and evaluated in a repeated holdout procedure, including a variable subset selection based on Mann-Whitney U test statistics.

Figure 13

Results: The levels of four biomarkers were found to be significantly lower in the ALS patient group than in the age-matched control group. The multivariate classifiers had high performance (a mean AUC of 0.95, a mean probability of detection of 0.835 and a mean probability of false alarm of 0.095), although models based on only a single marker had similar performance measures, see Figure 13.

Conclusions: The four ALS biomarker candidates identified in this study should be further validated in other ALS patient cohorts. If they prove valid, they may increase understanding of ALS pathophysiology and aid in ALS diagnosis, prognosis and development of treatment.
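The univariate screening step described in the Methods can be sketched as follows. This is a minimal illustration and not the project's actual code: it uses only the Python standard library, computes the two-sided Mann-Whitney U test with the normal approximation (tie-corrected, without continuity correction), and applies the Bonferroni threshold of 0.05 divided by the number of markers tested. The function names and the Ct values in the usage example are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, p-value). Tie correction is applied;
    no continuity correction is used (a simplification in this sketch).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all observations identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def bonferroni_significant(p_values, alpha=0.05):
    """Names whose p-value is below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical Ct values (patients, controls) for two made-up markers.
ct = {
    "marker_A": ([24.1, 24.8, 25.0, 25.3, 25.9, 26.2], [21.0, 21.4, 22.2, 22.5, 23.0, 23.3]),
    "marker_B": ([22.0, 23.1, 24.0, 25.2, 26.1, 27.0], [22.4, 23.0, 24.3, 25.0, 26.5, 26.8]),
}
p_values = {name: mann_whitney_u(pat, ctl)[1] for name, (pat, ctl) in ct.items()}
print(bonferroni_significant(p_values))
```

In the study itself, 47 proteins were tested, so the per-marker threshold would be 0.05/47. With 20 samples per group the normal approximation is usually considered adequate, but an exact test from a statistics package would be the more careful choice.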
BILS staff in this project: Eva Freyhult, Uppsala University

Training activities

During 2013, BILS staff have continued to be involved in a multitude of training activities, ranging from participation in advanced bioinformatics courses, graduate student courses and similar, to individual training of researchers in order to teach them new bioinformatics tools and to help them utilise bioinformatics tools more efficiently. These training activities are an efficient way to increase the flow of projects through the BILS organisation, by helping scientists to perform parts of the bioinformatics analyses themselves. Many training activities are organised in close collaboration with SciLifeLab, WABI and UPPNEX.

Major courses organised by BILS:
- BILS Genome Browser Course, 18 Feb 2013, SciLifeLab Stockholm
- BILS Genome Browser Course, 16 Oct 2013, SciLifeLab Stockholm
- BILS Genome Browser Course, 22 Oct 2013, SciLifeLab Stockholm
- BILS Workshops Genome assembly/metagenomics, SciLifeLab Uppsala, Nov 2013
- BILS Genome Browser Course, 28-29 Nov 2013, Gothenburg
- BILS Multivariate course for expression data, Nov 2013, Lund

Other courses where BILS staff has been engaged in lecturing/training

This list also includes contributions by the SciLifeLab Uppsala Bioinformatics Facility.
- The 2013 Workshop on Genomics, 7-18 January 2013, Český Krumlov, Czech Republic
- NABiS Samlingsvård och biodiversitetsinformatik, Stockholm University, 16 Jan 2013 (NABiS = Nordic Academy of Biodiversity and Systematic Studies)
- "Multivariate data analysis for metabolomics", Swedish NMR center, Gothenburg, 28-30 January 2013
- Phylogenetic systematics and molecular dating, Copenhagen University, 29-30 January 2013 (2 days)
- Basic Bioinformatics (SciLife Uppsala), Feb 2013
- Molecular Genetics, Umeå University, 28 Feb 2013
- Bioinformatik och genomanalys, 5MO083, Umeå University, 23 May 2013
- Bioinformatics and Functional Genomics, Gothenburg University (25/3 to 5/6), BILS contributions 2 days
- Genome analysis (SciLife), Uppsala, April 2013
- Human genetic variation workshop (WABI), SciLife Stockholm, 10-11 June 2013
- RNA-seq workshop (WABI), SciLife Stockholm, 12-13 June 2013
- Phylogenetics workshop, PhD course, Naturhistoriska riksmuseet, Stockholm, 2013-06-10 to 2013-06-14
- Workshop on Systematics and Biodiversity, Göteborg, 19-30 Aug 2013 (BILS contributions 1 day)
- Masters course in infection Biology, Uppsala, Sept 2013
- RNA-seq workshop (WABI), SciLife Uppsala, Sept 2013
- Introduction to bioinformatics for biosystematics, PhD course given by ForBio (the Norwegian-Swedish Research School in Biosystematics), University of Oslo, 2013-09-14 to 2013-09-20
- Genome analysis BIO962, 1 day, Sveriges Lantbruksuniversitet, Uppsala, 2013-10-04
- Advanced Next Generation Sequencing data analysis, Gothenburg University, 7-18 Oct 2013, BILS contributions 1 day
- Computational Methods for Massively Parallel Sequencing, 14-18 Oct 2013, Lund
- Analytical Techniques in Experimental Biosciences, Linköping University, Faculty of Health Sciences, 23 October 2013 (2 hours)
- Advanced Bioinformatics, Gothenburg University (4/11 to 17/1-2014), BILS contributions 3 days
- Methods in massive parallel sequencing, BMC Uppsala, 2013-11-20
- Assembly workshop, BMC Uppsala, 2013-11-26
- Bioinformatics and applications (Basic level course), Linköping University (IFM), November-December 2013
- Training IVA/IPA, variant filtering and pathway analysis, Gothenburg University, 4 December
- Training Cartagenia program for variant filtering, Gothenburg University, 5-6 December
- Course Molecular Mechanisms in Cancer, Molecular Medicine Master's Program, Uppsala University, Autumn 2013, SciLifeLab contributions 1 day

Outreach activities

Annual Symposium and User Meeting

The annual symposium 2013 was held in Gothenburg on 17 October 2013, with presentations of BILS activities and keynote lectures on the theme of systems biology by Ingemar Ernberg, Jens Nielsen and Anders Blomberg. About 80 people attended the symposium, which also gave ample opportunity for informal discussions with BILS staff.

Figure 14 Annual Symposium 2014
Figure 15 Annual Symposium 2014

Site Visits and All Hands Meetings

In a distributed infrastructure like BILS, it is important that all staff are aware of the special competences of their colleagues at other sites. To achieve this, we hold regular All Hands Meetings with all BILS personnel. These meetings also help to create a BILS identity and to exchange ideas for the optimal daily operation of the infrastructure. Furthermore, to increase coordination in bioinformatics support, we invite to some meetings representatives from the SNIC centres UPPMAX (hosting the majority of computational NGS data analysis resources and application experts in bioinformatics) and NSC (with application expertise in bioinformatics and where BILS servers are running), as well as representatives from SciLifeLab. During 2013, we had five All Hands Meetings: Göteborg 16/1, Stockholm 25/3, Linköping 27/5, Uppsala 29/8, and electronic 7/10. BILS also presented its activities at the SciLifeLab days on 26 March and 26 August. Finally, during the year we initiated Topical Meetings, each focusing on one specific topic and arranged using the distance conference system Adobe Connect.

Weekly chats

During 2013, BILS staff began to hold a weekly text-based chat, where current BILS activities are discussed in an informal manner. Shortly before the meeting, an automatic e-mail reminder is sent to all staff, containing information about open support requests, unanswered questions on biosupport.se, and new pages on the BILS wiki. The BILS chats also provide opportunities for staff members to ask general questions, exchange ideas and socialize.

Meeting with Scientific Advisory Board

In March 2013, we met with our Scientific Advisory Board (SAB) at SciLifeLab Stockholm. This was the first physical meeting between the SAB and the BILS Board and Management. Our internationally renowned SAB provided much useful input and advice on BILS activities, and many of its suggestions were already implemented during 2013.

Figure 16 SAB meeting with BILS Board and Management. From left to right: Bengt Persson, Anders Krogh, Kathryn Lilley, Jaap Heringa, Ewan Birney, Torben Falck Ørntoft, Amos Bairoch, Niklas Blomberg, Peter James. Photo by Mikael Borg

ELIXIR

The aims of ELIXIR are to construct and operate a sustainable infrastructure for biological information in Europe, to support life science research and its translation to medicine and the environment, the bio-industries and society. The infrastructure solution should deal effectively with the challenges of growth and new types of data, as well as safeguard and make the most of national investments in life science and biomedical research. ELIXIR is a distributed infrastructure with several nodes throughout Europe and a central hub. Sweden contributes a node to ELIXIR in order to provide access to data and methods/tools originating from Sweden and to give Swedish researchers access to the European databases, tools, and biocomputational resources. Our initial contribution is the integration of the Human Protein Atlas (HPA) in the ELIXIR landscape (cf. below). The ELIXIR preparatory project ran from November 2007 to December 2013. From 2014, ELIXIR enters the construction phase.

Figure 17 Schematic view of ELIXIR organisation with hub and nodes

Figure 18 Robert-Jan Smits, Director General, DG Research and Innovation, European Commission; Soren Brunak, Chair of ELIXIR Interim Board; Janet Thornton, EBI Director and co-ordinator of the ELIXIR preparatory phase; Niklas Blomberg, ELIXIR Director. Photo by Veldeman Photo Brussels

The ELIXIR interim steering board was active from 2011 to 2013, with two meetings per year. The Swedish representatives on the interim board are Bengt Persson, Linköping University, and Elin Swedenborg or Anna Wetterbom, Swedish Research Council. From 2014, a permanent ELIXIR board will be assigned. BILS coordinates the Swedish node in ELIXIR.

During 2013, ELIXIR has transformed from the preparatory phase to the construction phase. Niklas Blomberg has been the founding director since 1 May 2013; he was previously chair of the BILS board and worked at AstraZeneca Mölndal. During autumn, Rafael Jimenez was appointed technical coordinator of ELIXIR. Also during autumn, the ELIXIR Consortium Agreement (ECA), the legal document for the ELIXIR organisation, was finalised. It has now entered into force, having been signed in December 2013 by five countries (United Kingdom, Sweden, Czech Republic, Switzerland, and Estonia) and EMBL. Additional countries are expected to sign the agreement during 2014. An ELIXIR launch ceremony was held in Brussels on 18 December 2013.

Figure 19 Group at ELIXIR Launch in Brussels 18 December 2013. Photo by Veldeman Photo Brussels

Human Protein Atlas

The Human Protein Atlas (HPA), the first activity of the Swedish node, joined ELIXIR in July 2013. Our intention is that the data in HPA should be more closely linked to other protein resources. Since 1 July 2013, the Swedish Research Council has funded two persons working on this. We have initiated planning for training activities related to HPA, and a first advanced workshop is planned for early 2015.

In the beginning of 2013, a pilot project was launched as a collaboration between HPA and EMBL-EBI. The objective was to explore the possibilities and evaluate the Distributed Annotation System (DAS) technical solution for this purpose. The pilot was successfully implemented, and its outcome was helpful in the continued adaptation of HPA to ELIXIR. The outcome of the pilot was reported in May at the technical ELIXIR workshop in Hinxton. In the later part of 2013, HPA went through a major revision to adhere to the ELIXIR requests and standards. Version 12 of the HPA was released in December, containing data for almost 22,000 antibodies covering over 80% of the human protein-coding genes.

Figure 20 Screenshot of HPA version 12 (www.proteinatlas.org)

Among many small improvements, the two largest achievements of this version are the division of the atlas into sub-sections, corresponding to the studied sample types, and the extension of the API providing HPA data in structured XML format. Essential for HPA's role in ELIXIR is a good interface for sharing data with other ELIXIR partners. This is now available through a REST interface providing structured data in XML format. More information on the data shared by HPA is available at http://www.proteinatlas.org/about/download. This interface will evolve together with the inclusion of more and other types of data in the HPA. It will be continuously improved according to requests and suggestions from the ELIXIR consortium.

HPA data can be accessed programmatically through the API for a specific gene, or for a set of genes through a query interface. Data can also be fetched in its entirety, corresponding to all data published in the HPA. Both metadata and images (as URLs) are included in the data files.

Figure 21 Web interface to search tool in the Protein Atlas

The API is queried with the same syntax as the normal web interface, e.g.
http://www.proteinatlas.org/search_download.php?format=xml&query1=insulin AND ih_tissue_reliability:supportive AND if_reliability:supportive

HPA is already exchanging data with many European initiatives, both ELIXIR partners and others. As of today, neXtProt, UniProt, UniProt GO and UniProt xref are using this newly released data format. Ensembl will convert to this format for the coming releases.

Figure 22 Example section of the XML output rendered by an API call
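As an illustration of the query syntax above, the following minimal Python sketch assembles a percent-encoded request URL from a list of search terms. The endpoint and the parameter names `format` and `query1` are taken from the example URL in this report. The helper name `build_hpa_query_url` is hypothetical, and treating the whole search expression as the value of `query1` is an assumption. The sketch only builds the URL and does not contact the server, since the 2013 endpoint may have changed.

```python
from urllib.parse import urlencode, quote

# Endpoint as quoted in this report (2013); it may no longer be available.
BASE_URL = "http://www.proteinatlas.org/search_download.php"


def build_hpa_query_url(terms, fmt="xml"):
    """Join search terms with AND and return a percent-encoded query URL.

    Hypothetical helper: it assumes that the complete search expression
    is passed as the value of the 'query1' parameter.
    """
    query = " AND ".join(terms)
    params = {"format": fmt, "query1": query}
    # quote_via=quote encodes spaces as %20 instead of '+'
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    url = build_hpa_query_url([
        "insulin",
        "ih_tissue_reliability:supportive",
        "if_reliability:supportive",
    ])
    print(url)
```

The resulting string can be passed to any HTTP client. The XML response could then be read with a standard parser, but the element names would have to be taken from the HPA download documentation rather than guessed.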

Other international activities

Sweden is engaging in questions regarding Open Access and provision of data. Within ELIXIR, Sweden will, together with initially the Netherlands and the United Kingdom, establish contacts with the global organisation RDA (Research Data Alliance).

BILS has a large Nordic engagement, where the Nordic ELIXIR nodes meet regularly to coordinate their activities. During 2013, we did initial planning of the Tryggve project, which aims to create a federated secure environment for sensitive data storage and computation, enabling large Nordic collaborations in life sciences. We have received support from NeIC (Nordic e-Infrastructure Collaboration), and the project will also include the additional BMS infrastructures BBMRI and BioImaging. Furthermore, BILS has received support from NordForsk, which generously funds coordination meetings and travel costs between the Nordic countries.

Economic report

Income
Grant from VR for BILS                           16 900 000
Grant from VR for ELIXIR                          1 000 000
Co-funding from SciLifeLab and universities       6 190 208
User contributions                                  100 000
Income total                                     24 190 208

Expenses
Personnel                                        21 426 997
Common costs                                        811 576
Travel and other costs                              418 827
Expenses total                                   22 657 400

Surplus that is transferred to 2014
(incl. late transfers to participating universities)   1 532 808

In addition, BILS has received a grant from NordForsk for Nordic collaborations of 509 331 kr for the period 2013–2015.
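The totals in the economic report can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the report; the variable names are illustrative.

```python
# Figures copied from the economic report for 2013.
income = {
    "Grant from VR for BILS": 16_900_000,
    "Grant from VR for ELIXIR": 1_000_000,
    "Co-funding from SciLifeLab and universities": 6_190_208,
    "User contributions": 100_000,
}
expenses = {
    "Personnel": 21_426_997,
    "Common costs": 811_576,
    "Travel and other costs": 418_827,
}

income_total = sum(income.values())
expenses_total = sum(expenses.values())
surplus = income_total - expenses_total

print("Income total:  ", income_total)    # 24190208
print("Expenses total:", expenses_total)  # 22657400
print("Surplus:       ", surplus)         # 1532808
```

The computed totals and surplus agree with the reported values.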

Outlook for 2014

We are looking forward to an exciting 2014 with many actions by the bioinformatics infrastructures both in Sweden and in Europe. In BILS, we plan to extend the Genome Annotation Platform with genome assembly expertise. We will also recruit more genomics experts and systems developers in order to handle the increased requirements in these areas. In close collaboration with the SNIC centre UPPMAX, BILS is setting up a high-security environment for analysis and storage of sensitive data, which is very timely now that SciLifeLab is launching the Swedish Genomes Project. We also expect a greater need for data storage and data publishing, so we plan to recruit a data manager to BILS to continue to build the data infrastructure and to help users publish their data internationally in an optimal way. Furthermore, we expect the increased interactions between BILS and SciLifeLab's WABI and Bioinformatics platform to bear fruit in creating an integrated bioinformatics landscape in Sweden.

On the European scene, ELIXIR is now taking off. Collaboration agreements between the ELIXIR hub and each ELIXIR node will be written during 2014. The first ELIXIR services will be launched, where the Swedish contribution starts with integrating the Human Protein Atlas into the ELIXIR landscape and enabling increased linking between different databases. Of international importance are also the Research Data Alliance and data aggregation and combination efforts like FAIRPORT (http://www.datafairport.org/).

Finally, BILS will increase its training activities nationwide, since we see an increasing need for bioinformatics training at the PhD and postdoc levels. We also plan shorter half-day seminars directed at PIs to present the possibilities that advanced bioinformatics can offer.

Appendix 1 BILS staff 2013

Dag Ahrén, Lund University
Dag is specialized in comparative genomics, fungal transcriptomics, genomics and evolution. After he got his PhD at Lund University, he worked as senior scientist at the bioinformatics company BioBridge Computing AB. Later, he did a postdoc in the comparative genomics group at the European Bioinformatics Institute (EBI). As an assistant professor in Genomic Ecology, Dag worked primarily with fungal genomics and the evolution of fungal host interactions. Dag is active in the workshop Evolution and Genomics (http://www.evomics.org).

Andrey Alexeyenko, Karolinska Institutet
Andrey Alexeyenko is an expert in systems biology. He developed methods for integrating heterogeneous high-throughput, experimental, and literature data into global networks of functional coupling, and for applying the network to exploratory and predictive analyses of experimental and clinical data. The approach enables statistically sound interpretation of various data types, such as differential expression, methylation etc. The systems integration based on gene interaction networks has the following strengths:
1. all major types of molecular interactions are present in the global network of functional coupling between genes, proteins, and small molecules,
2. alterations of genomic sequence, methylation, transcription, and protein abundance are rendered into the space of pathways and processes, and
3. the pathway- and network-based view enables efficient, low-dimensional statistical analysis and is transparent for biological interpretation of the data.
Other existing methods in this field are e.g. Gene Set Enrichment Analysis and commercial products by GeneGO, geneXplain, and Ingenuity (IPA). Assistance in using these tools and in interpreting the results can be provided. Andrey can also help with related issues in high-throughput data management, biostatistics, functional interpretation of genome variation, and bridging gaps between different sides of analysis.

Magnus Alm Rosenblad, Gothenburg University
Magnus will assist users in sequence analysis, at both the genomics and the transcriptomics level. After brief studies in biology and chemistry, Magnus got his MSc in computer technology at Chalmers in 2001 and went on to a PhD in biomedicine focussing on RNA gene identification using bioinformatics. He worked at the Sahlgrenska Bioinformatics Core Facility before joining Anders Blomberg's group at Cell and Molecular Biology/GU as a postdoc in 2007. He is currently involved in several projects in genomics and/or transcriptomics for a diverse set of marine organisms, as well as metagenomics.

Eva-Britt Berglund, Linköping University, Economy
Eva-Britt Berglund helps BILS with economy. Apart from BILS, Eva-Britt works at the SNIC centre NSC (National Supercomputer Centre) at Linköping University.

Ann-Charlotte Berglund Sonnhammer, KTH
Ann-Charlotte will provide national bioinformatics support for high-throughput analysis of data from next generation DNA sequencing at the national large-scale sequencing centre (NGI, National Genomics Infrastructure, previously SNISS) located at the Science for Life Laboratory, Stockholm. Ann-Charlotte has a PhD in scientific computing. Between 2002 and 2005, Ann-Charlotte was a postdoc at the Stockholm Bioinformatics Centre, where she worked on orthology analysis, phylogenetics, and methods for adaptive evolution. In 2005–2009, she held the position of SNIC application expert for bioinformatics, UPPMAX/LCB, Uppsala University, and in 2009–2010 the position of UPPMAX application expert for bioinformatics, Uppsala University. As application expert in bioinformatics, Ann-Charlotte helped users from the life science community to access and use the computing resources provided by the SNIC centres. At NGI, Ann-Charlotte will work on setting up pipelines for the first analysis steps of data for the different types of projects. She will also evaluate, report on, and set up bioinformatics software at the different SNIC centres in collaboration with the application and system experts at those centres. More in-depth bioinformatics support for NGI projects will also be offered.

Jorrit Boekel, Karolinska Institutet
Jorrit Boekel has a background in molecular biology and bioinformatics, and is currently active as a programmer in the mass spectrometry labs of Janne Lehtiö and Lukas Käll at Science for Life Laboratories in Stockholm. His interests include biology, computational biology, computation, programming, and computer science. Within BILS, he works on the infrastructure of proteomics analysis pipelines and user environments, and provides mass spectrometry analysis support.

Mikael Borg, Stockholm University, Technical coordinator
Mikael Borg earned his PhD in physics at Lund University and has worked as a postdoc in biochemistry, computational biology and bioinformatics at the University of Toronto and at Copenhagen University. Mikael works as technical coordinator in BILS.

Ino de Bruijn, KTH
Ino de Bruijn has an MSc in Bioinformatics from Stockholm University and a BSc in Computer Science from the University of Amsterdam. While working as a project worker for the Environmental Genomics group at SciLifeLab, he specialized in metagenomics, specifically assembly of next generation sequencing data from microbial communities, sequence alignment and large-scale computing. He has been part of various metagenomic projects, including projects on the human gut, human skin, moose rumen and the Baltic Sea.

Moritz Buck, Uppsala University
Moritz primarily gives support for the analysis of environmental metagenomic data of all sorts; however, his expertise is wide-ranging and includes agent-based simulation, array transcriptomics, and diverse modelling and analysis techniques. Moritz studied bioinformatics and modelling at the National Institute of Applied Sciences (INSA, Lyon, France). He holds a PhD in computer sciences from the University of Hertfordshire (UK), where he modelled the evolution of cooperation using multi-agent systems. His first postdoc was in a medical systems biology group at the University of Freiburg, where he analysed a large variety of data from mouse and cell-culture models.

Joakim Bygdell, Umeå University
Joakim will offer general support on proteomic mass spectrometry data analysis. After finishing his PhD in biological mass spectrometry, where he focused on quantitative LC-MS based proteomics in plants, he joined the Computational Life Science Cluster at Umeå University.

Luciano Fernandez, Gothenburg University
Luciano has a background in computer sciences and has further specialized in biology and bioinformatics through a master's in bioinformatics at Chalmers and a PhD in microbiology, with emphasis on bioinformatics, at the University of Gothenburg. During his bioinformatics training he developed a bioinformatics framework for yeast phenomics data (PROPHECY), working towards the integration, analysis and presentation of yeast phenotypic data. Luciano is currently involved in expanding the PROPHECY framework and in aiding research groups with the development of scientific applications and frameworks for biological data handling.

Eva Freyhult, Uppsala University
Eva supports statistical multivariate analysis and network analysis. This includes machine learning, cluster analysis, survival analysis, regression, classification, signal processing, data compression and transformation etc. Eva has a PhD in Bioinformatics from Uppsala University (2007), where she worked on various bioinformatic questions concerning non-coding RNA. After her PhD, she worked as a postdoc/bioinformatician at Umeå University for two years (2008–2010), where she supported the medical faculty with various bioinformatic analyses and was involved in a project concerning normalization and clustering of microarray data. Back at Uppsala University (2010), she worked as a postdoc in a leukemia epigenetics project, where she used survival analysis to find factors important for relapse-free survival of leukemic patients. Eva has worked as a BILS expert since February 2012.
A typical support question concerns samples of two or more types, where the goal is to distinguish between the types based on a set of measurements (gene expression, DNA methylation, absorption, clinical parameters etc). Usually part of the problem is to build a reliable classifier, but another important part is to detect the measurements that are important for distinguishing the types.
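The two parts of such a support question, building a classifier and finding the measurements that separate the sample types, can be sketched in a few lines of Python. This is a deliberately simple illustration on made-up numbers, not the method used by BILS: it ranks measurements by the difference between class means scaled by the overall spread, and classifies with a nearest-centroid rule. All function names and data are hypothetical.

```python
from statistics import mean, pstdev


def rank_features(samples, labels):
    """Rank feature indices by |difference of class means| / overall spread.

    Assumes exactly two sample types. Returns indices, most informative first.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two sample types")
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        a = [row[j] for row, lab in zip(samples, labels) if lab == classes[0]]
        b = [row[j] for row, lab in zip(samples, labels) if lab == classes[1]]
        spread = pstdev(column) or 1.0  # avoid division by zero
        scores.append((abs(mean(a) - mean(b)) / spread, j))
    return [j for _, j in sorted(scores, reverse=True)]


def nearest_centroid(samples, labels):
    """Return a predict(row) function based on per-class mean vectors."""
    centroids = {}
    for lab in set(labels):
        rows = [row for row, l in zip(samples, labels) if l == lab]
        centroids[lab] = [mean(col) for col in zip(*rows)]

    def predict(row):
        # Choose the class whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda lab: sum((x - c) ** 2 for x, c in zip(row, centroids[lab])),
        )

    return predict


# Made-up toy data: two measurements per sample, two sample types.
samples = [[1.0, 5.0], [1.2, 5.1], [3.0, 5.0], [3.1, 4.9]]
labels = ["A", "A", "B", "B"]

ranking = rank_features(samples, labels)
predict = nearest_centroid(samples, labels)
print("Feature ranking:", ranking)
print("Predicted type:", predict([2.9, 5.0]))
```

In real projects the classifier would be evaluated on held-out samples (for example by cross-validation), since performance measured on the training data is overly optimistic.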

Pontus Freyhult, Uppsala University, UPPMAX
Pontus has worked in various roles within the fields of software development and operations, and offers expertise in many different computer-centric fields, including performance. His current focus is on solutions for large-scale storage, to provide short-term as well as long-term storage for biological data.

David Gomez-Cabrero, Karolinska Institutet
David Gomez-Cabrero offers bioinformatics services, especially those related to transcriptomics and methylation analysis. He has experience both in array analysis, including standard array analysis, and in the analysis of sequencing data, such as RNA-Seq and ChIP-Seq. David's background is in statistics and mathematics, so he is especially interested in method development, and within BILS he can aid not only in the analysis but also in the selection of appropriate tools for the analysis of the different data types.

Jonas Hagberg, Stockholm University
After Jonas received his MSc in bioinformatics at Uppsala University, he worked as a research assistant at the Department of Evolutionary Biology and Molecular Evolution at the University. Between 2007 and 2011, Jonas held a position as a system expert at the UPPMAX supercomputing centre, with focus on large-scale storage, bioinformatics and NGS. He was the project manager at the start-up of UPPNEX. Jonas developed the first iRODS system at UPPNEX and is also the developer of the portal for the National Genomics Infrastructure.

Marc Hoeppner, Uppsala University
Marc holds a PhD in Molecular Biology from Stockholm University. During his dissertation he worked on the origin and evolution of noncoding RNAs, with a particular focus on eukaryote genomes. He then went on to do a postdoc in the computational biology group of Manfred Grabherr at Uppsala University, exploring the use of next-generation sequencing data for genome annotation as well as collaborating on several projects employing RNA sequencing to study gene function and evolution. Within BILS, Marc is part of the genome annotation team, providing support to Swedish genome projects in generating high-quality annotations for their organism of interest. He can furthermore support transcriptome studies and has extensive experience in working with relational databases.

Lukasz Huminiecki, Stockholm University
Lukasz is working on the cancer data analysis tool and contributes to training activities in BILS. He obtained his MSc in human molecular genetics at the Institute of Human Genetics, Polish Academy of Sciences, Poznan, and completed doctoral studies in cancer research at Cancer Research UK, Institute of Molecular Medicine, Clinical Oncology Unit, Oxford University. He worked at the European Bioinformatics Institute with ENSEMBL, and underwent postdoctoral training in molecular evolution and bioinformatics with Ken Wolfe at Trinity College Dublin. In Sweden, Lukasz worked at the Karolinska Institutet Center for Genomics and Bioinformatics, and as an associate of the EU Network of Excellence for systems biology (ENFIN) at the Ludwig Institute for Cancer Research, Uppsala. Today, Lukasz is based at SciLifeLab, Stockholm, and affiliated with both the Department of Cell and Molecular Biology, Karolinska Institutet, and the Department of Biochemistry and Biophysics, Stockholm University.

Yvonne Kallberg, Karolinska Institutet
Yvonne Kallberg is a bioinformatician with a background in computer science, and as such she offers services within protein bioinformatics. She works with tools such as pairwise comparison methods, multiple alignment methods, hidden Markov models, secondary structure prediction methods, etc. Typical data resources include UniProt, Ensembl, Pfam and InterPro. Apart from this, Yvonne can aid in local installations of bioinformatics tools and databases, create user-defined databases and web interfaces, and create pipelines for automatic execution.

Diarmuid Kenny, Gothenburg University
Diarmuid received his PhD from the School of Chemistry at the National University of Ireland, Galway (Ireland). For his graduate studies he used mass spectrometry to investigate the structural properties of membrane- and mucin-associated oligosaccharides. Prior to joining BILS, Diarmuid worked for the proteomics core facility at the University of Gothenburg, where he was involved in the design and implementation of qualitative and quantitative proteomic experiments using high-resolution mass spectrometers. Diarmuid will aid users in the interpretation of mass spectrometric data. In addition, Diarmuid will work towards improving the handling of the large amount of MS data accumulated in a typical mass spectrometry based proteomics laboratory.

Samuel Lampa, Uppsala University, UPPMAX
Samuel's work within the UPPNEX project at the UPPMAX supercomputing centre is focused on taking care of the data deluge from Next Generation Sequencing. Samuel also occasionally does some work on the bioinformatics workbench Bioclipse, more specifically developing graphical clients for configuring and submitting bioinformatics jobs to compute clusters.

Henrik Lantz, Uppsala University
Henrik has a background in biology with a PhD in Systematic Biology from Uppsala University, where he focused on phylogenetic patterns of flowering plants. Following this, he did a postdoc financed by the Swedish Taxonomy Initiative on plant-associated ascomycetes. This led to a position as a bioinformatician at SLU, Uppsala, working with genome assembly and annotation of fungal and algal genomes, as well as continued phylogenetic analyses. Currently he is working with the annotation of several eukaryotic genomes in the group of Manfred Grabherr at Uppsala University, and is in particular interested in merging and reconciling annotations based on different sources of data and/or different methods. Henrik can support genomic and transcriptomic projects with assembly and annotation, and is also able to aid in the planning of similar projects, for example by helping to choose the right type of sequence data for the project.

Malin Larsson, Linköping University
Malin Larsson has a PhD in biotechnology from the Royal Institute of Technology. Her graduate studies focused on analysis of genetic variation in gene regulation in relation to complex diseases. During her postdoc at the Karolinska Institute, she worked as a bioinformatician in projects involving whole genome association studies and analysis of copy number variation, in relation to cardiovascular disease. In BILS, Malin will provide bioinformatics support in projects involving next generation sequencing and biostatistics, as well as general bioinformatics support.

Fredrik Levander, Lund University
Fredrik will help users within the mass spectrometry proteomics field. He will also work on setting up national data storage for proteomics data in close collaboration with SNIC centres. After his PhD in Applied Microbiology in 2001, Fredrik entered the field of computational proteomics via a bioinformatics company before joining the proteomics group of Peter James at Lund University. In this high-throughput environment he has since been addressing the needs for automated data integration and data analysis, and he has been active in the development of the Proteios platform for proteomics data management and analysis. He is also involved in the Proteomics Standards Initiative (PSI) of the Human Proteome Organisation (HUPO), which is creating standards for exchange of proteomics data.

Sara Light, Stockholm University
Sara Light has a PhD in theoretical chemistry from Stockholm Bioinformatics Center. Her graduate studies primarily concerned the evolution of metabolic networks and protein-protein interaction networks. Thereafter, she worked as a postdoctoral researcher at Lawrence Livermore National Laboratory and the Joint Genome Institute. There, she participated in developing a predictor of metabolic content from metagenomics and 16S rRNA sequences. During 2009 and 2011 she worked primarily on the evolution of proteins containing domain repeats, but also worked on gene finding and genome annotation in prokaryotes. During the coming year, she will, among other things, contribute to the development of an annotation pipeline for proteins containing domain repeats.

Jessica Lindvall, Karolinska Huddinge
Jessica Lindvall earned her PhD in Molecular Cell Biology from Karolinska Institutet (Sweden) in 2005 and, after a competitive selection, received a bioinformatics postdoctoral fellowship at the University of Oslo, Norway, where she stayed from 2006 to 2009. She has a solid track record (currently over 20 peer-reviewed articles in high-impact journals) and has taken part in and led multiple research projects. The common thread in her expertise is the applied bioinformatics methods that she uses and develops for analyzing high-throughput data. Both past and present projects are collaborative by nature, and new collaborations are continuously built, which will complement her present skills. To date, Jessica's main focus lies on analyzing high-throughput methylation data, with special emphasis on the Illumina 450K methylation chips and the processing thereof. Here, the aim is to provide both a hypothesis-free analysis approach and a more hypothesis-driven approach using solid statistics and systems biology methods. This will give biological nuance and depth to the scientific data produced from high-throughput screens, helping the researcher to understand the data and answer the biological question. Jessica has extensive experience in project management and development. During her research years, she has built a wide network of both national and international researchers as well as people connected to university education. She has also received several grants and actively participated in the scientific debate by giving presentations and participating in both international and national conferences on a regular basis. She sees herself as a natural ambassador for her area of expertise and continuously builds close connections in various arenas, both within and outside academia.
Competences: Project management, high-throughput analyses, applied bioinformatics, systems biology, statistics, communicating science, entrepreneurial skills.

Daniel Lundin, KTH
Daniel Lundin has a PhD in molecular biology with an emphasis on molecular evolution from Stockholm University. As a PhD student, Daniel was involved in phylogenetic and other evolutionary analyses of the enzyme ribonucleotide reductase. As a postdoc, Daniel has been involved in environmental genomics, including metagenomics and amplicon-based diversity analyses. Before his PhD period, Daniel worked as a consulting database developer. Daniel is an expert in phylogenetics, sequence analysis with tools such as HMMER, and high-performance bioinformatics computing. Biological applications focus on environmental genomics, including amplicon-based diversity research, frequency-oriented analyses based on metagenomic reads, and assembly of metagenomes.

Henrik Lysell, SLU Uppsala
Henrik received his MSc in molecular biology at Uppsala University and works as a research engineer at SLU Global Bioinformatics Centre, a part of the Swedish University of Agricultural Sciences (SLU). His tasks include developing and maintaining servers and websites, setting up tools for bioinformatics analysis, and acting as a general bioinformatics platform for BILS. His work ties into the EU FP7 funded project AllBio, which aims to coordinate efforts in European bioinformatics. This includes collecting test cases (questionnaires provided by researchers) to detect gaps in existing bioinformatics tools.

Jia Mi, Uppsala University
Jia Mi obtained his PhD degree in proteomics from Uppsala University. Currently he is a bioinformatician at the Mass Spectrometry Proteomics Platform, Science for Life Lab, Uppsala. Before joining BILS, he worked as a bioinformatician at AstraZeneca.

Intawat Nookaew, Chalmers
Intawat (PhD) will assist users in large-scale data analysis and integration (systems biology). This includes genomics analysis (genome sequencing/resequencing, metagenome sequencing, comparative genomics), transcriptomics analysis (RNA-seq and microarray), omics data integration and genome-scale metabolic modeling. He has developed tools, websites, databases and web services for bioinformatics and systems biology analysis. He has worked at the Life Science division, Department of Chemical and Biological Engineering, at Chalmers University of Technology. He is one of the founding members of the Gothenburg Bioinformatics Network (GOTBIN). He is currently involved in several projects in genomics, transcriptomics and metagenomics, covering diverse data sets from clinical and biotechnological areas.

Johan Nylander, NRM (Swedish Museum of Natural History)
Johan will provide national support for evolutionary sequence analysis, phylogenetics and phylogenomics. After a PhD in Systematic Zoology, with emphasis on methods for phylogenetic inference and models of molecular evolution, Johan did a postdoc at Florida State University, USA, and has since then been working as a bioinformatician at the Department of Botany, Stockholm University, and at the Natural History Museum, University of Oslo, Norway. Johan has worked with many aspects of evolutionary data analysis. Areas where he can be of assistance include the application of methods for phylogenetic inference, analysis of character/trait evolution, sequence alignment, historical biogeography, molecular dating, model selection and model averaging, applications of Bayesian Markov chain Monte Carlo methods, methods and strategies for large-scale phylogenetic analyses ("phylogenomics"), and diversity analyses from metagenomic data.

Bengt Persson, Uppsala University, Director
Bengt is professor of bioinformatics at Uppsala University and is affiliated with Science for Life Laboratory and Karolinska Institutet. He has been leading BILS since its pilot phase and has been its director since 2010. Until February 2013, he was professor of bioinformatics at Linköping University. Bengt has also been active in ELIXIR since 2007 and is a member of its interim board. Bengt's research is centred on large-scale protein family classification and prediction of molecular effects of disease-causing mutations.

Rui Climaco Pinto, Umeå University
Rui's role is to raise awareness of metabolomics and chemometrics at the national level, and to provide chemometrics data analysis support and education. He is also involved in the chemometrics pipeline development for metabolomics data and in the preparation of a database for its storage and retrieval. He works at the Chemistry department (KBC) of Umeå University, embedded with the Computational Life Science Cluster (CLiC) and the newly created Umeå metabolomics core. Rui has a PhD in Analytical Chemistry, focused on chemometrics and spectroscopic methods, from AgroParisTech (Prof. Douglas N. Rutledge) and University of Aveiro (Dr. António S. Barros). He has worked on the application of chemometrics methods to metabolomics data during two postdocs, with Johan Trygg at Umeå University and Thomas Moritz at SLU-Umeå. The main focus of this work has been the analysis of data from different platforms for functional genomics and characterization of plants; studies related to the diagnosis of type I diabetes; and the role of oxylipins in inflammatory disease. He has been teaching chemometrics in MSc programs (design of experiments, univariate and multivariate analysis) and Matlab in PhD programs.

Katarina Truvé, Gothenburg University
Katarina received her PhD in molecular bioscience at the Swedish University of Agricultural Sciences. Her graduate studies focused on using bioinformatics methods to identify disease-causing mutations. Since humans and dogs share many common disorders, e.g. cancer and autoimmune disease, the dog was used as a model for human disease. The projects involved genome-wide association studies that were followed by targeted Next Generation Sequencing (NGS) of associated regions. In BILS, Katarina provides guidance and advice on the analysis and interpretation of clinical NGS data and works on the development of infrastructure and pipelines for data management and analysis.

Jeanette Tångrot, Umeå University
Jeanette will primarily help users with data from Next Generation Sequencing, but also offers general bioinformatics support. She has a PhD in Computing Science, focusing on hidden Markov models and prediction of protein domain structure. As a postdoc she worked on assembly, annotation, and comparison of bacterial genomes. She is now part of the newly formed genomics core at Umeå University, which, among other services, provides infrastructure and competence for preprocessing of NGS data.

Mats Töpel, Gothenburg University
Mats earned his PhD in Systematic Botany at the University of Gothenburg, where he used phylogenetic and phyloclimatic methods to analyse evolutionary patterns in plants. He has since worked on phylogenomics projects at the University of Leicester, UK, focusing on the evolution of chloroplasts and protein translocation. He is currently working in the CeMEB project, where he is doing de novo whole genome sequencing of a number of marine organisms in close collaboration with Magnus Alm Rosenblad, a fellow BILS expert. He is one of ten founding members of the Gothenburg Bioinformatics Network (GOTBIN), and has a strong interest in teaching and using high performance computing in biological sciences. He provides support to genome and transcriptome projects with, e.g., assembly of de novo or resequenced genomes as well as downstream comparative genomics analyses such as phylogenomics.

Estelle Wera, SLU Alnarp
Estelle Wera has a PhD in genetics with an emphasis on bioinformatics and molecular evolution from Trinity College Dublin, Ireland. As a PhD student, Estelle developed a pipeline for the automatic annotation of yeast genomes. She is now a postdoc at SLU Alnarp, working for PlantLink. Her main task is to analyze data generated by high-throughput sequencing technologies. During 2012 she was affiliated with BILS, and from 2013 she works part-time within BILS.

Victoria Westling, Linköping University, Administration
Since mid-August, Victoria Westling has been helping BILS with general administration. Apart from BILS, Victoria works at the SNIC centre NSC (National Supercomputer Centre) at Linköping University.

Kristin Wiberg, Linköping University, Administration
Until mid-August, Kristin Wiberg helped BILS with general administration. Apart from BILS, Kristin also works at the SNIC centre NSC (National Supercomputer Centre) at Linköping University.

Appendix 2 List of projects 2013

Projects listed are those in which BILS experts have spent more than 20 hours. A dash marks a column with no entry.

Project | PI | Organisation
ADAPT- galaxy/proteomics | Janne Lehtiö | KI
Adenovirus receptor prediction using a protein-protein interaction | Niklas Arnberg | UmU
ALL epigenetics: Targets and function of DNA methylation in acute leukemia | Ann-Christine Syvänen | UU
Amoeba RNA assembly/annotation | Fredrik Söderbom | UU
Amphiura genome IMAGO | Carl Andrée | GU
Amplicon (454) metagenomics analysis | Agnes Wold | GU
Analysis gut metagenomics T2D/EGO | Fredrik Bäckhed | GU
Analysis H. pylori strains genome sequencing | Samuel Lundin | GU
Analysis metatranscriptomics data from mouse gut | Fredrik Bäckhed | GU
Analysis micro array from auto immune diseases | Bob Olsen | GU
Analysis microarray data, mouse vaccine trial | Ali Harandi | GU
Analysis microarray from high salt fed rat | Gregor Guron | GU
Analysis microarray from PUFA fed mice related with immune system | John-Olov Jansson | GU
Analysis of autoantibodies | Nils Landegren | UU
Analysis of microarray data for mouse gut | Fredrik Bäckhed | GU
Analysis of pdr2-l3 3'UTR clone for the ability to bind microRNA sequences | Andras Simon | KI
Analysis Rat fed with resveratrol RNA-seq | Jens Nielsen | Chalmers
Analysis yeast butanol tolerance strain through genome sequencing | Joakim Norbeck | Chalmers
Analysis micro array data from A. oryzae for optimization of malic acid production | Jens Nielsen | Chalmers
Annotation platform development | - | BILS
Aorta proteomics study | Johan Gobom | GU
Back To PRECOG | Anders Blomberg | GU
Bact genome annotation | Jan-Willem de Gier | SU
Balanus genome IMAGO | Anders Blomberg | GU
Balanus RNAseq (added species!!) | Carl Andrée | GU
Basidiomycete RNASeq | - | LU
Biomarkers for Chronic Pain Pathophysiology | Anne-Li Lind | UU
BioMET toolbox 2 | Jens Nielsen | Chalmers
BOLD-mirror (boldsystems.org) website at NRM | Johan Nylander, Fredrik Ronquist | NRM
Booking system and file repository for the Molecular Systematics Lab, NRM | Johan Nylander, Martin Irestedt | NRM
BRICHOS | Janne Johansson | KI
Cancer mutation analysis | - | BILS
Cancer reincidence project | Martin Stenson, Per-Ola Andersson | Sahlgrenska
Cardiomyopathy | Martin Bergö / Martin Dahlin | Sahlgrenska
Classification of drugs targeting soluble and membrane-bound targets | Helena Strömbergsson | UU
Clonal Expansion | Mikael Sigvardsson | LiU
Cloudgene | UPPMAX | UPPMAX
Clustering for metagenomics | Anders Andersson | KTH
Comparison of pneumococcal strains | Birgitta Henriques-Normark and Staffan Normark | KI
Crow genome annotation | Jochen Wolf | UU
Crow genome project | Jochen Wolf | UU
Cyanobacterial Metagenomics in the Indian Ocean | Beatriz Diez, Birgitta Bergman | SU
Data curation, Kettle | Imad Abugessaisa | KI
Dating the diversification of the major lineages of Passeriformes (Aves) | Per Ericson | NRM
Development of tools in Matlab | - | BILS
Development of tools in R | - | BILS
Diatom genome IMAGO | Anna Godhe | GU
Dictyostelium small RNA analysis | Fredrik Söderbom | UU
Dinoflagellate transcriptomics Moore | Karin Rengefors | LU
Dinoflagellate | Karin Rengefors | LU
Dinoflagellate phylogeny | Karin Rengefors | LU
DNA methylation in brain development | Henrik Alm | UU
DNA-Key - data base and web portal for genetic identification of Swedish fauna and flora | Johan Nylander, Fredrik Ronquist | NRM
DNA-Methylation in MS | Maja Jagodic | KI
DNA-nyckeln | Per Ericson | NRM
doi | - | BILS
EB virus - human interaction network | Elena Kashuba | KI
Epigenome 1.0. Compare duplication patterns for epigenetic regulators with those for core histones | Andreas Lennartson | KI
Episome annotation | Jan-Willem de Gier | SU
Erythrocyte fragility | Linnea Eriksson, Chemical and Biological Engineering, Food Science, Chalmers University of Technology | Chalmers
Evaluation of Exclusion list software | - | GU
Evaluation of MS processing software (MaxQuant) | - | GU
Evaluation of phylogenetic binning pipeline for viral metagenomic analysis | Bjorn Andersson | KI
Evaluation of Python computer vision libraries and OpenCV for image analysis of Padlock probe data | Mats Nilsson, DBB, SU | SU
Evaluation of software variant filter | - | Sahlgrenska Core Facility
Evolution of tandem proteins | Anders Hofe | UmU
Examination of the mouth flora in children | Pernilla Lif Holgersson | NUS
Examining the sequence difference between two pneumococcal colonies | Sven Bergström | UmU
Exome project | Mona Ståhl | SU
FANTOM5 | Yoshihide Hayashizaki | Japan
Fatty acid transporter | Daniel Daley | SU
The Fido project (web-based interface for publishing bioinformatics tools) | Jonas Hagberg | BILS
Finding gene fusion in cancer cells | Frida Abel | GU
Francisella project | Pär Larsson | FOI
Fucus genome IMAGO | Ric Peyreira | GU
Functional activity of mutated p53 | Galina Selivanova | KI
Fungal endophyte project | Jan Stenlid | SLU
General support at SciLifeLab | - | SciLifeLab
Glycomics prep plus meeting | Niclas Karlsson | LU
HaloPlex | Mohsen Kharimi | SU
HIV mosaic consensus | Maria Issagouliantis | KI

Human Protein Atlas integration into the ELIXIR landscape | Mathias Uhlén | SciLifeLab and BILS
Human Metabolic Atlas project | Jens Nielsen | Chalmers
Hydrophobin phylogeny | Francois Rineau | LU
Identifying biomarkers for cancer | Jonas Nilsson | UmU
Immunohistochemistry of gliomas | Linda Sooman | UU
Inferring cell states using data sparsity | Erik Aurell | KTH
Integration of Molecular lab LIMS with the Museum collection data bases (DINA) | Fredrik Ronquist, Martin Irestedt | NRM
Ion Torrent | Tomas Johansson | LU
IVF | Jonas Bergquist | UU
Comparative genomics of lactobacilli | Stefan Roos | SLU
LC suitability | - | GU
Leukemia (CLL) research network | Richard Rosenqvist | UU
Leukemia RNA-Seq | Eva Hellström-Lindberg | KI
LIMS | Thijs Ettema | UU
Littorina genome (added RNA data!) | Carl Andrée | GU
Local Database for Cancer | - | SciLifeLab
Local Galaxyserver at NRM | Johan Nylander, Fredrik Ronquist | NRM
MAF core facility | - | -
Maintain BILS server system | BILS | BILS
Maintain ScilifeLab MS facility | Jonas Bergquist | UU
Martin Ott | Martin Ott | SU
Master project teaching | Claes von Wachenfeldt | LU
Metabolomics analysis | Ali Harandi | GU
Metabolomics pipeline development (DoE) in Umeå | Johan Trygg | UmU
Metagenomic analysis of spider gut content | Elisabeth Weingartner, Peter Hambäck | Huddinge
Metagenomics | - | LnU
Metagenomics Pipeline | Alexander Eiler | UU
Methods update | - | UmU
microRNAs in ANCA vasculitis | Camilla Skoglund | LiU
MIMEBS Baltic Sea Assembly | Birgitta Bergman | SU
miRNA targeting in cancer | Linda Bjölmar | LiU
ML tools | - | UU
Modelling of insulin | Peter Bergsten | UU
Monacrosporium 454 Transcriptomics | Anders Tunlid | LU
Monacrosporium proteomics | Anders Tunlid | LU
Monacrosporium transcriptomics | Dag Ahren, Anders Tunlid | LU
MS core facility | - | UmU
MS data maintenance | - | LU
MS pipeline maintenance | BILS | BILS
MS RNAseq vs genotype | Tomas Ohlsson | KI
Mseq 16S setup | Fredrik Bäckhed | GU
Multivariate analysis of ChIP-seq cohesin data | Camilla Sjögren | KI
Multivariate analysis of Tapasin dependency | Kajsa Paulsson | LU
Network enrichment analysis: software and web suite | - | SU, BILS
Neurodegenerative consequences of HSV1 infection | Fredrik Elgh, Department of Clinical Microbiology/Virology, UmU | UmU
NGI | Joakim Lundeberg | NGI
NGS analysis (procardis exome seq) | Anders Hamsten | KI
NGS sequencing | Helena Westerdahl | LU

NMR data analysis pipeline development | Göran Karlsson, GU | GU
Novel transporters | Daniel Daley | SU
Nutrition data analysis | Lars Ellegård (Sahlgrenska, Gothenburg) | Sahlgrenska
Next-generation DNA sequencing in a population with hypertrophic cardiomyopathy | Stellan Mörner, Public Health and Clinical Medicine, UmU/NUS | UmU
Optimization | - | UU
Orphan proteins | Arne Elofsson | SU
Oviposition behaviour of malaria mosquitoes | Jenny Lindh | KTH
Phylogeny of Accentors (Aves: Prunellidae) | Per Alström | SLU
Platform installation Uppsala | Margareta Ramström | UU
PRECOG | Anders Blomberg | GU
Predicting early development of obesity | Peter Bergsten | UU
PRIDE export-mod peptides | - | BILS
Project Designer (New Project) | Anders Blomberg, GU | GU
Protein interaction network | Jeanette Hellgren Kotaleski | KTH
ProteomeXchange | - | BILS
PSI meeting Liverpool + prep | - | BILS
R package duplicator | - | BILS
Review paper | Arne Elofsson | SU
RNA-seq analysis | Georg Klein | KI
RNAseq analysis: Effects of exposure of TiO2-NP on endothelial cell | Susana Cristobal | LiU
RNAseq of bacteria and mouse | Patrik Rydén, Anders Sjöstedt | UmU
RNAseq pipeline running and testing | - | Sahlgrenska Core Facility
ScilifeLab MS-13-012 | Britt Skogseid | UU
ScilifeLab MS-13-021 | Irene Söderhall | UU
SDRdb | Bengt Persson and international consortium | UU
Sequence capture | Katarina Hedlund | LU
Server installation | - | LU
Set up software for variant filtering | - | Sahlgrenska Core Facility
Setting up local patient database | - | Sahlgrenska Core Facility
Short chain dehydrogenases reductases | Bengt Persson | UU
Short chain dehydrogenases reductases | Lars Arvestad | SU
Short-term support | - | BILS
Single cell project | SciLifeLab | SciLifeLab
Small genome assembly pipeline | Pär Larsson | FOI
Small projects support | - | UmU
Spliced variants in RNAseq | Katarina Ejeskär | GU
Spruce Genome Sequencing and Annotation | Stefan Jansson | UmU
Spruce transcriptome project | Nathaniel Street | UmU
Statistical analysis (PTAA) | Karin Magnusson | LiU
Support for Swedish NMR Center | Göran Karlsson | GU
Surirella annotation project | Anders Blomberg | GU
Swestore | - | BILS
Swestore and proteios installation preparation | - | GU
Targeted Metagenomics | Katarina Hedlund | LU
Temporal switch under neuronal differentiation | Johan Ericson | KI
Testing metagenomic tools | - | BILS
ThoAno | Olle Terenius | SLU
TM function | Daniel Daley | SU
TM protein evolution | Daniel Daley | SU

TPMT genotyping | Malin Lindqvist Appell | LiU
Translation rate | Daniel Daley | SU
Tryggve | BILS | BILS
Development of a graphical cluster client, based on Eclipse RCP / Bioclipse | Ola Spjuth | UU
Variant filtering / Periodic fever syndrome | Mia Olsson | GU
Virus-infected butterflies | Olle Terenius | SLU
VSS | - | UU
Xeromyces genome project | Johan Schnürer, SLU-Ultuna | SLU
Yeast SNP | Valeria Wallace | LTH
Zn finger evolution | Jens Lagergren | KTH

Appendix 3 Publications 2013

Listed below are publications with BILS staff as co-authors or acknowledged:

Ahsberg, J., Ungerback, J., Strid, T., Welinder, E., Stjernberg, J., Larsson, M., Qian, H., and Sigvardsson, M. (2013) Early B-cell Factor 1 regulates the expansion of B-cell progenitors in a dose dependent manner. J. Biol. Chem., 288, 33449–33461.

Andersson, K. M., Meerupati, T., Levander, F., Friman, E., Ahren, D., and Tunlid, A. (2013) Proteome of the nematode-trapping cells of the fungus Monacrosporium haptotylum. Appl. Environ. Microbiol., 79, 4993–5004.

Bendz, M., Skwark, M., Nilsson, D., Granholm, V., Cristobal, S., Käll, L., and Elofsson, A. (2013) Membrane protein shaving with thermolysin can be used to evaluate topology predictors. PROTEOMICS, 13, 1467–1480.

Butler, E., Alsterfjord, M., Olofsson, T., Karlsson, C., Malmstrom, J., and Vasquez, A. (2013) Proteins of novel lactic acid bacteria from Apis mellifera mellifera: an insight into the production of known extra-cellular proteins during microbial stress. BMC Microbiology, 13, 235.

Edberg, A., Freyhult, E., Sand, S., Fagt, S., Knudsen, V. K., Andersen, L. F., Lindroos, A. K., Soeria-Atmadja, D., Gustafsson, M. G., and Hammerling, U. (2013) Discovery and characterisation of dietary patterns in two Nordic countries: Using nonsupervised and supervised multivariate statistical techniques to analyse dietary survey data. TemaNord, 548.

Hansen, K., Perry, B. A., Dranginis, A. W., and Pfister, D. H. (2013) A phylogeny of the highly diverse cup-fungus family Pyronemataceae (Pezizomycetes, Ascomycota) clarifies relationships and evolution of selected life history traits. Mol. Phylogenet. Evol., 67, 311–335.

Jornvall, H., Hedlund, J., Bergman, T., Kallberg, Y., Cederlund, E., and Persson, B. (2013) Origin and evolution of medium chain alcohol dehydrogenases. Chem. Biol. Interact., 202, 91–96.

Kontham, V., Holst, S. von, and Lindblom, A. (2013) Linkage analysis in familial non-Lynch syndrome colorectal cancer families from Sweden. PLoS ONE, 8, e83936.

Lampa, S., Dahlo, M., Olason, P. I., Hagberg, J., and Spjuth, O. (2013) Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data. Gigascience, 2, 9.

Light, S. and Elofsson, A. (2013) The impact of splicing on protein domain architecture. Curr. Opin. Struct. Biol., 23, 451–458.

Light, S., Sagit, R., Ekman, D., and Elofsson, A. (2013) Long indels are disordered: A study of disorder and indels in homologous eukaryotic proteins. Biochim. Biophys. Acta, 1834, 890–897.

Light, S., Sagit, R., Sachenkova, O., Ekman, D., and Elofsson, A. (2013) Protein expansion is primarily due to indels in intrinsically disordered regions. Mol. Biol. Evol., 30, 2645–2653.

Lind, U., Alm Rosenblad, M., Wrange, A. L., Sundell, K. S., Jonsson, P. R., Andre, C., Havenhand, J., and Blomberg, A. (2013) Molecular characterization of the α-subunit of Na+/K+ ATPase from the euryhaline barnacle Balanus improvisus reveals multiple genes and differential expression of alternative splice variants. PLoS ONE, 8, e77069.

Mangold, S., Rao Jonna, V., and Dopson, M. (2013) Response of Acidithiobacillus caldus toward suboptimal pH conditions. Extremophiles, 17, 689–696.

Mayer, G., Montecchi-Palazzi, L., Ovelleiro, D., Jones, A. R., Binz, P.-A., Deutsch, E. W., Chambers, M., Kallhardt, M., Levander, F., Shofstahl, J., Orchard, S., Antonio Vizcaíno, J., Hermjakob, H., Stephan, C., Meyer, H. E., and Eisenacher, M. (2013) The HUPO proteomics standards initiative-mass spectrometry controlled vocabulary. Database, 2013.

Meerupati, T., Andersson, K. M., Friman, E., Kumar, D., Tunlid, A., and Ahren, D. (2013) Genomic mechanisms accounting for the adaptation to parasitism in nematode-trapping fungi. PLoS Genet., 9, e1003909.

Nookaew, I., Svensson, P. A., Jacobson, P., Jernas, M., Taube, M., Larsson, I., Andersson-Assarsson, J. C., Sjostrom, L., Froguel, P., Walley, A., Nielsen, J., and Carlsson, L. M. (2013) Adipose tissue resting energy expenditure and expression of genes involved in mitochondrial function are higher in women than in men. J. Clin. Endocrinol. Metab., 98, E370–378.

Nookaew, I., Thorell, K., Worah, K., Wang, S., Lloyd Hibberd, M., Sjovall, H., Pettersson, S., Nielsen, J., and Lundin, S. B. (2013) Transcriptome signatures in Helicobacter pylori-infected mucosa identifies acidic mammalian chitinase loss as a corpus atrophy marker. BMC Med Genomics, 6, 41.

Nørholm, M. H., Toddo, S., Virkki, M. T., Light, S., Heijne, G. von, and Daley, D. O. (2013) Improved production of membrane proteins in Escherichia coli by selective codon substitutions. FEBS Lett., 587, 2352–2358.

Persson, B. and Kallberg, Y. (2013) Classification and nomenclature of the superfamily of short-chain dehydrogenases/reductases (SDRs). Chem. Biol. Interact., 202, 111–115.

Rowe, M., Laskemoen, T., Johnsen, A., and Lifjeld, J. T. (2013) Evolution of sperm structure and energetics in passerine birds. Proc. Biol. Sci., 280, 20122616.

Sandin, M., Ali, A., Hansson, K., Mansson, O., Andreasson, E., Resjo, S., and Levander, F. (2013) An Adaptive Alignment Algorithm for Quality-controlled Label-free LC-MS. Mol. Cell Proteomics, 12, 1407–1420.

Sanli, K., Karlsson, F. H., Nookaew, I., and Nielsen, J. (2013) FANTOM: Functional and taxonomic analysis of metagenomes. BMC Bioinformatics, 14, 38.

Sooman, L., Lennartsson, J., Gullbo, J., Bergqvist, M., Tsakonas, G., Johansson, F., Edqvist, P.-H., Pontén, F., Jaiswal, A., Navani, S., Alafuzoff, I., Popova, S., Blomquist, E., and Ekman, S. (2013) Vandetanib combined with a p38 MAPK inhibitor synergistically reduces glioblastoma cell survival. Medical oncology (Northwood, London, England), 30, 638.

Teleman, J., Waldemarson, S., Malmström, J., and Levander, F. (2013) Automated quality control system for LC-SRM setups. Journal of Proteomics, 95, 77–83.

Teschendorff, A. E., Marabita, F., Lechner, M., Bartlett, T., Tegner, J., Gomez-Cabrero, D., and Beck, S. (2013) A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics, 29, 189–196.

Appendix 4 Abbreviations

BILS: Bioinformatics Infrastructure for Life Sciences
EBI: European Bioinformatics Institute
ELIXIR: European Infrastructure for Biological Information
EMBL: European Molecular Biology Laboratory
FTE: Full time equivalent
GU: Göteborgs Universitet, University of Gothenburg
iRODS: Integrated Rule Oriented Data System
KI: Karolinska Institutet, Stockholm
KTH: Kungliga Tekniska Högskolan, Royal Institute of Technology, Stockholm
LiU: Linköping University
LnU: Linnéuniversitetet
LTH: Lunds Tekniska Högskola
LU: Lund University
NeIC: Nordic eScience Infrastructure Collaboration
NGI: National Genomics Infrastructure
NGS: Next Generation Sequencing
NRM: Naturhistoriska Riksmuseet, Swedish Museum of Natural History, Stockholm
NSC: National Supercomputer Centre at Linköping University
NUS: Norrlands Universitetssjukhus
PDC: PDC Centre for High Performance Computing at KTH
PI: Principal investigator
SAB: Scientific Advisory Board
SciLifeLab: Science for Life Laboratory
SILS: Systems Biology Infrastructure for Life Sciences
SLU: Sveriges Lantbruksuniversitet, Swedish University of Agricultural Sciences
SNIC: Swedish National Infrastructure for Computing
SNISS: Svensk Nationell Infrastruktur för Storskalig Sekvensning, from 2013 NGI
SU: Stockholm University
UmU: Umeå University
UPPMAX: Uppsala Multidisciplinary Center for Advanced Computational Science
UPPNEX: Project at UPPMAX providing computing and storage resources for NGS
UU: Uppsala University
VR: Vetenskapsrådet, Swedish Research Council
WABI: Wallenberg Advanced Bioinformatics Infrastructure

http://bils.se