HAPPy: Heavy Atom Phasing in Python. Dan Rolfe, Paul Emsley, Charles Ballard, Maria Turkenburg, Eleanor Dodson



What is it?
A new automated experimental phasing pipeline.
It will replace and expand on the capabilities of Paul Emsley's Chart package.
It will:
  Take integrated and merged experimental data: amplitudes (post-TRUNCATE), de-twinned and consistently indexed.
  Determine the heavy atom structure and phase probabilities.
  Optimize the density map to give an interpretable map.
  Build the structure.
The first release will handle SAD data only. MAD, MIR and MIRAS modes will follow later.

Implementation
Written in Python. Uses EMMA from cctbx and Driver from XIA.
Employs existing packages for the various stages of structure solution, using CCP4 programs where possible.
Modular: the modules for HA search, phasing and density modification/phase improvement will be interchangeable.
  Heavy atom substructure: SHELXD.
  SAD phasing: MLPHARE, PHASER, BP3.
  Phase improvement: PIRATE. A DM module will be added.
  Model building (in the future): BUCCANEER and COOT.
Designed to cooperate with other automation packages, e.g. to take output from the automated data processing software DNA/XIA-DP (Graeme Winter).
Uses well-defined APIs and data formats where data exchange is necessary.
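The interchangeable-module idea can be pictured with a minimal sketch. Every name in it (`PipelineState`, `make_stage`, `run_pipeline`) is hypothetical and is not taken from HAPPy itself; the stages are placeholders that only record which program would be run, so no crystallographic software is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical names throughout: these are not HAPPy's real classes or functions.
@dataclass
class PipelineState:
    """Data handed from one stage to the next."""
    data: Dict[str, object] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def make_stage(stage_name: str, program: str) -> Stage:
    """Build a placeholder stage that only records which program would run."""
    def run(state: PipelineState) -> PipelineState:
        state.history.append(f"{stage_name}:{program}")
        return state
    return run


def run_pipeline(stages: List[Stage], state: PipelineState) -> PipelineState:
    """Apply each stage in order, passing the state along."""
    for stage in stages:
        state = stage(state)
    return state


# Swapping the phasing program changes one entry; the other stages are untouched.
stages_a = [
    make_stage("ha_search", "SHELXD"),
    make_stage("phasing", "MLPHARE"),
    make_stage("phase_improvement", "PIRATE"),
]
stages_b = [
    make_stage("ha_search", "SHELXD"),
    make_stage("phasing", "PHASER"),
    make_stage("phase_improvement", "PIRATE"),
]

result_a = run_pipeline(stages_a, PipelineState())
result_b = run_pipeline(stages_b, PipelineState())
print(result_a.history)
print(result_b.history)
```

Because every stage shares one call signature, replacing MLPHARE with PHASER is a change to the stage list and not to the pipeline driver.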

Will use, and is contributing to, the next generation CCP4(i) database (Peter Briggs, Wanjuan (Wendy) Yang).
A performance testing suite is in progress, including reference structure comparisons for test data sets using Clipper utilities. This enables tracking of HAPPy performance over time.

What it already does

What's still needed for SAD release (not exhaustive)
More rigorous substructure determination.
  More intelligent use of SHELXD: run it for longer when needed.
Better logic for which phasing programs to use when, e.g.
  Try a quick solution (e.g. MLPHARE).
  Try a slower, more sophisticated method (e.g. PHASER) if MLPHARE fails.
Improved and new scoring of phased data and maps.
  For hand selection (CFFT not helping; trying ABS, PIRATE).
  For deciding if we've got a good enough final result (trying PIRATE).
  Use connectivity, skeletonizability, DM statistics.
More intelligent choice of resolution limits?
More intelligent values/defaults, e.g. the minimum HA separation depends on the atom.
Lots of testing is going on now, which will lead to further tuning and improvements.
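The "quick method first, slower method if it fails" logic listed above can be sketched as follows. This is an illustration under stated assumptions, not HAPPy's actual decision code: `phase_with_fallback`, the stand-in attempt functions and the score values are all hypothetical, and a single numeric score with a threshold is assumed as the success criterion.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: a phasing attempt returns a quality score, or None if the run failed.
PhasingAttempt = Callable[[], Optional[float]]


def phase_with_fallback(
    attempts: List[Tuple[str, PhasingAttempt]],
    min_score: float,
) -> Optional[Tuple[str, float]]:
    """Try each program in order (quickest first); stop at the first acceptable score."""
    for name, attempt in attempts:
        score = attempt()
        if score is not None and score >= min_score:
            return name, score
    return None  # no program produced an acceptable result


# Stand-ins for real program wrappers; the scores are made up for illustration.
def quick_mlphare() -> Optional[float]:
    return 0.2


def slow_phaser() -> Optional[float]:
    return 0.7


choice = phase_with_fallback(
    [("MLPHARE", quick_mlphare), ("PHASER", slow_phaser)],
    min_score=0.5,
)
print(choice)
```

Ordering the list from cheapest to most expensive means the slower program only runs when the quick one falls short of the threshold.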

Interface
Currently: command line, from XML input and MTZs.
Before release: CCP4i interface.
Will also: start directly from a processing pipeline, e.g. DNA/XIA, through an XML input description and MTZs.

Data formats (so far)
Input: MTZs and XML, PIR/FASTA sequences...
Output:
  Reflection files: MTZ.
  Substructures: XML, PDB.
  Sequences: PIR.
  HAPPy data (e.g. history, knowledge, phased result metadata): XML.
  Whatever is necessary to talk to other software...
HAPPy XML formats:
  Reflect HAPPy objects, making HAPPy easy to develop and expand.
  Informed by, and designed to translate easily into, related structure solution file formats and database schemas.

XML input file

<?xml version='1.0'?>
<!DOCTYPE project SYSTEM "http://www.ccp4.ac.uk/happy/xml/schemas/project.dtd">
<project name="gilu" phasing_mode="sad" date_created="2005-09-21" created_by="maria Turkenburg's own fair hands">
  <target>
    <consider_space_group_list>
      <space_group>i 2 2 2</space_group>
    </consider_space_group_list>
    <monomers>
      <monomer name="8xia">
        <sequence>
          MNYQPTPEDR FTFGLWTVGW QGRDPFGDAT RRALDPVESV QRLAELGAHG VTFHDDDLIP
          FGSSDSEREE HVKRFRQALD DTGMKVPMAT TNLFTHPVFK DGGFTANDRD VRRYALRKTI
          RNIDLAVELG AETYVAWGGR EGAESGGAKD VRDALDRMKE AFDLLGEYVT SQGYDIRFAI
          EPKPNEPRGD ILLPTVGHAL AFIERLERPE LYGVNPEVGH EQMAGLNFPH GIAQALWAGK
          LFHIDLNGQN GIKYDQDLRF GAGDLRAAFW LVDLLESAGY SGPRHFDFKP PRTEDFDGVW
          ASAAGCMRNY LILKERAAAF RADPEVQEAL RASRLDELAR PTAADGLQAL LDDRSAFEEF
          DVDAAAARGM AFERLDQLAM DHLLGARG
        </sequence>
      </monomer>
    </monomers>
  </target>
  <native name="8xia">
    <obs type="non-anom" file="sfdata/gilu.mtz">
      <columns type="f" F="F_GILU" SIGF="SIGF_GILU"/>
    </obs>
  </native>
  <derivative name="mn" atom="mn">
    <estimated_nsites>1</estimated_nsites>
    <obs type="peak" file="sfdata/gilu.mtz">
      <columns type="ano" Fplus="F_GILU(+)" SIGFplus="SIGF_GILU(+)" Fminus="F_GILU(-)" SIGFminus="SIGF_GILU(-)"/>
    </obs>
  </derivative>
  <freer file="sfdata/gilu.mtz" label="freer_flag"/>
</project>
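As a rough illustration of how a project file of this shape could be read, the sketch below uses Python's standard `xml.etree.ElementTree`. It is not HAPPy's own reader. The embedded string is a trimmed copy of the input file (declaration, DOCTYPE, sequence and native data omitted for brevity), and the variable names are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the project file; XML declaration, DOCTYPE, sequence
# and native data are left out to keep the example short.
PROJECT_XML = """<project name="gilu" phasing_mode="sad">
  <target>
    <consider_space_group_list>
      <space_group>i 2 2 2</space_group>
    </consider_space_group_list>
  </target>
  <derivative name="mn" atom="mn">
    <estimated_nsites>1</estimated_nsites>
    <obs type="peak" file="sfdata/gilu.mtz">
      <columns type="ano" Fplus="F_GILU(+)" SIGFplus="SIGF_GILU(+)"
               Fminus="F_GILU(-)" SIGFminus="SIGF_GILU(-)"/>
    </obs>
  </derivative>
</project>"""

root = ET.fromstring(PROJECT_XML)

# Project-level attributes.
phasing_mode = root.get("phasing_mode")

# Candidate space groups listed under <target>.
space_groups = [sg.text.strip() for sg in root.iter("space_group")]

# Derivative: expected number of heavy atom sites and the MTZ column labels.
derivative = root.find("derivative")
nsites = int(derivative.findtext("estimated_nsites"))
obs = derivative.find("obs")
mtz_file = obs.get("file")
columns = dict(obs.find("columns").attrib)

print(phasing_mode, space_groups, nsites, mtz_file)
print(columns["Fplus"], columns["Fminus"])
```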

Substructure XML

<?xml version='1.0'?>
<substructure crystal="mn" type="initial" id="1" inverted="false">
  <space_group>unqid:23_i222</space_group>
  <program>shelxd</program>
  <hand>none</hand>
  <atom id="1">
    <atom_id>mn1</atom_id>
    <element_name>mn</element_name>
    <x>0.082214</x>
    <y>0.632553</y>
    <z>0.066377</z>
    <anomalous_occupancy>1</anomalous_occupancy>
    <bfactor>
      <iso>20</iso>
    </bfactor>
  </atom>
  <atom id="2">...
  </atom>
  <score type="shelxd">1.39</score>
  <score type="shelxd_ccweak">19.94</score>...
</substructure>
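A substructure file of this shape can be turned into plain Python data with the standard library, as in the sketch below. It is an illustration and not HAPPy's own code: the embedded string keeps only the first atom and the two scores shown in the example, the elided parts are left out, and the dictionary layout is an assumption made here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the substructure file: one atom and the two scores shown.
SUBSTRUCTURE_XML = """<substructure crystal="mn" type="initial" id="1" inverted="false">
  <program>shelxd</program>
  <hand>none</hand>
  <atom id="1">
    <atom_id>mn1</atom_id>
    <element_name>mn</element_name>
    <x>0.082214</x>
    <y>0.632553</y>
    <z>0.066377</z>
    <anomalous_occupancy>1</anomalous_occupancy>
  </atom>
  <score type="shelxd">1.39</score>
  <score type="shelxd_ccweak">19.94</score>
</substructure>"""

root = ET.fromstring(SUBSTRUCTURE_XML)

# One dictionary per heavy atom site, with coordinates read from <x>, <y>, <z>.
atoms = [
    {
        "atom_id": atom.findtext("atom_id"),
        "element": atom.findtext("element_name"),
        "xyz": tuple(float(atom.findtext(axis)) for axis in ("x", "y", "z")),
        "occupancy": float(atom.findtext("anomalous_occupancy")),
    }
    for atom in root.findall("atom")
]

# Scores keyed by their "type" attribute.
scores = {score.get("type"): float(score.text) for score in root.findall("score")}

print(root.findtext("program"), atoms, scores)
```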

Future plans
Multiple entry points (e.g. provide an initial substructure).
Further phasing modes.
Model building.
Modules for other substructure determination software.
Effective use of clusters and multiple CPU machines.
Integration with the next generation CCP4(i) database.
...

Further information
Website: www.ccp4.ac.uk/happy. Information will appear here.
Suggestions and requests are welcome.