SAnDReS Tutorial 01
Prof. Dr. Walter F. de Azevedo Jr.
2015
Running on Windows
On Windows, left-click on Command Prompt. Go to the SAnDReS directory (c:\sandres) and type: python sandres1_gui.py
SAnDReS checks if everything is fine and launches the GUI window. The window below, called the text window, is kept open during the SAnDReS session.
From now on we can use the GUI window to work with SAnDReS. In the structure directory field, we have to enter the path of a pre-existing directory, where all work will be carried out. After you have entered the complete directory information, click on Update, in the same row. Don't forget to type a pre-existing directory.
Below we have a description of each button and field in the GUI window.
- Menu commands
- Update button for structure directory
- Clear button for structure directory
- Update button for force fields
- Clear button for force field directory
- Clear button for input file name
- Run button (to run a specific input file)
- Clear button (to clear redocking file name)
- Log window
- Clear Log window button
- Information strip
- Exit button
Getting Access Codes from PDB
In a typical SAnDReS session, we first go to the PDB (www.rcsb.org/pdb) to get PDB access codes.
We click on Advanced Search and go to the window below. We will show all steps to get a dataset of CDK structures for which Ki information is available.
In Choose a Query Type, select Macromolecule Name, as shown below.
We type cyclin-dependent kinase in the field below.
Now we click on Add Search Criteria.
We have an extra field to filter our search.
We select Binding Affinity, as shown below.
We choose Ki as Affinity Type. We will retrieve all cyclin-dependent kinase structures for which Ki information is available.
We click on 2.7.11.22 in Enzyme Classification, to make sure we have this enzymatic class only.
We ended up with 54 structures, as shown below.
In Filter, we select Download Checked, as shown below.
Now we have the access codes for all cyclin-dependent kinase structures for which Ki information is available. We may download them using the PDB app or go back to SAnDReS.
Running SAnDReS to Download Structures from PDB
In the SAnDReS GUI window we click File->Copy Input Files to Structure Directory, as shown below. SAnDReS will copy all necessary input files to the structure directory. After choosing, SAnDReS will ask if you want to carry out this task. Click Yes.
Every time SAnDReS is running a process, we will see a red string showing what it is doing at the lower left (information strip).
After finishing whatever process it was running, SAnDReS shows what it has done, in black letters on the information strip. Additional information is also shown in the log window.
To download PDB files, we have to insert the PDB access codes into the getstr_pdb.in file. To open this file, we click on File->Open File getstr_pdb.in, as shown below.
We copy (Ctrl-C) the access codes.
In the new window, we paste (Ctrl-V) the PDB access codes, as shown below. Before the PDB access codes, the getstr_pdb.in file has to have the GETSTR command and the type of file to be downloaded, PDB. All lines that start with # are comments; they are ignored by SAnDReS. The last line is the ENDOF command. After you finish editing the file, press the Save button and Exit. We have to be careful when editing .in files, since SAnDReS uses these files to carry out its tasks and they have a fixed format to be followed.
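Since a malformed .in file can break a run, it may help to sanity-check the edited text before saving it. The sketch below is not part of SAnDReS. It only tests the three rules stated above (GETSTR first, # comment lines ignored, ENDOF last) and collects anything that looks like a four-character PDB access code. The function name check_getstr_text and the exact line layout in the example are assumptions made for illustration; the real file has a fixed format that should be copied from the file shipped with SAnDReS.

```python
import re

# Classic PDB access codes: one digit followed by three letters or digits.
PDB_CODE = re.compile(r"\b[0-9][A-Za-z0-9]{3}\b")


def check_getstr_text(text):
    """Return the PDB access codes found in a getstr-style input text.

    Hypothetical helper: raises ValueError when the GETSTR or ENDOF
    keywords are not where the tutorial says they must be.
    """
    # Drop blank lines and comment lines (those starting with '#').
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or not lines[0].startswith("GETSTR"):
        raise ValueError("first non-comment line must start with GETSTR")
    if not lines[-1].startswith("ENDOF"):
        raise ValueError("last line must be the ENDOF command")
    codes = []
    for ln in lines[:-1]:
        codes.extend(code.upper() for code in PDB_CODE.findall(ln))
    return codes


# Assumed layout, for illustration only.
example = "# access codes pasted from the PDB search\nGETSTR,PDB\n4ACM\nENDOF\n"
print(check_getstr_text(example))
```

A check like this catches a deleted ENDOF line or a stray edit to the GETSTR line, which are the mistakes most likely to happen when pasting a long list of codes.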
Now we have to open the getstr_html.in file, as shown below. This file will be used to download HTML files from the PDB. SAnDReS will employ these files to extract binding affinity information, since PDB files don't keep binding affinity data.
In the new window, we paste (Ctrl-V) the PDB access codes, as shown below. Before the PDB access codes, the getstr_html.in file has to have the GETSTR command and the type of file to be downloaded, HTML. The last line is the ENDOF command. After you finish editing the file, press the Save button and Exit.
To run getstr_pdb.in, we click Download from PDB->Download from PDB (PDB format), as shown below. Check that your computer is connected to the internet before you start downloading.
We may follow the download progress by checking the text window, as shown below. When downloading a dataset with several hundred structures, the PDB may complain about high user activity. In this case, SAnDReS starts waiting longer between downloads. The default wait is 10 seconds between downloads, which may increase to 120 seconds and, in the case of high user activity, up to 360 seconds.
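The waiting behaviour described above can be pictured as a small step function. This is only a sketch of the idea, not the SAnDReS source: the three wait values (10, 120 and 360 seconds) come from the text, while the function name next_wait, the stepwise escalation, and the reset to the default after a successful download are assumptions.

```python
# Wait times in seconds, taken from the tutorial text.
WAIT_STEPS = (10, 120, 360)


def next_wait(current, server_busy):
    """Return the pause (in seconds) to use before the next download.

    Assumed policy: move one step up when the server reports high user
    activity, fall back to the default wait otherwise.
    """
    if not server_busy:
        return WAIT_STEPS[0]
    for step in WAIT_STEPS:
        if step > current:
            return step
    return WAIT_STEPS[-1]  # already at the longest wait


wait = WAIT_STEPS[0]
for busy in (False, True, True, True, False):
    wait = next_wait(wait, busy)
    print(busy, wait)
```

With waits of this size, a dataset of several hundred structures can take a long time to download, so it is worth leaving the text window visible to follow the progress.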
After finishing the download, SAnDReS shows a summary in the log window, as shown below.
Next, we run getstr_html.in by clicking on Download from PDB->Download from PDB (HTML format), as shown below. As with the PDB files, you may follow the download progress in the text window. After the download finishes, we get a summary in the log window.
Pre-docking Analysis
Now we are ready to start analyzing the structures in our dataset. We call this phase Pre-Docking. First we check if all structures are in the structure directory. In the previous step, SAnDReS generated an input file, called chkstr_ki.in, to carry out this task. To run it, we click on Pre-Docking->Check Structures with Binding Information (Ki), as shown below.
Once finished, SAnDReS shows the summary of results in the log window. As we can see below, SAnDReS also checks if binding affinity is present in the HTML files. For this dataset, this information was missing for five structures. These structures will be automatically removed from the dataset. Note in the information strip that chklig.in has been created. This new file will be employed to check if the active ligand is in the PDB file.
Below we can see part of the chklig.in file, which SAnDReS generated automatically. Its format requires the command CHKLIG, followed by the PDB access code, then the ligand code, the ligand chain identifier, and the ligand number in the PDB file. The last column holds the binding affinity, as extracted from the HTML file, in nM. This format should be followed if any modification is to be carried out; chklig.in is a csv file.
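To make the column order concrete, here is a minimal reading sketch for a chklig-style file. It is not the SAnDReS parser. The column order (CHKLIG, PDB access code, ligand code, chain, ligand number, affinity in nM) follows the description above; the comma delimiter, the names LigandEntry and parse_chklig, and every value in the example row except the access code 4ACM are assumptions made up for illustration.

```python
import csv
import io
from typing import NamedTuple


class LigandEntry(NamedTuple):
    pdb_code: str
    ligand_code: str
    chain: str
    ligand_number: int
    affinity_nm: float  # binding affinity in nM


def parse_chklig(text, delimiter=","):
    """Collect the CHKLIG rows of a chklig-style text (hypothetical helper)."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        fields = [field.strip() for field in row]
        # Skip comments, blank lines, and any row that is not a CHKLIG command.
        if len(fields) < 6 or fields[0] != "CHKLIG":
            continue
        entries.append(
            LigandEntry(fields[1], fields[2], fields[3],
                        int(fields[4]), float(fields[5]))
        )
    return entries


# Made-up row: ligand code, chain, number and affinity are placeholders.
example = "# made-up example row\nCHKLIG,4ACM,LIG,A,1300,25.0\n"
for entry in parse_chklig(example):
    print(entry.pdb_code, entry.ligand_code, entry.affinity_nm)
```

If the real file uses a different separator, the delimiter argument can be changed without touching the rest of the function.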
To run chklig, click on Pre-Docking->Check Ligands, as shown below.
The summary is shown in the log window.
Now we generate the input files for the next steps: ststru.in, biomat.in, and fndwat.in. We click on Pre-Docking->Generate Inputs for Analysis of Crystallographic Structures, as shown below.
After it finishes running geninp.in, SAnDReS shows that the ststru.in, biomat.in, and fndwat.in files have been created.
We have an ensemble of structures that can be employed in protein-ligand docking simulations. It is of pivotal importance to identify the overall quality of the crystallographic structures present in this ensemble. To do so, we run ststru.in. We click on Pre-Docking->Statistical Analysis of Crystallographic Structures, as shown below.
So far so good; we can see the results in the log window. SAnDReS generated a log file (ststru.log) with the results. It also generated two-column csv files for each crystallographic parameter it analyzed. For this ensemble of structures, PDB access code 4ACM shows the highest crystallographic resolution.
We could also choose the best structure based on criteria such as R-factor, B-values, etc. Our goal is to highlight which structure presents the highest quality considering the crystallographic information. We used the structure 4ACM for re-docking with Molegro Virtual Docker (Thomsen & Christensen, 2006), but SAnDReS accepts docking results from any docking program, as long as they are in a CSV file.
In the next step, SAnDReS investigates each structure in the ensemble to verify if BIOMAT information is present. If so, homo-oligomeric structures are generated, based on the rotation matrix and translation vector. We run this command for didactic purposes only, since CDK structures don't show homo-oligomeric structures.
Below we have the summary of results: no homo-oligomeric structures for CDK. It is worth noting that our ensemble decreased from 49 to 32 structures. This is due to redundant information present in the dataset: SAnDReS got rid of repeated structures for the same ligand. When two or more structures are present in the dataset for the same ligand, SAnDReS chooses the structure with the highest resolution.
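The redundancy filter just described amounts to keeping one structure per ligand. The sketch below illustrates that rule and is not the SAnDReS implementation. It assumes that each structure is given as a (PDB code, ligand code, resolution in angstroms) tuple and that "highest resolution" means the smallest resolution value; the function name and the example entries are hypothetical.

```python
def best_structure_per_ligand(structures):
    """Keep, for each ligand, the structure with the best resolution.

    structures: iterable of (pdb_code, ligand_code, resolution_in_angstroms).
    A smaller resolution value means a higher-resolution structure.
    """
    best = {}
    for pdb_code, ligand_code, resolution in structures:
        kept = best.get(ligand_code)
        if kept is None or resolution < kept[1]:
            best[ligand_code] = (pdb_code, resolution)
    return {ligand: code for ligand, (code, _) in best.items()}


# Hypothetical entries: two structures share ligand "LG1".
dataset = [
    ("XXX1", "LG1", 2.10),
    ("XXX2", "LG1", 1.65),
    ("XXX3", "LG2", 1.90),
]
print(best_structure_per_ligand(dataset))
```

Here the second entry replaces the first because both contain the same ligand and the second has the smaller resolution value, so three structures are reduced to two.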
In the next step, we check if there are water molecules close to the active ligand. We click on Pre-Docking->Find Waters in Crystallographic Structures, as shown below.
The summary of results is shown below. All structures for which water molecules were found close to the active ligand are written to the directory Water, which is in the structure directory. It is good practice to rename this directory to Water2, to avoid losing it, since SAnDReS overwrites the Water directory every time it runs fndwat.in.
We may generate plots using the last two options in the Pre-Docking menu, as shown below.
Below we have a histogram of the distribution of crystallographic resolution for the structures present in the dataset.
Docking Analysis
SAnDReS was developed to carry out analysis of docking results, no matter which program produced them. The only condition is that the results should be in a csv file like the one shown below.
In this csv file, the first row holds the headers for each column, and the columns are separated by semicolons.
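A file with this layout can be inspected outside SAnDReS with Python's standard csv module by setting the delimiter to a semicolon. The following is a minimal sketch: the function name read_docking_csv is hypothetical, and the header names and values in the example are placeholders loosely based on names used elsewhere in this tutorial, not the contents of a real docking file.

```python
import csv
import io


def read_docking_csv(text):
    """Return (headers, rows) for a semicolon-separated docking table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    headers = reader.fieldnames  # taken from the first row of the file
    rows = list(reader)          # one dict per pose, values kept as strings
    return headers, rows


# Placeholder content, for illustration only.
example = "Ligand;RMSD;MolDock Score\npose_1;0.52;-150.3\npose_2;3.10;-120.8\n"
headers, rows = read_docking_csv(example)
print(headers)
print(float(rows[0]["RMSD"]))
```

Values are read as strings, so numeric columns such as RMSD have to be converted with float() before any statistics are computed.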
To clean a csv file, we click Docking->Clean CSV File and Prepare Input Files, as shown below. The csv file is indicated in the Docking CSV File Name field. In this tutorial it is the redock04.csv file.
SAnDReS generated a new csv file, as shown below.
Now we can perform the analysis of docking results, to verify the correlation between the score functions and RMSD. We click Docking->Statistical Analysis of RMSD X Score Functions, as shown below.
The results are shown in the log window. As we can see, the best result was obtained for Docking Score, if we consider correlation and p-values. SAnDReS generated separate csv files for each scoring function, which can be employed to generate scatter plots.
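For readers who want to check a correlation by hand from one of these two-column csv files, a Pearson coefficient can be computed in a few lines. This is a generic sketch and not the SAnDReS code: the tutorial does not say which correlation coefficient the program reports, so using Pearson here is an assumption, and p-values (which need a statistics library) are left out. The numbers in the example are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up RMSD values and scores for four poses.
rmsd = [0.5, 1.2, 2.8, 4.1]
score = [-150.0, -141.0, -120.0, -98.0]
print(round(pearson_r(rmsd, score), 3))
```

A coefficient close to +1 or -1 indicates that the scoring function tracks RMSD closely; a value near zero means that the score says little about how close a pose is to the crystallographic position.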
To generate scatter plots, we click on Docking->Plot Re-dock Results, as shown below.
We have below the scatter plot generated for the Docking Score.
Docking Results x Structural Parameters
We can analyze the correlation of docking (cross-docking) results against structural parameters and crystallographic information. This task is carried out using a csv file as input, in which the RMSD and the scoring function values are stored, like the one shown below.
The first column holds the ligand identification, and we need a column for the RMSD results, which must have RMSD as its header. We also need a column for the number of torsion angles, which should have Torsions as its header. The remaining columns are for scoring functions.
To carry out this task, we click Docking Results x Structural Parameters->Generate Input for Statistical Analysis, as shown below.
A new menu pops up, where the information about the cross-docking results can be entered. The cross-docking results are in a CSV file that should be indicated in the field CSV File Name for Scoring Function. Additional information about the ligand will be retrieved from the previously generated chklig.in file.
SAnDReS shows that new stdock1.in and stdock2.in files were created.
Now we can run the statistical analysis. There are two options: the first runs a fast statistical analysis, with a reduced number of structural parameters; the second runs a complete analysis, with hundreds of parameters checked against the cross-docking results.
After it finishes running the stdock1.in file, SAnDReS shows the results in the log window, as shown below. SAnDReS generated stdock.csv and stdock.log files in the structure directory.
We can generate scatter plots, as shown below.
Below we have the scatter plot for R-free.
Scoring Functions
In the next step we analyze the correlation between scoring functions and experimental binding affinity. SAnDReS can carry out this analysis for datasets with binding information for Ki, Kd, and IC50. The following example is for Ki. We click Scoring Functions->Generate Input for Statistical Analysis of Scoring Functions, as shown below.
We see a new pop-up menu, where we have to enter the CSV file for scoring functions and choose whether the correlation will be determined for log(Ki) or log(Ki). Then we click on Create New stscor.in, which creates the input file with the scoring function values (last columns in the log window), information on the structures used in the dataset (second to fifth columns), and the experimental binding affinity (sixth column). The first column is the keyword STSCOR, which tells SAnDReS to perform the statistical analysis of scoring functions.
If everything is fine, SAnDReS reports in the information strip that the process is finished.
Now we click Scoring Functions->Statistical Analysis of Experimental Binding Affinity and Scoring Functions, as shown below.
SAnDReS generated stscor.log and stscor.csv with the results, and shows the log file in the log window. We can generate a scatter plot by clicking on the option shown below.
Below we have the scatter plot for log(Ki) x MolDock Score.
Cross-docking Results
To analyze cross-docking results, we should first generate the input file stcrossdock.in. We click on Scoring Functions->Generate Input for Statistical Analysis of Cross-Docking Results, as shown below.
We have a new pop-up menu for the inputs needed to generate the stcrossdock.in file. We click on Create New stcrossdock.in.
Then we run the statistical analysis by clicking Scoring Functions->Statistical Analysis of Cross-Docking Results and Scoring Functions, as shown below.
SAnDReS generated stcrossdock.csv and stcrossdock.log files. The log file is shown in the log window.
We can generate scatter plots for our results, as shown below.
Below we have the scatter plot for RMSD x MolDock Score.
References
Thomsen R, Christensen MH. MolDock: a new technique for high-accuracy molecular docking. J Med Chem. 2006;49:3315-21.