Integrating Bioinformatics, Medical Sciences and Drug Discovery

M. Madan Babu
Centre for Biotechnology, Anna University, Chennai - 600025
phone: 44-4332179 :: email: madanm1@rediffmail.com

Bioinformatics

Bioinformatics, a term coined for the application of computer science to biology, is now emerging as a major element in contemporary biology and biomedical research. Biological research is undergoing a paradigm shift toward the large-scale use of computers, software tools and computational models. Walter Gilbert, a renowned scientist, described this shift in biology as follows: "The new paradigm, now emerging, is that all of the genes will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture only then turning to experiment to follow or test that hypothesis." The exponential growth in biological data, which bioinformatics sets out to manage, has led to the development of primary and secondary databases of nucleic acid sequences, protein sequences and structures. Well-known databases include GenBank, SWISS-PROT, PDB, PIR, SCOP and CATH. These databases are publicly available and hosted on Internet servers across the world. Basic research and modelling are carried out on these databases with the help of sequence analysis tools such as BLAST, FASTA and CLUSTALW, and the modelled structures are visualized with tools such as WebLab, MOLMOL and RasMol. Bioinformatics plays an important role in integrating the broad disciplines of biology to understand the complex mechanisms of the cell. It also aids biomedical investigators in using this information in their tests. The complete process, from data collection to the analysis of the results of such tests, may be categorized under a separate area named "Clinical Informatics".
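BLAST and FASTA are fast heuristics for finding local similarities between sequences; the exact method they approximate is Smith-Waterman local alignment. A minimal Python sketch of its scoring step follows, assuming a simple scheme of match, mismatch and linear gap scores (the values +2, -1 and -2 are illustrative choices, not the defaults of any tool):

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    # h[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # a[i-1] aligned against a gap
            left = h[i][j - 1] + gap    # b[j-1] aligned against a gap
            # Flooring at zero lets an alignment start anywhere: this makes it local.
            h[i][j] = max(0, diagonal, up, left)
            best = max(best, h[i][j])
    return best


# Two sequences sharing the core "ACGT", flanked by unrelated bases.
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
```

Production tools score protein alignments with substitution matrices such as BLOSUM62 and use affine gap penalties; BLAST also avoids filling the whole matrix by first looking for short word matches and extending only those.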
Informatics and Medical Sciences

Many doctors are averse to computers. One solution, proposed after an extensive study involving 1500 doctors from different cities, is to introduce palmtops tailored specifically for physicians. These palmtops are small enough to fit easily into the pocket of a lab coat, so a doctor can enter medical data sequentially as it is collected while moving from ward to ward. This addresses the basic need of any medical analysis: capturing data and creating Electronic Medical Records (EMR), which eventually grow into a database for reference and analysis. The major advantage of EMR is that the information can be accessed and shared far more easily than traditional medical records. EMR also greatly reduces the errors introduced by frustration and other psychological factors when information first collected on paper is later entered manually. It also helps eliminate the manual task of extracting data from charts or filling out specialized data sheets. The data required for a study can be obtained directly from the electronic record, making research data collection a byproduct of routine clinical record keeping. The record environment can also help ensure compliance with a research protocol, alerting a clinician when a patient is eligible for a study, or when the protocol for a study calls for a specific management plan given the data currently available about that patient (Fig. 1). In the near future, the complete information on a patient may be accessible from the EMR. This information can be of any type, ranging from drug trial data to the various tests performed on the patient and their outcomes.
The challenge in such cases will be to organize and integrate this heterogeneous information into a comprehensive, knowledge-based database from which an individual can retrieve the portion of the record needed for a research analysis.
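To make the idea of a record environment that flags study eligibility more concrete, here is a minimal Python sketch. The record fields, the study criteria and the names `PatientRecord`, `StudyProtocol` and `check_eligibility` are all hypothetical; real EMR systems use far richer, standardized schemas.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    # Hypothetical, heavily simplified EMR entry.
    patient_id: str
    age: int
    medications: set[str] = field(default_factory=set)
    lab_results: dict[str, float] = field(default_factory=dict)


@dataclass
class StudyProtocol:
    # Hypothetical eligibility criteria for a research study.
    name: str
    min_age: int
    max_age: int
    required_test: str        # lab test that must be on record
    min_test_value: float     # a qualifying result is >= this value
    excluded_medications: set[str] = field(default_factory=set)


def check_eligibility(p: PatientRecord, s: StudyProtocol) -> tuple[bool, list[str]]:
    """Return (eligible, reasons for ineligibility) for one patient and one study."""
    reasons = []
    if not s.min_age <= p.age <= s.max_age:
        reasons.append(f"age {p.age} outside {s.min_age}-{s.max_age}")
    value = p.lab_results.get(s.required_test)
    if value is None:
        reasons.append(f"no {s.required_test} result on record")
    elif value < s.min_test_value:
        reasons.append(f"{s.required_test} result {value} below {s.min_test_value}")
    conflicts = p.medications & s.excluded_medications
    if conflicts:
        reasons.append("excluded medication(s): " + ", ".join(sorted(conflicts)))
    return not reasons, reasons


study = StudyProtocol("hypothetical-study", 18, 65, "test_x", 7.0, {"drug_a"})
patient = PatientRecord("P001", 42, {"drug_b"}, {"test_x": 7.5})
print(check_eligibility(patient, study))  # (True, [])
```

Running a check like this whenever a record is updated is one way a record environment could point out eligible patients to a clinician; returning the reasons alongside the verdict makes each rejection easy to audit.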

Bioinformatics and Medical Sciences

Bioinformatics has a profound impact on medical sciences. Biological databases help physicians diagnose diseases and develop strategies for their therapy. Consider a patient with a genetic form of hemophilia who visits a physician. The physician is unsure of the symptoms; the only clue is that the patient's family has a history of hemophilia. As outlined in Fig. 2, the physician could search the web for information on the disease in OMIM (Online Mendelian Inheritance in Man), available at http://www.ncbi.nlm.nih.gov/omim/, which provides detailed information on genetic disorders. A focused search for hemophilia would return multiple disorders, including von Willebrand disease, and would show that the primary defect in classic hemophilia (hemophilia A) is a low level of antihemophilic globulin (AHG; factor VIII). A further search for factor VIII in the sequence databases would retrieve the entry for human factor VIII, with its complete cDNA and the corresponding protein sequence. The gene entry is linked to its DNA sequence, its protein sequence and a set of references in the MEDLINE literature database. Following the MEDLINE link retrieves the original research article in which the association of factor VIII with hemophilia is discussed. Following the link to the protein sequence gives detailed information from the SWISS-PROT database and the Protein Information Resource (PIR). Information on the crystal structure can be obtained by following the link to the Protein Data Bank (PDB) provided in the SWISS-PROT entry. Following the link to the DNA sequence in the nucleotide database GenBank gives the nucleotide sequence of the gene, along with records of known irregularities in the gene. Thus the physician draws on a number of databases to collect information about the disease, which helps in diagnosing it and devising strategies for therapy.
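The physician's manual link-following can also be scripted. NCBI exposes its databases through the Entrez E-utilities web service; the sketch below only builds request URLs for two of its endpoints, `esearch.fcgi` (search a database) and `elink.fcgi` (follow cross-references), and sends nothing over the network. The helper names `esearch_url` and `elink_url` are hypothetical, and which databases are currently linked from OMIM should be checked against NCBI's documentation.

```python
from __future__ import annotations

from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL that searches one Entrez database and returns matching record IDs."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """URL that follows cross-references from records in `dbfrom` to database `db`."""
    return EUTILS_BASE + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": ",".join(ids)})


# Step 1: search OMIM for hemophilia.
print(esearch_url("omim", "hemophilia"))
# Step 2: follow links from the hemophilia A entry (MIM 306700) to PubMed/MEDLINE.
print(elink_url("omim", "pubmed", ["306700"]))
```

Fetching each URL, for example with `urllib.request.urlopen`, returns XML listing the matching record IDs, which can then feed the next step, mirroring the OMIM, MEDLINE, sequence and structure path described above. NCBI asks automated clients to limit their request rate.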
Bioinformatics and Drug Discovery

Infectious diseases are now the world's biggest killers of children and young adults. According to the WHO, "They account for more than 13 million deaths a year - one in two deaths in developing countries". Most deaths from infectious diseases occur in developing countries, a situation attributed to the lack of effective drugs and, where such drugs exist, to their high cost. Developing cheap and effective drugs is one of the major problems facing mankind, and rational drug design using bioinformatics could help solve it. The focus of the pharmaceutical industry has shifted from the trial-and-error process of drug discovery to rational, structure-based drug design. A successful and reliable drug design process could reduce the time and cost of developing useful pharmacological agents. Computational methods are used to predict drug-likeness, that is, to identify and eliminate candidate molecules that are unlikely to survive the later stages of discovery and development. Drug-likeness can be predicted using approaches based on genetic algorithms and neural networks. Researchers have been constructing efficient algorithms and better energy functions to predict protein structures and the interactions of small molecules with them. The main technical barrier is that these approaches are computationally intensive, and the computing power they demand is beyond what is generally available. Fig. 3 highlights the magnitude of computation involved for the various kinds of interactions that can be considered in biomolecules. The simplest approach, which treats both the ligand and the receptor as rigid, requires a computer performance of 0.5 Gflop.
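To see where this cost comes from, consider scoring a single rigid pose. A common textbook form of the interaction energy sums a 12-6 Lennard-Jones term and a Coulomb term over every ligand-receptor atom pair. The minimal Python sketch below assumes one set of Lennard-Jones parameters for all atoms and no solvent; the parameter values and the names `pair_energy` and `rigid_pose_energy` are illustrative, not taken from a particular force field or docking program.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    x: float
    y: float
    z: float
    charge: float  # partial charge, in units of the elementary charge


COULOMB_K = 332.06  # approx. kcal*angstrom/(mol*e^2)


def pair_energy(a: Atom, b: Atom, epsilon: float = 0.2, sigma: float = 3.5) -> float:
    """12-6 Lennard-Jones plus Coulomb energy (kcal/mol) for one atom pair.

    epsilon (kcal/mol) and sigma (angstrom) are illustrative values shared by all
    atoms; the two atoms are assumed not to coincide.
    """
    r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_K * a.charge * b.charge / r  # in vacuum: no solvent screening
    return lennard_jones + coulomb


def rigid_pose_energy(ligand: List[Atom], receptor: List[Atom]) -> Tuple[float, int]:
    """Energy of one fixed ligand pose against a fixed receptor, plus the pair count."""
    total, pairs = 0.0, 0
    for lig_atom in ligand:
        for rec_atom in receptor:
            total += pair_energy(lig_atom, rec_atom)
            pairs += 1
    return total, pairs


ligand = [Atom(0.0, 0.0, 0.0, -0.4), Atom(1.5, 0.0, 0.0, 0.4)]
receptor = [Atom(0.0, 4.0, 0.0, 0.3), Atom(1.5, 4.0, 0.0, -0.3), Atom(3.0, 4.0, 0.0, 0.0)]
energy, pairs = rigid_pose_energy(ligand, receptor)
print(f"{energy:.3f} kcal/mol from {pairs} atom pairs")
```

A rigid docking search repeats this ligand-by-receptor loop for every trial position and orientation of the ligand. Letting the ligand or receptor flex multiplies the number of configurations to score, and explicit solvent adds many more atoms to every sum, which is why the requirements in Fig. 3 climb so steeply.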
More sophisticated approaches that simulate realistic conditions, such as flexible receptor-ligand interactions in the presence of solvent molecules, require approximately 1 Tflop; this performance has been achieved but remains very expensive at present. Recognizing the raw computational power such problems need, IBM recently announced a new $100 million exploratory research initiative to build a supercomputer 500 times more powerful than the world's fastest existing computer and 2 million times faster than today's fastest desktop PC. This new computer, nicknamed "Blue Gene" by IBM researchers, will be capable of performing close to one Petaflop (10^15 operations per second).
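The multiples quoted for Blue Gene also imply rough performance levels for the machines it was compared with. Taking its target as 10^15 flop/s, a back-of-the-envelope calculation (derived here from the quoted ratios, not a figure published by IBM) gives:

```latex
\begin{align*}
\text{fastest existing computer} &\approx \frac{10^{15}\ \text{flop/s}}{500}
  = 2\times10^{12}\ \text{flop/s} = 2\ \text{Tflop},\\
\text{fastest desktop PC} &\approx \frac{10^{15}\ \text{flop/s}}{2\times10^{6}}
  = 5\times10^{8}\ \text{flop/s} = 0.5\ \text{Gflop}.
\end{align*}
```

On these rough numbers, a desktop PC of the day delivers about the 0.5 Gflop quoted for rigid ligand, rigid receptor problems, while the 1 Tflop needed for flexible, solvated systems is within reach of only the very fastest machines.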

As stated earlier, from the pharmaceutical industry's point of view, bioinformatics is the key to rational drug design. Using high-performance computing workstations and software such as Insight, it reduces the number of trials needed to screen drug compounds and to identify potential drug targets for a particular disease. The application of bioinformatics to genome sequences has also led to a new area of pharmacology, pharmacogenomics: the study of the genetic basis for differences between individuals in their response to drugs. These differences are mainly due to Single Nucleotide Polymorphisms (SNPs). To develop innovative and safe drugs, pharmacogenomics needs to be integrated into the drug development process. Recognizing the importance of SNPs, major pharmaceutical companies have formed an international consortium, of which IBM is also a member, to produce a map of human SNPs that could aid pharmacogenomics. In the future, drug design is expected to rely on variation in SNPs. Combined with combinatorial chemistry, SNP data can speed up drug discovery and may also help identify a new set of target proteins that cross-react with drugs in preliminary clinical trials. Taking into account all the factors that go into developing effective drugs, there has been a strong push to start the Human Proteomics Initiative. This initiative aims to identify the functions and polymorphisms of all the proteins encoded in the human genome and to predict their structures, or solve them where possible, so that they can be used as potential targets for drug development.

Need for Integration

Rapid advances in computing, coupled with increasing computer literacy among professionals, favor the use of computer applications in medical practice. Further, the availability of numerous databases on the Internet has revolutionized the way a physician devises a treatment strategy.
Projects like the Human Proteomics Initiative are a classic example of the need to integrate bioinformatics (to predict the structures and functions of proteins), medical sciences (to identify proteins that are important in metabolic or other disorders) and pharmacology (to discover novel drugs against the predicted targets). All three areas must therefore work in concert to achieve the ultimate goal of understanding the basis of life processes and applying that understanding to the betterment of human lives.

[Fig. 1: Heterogeneity of information in EMR and its advantages in designing better treatment procedures. The diagram connects Electronic Medical Records (EMR) with information on prescribed drugs and their effects, on the patient's health and on the tests conducted, and with their uses: evaluation of a drug, easy retrieval of data for analysis, status of the patient's compliance with a particular treatment procedure, and suggesting the outcome of a treatment procedure so that it can be altered.]
www.mrc-lmb.cam.ac.uk/genomes/madanm

[Fig. 2: The steps a physician needs to follow to get information on the disease and devise strategies for diagnosis and cure. Panels: search for hemophilia in OMIM; follow a link from the results; links to protein, DNA, MEDLINE and other databases; follow the MEDLINE link to get the original reference.]

[Fig. 3: Requirement of computational power for studying various interactions. The chart plots complexity of approach against required computer performance, on a scale from 0.2 GFlop through 1, 10 and 100 GFlop to 1, 10 and 100 TFlop. Interaction models shown: rigid ligand/rigid receptor; flexible ligand/rigid receptor; flexible ligand/flexible receptor; multiple flexible ligands/flexible receptors; protein-protein docking; ligands, receptors and solvent. Applications shown: searching databases for leads, lead optimization, improved drug characteristics, de novo drug design, discovery of targets, design of resistance-evading drugs, design of biomaterials and prediction of large assemblies.]