INDIAN SCIENCE CRUISER Volume 20 Number 1 January 2006 44

BIOINFORMATICS - THE RISING SUN
Bibekanand Mallick (a),* Zhumur Ghosh (a),#
(a) Computational Biology Group, Indian Association for the Cultivation of Science (IACS), 2A & 2B, Raja S C Mullick Road, Jadavpur, Kolkata-700032
E-mails: * vivekm@iitian.iitkgp.ernet.in, # tpzg@iacs.res.in
Bioinformatics is an optimized blend of interdisciplinary talents that analyzes immense amounts of biological information computationally. Its output is aiding industries ranging from agriculture to medicinal drugs, in India and abroad. It is a rising sun in India that can generate immense economic growth in the years to come.
Introduction
Bioinformatics has become a frontline applied science and is of vital importance to the study of the new biology, which is widely recognised as the defining scientific endeavour of the twenty-first century. It is very difficult to define this new, evolving subject precisely. Some people say bioinformatics is like an amoeba: it comes in various shapes and sizes, and this is certainly true. If you ask ten different scientists "What is bioinformatics?", you will likely hear ten different responses. There will be common elements (computers and biological databases top the list), but the definition will depend on who is doing the defining. This is why people compare it to an amoeba.
Bioinformatics is the use of mathematical, statistical and computer methods to analyze biological, biochemical and biophysical data. Because it is a very young and rapidly evolving field, it also has a number of other credible definitions. It can also be defined as the science and technology of learning, managing and processing biological information. Bioinformatics is often focused on obtaining biologically oriented data, organizing this information into databases, developing methods to extract useful information from such databases, and devising methods to integrate related data from disparate sources. It is the study of biological information as it passes from its storage site in the genome to the various gene products in the cell.
This new interdisciplinary branch of science applies information technology to store, organize and analyze the vast amount of biological data available in the form of sequences and structures of proteins (the building blocks of organisms) and nucleic acids (the information carriers). The biological information of nucleic acids is available as sequences, while that of proteins is available as both sequences and structures. A sequence is one-dimensional, whereas a structure contains the three-dimensional data of the sequence. Significantly, bioinformatics can help answer questions such as whether a protein's sequence can suggest how the protein functions, and whether the genes turned on in a cancer cell differ from those turned on in a healthy cell.
A living cell is a system whose components interact with each other, and these interactions determine the fate of the cell, e.g., whether a stem cell is going to become a liver cell or a cancer cell. These interacting components include the genome, the gene transcripts and the proteins. Characterization of these three types of components, and the associated development of analytical methods, led to the establishment of three closely related branches of bioinformatics: genomics, transcriptomics and proteomics.
Genomics involves extensive analysis of nucleic acids through molecular biological techniques before the data are ready for processing by computers. Genomics is a science that attempts to describe a living organism in terms of the sequence of its genome (its constituent genetic material).
Proteomics represents the earliest attempt to identify a major subclass of cellular components, the proteins, and their interactions. The term was coined from the word proteome, the complete protein complement of a system. Proteomics involves sequencing the amino acids in a protein, determining its three-dimensional structure and relating that structure to the protein's function. Before computer processing comes into the picture, extensive experimental data, particularly from crystallography and NMR, are required for this kind of study. With such data on known proteins, the structure of a newly discovered protein, and its relationship to function, can be understood in a very short time.
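The three components named above are linked by a flow of information: the genome is transcribed into mRNA, and the mRNA is translated into protein. As an illustration only, the Python sketch below transcribes a DNA coding strand and translates it with the standard genetic code (NCBI translation table 1). It assumes an intron-free coding sequence read in one frame from its first base; the function names and the example gene are hypothetical.
```python
# Minimal sketch: genome (DNA) -> transcript (mRNA) -> protein.
# Assumes the standard genetic code and a single, intron-free reading frame.

BASES = "TCAG"
# Amino acids for all 64 codons, ordered by first, second and third base in
# "TCAG" order (the compact layout of NCBI translation table 1); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a DNA coding strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed on DNA letters
        amino_acid = CODON_TABLE[codon]  # raises KeyError for ambiguous bases such as N
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCATTGTAATGGGCCGCTGA"  # hypothetical example gene
    mrna = transcribe(gene)
    print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
    print(translate(mrna))  # MAIVMGR
```
Real genes need more care (introns, alternative start codons, ambiguous bases), but a codon-table lookup like this is the essence of the translation step.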
In protein structure-function studies of this kind, bioinformatics has enormous analytical and predictive potential. Proteomics also focuses on identifying when and where proteins are expressed in a cell, so as to establish their physiological roles in an organism.
Transcriptomics measures the expression levels of genes, often using techniques capable of sampling tens of thousands of different mRNA molecules at a time (e.g. DNA microarrays). The term was coined from the word transcriptome, the set of all mRNA molecules (or transcripts) in one cell or a population of cells under a given set of environmental circumstances.
Apart from these, there are a few other major branches, described below.
Functional Genomics: Since the completion of the human genome sequence, the emphasis has been shifting from genes themselves to gene products. Functional genomics assigns functional relevance to genomic information. It is the study of genes, their resulting proteins, and the roles played by those proteins.
Cheminformatics: Drug design through bioinformatics is one of the most actively pursued areas of research. Since the great majority of drugs are low molecular weight (LMW) compounds, and many of them are derived primarily from biological sources, there has always been great interest in the study of LMW compounds of biological origin. Cheminformatics (or chemoinformatics) deals with such compounds, the products of secondary metabolism often called natural products, which have some kind of bioactivity. This bioactivity can be turned to advantage for therapeutic purposes, and here the expertise of a pharmacologist is required. Cheminformatics involves organising chemical data in a logical form to facilitate understanding of chemical properties and their relationship to structure, and to make inferences. Chemical structures are used as the input for identifying similar compounds to screen for biological activity. Cheminformatics also helps to assess the properties of new compounds by comparison with known compounds.
Scope/Research Areas of Bioinformatics (BI):
(i) Genome and sequence analysis: Historically, bioinformatics as a concept was invented to describe the task of handling, presenting and analysing large amounts of sequence data. Today, owing to intense efforts at a number of large research centres throughout the world, these data can be accessed easily by anyone over the Internet from World Wide Web servers. As a consequence, screening these sequence databases to find sequence homologues of a particular gene is now almost an everyday activity in most molecular biology labs.
(ii) From sequence to 3D structure prediction: For most macromolecules, function is closely linked to three-dimensional structure. Experimental determination of these 3D structures is, however, a costly and slow process. Bioinformatics approaches have brought to fruition novel procedures for predicting the molecular fold from primary sequence data.
(iii) Analysis of genome-wide biomedical data and functional genomics: In the last couple of years the advent of large-scale biomedical analysis tools has forever changed the way scientists in biology and medicine do research. These technologies make possible the simultaneous study of the expression of thousands of genes, at either the transcript or the protein level, of the thousands of possible protein-protein interactions in a cell, or the phenotypic analysis of thousands of mutants.
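Expression studies of the kind described in item (iii), like the earlier question of whether the genes turned on in a cancer cell differ from those in a healthy cell, come down to comparing expression levels gene by gene. A common first measure is the log2 fold change. The Python sketch below is a simplified illustration under stated assumptions: the gene names and expression values are made up, a pseudocount of 1 keeps zero values finite, and the two-fold (|log2FC| >= 1) cut-off is arbitrary. Real microarray analysis also needs normalisation, replicates and statistical testing.
```python
import math

# Hypothetical expression levels (arbitrary units) for a few genes in two samples.
healthy = {"GENE_A": 120.0, "GENE_B": 15.0, "GENE_C": 300.0, "GENE_D": 0.0}
cancer = {"GENE_A": 118.0, "GENE_B": 95.0, "GENE_C": 70.0, "GENE_D": 40.0}


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 of the expression ratio; the pseudocount keeps zero values finite."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def differential_genes(control, treated, threshold=1.0):
    """Return {gene: log2FC} for genes whose |log2FC| >= threshold (1.0 = two-fold)."""
    result = {}
    for gene in sorted(control.keys() & treated.keys()):
        lfc = log2_fold_change(treated[gene], control[gene])
        if abs(lfc) >= threshold:
            result[gene] = round(lfc, 2)
    return result


if __name__ == "__main__":
    for gene, lfc in differential_genes(healthy, cancer).items():
        direction = "up" if lfc > 0 else "down"
        print(f"{gene}: log2FC = {lfc:+.2f} ({direction} in cancer sample)")
```
Working on the log2 scale makes up- and down-regulation symmetric: a two-fold increase gives +1 and a two-fold decrease gives -1.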
All this data, regardless of type and format, has to be handled, presented and efficiently analysed. This challenge is already being explored by statisticians, for example in the clustering of similarly regulated genes. This clustering information is currently being evaluated as a potentially useful way of predicting the function of uncharacterized genes in follow-up work on the genome projects, a research area called functional genomics.
(iv) Database building and management: Whatever type of information is being generated, analysed and finally interpreted, the data have to be presented to the scientific community through Internet-based World Wide Web servers. Presenting these data can be rather challenging, and the problems that arise range from the formalism of data submission to intelligent and clear ways of presentation. Database management is thus not only an engineering problem but also a clear scientific challenge.
(v) Clinical application of bioinformatics: The clinical applications of bioinformatics can be viewed in the immediate, short and long term. The human genome project has produced a database of the variations in sequence that distinguish us all. The project could have a considerable impact on people living in 2020: for example, a complete list of human gene products may provide new drugs, and gene therapy for single-gene diseases may become routine (www.ornl.gov/hgmis/medicine/tnty.html). Basic bioinformatic tools are already used in certain clinical situations to aid diagnosis and treatment planning. For example, PubMed (www.nlm.nih.gov) is freely accessed for biomedical journals cited in Medline, and OMIM (Online Mendelian Inheritance in Man, at www3.ncbi.nlm.nih.gov/omim/), a search tool for human genes and genetic disorders, is used by clinicians to obtain information on genetic disorders in the clinic or hospital setting. Ultimately, pharmacogenomics (using genetic information to individualise drug treatment) is likely to bring about a new age of personalized medicine; patients will carry gene cards with their own unique genetic profile for certain drugs, aimed at individualised therapy and targeted medicine free from side effects.
Bioinformatics Scenario in India
Bioinformatics and biotechnology go hand in hand; bioinformatics is a subset of biotechnology. Biotechnology hubs are emerging all over the world: the Americas, Europe, Eurasia, Southeast Asia, Western Asia and the Pacific Rim all have biotechnology hotspots. India is particularly suited to become the country of choice for biotech and bioinformatics initiatives and endeavours. India holds an advantage over other countries in this respect for ventures that seek to capitalize on its immense biodiversity.
Establishment of Centres of Excellence (COE) in Bioinformatics
Bioinformatics is growing as an independent discipline and is helping immensely to accelerate the growth of biotechnology. Simultaneously, there is enormous growth in biological data. The Govt. of India has therefore decided to establish advanced research and training centres in the country by enhancing the existing infrastructure, providing additional manpower and giving flexibility in their governance. These centres are termed Centres of Excellence (COE) in bioinformatics. The missions of the COE will be to undertake advanced research in bioinformatics, provide Ph.D. and postdoctoral training, and develop new solutions so that industry in India gets support to solve complex biological problems and obtains the required high-end manpower.
Building up of Skilled Bioinformatics Professionals
There is a huge responsibility to build up properly trained Indian professionals in this field, to keep pace with the international market and to make the fullest use of the emerging opportunity. India was the first country in the world to establish, in 1987, a Biotechnology Information System (BTIS) network to create an infrastructure that enables it to harness biotechnology through the application of bioinformatics. The Department of Biotechnology (DBT) took up this infrastructure development project and created a distributed network at a very low cost. BTIS is today recognized as one of the major scientific networks in the world dedicated to providing state-of-the-art infrastructure, education, manpower and tools in bioinformatics. The principal aim of the bioinformatics programme is to ensure that India emerges as a key international player in the field of bioinformatics. The following are the major thrusts of the programme:
(i) To undertake advanced research in frontier areas of bioinformatics and computational biology.
(ii) To develop world-class human resources in bioinformatics.
(iii) To establish an effective academia-industry interface.
(iv) To pursue and promote international cooperation with leading institutions, organizations and countries in the world.
(v) To create world-class platforms for technology development, transfer and commercialisation.
Training Activities in Bioinformatics
Short-term and long-term training courses in bioinformatics for scientists from different disciplines in biology, statistics and computer science are very important and have been found highly useful over the years. These activities will be intensified. Experts from other countries will be used as resource persons along with Indian experts. Upgradation of the knowledge base, and convergence of the knowledge of experts from different disciplines towards bioinformatics, will be achieved. But training manpower only at the research level is not sufficient. Considering the importance of the subject, some institutions and university departments have introduced courses at different levels. The institutes offering courses in bioinformatics at different levels, with links, are listed in Table 1.
Table 1: Courses in India
Name of the Institute/University | Location | BI course offered | Website
Indian Institute of Information Technology (IIIT) | Allahabad | M.Tech | http://bi.iiita.ac.in
Sathyabama Institute of Science and Technology | Chennai | B.Tech & M.Tech | http://www.sathyabamauniv.ac.in
Periyar University | Salem, TN | M.Sc. in Bioinformatics; B.Sc., M.Sc. in BT | http://www.tnuniv.ac.in/periyar/degree.htm
International Institute of Information Technology | Hyderabad | M.Tech | http://www.iiit.net
Shanmugha Arts, Science, Technology and Research Academy, SASTRA (Deemed University) | Thanjavur, TN | B.Tech and M.Tech | http://www.sastra.edu/admission/sastra.htm
Pune University | Pune | M.Sc. in Bioinformatics & Advanced Diploma in Bioinformatics (through an entrance test held every year in May-June) | http://bioinfo.ernet.in
Amity Institute of Biotechnology | Noida | B.Tech and M.Sc. | http://www.amity.edu/aib
IGIB (Institute of Genomics & Integrative Biology, formerly known as Centre for Biochemical Technology) & Informatics | New Delhi | Postgraduate diploma in bioinformatics | http://www.informatics.co.in/bioinformatics.htm
Bharathiar University | Coimbatore | M.Sc. in Bioinformatics | http://www.b-u.ac.in
Jamia Millia Islamia University (a Central University) | New Delhi | M.Sc. in Bioinformatics (self-financing) | http://jmi.nic.in
Pondicherry University (a Central University) | Pondicherry | Advanced postgraduate diploma course in bioinformatics | http://www.pondiuni.org
Calcutta University | Kolkata | Post-M.Sc. diploma in Bioinformatics | --
Institute of Bioinformatics and Applied Biotechnology (IBAB) | Bangalore | Postgraduate diploma in Bioinformatics | http://www.ibab.ac.in
DOEACC Society | Selected branches | O level and A level in Bioinformatics | https://www.doeacc.edu.in/jsp/bioinformatics.htm
Useful Books in Bioinformatics of Recent Times
Since the subject is a budding one, a proper syllabus for the different courses is yet to be framed, and so proper syllabus-oriented textbooks are also needed. Listed below are a few books that are very recent or are due to come out shortly.
(i) Bioinformatics: Sequence and Genome Analysis (2004) by David W. Mount.
(ii) Hands-on in -At Hand (2006, in press) by Bibekanand Mallick & Zhumur Ghosh.
(iii) An Introduction to Bioinformatics Algorithms (Computational Molecular Biology Series) (2004) by Neil C. Jones & Pavel A. Pevzner.
(iv) Fundamental Concepts of Bioinformatics (2002) by Dan E. Krane & Michael L. Raymer.
(v) Structural Bioinformatics (Methods of Biochemical Analysis, Vol. 44) (2003) by Philip E. Bourne.
R&D Activities in India
Research institutes of repute throughout India have set up infrastructure to pursue research in this emerging field. It is not only the biology divisions of these institutes that are undertaking such activities; other divisions, such as Chemistry, Physics and Statistics, are also coming forward to work hand in hand with experimental biologists. The Indian Institute of Science, Bangalore (www.iisc.ernet.in), CCMB, Hyderabad (www.ccmb.res.in), CDFD, Hyderabad (www.cdfd.org.in), CDRI, Lucknow (www.cdriindia.org) and JNU, Delhi (www.jnu.ac.in) are some of the institutes of repute pursuing research in this field.
In Kolkata, different groups at the Bose Institute (www.boseinst.ernet.in), the Saha Institute (www.saha.ac.in) and the Indian Institute of Chemical Biology (www.iicb.res.in) are working on this subject. Even in the Theoretical Physics Department of the Indian Association for the Cultivation of Science, a Computational Biology Sector (http://in.geocities.com/iacsbioinfo) under Prof. J. Chakrabarti is working in this field.
Funding Agencies
The major funding agencies supporting bioinformatics (BI) initiatives in the various Indian states are the following: DST - Department of Science and Technology (http://dst.gov.in); DBT - Department of Biotechnology (http://dbtindia.nic.in); ICAR - Indian Council of Agricultural Research (www.icar.org.in); ICMR - Indian Council of Medical Research (http://icmr.nic.in); CSIR - Council of Scientific and Industrial Research (http://www.csirhrdg.res.in); UGC - University Grants Commission (http://www.ugc.ac.in); and DSIR - Department of Scientific and Industrial Research (http://dsir.nic.in).
Employment Opportunity
At present India has more than 200 biotech companies. These companies grew 40% last year, principally in the areas of pharmaceuticals, agriculture and bioinformatics. The biotech sector in India comprises a few hundred companies that have developed niche or broader expertise and have many relationships with foreign partners. Table 2 lists a few companies in India hiring bioinformatics professionals.
Table 2: Companies in India
Name of the Company | Location | Website
Biocon | Bangalore | www.biocon.com
Genotypic Technology | Bangalore | www.genotypictech.com
AstraZeneca | Bangalore | www.astrazenecaindia.com
Molecular Connections | Bangalore | www.molecularconnections.com
Strand Genomics | Bangalore | www.strandgenomics.com
Accelrys | Bangalore | www.accelrys.com
Landsky Solutions | Secunderabad | www.landskyindia.com
Jubilant Biosys | Bangalore | www.jubilantbiosys.com
Helix Genomics | Hyderabad | www.helixgenomics.com
Ocimum Biosolutions | Hyderabad | www.ocimumbio.com
BioMakro | Hyderabad | www.biomakro.com
Satyam Computers | Hyderabad | www.satyam.com
Tata Consultancy Services | Hyderabad | www.tcs.com
GVK BioSciences Pvt. Ltd | Hyderabad | www.gvkbio.com
Dr Reddy's Laboratories | Hyderabad | www.drreddys.com
Centre for Development of Advanced Computing (C-DAC) | Pune (Headquarters) | www.cdac.in
To conclude, it is important to mention that this is a critical juncture for bioinformatics in India.
Indian bio-companies are presently at the heart of the industry and are growing fast, and how well they perform will depend on how well equipped their R&D wings are. The main weakness, even today, is the shortage of properly trained professionals in this field. Once properly trained Indian professionals put in sincere effort, the floodgates of western business will open for India and bioinformatics will truly begin to flourish. This milestone is yet to be crossed.