Using MATLAB: Bioinformatics Toolbox for Life Sciences
Mr. Sarawut Wongphayak
Bioinformatics Program, School of Bioresources and Technology, and School of Information Technology, King Mongkut's University of Technology Thonburi
Advisor: Dr. Asawin Meechai
Goal of Presentation
To introduce the use and advantages of MATLAB and the Bioinformatics Toolbox, which will be applied to teaching, study, and research in our Bioinformatics program.
Outline
- Introduction to MATLAB and the Bioinformatics Toolbox
- Bioinformatics Toolbox functions
- An example study that used MATLAB and the Bioinformatics Toolbox
What is MATLAB?
- MATLAB is short for Matrix Laboratory.
- MATLAB is a tool for numerical computation with matrices and vectors.
- It is powerful and easy to use, and it integrates computation, visualization, and programming.
- It can be used on almost all platforms.
Source: Wade T. Rogers, Cira Discovery Sciences, Inc.
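As a small illustration of the matrix-oriented style described above, the snippet below uses only core MATLAB (no toolboxes). The matrix `A` and vector `b` are arbitrary example values chosen for this sketch.

```matlab
% Core MATLAB only: matrices and vectors are the basic data type.
A = [1 2; 3 4];      % 2x2 example matrix
b = [5; 6];          % 2x1 column vector

x = A \ b;           % solve the linear system A*x = b with the backslash operator
P = A * A;           % matrix product
E = A .* A;          % element-wise product (note the dot)

disp(x)              % x is [-4; 4.5] up to rounding
```

The distinction between `*` (matrix algebra) and `.*` (element by element) is the main thing newcomers need to keep in mind when working with expression matrices or count vectors.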
MATLAB is widely used in academic bioinformatics applications
- Teaching: bioinformatics graduate and undergraduate courses (e.g., MIT, Harvard, Stanford, Cornell, Carnegie Mellon)
- Research: recent papers use MATLAB for:
  - Sequencing: base-calling algorithm design
  - Microarray analysis: statistical modeling of microarrays, image analysis
  - Proteomics: mass spectrometry data classification
  - Systems biology: flux analysis, simulation of metabolic pathways, interaction network identification
Source: Robert Henson, The MathWorks, Inc.
Bioinformatics Toolbox
The toolbox provides access to genomic and proteomic data formats, analysis techniques, and specialized visualizations for genomic and proteomic sequence and microarray analysis. Most functions are implemented in the open MATLAB language, enabling you to customize the algorithms or develop your own.
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
Bioinformatics Toolbox: Required and Related Products
- Required products: MATLAB, Statistics Toolbox
- Related products: Database, Image Processing, Neural Network, Optimization, and Signal Processing Toolboxes
For more information on related products, visit www.mathworks.com/products/bioinfo
Key Features
- Support for genomic, proteomic, and gene expression file formats
- Internet database access
- Sequence analysis
- Microarray analysis and visualization
- Phylogenetic analysis
- Mass spectrometry preprocessing and visualization
File Formats and Database Access
- Sequence data: FASTA, PDB, and SCF
- Microarray data: Affymetrix DAT, EXP, CEL, CHP, and CDF files; SPOT format data; ImaGene results format data; and GenePix GPR and GAL files
- Direct interfaces to major Web-based databases
- Support for other common file formats, such as Microsoft Excel
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
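A minimal sketch of reading sequence data, assuming the Bioinformatics Toolbox is installed. It writes a small made-up two-record FASTA file to a temporary location and reads it back with `fastaread`, which returns a structure array with `Header` and `Sequence` fields. Web database access works similarly through functions such as `getgenbank` (used in the sequence statistics example), but requires a network connection.

```matlab
% Requires the Bioinformatics Toolbox. Records below are made-up example data.
fastaFile = [tempname '.fasta'];
fid = fopen(fastaFile, 'w');
fprintf(fid, '>seq1 example record\nATGGCCATTGTAATG\n>seq2 example record\nATGAAACGCATTAGC\n');
fclose(fid);

% fastaread returns one structure (Header, Sequence) per FASTA record
records = fastaread(fastaFile);
for k = 1:numel(records)
    fprintf('%s: %d nt\n', records(k).Header, length(records(k).Sequence));
end

delete(fastaFile);   % remove the temporary file
```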
Sequence Analysis
The Bioinformatics Toolbox provides several MATLAB-based sequence alignment functions, as well as graphical tools for viewing sequence alignment results.
- Sequence utilities and statistics
- Protein feature analysis
- Sequence Tool (GUI)
- Sequence alignment
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
Sequence Utilities and Statistics
You can manipulate and analyze your sequences to gain a deeper understanding of your data. Bioinformatics Toolbox routines let you:
- Convert DNA or RNA sequences to amino acid sequences using the genetic code
- Perform statistical analysis on sequences and search for specific patterns within a sequence
- Apply restriction enzymes and proteases to perform in silico digestion of sequences, or create random sequences for test cases
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
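The routines listed above can be tried on short literal sequences without any network access. This sketch assumes the Bioinformatics Toolbox is installed; the sequences are made-up examples, and the pattern search uses core MATLAB `regexp` rather than a toolbox function.

```matlab
% Requires the Bioinformatics Toolbox. Sequences are made-up examples.
dna = 'ATGGCCATTGTAATG';

% Translate DNA to protein with the standard genetic code (reading frame 1)
protein = nt2aa(dna);

% In silico digestion with the restriction enzyme EcoRI (recognition site GAATTC)
fragments = restrict('CCGAATTCGG', 'EcoRI');

% Random 20-nt DNA sequence, e.g. as a test case
testSeq = randseq(20);

% Pattern search with core MATLAB: start positions of every 'ATG'
atgStarts = regexp(dna, 'ATG');
```

Here `protein` is `'MAIVM'` and `atgStarts` is `[1 13]`; `fragments` is a cell array holding the two pieces produced by the single EcoRI site.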
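Example: Sequence Statistics
>> mitochondria = getgenbank('nc_001807','sequenceonly',true);  % mitochondrial genome from GenBank (needs Internet access)
>> basecount(mitochondria,'chart','pie');    % base composition as a pie chart
>> ntdensity(mitochondria)                    % nucleotide density along the sequence
>> dimercount(mitochondria,'chart','bar')     % dinucleotide counts as a bar chart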
Example: Sequence Statistics (continued)
>> codoncount(mitochondria)   % codon counts, frame 1 of the forward strand
for frame = 1:3
    figure('color',[1 1 1])
    subplot(2,1,1);
    codoncount(mitochondria,'frame',frame,'figure',true);                % forward strand heat map
    title(sprintf('Codons for frame %d',frame));
    subplot(2,1,2);
    codoncount(mitochondria,'reverse',true,'frame',frame,'figure',true); % reverse complement
    title(sprintf('Codons for reverse frame %d',frame));
end
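The statistics functions from the example above also return their counts as structures, which is convenient for further computation. This offline sketch assumes the Bioinformatics Toolbox is installed and uses short made-up sequences in place of the downloaded genome.

```matlab
% Requires the Bioinformatics Toolbox. Works offline on short made-up sequences.
seq = 'ACGTTA';

bases  = basecount(seq);            % structure with fields A, C, G, T
dimers = dimercount(seq);           % one field per dinucleotide (AA ... TT)
codons = codoncount('ATGATGAAA');   % one field per codon, reading frame 1
```

For `'ACGTTA'` the base counts are A = 2, C = 1, G = 1, T = 2, and each of the dimers AC, CG, GT, TT, and TA occurs once.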
Protein Feature Analysis
Calculate properties of a peptide sequence and determine the amino acid composition of protein sequences (here nd2aaseq, the ND2 amino acid sequence):
>> aacount(nd2aaseq,'chart','bar')
>> atomiccomp(nd2aaseq)
ans =
    C: 1818
    H: 3574
    N: 420
    O: 817
    S: 25
>> molweight(nd2aaseq)
ans =
  3.8960e+004
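Because `nd2aaseq` is defined elsewhere in the original demonstration, here is a self-contained variant on a made-up peptide, assuming the Bioinformatics Toolbox is installed. It shows that the same functions return structures and numbers you can use directly in code.

```matlab
% Requires the Bioinformatics Toolbox. 'MAIVMK' is a made-up example peptide.
peptide = 'MAIVMK';

composition = aacount(peptide);    % structure with one field per amino acid
mw          = molweight(peptide);  % molecular weight in Daltons
atoms       = atomiccomp(peptide); % structure with fields C, H, N, O, S
```

With two methionines and no cysteines, `atoms.S` is 2.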
Sequence Tool
>> seqtool   % opens the Sequence Tool GUI
Sequence Alignment
The Bioinformatics Toolbox offers a range of methods for pairwise sequence alignment and sequence-profile alignment, including:
- MATLAB implementations of standard algorithms for local and global sequence alignment, such as the Needleman-Wunsch, Smith-Waterman, and profile hidden Markov model algorithms
- Graphical representations of alignment results
- Standard scoring matrices, such as the PAM and BLOSUM families of matrices
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
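To show what the Needleman-Wunsch algorithm computes, here is a minimal from-scratch sketch of its dynamic-programming score table in core MATLAB. It is not the toolbox's `nwalign` implementation: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) is a simplifying assumption chosen for readability, whereas `nwalign` uses substitution matrices such as BLOSUM and configurable gap penalties.

```matlab
% Illustrative Needleman-Wunsch global alignment score (core MATLAB only).
% Hypothetical scoring: match +1, mismatch -1, linear gap penalty -1.
s1 = 'ACGT';
s2 = 'AGT';
matchScore = 1; mismatchScore = -1; gapPenalty = -1;

n = length(s1); m = length(s2);
F = zeros(n+1, m+1);            % F(i+1,j+1): best score aligning s1(1:i) with s2(1:j)
F(:,1) = (0:n)' * gapPenalty;   % prefixes of s1 aligned entirely against gaps
F(1,:) = (0:m)  * gapPenalty;   % prefixes of s2 aligned entirely against gaps

for i = 1:n
    for j = 1:m
        if s1(i) == s2(j)
            diagScore = F(i,j) + matchScore;
        else
            diagScore = F(i,j) + mismatchScore;
        end
        F(i+1,j+1) = max([diagScore, ...               % align s1(i) with s2(j)
                          F(i,j+1) + gapPenalty, ...   % s1(i) against a gap
                          F(i+1,j) + gapPenalty]);     % s2(j) against a gap
    end
end

globalScore = F(n+1, m+1);
fprintf('Global alignment score: %d\n', globalScore);
```

For `'ACGT'` versus `'AGT'` the best alignment is `ACGT` over `A-GT` (three matches, one gap), giving a score of 2. Smith-Waterman differs mainly by clamping each cell at zero and taking the maximum over the whole table.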
Example: Sequence Alignment
Globally align the two amino acid sequences using the Needleman-Wunsch algorithm:
>> [Score, Alignment] = nwalign(humanprotein, mouseprotein);
>> showalignment(Alignment)
Locally align the two amino acid sequences using the Smith-Waterman algorithm:
>> [LocalScore, LocalAlignment] = swalign(humanprotein, mouseprotein);
>> showalignment(LocalAlignment)
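The example above assumes the human and mouse protein sequences are already in the workspace. A self-contained variant, assuming the Bioinformatics Toolbox is installed, aligns two short made-up peptides with the default scoring settings.

```matlab
% Requires the Bioinformatics Toolbox. Two short made-up peptide sequences.
seqA = 'VSPAGMASGYD';
seqB = 'IPGKASYD';

[globalScore, globalAln] = nwalign(seqA, seqB);   % global alignment (Needleman-Wunsch)
[localScore,  localAln]  = swalign(seqA, seqB);   % local alignment (Smith-Waterman)

% Each alignment is a 3-row char array: sequence 1, match symbols, sequence 2
disp(globalAln)
disp(localAln)
% showalignment(globalAln)   % would open a window with color-coded matches
```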
Microarray Normalization
The Bioinformatics Toolbox provides several methods for normalizing microarray data, including lowess, global mean, and median absolute deviation (MAD) normalization. Filtering functions let you clean raw data before running analysis and visualization routines.
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
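To make the global mean and MAD methods concrete, here is a sketch of both computed by hand in core MATLAB. It is not the toolbox's own code (the toolbox provides functions such as `malowess` and `manorm` for normalization and `genevarfilter` for filtering); the 5-gene by 3-array ratio matrix is made-up data.

```matlab
% Core MATLAB sketch, not the toolbox implementation. Made-up 5-gene x 3-array ratios.
ratios = [1.2 0.8 2.0;
          0.5 0.4 1.1;
          3.0 2.5 4.2;
          1.0 0.9 1.6;
          0.7 0.6 1.3];
logR  = log2(ratios);                 % normalize on the log2 scale
nRows = size(logR, 1);

% Global mean normalization: center each array (column) on its mean
meanNorm = logR - repmat(mean(logR), nRows, 1);

% MAD normalization: center each column on its median, then divide by its
% median absolute deviation so all arrays have a comparable spread
colMedian = median(logR);
colMAD    = median(abs(logR - repmat(colMedian, nRows, 1)));
madNorm   = (logR - repmat(colMedian, nRows, 1)) ./ repmat(colMAD, nRows, 1);
```

After this, every column of `meanNorm` has mean 0, and every column of `madNorm` has median 0 and a median absolute value of 1. `repmat` is used instead of implicit expansion so the sketch also runs on older MATLAB releases.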
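Data Visualization
Together, the Bioinformatics Toolbox, the Statistics Toolbox, and MATLAB provide an integrated set of visualization tools:
>> maimage      % pseudocolor image of microarray data in the array's spatial layout
>> maboxplot    % box plot of microarray data
>> maloglog     % log-log scatter plot of microarray data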
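Data Visualization (continued)
>> mairplot     % intensity versus ratio (IR) scatter plot
>> cluster      % hierarchical clustering (Statistics Toolbox)
>> kmeans       % k-means clustering (Statistics Toolbox)
>> clustergram  % heat map with hierarchical clustering dendrograms
>> princomp     % principal component analysis (Statistics Toolbox)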
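The clustering and PCA functions listed above can be combined in a few lines. This sketch assumes the Statistics Toolbox on a release recent enough to provide `rng` and `pca` (in newer releases `pca` supersedes `princomp`); the data are synthetic, two well-separated groups of 10 "genes" measured under 2 "conditions".

```matlab
% Requires the Statistics Toolbox. Synthetic data with a fixed seed for repeatability.
rng(0);
X = [randn(10,2)*0.1; randn(10,2)*0.1 + 5];   % rows = genes, columns = conditions

% Hierarchical clustering: build a linkage tree, then cut it into 2 clusters
Z = linkage(X, 'average');
hierLabels = cluster(Z, 'maxclust', 2);

% k-means with several restarts to reduce the risk of a poor local optimum
kmLabels = kmeans(X, 2, 'Replicates', 5);

% Principal component analysis: score holds each gene's coordinates on the PCs
[coeff, score] = pca(X);
```

Cluster numbers are arbitrary labels, so the meaningful check is that the first 10 rows share one label and the last 10 share the other.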
Phylogenetic Analysis
The Bioinformatics Toolbox enables you to create and edit phylogenetic trees. You can calculate pairwise distances between aligned or unaligned nucleotide or amino acid sequences using a broad range of distance metrics, such as Jukes-Cantor, p-distance, alignment score, or a user-defined distance method.
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
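A minimal distance-based tree-building sketch, assuming the Bioinformatics Toolbox is installed. The four short, pre-aligned DNA sequences and their names are made-up examples; `seqpdist` computes Jukes-Cantor distances and `seqlinkage` builds a tree with average linkage (UPGMA).

```matlab
% Requires the Bioinformatics Toolbox. Made-up, equal-length (pre-aligned) sequences.
seqs  = {'ACGTACGTACGT', 'ACGTACGTACGA', 'ACGTTCGTACGA', 'TCGTTCGAACGA'};
names = {'seq1', 'seq2', 'seq3', 'seq4'};

% Pairwise Jukes-Cantor distances, returned as a vector in pdist layout
D = seqpdist(seqs, 'Method', 'Jukes-Cantor', 'Alphabet', 'NT');

% Average linkage (UPGMA) tree with labeled leaves
tree = seqlinkage(D, 'average', names);
% view(tree)   % would open the interactive tree viewer
```

With 4 sequences, `D` holds 4*3/2 = 6 pairwise distances.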
Phylogenetic Analysis (continued)
Through the graphical user interface (GUI), you can prune, reorder, and rename branches; explore distances; and read or write Newick-formatted files.
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
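Newick files can also be read and written from the command line. This sketch assumes the Bioinformatics Toolbox is installed; the Newick string and its species labels are a made-up example tree, written to a temporary file before being parsed.

```matlab
% Requires the Bioinformatics Toolbox. The Newick string is a made-up example.
newickText = '((human:0.1,chimp:0.1):0.2,mouse:0.3);';

inFile = [tempname '.tree'];
fid = fopen(inFile, 'w');
fprintf(fid, '%s\n', newickText);
fclose(fid);

tree = phytreeread(inFile);            % parse the Newick file into a phytree object
leafNames = get(tree, 'LeafNames');    % cell array of leaf labels

outFile = [tempname '.tree'];
phytreewrite(outFile, tree);           % write the tree back out in Newick format
% view(tree)   % would open the GUI for pruning, reordering, and renaming

delete(inFile);
delete(outFile);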
Mass Spectrometry Data Analysis
The mass spectrometry functions are designed for preprocessing and classification of raw data from SELDI-TOF and MALDI-TOF spectrometers:
- Reading raw data into MATLAB
- Preprocessing raw data
- Spectrum analysis
Source: User's Guide: Bioinformatics Toolbox for Use with MATLAB
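Two typical preprocessing steps, baseline correction and smoothing, can be sketched on a synthetic spectrum. This assumes the Bioinformatics Toolbox is installed and uses the default settings of `msbackadj` and `mslowess`; the spectrum (a decaying baseline plus two Gaussian peaks) is made up and is not real SELDI-TOF or MALDI-TOF data.

```matlab
% Requires the Bioinformatics Toolbox. Synthetic spectrum, not real instrument data.
mz = linspace(2000, 12000, 5000)';                   % m/z values (column vector)
baseline = 50 * exp(-mz / 4000);                     % slowly decaying background
peaks = 100 * exp(-(mz - 5000).^2 / (2 * 20^2)) + ...
         80 * exp(-(mz - 8000).^2 / (2 * 25^2));     % two Gaussian peaks
y = baseline + peaks;

yAdj    = msbackadj(mz, y);     % estimate and subtract the baseline
ySmooth = mslowess(mz, yAdj);   % lowess smoothing of the corrected signal
```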
THE CHALLENGE
To accurately predict the clinical outcome for breast cancer patients
THE SOLUTION
Use MathWorks products to develop a tool that lets clinicians make a prognosis based on the gene expression profile of the patient's primary tumor
THE RESULTS
- Accurate prediction of disease outcome
- Fast, effective response to scientists' needs
- Flexibility to adjust algorithms whenever necessary
Source: Dr. Hongyue Dai, Rosetta Inpharmatics/Merck & Company
MATLAB and the Bioinformatics Toolbox enable you to develop your own functions.
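As a small example of writing your own analysis routine, the sketch below defines a GC-content function and a sliding-window variant in core MATLAB. The names `gcContent` and `slidingGC` are hypothetical helpers, not toolbox functions (the toolbox already offers `ntdensity` for similar plots); they are written as anonymous functions so the snippet runs as a plain script, but each could equally be saved as its own function file.

```matlab
% Hypothetical user-defined helpers (not part of any toolbox).
% Fraction of G or C bases in a sequence (case-insensitive)
gcContent = @(seq) sum(upper(seq) == 'G' | upper(seq) == 'C') / length(seq);

% Sliding-window GC content: GC fraction in every window of length w
slidingGC = @(seq, w) conv(double(upper(seq) == 'G' | upper(seq) == 'C'), ...
                           ones(1, w) / w, 'valid');

seq = 'ATGCGCGATATCGCGC';    % made-up example sequence
gc  = gcContent(seq);        % 10 of 16 bases are G or C, so 0.625
win = slidingGC(seq, 4);     % 13 windows; the first (ATGC) gives 0.5
```

Using `conv` with a box kernel and the `'valid'` shape returns one value per complete window, which avoids an explicit loop.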
Summary
The Bioinformatics Toolbox is well suited to life sciences studies, including:
- Sequence analysis
- Microarray analysis and visualization
- Phylogenetic analysis
- Mass spectrometry preprocessing and visualization
Thank you for your attention
Acknowledgement: Dr. Asawin Meechai