Normalizing Spectral Counts in Scaffold


When using MS/MS proteomics data to look for differentially expressed proteins, there are several issues that need to be considered: 1) peptides shared between proteins, 2) normalization between samples, 3) missing values, 4) test for differential abundance, 5) multiple tests.

These considerations have to be dealt with in some way for any type of MS/MS quantification: label-free methods such as spectral counting and emPAI, and labeling methods such as iTRAQ and ICAT. Scaffold uses spectral counting for quantification. This document addresses how Scaffold handles these concerns specifically for spectral counting.

Peptides shared between proteins

Tandem mass spectrometers measure spectra. Sequest, Mascot, X! Tandem and similar search engines match these spectra with peptides. Most researchers want knowledge about proteins, not peptides, so it is necessary to map peptide quantities into protein quantities. However, for isoforms, homologous proteins, protein families, and sometimes just by chance, a peptide can be part of several proteins. You can readily see which proteins in your samples have shared peptides by looking for stars in the column Protein Grouping Ambiguity on Scaffold's Samples page.

Figure 1. Look for stars in the column Protein Grouping Ambiguity on Scaffold's Samples page to determine which proteins in your samples have shared peptides.

When you measure the abundance of such a shared peptide, which protein are you measuring? Scaffold follows one approach, the ProteinProphet approach, when deciding if there is enough evidence to conclude that the protein exists in the MS/MS samples. Scaffold, however, takes a somewhat different approach when assessing protein abundance.

For the purposes of protein identification, Scaffold uses a ProteinProphet model, assigning the peptide exclusively to the protein with the most evidence. The result is that the peptide has a weight of 1 in one protein and a weight of zero in all other proteins. The Number of Assigned Spectra option on Scaffold's Samples page shows this way of looking at the spectral counts: each spectrum is counted in at most one protein.

However, if you are convinced by independent evidence that two proteins exist, and each protein has the same peptide, then each spectrum for this peptide has ions contributed from both proteins. The Unweighted Spectrum Count option on Scaffold's Samples page will count this spectrum twice, once in the first protein and once in the second protein. This count is unweighted in the sense that the spectrum counts the same in each of the proteins that share it. Scaffold counts unweighted spectra for determining protein abundance.

An alternative approach is to ignore all the spectra matching any peptide which is shared between proteins. This approach eliminates the ambiguity of how much each protein contributes, but it reduces the number of spectra contributing to the estimate of the protein abundance. Scaffold's Similarity page allows you to uncheck the peptides shared between proteins. Unchecked peptides are removed from the spectral counts. If you do this, then the assigned spectra counts and the unweighted spectra counts are the same.

A third approach is to combine into one group all proteins that share a peptide. An example of a situation where this makes sense is if you have a number of IgG proteins which you want to combine into a single protein group. Scaffold's Similarity page allows you to uncheck the peptides which differ between proteins. Scaffold will then combine the proteins into a larger protein group. If you do this, the assigned spectra counts will again match the unweighted spectra counts.

Figure 2. Scaffold's Similarity page allows you to uncheck the peptides which differ between proteins. Scaffold will then combine the proteins into a larger protein group.

A final approach to the shared peptide problem is to weight the contributions from the two proteins unequally. This requires independent evidence to know how much to weight each contribution. Scaffold does not yet support this approach.

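The difference between the counting schemes above can be illustrated with a small sketch. This is not Scaffold's code: the peptide names, the protein names, and the `assigned_owner` table (standing in for the outcome of the identification step) are all hypothetical, chosen only to show how one shared peptide changes each count.

```python
from collections import defaultdict

# Hypothetical toy data: one entry per spectrum, naming the peptide it matched.
spectra = ["PEPA", "PEPA", "PEPB", "PEPC", "PEPC", "PEPC"]

# Hypothetical peptide-to-protein map; PEPC is shared by proteins P1 and P2.
peptide_to_proteins = {"PEPA": ["P1"], "PEPB": ["P2"], "PEPC": ["P1", "P2"]}

# Assumed result of the identification step: each peptide belongs
# exclusively to the protein with the most evidence (here P1 for PEPC).
assigned_owner = {"PEPA": "P1", "PEPB": "P2", "PEPC": "P1"}


def count_spectra(spectra, peptide_to_proteins, assigned_owner):
    """Return (assigned, unweighted, unique_only) spectral counts per protein."""
    assigned = defaultdict(int)     # each spectrum counted in at most one protein
    unweighted = defaultdict(int)   # each spectrum counted in every protein sharing it
    unique_only = defaultdict(int)  # spectra of shared peptides ignored entirely
    for peptide in spectra:
        proteins = peptide_to_proteins[peptide]
        assigned[assigned_owner[peptide]] += 1
        for protein in proteins:
            unweighted[protein] += 1
        if len(proteins) == 1:
            unique_only[proteins[0]] += 1
    return dict(assigned), dict(unweighted), dict(unique_only)


assigned, unweighted, unique_only = count_spectra(
    spectra, peptide_to_proteins, assigned_owner
)
print(assigned)     # P1 gets all three PEPC spectra, P2 gets none of them
print(unweighted)   # the three PEPC spectra count for both P1 and P2
print(unique_only)  # the three PEPC spectra are dropped
```

In this toy case P2 has 1 assigned spectrum but 4 unweighted spectra, which is the kind of gap that disappears once the shared peptide is unchecked.
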
Normalization between samples

Scaffold normalizes the MS/MS data between samples. This allows you to compare abundances of a protein between samples. The normalization scheme used works for the common experimental situation where individual proteins may be upregulated or downregulated, but the total amount of all proteins in each sample is about the same. It is not appropriate if the total amount of protein varies widely from one sample to the next.

There are two levels of sample in Scaffold. The first level is the MS sample: the sample run through the mass spectrometer. The second level is the biological sample. Frequently the biological sample is fractionated into multiple MS samples. Scaffold allows you to view the MS samples within a biological sample or to combine all the MS samples into a single sample using the MudPIT option. Normalization is done on the MS sample level.

The normalization method that Scaffold uses is to sum the Unweighted Spectrum Counts for each MS sample. These sums are then scaled so that they are all the same. The scaling factor for each sample is then applied to each protein group and adjusts its Unweighted Spectrum Count to a normalized Quantitative Value.

As stated above, this normalization method distorts the data if the total protein loaded varies from sample to sample. To see why this is so, first remember that low abundance peptides may be on the edge of detectability. If sample A has a lot of protein loaded, the low abundance peptides may be detected. If sample B has much less protein loaded, these low abundance peptides might not be detected; that is, their spectral count is zero. No amount of scaling is going to change zero to any other number.

You can view the normalized data using the Quantitative Values option on Scaffold's Samples page.

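The scaling step described above can be sketched as follows. The text does not say which common total the sums are scaled to, so this sketch assumes the mean of the per-sample totals; the function name, the sample names, and the counts are hypothetical.

```python
def normalize_counts(counts_by_sample):
    """Scale each sample so that all samples have the same total count.

    counts_by_sample maps sample name -> {protein: unweighted spectrum count}.
    Assumption of this sketch: the common total is the mean of the sample totals.
    """
    totals = {s: sum(c.values()) for s, c in counts_by_sample.items()}
    if any(t == 0 for t in totals.values()):
        raise ValueError("a sample with no spectra cannot be scaled")
    target = sum(totals.values()) / len(totals)
    factors = {s: target / t for s, t in totals.items()}
    normalized = {
        s: {protein: n * factors[s] for protein, n in c.items()}
        for s, c in counts_by_sample.items()
    }
    return factors, normalized


# Hypothetical counts: sample B was loaded with about half as much protein,
# and its low abundance protein was not detected at all.
counts = {
    "A": {"tubulin": 60, "vimentin": 30, "rare": 10},
    "B": {"tubulin": 30, "vimentin": 20, "rare": 0},
}
factors, normalized = normalize_counts(counts)
print(factors)          # one scaling factor per MS sample
print(normalized["B"])  # "rare" is still zero after scaling
```

Sample B is scaled up by a factor of 1.5 here, yet its undetected protein stays at zero, which is the distortion the text warns about when protein loading is uneven.
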
If you switch from the Unweighted Spectrum Count to the Quantitative Value and all the values in one column change a lot compared to the values in the other columns, you are seeing evidence of uneven protein loading. If you see this, you may want to think carefully about how you should use this data.

There are more sophisticated normalization schemes that attempt to normalize the data in a way that allows you to compare, in a semiquantitative way, the abundance of one protein, say tubulin beta2, with another protein, say vimentin, in the same sample. Scaffold does not support these schemes in this version. This means that you should exercise caution about trying to draw conclusions about the stoichiometry of the proteins from the spectral data as presented in Scaffold.

In particular, you should be cautious about drawing conclusions about differential abundances for proteins where the spectral counts are small numbers. Scaffold tries to mitigate this problem by its treatment of missing values.

Missing values

For the purposes of the tests for differential protein expression, Scaffold replaces missing values with a specified minimum value. For coefficient of variance, t-test and ANOVA, this value is set by the user on the Quantitative Analysis Setup page, and it defaults to 0. For fold changes, however, if there is a missing value, Scaffold sets the ratio to 1.

Let's say you do a simple fold change test between sample A and sample B. If sample A has a count of 3 for the protein Metallothionein and sample B has a missing value for this protein, then Scaffold will say that the fold change is 1. Scaffold does not flag Metallothionein as having different abundances between the samples. The alternative treatment of replacing missing values with zero would show that Metallothionein has an infinite fold change between sample A and sample B, which is clearly not supported by the evidence.

Note that the normalized values displayed by Scaffold do reflect the user-specified minimum value. The minimum value that is selected in the Quantitative Analysis setup is substituted for all missing values in calculating the Quantitative Values that Scaffold displays. This is true even if no statistical test is selected and the dialog controlling this value is grayed out.

Test for differential abundance

Scaffold provides several tests that you can use to identify proteins which have different abundances in different categories of samples. Which test you use will depend upon your experimental design. In particular, it will depend upon the number of replicates you have.

In Scaffold you determine which test you want to use by selecting the menu option Experiment / Quantitative Analysis. The Quantitative Analysis dialog box lets you choose which test to use and which samples are being tested.

Note: These tests are based upon the data that is being displayed in Scaffold's Samples page. If you filter this data in a different way by, for example, changing the Minimum Peptide probability filter, or the Req Mods filter, or the View lower scoring matches menu option, the displayed data will change, and the tests may select different proteins as having abundance level changes.

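The fold-change rule for missing values described in the Missing values section can be sketched as below. This is an illustration only: the function name is hypothetical, missing values are represented as `None`, and treating a zero count the same way as a missing value is an assumption of this sketch rather than something the text states.

```python
def fold_change(count_a, count_b):
    """Ratio of the count in sample A to the count in sample B.

    Following the rule described above, a missing value on either side
    yields a ratio of 1 instead of an infinite or zero ratio.
    Assumption of this sketch: None or 0 both mean "missing".
    """
    if not count_a or not count_b:
        return 1.0
    return count_a / count_b


# Metallothionein example from the text: 3 spectra in A, missing in B.
print(fold_change(3, None))
# Ordinary ratios are reported as-is.
print(fold_change(4, 1))
print(fold_change(1, 4))
```

With this rule a protein seen only a few times in one sample is not flagged as changed, which avoids reporting an infinite ratio on thin evidence.
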
Fold change

The simplest test is the fold change: the ratio of the count in one sample to the count in a second sample. This fold change can give you some information if you only have two samples. As mentioned above, in the case of missing values the ratio is reported as 1.

In some other people's presentations you will see the fold change displayed as the log base 2 of the ratio. Scaffold currently only shows the ratio. If you choose to sort your data based on the fold change, remember to look at both the top and the bottom of the sorted data. A 4 to 1 ratio will display as 4, but a 1 to 4 ratio will display as 0.25.

Fold change values need to be interpreted cautiously. A fold change of 2 is much more likely to be significant if the ratio is between 48 and 24 than if it is between 2 and 1. Scaffold's Q-Q Scatterplot, which is discussed below, can give you some guidance in this matter.

Scaffold colors green the fold changes less than 0.5 or greater than 2.0. The green color is meant to be a guide so your eye can pick out possible proteins of interest. This does not mean that Scaffold is marking green all things that are statistically different between samples. You have to decide, based upon your understanding of the experiment, which proteins are really up or down regulated.

This caution also applies to the other tests. For example, you will have to decide which p-value level to consider significant in a t-test or ANOVA test independently of whether Scaffold colored the value green.

Coefficient of Variance

If you have two or more categories of samples, you can use the Coefficient of Variance test, although three or more categories are recommended. Let's say that you have four samples A, B, C, and D, each in its own category. The coefficient of variance is the standard deviation of the four values, expressed as a percentage of the mean. A small coefficient of variance means that the four samples have values close together compared to their average value. A big coefficient of variance means that at least one of the four samples is different, but it doesn't tell you if it is A, B, C or D. It just tells you that this protein should be examined. You can examine the values on the bar graph on Scaffold's Quantify page.

T-Test

If you have three or more replicates in two different categories, then the t-test is the appropriate way to tell if the abundances are different between the categories.

To use the t-test, you must set up the biological samples in categories. Typical categories are treated/untreated or disease/control or cell line1/cell line2. We will use Treated/Untreated in the explanation below. The biological samples within each category are replicates. You can set up whatever categories you like when you enter the samples into Scaffold, and you may change the categories by switching to Scaffold's Load Data page and double clicking on a biological sample.

Sometimes people make a distinction between technical replicates and biological replicates. This is a more advanced statistical analysis concept and is not supported by Scaffold.

The t-test is a measure of the distance between the mean of the replicate samples in category Treated and the mean of the replicate samples in category Untreated. This distance is scaled by the standard deviation of the replicates. The results of a t-test are reported as the probability (p-value) that this distance between means could occur by chance. A small p-value means the samples in category Treated are most likely different from those in category Untreated. That is, this scenario is unlikely to have arisen by chance. Many people use a p-value of 0.05 as a threshold. You may want to modify this in proteomics experiments since you are measuring many proteins at once. See the discussion on multiple testing below.

The t-test is generally considered a fairly robust test. This means that even if its basic assumptions are violated somewhat, it still tends to be fairly reliable at separating the categories which are the same from those that differ. Some researchers believe that spectral data should be transformed in some way, for example by taking its log, before doing a t-test. Other researchers may tell you that 3 replicates is not enough to apply the t-test. Still others believe that more advanced nonparametric tests would work better. So if the t-test gives a borderline result, you may want to check it carefully. But if your t-test has a very small p-value, the robustness of the t-test means that it is unlikely that a more sophisticated statistical analysis will give a different result.

If you try to push things by using the t-test with fewer than 3 replicates, it is unlikely to give informative and trustworthy results.

Fisher's Exact Test

Fisher's Exact Test, like the t-test, compares the relative abundance between two sample categories. This test is more appropriate than the t-test if there are few replicates because it directly calculates the probability of seeing the observed differences between the two samples, rather than relying on a large sample approximation. Like the t-test, Fisher's Exact Test produces a p-value.

ANOVA

If you have three or more replicates in three or more categories, then the ANOVA test is appropriate. For example, these categories may be different cell lines or different time points.

Like the t-test, the ANOVA test requires replicates in categories. Like the t-test, it is untrustworthy with fewer than 3 replicates, and more replicates are better. Like the t-test, the ANOVA test reports a p-value, and a small p-value is good. Like the t-test, the ANOVA test is fairly robust.

The ANOVA test tells you whether the variation between your categories is large compared to the variation within each category. The ANOVA's p-value tells you how unlikely it is that such a difference arose by chance. A small p-value means that the categories are different.

Like the Coefficient of Variance test, the ANOVA test tells you that something is different, but it doesn't tell you what. So if you have categories A, B, C, and D, the ANOVA test can tell you that something is different but not that category C is different. You have to examine the data for the protein, perhaps with the Bar Chart on Scaffold's Quantify page, to see which category is different.

The ANOVA test in Scaffold is a simple one-way ANOVA. More sophisticated ANOVA tests are beyond Scaffold's capability. They are also much more mysterious to non-statisticians.

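The comparison of between-category and within-category variation that a one-way ANOVA makes can be sketched with the textbook F statistic. This is a generic illustration, not Scaffold's implementation; the function name and the replicate values are hypothetical. Turning F into a p-value requires the F distribution, which is not in Python's standard library, so that step is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Textbook one-way ANOVA F statistic.

    groups is a list of categories, each a list of replicate values.
    F is the between-category mean square divided by the
    within-category mean square.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the category means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own category mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)  # fails if all replicates are identical
    return ms_between / ms_within


# Hypothetical normalized counts for one protein: three categories,
# three replicates each; the second category is clearly higher.
f_stat = one_way_anova_f([[10, 12, 11], [20, 22, 21], [10, 11, 12]])
print(f_stat)
```

A large F corresponds to a small p-value. As the text notes, it only says that some category differs; it does not say which one.
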
Q-Q Scatterplot

The Q-Q Scatterplot on Scaffold's Quantify page plots each protein's normalized spectral count as a dot. The x-axis is the spectral count for one category; the y-axis is the spectral count for the samples in another category. If the categories have multiple replicates, then the average value in the category is plotted. You can elect to see this scatterplot for any two categories in your experiment.

If the protein has the same abundance in both categories, it should plot close to the 45 degree line. The scatterplot has estimated error lines plotted. These lines are two standard deviations above and below the 45 degree line. Proteins that are plotted outside these error lines are possibly differentially expressed.

Each dot is colored either red or blue. The red dots represent proteins that are flagged as differentially expressed by whatever statistical test (fold change, coefficient of variance, t-test, ANOVA) is currently applied.

The error lines on the Q-Q Scatterplot are estimated by looking at the Mean/Deviation Scatterplot. This graph plots the standard deviation for each protein against its mean spectral count. In general, larger spectral counts will have larger standard deviations. A line is fit to this Mean/Deviation data using least squares. The fitted value is added to the 45 degree line in the Q-Q Scatterplot to get the error lines. This approach is based on the work of Michael Washburn's group: see the discussion of the Power Law Global Error Model (PLGEM) in a paper by Pavelka et al. in Mol Cell Proteomics, 2007 Nov 19.

The Q-Q Scatterplot lets you quickly find the outliers, and it gives you a reasonability check on statistical tests. You may, for example, use it to see whether the fold changes you see are believable.

Multiple tests

The statistical tests described above give some measure of how different the samples in the various categories are. These measures are either a ratio, a coefficient of variance, or a p-value. How big do these measures have to be before they are significant?

The naive approach is to set an arbitrary value, say a 2-fold change, or a p-value below a fixed cutoff. This is what Scaffold does when it colors the fold change or p-value green. This probably includes some proteins that should not be labeled as differentially expressed but have gotten on the list by chance.

A better approach is to sort the proteins by their p-value so the small p-values are on top. The proteins at the top of the list are the most likely to be accurately classified as differentially expressed. As you go down the list, your confidence in the classification should diminish. The question becomes where you should draw the threshold line. Scaffold leaves this to your judgment.

The issue is complicated because considering a set of statistical inferences all together increases the chance of falsely identifying a difference as significant. You may want to seek guidance from a statistician and ask about methods of correction for multiple testing in analyzing the t-test and ANOVA results.

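As one example of the kind of correction a statistician might suggest, the sketch below computes Benjamini-Hochberg adjusted p-values. This method is not named in the text and is not something Scaffold is described as doing; it is shown only to illustrate how a correction for multiple testing changes the picture. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Each p-value is multiplied by (number of tests / its rank when sorted
    ascending), then the results are made monotone from the largest rank
    down so that a smaller raw p-value never gets a larger adjusted value.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
print(sum(p < 0.05 for p in raw), "proteins pass a raw 0.05 cutoff")
print(sum(p < 0.05 for p in adjusted), "proteins pass after adjustment")
```

In this toy case three proteins fall below a raw cutoff of 0.05 but only one does after adjustment, which shows why the position of the threshold line deserves thought when many proteins are tested at once.
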
Conclusion

Scaffold has addressed most of the concerns that arise while using spectral counts to determine which proteins are differentially expressed in a proteomics experiment. Some of the analysis is automatic. But for much of the analysis you must look at your data and decide the right course. Scaffold supports you in this situation by providing you with multiple ways to look at your data so that you can determine the best way to proceed.
More informationData analysis. Data analysis in Excel using Windows 7/Office 2010