Normalizing Spectral Counts in Scaffold

When using MS/MS proteomics data to look for differentially expressed proteins, there are several issues that need to be considered:

1) peptides shared between proteins,
2) normalization between samples,
3) missing values,
4) test for differential abundance,
5) multiple tests.

These considerations have to be dealt with in some way for any type of MS/MS quantification: label-free methods such as spectral counting and emPAI, and labeling methods such as iTRAQ and ICAT. Scaffold uses spectral counting for quantification. This document addresses how Scaffold handles these concerns specifically for spectral counting.

Peptides shared between proteins

Tandem mass spectrometers measure spectra. Sequest, Mascot, X! Tandem and similar search engines match these spectra with peptides. Most researchers want knowledge about proteins, not peptides, so it is necessary to map peptide quantities into protein quantities. However, for isoforms, homologous proteins, protein families and sometimes just by chance, a peptide can be part of several proteins. You can readily see which proteins in your samples have shared peptides by looking for stars in the column Protein Grouping Ambiguity on Scaffold's Samples page.

Figure 1. Look for stars in the column Protein Grouping Ambiguity on Scaffold's Samples page to determine which proteins in your samples have shared peptides.

When you measure the abundance of such a shared peptide, which protein are you measuring? Scaffold follows one approach, the ProteinProphet approach, when deciding if there is enough evidence to conclude that the protein exists in the MS/MS samples. Scaffold, however, takes a somewhat different approach when assessing protein abundance.

For the purposes of protein identification, Scaffold uses a ProteinProphet model, assigning the peptide exclusively to the protein with the most evidence. The result is that the peptide has a weight of 1 in one protein and a weight of zero in all other proteins. The Number of Assigned Spectra option on Scaffold's Samples page shows this way of looking at the spectral counts: each spectrum is counted in at most one protein.

However, if you are convinced by independent evidence that two proteins exist, and each protein has the same peptide, then each spectrum for this peptide has ions contributed from both proteins. The Unweighted Spectrum Count option on Scaffold's Samples page will count this spectrum twice, once in the first protein and once in the second protein. This count is unweighted in the sense that the spectrum counts the same in each of the proteins that share it. Scaffold counts unweighted spectra for determining protein abundance.

An alternative approach is to ignore all the spectra matching any peptide which is shared between proteins. This approach eliminates the ambiguity of how much each protein contributes, but it reduces the number of spectra contributing to the estimate of the protein abundance. Scaffold's Similarity page allows you to uncheck the peptides shared between proteins. Unchecked peptides are removed from the spectral counts. If you do this, then the assigned spectra counts and the unweighted spectra counts are the same.

Figure 2. Scaffold's Similarity page allows you to uncheck the peptides which differ between proteins. Scaffold will then combine the proteins into a larger protein group.

A third approach is to combine into one group all proteins that share a peptide. An example of a situation where this makes sense is if you have a number of IgG proteins which you want to combine into a single protein group. Scaffold's Similarity page allows you to uncheck the peptides which differ between proteins. Scaffold will then combine the proteins into a larger protein group. If you do this, the assigned spectra counts will again match the unweighted spectra counts.

A final approach to the shared peptide problem is to weigh the contributions from the two proteins unequally. This requires independent evidence to know how much to weigh each contribution. Scaffold does not yet support this approach.
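
The difference between the assigned and the unweighted spectrum counts described above can be illustrated with a minimal Python sketch. It is not Scaffold's implementation: the function name, the input layout and the single numeric `evidence` score per protein are assumptions made for illustration, and the real ProteinProphet model is more involved than picking the highest score.

```python
from collections import Counter


def spectrum_counts(spectra, peptide_to_proteins, evidence):
    """Return (assigned, unweighted) spectrum counts per protein.

    spectra: list of peptide sequences, one entry per identified spectrum.
    peptide_to_proteins: dict mapping a peptide to the proteins containing it.
    evidence: hypothetical per-protein score; higher means more evidence.
    """
    assigned = Counter()
    unweighted = Counter()
    for peptide in spectra:
        proteins = peptide_to_proteins[peptide]
        # Unweighted: the spectrum counts once in every protein sharing the peptide.
        for protein in proteins:
            unweighted[protein] += 1
        # Assigned: the spectrum counts only in the protein with the most evidence.
        best = max(proteins, key=lambda p: evidence[p])
        assigned[best] += 1
    return assigned, unweighted


# Toy data: one peptide ("SHARED") belongs to both proteins.
spectra = ["PEPA", "PEPA", "SHARED", "PEPB"]
peptide_to_proteins = {"PEPA": ["P1"], "PEPB": ["P2"], "SHARED": ["P1", "P2"]}
evidence = {"P1": 2.0, "P2": 1.0}
assigned, unweighted = spectrum_counts(spectra, peptide_to_proteins, evidence)
```

In this toy data the shared spectrum is assigned only to P1, so the assigned counts sum to the number of spectra, while the unweighted counts include the shared spectrum once for each protein.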

Normalization between samples

Scaffold normalizes the MS/MS data between samples. This allows you to compare abundances of a protein between samples. The normalization scheme works for the common experimental situation where individual proteins may be up-regulated or down-regulated, but the total amount of all proteins in each sample is about the same. It is not appropriate if the total amount of protein varies widely from one sample to the next.

There are two levels of sample in Scaffold. The first level is the MS sample, which is the sample run through the mass spectrometer. The second level is the biological sample. Frequently the biological sample is fractionated into multiple MS samples. Scaffold allows you to view the MS samples within a biological sample or to combine all the MS samples into a single sample using the MuDPIT option. Normalization is done on the MS sample level.

The normalization method that Scaffold uses is to sum the Unweighted Spectrum Counts for each MS sample. These sums are then scaled so that they are all the same. The scaling factor for each sample is then applied to each protein group and adjusts its Unweighted Spectrum Count to a normalized Quantitative Value.

As stated above, this normalization method distorts the data if the total protein loaded varies from sample to sample. To see why this is so, first remember that low abundance peptides may be on the edge of detectability. If sample A has a lot of protein loaded, the low abundance peptides may be detected. If sample B has much less protein loaded, these low abundance peptides might not be detected; that is, their spectral count is zero. No amount of scaling is going to change zero to any other number.

You can view the normalized data using the Quantitative Values option on Scaffold's Samples page.
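
As a rough illustration of the scaling described above, the following Python sketch sums the counts in each sample and rescales them so that every sample has the same total. The text does not say which common total Scaffold scales to; using the mean of the per-sample totals is an assumption here, as are the function name and the toy counts.

```python
def normalize_counts(counts_by_sample):
    """Scale each sample so that all samples have the same total count.

    counts_by_sample: {sample: {protein: unweighted spectrum count}}.
    """
    totals = {s: sum(c.values()) for s, c in counts_by_sample.items()}
    # Assumption: the common target is the mean of the per-sample totals.
    target = sum(totals.values()) / len(totals)
    normalized = {}
    for sample, counts in counts_by_sample.items():
        if totals[sample] == 0:
            raise ValueError(f"sample {sample!r} has no spectra to scale")
        factor = target / totals[sample]
        normalized[sample] = {p: n * factor for p, n in counts.items()}
    return normalized


# Sample B has half the total count of sample A and a zero for one protein.
counts = {
    "A": {"tubulin": 30, "vimentin": 8, "metallothionein": 2},
    "B": {"tubulin": 15, "vimentin": 5, "metallothionein": 0},
}
normalized = normalize_counts(counts)
```

After scaling, both samples sum to 30, but the zero count in sample B is still zero, which is the distortion the text warns about when protein loading is uneven.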
If you switch from the Unweighted Spectrum Count to the Quantitative Value and all the values in one column change a lot compared to the values in the other columns, you are seeing evidence of uneven protein loading. If you see this, you may want to think carefully about how you should use this data.

There are more sophisticated normalization schemes that attempt to normalize the data in a way that allows you to compare in a semi-quantitative way the abundance of one protein, say tubulin beta-2, with another protein, say vimentin, in the same sample. Scaffold does not support these schemes in this version. This means that you should exercise caution about trying to draw conclusions about the stoichiometry of the proteins from the spectral data as presented in Scaffold. In particular, you should be cautious about drawing conclusions about differential abundances for proteins where the spectral counts are small numbers. Scaffold tries to mitigate this problem by its treatment of missing values.

Missing values

For the purposes of the tests for differential protein expression, Scaffold replaces missing values with a specified minimum value. For coefficient of variance, T-test and ANOVA, this value is set by the user on the Quantitative Analysis Setup page, and it defaults to 0. For fold changes, however, if there is a missing value, Scaffold sets the ratio to 1.

Let's say you do a simple fold change test between sample A and sample B. If sample A has a count of 3 for the protein Metallothionein and sample B has a missing value for this protein, then Scaffold will say that the fold change is 1. Scaffold does not flag Metallothionein as having different abundances between the samples. The alternative treatment of replacing missing values with zero would show that Metallothionein has an infinite fold change between sample A and sample B, which is clearly not supported by the evidence.

Note that the normalized values displayed by Scaffold do reflect the user-specified minimum value. The minimum value that is selected in the Quantitative Analysis Setup is substituted for all missing values in calculating the Quantitative Values that Scaffold displays. This is true even if no statistical test is selected and the dialog controlling this value is grayed out.

Test for differential abundance

Scaffold provides several tests that you can use to identify proteins which have different abundances in different categories of samples. Which test you use will depend upon your experimental design. In particular, it will depend upon the number of replicates you have.

In Scaffold you determine which test you want to use by selecting the menu option Experiment / Quantitative Analysis. The Quantitative Analysis dialog box lets you choose which test to use and which samples are being tested.

Note: These tests are based upon the data that is being displayed in Scaffold's Samples page. If you filter this data in a different way by, for example, changing the Minimum Peptide probability filter, or the Req Mods filter, or the View lower scoring matches menu option, the displayed data will change, and the tests may select different proteins as having abundance level changes.
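
The treatment of missing values in a fold change, described in the Missing values section above, can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and representing a missing value as `None` or as a count of 0 is an assumption about how the data might be stored.

```python
def fold_change(count_a, count_b):
    """Ratio of count_a to count_b, reporting 1 when either value is missing.

    Assumption: a missing value is represented as None or as a count of 0.
    """
    # A missing value on either side gives a ratio of 1, not 0 or infinity.
    if not count_a or not count_b:
        return 1.0
    return count_a / count_b


# Metallothionein example from the text: 3 spectra in A, missing in B.
metallothionein_ratio = fold_change(3, None)
ordinary_ratio = fold_change(6, 3)
```

With this rule the Metallothionein example yields a ratio of 1, so the protein is not flagged, whereas an ordinary pair of counts yields their plain ratio.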
Fold change

The simplest test is the fold change: the ratio of the count in one sample to the count in a second sample. This fold change can give you some information if you only have two samples. As mentioned above, in the case of missing values the ratio is reported as 1.

In some presentations you will see the fold change displayed as the log base 2 of the ratio. Scaffold currently only shows the ratio. If you choose to sort your data based on the fold change, remember to look at both the top and the bottom of the sorted data. A 4 to 1 ratio will display as 4, but a 1 to 4 ratio will display as 0.25.

Fold change values need to be interpreted cautiously. A fold change of 2 is much more likely to be significant if it is the ratio of 48 to 24 than if it is the ratio of 2 to 1. Scaffold's Q-Q Scatterplot, which is discussed below, can give you some guidance in this matter.

Scaffold colors green the fold changes less than 0.5 or greater than 2.0. The green color is meant to be a guide so your eye can pick out possible proteins of interest. It does not mean that Scaffold is marking in green everything that is statistically different between samples. You have to decide, based upon your understanding of the experiment, which proteins are really up- or down-regulated. This caution also applies to the other tests. For example, you will have to decide which p-value level to consider significant in a T-test or ANOVA test independently of whether Scaffold colored the value green.

Coefficient of Variance

If you have two or more categories of samples, you can use the Coefficient of Variance test, although three or more are recommended. Let's say that you have four samples A, B, C, and D, each in its own category. The coefficient of variance is the standard deviation of the four values, expressed as a percentage of the mean. A small coefficient of variance means that the four samples have values close together compared to their average value. A large coefficient of variance means that at least one of the four samples is different, but it doesn't tell you if it is A, B, C or D. It just tells you that this protein should be examined. You can examine the values on the bar graph on Scaffold's Quantify page.

T-Test

If you have three or more replicates in two different categories, then the T-test is the appropriate way to tell if the abundances are different between the categories.

To use the T-test, you must set up the biological samples in categories. Typical categories are treated/untreated or disease/control or cell line 1/cell line 2. We will use Treated/Untreated in the explanation below. The biological samples within each category are replicates. You can set up whatever categories you like when you enter the samples into Scaffold, and you may change the categories by switching to Scaffold's Load Data page and double-clicking on a biological sample.

Sometimes people make a distinction between technical replicates and biological replicates. This is a more advanced statistical analysis concept and is not supported by Scaffold.
The T-test is a measure of the distance between the mean of the replicate samples in category Treated and the mean of the replicate samples in category Untreated. This distance is scaled by the standard deviation of the replicates. The results of a T-test are reported as the probability (p-value) that this distance between means could occur by chance. A small p-value means the samples in category Treated are most likely different from those in category Untreated. That is, a difference this large is unlikely to have arisen by chance. Many people use a p-value of 0.05 as a threshold. You may want to modify this in proteomics experiments since you are measuring many proteins at once. See the discussion on multiple testing below.

The T-test is generally considered a fairly robust test. This means that even if its basic assumptions are violated somewhat, it still tends to be fairly reliable at separating the categories which are the same from those that differ. Some researchers believe that spectral data should be transformed in some way, for example by taking its log, before doing a T-test. Other researchers may tell you that 3 replicates is not enough to apply the T-test. Still others believe that more advanced non-parametric tests would work better. So if the T-test gives a borderline result, you may want to check it carefully. But if your T-test has a very small p-value, the robustness of the T-test means that it is unlikely that a more sophisticated statistical analysis will give a different result. If you try to push things by using the T-test with fewer than 3 replicates, it is unlikely to give informative and trustworthy results.

Fisher's Exact Test

The Fisher's Exact Test, like the T-test, compares the relative abundance between two sample categories. This test is more appropriate than the T-test if there are few replicates because it directly calculates the probability of seeing the observed differences between the two samples, rather than relying on a large sample approximation. Like the T-test, the Fisher's Exact Test produces a p-value.

ANOVA

If you have three or more replicates in three or more categories, then the ANOVA test is appropriate. For example, these categories may be different cell lines or different time points.

Like the T-test, the ANOVA test requires replicates in categories. As with the T-test, results from fewer than 3 replicates are untrustworthy, and more replicates are better. Like the T-test, the ANOVA test reports a p-value, and a small p-value is good. Like the T-test, the ANOVA test is fairly robust.

The ANOVA test tells you whether the variation between your categories is large compared to the variation within each category. The ANOVA's p-value tells you how unlikely it is that variation this large would arise by chance. A small p-value means that the categories are different.

Like the Coefficient of Variance test, the ANOVA test tells you that something is different, but it doesn't tell you what. So if you have categories A, B, C, and D, the ANOVA test can tell you that something is different but not that category C is different. You have to examine the data for the protein, perhaps with the Bar Chart on Scaffold's Quantify page, to see which category is different.

The ANOVA test in Scaffold is a simple one-way ANOVA. More sophisticated ANOVA tests are beyond Scaffold's capability. They are also much more mysterious to non-statisticians.
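
The idea behind the one-way ANOVA described above, variation between categories compared to variation within categories, can be made concrete with a short Python sketch that computes the F statistic. This is a textbook formula, not a description of Scaffold's code; the function name and the toy replicate values are assumptions. Turning F into a p-value requires the F distribution, which is left out here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: list of lists, one list of replicate values per category.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Between-category variation: distance of each category mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-category variation: spread of the replicates around their own mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Three categories with three replicates each; the third category stands out.
groups = [[10.0, 12.0, 11.0], [10.0, 11.0, 12.0], [20.0, 22.0, 21.0]]
f_stat = one_way_anova_f(groups)
```

For this toy data the between-category mean square is 100 and the within-category mean square is 1, so F is about 100. A large F says only that some category differs, not which one. With exactly two categories, this F equals the square of the pooled-variance t statistic.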
Q-Q Scatterplot

The Q-Q Scatterplot on Scaffold's Quantify page plots each protein's normalized spectral count as a dot. The x-axis is the spectral count for one category; the y-axis is the spectral count for the samples in another category. If the categories have multiple replicates, then the average value in the category is plotted. You can elect to see this scatterplot for any two categories in your experiment.

If the protein has the same abundance in both categories, it should plot close to the 45 degree line. The scatterplot has estimated error lines plotted. These lines are two standard deviations above and below the 45 degree line. Proteins that are plotted outside these error lines are possibly differentially expressed.

Each dot is colored either red or blue. The red dots represent proteins that are flagged as differentially expressed by whatever statistical test (fold change, coefficient of variance, T-test, ANOVA) is currently applied.

The error lines on the Q-Q Scatterplot are estimated by looking at the Mean/Deviation Scatterplot. This graph plots the standard deviation for each protein against its mean spectral count. In general, larger spectral counts will have larger standard deviations. A line is fit to this Mean/Deviation data using least squares. The fitted value is added to the 45 degree line in the Q-Q Scatterplot to get the error lines. This approach is based on the work of Michael Washburn's group: see the discussion of the Power Law Global Error Model (PLGEM) in a paper by Pavelka et al. in Mol Cell Proteomics, 2007 Nov 19.

The Q-Q Scatterplot lets you quickly find the outliers, and it gives you a reasonability check on statistical tests. You may, for example, use it to see whether the fold changes you see are believable.

Multiple tests

The statistical tests described above give some measure of how different the samples in the various categories are. These measures are either a ratio, a coefficient of variance, or a p-value. How big do these measures have to be before they are significant?

The naive approach is to set an arbitrary value, say a 2-fold change, or a p-value below a fixed cutoff such as 0.05. This is what Scaffold does when it colors the fold change or p-value green. This probably includes some proteins that should not be labeled as differentially expressed but have gotten on the list by chance.

A better approach is to sort the proteins by their p-value so the small p-values are on top. The proteins at the top of the list are the most likely to be accurately classified as differentially expressed. As you go down the list, your confidence in the classification should diminish. The question becomes where you should draw the threshold line. Scaffold leaves this to your judgment.

The issue is complicated because considering a set of statistical inferences all together increases the chance of falsely identifying a difference as significant. You may want to seek guidance from a statistician and ask about methods of correction for multiple testing in analyzing the T-test and ANOVA results.
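
As one example of what a correction for multiple testing can look like, the sketch below applies the Benjamini-Hochberg procedure to a list of p-values. This method is not named in the text above and is not something Scaffold is described as doing; it is shown only as a commonly used way of choosing where to draw the threshold line in a sorted list of p-values. The function name, the false discovery rate of 0.05 and the toy p-values are assumptions.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of the tests kept at the given false discovery rate."""
    m = len(p_values)
    # Sort test indices so the smallest p-values come first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    # Keep every test ranked at or above the cutoff.
    return sorted(order[:cutoff_rank])


# Five proteins; four have p < 0.05 under the naive fixed cutoff.
p_values = [0.001, 0.045, 0.035, 0.20, 0.008]
kept = benjamini_hochberg(p_values)
```

In this toy list the naive cutoff of 0.05 would flag four proteins, while the procedure keeps only the two with the smallest p-values (indices 0 and 4). Whether this or another correction suits a given experiment is a question for a statistician, as the text suggests.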
Conclusion

Scaffold has addressed most of the concerns that arise while using spectral counts to determine which proteins are differentially expressed in a proteomics experiment. Some of the analysis is automatic. But for much of the analysis you must look at your data and decide the right course. Scaffold supports you in this situation by providing you with multiple ways to look at your data so that you can determine the best way to proceed.


More information

Comparing two groups (t tests...)

Comparing two groups (t tests...) Page 1 of 33 Comparing two groups (t tests...) You've measured a variable in two groups, and the means (and medians) are distinct. Is that due to chance? Or does it tell you the two groups are really different?

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

MARS STUDENT IMAGING PROJECT

MARS STUDENT IMAGING PROJECT MARS STUDENT IMAGING PROJECT Data Analysis Practice Guide Mars Education Program Arizona State University Data Analysis Practice Guide This set of activities is designed to help you organize data you collect

More information

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

More information

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

More information
