IDEOM IDentification and Evaluation of Metabolomics data from LC-MS

Size: px
Start display at page:

Download "IDEOM IDentification and Evaluation of Metabolomics data from LC-MS"


1 IDEOM IDentification and Evaluation of Metabolomics data from LC-MS Darren J. Creek Summary: Ideom is an Excel template with many macros that enable userfriendly processing of metabolomics data from raw data files to annotated and hyperlinked metabolite lists. Major features include: A graphical user interface (GUI) for pre-processing raw data with msconvert, XCMS and mzmatch. Additional automated filtering and annotation procedures to remove noise/artefacts from LC-MS data. Automated identification procedure based on exact mass and retention time, with annotation of confidence levels. Data analysis and visualisation tools to enable biological interpretation of results. IDEOM 1

2 Quick Start Guide: Installation: The Ideom template doesn t require any installation, and can be loaded directly in a recent version of Microsoft Excel (2007 or 2010). (or Excel for Mac 2011) For full functionality users require a current installation of R ( and proteowizard tools ( Hyperlinks to Xcalibur (Thermo) and web browsers are included, however these programs are not required for successful implementation of Ideom. Getting Started: i. Open the Ideom.xlsb file in MS Excel and Save As with your study name (e.g. mystudy1.xlsb) ii. Enable Macros by clicking the Options button in the Security Warning ribbon. Select Enable this content. Click OK. iii. (First time only) Install all required R packages by clicking the appropriate blue button in the help section at the bottom of the page. (You may need to scroll down to row 57 to find this). The blue buttons execute functions in R (not within Excel), you may be prompted to select the Rgui.exe file on your computer. Please agree to any Windows security warnings. iv. All settings for XCMS, mzmatch and Ideom processing are located in column E. v. (optional) Up to 10 internal/external standards may be entered into cells U2:AD2. vi. (optional) Red sheets contain advanced settings and databases 2

3 Automated Data Processing - All automated data processing steps can be executed by clicking the blue and green buttons in columns A and B of the settings sheet. - Parameters can be optimised by adjusting the values in column E before running macros. Data Processing from raw LC-MS files - Ideom provides a graphical user interface to allow processing of raw files by other command-line based open source software: msconvert (, xcms ( and mzmatch.r ( - Before processing it is recommended (though not essential) to sort raw (or mzxml) files into folders that classify samples according to study group (e.g. controls, treatments, blanks, etc). - Commence processing by clicking the blue buttons for steps (b), (c) and (d), skip step (b) if using mzxml files. To avoid errors it is recommended to use the combined buttons to automatically execute these steps sequentially. Some of these steps may be time-consuming and desktop PC s usually complete these steps overnight. An mzmatch_output.peakml and mzmatchoutput.txt file should be generated after completion of these steps. - The R script for the processing method is saved as a text file in your Ideom folder. This allows advanced users to process the raw files on a more powerful computer (for large studies), or to include/adjust specific features within XCMS or mzmatch.r if required. Data Processing from peak lists (mzmatch output) 1. Import mzmatch (or mzmine) peak list by clicking step 1 (green button) and follow prompts. 2. Click step 2: Updates metabolite database with retention times based on observed standard retention times. Standard retention times can be imported from another Excel file, or generated by the Ideom functions on the Targeted sheet(s). 3. Click step 3: Runs Ideom filtering and identification macros. 4. (optional): Manually check the rejected sheet for false-rejections of known compounds. 5. Click step 5: Recalibrates mass based on putative identifications. 6. (optional): Use the information summarised on the Identification sheet to manually remove false-identifications. 7. (optional): For Orbitrap Exactive (Thermo) data process both positive and negative modes separately (steps 1-6), then Click step 7 to combine the data. 8. Click step 8: Summarises metabolite levels relative to the control group. The control group may be changed before running this step if required. Use data on the Comparison sheet to directly evaluate metabolomics data and to export to statistical/biochemical websites. 9. (optional) Click step 9 to update the Base-peaks list to attempt to identify unknown peaks that may be of interest. 3

4 IDEOM HELP, TUTORIAL AND DOCUMENTATION Summary:... 1 Quick Start Guide:... 2 Installation:... 2 Getting Started:... 2 Automated Data Processing... 3 Data Processing from raw LC-MS files... 3 Data Processing from peak lists (mzmatch output)... 3 General Information... 6 Processing steps... 7 Raw data processing... 7 a) Manually sort files into folders according to study group... 7 b) Convert RAW to mzxml files and split polarity... 7 c) Run XCMS (and mzmatch) to pick peaks and convert to peakml files... 8 d) Run MZmatch to combine data and annotate related peaks... 8 Ideom processing Import MZmatch data and enter grouping info Update DB with Retention Times Run Identification Macro Manually move any false rejections from 'Rejected' list to 'Identification' list (optional) Recalibrate mass (ppm) Manually check related peaks and isomers (optional) Combine Pos and Neg modes (optional) Compare all sets Assign BasePeaks (optional) Other Tools Targeted Analysis Isotope search Annotate DB Alternative peak list import options Trim File Size zformulagenerator

5 Xcaliburlink Excel Functions Exact Mass: Mass error (ppm) calculation: Formula match from exact mass: Formula validity check: Theoretical Isotope Abundance Calculator: Positive charge (average): Negative charge (average): Definitions Set Types (Groups) Column definitions for results sheets Description of columns in Identification, Rejected, Alldata and allbasepeaks sheets Description of columns in Comparison sheet Hyperlinks and shortcuts from results sheets Shortcuts by double-clicking in columns of Results sheets (Identification, Rejected, allbasepeaks) Additional shortcut for Rejected sheet Shortcuts by clicking in columns of Comparison sheet Tutorial Videos Frequently Asked Questions (FAQ S) I have a list of metabolites... how do I interpret this data? How is metabolite identification confirmed?: What test should I use to find the most significant metabolites? Many of my expected metabolites are being rejected? R scripts don't run on my computer How do you deal with Polarity?: Which file to use for the retention time updater? It runs slowly? TROUBLESHOOTING Known bugs: mzmatch and R help Contact details Appendix A: Evaluating Metabolomics Data with Ideom

6 General Information Save the template with a new filename when starting to process data (Ensure you save it as a macro-enabled workbook..xlsb Excel binary format is recommended). It is recommended to save the file before running each step. Macros are activated by clicking the coloured buttons. In-cell hyperlinks in Excel are activated by double-click. (weblinks by single-click). The Ideom template includes a number of Excel sheets Most functions are on the Settings, Identification and Comparison sheets: o Settings: is the home page and the starting point for all analysis. It contains many basic settings, help documentation and the main macro buttons for processing data. Default settings are suitable for 4.6 mm ZIC-HILIC chromatography (with formic acid/acn/h 2 O, 0.3 ml/min) coupled to the Exactive Orbitrap. o MZmatch: is a blank sheet to allow import of the peak list from mzmatch (or other) o Results sheets are initially empty, but will be written with data and annotations during step 3 (Identification Macro). Results sheets include: Alldata: Information about every peak set in the peak list is written to this sheet Rejected: Peak sets with putative identification, but confidence below 5, are copied to this sheet. Most of these peaks are noise or artefacts, but users may wish to scan them manually for false rejections. Identification: All identified metabolites with confidence of 5 or above are copied to this sheet. allbasepeaks: All Base Peaks (from mzmatch related peaks function) are copied to this sheet unless the peaks are not significant in any group (less than blank). Comparison: Comparative peak intensity data is placed here after running the Compare all sets macro. Many functions for evaluation and visualisation are available on this sheet. o DB: contains the full metabolite database and associated metadata. If required, additional metabolites may be added to the bottom of the list, or additional property columns to the right of existing columns. Additional information may also be added to existing columns, but do not insert new columns between existing data. o RTcalculator and Fragments: contain important tables of information required for the Ideom macros. Advanced users may wish to update these tables with instrument-specific values. o Targeted: allows a targeted analysis of specific metabolites o Tools: demonstrates the utilisation of Ideom s user-defined Excel functions o Method and Samples: Extra sheets to allow you to upload experiment metadata 6

7 Processing steps Raw data processing a) Manually sort files into folders according to study group - Please use Windows Explorer ('My Computer') to create folders for each group of replicates in your study (eg. WT, RES, KO, BLANK, QC, etc), and move relevant raw files into these folders - ** If using ReAdW to convert, you will need to sort out the mzxml files again after step B in order to keep positive and negative mode files in separate folders - ** You may skip this step and process all files without grouping, but if you want mzmatch to group and/or run RSDfilter you will need to put the peakml files into these folders before running step (d) - ** Avoid spaces in folders and filenames... R will make a mess of them - good idea to rename any folders that have spaces in the name. Underscore is OK, for example; use IDEOM_trial rather than IDEOM trial. Don t work in folders within My Documents or Program Files. b) Convert RAW to mzxml files and split polarity - Run step (b) (click blue button) if you are dealing with raw files. (Skip this step if you already have mzxml files.) - This step uses msconvert or ReAdW (through R) to convert raw LC-MS files to the.mzxml format. Msconvert is preferred. - The location of either msconvert.exe or ReAdW.exe on your computer needs to be assigned, and can be stored in cell E43. - **msconvert.exe needs to be in the same folder as the Pwiz dll files (keep all Pwiz files together). ReAdW requires zlib1.dll to be in your windows directory to work correctly - The split polarity option for ReAdW uses the mzmatch split function to separate Exactive files that contain both positive and negative polarity. See mzmatch help for further information - If Waters raw files fail to convert, try removing the ionisation mode filter from the script (otherwise use masswolf) - TIC CHECKER: Available in the R script menu at the bottom of the page allows users to run this additional script in R. It is not essential, but lets you check signal reproducibility of the mzxml files. (Provided by Gavin Blackburn) 7

8 c) Run XCMS (and mzmatch) to pick peaks and convert to peakml files - This step uses the Centwave function in xcms (through R) to pick peaks, then mzmatch converts each individual file to peakml format - Settings for the Centwave function can be changed in cells E4-E11 of the Settings sheet - for full documentation see xcms help ( Briefly, parameters are: o PPM: mass deviation from scan to scan o Peakwidth: range for baseline peakwidth o o S/N threshold: Signal to Noise ratio Prefilter: number of scans greater than a given intensity threshold - To process negative files that were converted with ReAdW you need to use the 'mzdata alternate' method (cell E4), which first converts mzxml to mzdata format before processing. This method is much more memory intensive and if you have more than 20 files you may need to process it in parts, or transfer to a server for processing. (Hence msconvert is preferred for conversion of raw files). d) Run MZmatch to combine data and annotate related peaks - This step uses mzmatch to group individual peakml files, filter peaks and annotate related peaks. - Automated mzmatch functions are: o Group: Groups peaks across replicate samples based on specified mass and RT windows. (optional, requires files to be assigned to group folders before starting) o RSD filter: (Relative Standard Deviation) Removes groups with high variability in peak intensity. (optional, requires previous group function) o Combine: Combines all groups/samples into one peakml file (peak list) o Noise filter: Peak shape is assessed by CoDA-DW score (0-1) o Intensity filter: Features are removed if no sample has a peak above the intensity threshold o Detections filter: Peaks must be present in a minimum number of samples o Gap-filler: Fills gaps in the peak intensity table where peaks were detected in some samples but not others. This requires access to the raw data (mzxml files). 8

9 o o o Related peaks annotation: Annotates peaks that are likely to be ESI artefacts (eg. Isotopes, adducts (Na +, K +, Cl -, ACN,...), fragments, multiply-charged species, dimers, multimers, complex adducts, FT or ringing signals). Based on retention time, peak shape and correlation of peak intensities across samples. Output: to peakml format and peak list table in.txt format - Settings can be changed in cells E14-E20, and E25 on the Settings sheet, additional parameters can be altered by saving the R script and running it externally - for documentation see mzmatch help ( - The default settings should be a good starting point, though you might need to change RSD filter (E16) if you have messy data, or Minimum Detections # (E19) if you have less than 3 replicates in a group. - **Gapfiller requires mzxml files to be in the same location as when they were converted to peakml files (step c), therefore it is a good idea to run steps (c) and (d) together (on the same machine) by using the 'COMBINED' button - **Gapfiller is very memory intensive. If the process crashes at this step try closing the R session, and continuing the script manually from the gapfiller stage. If this doesn't work then tighten the filters (RSD, mindetections) to make smaller peakml files. OR run it on a machine with more RAM. 9

10 Ideom processing Whilst most Ideom processing is automated, some study-specific input is required from the user. Optimal results are achieved by clicking steps 1-9 (green buttons) and following the on-screen prompts. Steps 1 and 2 can be run in any order, but must both be completed before running the main processing macro (step 3). 1. Import MZmatch data and enter grouping info - Use this function to import data from the mzmatchoutput.txt file produced by mzmatch. If you have already entered data manually, or by the 'import example data' or 'import Mzmine data' buttons, you may press cancel at the import file screen to skip this step. - The second part of this function asks the user to enter grouping information. 'autofill' can be used if the prefix of the sample names refers to the grouping, otherwise manually select groups using the 'add' buttons. - Set-Type needs to be selected for each group using the drop-down lists: always set one group as 'Treatment' and another as 'Control' to allow comparisons - If you have more than 15 groups there is a second tab with space for 15 more groups. - Ideom currently supports a maximum of 30 groups 10

11 - The third part of this function plots average sample intensities to allow a quick check of whether the data is consistent. Internal (external) standards will also be plotted if you have entered them in U2-AD2 of the settings sheet - The fourth part gives the option of normalising the data either by TIC, median, or user-defined values (column R on settings sheet). Normalisation is not routinely recommended for LC-MS data due to non-linear responses and the unpredictability of ion-suppression. - If you subsequently decide to normalise the data, you may re-run the whole step 1 any time before running step 3 (Identification Macro). - ** mzmatch output data needs to have replicates in adjacent columns. Please adjust this (by changing column order in the txt file) prior to import. This should automatically happen if your samples are labelled by group, and/or you used the mzmatch grouping function. 2. Update DB with Retention Times - This function enters standard retention times into the database (DB sheet), and (optional) enters predicted retention times for other metabolites. - A list of retention times from authentic standards is required (create this list either using the Targeted Sheet, or externally using Toxid, Xcalibur or similar) - The list of standard RTs may be either imported from any excel readable file, or entered directly into columns A and B on the 'RTcalculator' sheet - If importing.csv files of retention times: "_" in metabolite names will be replaced with "," - All authentic standards (column A) must have names that exactly match those in the DB sheet. - RT calculator uses physico-chemical properties in the DB sheet to predict retention times based on a multiple linear regression model with the authentic standards. (QSPR approach) - You have the opportunity to check the model fit before annotating all metabolites in the database. - If there is no good prediction model you can still use this function to upload standard retention times for those metabolites where you have authentic standards. - Rtcalculator sheet: The column (cell O8) and dead volume (mins) should be entered before running this macro (cell O9). Other data in columns N-U show the accuracy of the current RT prediction model. - Rtcalculator sheet: The mass range for application of the prediction model is defined in cells O10 and Q10. The default model is accurate for the Formic Acid:ZIC-HILIC method from MW Rtcalculator sheet: Columns W:X allow standard retention times to be uploaded to the database without being included in the prediction model. (e.g. for large metabolites outside the validated mass range) 11

12 - Rtcalculator sheet: Columns Z:AI allow predicted retention times to be uploaded to the database based on class properties, according to specific annotations in the DB sheet - Rtcalculator sheet: Headers E1:J1 can be adjusted to other phys-chem properties (from drop-down menus) if you wish to attempt to apply RT prediction to different chromatography (select manual column). For the ammonium carbonate:philic method change the column in cell O8 and agree to adjust ph to 9. - If you don't run this function then metabolite identification will only be based on exact mass (not retention time) - hence you will get a lot more false-identifications. 3. Run Identification Macro - Before running this macro check all settings in column E of the Settings sheet. Note: Settings for RSDfilter, intensity filter, minimum detections and related peaks window (under mzmatch settings, cells E16:E20) are also used in Ideom functions. - RT and mass windows (E23:E25) may need to be changed depending on the quality of your data. - Preferred DB (E33) should be chosen from the dropdown list. 'Central' refers to central pathways in KEGG (Cofactors and vitamins, amino acid, nucleotide, carbohydrate, lipid and energy metabolism). Additional organisms/annotations may be added to the database using the purple Annotate DB macro, or by manually appending a column to the end of the DB sheet (column BH). - Additional adducts (other than H + /H - ) can be searched by selecting them in cells D36:F38 on the settings sheet. (mzmatch has already corrected the data for 1 proton) - This macro runs most of the Ideom functions to filter data and identify peaks, data is recorded in the 'alldata' sheet, and then copied to the other results sheets. Functions applied to each peak include: - *SIGNIFICANT/NOISE: If a peak is present for every sample in a group and each peak is greater than every peak in the blanks, and the RSD is below the RSD filter threshold, then the metabolite detection is considered significant and that group name is appended to the 'groups' column (L). - * SIGNIFICANT/ NOISE: Peaks that are not significant in any group are assigned the lowest confidence (0) - *CHARGE: If 13 C isotope is present, the detected charge-state is recorded and taken into consideration for all mass- (or m/z-) dependent functions - *COMMON related peaks: peaks within the 'related peaks' RT window that are related to a larger peak by a mass difference (and relative intensity) in the Common Related Peaks table (Fragments sheet, columns H:M). mzmatch annotations are also checked for common related peaks (Fragments sheet, columns A:C). - *COMMON related peaks: These are annotated in the 'addfrag' column (P) with names starting with "x", and given low confidence (0.1) 12

13 - *DUPLICATES: Duplicate masses (within ppm error in settings E25) are determined by the duplicate or shoulder peak conditions in the settings (E28:E30), or if the intensity is less than 1% of the biggest peak with that mass or if the peak intensities correlate (Pearson s r > 0.95) with a more intense metabolite with that same mass - *DUPLICATES: The EIC Rel Intensity column records the intensity (in the most intense sample) relative to the highest intensity for that mass. Duplicates are given a low confidence (0.2). - *RSD filter: The RSD (relative standard deviation) is calculated for each group of replicates. The maximum RSD is entered in column T, and the RSD for the QCs is entered in column S. If no QC's are assigned then the RSD for the 'Treatment' group is entered in column S. - *RSD filter: o STRICT: If any group has RSD larger than this setting, the metabolite is considered unreliable and given low confidence (0.4). o TECHNICAL: If the QC group (or Treatment group in the absence of a QC group) has o RSD larger than this setting, the metabolite is given low confidence (0.4). GENEROUS: No specific RSD filter is applied because if no group has an RSD below the filter threshold then the peak is not significant and has confidence level of 0. NB: Groups with mean intensity below the LOQ are excluded from this test (as they often have higher RSD). - *INTENSITY filter: The maximum intensity for all included samples is entered in column U. If this is less than the Intensity filter (LOQ) then a low confidence is given (0.5) - *FORMULA ASSIGNMENT: based on the closest match to the database (within mass window) - if 2 matches within specified mass window the mass error for the furthest formula is noted in 'altppm' column (K) - *OTHER ADDUCTS: All Basepeaks (see mzmatch) are checked for other 'search adducts' according to the user selections in cells D36:F38 on the settings sheet - If a formula has already been assigned, the mass error for the adduct is noted in the 'altppm' column (K) - *METABOLITE IDENTIFICATION: for all formulae based on (a) standard retention time (if present) (b) predicted retention time (if present) - *ISOMERS: If metabolite assignment remains ambigous, the first matching metabolite in the DB is assigned. NOTE: DB has been sorted to give preference to your 'preferred DB', then to 'central' metabolites, then to metabolites present in more source databases (KEGG, MetaCyc, etc), then to lowest metabolite ID number in source databases - *FRAGMENTS: Known fragments of specific metabolites are removed (according to the table in "Fragments" sheet columns Q:U, from standards analysed on the HILIC-Orbitrap system) - *OTHER RELATED PEAKS: Related peaks determined by mzmatch are annotated in the 'addfrag' column (P) according to possible mass differences in the "Fragments" sheet (columns A:C). Annotations in Column C of the "Fragments" sheet designate whether specific adducts/fragments are automatically rejected with confidence level (0.4) or merely annotated with this information about a potential relationship. - *ISOTOPES: 13 C Isotope abundances are recorded in column Q as the %error relative to the theoretical isotope abundance for the putative metabolite formula, or for unidentified peaks, as the theoretical number of carbon atoms in the formula. 13

14 - *CONFIDENCE: Arbitrary confidence levels are assigned based on retention time error, or annotation as 'Xenobiotic' in 'Map' column of DB sheet. Confidence is modified by whether the metabolite is in the 'preferred' database, and mzmatch annotation as either 'base' or 'related' peak. See columns N:O of settings sheet for more details. Confidence levels for each filter may be adjusted if desired. - *FILTERING: All metabolites with confidence 5 or greater are copied to the 'Identification' sheet, those below 5 are copied to the 'rejected' sheet - *DUPLICATE ANNOTATIONS: If duplicates remain with the same identification, the identity of the smaller peak is changed to the next most likely metabolite that satisfies the retention-time window - *ADDITIONAL CALCULATIONS: detailed information is entered into columns A:AA for each metabolite. Described below. 4. Manually move any false rejections from 'Rejected' list to 'Identification' list (optional) - Only required for a thorough analysis: Metabolites that may have been wrongly rejected should appear near the top of the 'rejected' sheet. If you believe a rejected metabolite is real, double-click the 'Confidence' value to see if that metabolite is already in your identification sheet. - Wrongly rejected metabolites can be moved to the 'Identification' sheet by using the 'Retrieve Row' function 5. Recalibrate mass (ppm) - This macro can be started from either this settings sheet, or from the button at the top of the 'identification' sheet - It will plot the relationship between mass and mass accuracy (ppm error) for all 'identified' metabolites, with standards in red, and fit a 5th order polynomial function. (this should allow for the calibration errors observed on Thermo Orbitrap) - If the polynomial function appears to fit your data, agree to recalibrate masses. If the curve is not a good fit, but you see a trend, consider manual re-calibration efforts. - After calibration, check the new plot of mass errors, and set a new ppm window to remove outliers (false-identifications). In some cases it is worth checking the rejected peaks (bottom of 'rejected' list) for alternative identifications by clicking the 'altppm' column - If significant adjustment was made consider re-running analysis from step 3 (with the recalibrated data) to improve identification 14

15 6. Manually check related peaks and isomers (optional) - For thorough analysis, check all the information supplied for each peak in the 'identification' sheet. You may wish to skip this step initially, and return later to double-check specific metabolites of interest. Whilst it is always a good idea to return to raw data for confirmation of specific metabolites, the identification sheet allows rapid access to a large amount of meta-information to simplify the process of manual data curation and metabolite identification. - Click the orange colour macro' to make viewing of data easier - Click the orange 'Hyperlinks' macro to allow links to metabolite websites. Click again to turn off hyperlinks if it slows your computer too much. Hyperlink websites can be edited in the hyperlinks table in columns AF:AI on the settings sheet - Isomers: Sort by the Formula column and remove duplicates that are due to bad chromatography. (click in the RT column for a plot of relative intensities and quicklinks to Xcalibur or peakml chromatograms) - Click the light-blue peakml export button to filter a peakml file by identified masses, which automatically generates a pdf of chromatograms (or can be viewed in peakml viewer). NOTE: mzmatch currently doesn t allow identification by retention time, therefore the exported peakml file and pdf may contain isomeric peaks that have already been rejected by Ideom. Double-check the retention times before removing masses based on chromatographic peak shape. - Related Peaks: Click the orange 'Sort By Relation ID' button and look in columns M:P for ESI artefacts that were not filtered by the common adducts search (e.g. Metabolite-specific fragments). Double-click in the mass (column A) to see all co-eluting peaks red masses are related according to mzmatch. - Also look at 'C isotope error' (column Q) to see if the main isotope intensity doesn't match the formula, Related Peaks (column R) to see if all the fragments and isotopes are possible from the identified structure, and 'adduct' (column Y) to check whether the identified adduct (e.g. 2+, Na) is likely. (the 'charge' column at the end may help this if a 13 C isotope was present, but it is sometimes not correct in noisy data). - Alternative identification: Easily change identification by clicking the metabolite name (column E), then select from the isomers in the dropdown list. Further information about these isomers can be obtained by double-clicking the isomer number (column D). Alternative formulas can be investigated for individual masses by double-clicking the formula (column C). - Use the 'Remove row' button to easily transfer metabolites from the 'identification' sheet to the 'rejected' sheet. There is an option to merge peaks, which would be used for compounds with poor peak shape (e.g. lipids on a HILIC column) that appear with the same mass but slightly different retention times. If merging peaks it keeps the largest intensity from each sample. - To speed up viewing data you may set Excel's calculation to 'manual' (Formulas >> Calculation Options) - but be aware that you need to manually calculate if you make any changes. Some macros will automatically return calculation to 'Automatic' (Formulas >> Calculation Options). - There is an option to normalise data at this stage (purple button) to allow normalisation based on (putatively) identified metabolite peaks, rather than normalising on total LC-MS signal or peaks (which includes much more noise). - Columns W and X allow a rapid comparison of relative intensity and p-value (from t-test) between the 2 groups specified as control and treatment. - Compare 2 other groups (purple button) allows rapid comparison of two groups of samples with means, standard deviations and t-test. 15

16 - Compare with medium (purple button) gives similar statistics, and also checks whether all sample intensities are greater than all medium intensities. 7. Combine Pos and Neg modes (optional) If you have data from the Orbitrap Exactive in switching polarity mode, each polarity must be processed separately up to this point. The step combines all data from Identification, Rejected and allbasepeaks sheets. (but not the alldata sheets) If a metabolite gives peaks in both polarities (same formula and RT within 'duplicatepeaks' window setting), the one with the highest maximum intensity is chosen. 8. Compare all sets This macro takes the data from the 'identification' sheet, and summarises it in the 'Comparison' sheet. An option exists to also include all significant BasePeaks (regardless of their identification) to allow analysis of unidentified metabolites. It also makes the metabolite names in the 'identification' sheet dependent on the name in the 'comparison' sheet. So if you select a different name for a peak in the 'comparison' sheet, it will be reflected in the 'identification' sheet. Relative Intensities are expressed relative to the 'control' group (from the Settings sheet) P values are for unpaired t-test against 'control' group. Mean peak intensities, standard deviations, Relative Standards Deviations (RSD) and Fisher ratios are also given The correlation column gives a rank order that can enable sorting by intensity correlation (across samples) relative to all other metabolites Correlation Sort allows you to sort the list by intensity correlation (across samples) to a specified metabolite 16

17 The Sort function allows you to quickly sort by any column (e.g. intensity, pathway, p-value) Use Excel s native Sort and Autofilter functionality to improve data visualisation Columns G and H can be changed to any variable in the DB sheet Column I can be changed to any variable in the Identification sheet Double-clicking in column E (name) gives an intensity plot (with standard deviations) for that metabolite. Double-clicking in column I gives individual sample intensities. Double-clicking in column D (Isomers) gives information about the isomers in the database Double-clicking in column F (confidence) gives information from the 'identification' sheet that helps with determination of the identity confidence. Various summary plots are available from the orange Graphing button at the top Various export formats are available from the light blue Export button *To speed up viewing data you may set Excel's calculation to 'manual' (Formulas >> Calculation Options) - but be aware that you need to manually calculate if you make any changes. Some macros will automatically return calculation to 'Automatic'. 9. Assign BasePeaks (optional) This macro looks at your current 'Identification' and 'Rejected' sheets to update the information in the 'allbasepeaks' sheet to reflect any changes since the initial identification step, and annotates any 'related peaks' if the 'basepeak' itself could not be identified. You may wish to use the allbasepeaks sheet for untargeted statistical analysis. 17

18 Other Tools Targeted Analysis Targeted analysis for specific metabolites can be undertaken from the Targeted sheet. By following the process indicated by the buttons at the top of the page. i. Enter the Metabolite names in Column A. ii. Click step 2 to upload information for each metabolite from the metabolite database (DB sheet). For unique metabolites and/or masses enter the RT, formula and/or mass directly. iii. This step uses msconvert/mzmatch (through R) to process either raw, mzxml or peakml files and filters the results according to the specified masses. This function uses the mzmatch peak extraction method (not XCMS), and users are expected to visually inspect the resulting chromatograms. Output files include a text file, peakml file and pdf with chromatograms. Parameters are taken from the Setting sheet. iv. The results from step iii are searched to return results for the specific metabolites to the targeted page. Alternatively, any text file from mzmatch, or the data on the mzmatch sheet, can be used for this step. If multiple peaks are present it first takes the most intense peak within the RT window setting for standard RT (settings sheet; cell E23), then the most intense peak within the RT window setting for calculated RT (settings sheet; cell E24). Metabolites with no expected RT are assigned the peak with the largest intensity. v. The targeted analysis is commonly used to obtain retention times for authentic standards, and hence there is an option to export these results to the RT calculator which is used in the regular untargeted Ideom processing method. Additional sheets are provided containing the mixtures of standards that Scotmet uses in our standards mixtures for calibrating retention times. Isotope search This tool provides an untargeted search for labelled metabolites in data that was obtained using stable-isotope labelled precursors. Run this from the tools menu in comparison, allbasepeaks or Identification sheets to search for isotopes of putatively identified compounds. The isotope search macro looks within the RT window for related peaks. The isotope search result is the relative abundance, i.e. the ratio of the isotope peak to the unlabelled peak in each specified sample. The result is shaded yellow if the relative abundance is more than 10% greater than the expected abundance of the natural isotope of the unlabelled metabolite. This macro will only find isotopes if they were imported to IDEOM from mzmatch. In some cases isotopes are missed because the peaks were not detected by XCMS. To find these isotopes in raw data use the Targetted Isotopes macro in the export menu. Metabolite search This searches for a given metabolite in the dataset, and will also flag up isomers that have been found. 18

19 Add Chromatograms This allows you to easily import chromatograms from a peakml file as comments in the mass cells of Ideom (i.e. they appear when you hover the mouse over masses in column A). This function will work on any results sheet, although is usually used on the Comparison sheet. Running this function (click the light blue button and follow prompts) generates a list peak group identifiers from your current visible metabolite list, and extracts these chromatograms for each metabolite as image files. ** Ensure that you select the same peakml files that you used to obtain the initial mzmatch_output.txt files for Ideom. The chromatogram image files are saved into a new chromatograms folder and then automatically uploaded to the Ideom file. If your computer doesn t have enough memory to process the peakml file then R may crash or close, and Excel will hang... Hit Esc twice to regain control of Excel. For large studies you may upload chromatograms that were generated externally. They must be saved in a folder named chromatograms in the same directory as your peakml file. (If processing dual polarity files the negative-mode chromatograms should be in a folder named chromatograms_neg ). The script for generating chromatograms externally can be obtained from the R scripts menu on the settings sheet. This function will work for both single-polarity and dual-polarity datasets. Annotate DB This allows you to easily add annotations to the database from another Excel-readable file. Metabolite matching can be based on any column in the DB (eg. name, KEGG ID, INCHIkey), but note that matches must be exact. You could also use this function to compare results from different files (basically it is a lookup/matching function between two files). Alternative peak list import options Mzmine input format requires column A = m/z, column B = RT (mins), peak areas in any other set of adjacent columns (macro looks for a heading with the suffix "peak area") Other input format (e.g. from MetAlign or other pre-processor) can be manually entered onto the MZmatch sheet in the following format: column A = corrected mass (i.e. m/z ), column B = RT (seconds), peak intensities in adjacent columns starting from column C, other data may be included in columns to the right. Trim File Size Deletes all non-essential information from the Ideom file to make it smaller. Some functions (e.g. RTcalculator) may not work correctly after running this macro, but most should be fine. Only use this if you are having trouble ing your results to a collaborator or uploading to a journal site because the file size is too large. 19

20 zformulagenerator This macro is activated from the menu obtained by double-clicking a formula cell (or empty formula cell) in column C of any results sheet. The mass from column A is copied, and the rcdk package in R is used to determine all possible theoretical formulas (with some constraints). Results are pasted onto a new sheet in Ideom and tested for validity by 5 of Kind & Feihn s heuristic rules, with the most likely formula appearing at the top, with the option to search for this formula on the Chemspider website. (Kind & Feihn; BMC BIOINFORMATICS, 8: Art. No. 105 MAR Xcaliburlink Select a cell containing a mass of interest and press Ctrl-Shift-X to provide a quick link to an extracted ion chromatogram in Xcalibur Qualbrowser. (Note: On the first occasion you will be prompted to open the specific raw file in Xcalibur manually, however once open, this allows you to check the raw data for multiple metabolites very quickly. Please ensure that the chromatogram window is active in Qualbrowser, that the mass tolerance (ppm) is appropriate, and that the mass precision is set to at least 4 decimal places.) This requires a current installation of Xcalibur, otherwise you may also use this macro to extract the chromatograms for this specific mass from a peakml file through mzmatch.r (however this option is not instantaneous). 20

Quick Start Guide Chromeleon 7.2

Quick Start Guide Chromeleon 7.2 Chromeleon 7.2 7229.0004 Revision 1.0 July 2013 Table of Contents 1 Introduction... 1 1.1 About this Document... 1 1.2 Other Documentation... 2 2 Using Chromeleon... 3 2.1 Overview... 3 2.2 Starting Chromeleon...

More information

GelCompar. Quick Guide. Version 6.5.

GelCompar. Quick Guide. Version 6.5. GelCompar II Quick Guide Version 6.5 Contents 1 Database 3 1.1 Getting started 5 1.1.1 Introduction.......................................... 5 1.1.2 Software installation.....................................

More information

Guide to Using AMS 4.0 Marking Software

Guide to Using AMS 4.0 Marking Software Guide to Using AMS 4.0 Marking Software Guide to Using AMS 4.0 Marking Software Contents System Requirements...2 Software Installation...2 Selecting the Output Device and Changing Settings...2 Definitions...

More information

DHIS 2 End-user Manual 2.19

DHIS 2 End-user Manual 2.19 DHIS 2 End-user Manual 2.19 2006-2015 DHIS2 Documentation Team Revision 1529 Version 2.19 2015-07-07 21:00:22 Warranty: THIS DOCUMENT IS PROVIDED BY THE AUTHORS ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,

More information

Help File Version 1.0.14

Help File Version 1.0.14 Version 1.0.14 By Welcome Thank you for purchasing OptimumT the new benchmark in tire model fitting and analysis. This help file contains information about all

More information

Introduction to Endnote X6 Handbook

Introduction to Endnote X6 Handbook Introduction to Endnote X6 Handbook Updated October 2012 Contents 1. Introduction... 4 2. Working with EndNote libraries... 5 2.1 Starting EndNote... 5 2.2 Opening libraries... 5 2.3 EndNote: main screen...

More information

7.1 General 7.2 Preview 7.3 Scan Engine 7.3.1 Cloning Preferences 7.3.2 File Modules Top Level File Modules Preferences 7.3.2.

7.1 General 7.2 Preview 7.3 Scan Engine 7.3.1 Cloning Preferences 7.3.2 File Modules Top Level File Modules Preferences 7.3.2. Table of Contents 1 Introduction 1.1 Welcome 1.2 What Makes Data Rescue Different? 1.3 Latest Version of the Software 1.4 Contact Prosoft Engineering 1.5 About the Demo 1.6 System Requirements 1.7 General

More information


TECHNICAL REFERENCE GUIDE TECHNICAL REFERENCE GUIDE SOURCE Microsoft Exchange/Outlook (PST) (version 2003, 2007, 2010) TARGET Microsoft Exchange/Outlook (PST) (version 2013) Copyright 2014 by Transend Corporation EXECUTIVE SUMMARY

More information

icar itrain for Modelcars Dinamo/MCC

icar itrain for Modelcars Dinamo/MCC icar itrain for Modelcars Dinamo/MCC Manual 2.0 2014 This manual is written and supplied by MCC-ModelCarParts. This document, or any information contained herein, may not in whole or in parts be copied

More information

FrontStream CRM Import Guide Page 2

FrontStream CRM Import Guide Page 2 Import Guide Introduction... 2 FrontStream CRM Import Services... 3 Import Sources... 4 Preparing for Import... 9 Importing and Matching to Existing Donors... 11 Handling Receipting of Imported Donations...

More information


IDEP FOR WINDOWS USER MANUAL IDEP FOR WINDOWS USER MANUAL Version 2011 BE National Accounts Institute National Bank of Belgium External statistics Boulevard de Berlaimont 14 B-1000 Brussels Table of contents 1 INTRODUCTION.... 1 1.1

More information

VanillaSoft. Administrator s Manual

VanillaSoft. Administrator s Manual VanillaSoft Administrator s Manual Copyright 2009 VanillaSoft Inc. Contents Contents Overview... 5 This Manual...5 Sections...5 Conventions and Terminology...5 VanillaSoft...6 Basics...6 Logging In...7

More information

Context-sensitive Help Guide

Context-sensitive Help Guide MadCap Software Context-sensitive Help Guide Flare 11 Copyright 2015 MadCap Software. All rights reserved. Information in this document is subject to change without notice. The software described in this

More information

ithenticate User Manual

ithenticate User Manual ithenticate User Manual Updated November 20, 2009 Contents Introduction 4 New Users 4 Logging In 4 Resetting Your Password 5 Changing Your Password or Username 6 The ithenticate Account Homepage 7 Main

More information

Contents. A-61623 July 2008 i

Contents. A-61623 July 2008 i Contents Image Processing......................................................... 1 Overview.......................................................... 1 Terminology and features..............................................

More information

Getting Started with SAS Enterprise Miner 7.1

Getting Started with SAS Enterprise Miner 7.1 Getting Started with SAS Enterprise Miner 7.1 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc 2011. Getting Started with SAS Enterprise Miner 7.1.

More information

Harmony Ultimate One User Guide

Harmony Ultimate One User Guide Harmony Ultimate One User Guide Version 1 (2014-02- 11) Harmony Ultimate One User Guide Ultimate One Table of Contents About this Manual... 6 Terms used in this manual... 6 At a Glance... 6 Features...

More information

A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package

A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package A Guide to Sample Size Calculations for Random Effect Models via Simulation and the MLPowSim Software Package William J Browne, Mousa Golalizadeh Lahi* & Richard MA Parker School of Clinical Veterinary

More information



More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Choose a topic from the left to get help for CmapTools.

Choose a topic from the left to get help for CmapTools. Using CmapTools Create a Cmap Add a Concept Create a Proposition from One Concept Create a Proposition from Existing Concepts Save a Cmap Open a Cmap Create a Folder Drag in Resources Import Resources

More information

Symprex Email Signature Manager

Symprex Email Signature Manager Symprex Email Signature Manager User's Guide Version 6..1. Copyright 014 Symprex Limited. All Rights Reserved. Contents Chapter 1 1 Introduction System Requirements 3 Installing Email Signature Manager

More information

Student Guide to SPSS Barnard College Department of Biological Sciences

Student Guide to SPSS Barnard College Department of Biological Sciences Student Guide to SPSS Barnard College Department of Biological Sciences Dan Flynn Table of Contents Introduction... 2 Basics... 4 Starting SPSS... 4 Navigating... 4 Data Editor... 5 SPSS Viewer... 6 Getting

More information

LogMeIn Backup User Guide

LogMeIn Backup User Guide LogMeIn Backup User Guide Contents About LogMeIn Backup...4 Getting Started with LogMeIn Backup...5 How does LogMeIn Backup Work, at-a-glance?...5 About Security in LogMeIn Backup...5 LogMeIn Backup System

More information

Version 4.0. Base Handbook

Version 4.0. Base Handbook Version 4.0 Base Handbook Copyright This document is Copyright 2013 by its contributors as listed below. You may distribute it and/or modify it under the terms of either the GNU General Public License

More information

The InStat guide to choosing and interpreting statistical tests

The InStat guide to choosing and interpreting statistical tests Version 3.0 The InStat guide to choosing and interpreting statistical tests Harvey Motulsky 1990-2003, GraphPad Software, Inc. All rights reserved. Program design, manual and help screens: Programming:

More information

Pearson Inform v4.0 Educators Guide

Pearson Inform v4.0 Educators Guide Pearson Inform v4.0 Educators Guide Part Number 606 000 508 A Educators Guide v4.0 Pearson Inform First Edition (August 2005) Second Edition (September 2006) This edition applies to Release 4.0 of Inform

More information

IBM SPSS Direct Marketing 22

IBM SPSS Direct Marketing 22 IBM SPSS Direct Marketing 22 Note Before using this information and the product it supports, read the information in Notices on page 25. Product Information This edition applies to version 22, release

More information

The KMyMoney Handbook

The KMyMoney Handbook The KMyMoney Handbook for KMyMoney version 1.0 Michael T. Edwardes Thomas Baumgart Ace Jones Tony Bloomfield Robert Wadley Darin Strait Roger Lum Revision 1.00.00 (2009-08-10) Copyright 2000, 2001, 2003,

More information

ACTi NVR User s Manual. Version 2.3.05 2012/10/12

ACTi NVR User s Manual. Version 2.3.05 2012/10/12 ACTi NVR User s Manual Version 2.3.05 2012/10/12 Table of Contents 1 Overview 6 ACTi NVR Architecture... 6 ACTi NVR Server... 6 ACTi NVR Workstation... 7 NVR Web Client... 7 Hardware System Requirements...

More information