Data Mining: ANALYSIS OF A SINGLE DATASET. Visualization Techniques, Combined Graph, Charts and Pies, Search for specific functions


1 Data Mining: ANALYSIS OF A SINGLE DATASET
- Visualization Techniques
- Combined Graph
- Charts and Pies
- Search for specific functions

2 Data Mining on the DAG
- When working with large datasets, annotation results need to be summarized
- The DAG (the directed acyclic graph of the Gene Ontology) provides visualization of annotation data within its biological context
- In Blast2GO: Combined Graph function

3 Combined Graph
- Each term has a number of associated sequences
- Node shape differentiates between direct and indirect annotation
- Each term is displayed within its biological context
- Nodes can be coloured to indicate relevance

4 Combined Graph
- Different GO branches
- Reduces nodes by number of annotated sequences
- Node data to be displayed
- Criterion for highlighting and filtering nodes

5 Combined Graph
Let's paint the DAG of the dataset analyzed yesterday (1000 sequences). Too many nodes! We need a way to find the relevant information.

6 Node Information Content
- Accumulated by node (Sequence Count)
- Incoming information (Node Score)

7 Node score We compute a node score that reflects the amount of direct information at the node

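8 Node score
Example with α = 0.6. GO1 has 1 directly annotated sequence and GO2 has 3; both are at dist=0 from themselves and at dist=1 from GO3.
NodeScore(GO1) = 1 × 0.6^0 = 1
NodeScore(GO2) = 3 × 0.6^0 = 3
NodeScore(GO3) = 1 × 0.6^1 + 3 × 0.6^1 = 0.6 + 1.8 = 2.4
NodeScore(GO4) = 2.5, summing one term at dist=0 and two terms at dist=2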
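The worked values on this slide read as NodeScore(g) = sum over g and its descendants d of seqs(d) × α^dist(g, d). The following is a minimal Python sketch under that reading, not Blast2GO's own code. The function name `node_scores`, the toy graph, and the single direct sequence given to GO4 (so that it has a dist=0 term, as on the slide) are assumptions for illustration.

```python
from collections import deque


def node_scores(children, direct_counts, alpha=0.6):
    """Score each term as sum(count(d) * alpha**dist) over itself and its descendants."""
    scores = {}
    for term in children:
        # Breadth-first search gives the shortest distance to every descendant.
        dist = {term: 0}
        queue = deque([term])
        while queue:
            node = queue.popleft()
            for child in children.get(node, ()):
                if child not in dist:
                    dist[child] = dist[node] + 1
                    queue.append(child)
        scores[term] = sum(
            direct_counts.get(d, 0) * alpha ** k for d, k in dist.items()
        )
    return scores


# Toy DAG (assumed layout): GO4 -> GO3 -> {GO1, GO2}.
children = {"GO4": ["GO3"], "GO3": ["GO1", "GO2"], "GO1": [], "GO2": []}
# Direct sequence counts; the value for GO4 is an assumption.
direct_counts = {"GO1": 1, "GO2": 3, "GO4": 1}

scores = node_scores(children, direct_counts)
for term in ("GO1", "GO2", "GO3", "GO4"):
    print(term, round(scores[term], 2))
```

Under these assumptions GO3 comes out at 2.4, as on the slide, and GO4 at 2.44, close to the 2.5 shown there.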

9 Node score vs Annotation score: DO NOT MIX UP!
Annotation Score:
- In annotation context
- Relates to the Blast results of ONE sequence
- AS = max{%sim × ECw} + (#TPR_GOs − 1) × GOw
Node Score:
- In data-mining context
- Relates to the analysis of a GROUP of sequences
(Diagrams: a node-score DAG from ROOT down to GO1, and an annotation example linking the Blast hits of one sequence, hit1 to hit3, to GO terms and their children.)

10 Filtered Graph
Legend: number of filtered nodes, transition nodes, direct annotations

11 Compacting Graphs by GOSlim

12 Show node content

13 Saving Options: save as picture and as txt

14 Graph Charts

15 Graph Charts
- Sequence Distribution/GO as Bar-Chart
- Sequence Distribution/GO as Level-Pie (level selection)
- Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)

16 Multilevel vs. GO-Slim Chart
- Multilevel pie with a sequence filter of 20
- GO-Slim: handy to summarize functional content

17 Use DAG to analyze a function
The DAG can be used to query general concepts that have no direct annotations.
How many sequences are annotated to the function photosynthesis?
- Option 1: Find in the GO graph → direct and indirect annotation
- Option 2: Find through the Select function. Two sub-options:
  - Option 2.1: Direct annotation (use GO ID or description)
  - Option 2.2: Direct and indirect annotation (use GO ID and include GO parents)
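The difference between Option 2.1 and Option 2.2 can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Select function does internally. The helper names, the three-sequence dataset, and the tiny term hierarchy (given by name rather than by real GO IDs) are all hypothetical.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sequences_for(term, annotations, parents, include_parents=True):
    """Sequences annotated to `term` directly, or also via a more specific term."""
    hits = set()
    for seq, terms in annotations.items():
        expanded = set(terms)
        if include_parents:
            # An annotation to a child term implies annotation to all its parents.
            for t in terms:
                expanded |= ancestors(t, parents)
        if term in expanded:
            hits.add(seq)
    return hits


# Hypothetical child -> parents map and per-sequence direct annotations.
parents = {
    "photosynthesis, light reaction": ["photosynthesis"],
    "photosynthesis": ["metabolic process"],
    "glycolysis": ["metabolic process"],
}
annotations = {
    "seq1": {"photosynthesis"},
    "seq2": {"photosynthesis, light reaction"},
    "seq3": {"glycolysis"},
}

direct = sequences_for("photosynthesis", annotations, parents, include_parents=False)
indirect = sequences_for("photosynthesis", annotations, parents, include_parents=True)
print(sorted(direct), sorted(indirect))
```

In this toy data, seq2 is only found when parents are included, which is why a general term can collect many sequences without having any direct annotation of its own.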

18 Example: analyze a specific function
Find a function on the graph (search, export).

19 Example: analyze a specific function
Select all sequences annotated to this function and its descendants.

20 Example: analyze a specific function
Locate these sequences.

21 Example: analyze a specific function
- By exporting the sequence table you can see all sequences annotated to a given function (GO)
- Explore the annotation diversity of a given function within the graph

22 Conclusions
- DAGs are interesting for browsing functional annotation but can be too large
- With filtering and pruning options you can create more navigable DAGs
- Pies are good for compacting information: try out different levels
- GO-Slim compacts to more equivalent (comparable) terms than filtering the GO
- You can use the DAG to query on general terms

23 Data Mining: ANALYSIS OF SEVERAL DATASETS
- Functional Enrichment
- Enriched Graphs
- Meta-analysis

24 Enrichment Analysis
Interpretation of a large list of genes: which functions are relevant?
One gene list (A): Biosynthesis 54%, Sporulation 18%
The other list (B): Biosynthesis 18%, Sporulation 27%
Are these two groups of genes carrying out different biological roles?
Are these differences statistically significant?

25 Fisher's Exact Test
One gene list (A): Biosynthesis 54%, Sporulation 18%
The other list (B): Biosynthesis 18%, Sporulation 27%
Contingency tables:
                   A   B
Biosynthesis       6   2
No biosynthesis    5   9
                   A   B
Sporulation        2   3
No sporulation     9   8
p-value for biosynthesis < 0.05
p-value for sporulation > 0.05
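To make the test concrete, here is a small stdlib-only Python sketch of a two-sided Fisher's exact test applied to the two contingency tables on this slide. The function name is my own, and the two-sided rule used (summing all tables no more likely than the observed one) is one common convention; the slide does not say which variant is meant.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more likely than the observed one.
    p_value = sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Rows: term / no term; columns: list A / list B (counts from the slide).
p_bio = fisher_exact_two_sided(6, 2, 5, 9)
p_spo = fisher_exact_two_sided(2, 3, 9, 8)
print(round(p_bio, 3), round(p_spo, 3))
```

With only 11 genes per list, this calculation gives roughly 0.18 for biosynthesis and 1.0 for sporulation. The slide's "< 0.05" for biosynthesis is therefore best read as schematic: a 54% versus 18% difference would need larger lists to reach significance.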

26 Multiple testing correction
We do this for every GO term of our dataset. Many tests => many false positives => we need correction!
- FDR control: a statistical method used in multiple hypothesis testing to correct for multiple comparisons. In a list of rejected hypotheses, FDR controls the expected proportion of incorrectly rejected null hypotheses.
- FWER control: the familywise error rate is the probability of making one or more false discoveries among all the hypotheses when performing multiple pairwise tests (more conservative).
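As an illustration of the two kinds of control, the sketch below adjusts a list of p-values with Bonferroni (a simple FWER procedure) and Benjamini-Hochberg (a widely used FDR procedure). The slide does not state which procedures the tool applies, so these are stand-ins chosen for illustration, and the four p-values are made up.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, one per tested GO term.
pvalues = [0.01, 0.04, 0.03, 0.005]
fwer = bonferroni(pvalues)
fdr = benjamini_hochberg(pvalues)
print([round(p, 3) for p in fwer])
print([round(p, 3) for p in fdr])
```

At a 0.05 threshold, all four terms pass the FDR adjustment but only two pass Bonferroni, which shows in miniature why FWER control is described as more conservative.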

27 Fisher's Exact Test in Blast2GO
                GO   No GO
Test-set (A)     2    9
Ref-set (B)      3    8
Three files:
- Blast2GO project with annotations (.dat/.annot)
- One txt file with IDs: Test-set (.txt)
- Another txt file with IDs: Ref-set (.txt)

28 Different types of comparisons
Compare one condition against another:
- Remove common IDs
- Test-Set and Ref-Set are interchangeable
Compare a subset against the total:
- Gossip default setting
- Test-Set and Ref-Set are NOT interchangeable
(Diagrams: Set 1 and Set 2 overlapping in their common IDs; Test-Set and Ref-Set sharing common IDs.)

29 FET in Blast2GO
The two-tailed test identifies not only over-represented but also under-represented functions. If no Ref-Set is chosen, all annotations are used as reference.

30 Enrichment Results Result table with link outs to sequence lists

31 Most specific terms Retains only the lowest, most specific enriched term per GO branch
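One way to picture this filter: drop every enriched term that is an ancestor of another enriched term, so only the deepest enriched terms of each branch remain. The Python sketch below follows that reading; it is an assumption about the behaviour described on the slide rather than Blast2GO's implementation, and the term names A to D are placeholders.

```python
def most_specific(enriched, parents):
    """Keep enriched terms that have no enriched descendant."""

    def ancestors(term):
        seen, stack = set(), [term]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    # Any term that is an ancestor of an enriched term is "covered" by it.
    covered = set()
    for term in enriched:
        covered |= ancestors(term)
    return {term for term in enriched if term not in covered}


# Placeholder hierarchy (child -> parents): A is the root, C sits below B.
parents = {"B": ["A"], "C": ["B"], "D": ["A"]}
kept = most_specific({"A", "B", "C", "D"}, parents)
print(sorted(kept))
```

Here A and B are removed because the more specific C is also enriched, while D survives as the lowest enriched term of its own branch.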

32 Enriched Graph
View enriched-term data as DAG graphs. The filter reduces the graph; to draw all nodes, set the filter to 1.

33 Bar-Chart
Export enriched terms as a chart (filter the results first). The chart shows the % of sequences in the Test group and the % of sequences in the Ref group:
- If Test > Ref: over-represented
- If Ref > Test: under-represented

34 Meta-analysis in Blast2GO
- Annotation Result (.annot): lines pairing a sequence name with a GO ID (Sequence_1 GO:..., Sequence_2 GO:...)
- Enrichment Result: lines pairing a treatment name with a GO ID (Treatment_1 GO:...)
- The two formats are equivalent
By joining different functional enrichment results we can create an annotation file of conditions (Treatment_1 GO:..., Treatment_2 GO:...) that captures their functional profile.

35 Meta-analysis in Blast2GO
FIND SIMILARITIES BETWEEN TREATMENTS
- Use seq names to see treatments
- Use color by SeqCount

36 Meta-analysis in Blast2GO
DISPLAY FUNCTIONAL DISSIMILARITIES ON DAG
- Use second column number for color

37 Exercises: Data Mining
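The joining step can be sketched as follows: each treatment's list of enriched GO terms is written out as if the treatment were a sequence. The tab-separated "name, GO ID" layout is my assumption about the .annot format, the function name is hypothetical, and the GO IDs are placeholders for illustration only.

```python
def enrichment_to_annot(results):
    """Turn {treatment: [GO IDs]} into annot-style text, one pair per line."""
    lines = []
    for treatment, go_ids in results.items():
        # Sort and de-duplicate so each treatment/term pair appears once.
        for go_id in sorted(set(go_ids)):
            lines.append(f"{treatment}\t{go_id}")
    return "\n".join(lines) + "\n"


# Placeholder enrichment results for two treatments.
results = {
    "Treatment_1": ["GO:0000001", "GO:0000002"],
    "Treatment_2": ["GO:0000002", "GO:0000003", "GO:0000002"],
}
annot_text = enrichment_to_annot(results)
print(annot_text, end="")
```

A file written this way could then be loaded like any other annotation file, so that the graph and chart functions compare treatments instead of sequences.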

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses

Introduction to Hypothesis Testing. Hypothesis Testing. Step 1: State the Hypotheses Introduction to Hypothesis Testing 1 Hypothesis Testing A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population Hypothesis is stated in terms of the

More information

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA -

Course on Functional Analysis. ::: Gene Set Enrichment Analysis - GSEA - Course on Functional Analysis ::: Madrid, June 31st, 2007. Gonzalo Gómez, PhD. ggomez@cnio.es Bioinformatics Unit CNIO ::: Contents. 1. Introduction. 2. GSEA Software 3. Data Formats 4. Using GSEA 5. GSEA

More information

MultiExperiment Viewer Quickstart Guide

MultiExperiment Viewer Quickstart Guide MultiExperiment Viewer Quickstart Guide Table of Contents: I. Preface - 2 II. Installing MeV - 2 III. Opening a Data Set - 2 IV. Filtering - 6 V. Clustering a. HCL - 8 b. K-means - 11 VI. Modules a. T-test

More information

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Application: This statistic has two applications that can appear very different,

More information

Package empiricalfdr.deseq2

Package empiricalfdr.deseq2 Type Package Package empiricalfdr.deseq2 May 27, 2015 Title Simulation-Based False Discovery Rate in RNA-Seq Version 1.0.3 Date 2015-05-26 Author Mikhail V. Matz Maintainer Mikhail V. Matz

More information

7. Data Packager: Sharing and Merging Data

7. Data Packager: Sharing and Merging Data 7. Data Packager: Sharing and Merging Data Introduction The Epi Info Data Packager tool provides an easy way to share data with other users or to merge data collected by multiple users into a single database

More information

Correlational Research

Correlational Research Correlational Research Chapter Fifteen Correlational Research Chapter Fifteen Bring folder of readings The Nature of Correlational Research Correlational Research is also known as Associational Research.

More information

Blast2GO PRO Plug-in User Manual

Blast2GO PRO Plug-in User Manual Blast2GO PRO Plug-in User Manual CLC bio Genomics Workbench and Main Workbench Version 1.1.0 October 2013 BioBam Bioinformatics S.L. Valencia, Spain Contents Introduction 1 Quick-Start 2 Blast2GO PRO Plug-in

More information

1 Why is multiple testing a problem?

1 Why is multiple testing a problem? Spring 2008 - Stat C141/ Bioeng C141 - Statistics for Bioinformatics Course Website: http://www.stat.berkeley.edu/users/hhuang/141c-2008.html Section Website: http://www.stat.berkeley.edu/users/mgoldman

More information

Statistical issues in the analysis of microarray data

Statistical issues in the analysis of microarray data Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data

More information

Exercise with Gene Ontology - Cytoscape - BiNGO

Exercise with Gene Ontology - Cytoscape - BiNGO Exercise with Gene Ontology - Cytoscape - BiNGO This practical has material extracted from http://www.cbs.dtu.dk/chipcourse/exercises/ex_go/goexercise11.php In this exercise we will analyze microarray

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

Package copa. R topics documented: August 9, 2016

Package copa. R topics documented: August 9, 2016 Package August 9, 2016 Title Functions to perform cancer outlier profile analysis. Version 1.41.0 Date 2006-01-26 Author Maintainer COPA is a method to find genes that undergo

More information

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

Frequently Asked Questions Next Generation Sequencing

Frequently Asked Questions Next Generation Sequencing Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided

More information

Shark Talent Management System Performance Reports

Shark Talent Management System Performance Reports Shark Talent Management System Performance Reports Goals Reports Goal Details Report. Page 2 Goal Exception Report... Page 4 Goal Hierarchy Report. Page 6 Goal Progress Report.. Page 8 Goal Status Report...

More information

the online and local desktop version of the Pathway tools) (2) and genome overview chart of

the online and local desktop version of the Pathway tools) (2) and genome overview chart of 1 Supplementary Figure S1. The Omics Viewer description. (1) the cellular overview (available in the online and local desktop version of the Pathway tools) (2) and genome overview chart of the maize gene

More information

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences An Introduction to Statistics Course (ECOE 130) Spring Semester 011 Chapter 10- TWO-SAMPLE TESTS Practice

More information

Package dunn.test. January 6, 2016

Package dunn.test. January 6, 2016 Version 1.3.2 Date 2016-01-06 Package dunn.test January 6, 2016 Title Dunn's Test of Multiple Comparisons Using Rank Sums Author Alexis Dinno Maintainer Alexis Dinno

More information

False Discovery Rates

False Discovery Rates False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving

More information

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7. THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

More information

Database Searching Tutorial/Exercises Jimmy Eng

Database Searching Tutorial/Exercises Jimmy Eng Database Searching Tutorial/Exercises Jimmy Eng Use the PETUNIA interface to run a search and generate a pepxml file that is analyzed through the PepXML Viewer. This tutorial will walk you through the

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript

DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript DDBA 8438: Introduction to Hypothesis Testing Video Podcast Transcript JENNIFER ANN MORROW: Welcome to "Introduction to Hypothesis Testing." My name is Dr. Jennifer Ann Morrow. In today's demonstration,

More information

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935)

Section 7.1. Introduction to Hypothesis Testing. Schrodinger s cat quantum mechanics thought experiment (1935) Section 7.1 Introduction to Hypothesis Testing Schrodinger s cat quantum mechanics thought experiment (1935) Statistical Hypotheses A statistical hypothesis is a claim about a population. Null hypothesis

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

HYPOTHESIS TESTING WITH SPSS:

HYPOTHESIS TESTING WITH SPSS: HYPOTHESIS TESTING WITH SPSS: A NON-STATISTICIAN S GUIDE & TUTORIAL by Dr. Jim Mirabella SPSS 14.0 screenshots reprinted with permission from SPSS Inc. Published June 2006 Copyright Dr. Jim Mirabella CHAPTER

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

Introduction to SAGEnhaft

Introduction to SAGEnhaft Introduction to SAGEnhaft Tim Beissbarth October 13, 2015 1 Overview Serial Analysis of Gene Expression (SAGE) is a gene expression profiling technique that estimates the abundance of thousands of gene

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data

From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity

More information

Gene Expression Analysis

Gene Expression Analysis Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,

More information

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420

BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 BA 275 Review Problems - Week 6 (10/30/06-11/3/06) CD Lessons: 53, 54, 55, 56 Textbook: pp. 394-398, 404-408, 410-420 1. Which of the following will increase the value of the power in a statistical test

More information

How Does My TI-84 Do That

How Does My TI-84 Do That How Does My TI-84 Do That A guide to using the TI-84 for statistics Austin Peay State University Clarksville, Tennessee How Does My TI-84 Do That A guide to using the TI-84 for statistics Table of Contents

More information

Gene expression analysis. Ulf Leser and Karin Zimmermann

Gene expression analysis. Ulf Leser and Karin Zimmermann Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a

More information

How to create and interpret the predictive analysis of a compound

How to create and interpret the predictive analysis of a compound How to create and interpret the predictive analysis of a compound Platform with suite of tools Predict & understand biological effects of small molecules & compounds Predict targets and metabolites, potential

More information

QAD Usability Customization Demo

QAD Usability Customization Demo QAD Usability Customization Demo Overview This demonstration focuses on one aspect of QAD Enterprise Applications Customization and shows how this functionality supports the vision of the Effective Enterprise;

More information

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?...

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?... Two-Way ANOVA tests Contents at a glance I. Definition and Applications...2 II. Two-Way ANOVA prerequisites...2 III. How to use the Two-Way ANOVA tool?...3 A. Parametric test, assume variances equal....4

More information

Important Tips when using Ad Hoc

Important Tips when using Ad Hoc 1 Parkway School District Infinite Campus Ad Hoc Training Manual Important Tips when using Ad Hoc On the Ad Hoc Query Wizard screen when you are searching for fields for your query please make sure to

More information

HYPOTHESIS TESTING: POWER OF THE TEST

HYPOTHESIS TESTING: POWER OF THE TEST HYPOTHESIS TESTING: POWER OF THE TEST The first 6 steps of the 9-step test of hypothesis are called "the test". These steps are not dependent on the observed data values. When planning a research project,

More information

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST

Rapid alignment methods: FASTA and BLAST. p The biological problem p Search strategies p FASTA p BLAST Rapid alignment methods: FASTA and BLAST p The biological problem p Search strategies p FASTA p BLAST 257 BLAST: Basic Local Alignment Search Tool p BLAST (Altschul et al., 1990) and its variants are some

More information

Introduction to Hypothesis Testing

Introduction to Hypothesis Testing I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

More information

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

More information

USING MYWEBSQL FIGURE 1: FIRST AUTHENTICATION LAYER (ENTER YOUR REGULAR SIMMONS USERNAME AND PASSWORD)

USING MYWEBSQL FIGURE 1: FIRST AUTHENTICATION LAYER (ENTER YOUR REGULAR SIMMONS USERNAME AND PASSWORD) USING MYWEBSQL MyWebSQL is a database web administration tool that will be used during LIS 458 & CS 333. This document will provide the basic steps for you to become familiar with the application. 1. To

More information

Navigating Through SpamTitan

Navigating Through SpamTitan Navigating Through SpamTitan Table of Contents Access SpamTitan How to Create/Edit Whitelist Whitelist (Sender Domain) Whitelist (Sender E-mail) Whitelist (Import Text) How to Create/Edit Blacklist Blacklist

More information

An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Intelligent Process Management & Process Visualization. TAProViz 2014 workshop. Presenter: Dafna Levy

Intelligent Process Management & Process Visualization. TAProViz 2014 workshop. Presenter: Dafna Levy Intelligent Process Management & Process Visualization TAProViz 2014 workshop Presenter: Dafna Levy The Topics Process Visualization in Priority ERP Planning Execution BI analysis (Built-in) Discovering

More information

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-proven way to guide your data mining efforts.

CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-proven way to guide your data mining efforts. CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining, is an industry-proven way to guide your data mining efforts. As a methodology, it includes descriptions of the typical phases

More information

MicroStrategy Desktop

MicroStrategy Desktop MicroStrategy Desktop Quick Start Guide MicroStrategy Desktop is designed to enable business professionals like you to explore data, simply and without needing direct support from IT. 1 Import data from

More information

Reporting with Pentaho. Gabriele Pozzani

Reporting with Pentaho. Gabriele Pozzani Reporting with Pentaho Gabriele Pozzani A key feature Reporting is a key feature for a BI solution Used and delivered contents consist of Reporting 75-80% Analytical tools for OLAP 15-20% Data mining tools

More information

Using Excel in Research. Hui Bian Office for Faculty Excellence

Using Excel in Research. Hui Bian Office for Faculty Excellence Using Excel in Research Hui Bian Office for Faculty Excellence Data entry in Excel Directly type information into the cells Enter data using Form Command: File > Options 2 Data entry in Excel Tool bar:

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills

VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills VISUALIZING HIERARCHICAL DATA Graham Wills SPSS Inc., http://willsfamily.org/gwills SYNONYMS Hierarchical Graph Layout, Visualizing Trees, Tree Drawing, Information Visualization on Hierarchies; Hierarchical

More information

Hypothesis Testing --- One Mean

Hypothesis Testing --- One Mean Hypothesis Testing --- One Mean A hypothesis is simply a statement that something is true. Typically, there are two hypotheses in a hypothesis test: the null, and the alternative. Null Hypothesis The hypothesis

More information

EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002

EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002 EXCEL PIVOT TABLE David Geffen School of Medicine, UCLA Dean s Office Oct 2002 Table of Contents Part I Creating a Pivot Table Excel Database......3 What is a Pivot Table...... 3 Creating Pivot Tables

More information

Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics

Analysis of Data. Organizing Data Files in SPSS. Descriptive Statistics Analysis of Data Claudia J. Stanny PSY 67 Research Design Organizing Data Files in SPSS All data for one subject entered on the same line Identification data Between-subjects manipulations: variable to

More information

Final Project Report

Final Project Report CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes

More information

5/31/2013. Chapter 8 Hypothesis Testing. Hypothesis Testing. Hypothesis Testing. Outline. Objectives. Objectives

5/31/2013. Chapter 8 Hypothesis Testing. Hypothesis Testing. Hypothesis Testing. Outline. Objectives. Objectives C H 8A P T E R Outline 8 1 Steps in Traditional Method 8 2 z Test for a Mean 8 3 t Test for a Mean 8 4 z Test for a Proportion 8 6 Confidence Intervals and Copyright 2013 The McGraw Hill Companies, Inc.

More information

Gene Expression Macro Version 1.1

Gene Expression Macro Version 1.1 Gene Expression Macro Version 1.1 Instructions Rev B 1 Bio-Rad Gene Expression Macro Users Guide 2004 Bio-Rad Laboratories Table of Contents: Introduction..................................... 3 Opening

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

Advanced Excel Charts : Tables : Pivots : Macros

Advanced Excel Charts : Tables : Pivots : Macros Advanced Excel Charts : Tables : Pivots : Macros Charts In Excel, charts are a great way to visualize your data. However, it is always good to remember some charts are not meant to display particular types

More information

When to use Excel. When NOT to use Excel 9/24/2014

When to use Excel. When NOT to use Excel 9/24/2014 Analyzing Quantitative Assessment Data with Excel October 2, 2014 Jeremy Penn, Ph.D. Director When to use Excel You want to quickly summarize or analyze your assessment data You want to create basic visual

More information

ecw Weekly Users Tip: My Settings: Template-Friendly Settings & My Favorites: Templates

ecw Weekly Users Tip: My Settings: Template-Friendly Settings & My Favorites: Templates ecw Weekly Users Tip: My Settings: Template-Friendly Settings & My Favorites: Templates Templates, regardless of how basic or how comprehensive, can make your notes overwhelming and visually harder to

More information

Monitoring Replication

Monitoring Replication Monitoring Replication Article 1130112-02 Contents Summary... 3 Monitor Replicator Page... 3 Summary... 3 Status... 3 System Health... 4 Replicator Configuration... 5 Replicator Health... 6 Local Package

More information
