STATISTICAL MODELS AND ISSUES IN THE ANALYSIS OF NETWORK DATA
1 STATISTICAL MODELS AND ISSUES IN THE ANALYSIS OF NETWORK DATA George Michailidis Department of Statistics, University of Michigan gmichail Plenary Talk UP-STAT 13 Rochester Institute of Technology, April 2013
2 WHY NETWORKS? Relatively small field of study until the late 1990s. Explosive growth of interest and work on networks from 2000 onward. Main factors: development of high-throughput technologies; systems-level perspective in science; new modeling techniques and computational advances.
3 SOME RECENT DEVELOPMENTS I Rapid increase in publications
4 SOME RECENT DEVELOPMENTS II New books, courses and journals
5 SOME RECENT DEVELOPMENTS III Dedicated workshops
6 WHAT IS A NETWORK? A collection of interconnected entities Mathematically, it is convenient to represent it as a graph G = (V,E), where V denotes the set of nodes (vertices) and E the set of edges
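As a minimal sketch of the representation G = (V, E), the snippet below stores a small undirected graph as a node set and an edge set and derives an adjacency structure from them. The node names, the edges, and the helper `adjacency` are illustrative assumptions, not part of the talk.

```python
# Hypothetical toy graph: names and edges are illustrative only.
nodes = {"a", "b", "c", "d"}                   # V
edges = {("a", "b"), ("b", "c"), ("c", "d")}   # E, undirected pairs

def adjacency(nodes, edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

adj = adjacency(nodes, edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
```

The degree of each node, used later when discussing degree distributions, falls out directly from this structure.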
7 EXAMPLES OF NETWORKS Networks have become an integral tool for addressing diverse problems in a number of scientific fields. For example: Technological (e.g. communications, transportation, energy, sensor); Biological (e.g. gene regulation, protein interactions, predator-prey relations); Social (e.g. friendship, trade flows); Informational (e.g. Web, Twitter, peer-to-peer)
8 STATISTICS AND NETWORK ANALYSIS - Network analysis has attracted participants from diverse scientific fields, including information scientists, statisticians, mathematicians, applied physicists, complex systems theorists,... - Uneven developments - Some topics rediscover and employ established techniques, others require new models and tools - Nevertheless, some key topics have emerged that are truly statistical in nature
9 STATISTICS AND NETWORK ANALYSIS Network characterization (e.g. importance of nodes, identification of network communities, properties of the degree distribution) Network sampling - novel sampling schemes to construct a network; e.g. induced subgraph sampling, incident subgraph sampling, snowball sampling, link tracing Network inference - identify the topology from data; e.g. link prediction, graphical modeling Network dynamics: (i) stochastic processes (flows) on graphs, (ii) evolution of graphs over time Network visualization Time-varying networks Incorporation of network information in statistical inference
10 NETWORK INFERENCE BASED ON GRAPHICAL MODELS Some background on graphical models: they represent conditional independence relationships between a set of random variables. No edge between $X_j$ and $X_{j'}$ $\iff$ $X_j$ is independent of $X_{j'}$ conditional on all other variables. Typically estimated from a set of $n$ iid observations on $p$ variables.
11 EXAMPLE 1: TEXT MINING [Figure: network whose nodes are the word stems address, mail, phone, offic, inform, gener, develop, project, work, time, student, graduat, system, includ, program, research, group, interest, comput, engin, public, fax, home, page, link, web, scienc, univers, depart]
12 EXAMPLE 2: ROLL CALL DATA [Figure: network whose nodes are U.S. senators, labeled by surname from Akaka to Wyden]
13 EXAMPLE 3: GENE NETWORKS
14 GAUSSIAN GRAPHICAL MODELS $X_1, \ldots, X_p$ jointly follow $N(0, \Sigma)$. The dependence structure is fully characterized by the covariance structure. Let $\rho_{j,j'} = \mathrm{cor}(X_j, X_{j'} \mid \text{others})$ denote the partial correlation. PARTIAL CORRELATION: Nodes $j$ and $j'$ are connected $\iff \rho_{j,j'} \neq 0$
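15 GAUSSIAN GRAPHICAL MODELS (CTD) INVERSE COVARIANCE MATRIX Let $\Omega = \Sigma^{-1}$ denote the inverse covariance matrix. We have $\rho_{j,j'} \propto \omega_{j,j'}$; specifically, $\rho_{j,j'} = -\omega_{j,j'} / \sqrt{\omega_{j,j}\,\omega_{j',j'}}$.
[Figure: five-node graph with edges labeled by partial correlations such as $\rho_{13}$ and $\rho_{35}$, shown next to the matrix below]
$$\Omega = \begin{pmatrix} \omega_{1,1} & 0 & \omega_{1,3} & \omega_{1,4} & 0 \\ 0 & \omega_{2,2} & 0 & \omega_{2,4} & 0 \\ \omega_{3,1} & 0 & \omega_{3,3} & \omega_{3,4} & \omega_{3,5} \\ \omega_{4,1} & \omega_{4,2} & \omega_{4,3} & \omega_{4,4} & 0 \\ 0 & 0 & \omega_{5,3} & 0 & \omega_{5,5} \end{pmatrix}$$
Hence, estimating a Gaussian graphical model $\iff$ estimating $\Omega$. Also, estimating the graph corresponds to identifying the zeros in $\Omega$.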
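The link between zeros of the inverse covariance matrix and missing edges can be checked numerically. The sketch below, assuming NumPy is available, builds an illustrative three-variable chain (the precision matrix `omega_true` is made up for the example) and recovers partial correlations from the covariance matrix with the standard identity $\rho_{j,j'} = -\omega_{j,j'}/\sqrt{\omega_{j,j}\omega_{j',j'}}$.

```python
import numpy as np

def partial_correlations(sigma):
    """Partial correlation matrix computed from a covariance matrix via its inverse."""
    omega = np.linalg.inv(sigma)
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Illustrative precision matrix for a chain 1 - 2 - 3 (no direct 1-3 edge).
omega_true = np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 2.0]])
sigma = np.linalg.inv(omega_true)
rho = partial_correlations(sigma)

# Edge j - j' is declared when the partial correlation is numerically nonzero.
edges = np.abs(rho) > 1e-8
```

Variables 1 and 3 are marginally correlated (the corresponding entry of `sigma` is nonzero), yet their partial correlation vanishes, so the graph has no edge between them.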
16 THE CASE OF HIGH-DIMENSIONAL DATA What happens if we have few samples and many more variables? Some examples: Biological networks: samples in the hundreds (at best), molecular entities in the thousands Text mining: both documents and corpus size in the thousands, but one needs to estimate all pairwise relationships between words! Solution: impose sparsity
17 ESTIMATION OF A SPARSE INVERSE COVARIANCE MATRIX This issue was addressed in a paper by Dempster (1972) and then remained dormant for 35 years, until Meinshausen and Bühlmann (2006) developed a penalized (lasso) regression approach to solve it. Since then, there have been over 100 papers looking at various modeling, computational and inference aspects of the problem.
18 MAXIMUM LIKELIHOOD ESTIMATION OF A SPARSE INVERSE COVARIANCE MATRIX This goal can be accomplished by optimizing the following objective function, where $\hat\Sigma$ is the sample covariance matrix and $\Omega \succ 0$ requires $\Omega$ to be positive definite:
$$\max_{\Omega \succ 0} \; \log\det(\Omega) - \mathrm{trace}(\hat\Sigma\,\Omega) - \lambda \sum_{j \neq j'} |\omega_{j,j'}|$$
Note that when $\lambda = 0$, $\hat\Omega = \hat\Sigma^{-1}$.
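To make the objective concrete, the following sketch evaluates the penalized log-likelihood for a candidate $\Omega$ and checks the remark about $\lambda = 0$ on synthetic data. It only evaluates the criterion and does not implement a solver such as the graphical lasso. The function name `penalized_loglik` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def penalized_loglik(omega, sigma_hat, lam):
    """log det(Omega) - trace(Sigma_hat Omega) - lam * sum_{j != j'} |omega_jj'|."""
    if np.min(np.linalg.eigvalsh(omega)) <= 0:
        return -np.inf                      # Omega must be positive definite
    _, logdet = np.linalg.slogdet(omega)
    off_diag = np.abs(omega).sum() - np.abs(np.diag(omega)).sum()
    return logdet - np.trace(sigma_hat @ omega) - lam * off_diag

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))               # synthetic data: n = 200, p = 4
sigma_hat = np.cov(x, rowvar=False)
omega_mle = np.linalg.inv(sigma_hat)        # unpenalized maximizer

best = penalized_loglik(omega_mle, sigma_hat, lam=0.0)
perturbed = penalized_loglik(omega_mle + 0.1 * np.eye(4), sigma_hat, lam=0.0)
penalized = penalized_loglik(omega_mle, sigma_hat, lam=0.5)
```

With $\lambda = 0$ the inverse sample covariance attains a higher value than the perturbed matrix, and any positive $\lambda$ can only lower the value at a fixed $\Omega$, which is what pushes the maximizer toward sparse solutions.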
19 ILLUSTRATION OF SPARSITY AS A FUNCTION OF $\lambda$ [Figure; source: "Sparse inverse covariance estimation with the graphical lasso"]
20 ILLUSTRATION: CS WEBPAGES AT CMU [Figure: Computer Science Department webpage categories: Faculty, Project, Student, Course]
21 CS WEBPAGES AT CMU Used about 1400 webpages and focused on the 100 most frequent words. [Figure: "Common Structure", the estimated network over the 100 word stems, with four highlighted subgraphs]
(A) Webpage: site, web, page, link, home
(B) Research area/lab: current, research, lab, laboratori, area, member, group
(C) Parallel programming: distribut, parallel, system, algorithm, perform, problem, high
(D) Software development: softwar, develop, structur, data, program, algorithm, languag
22 ESTIMATED NETWORKS FOR FACULTY AND STUDENT WEBPAGES [Figure: two estimated networks over the same 100 word stems; panel (A) Student, panel (B) Faculty]
23 INCORPORATING NETWORK INFORMATION IN STATISTICAL TESTING PROBLEMS Rationale: High-throughput techniques (sequencing, profiling) have enabled comprehensive monitoring of biological systems Analysis of high-throughput data typically yields a list of differentially expressed genes (proteins, metabolites, etc.), obtained by statistical testing for differences between two groups, for example, normal and disease or treatment and control This list has the potential to provide insight into a given biological phenomenon or phenotype, but in many cases it is hard to extract meaning from it
24 INCORPORATING NETWORK INFORMATION IN STATISTICAL TESTING PROBLEMS (CTD) To reduce the complexity of the data, biomedical researchers have resorted to grouping the genes into smaller sets (pathways) of related ones, e.g. according to their function. The number of knowledge databases that can be used for such grouping, and their content, is increasing at an accelerating pace (e.g. KEGG, GO, TRANSFAC, DIP,...)
25 PROBLEM FORMULATION Given $n_1$ samples for the control condition and $n_2$ samples for the treatment condition of expression data for $p$ genes and the network of gene interactions (shown below), test for activation of selected subgraphs
26 A LATENT VARIABLE MODEL FORMULATION
$$X_1 = \gamma_1$$
$$X_2 = \rho_{12} X_1 + \gamma_2 = \rho_{12}\gamma_1 + \gamma_2$$
$$X_3 = \rho_{23} X_2 + \gamma_3 = \rho_{23}\rho_{12}\gamma_1 + \rho_{23}\gamma_2 + \gamma_3$$
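27 A LATENT VARIABLE MODEL FORMULATION
$$X_1 = \gamma_1$$
$$X_2 = \rho_{12} X_1 + \gamma_2 = \rho_{12}\gamma_1 + \gamma_2$$
$$X_3 = \rho_{23} X_2 + \gamma_3 = \rho_{23}\rho_{12}\gamma_1 + \rho_{23}\gamma_2 + \gamma_3$$
Thus $X = \Lambda\gamma$, where
$$\Lambda = \begin{pmatrix} 1 & 0 & 0 \\ \rho_{12} & 1 & 0 \\ \rho_{12}\rho_{23} & \rho_{23} & 1 \end{pmatrix}$$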
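The three-gene chain can be reproduced numerically. Writing the recursions as $X = AX + \gamma$, where $A$ holds the edge weights, gives $X = (I - A)^{-1}\gamma$, so the influence matrix is $(I - A)^{-1}$. This matrix form is a restatement made here for illustration; the weights 0.5 and 0.8 and the helper `influence_matrix` are hypothetical.

```python
import numpy as np

def influence_matrix(a):
    """Solve X = A X + gamma for X = Lambda gamma, i.e. Lambda = (I - A)^{-1}."""
    p = a.shape[0]
    return np.linalg.inv(np.eye(p) - a)

rho12, rho23 = 0.5, 0.8          # illustrative edge weights
# a[i, j] is the direct effect of gene j on gene i in the chain 1 -> 2 -> 3.
a = np.array([[0.0, 0.0, 0.0],
              [rho12, 0.0, 0.0],
              [0.0, rho23, 0.0]])
lam_matrix = influence_matrix(a)

# Closed form obtained by substituting the recursions by hand.
expected = np.array([[1.0, 0.0, 0.0],
                     [rho12, 1.0, 0.0],
                     [rho12 * rho23, rho23, 1.0]])
```

The entry in row 3, column 1 is the product of the two edge weights, which is the indirect influence of gene 1 on gene 3 through gene 2.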
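28 THE LATENT VARIABLE MODEL Let $Y$ be the $i$th sample in the expression data. Let $Y = X + \varepsilon$, with $X$ the signal and $\varepsilon \sim N_p(0, \sigma_\varepsilon^2 I_p)$ the noise. Define latent variables $\gamma \sim N_p(\mu, \sigma_\gamma^2 I_p)$. Let the influence of the $j$th gene on the $i$th gene be $\Lambda_{ij}$; $\Lambda = [\Lambda_{ij}]$ is called the Influence Matrix of the network.
$$Y = \Lambda\gamma + \varepsilon, \qquad Y \sim N_p(\Lambda\mu,\; \sigma_\gamma^2 \Lambda\Lambda' + \sigma_\varepsilon^2 I_p)$$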
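29 MIXED LINEAR MODEL REPRESENTATION Let $(Y_i^C, \mu^C, \Lambda^C)$ and $(Y_i^T, \mu^T, \Lambda^T)$ represent the data under control and treatment. Then
$$Y = \Psi\beta + \Pi\gamma + \varepsilon,$$
where $\beta = (\mu^C, \mu^T)'$,
$$\Psi = \begin{pmatrix} \Lambda^C & 0 \\ \vdots & \vdots \\ \Lambda^C & 0 \\ 0 & \Lambda^T \\ \vdots & \vdots \\ 0 & \Lambda^T \end{pmatrix}, \qquad \Pi = \mathrm{diag}(\Lambda^C, \ldots, \Lambda^C, \Lambda^T, \ldots, \Lambda^T),$$
$$E\begin{bmatrix} \gamma \\ \varepsilon \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \mathrm{Var}\begin{bmatrix} \gamma \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_\gamma^2 I & 0 \\ 0 & \sigma_\varepsilon^2 I \end{bmatrix}$$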
30 INFERENCE USING MLM Let $l$ be an estimable linear combination of fixed effects (we call $l$ a contrast vector) and consider the test $H_0: l\beta = 0$ vs. $H_1: l\beta \neq 0$. Consider the Wald test statistic
$$T = \frac{l\hat\beta}{\sqrt{l\hat{Q}l'}}$$
Under the null hypothesis, $T$ has approximately a $t$ distribution with degrees of freedom estimated using Satterthwaite's approximation method:
$$\nu = \frac{2\,(l\hat{Q}l')^2}{\tau' K \tau}$$
where $\tau$ is the gradient of $lQl'$ with respect to $(\sigma_\gamma^2, \sigma_\varepsilon^2)$ and $K$ is the empirical covariance matrix of $(\sigma_\gamma^2, \sigma_\varepsilon^2)$.
31 ANALYSIS OF YEAST GALACTOSE UTILIZATION DATA
32 EXTRACTING INTERESTING PATTERNS FROM TIME-EVOLVING NETWORKS Time-evolving network data consist of ordered sequences of graphs, e.g., network time-series
33 POPULAR APPROACH: TIME SERIES ANALYSIS OF NETWORK STATISTICS Extracting time series of network statistics (e.g. centrality parameter) allows direct application of time-series methods
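As a rough illustration of this approach, the sketch below turns a sequence of adjacency matrices into two scalar time series, the degree of one node and the edge density of the graph, to which ordinary time-series methods could then be applied. The three snapshots are synthetic, and the choice of statistics is an assumption made for the example.

```python
import numpy as np

def degree_series(adjacency_seq, node):
    """Degree of `node` in each undirected graph of the sequence."""
    return np.array([int(a[node].sum()) for a in adjacency_seq])

def density_series(adjacency_seq):
    """Fraction of possible undirected edges present at each time point."""
    out = []
    for a in adjacency_seq:
        n = a.shape[0]
        out.append(a.sum() / (n * (n - 1)))   # symmetric 0/1 matrix, no self-loops
    return np.array(out)

# Three snapshots of a 3-node undirected network (synthetic).
g1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
g2 = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
g3 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
seq = [g1, g2, g3]

deg0 = degree_series(seq, node=0)
dens = density_series(seq)
```

Each statistic collapses a whole graph into one number per time point, which is convenient but also discards most of the structure.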
34 DRAWBACKS OF NETWORK STATISTICS ANALYSIS Which network statistics? Heavily context dependent Often unknown and the easiest statistics to compute may not be informative. Which are the important nodes and how did they evolve over time? Usually requires additional, ad-hoc analysis
35 DECOMPOSITION OF THE NETWORK ADJACENCY MATRICES Matrix decompositions achieve dimension reduction Preserve essential features Large amount of existing work that can be leveraged for the problem at hand Which matrix decomposition? Non-negative Matrix Factorization
36 NON-NEGATIVE MATRIX FACTORIZATION Let $Y$ be an observed $n \times p$ matrix that is non-negative. NMF expresses $Y \approx UV^T$, where $U \in \mathbb{R}_+^{n \times K}$, $V \in \mathbb{R}_+^{p \times K}$, and $K \ll \min\{n, p\}$
37 WHY NMF? Better interpretability: $Y_{ij} \approx \sum_{k=1}^{K} U_{ik} V_{jk}$, where $U_{ik} V_{jk}$ measures the contribution of cluster $k$ to $Y_{ij}$. Adjacency matrices are typically non-negative.
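One common way to compute such a factorization is the multiplicative update rule of Lee and Seung for the squared Frobenius loss. The sketch below uses that rule as a stand-in; it is not necessarily the algorithm used in the talk, and the function name `nmf`, the iteration count, and the block-structured test matrix are assumptions.

```python
import numpy as np

def nmf(y, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate y (n x p, non-negative) by u @ v.T with u >= 0 and v >= 0."""
    rng = np.random.default_rng(seed)
    n, p = y.shape
    u = rng.random((n, k)) + eps
    v = rng.random((p, k)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        u *= (y @ v) / (u @ (v.T @ v) + eps)
        v *= (y.T @ u) / (v @ (u.T @ u) + eps)
    return u, v

def rel_error(y, u, v):
    """Relative Frobenius reconstruction error."""
    return np.linalg.norm(y - u @ v.T) / np.linalg.norm(y)

# Synthetic adjacency matrix with two dense blocks (two "clusters").
y = np.zeros((6, 6))
y[:3, :3] = 1.0
y[3:, 3:] = 1.0

u1, v1 = nmf(y, k=2, n_iter=1)
u, v = nmf(y, k=2, n_iter=500)
labels = u.argmax(axis=1)          # dominant component per row
```

Each product `u[i, k] * v[j, k]` is then the share of entry `y[i, j]` attributed to component `k`, which is the interpretability argument made above.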
38 MODELING NETWORK TIME SERIES Decompose spatio-temporal (network time-series) data as Space-Time ≈ Basis × Factors, subject to smoothness conditions. Intuition: networks have short-term fluctuations, but latent factors are smooth and exhibit long-term trends.
39 EVOLVING FACTORIZATIONS We observe $\{Y_t, t = 1, \ldots, T\}$ (network time-series), and posit $Y_t \approx U V_t^T$ or $Y_t \approx U_t V_t^T$. The choice depends on context (different network types) and goal (clustering, heaviest element search, visual exploration).
40 OBTAINING ESTIMATES Based on optimizing the following objective function:
$$O = \min_{U \geq 0,\, V_t \geq 0} \; \sum_{t=1}^{T} \|Y_t - U V_t^T\|_F^2 + \sum_{t, t'=1}^{T} W(t, t')\, \|V_t - V_{t'}\|_F^2 + \lambda_g \sum_{t=1}^{T} \mathrm{Tr}(V_t^T L_t V_t)$$
$W(t, t')$ is a weight function that is proportional to some kernel and controls sensitivity to short-term fluctuations, similar to a Hodrick-Prescott filter. $\lambda_g$ and $L_t$ form a group penalty that controls the importance of a priori clustering knowledge.
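The three terms of the objective (fit, temporal smoothness, group penalty) can be evaluated directly for given factors. The sketch below only computes the criterion and does not minimize it. The Gaussian form of `gaussian_weights` is a hypothetical kernel choice, and all data are synthetic.

```python
import numpy as np

def evolving_nmf_objective(ys, u, vs, w, laplacians, lam_g):
    """Fit + temporal-smoothness + group-penalty terms for Y_t ~ U V_t^T."""
    t_max = len(ys)
    fit = sum(np.linalg.norm(ys[t] - u @ vs[t].T, "fro") ** 2 for t in range(t_max))
    smooth = sum(
        w[t, s] * np.linalg.norm(vs[t] - vs[s], "fro") ** 2
        for t in range(t_max) for s in range(t_max)
    )
    group = sum(np.trace(vs[t].T @ laplacians[t] @ vs[t]) for t in range(t_max))
    return fit + smooth + lam_g * group

def gaussian_weights(t_max, bandwidth=1.0):
    """Hypothetical kernel: W(t, t') proportional to a Gaussian in |t - t'|."""
    idx = np.arange(t_max)
    w = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / bandwidth) ** 2)
    np.fill_diagonal(w, 0.0)
    return w

rng = np.random.default_rng(0)
n, k, t_max = 4, 2, 3
u = rng.random((n, k))
v = rng.random((n, k))
vs = [v.copy() for _ in range(t_max)]              # perfectly smooth factors
ys = [u @ v.T for _ in range(t_max)]               # noiseless observations
laps = [np.zeros((n, n)) for _ in range(t_max)]    # no group information
w = gaussian_weights(t_max)

obj_smooth = evolving_nmf_objective(ys, u, vs, w, laps, lam_g=1.0)
vs_rough = [vs[0], vs[1] + 0.5, vs[2]]             # perturb the middle time point
obj_rough = evolving_nmf_objective(ys, u, vs_rough, w, laps, lam_g=1.0)
```

Perturbing a single time point raises both the fit term and the smoothness term, which shows how the weight function discourages short-lived jumps in the factors.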
41 GROUP PENALTY Main idea: if nodes $i$ and $j$ belong to the same group, then they should have similar coordinates given by $V_t$. Define the Laplacian as $L_t = D_t - G_t$, where
$$(G_t)_{ij} = \begin{cases} 1, & \text{if nodes } i \text{ and } j \text{ belong to the same group} \\ 0, & \text{otherwise} \end{cases}$$
$$D_t = \mathrm{diag}\Big(\sum_i (G_t)_{ij},\; j = 1, \ldots, n\Big)$$
42 LAPLACIAN SMOOTHING Fact: for every $n \times K$ matrix $V_t$, we have
$$\lambda_g \mathrm{Tr}(V_t^T L_t V_t) = \frac{\lambda_g}{2} \sum_k \sum_{i,j} (G_t)_{ij} \big((V_t)_{ik} - (V_t)_{jk}\big)^2,$$
where the factor $1/2$ appears because the sum runs over all ordered pairs $(i, j)$. The group penalty $\{\lambda_g, L_t\}$ creates an abstract manifold at time $t$, and the weight function $W(t, t')$ creates an abstract manifold between times $t$ and $t'$. The penalties use external information to create a topology in which we embed and view the data.
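The trace identity behind Laplacian smoothing is easy to verify numerically. For a symmetric $G_t$ and a sum over all ordered pairs $(i, j)$, the trace equals one half of the weighted sum of squared coordinate differences. The group labels and the random matrix standing in for $V_t$ below are illustrative assumptions.

```python
import numpy as np

def group_laplacian(groups):
    """Return (L, G) with L = D - G and G_ij = 1 if nodes i and j share a group."""
    g = (groups[:, None] == groups[None, :]).astype(float)
    d = np.diag(g.sum(axis=0))
    return d - g, g

groups = np.array([0, 0, 1, 1, 1])          # illustrative group labels
lap, g = group_laplacian(groups)

rng = np.random.default_rng(1)
v = rng.random((5, 2))                      # a 5 x 2 stand-in for V_t

trace_form = np.trace(v.T @ lap @ v)
diff = v[:, None, :] - v[None, :, :]        # diff[i, j, k] = v[i, k] - v[j, k]
pairwise_form = 0.5 * np.sum(g[:, :, None] * diff ** 2)
```

The penalty is zero exactly when all nodes in a group share the same coordinates, which is why it pulls group members together.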
43 ARXIV CITATION NETWORK Citation network sequence from the e-print service arXiv for the high energy physics theory section, covering papers from October 1993 to December There are papers (nodes) with edges (references) over 112 months. Since citations never die, we posit $Y_t = U V_t^T$.
44 Estimates of $V_t$ (Time-varying Paper Impact Scores) [Figure: panels for 1st Component, 2nd Component, and Sum of Components, with Citation Network Layouts I, II, III, IV, V]
45 HIGHEST IMPACT PAPERS BY $V_t$
Columns: Title; Authors; In-Degree; Out-Degree; # citations (Google)
Heterotic and Type I String Dynamics from Eleven Dimensions; Horava and Witten
Five-branes And M-Theory On An Orbifold; Witten
D-Branes and Topological Field Theories; Bershadsky et al.
Lectures on Superstring and M Theory Dualities; Schwarz
Type IIB Superstrings, BPS Monopoles, And Three-Dimensional Gauge Dynamics; Hanany and Witten
2000 onwards
Columns: Title; Authors; In-Degree; Out-Degree; # citations (Google)
The Large N Limit of Superconformal Field Theories and Supergravity; Maldacena
Anti De Sitter Space And Holography; Witten
Gauge Theory Correlators from Non-Critical String Theory; Klebanov and Polyakov
Large N Field Theories, String Theory and Gravity; Aharony et al.
String Theory and Noncommutative Geometry; Seiberg and Witten
46 STATIC CLUSTERING The degree (number of connections) of each paper over all time points, colored by a top community detection algorithm (Newman PNAS, 2006). The groupings are not interpretable in terms of the time-profile of each paper.
47 EIGENVECTOR CENTRALITY [Figure: average age (y-axis) against year (x-axis) for the top 5, 10, 50, 100, and 500 authorities] The average age in months of the top authority papers over time (Kleinberg, J. ACM 1999). We see evidence for a change point around year 2000, but what about paper growth and grouping structure? More ad hoc analysis is needed.
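Authority scores in the sense of Kleinberg's hubs-and-authorities scheme can be obtained by power iteration on the citation matrix. The sketch below is a minimal version under the assumption that `a[i, j] = 1` means paper `i` cites paper `j`; the four-paper citation graph is synthetic.

```python
import numpy as np

def hits_authorities(a, n_iter=100):
    """Authority scores by power iteration; a[i, j] = 1 if paper i cites paper j."""
    n = a.shape[0]
    auth = np.ones(n)
    for _ in range(n_iter):
        hub = a @ auth                 # good hubs cite good authorities
        auth = a.T @ hub               # good authorities are cited by good hubs
        norm = np.linalg.norm(auth)
        if norm == 0:
            break                      # no citations at all
        auth = auth / norm
    return auth

# Synthetic graph: papers 0, 1, 2 all cite paper 3; paper 0 also cites paper 2.
a = np.zeros((4, 4))
a[0, 3] = a[1, 3] = a[2, 3] = 1
a[0, 2] = 1

auth = hits_authorities(a)
top = int(np.argmax(auth))
```

Ranking papers by `auth` at each time point and averaging the ages of the top few gives the kind of summary curve described on this slide.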
48 GENERAL REFERENCES Kolaczyk, E.D. (2009), Statistical Analysis of Network Data: Methods and Models, Springer. Fienberg, S. (2012), A Brief History of Statistical Models for Network Analysis and Open Challenges, Journal of Computational and Graphical Statistics, 20. Michailidis, G. (2012), Statistical Challenges in Biological Networks, Journal of Computational and Graphical Statistics, 20. Hunter, D., Krivitsky, P. and Schweinberger, M. (2012), Computational Statistical Methods for Social Network Models, Journal of Computational and Graphical Statistics, 20.
49 SPECIFIC TO THIS PRESENTATION Guo, J., Levina, E., Michailidis, G. and Zhu, J. (2011), Joint estimation of multiple graphical models, Biometrika, 98, 1-15. Shojaie, A. and Michailidis, G. (2009), Analysis of Gene Sets Based on the Underlying Regulatory Network, Journal of Computational Biology, 16(3). Mankad, S. and Michailidis, G. (2012), Structural and functional discovery in dynamic networks with non-negative matrix factorization, Physical Review E, forthcoming.
The United States Senate
The United States Senate LEGISLATIVE ACTIVITIES THE HONORABLE Arlen Specter, of Pennsylvania 110th Congress January 04, 2007 to January 02, 2009 PREPARED BY THE SENATE SERGEANT AT ARMS, LEGISLATIVE SYSTEMS
SMALL BUSINESS HEALTH PLANS S. 1955 RESOURCE KIT
SMALL BUSINESS HEALTH PLANS S. 1955 RESOURCE KIT Health Insurance Marketplace Modernization and Affordability Act Bringing Fortune 500 Benefits to Main Street, USA TABLE OF CONTENTS Introduction... 1 Quick
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Statistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
NETZCOPE - a tool to analyze and display complex R&D collaboration networks
The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.
BayesX - Software for Bayesian Inference in Structured Additive Regression
BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich
Statistical and computational challenges in networks and cybersecurity
Statistical and computational challenges in networks and cybersecurity Hugh Chipman Acadia University June 12, 2015 Statistical and computational challenges in networks and cybersecurity May 4-8, 2015,
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Graphical Modeling for Genomic Data
Graphical Modeling for Genomic Data Carel F.W. Peeters [email protected] Joint work with: Wessel N. van Wieringen Mark A. van de Wiel Molecular Biostatistics Unit Dept. of Epidemiology & Biostatistics
1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning SAMSI 10 May 2013 Outline Introduction to NMF Applications Motivations NMF as a middle step
Fitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: [email protected] Currie,
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Exploratory Factor Analysis and Principal Components. Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016
and Principal Components Pekka Malo & Anton Frantsev 30E00500 Quantitative Empirical Research Spring 2016 Agenda Brief History and Introductory Example Factor Model Factor Equation Estimation of Loadings
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling
Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data
Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague. http://ida.felk.cvut.
Machine Learning and Data Analysis overview Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz psyllabus Lecture Lecturer Content 1. J. Kléma Introduction,
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Several Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
Social Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
Learning outcomes. Knowledge and understanding. Competence and skills
Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges
MSCA 31000 Introduction to Statistical Concepts
MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Multivariate Analysis (Slides 13)
Multivariate Analysis (Slides 13) The final topic we consider is Factor Analysis. A Factor Analysis is a mathematical approach for attempting to explain the correlation between a large set of variables
University of Lille I PC first year list of exercises n 7. Review
University of Lille I PC first year list of exercises n 7 Review Exercise Solve the following systems in 4 different ways (by substitution, by the Gauss method, by inverting the matrix of coefficients
Detection of changes in variance using binary segmentation and optimal partitioning
Detection of changes in variance using binary segmentation and optimal partitioning Christian Rohrbeck Abstract This work explores the performance of binary segmentation and optimal partitioning in the
Data, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: [email protected] Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
An Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia [email protected] Tata Institute, Pune,
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
11. Time series and dynamic linear models
11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Review Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
Business Intelligence and Process Modelling
Business Intelligence and Process Modelling F.W. Takes Universiteit Leiden Lecture 7: Network Analytics & Process Modelling Introduction BIPM Lecture 7: Network Analytics & Process Modelling Introduction
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
Credit Risk Models: An Overview
Credit Risk Models: An Overview Paul Embrechts, Rüdiger Frey, Alexander McNeil ETH Zürich c 2003 (Embrechts, Frey, McNeil) A. Multivariate Models for Portfolio Credit Risk 1. Modelling Dependent Defaults:
Introduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
Part 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
CS 207 - Data Science and Visualization Spring 2016
CS 207 - Data Science and Visualization Spring 2016 Professor: Sorelle Friedler [email protected] An introduction to techniques for the automated and human-assisted analysis of data sets. These
Predict the Popularity of YouTube Videos Using Early View Data
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA
REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São
Statistical Analysis of Network Data
Statistical Analysis of Network Data A Brief Overview Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University [email protected] Introduction Focus of this Talk In this talk I will present
Understanding the Impact of Weights Constraints in Portfolio Theory
Understanding the Impact of Weights Constraints in Portfolio Theory Thierry Roncalli Research & Development Lyxor Asset Management, Paris [email protected] January 2010 Abstract In this article,
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS
THREE DIMENSIONAL REPRESENTATION OF AMINO ACID CHARAC- TERISTICS O.U. Sezerman 1, R. Islamaj 2, E. Alpaydin 2 1 Laborotory of Computational Biology, Sabancı University, Istanbul, Turkey. 2 Computer Engineering
Matrix Differentiation
1 Introduction Matrix Differentiation ( and some other stuff ) Randal J. Barnes Department of Civil Engineering, University of Minnesota Minneapolis, Minnesota, USA Throughout this presentation I have
CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Applications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
Least-Squares Intersection of Lines
Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a
How To Understand The Network Of A Network
Roles in Networks Roles in Networks Motivation for work: Let topology define network roles. Work by Kleinberg on directed graphs, used topology to define two types of roles: authorities and hubs. (Each
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Lecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
Extracting correlation structure from large random matrices
Extracting correlation structure from large random matrices Alfred Hero University of Michigan - Ann Arbor Feb. 17, 2012 1 / 46 1 Background 2 Graphical models 3 Screening for hubs in graphical model 4
The Method of Least Squares
Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this
An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data
n Introduction to the Use of ayesian Network to nalyze Gene Expression Data Cristina Manfredotti Dipartimento di Informatica, Sistemistica e Comunicazione (D.I.S.Co. Università degli Studi Milano-icocca
Numerical methods for American options
Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment
An explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach
An explicit link between Gaussian fields and Gaussian Markov random fields; the stochastic partial differential equation approach Finn Lindgren 1 Håvard Rue 1 Johan
Penalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression and Classification
Lecture 2 Linear functions and examples
EE263 Autumn 2007-08 Stephen Boyd Lecture 2 Linear functions and examples linear equations and functions engineering examples interpretations 2 1 Linear equations consider system of linear equations y
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not an OLS requirement. Therefore we have: Interpretation
MapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
Social and Technological Network Analysis. Lecture 3: Centrality Measures. Dr. Cecilia Mascolo (some material from Lada Adamic's lectures)
Social and Technological Network Analysis Lecture 3: Centrality Measures Dr. Cecilia Mascolo (some material from Lada Adamic's lectures) In This Lecture We will introduce the concept of centrality and
jorge s. marques image processing
image processing images images: what are they? what is shown in this image? What is this? what is an image images describe the evolution of physical variables (intensity, color, reflectance, conductivity)
Science Navigation Map: An Interactive Data Mining Tool for Literature Analysis
Science Navigation Map: An Interactive Data Mining Tool for Literature Analysis Yu Liu School of Software [email protected] Zhen Huang School of Software [email protected] Yufeng Chen School of Computer
Introduction to Support Vector Machines. Colin Campbell, Bristol University
Introduction to Support Vector Machines Colin Campbell, Bristol University 1 Outline of talk. Part 1. An Introduction to SVMs 1.1. SVMs for binary classification. 1.2. Soft margins and multi-class classification.
MSCA 31000 Introduction to Statistical Concepts
MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced
Basics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu [email protected] Modern machine learning is rooted in statistics. You will find many familiar
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
Estimating an ARMA Process
Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
MIMO CHANNEL CAPACITY
MIMO CHANNEL CAPACITY Ochi Laboratory Nguyen Dang Khoa (D1) 1 Contents Introduction Review of information theory Fixed MIMO channel Fading MIMO channel Summary and Conclusions 2 1. Introduction The use
A mixture model for random graphs
A mixture model for random graphs J-J Daudin, F. Picard, S. Robin [email protected] UMR INA-PG / ENGREF / INRA, Paris Mathématique et Informatique Appliquées Examples of networks. Social: Biological:
STANDING COMMITTEES OF THE SENATE
STANDING COMMITTEES OF THE SENATE [Democrats in roman; Republicans in italic; Independent in SMALL CAPS; Independent Democrat in SMALL CAPS ITALIC] [Room numbers beginning with SD are in the Dirksen Building,
CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture 1 Machine Learning Milos Hauskrecht [email protected] 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs2750/ Administration Instructor: Milos Hauskrecht [email protected] 539 Sennott
So which is the best?
Manifold Learning Techniques: So which is the best? Todd Wittman Math 8600: Geometric Data Analysis Instructor: Gilad Lerman Spring 2005 Note: This presentation does not contain information on LTSA, which
NodeXL for Network analysis Demo/hands-on at NICAR 2012, St Louis, Feb 24. Peter Aldhous, San Francisco Bureau Chief. peter@peteraldhous.
NodeXL for Network analysis Demo/hands-on at NICAR 2012, St Louis, Feb 24 Peter Aldhous, San Francisco Bureau Chief [email protected] NodeXL is a template for Microsoft Excel 2007 and 2010, which
BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology http://tinyurl.com/bioinf525-w16
Course Director: Dr. Barry Grant (DCM&B, [email protected]) Description: This is a three module course covering (1) Foundations of Bioinformatics, (2) Statistics in Bioinformatics, and (3) Systems
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
pp. 273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
Course: Model, Learning, and Inference: Lecture 5
Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 [email protected] Abstract Probability distributions on structured representation.
Lasso on Categorical Data
Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.
Fast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1, M. G. Knepley 2, and L. A. Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
Digital Imaging and Multimedia. Filters. Ahmed Elgammal Dept. of Computer Science Rutgers University
Digital Imaging and Multimedia Filters Ahmed Elgammal Dept. of Computer Science Rutgers University Outlines What are Filters Linear Filters Convolution operation Properties of Linear Filters Application
Bioinformatics: Network Analysis
Bioinformatics: Network Analysis Graph-theoretic Properties of Biological Networks COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Outline Architectural features Motifs, modules,
Analysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledge discovery is emerging as a
Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions
SMA 50: Statistical Learning and Data Mining in Bioinformatics (also listed as 5.077: Statistical Learning and Data Mining ()) Spring Term (Feb May 200) Faculty: Professor Roy Welsch Wed 0 Feb 7:00-8:0
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr. arr.). Organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
