Healthcare Analytics. Aryya Gangopadhyay UMBC


Two of many projects
- Integrated network approach to personalized medicine
  - Multidimensional and multimodal
  - Dynamic
  - Analyze interactions
- HealthMask
  - Need for sharing data
  - Protecting privacy
  - Protecting data utility

Outline
- Systems approach to healthcare (P4 medicine)
  - Components
  - Interactions
  - Dynamics
- CIDeR biological network
  - Topological properties
  - Link analysis
  - Perturbation analysis
- Implications for personalized medicine
- Challenges and future work

Motivation
- "Within 10 years every healthcare consumer will be surrounded by a virtual cloud of billions of data points" [Hood et al. 2013]
- Why networks? The behavior of complex systems arises from the coordinated actions of their interacting components
- Structural model of interactions
  - Network of nodes: genes, proteins, etc.
  - Interactions: edges
- Multiple, interdependent networks
  - Gene regulatory networks
  - Protein-protein interaction networks
  - Metabolic networks
- Some of the fundamental networks have been mapped
- [Figure labels: Patient, Academia, Biotech, Diagnostics, Health IT, Pharmaceuticals, Medicine, Insurance]

Motivation
- Biological processes as interconnected systems
- Analyze interactions
  - Resilience against random perturbations
  - Vulnerability to targeted attacks
- Large, multi-dimensional, multimodal, dynamic

Multimodal network
- [Figure: node types Process, Phenotype, Drugs, Genes/Proteins, Disease]

Different types of networks
- Simple: physical binding
  - Nodes: proteins, metabolites
  - Edges: binding, reaction
- Directed: transcriptional regulatory network
  - Nodes: transcription factors
  - Edges: regulatory interaction
- Directed but multi-typed edges: activating or inhibitory
- [Figures: biological networks; yeast regulatory network; E. coli metabolic network]

Entire CIDeR network properties (Lechner et al 2012)
- Node counts by type
  - SNPs: 259
  - Cellular components: 62
  - Genes: 2267
  - Diseases: 140
  - Processes: 622
  - Gene/protein mutants: 248
  - Drugs: 693
  - Environment: 75
  - Phenotypes: 266
  - microRNA: 90
  - Tissue/cell: 105
- Nodes: 5168
- Edges: 14410
- Diameter: 16
- Connected components: 51
- Average path length: 4.46
- Average degree: 2.8

Network properties
- Are biological networks random (Erdős-Rényi)?
  - A graph with n nodes in which each pair of nodes is connected with equal probability
  - P(k): the degree distribution is assumed to follow a Poisson distribution
- Small-world phenomenon (path length)
  - In Erdős-Rényi random networks the average path length is approximately $\ln N / \ln k$, where N is the number of nodes and k the average degree
- Connected components
  - Flow of information, mass, or energy
- Implications for biological networks
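The path-length estimate for Erdős-Rényi networks can be checked numerically. The sketch below is illustrative only and is not taken from the slides: the function names, the graph size (500 nodes), the average degree (6), and the seed are all assumptions. It builds a G(n, p) graph, measures the average shortest-path length by breadth-first search, and prints it next to ln N / ln k.

```python
import math
import random
from collections import deque


def erdos_renyi(n, p, seed=0):
    """Return an undirected G(n, p) graph as an adjacency list (dict of sets)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is linked with equal probability p
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def er_estimate(n, mean_degree):
    """The approximation ln N / ln k for Erdős-Rényi networks."""
    return math.log(n) / math.log(mean_degree)


if __name__ == "__main__":
    n, mean_degree = 500, 6  # illustrative values, not from the slides
    g = erdos_renyi(n, mean_degree / (n - 1))
    print("measured:", round(average_path_length(g), 2))
    print("estimate:", round(er_estimate(n, mean_degree), 2))
```

The measured value and the estimate should be of the same order; the formula is an approximation, so exact agreement is not expected.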

Topological property: scale-free
- Scale-free property: power-law degree distribution of biological networks
- Significance
  - Most nodes have small degree (small neighborhood: 1-3)
  - A small number of nodes have very high degree: hubs (>100)
- [Figure: degree distribution of an Erdős-Rényi random network]
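One common way to obtain a heavy-tailed degree distribution with a few hubs is growth by preferential attachment. The sketch below is an illustration under that assumption; the slides do not say how the biological networks arose, and the function names and parameters (2000 nodes, 2 links per new node) are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def degree_counts(edges):
    """Map degree k -> number of nodes with that degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())


if __name__ == "__main__":
    counts = degree_counts(preferential_attachment(2000, 2))
    nodes = sum(counts.values())
    small = sum(c for k, c in counts.items() if k <= 3)
    print("share of nodes with degree <= 3:", round(small / nodes, 2))
    print("largest degree (a hub):", max(counts))
```

Most nodes end up with small degree while a handful accumulate many links, which is the qualitative pattern the slide describes.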

Network properties contd.
- Similarities of biological networks with Erdős-Rényi
  - Giant connected component: 4343 nodes in CIDeR
  - Small-world effect: average path length of 3 in metabolic networks for 43 different species; 4-8 edges in protein-interaction and genetic networks
- Clustering coefficient: 1 for 15 genes/gene mutants in CIDeR
  - If nodes y and z are both connected to x, they will also be connected to each other
  - If $C_v$ is the clustering coefficient of node v, then $C_v = \frac{2 n_v}{d(d-1)}$, where d is the degree of v and $n_v$ is the number of edges among the neighbors of v
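The clustering coefficient formula can be sketched directly for an undirected graph stored as an adjacency list. The toy graph and the convention of returning 0 for nodes with fewer than two neighbors are assumptions made for illustration.

```python
def clustering_coefficient(adj, v):
    """C_v = 2 * n_v / (d * (d - 1)) for an undirected graph.

    d is the degree of v and n_v is the number of edges among v's neighbors.
    """
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        return 0.0  # assumption: the undefined case is reported as 0
    # Count each neighbor-neighbor edge once (u < w avoids double counting).
    n_v = sum(1 for u in neighbors for w in adj[u] if w in neighbors and u < w)
    return 2.0 * n_v / (d * (d - 1))


# Hypothetical toy graph: x is linked to y, z and w; y and z are also linked.
toy = {
    "x": {"y", "z", "w"},
    "y": {"x", "z"},
    "z": {"x", "y"},
    "w": {"x"},
}

if __name__ == "__main__":
    for node in sorted(toy):
        print(node, round(clustering_coefficient(toy, node), 3))
```

In the toy graph, y has coefficient 1 because its two neighbors (x and z) are connected to each other, while x has 1/3 because only one of the three possible links among its neighbors exists.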

Importance of nodes
- Closeness centrality: a measure of how close a node is to all other nodes (based on its average distance to them)
- Betweenness centrality: the percentage of shortest paths that pass through the node
- Eigenvector centrality: the node's entry in the principal eigenvector of the adjacency matrix
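The three centrality measures can be sketched in plain Python for a small, connected, undirected graph. This is a minimal illustration under stated assumptions: closeness is taken as the inverse of the average distance, betweenness is normalized to the fraction of shortest paths between other node pairs (computed with Brandes' accumulation), and eigenvector centrality uses power iteration on A + I, which has the same eigenvectors as A. All function names are hypothetical.

```python
import math
from collections import deque


def closeness(adj):
    """(n - 1) / (sum of distances to all other nodes); assumes a connected graph."""
    scores = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[s] = (len(adj) - 1) / sum(dist.values())
    return scores


def betweenness(adj):
    """Fraction of shortest paths between other node pairs that pass through
    each node (Brandes' accumulation); assumes at least three nodes."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}


def eigenvector_centrality(adj, iterations=200):
    """Principal eigenvector of the adjacency matrix by power iteration."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        # Multiply by (A + I) to avoid oscillation on bipartite graphs.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(a * a for a in nxt.values()))
        x = {v: a / norm for v, a in nxt.items()}
    return x


if __name__ == "__main__":
    star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
    print("closeness  :", closeness(star))
    print("betweenness:", betweenness(star))
    print("eigenvector:", eigenvector_centrality(star))
```

On the star graph in the usage example, the center scores highest on all three measures, since every shortest path between leaves runs through it.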

Part of Alzheimer's disease network: CIDeR

CIDeR Diabetes network
- Topological properties: 1846 nodes, 5396 edges
- Connected components: 25
- Giant connected component (GCC): 992 nodes
- Diameter: 11
- Average path length: 4
- Average degree: 2.9
- Graph density: 0.0002
- Average clustering coefficient: 0.057

CIDeR Diabetes network: highest out-degree nodes
- Environment: High-fat diet (158)
- Cellular component: LPIN1 (96)
- Genes: SIRT1 (94), Angiotensin II (67), IL6 (61), GPAM (59), TNF (48)
- Disease: Diabetes mellitus type II (78), Obesity (66)
- Complex PPI: Insulin (76)
- Phenotype: Hyperglycemia (64)
- Drugs: Valsartan (57), Pioglitazone (51), Metformin (50)

CIDeR Diabetes network: highest in-degree nodes
- Disease: Diabetes mellitus type II (172), Insulin resistance (164), Obesity (96)
- Process: Inflammatory response (83), Insulin receptor signaling pathway (55)
- Genes: TNF (76), IRS1 (70), IL6 (68), CCL2 (59), AKT1 (56), ADIPOQ (55), FASN (34), IL1B (34)
- Phenotype: Insulin sensitivity (63), Hypertension (47)
- Complex PPI: AMPK (58), NF-kappaB complex (51), Insulin (36), Glycated hemoglobin (33)
- Drug/chemical compound: Triacylglycerol (51)

Link Analysis

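PageRank algorithm [BP98]
- Important nodes should be pointed to by other important nodes
- Each node has 1 vote, which is divided equally among all the nodes it points to
- Random walk on the web graph
  - Pick a page at random
  - With probability 1 - α, jump to a random page
  - With probability α, follow a random outgoing link
- Rank according to the stationary distribution
- Example with four nodes (1-4): adjacency matrix, row-normalized matrix, and resulting ranks

$$
\begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 0 & 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 1 & 0 & 0 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 0.19 \\ 0.38 \\ 0.29 \\ 0.14 \end{bmatrix}
$$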
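The random-walk description can be sketched as a power iteration on the row-normalized matrix. This is a minimal illustration, not the presenter's code: the function name, the default α = 0.85, the iteration count, and the uniform treatment of nodes without outgoing links are assumptions. Running it on the four-node example with α = 1 (no random jumps) gives the stationary distribution of the link-following walk.

```python
def pagerank(adjacency, alpha=0.85, iterations=100):
    """Stationary distribution of the random walk: with probability alpha
    follow a random outgoing link, otherwise jump to a random page."""
    n = len(adjacency)
    # Row-normalize: each node's single vote is split equally among its out-links.
    transition = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            # Assumption: a node with no out-links jumps uniformly at random.
            transition.append([1.0 / n] * n)
        else:
            transition.append([a / out_degree for a in row])
    rank = [1.0 / n] * n  # start from a page picked uniformly at random
    for _ in range(iterations):
        rank = [
            (1.0 - alpha) / n
            + alpha * sum(rank[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return rank


# The four-node example from the slide (row i lists the links out of node i + 1).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
]

if __name__ == "__main__":
    print([round(r, 2) for r in pagerank(A, alpha=1.0)])  # [0.19, 0.38, 0.29, 0.14]
    print([round(r, 2) for r in pagerank(A)])             # damped variant, alpha = 0.85
```

Node 2 ranks highest because three of the four nodes point to it.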

Effectors and Receptors
- Hubs/effectors
- Authorities/receptors
  - Page with large in-degree
  - Preferential attachment
- Important effectors point to important receptors
- Important receptors are pointed to by important effectors
- The seemingly circular description is solvable by SVD

Example
- [Figure: example graph with nodes labeled E (effector), R (receptor), and E/R (both)]

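Singular Value Decomposition
- Rank-k approximation: $A_k = U_k S_k V_k^T$
- Effectors: left singular vectors (columns of U)
- Receptors: right singular vectors (columns of V)
- Leading right singular vector and singular value: $v_1$, $\sigma_1$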

CIDeR Diabetes Network (SNPs)
- [Figure; labeled nodes: Pioglitazone, Valsartan, Palmitic acid, Metformin, Obesity, TCF7L2-n.T, KCNQ1-n.C]

CIDeR Diabetes Network (genes/proteins)
- [Figure; labeled nodes: TLR4, IL6, SIRT1, IKBKB, Pioglitazone, Obesity, GPAM, Valsartan, High fat diet-induced obesity]

SVD Results: U1 vs PR
- [Plot; labeled points: inflammatory response; Diabetes mellitus, type II; gluconeogenesis; response to oxidative stress; IL6; Obesity; GPAM; TLR4; TCF7L2-n.T; FTO-n.A; Valsartan; KCNQ1-n.C; High-fat diet]

Diabetes-Significant Effectors (SVD)

Index  Element      Type                    U1  Rank  In-deg.  Out-deg.
202    TCF7L2-n.T   SNP                      1   454        3        22
129    KCNQ1-n.C    SNP                      2   800        1         7
1383   TLR4         gene/protein             3    44       20        37
966    GPAM         gene/protein             4    70       14        59
69     FTO-n.A      SNP                      5  1366        0        13
633    Valsartan    drug/chemical compound   6  1654        0        57
1040   IL6          gene/protein             7     5       69        62
188    SLC30A8-n.C  SNP                      8   802        1         9
96     HHEX-n.C     SNP                      9  1393        0         6
1311   SIRT1        gene/protein            10   209        7        94

HITS [Kleinberg98]
- Every node has an authority score and a hub score
- Start with initial hub and authority vectors (x: authority scores, y: hub scores):

$$x^{(0)} = y^{(0)} = \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}^T$$

- For the adjacency matrix L of the four-node graph (nodes 1-4):

$$L = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \qquad x^{(1)} = L^T y^{(0)} = \begin{bmatrix} 1 & 3 & 2 & 1 \end{bmatrix}^T$$

- Calculate $y^{(1)}$:

$$y^{(1)} = L\,x^{(1)} = \begin{bmatrix} 5 & 3 & 4 & 3 \end{bmatrix}^T$$

- Normalize $x^{(1)}$ and $y^{(1)}$ to unit L1-norm and repeat until convergence
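The HITS update can be sketched in a few lines. The code below is an illustrative sketch with hypothetical function names and an assumed iteration count; it reproduces the first step of the worked example and then iterates with L1 normalization.

```python
def hits_step(L, y):
    """One HITS update: authority scores x = L^T y, then hub scores y = L x."""
    n = len(L)
    x = [sum(L[i][j] * y[i] for i in range(n)) for j in range(n)]
    y_new = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x, y_new


def l1_normalize(v):
    """Scale a vector so that the absolute values of its entries sum to 1."""
    total = sum(abs(a) for a in v)
    return [a / total for a in v]


def hits(L, iterations=50):
    """Iterate from all-ones vectors, normalizing to unit L1-norm each round."""
    x = y = [1.0] * len(L)
    for _ in range(iterations):
        x, y = hits_step(L, y)
        x, y = l1_normalize(x), l1_normalize(y)
    return x, y


# Adjacency matrix of the four-node example (row i lists links out of node i + 1).
L = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
]

if __name__ == "__main__":
    x1, y1 = hits_step(L, [1, 1, 1, 1])
    print("x(1) =", x1)  # [1, 3, 2, 1]
    print("y(1) =", y1)  # [5, 3, 4, 3]
    authority, hub = hits(L)
    print("authority:", [round(a, 3) for a in authority])
    print("hub      :", [round(h, 3) for h in hub])
```

Node 2 receives the most links and gets the largest authority score; node 1, which points to nodes 2 and 3, gets the largest hub score after the first step.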

Left and right singular vectors
- HITS converges to the principal eigenvectors of $LL^T$ and $L^TL$
- Proof sketch, with the SVD $L = USV^T$:

$$LL^T = USV^T (USV^T)^T = USV^T V S U^T = US^2U^T, \qquad (LL^T)^k = US^{2k}U^T$$

$$L^TL = VSU^T\,USV^T = VS^2V^T, \qquad (L^TL)^k = VS^{2k}V^T$$

- Hence U and V are the eigenvector matrices of $LL^T$ and $L^TL$ respectively, and $S^2$ is the eigenvalue matrix of both
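The relationship between HITS and the singular vectors can be checked numerically on the four-node example. The sketch below uses plain power iteration rather than a library SVD routine, which is a substitution made to keep the example dependency-free; the helper names and iteration count are assumptions. It confirms that $LL^T$ and $L^TL$ share the same leading eigenvalue $\sigma_1^2$ and that $L v_1 = \sigma_1 u_1$.

```python
import math

# Adjacency matrix of the four-node example (assumed here for self-containment).
L = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def principal_eigenpair(M, iterations=200):
    """Leading eigenvalue and unit-length eigenvector by power iteration."""
    v = [1.0] * len(M)
    for _ in range(iterations):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v for the unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, matvec(M, v)))
    return eigenvalue, v


if __name__ == "__main__":
    Lt = transpose(L)
    lam_u, u1 = principal_eigenpair(matmul(L, Lt))  # hub direction
    lam_v, v1 = principal_eigenpair(matmul(Lt, L))  # authority direction
    print("leading eigenvalue of L L^T:", round(lam_u, 6))
    print("leading eigenvalue of L^T L:", round(lam_v, 6))
    print("sigma_1:", round(math.sqrt(lam_u), 6))
```

Up to scaling, `u1` is the hub vector and `v1` the authority vector that the HITS iteration approaches.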

PR vs Hub scores

Perturbation Analysis

Diseases as network perturbations
- Diseases as specific types of network perturbation (del Sol et al 2010, Hwuang et al 2009)
  - Genetic mutations, environmental factors, epigenetic perturbations
  - Some gene regulatory network states may correspond to disease states
- Study the topological properties of disease-perturbed networks
- Study the effect of network perturbations
  - Perturbation of nodes
  - Perturbation of edges
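Node and edge perturbations can be explored on a toy graph by removing an element and recounting connected components. The sketch below is illustrative only: the helper names and the two-hub toy network are hypothetical and are not CIDeR data.

```python
def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        components.append(component)
    return components


def remove_node(adj, node):
    """Node perturbation: a copy of the graph without `node` and its edges."""
    return {v: nbrs - {node} for v, nbrs in adj.items() if v != node}


def remove_edge(adj, u, v):
    """Edge perturbation: a copy of the graph without the edge u-v."""
    new = {x: set(nbrs) for x, nbrs in adj.items()}
    new[u].discard(v)
    new[v].discard(u)
    return new


def summarize(adj):
    """(number of components, size of the largest component)."""
    comps = connected_components(adj)
    return len(comps), max((len(c) for c in comps), default=0)


def highest_degree_node(adj):
    """Target for a degree-based attack (first such node if there are ties)."""
    return max(adj, key=lambda v: len(adj[v]))


# Hypothetical toy network: two hubs joined by one edge, each with two spokes.
toy = {
    "h1": {"a", "b", "h2"},
    "h2": {"c", "d", "h1"},
    "a": {"h1"}, "b": {"h1"}, "c": {"h2"}, "d": {"h2"},
}

if __name__ == "__main__":
    print("intact          :", summarize(toy))
    print("remove a spoke  :", summarize(remove_node(toy, "a")))
    print("remove a hub    :", summarize(remove_node(toy, highest_degree_node(toy))))
    print("remove hub edge :", summarize(remove_edge(toy, "h1", "h2")))
```

Removing a spoke leaves the network in one piece, while removing a hub fragments it, which mirrors the contrast between random perturbations and targeted attacks.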

Hub and spoke communities (Kang et al 2011)

Effect of Perturbation on CCs

Conclusion
- Limitations
  - Separate analysis for each type of edge
  - Perturbation in gene regulatory networks
- Challenges
  - Are all biological networks scale-free? Pržulj et al (2004) show otherwise for the PPI network of S. cerevisiae
  - Incomplete and tissue-specific data
  - Capturing dynamism
  - Analyzing the entire interactome as opposed to individual networks
  - Big data
- Future work
  - Detecting community structure
  - Other diseases: Alzheimer's, Parkinson's, cancer