Cost-based Optimization of Graph Queries in Relational Database Management Systems

DISSERTATION

submitted in fulfillment of the requirements for the academic degree Dr. rer. nat. in Computer Science to the Mathematisch-Naturwissenschaftliche Fakultät II (Faculty of Mathematics and Natural Sciences II), Humboldt-Universität zu Berlin, by Dipl.-Ing. (FH) Silke Trißl, M.Sc.

President of Humboldt-Universität zu Berlin: Prof. Dr. Jan-Hendrik Olbertz
Dean of the Mathematisch-Naturwissenschaftliche Fakultät II: Prof. Dr. Elmar Kulke

Reviewers:
1. Prof. Dr. Ulf Leser
2. Prof. Johann-Christoph Freytag, Ph.D.
3. Prof. Dr. Thorsten Grust

Submitted on:
Date of oral examination:

Alles hat ein Ende, nur die Wurst hat zwei.
(Everything has an end, only the sausage has two.)
Stephan Remmler

Acknowledgement

This thesis would not have been possible without the help, support, and encouragement of many people. First of all, I would like to thank my supervisor Prof. Ulf Leser. He gave me the opportunity to start my PhD and provided a welcoming and pleasant working environment at Humboldt-Universität zu Berlin. I am greatly indebted to him for his patience, encouragement, and guidance throughout all these years with their ups and downs. I could not have imagined a more motivated or dedicated advisor for my PhD study.

I am grateful to all who gave me the opportunity to partly finance my PhD by teaching. I met committed and inquiring students in the courses and exercises I taught for Prof. Ulf Leser at HU Berlin and Prof. Felix Naumann at HPI Potsdam. Dr. Márta Gutsche and the Frauenförderung (promotion of women) at HU Berlin gave me the opportunity to spark girls' interest in studying computer science. I thank Prof. Louiqa Raschid at the University of Maryland, who invited me for a research exchange to the US. I am also grateful to the BMBF, which supported my research.

I would not have finished this PhD thesis without the help and support of many colleagues and friends. Thanks to Jörg, Timo, and Philippe, who shared an office with me. Thanks to Jens, Melanie, Jana, Long, Roger, and Samira, who also accompanied me for a long time during my thesis. I want to acknowledge all researchers and students from the groups WBI and DBIS at HU, Informationssysteme at HPI, and Genetik und Biometrie at FBN. Many thanks for constructive criticism and helpful suggestions. I am greatly indebted to all colleagues who tried to cheer me up during common lunch and coffee breaks.

I also thank several students whom I met during my time in Berlin. Raphael and Philipp did a lot of programming in my first project, Columba. Johannes, Christoph, Florian, and André used some ideas of GRIPP in their student research projects (Studienarbeiten) or diploma theses (Diplomarbeiten) and gave feedback on the algorithm.
Last but not least, I would like to thank my family, who shared the joys and sorrows of this entire time with me. My parents have always lent an open ear to my worries and troubles, and they still do; thank you from the bottom of my heart. Many thanks also to my sister: whenever I needed to discuss a problem, she listened patiently and gave me good advice.

Abstract

Graphs occur in many areas of life. We are interested in graphs in biology, where nodes are chemical compounds, enzymes, reactions, or interactions, which are connected by either directed or undirected edges. Efficiently querying these graphs is a challenging task. In this thesis we present GRIcano, a system that efficiently executes graph queries. For GRIcano we assume that graphs are stored and queried using relational database management systems (RDBMS). We use an extended version of the Pathway Query Language PQL to express graph queries, whose syntax and semantics we describe in this work. We employ ideas from RDBMS to improve the performance of query execution. Thus, the core of GRIcano is a cost-based query optimizer, which is created using the Volcano optimizer generator. This thesis makes contributions to all three required components of the optimizer: the relational algebra, the implementations, and the cost model.

Relational algebra operators alone are not sufficient to express graph queries. Thus, we first present new operators that allow PQL queries to be rewritten into algebra expressions. We propose the reachability operator φ, the distance operator Φ, the path length operator ψ, and the path operator Ψ. In addition, we provide rewrite rules for the newly proposed operators in combination with standard relational algebra operators.

Secondly, we present implementations for each proposed operator. The main contribution is GRIPP, an index structure that allows us to execute reachability queries on very large graphs containing directed edges. GRIPP has advantages over other existing index structures, which we review in this work. In addition, we show how to employ GRIPP and the recursive query strategy as implementations for all four proposed operators.

The third component of GRIcano is the cost model, which requires cardinality estimates for the proposed operators and cost functions for the implementations. Based on an extensive experimental evaluation of the proposed implementations, we present functions to estimate the cardinality of the φ, Φ, ψ, and Ψ operators and the cost of executing a query. The novelty of our approach is that these functions use only key figures of the graph. Finally, we demonstrate the effectiveness of GRIcano using exemplary graph queries on real biological networks.

Zusammenfassung (Summary)

Graphs can be found in many areas of life; we are particularly interested in graphs from biology. Nodes in such graphs are chemical compounds, enzymes, reactions, or interactions, which are connected to each other by directed or undirected edges. Efficiently executing graph queries is a challenge. In this work we present GRIcano, a system that allows graph queries to be executed efficiently. We assume that the graphs are stored in relational database management systems (RDBMS) and are also queried there. As a graph query language we propose an extended version of the Pathway Query Language (PQL). The main component of GRIcano is a cost-based query optimizer, which is generated with the help of the optimizer generator Volcano. This work contributes to all three required components of the optimizer: the relational algebra, implementations, and cost models.

The operators of relational algebra alone are not sufficient to express PQL queries. We therefore first introduce the new operators reachability φ, distance Φ, path length ψ, and path operator Ψ. In addition, we give rules for transforming expressions that contain the new operators together with the standard operators of relational algebra. Furthermore, we present implementations for each proposed operator. The main contribution here is GRIPP, an index structure that allows reachability queries to be executed efficiently on very large graphs with directed edges. We show how GRIPP and the recursive query strategy can be used to provide implementations for all proposed operators.

The third component of GRIcano is the cost model, which requires cardinality estimates for the proposed operators and cost models for the implementations. Based on extensive experiments, we propose functions for estimating the cardinalities of the operators φ, Φ, ψ, and Ψ. In addition, we derive functions for estimating the cost of executing graph queries. The novelty of these cost models is that the functions use only key figures of the graphs. Finally, we show how GRIcano works using example queries on real biological networks.

Contents

1. Introduction
     Queries on Graphs
     Motivation
     Contribution
     Structure of this Work
2. Definitions and Terminology
     Graphs
         Definitions
         Storage and Traversal
     Relational Algebra
         Algebra and Relations
         Operators
         Equivalence Rules
     Cost-Based Query Optimization
         Query Processing
         Implementation of Operators
         Cost Function and Query Optimization
         Volcano
3. Graph Queries
     Data Model
     Graph Queries
         Query Graph
         Evaluation of Graph Queries
     Pathway Query Language
         Graphs in PQL
         Syntax
         PQL and Non-graph Relations
     PQL Semantics
         Semantics of Node Conditions
         Semantics of Path Conditions
         Semantics of HAVING Conditions
         Semantics of the Subgraph Specification
     Conversion to Relational Algebra
     Related Work
4. Operators for Graph Queries
     Operators for Nodes
     Operators for Paths
         Path Operator, Ψ
         Reachability Operator, φ
         Path Length Operator, ψ
         Distance Operator, Φ
     Summary
     Related Work
5. Implementations for Operators
     GRIPP
         Index Structure
         Reachability Queries
         Distance Queries
         Path Length and Path Queries
     Other Index Structures
         Transitive Closure
         Dual Labeling
         Label + SSPI
     RDBMS Capabilities
         Recursive Strategies
     Summary
     Related Work
6. Performance of GRIPP
     Experimental Setup
         Generated Graphs
         Real-world Graphs
         Implementation Details
     Index Creation
     Query Performance
         Reachability Queries
         Distance Queries
         Path Length Queries
         Path Queries
         Comparison of Query Types
     Summary
7. GRIcano
     Cardinality Estimates
         Reachability Operator
         Distance Operator
         Path Length Operator
         Path Operator
         Validation on Real World Graphs
     Cost Functions
         Reachability Queries
         Distance Queries
         Path Length Queries
         Path Queries
         Validation on Real World Graphs
     GRIcano
         Experimental Evaluation
     Related Work
         Cardinality and Cost Estimates
         Rule-based Query Optimization
         Cost-based Query Optimization
8. Conclusion and Outlook
     Summary
     Future Work
A. Strongly Connected Component
     A.1. Kosaraju's Algorithm
B. Rewrite Rules for Operators
     B.1. Path Operator
         B.1.1. Restriction on Start and End Node
         B.1.2. Path Operator and Other Operators
     B.2. Path Length Operator
         B.2.1. Restriction on Start and End Node
         B.2.2. From Path Operator Ψ to Path Length Operator ψ
         B.2.3. Path Length Operator and Other Operators
     B.3. Distance Operator
         B.3.1. Restriction on Start and End Node
         B.3.2. From Path Operator Ψ to Distance Operator Φ
         B.3.3. Distance Operator and Other Operators
     B.4. Reachability Operator
         B.4.1. Restriction on Start and End Node
         B.4.2. From Path Operator Ψ to Reachability Operator φ
         B.4.3. Reachability Operator and Other Operators
C. Additional Algorithms for GRIPP
     C.1. Relational Schema for Storing GRIPP
     C.2. Stop Node List for GRIPP
     C.3. Reachability for Sets of Nodes
D. Graph Properties
E. Model Specification for Volcano
F. Cost and Cardinality Functions for Volcano
G. Exemplary Queries for GRIcano

1. Introduction

The topic of this work is cost-based optimization of graph queries in relational database management systems. In Section 1.1 we first introduce the kind of graphs that led us to this topic, before we proceed in Section 1.2 with the motivation for our approach. In Section 1.3 we summarize our contribution in the area of cost-based optimization of graph queries. Finally, in Section 1.4 we give an overview of this work.

1.1. Queries on Graphs

Graphs occur in many areas of life. Examples are public transport plans, road maps, the World Wide Web (WWW), and social networks. Common to all these graphs is that they consist of nodes and edges. Nodes are stations, junctions, web pages, or people; edges are tracks, roads, links, or personal relationships. All these graphs have interesting features, but we are interested in graphs in biology. To understand the content of these graphs, we first make a short digression into cell biology. For a more comprehensive introduction we refer the reader to Alberts et al. [AJW+08].

All biological cells are built in a similar fashion, though the structure of cells differs between the three major groups: bacteria, archaea, and eukaryotes. All cells have in common that they contain a cell membrane as a boundary to the outside and a genome, which holds the information for building and maintaining the cell. In eukaryotes the genome is contained inside the nucleus, while in bacteria and archaea (the prokaryotes) the genome lies free in the cytoplasm. The genome consists of long stretches of DNA, the chromosomes. Genes are regions of the genome that code for a functional product in the cell. During transcription, genes are read and transcribed into RNA. Either the RNA itself is the functional product, or the RNA, possibly after some modifications, is translated into proteins. Proteins are the workhorses of a cell: they catalyze reactions, process signals, or transport molecules.
One class of proteins, the enzymes, catalyzes chemical reactions, such as the degradation of sugar or the production of essential amino acids. Another class, the membrane proteins, resides in the cell membrane and reacts to external stimuli or facilitates the transport of substances into and out of a cell. When an external stimulus occurs, membrane proteins may activate or inactivate proteins inside the cell to enhance or suppress reactions. There are other protein groups, such as histones, which package the DNA in the nucleus of eukaryotes, collagens, which are the main structural proteins of connective tissue, and antibodies, which higher organisms require for the immune response.

To give an impression of the complexity of the problem: according to current estimates, every human has about 250,000 different proteins in his or her body. Each protein may interact with numerous other proteins or with some of the hundreds of thousands of organic and inorganic substances. Biologists have studied these complex interactions involving proteins and other substances, and their knowledge is stored as graphs in publicly available data sources. Biological graphs may roughly be divided into three categories: metabolic networks, signaling pathways, and protein-protein interaction networks.¹ For a review of different biological networks see [BN05].

Metabolic networks are graphs that represent the conversion of substances in a cell. Nodes in these networks are proteins, other molecules such as sugars or fatty acids, or reactions. Edges in such graphs are usually directed and indicate that a molecule participates in a reaction. The best-known conversion is glycolysis, in which glucose is converted to pyruvate, releasing energy. Proteins and reactions participating in this conversion are said to be in the glycolysis pathway. In general, pathways in metabolic networks are subgraphs that stand for specific conversions defined by researchers. Pathways may overlap, i.e., they may share proteins or reactions. Examples of data sources for metabolic networks are KEGG [KGK+04], BioCyc [KOMK+05], and Reactome [JTGV+05]. Figure 1.1 shows glycolysis as given by KEGG. Circles are molecules that are converted, rectangular boxes on edges stand for reactions catalyzed by enzymes that are identified by their EC number, and boxes with rounded corners represent other pathways.

Signaling pathways are graphs that capture the information flow in a cell. Nodes in these graphs are usually proteins or reactions, while edges represent the flow of information.
For example, Figure 1.2 shows the activation of protein kinase A (PKA) by an external stimulus, as given by BioCarta [htt11b]. The activated form of PKA regulates several reactions, including one reaction of the glycolysis presented in Figure 1.1. Depending on the external glucose stimulus, PKA phosphorylates or dephosphorylates the complex of the two enzymes phosphofructokinase 2 and fructose-2,6-bisphosphatase. The phosphorylation status influences the reaction rate of glycolysis.

The third group of biological graphs are protein-protein interaction networks. In these graphs nodes are proteins, while edges represent interactions between proteins and are usually undirected. Figure 1.3 shows known interactions for the protein complex of phosphofructokinase 2 and fructose-2,6-bisphosphatase (PFKFB1) as given by String [vmjs+05], a data source for protein-protein interactions. The red node in the center is PFKFB1. It interacts with protein kinase A (PKACA) and several other proteins. The different colors of the edges code for different types of evidence, e.g., interactions found in other data sources are represented by blue edges, while interactions derived using text mining methods are shown by light green edges. Other data sources that contain data about protein-protein interactions are, for instance, DIP [XSD+02], BIND [BBH03], Intact [XSD+02], and PubGene [JLKH01].

¹ See Pathguide: the pathway resource list, for a list of data sources.

Figure 1.1: Glycolysis as given by KEGG. The circles are molecules that are converted, rectangular boxes on edges stand for reactions catalyzed by enzymes that are identified by their EC number, and the boxes with rounded corners stand for other pathways.

Figure 1.2: The activation of PKA through an external stimulus, from BioCarta.

1.2. Motivation

The examples in the last section show only small parts of different biological graphs. Table 1.1 shows the number of nodes and edges of selected data sources. For example, KEGG contains 42,002 nodes and 51,450 edges in its reference pathway as of March 2011. The reference pathway is a summarization of the pathways of all species. In contrast, BioCyc stores an individual metabolic network for each of roughly 400 species. Moreover, unlike KEGG, BioCyc also represents relationships between genes and proteins.

Biologists use specialized graph viewing tools to display these graphs; for a review of such tools see Suderman & Hallett [SH07]. The tools usually display parts of the entire graph, e.g., a single pathway of a metabolic network, possibly with links to other pathways as shown in Figure 1.1. With such tools a biologist is only able to navigate through graphs. Consider the question "How many steps does a cell require to produce the amino acid lysine from the substrate glucose?" A biologist may use the metabolic network of KEGG, where she has to start at glucose in the glycolysis pathway, follow the link to the pathway of the citrate cycle, and then follow the link to the pathway of lysine biosynthesis. This way, she will count 25 steps from glucose to lysine. Clearly, when manually navigating through images of pathways, a biologist might not find the shortest path, or occasionally no path at all although one exists. Thus, tools are required that allow users to pose queries such as the one above and that return an answer.

Figure 1.3: Known protein-protein interactions for the protein complex PFKFB1 in humans. The different colors of edges stand for different types of evidence, e.g., interactions found in other data sources are represented by blue edges, while interactions derived using text mining methods are shown by light green edges.

In [HNM+00] van Helden and colleagues identified several other questions that are interesting for biologists:
- Get all reactions catalyzed by a given gene product.
- Find all metabolic pathways that convert compound A into compound B in less than X steps.
- Retrieve all genes whose expression is directly or indirectly affected by a given compound.
- Find all compounds that can be synthesized from a given precursor in less than X steps.

Currently, researchers have to write specialized programs that traverse the graphs to answer such queries. Whenever they want to pose a new query, these programs need to be adjusted. In this work we present GRIcano to overcome this problem.

Biological graph                         Number of nodes    Number of edges
Metabolic networks
  KEGG [KGK+04]                                   42,002             51,450
  BioCyc A. thaliana [KOMK+05]                    10,951             23,649
  Reactome [JTGV+05]                              11,795             23,649
Signaling pathways
  BioCarta [htt11b]                          only images
  NetPath TGF-β [KMR+10]
  TransPath [KPV+06]                           > 100,000          > 240,000
Protein-protein interaction networks
  String [vmjs+05]                           > 2,500,000       > 50,000,000
  DIP [XSD+02]                                    23,201             71,276
  Intact                                            50,…              …,044

Table 1.1: Sizes of biological graphs (in March 2011).

1.3. Contribution

In this work we present GRIcano, a novel tool that efficiently retrieves answers to graph queries. In GRIcano we employ ideas from query optimization in relational database management systems (RDBMS) and carry these ideas over to graph query optimization. In the following chapters we target several aspects of graph queries. Specifically, we make the following contributions:

- Extend the existing query language PQL. We present and extend the Pathway Query Language (PQL) [Les05a], which was developed to express graph queries. Using PQL a user may express the conditions of a graph query as predicates. In Chapter 3 we describe the syntax as well as the semantics of PQL.
- Define relational operators to express PQL queries. In order to optimize a graph query, we want to be able to alter the order in which the predicates of the query are evaluated. We achieve this by rewriting the PQL query into an algebraic expression and applying rewrite rules for transformation. As standard operators from relational algebra are not sufficient for expressing PQL queries, which we discuss in Chapter 4, we develop new operators in this thesis. We define the path Ψ, path length ψ, distance Φ, and reachability operator φ to express predicates of graph queries and provide rewrite rules for exchanging operators.
- Propose and experimentally evaluate implementations for operators. For each proposed operator we have to provide implementations that compute the result. Thus, in Chapter 5 we discuss implementations to answer reachability, distance, path length, and path queries. We may use GRIPP, our newly developed index structure, for answering all four types of graph queries. Chapter 6 shows that we are able to compute the GRIPP index even for very large graphs, for which the transitive closure cannot be created. In addition, using GRIPP we are able to answer reachability queries in almost constant time on average, regardless of the size and shape of the graph.
- Develop functions to estimate the cardinality of operators and the cost of implementations. For cost-based query optimization we require cardinality estimates for the different operators and cost functions for each implementation. In Chapter 7 we develop equations that are based only on key figures of the graph, which is to our knowledge a novel approach. Using our cost functions we correctly predict the result sizes and the fastest implementations on generated as well as on real-world graphs.
- Present and evaluate a prototypical implementation of GRIcano. In Chapter 7 we present GRIcano, the first system that performs cost-based query optimization for graph queries. The underlying cost-based query optimizer is generated using the Volcano framework [GM93]. Volcano requires as input the available operators and rewrite rules of the algebra, the available implementations for the different operators, and the equations for the cardinality and cost estimates. We show the effect of GRIcano using exemplary queries.

1.4. Structure of this Work

In Chapter 2 we introduce basic notation on graphs, relational algebra, and cost-based query optimization. Chapter 3 is devoted to a data model for storing graphs, to graph queries, and to PQL, a language for expressing graph queries. In Chapter 4 we first argue that PQL queries should be executed like standard SQL queries, i.e., by first transforming them into an algebraic expression. We then motivate the need for new operators in the algebra and introduce the path operator Ψ, the path length operator ψ, the distance operator Φ, and the reachability operator φ. We also provide rewrite rules for exchanging operators. In Chapter 5 we provide implementations for the operators proposed in Chapter 4 and present GRIPP, an index structure that efficiently answers reachability queries even on large graphs. In Chapter 6 we experimentally evaluate the presented implementations. In Chapter 7 we devise functions to estimate the cardinality of the four newly defined operators and cost functions for the different implementations. In that chapter we also introduce GRIcano, our graph query optimizer, and show its capabilities using selected queries. Chapter 8 concludes the work.

2. Definitions and Terminology

This chapter introduces basic notation on graphs, relational algebra, and query optimization. In Section 2.1 we formally define graphs and properties of graphs. Section 2.2 introduces fundamental concepts behind relational algebra. In Section 2.3 we give an introduction to cost-based query optimization in relational database management systems.

2.1. Graphs

This work mostly deals with graph-structured data. We therefore formally introduce graphs, adopting the notation of Cormen et al. [CLR01].

2.1.1. Definitions

Definition 2.1 (Graph) A graph $G = (V(G), E(G))$ is a tuple consisting of a set of nodes $V(G)$ and a set of edges $E(G)$, with $E(G) \subseteq V(G) \times V(G)$. Whenever the graph is clear from the context, we write $G = (V, E)$.

There are two types of graphs, directed and undirected. In a directed graph, $E$ contains ordered pairs of nodes; in an undirected graph, $E$ contains unordered pairs of nodes. Consider $(u, v) \in E$ with $u, v \in V$ and $u \neq v$. In a directed graph only $v$ is adjacent to $u$, while in an undirected graph the relation is symmetric, i.e., $(u, v)$ is the same as $(v, u)$. If $(u, v) \in E$ in a directed graph, we say that node $u$ has the outgoing edge $(u, v)$ and therefore $u$ is the start node of $(u, v)$. Analogously, $(u, v)$ is an incoming edge of node $v$ and therefore $v$ is the target node of $(u, v)$. We call $u$ a parent of $v$ and $v$ a child of $u$.

Definition 2.2 (Size of a graph) Let $G = (V, E)$. The size of $G$ is the number of nodes $|V|$ plus the number of edges $|E|$ in $G$, i.e., $|G| = |V| + |E|$.

Based on the ratio between edges and nodes, which is called the density of a graph, we divide graphs into two groups, sparse and dense graphs. The literature does not draw a clear line between the two. As a rule of thumb, a graph is called dense if the number of edges $|E|$ is close to $|V|^2$, and sparse if $|E| \ll |V|^2$.
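To make Definitions 2.1 and 2.2 concrete, the following Python sketch stores a graph as a set of nodes plus a set of edges. The representation (tuples for ordered pairs, frozensets for unordered pairs) and the helper names `directed_edges`, `undirected_edges`, `size`, and `is_dense` are illustrative assumptions, not part of the thesis. In particular, the `fraction` threshold in `is_dense` is hypothetical, since the text deliberately leaves the boundary between sparse and dense graphs open.

```python
# Sketch of Definitions 2.1 and 2.2 under an assumed in-memory representation:
# nodes are hashable labels, directed edges are tuples (ordered pairs),
# undirected edges are frozensets (unordered pairs).


def directed_edges(pairs):
    """Directed graph: (u, v) and (v, u) are two distinct ordered pairs."""
    return {(u, v) for u, v in pairs}


def undirected_edges(pairs):
    """Undirected graph: (u, v) and (v, u) collapse into one unordered pair."""
    return {frozenset((u, v)) for u, v in pairs}


def size(nodes, edges):
    """Size of a graph, |G| = |V| + |E| (Definition 2.2)."""
    return len(nodes) + len(edges)


def is_dense(nodes, edges, fraction=0.5):
    """Rule-of-thumb density test. The text gives no exact cut-off, so the
    hypothetical parameter `fraction` decides when |E| counts as close to |V|^2."""
    return len(edges) >= fraction * len(nodes) ** 2


V = {"a", "b", "c"}
pairs = [("a", "b"), ("b", "a"), ("b", "c")]
E_dir = directed_edges(pairs)
E_undir = undirected_edges(pairs)

print(size(V, E_dir))      # 3 nodes + 3 ordered pairs = 6
print(size(V, E_undir))    # 3 nodes + 2 unordered pairs = 5
print(is_dense(V, E_dir))  # 3 < 0.5 * 3**2 = 4.5, so False
```

The same three pairs yield three directed edges but only two undirected ones, because $(a, b)$ and $(b, a)$ coincide in an undirected graph.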

Figure 2.1: A directed graph with nodes a to f. Circles represent nodes; arrows between nodes represent edges. Nodes in this example are uniquely labeled. The size of the graph is 14 (6 nodes plus 8 edges). For example, the degree of node b is $\deg(b) = 3$.

To describe the shape of a graph, we look at the distribution of node degrees. To do so, we first define the degree of a node.

Definition 2.3 (Degree of a node) Given a graph $G = (V, E)$, the degree $\deg(v)$ of a node $v \in V$ is the number of edges in which $v$ participates. If $G$ is directed, we distinguish between the indegree $\deg_{in}(v)$ and the outdegree $\deg_{out}(v)$ of $v$. The indegree is the number of edges with $v$ as target node; analogously, the outdegree is the number of edges with $v$ as start node.

Based on the distribution of node degrees we distinguish between different graph topologies. In random graphs the node degrees follow a binomial distribution. Graphs whose node degrees follow a power-law distribution are called scale-free. Barabási and Oltvai [BO04] describe these topologies.

Nodes and edges are often labeled. Therefore we define a label function for the nodes and edges of a graph.

Definition 2.4 (Label function, $\phi$) Let $L$ be a set of labels. A label function $\phi$ assigns labels to nodes and edges, $\phi(v, L): V \to L$ and $\phi(e, L): E \to L$. In this work we assume that each label $l \in L$ consists of a type and a value.

Graphs also contain paths.

Definition 2.5 (Path and path length) Let $G = (V, E)$. A path $p$ is a sequence of nodes $\langle v_0, v_1, v_2, \ldots, v_k \rangle$, $v_i \in V$, such that $(v_{i-1}, v_i) \in E$ for $i = 1, 2, \ldots, k$. The length of the path is the number of edges in the path. If there exists a path $p$ from $u$ to $w$, we say that $w$ is reachable from $u$, written as $u \rightsquigarrow w$.
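The degree, path, and reachability definitions can be illustrated with a short Python sketch. The graph below is hypothetical, because the edges of Figure 2.1 cannot be recovered from the text, and all function names are illustrative. For distance and reachability the sketch uses plain breadth-first search, a textbook traversal swapped in for illustration; it is not GRIPP or any other implementation proposed in this thesis.

```python
from collections import deque

# Hypothetical directed graph for illustration only; the edges of
# Figure 2.1 are not recoverable from the text.
NODES = {"a", "b", "c", "d", "e", "f"}
EDGES = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")}


def in_degree(v, edges):
    """deg_in(v): number of edges with v as target node."""
    return sum(1 for _, target in edges if target == v)


def out_degree(v, edges):
    """deg_out(v): number of edges with v as start node."""
    return sum(1 for start, _ in edges if start == v)


def degree(v, edges):
    """deg(v): number of edges in which v participates (Definition 2.3)."""
    return sum(1 for start, target in edges if v in (start, target))


def is_path(seq, edges):
    """Definition 2.5: every consecutive pair (v_{i-1}, v_i) must be an edge."""
    return all((seq[i - 1], seq[i]) in edges for i in range(1, len(seq)))


def path_length(seq):
    """The length of a path <v_0, ..., v_k> is its number of edges, k."""
    return len(seq) - 1


def distance(u, w, edges):
    """Breadth-first search from u. Returns the length of a shortest path
    from u to w, or None if w is not reachable from u. The single node <u>
    is a path of length 0, so distance(u, u) == 0."""
    children = {}
    for start, target in edges:
        children.setdefault(start, []).append(target)
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == w:
            return dist[x]
        for y in children.get(x, []):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None


def reachable(u, w, edges):
    """u ~> w holds iff some path leads from u to w."""
    return distance(u, w, edges) is not None


print(degree("d", EDGES), in_degree("d", EDGES), out_degree("d", EDGES))  # 3 2 1
print(sum(degree(v, EDGES) for v in NODES))  # 10 == 2 * |E|
print(is_path(["a", "b", "c", "d"], EDGES), path_length(["a", "b", "c", "d"]))  # True 3
print(distance("a", "e", EDGES))  # 2, via a -> d -> e
print(reachable("e", "a", EDGES), reachable("f", "f", EDGES))  # False True
```

Summing the degrees over all nodes gives $10 = 2|E|$, since every edge (there are no self-loops here) is counted once at each of its two end nodes. Because BFS visits nodes in order of increasing distance, the first time it dequeues $w$ it has found a shortest path, which is why one routine serves both the reachability and the distance question.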


Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Biology 1406 Exam 3 Notes Structure of DNA Ch. 10 Genetic information (DNA) determines structure of proteins DNA RNA proteins cell structure 3.11 3.15 enzymes control cell chemistry ( metabolism ) Proteins

More information

SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE

SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE 2012 SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH (M.Sc., SFU, Russia) A THESIS

More information

Data Structure [Question Bank]

Data Structure [Question Bank] Unit I (Analysis of Algorithms) 1. What are algorithms and how they are useful? 2. Describe the factor on best algorithms depends on? 3. Differentiate: Correct & Incorrect Algorithms? 4. Write short note:

More information

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs Sherif Sakr National ICT Australia UNSW, Sydney, Australia ssakr@cse.unsw.edu.eu Sameh Elnikety Microsoft Research Redmond, WA, USA samehe@microsoft.com

More information

www.gr8ambitionz.com

www.gr8ambitionz.com Data Base Management Systems (DBMS) Study Material (Objective Type questions with Answers) Shared by Akhil Arora Powered by www. your A to Z competitive exam guide Database Objective type questions Q.1

More information

Graph theoretic approach to analyze amino acid network

Graph theoretic approach to analyze amino acid network Int. J. Adv. Appl. Math. and Mech. 2(3) (2015) 31-37 (ISSN: 2347-2529) Journal homepage: www.ijaamm.com International Journal of Advances in Applied Mathematics and Mechanics Graph theoretic approach to

More information

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92. Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure

More information

Data Lineage and Meta Data Analysis in Data Warehouse Environments

Data Lineage and Meta Data Analysis in Data Warehouse Environments Department of Informatics, University of Zürich BSc Thesis Data Lineage and Meta Data Analysis in Data Warehouse Environments Martin Noack Matrikelnummer: 09-222-232 Email: martin.noack@uzh.ch January

More information

Chapter 18 Regulation of Gene Expression

Chapter 18 Regulation of Gene Expression Chapter 18 Regulation of Gene Expression 18.1. Gene Regulation Is Necessary By switching genes off when they are not needed, cells can prevent resources from being wasted. There should be natural selection

More information

Databases and Information Systems 1 Part 3: Storage Structures and Indices

Databases and Information Systems 1 Part 3: Storage Structures and Indices bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -

More information

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005 V. Adamchik 1 Graph Theory Victor Adamchik Fall of 2005 Plan 1. Basic Vocabulary 2. Regular graph 3. Connectivity 4. Representing Graphs Introduction A.Aho and J.Ulman acknowledge that Fundamentally, computer

More information

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs Kousha Etessami U. of Edinburgh, UK Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 6) 1 / 13 Overview Graphs and Graph

More information

Visualizing Networks: Cytoscape. Prat Thiru

Visualizing Networks: Cytoscape. Prat Thiru Visualizing Networks: Cytoscape Prat Thiru Outline Introduction to Networks Network Basics Visualization Inferences Cytoscape Demo 2 Why (Biological) Networks? 3 Networks: An Integrative Approach Zvelebil,

More information

Load balancing Static Load Balancing

Load balancing Static Load Balancing Chapter 7 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

2007 7.013 Problem Set 1 KEY

2007 7.013 Problem Set 1 KEY 2007 7.013 Problem Set 1 KEY Due before 5 PM on FRIDAY, February 16, 2007. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. Where in a eukaryotic cell do you

More information

Algorithms in Computational Biology (236522) spring 2007 Lecture #1

Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Algorithms in Computational Biology (236522) spring 2007 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: Tuesday 11:00-12:00/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

Metabolic Network Analysis

Metabolic Network Analysis Metabolic Network nalysis Overview -- modelling chemical reaction networks -- Levels of modelling Lecture II: Modelling chemical reaction networks dr. Sander Hille shille@math.leidenuniv.nl http://www.math.leidenuniv.nl/~shille

More information

Gene Regulation -- The Lac Operon

Gene Regulation -- The Lac Operon Gene Regulation -- The Lac Operon Specific proteins are present in different tissues and some appear only at certain times during development. All cells of a higher organism have the full set of genes:

More information

Application of Graph-based Data Mining to Metabolic Pathways

Application of Graph-based Data Mining to Metabolic Pathways Application of Graph-based Data Mining to Metabolic Pathways Chang Hun You, Lawrence B. Holder, Diane J. Cook School of Electrical Engineering and Computer Science Washington State University Pullman,

More information

Map-like Wikipedia Visualization. Pang Cheong Iao. Master of Science in Software Engineering

Map-like Wikipedia Visualization. Pang Cheong Iao. Master of Science in Software Engineering Map-like Wikipedia Visualization by Pang Cheong Iao Master of Science in Software Engineering 2011 Faculty of Science and Technology University of Macau Map-like Wikipedia Visualization by Pang Cheong

More information

Chapter 7 Active Reading Guide Cellular Respiration and Fermentation

Chapter 7 Active Reading Guide Cellular Respiration and Fermentation Name: AP Biology Mr. Croft Chapter 7 Active Reading Guide Cellular Respiration and Fermentation Overview: Before getting involved with the details of cellular respiration and photosynthesis, take a second

More information

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34

Network Analysis. BCH 5101: Analysis of -Omics Data 1/34 Network Analysis BCH 5101: Analysis of -Omics Data 1/34 Network Analysis Graphs as a representation of networks Examples of genome-scale graphs Statistical properties of genome-scale graphs The search

More information

Inside the PostgreSQL Query Optimizer

Inside the PostgreSQL Query Optimizer Inside the PostgreSQL Query Optimizer Neil Conway neilc@samurai.com Fujitsu Australia Software Technology PostgreSQL Query Optimizer Internals p. 1 Outline Introduction to query optimization Outline of

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 Load Balancing and Termination Detection 1 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Web-Based Genomic Information Integration with Gene Ontology

Web-Based Genomic Information Integration with Gene Ontology Web-Based Genomic Information Integration with Gene Ontology Kai Xu 1 IMAGEN group, National ICT Australia, Sydney, Australia, kai.xu@nicta.com.au Abstract. Despite the dramatic growth of online genomic

More information

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains

From DNA to Protein. Proteins. Chapter 13. Prokaryotes and Eukaryotes. The Path From Genes to Proteins. All proteins consist of polypeptide chains Proteins From DNA to Protein Chapter 13 All proteins consist of polypeptide chains A linear sequence of amino acids Each chain corresponds to the nucleotide base sequence of a gene The Path From Genes

More information

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014 Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

More information

Binary Trees and Huffman Encoding Binary Search Trees

Binary Trees and Huffman Encoding Binary Search Trees Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary

More information

Intelligent Systems: Three Practical Questions. Carsten Rother

Intelligent Systems: Three Practical Questions. Carsten Rother Intelligent Systems: Three Practical Questions Carsten Rother 04/02/2015 Prüfungsfragen Nur vom zweiten Teil der Vorlesung (Dimitri Schlesinger, Carsten Rother) Drei Typen von Aufgaben: 1) Algorithmen

More information

Index Selection Techniques in Data Warehouse Systems

Index Selection Techniques in Data Warehouse Systems Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES

More information

Network (Tree) Topology Inference Based on Prüfer Sequence

Network (Tree) Topology Inference Based on Prüfer Sequence Network (Tree) Topology Inference Based on Prüfer Sequence C. Vanniarajan and Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai 600036 vanniarajanc@hcl.in,

More information

Query Processing C H A P T E R12. Practice Exercises

Query Processing C H A P T E R12. Practice Exercises C H A P T E R12 Query Processing Practice Exercises 12.1 Assume (for simplicity in this exercise) that only one tuple fits in a block and memory holds at most 3 blocks. Show the runs created on each pass

More information

Querying ontologies in relational database systems

Querying ontologies in relational database systems Querying ontologies in relational database systems Silke Trißl and Ulf Leser Humboldt-Universität zu Berlin, Institute of Computer Sciences, D-10099 Berlin, Germany {trissl, leser}@informatik.hu-berlin.de

More information

Optimization of SQL Queries in Main-Memory Databases

Optimization of SQL Queries in Main-Memory Databases Optimization of SQL Queries in Main-Memory Databases Ladislav Vastag and Ján Genči Department of Computers and Informatics Technical University of Košice, Letná 9, 042 00 Košice, Slovakia lvastag@netkosice.sk

More information

Customer Intimacy Analytics

Customer Intimacy Analytics Customer Intimacy Analytics Leveraging Operational Data to Assess Customer Knowledge and Relationships and to Measure their Business Impact by Francois Habryn Scientific Publishing CUSTOMER INTIMACY ANALYTICS

More information

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Why? A central concept in Computer Science. Algorithms are ubiquitous. Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

More information

Database-Supported XML Processors

Database-Supported XML Processors Database-Supported XML Processors Prof. Dr. Torsten Grust Technische Universität München grust@in.tum.de Winter Term 2005/06 Technische Universität München A Word About Myself 2 Torsten Grust Originally

More information

[Refer Slide Time: 05:10]

[Refer Slide Time: 05:10] Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture

More information

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

General Network Analysis: Graph-theoretic. COMP572 Fall 2009 General Network Analysis: Graph-theoretic Techniques COMP572 Fall 2009 Networks (aka Graphs) A network is a set of vertices, or nodes, and edges that connect pairs of vertices Example: a network with 5

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

Unit 5 Photosynthesis and Cellular Respiration

Unit 5 Photosynthesis and Cellular Respiration Unit 5 Photosynthesis and Cellular Respiration Advanced Concepts What is the abbreviated name of this molecule? What is its purpose? What are the three parts of this molecule? Label each part with the

More information

Vector storage and access; algorithms in GIS. This is lecture 6

Vector storage and access; algorithms in GIS. This is lecture 6 Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector

More information

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)?

Quiz! Database Indexes. Index. Quiz! Disc and main memory. Quiz! How costly is this operation (naive solution)? Database Indexes How costly is this operation (naive solution)? course per weekday hour room TDA356 2 VR Monday 13:15 TDA356 2 VR Thursday 08:00 TDA356 4 HB1 Tuesday 08:00 TDA356 4 HB1 Friday 13:15 TIN090

More information

Dynamics of Biological Systems

Dynamics of Biological Systems Dynamics of Biological Systems Part I - Biological background and mathematical modelling Paolo Milazzo (Università di Pisa) Dynamics of biological systems 1 / 53 Introduction The recent developments in

More information

Hormones & Chemical Signaling

Hormones & Chemical Signaling Hormones & Chemical Signaling Part 2 modulation of signal pathways and hormone classification & function How are these pathways controlled? Receptors are proteins! Subject to Specificity of binding Competition

More information

Relational Databases

Relational Databases Relational Databases Jan Chomicki University at Buffalo Jan Chomicki () Relational databases 1 / 18 Relational data model Domain domain: predefined set of atomic values: integers, strings,... every attribute

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 slides7-1 Load Balancing and Termination Detection slides7-2 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination

More information

Data storage Tree indexes

Data storage Tree indexes Data storage Tree indexes Rasmus Pagh February 7 lecture 1 Access paths For many database queries and updates, only a small fraction of the data needs to be accessed. Extreme examples are looking or updating

More information

Twincore - Zentrum für Experimentelle und Klinische Infektionsforschung Institut für Molekulare Bakteriologie

Twincore - Zentrum für Experimentelle und Klinische Infektionsforschung Institut für Molekulare Bakteriologie Twincore - Zentrum für Experimentelle und Klinische Infektionsforschung Institut für Molekulare Bakteriologie 0 HELMHOLTZ I ZENTRUM FÜR INFEKTIONSFORSCHUNG Technische Universität Braunschweig Institut

More information

Big Data and Scripting. Part 4: Memory Hierarchies

Big Data and Scripting. Part 4: Memory Hierarchies 1, Big Data and Scripting Part 4: Memory Hierarchies 2, Model and Definitions memory size: M machine words total storage (on disk) of N elements (N is very large) disk size unlimited (for our considerations)

More information

AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS

AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS AN AI PLANNING APPROACH FOR GENERATING BIG DATA WORKFLOWS Wesley Deneke 1, Wing-Ning Li 2, and Craig Thompson 2 1 Computer Science and Industrial Technology Department, Southeastern Louisiana University,

More information

DATA STRUCTURES USING C

DATA STRUCTURES USING C DATA STRUCTURES USING C QUESTION BANK UNIT I 1. Define data. 2. Define Entity. 3. Define information. 4. Define Array. 5. Define data structure. 6. Give any two applications of data structures. 7. Give

More information

Cell Structure & Function!

Cell Structure & Function! Cell Structure & Function! Chapter 3! The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny.! -- Isaac Asimov Animal Cell Plant Cell Cell

More information

Activity 7.21 Transcription factors

Activity 7.21 Transcription factors Purpose To consolidate understanding of protein synthesis. To explain the role of transcription factors and hormones in switching genes on and off. Play the transcription initiation complex game Regulation

More information

Database-Supported XML Processors

Database-Supported XML Processors Database-Supported XML Processors Prof. Dr. Torsten Grust torsten.grust@uni-tuebingen.de Winter 2008/2009 Torsten Grust (WSI) Database-Supported XML Processors Winter 2008/09 1 Part I Preliminaries Torsten

More information

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr.

Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Query Optimization for Distributed Database Systems Robert Taylor Candidate Number : 933597 Hertford College Supervisor: Dr. Dan Olteanu Submitted as part of Master of Computer Science Computing Laboratory

More information

Distance Degree Sequences for Network Analysis

Distance Degree Sequences for Network Analysis Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

More information

Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in

Name Date Period. 2. When a molecule of double-stranded DNA undergoes replication, it results in DNA, RNA, Protein Synthesis Keystone 1. During the process shown above, the two strands of one DNA molecule are unwound. Then, DNA polymerases add complementary nucleotides to each strand which results

More information

for High Performance Computing

for High Performance Computing Technische Universität München Institut für Informatik Lehrstuhl für Rechnertechnik und Rechnerorganisation Automatic Performance Engineering Workflows for High Performance Computing Ventsislav Petkov

More information

AP WORLD LANGUAGE AND CULTURE EXAMS 2012 SCORING GUIDELINES

AP WORLD LANGUAGE AND CULTURE EXAMS 2012 SCORING GUIDELINES AP WORLD LANGUAGE AND CULTURE EXAMS 2012 SCORING GUIDELINES Interpersonal Writing: E-mail Reply 5: STRONG performance in Interpersonal Writing Maintains the exchange with a response that is clearly appropriate

More information

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes. 1. The advantage of.. is that they solve the problem if sequential storage representation. But disadvantage in that is they are sequential lists. [A] Lists [B] Linked Lists [A] Trees [A] Queues 2. The

More information

How To Find Local Affinity Patterns In Big Data

How To Find Local Affinity Patterns In Big Data Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class

More information

AP BIOLOGY CHAPTER 7 Cellular Respiration Outline

AP BIOLOGY CHAPTER 7 Cellular Respiration Outline AP BIOLOGY CHAPTER 7 Cellular Respiration Outline I. How cells get energy. A. Cellular Respiration 1. Cellular respiration includes the various metabolic pathways that break down carbohydrates and other

More information

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams.

Module 3 Questions. 7. Chemotaxis is an example of signal transduction. Explain, with the use of diagrams. Module 3 Questions Section 1. Essay and Short Answers. Use diagrams wherever possible 1. With the use of a diagram, provide an overview of the general regulation strategies available to a bacterial cell.

More information

Chapter 7 Load Balancing and Termination Detection

Chapter 7 Load Balancing and Termination Detection Chapter 7 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection

More information

Extraction and Visualization of Protein-Protein Interactions from PubMed

Extraction and Visualization of Protein-Protein Interactions from PubMed Extraction and Visualization of Protein-Protein Interactions from PubMed Ulf Leser Knowledge Management in Bioinformatics Humboldt-Universität Berlin Finding Relevant Knowledge Find information about Much

More information

GENE REGULATION. Teacher Packet

GENE REGULATION. Teacher Packet AP * BIOLOGY GENE REGULATION Teacher Packet AP* is a trademark of the College Entrance Examination Board. The College Entrance Examination Board was not involved in the production of this material. Pictures

More information

Student name ID # 2. (4 pts) What is the terminal electron acceptor in respiration? In photosynthesis? O2, NADP+

Student name ID # 2. (4 pts) What is the terminal electron acceptor in respiration? In photosynthesis? O2, NADP+ 1. Membrane transport. A. (4 pts) What ion couples primary and secondary active transport in animal cells? What ion serves the same function in plant cells? Na+, H+ 2. (4 pts) What is the terminal electron

More information

TIn 1: Lecture 3: Lernziele. Lecture 3 The Belly of the Architect. Basic internal components of the 8086. Pointers and data storage in memory

TIn 1: Lecture 3: Lernziele. Lecture 3 The Belly of the Architect. Basic internal components of the 8086. Pointers and data storage in memory Mitglied der Zürcher Fachhochschule TIn 1: Lecture 3 The Belly of the Architect. Lecture 3: Lernziele Basic internal components of the 8086 Pointers and data storage in memory Architektur 8086 Besteht

More information

Load Balancing and Termination Detection

Load Balancing and Termination Detection Chapter 7 Slide 1 Slide 2 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination

More information

How To Optimize A Query With Dags

How To Optimize A Query With Dags Efficient Generation and Execution of DAG-Structured Query Graphs Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universität Mannheim vorgelegt von

More information

A Comparison of Dictionary Implementations

A Comparison of Dictionary Implementations A Comparison of Dictionary Implementations Mark P Neyer April 10, 2009 1 Introduction A common problem in computer science is the representation of a mapping between two sets. A mapping f : A B is a function

More information

Chapter 2 Data Storage

Chapter 2 Data Storage Chapter 2 22 CHAPTER 2. DATA STORAGE 2.1. THE MEMORY HIERARCHY 23 26 CHAPTER 2. DATA STORAGE main memory, yet is essentially random-access, with relatively small differences Figure 2.4: A typical

More information

A Fast Algorithm For Finding Hamilton Cycles

A Fast Algorithm For Finding Hamilton Cycles A Fast Algorithm For Finding Hamilton Cycles by Andrew Chalaturnyk A thesis presented to the University of Manitoba in partial fulfillment of the requirements for the degree of Masters of Science in Computer

More information

Chapter 9 Mitochondrial Structure and Function

Chapter 9 Mitochondrial Structure and Function Chapter 9 Mitochondrial Structure and Function 1 2 3 Structure and function Oxidative phosphorylation and ATP Synthesis Peroxisome Overview 2 Mitochondria have characteristic morphologies despite variable

More information

Exemplar for Internal Achievement Standard. German Level 1

Exemplar for Internal Achievement Standard. German Level 1 Exemplar for Internal Achievement Standard German Level 1 This exemplar supports assessment against: Achievement Standard 90885 Interact using spoken German to communicate personal information, ideas and

More information

Mining Social-Network Graphs

Mining Social-Network Graphs 342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is

More information

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere! Interconnection Networks Interconnection Networks Interconnection networks are used everywhere! Supercomputers connecting the processors Routers connecting the ports can consider a router as a parallel

More information

Clustering & Visualization

Clustering & Visualization Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.

More information

Behavioral Service Substitution: Analysis and Synthesis

Behavioral Service Substitution: Analysis and Synthesis Behavioral Service Substitution: Analysis and Synthesis D I S S E R T A T I O N zur Erlangung des akademischen Grades Dr. rer. nat. im Fach Informatik eingereicht an der Mathematisch-Naturwissenschaftlichen

More information

Regulation of enzyme activity

Regulation of enzyme activity 1 Regulation of enzyme activity Regulation of enzyme activity is important to coordinate the different metabolic processes. It is also important for homeostasis i.e. to maintain the internal environment

More information