RANDOMIZED ALGORITHMS
- Cornelius Murphy
- 7 years ago
Transcription
1 RANDOMIZED ALGORITHMS
2 CONCEPTS
- Deterministic algorithms always perform the same series of operations (and produce the same result) on a given input.
- Randomized algorithms may execute a different series of operations, and potentially produce different results, on a given input.
Why use randomized algorithms?
- Improve the expected run time on worst-case inputs.
- Accelerate execution at the cost of making errors infrequently.
- Approximate solutions to difficult problems.
3 WE'VE ALREADY SEEN RANDOM ALGORITHMS
- Simulations
- Bootstrap
- Random starting points in K-means clustering
4 LINEAR SEARCH
Consider searching an (unsorted) array of N elements for a key.
Data: an array of N elements and a key k
Result: the index of k in N, or -1 if not found
    for i = 0 to N-1 do
        if N[i] = k then
            return i;
        end
    end
    return -1;
Worst case run time: O(N)
For a given key, consider an adversarial opponent whose job is to make the algorithm perform as poorly as possible. If the adversary knows the key, they can always achieve the worst case by placing the key last.
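The pseudocode above can be sketched in Python as follows. This is a minimal illustration; the function name `linear_search` and the returned comparison count are additions made here so the adversarial case is visible.

```python
def linear_search(items, key):
    """Scan left to right; return (index, comparisons), with index -1 if absent."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == key:
            return i, comparisons
    return -1, comparisons


data = [7, 3, 9, 1, 5]
print(linear_search(data, 7))  # key placed first: one comparison
print(linear_search(data, 5))  # key placed last (adversarial): N comparisons
```

Placing the key last forces all N comparisons, which is the worst case the adversary can always arrange against a fixed scan order.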
5 RANDOMIZED LINEAR SEARCH
Half the time, search left to right; the other half, search right to left.
Data: an array of N elements and a key k
Result: the index of k in N, or -1 if not found
    if Random(0, 1) < 0.5 then
        for i = 0 to N-1 do
            if N[i] = k then
                return i;
            end
        end
    else
        for i = N-1 downto 0 do
            if N[i] = k then
                return i;
            end
        end
    end
    return -1;
What is the expected run time for a given key? If the key is in position i (counting from 1), then:
    E[run time] = 1/2 i + 1/2 (N+1-i) = 1/2 (N+1)
Because the adversary doesn't know which direction we will choose, the worst they can do now is make us pay (N+1)/2 comparisons in expectation, which is what placing the key in the middle costs; the expectation is the same for every position.
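A minimal Python sketch of the randomized search, together with the expected-cost formula from the slide. The names `randomized_linear_search` and `expected_comparisons` are hypothetical, and the sketch assumes the keys in the array are distinct (with duplicates, the two directions may return different indices).

```python
import random


def randomized_linear_search(items, key, rng=random):
    """Search left-to-right or right-to-left, each with probability 1/2."""
    n = len(items)
    if rng.random() < 0.5:
        order = range(n)              # left to right
    else:
        order = range(n - 1, -1, -1)  # right to left
    for i in order:
        if items[i] == key:
            return i
    return -1


def expected_comparisons(n, position):
    """Expected comparisons when the key sits at 1-based `position`."""
    return 0.5 * position + 0.5 * (n + 1 - position)


print(randomized_linear_search([7, 3, 9, 1, 5], 9))
print([expected_comparisons(9, i) for i in range(1, 10)])
```

For N = 9 the second line prints the same value for every position, which is the point of the randomization: the adversary gains nothing from choosing where the key goes.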
6 COMPARING STRINGS
To decide whether two strings of equal length are equal, we may choose to compare only a random subset of positions (e.g. about every 10th position) and report the strings as equal if they match on that subset. The expected run time is better than that of a direct comparison, but there is now a possibility of making an error: reporting that two strings are equal when they are not. The error rate analysis depends on the assumptions we make about the text.
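One way to sketch this idea in Python is shown below. The function name `probably_equal` and the `samples` parameter are assumptions made for illustration, and positions are drawn independently with replacement, which is only one of several possible sampling schemes.

```python
import random


def probably_equal(a, b, samples=10, rng=random):
    """Monte Carlo equality test for two equal-length strings.

    Equal strings are always reported as equal. Unequal strings may be
    reported as equal if they differ only at positions that were not sampled.
    """
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    if not a:
        return True
    for _ in range(samples):
        i = rng.randrange(len(a))  # pick a random position
        if a[i] != b[i]:
            return False           # a mismatch is conclusive
    return True


print(probably_equal("acgt" * 10, "acgt" * 10))
print(probably_equal("a" * 50, "c" * 50))
```

Under this sampling scheme, if the strings differ in a fraction d of their positions, the chance of wrongly reporting "equal" after k samples is (1 - d)^k, so the error depends heavily on how the differences are distributed in the text.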
7 Las Vegas algorithms are randomized algorithms that always return a correct result; only their run time is random. Monte Carlo algorithms, on the other hand, can make errors. Useful Monte Carlo algorithms make errors infrequently.
8 RANDOMIZED QUICKSORT
Quicksort is a classical example of the divide-and-conquer approach.
- Given an array of N elements, select a pivot element p of this array and split the array into two subarrays:
  - elements smaller than p
  - elements greater than p
- Repeat the process recursively in each of the subarrays.
- Recursion terminates when the size of the subarray is 0 or 1.
- The sorted list can be assembled as the recursion returns.
9 QUICKSORT PERFORMANCE
If the pivot splits the elements into proportions c and 1-c at each step, then the following recursion holds for Quicksort complexity:
    T(N) = T(cN) + T((1-c)N) + O(N)
where T(cN) is the cost of sorting subarray 1, T((1-c)N) is the cost of sorting subarray 2, and O(N) is the cost of splitting the array into two subarrays.
If we think of the recursion process as a binary tree (each level of recursion is a new level), then the total work performed at each level is O(N). If the tree is reasonably balanced (c stays bounded away from 0 and 1), then its depth will be O(log N), giving a total run time complexity of O(N log N).
10 Figure: choose the midpoint as the pivot. Not a very balanced tree.
11 Figure: choose the best element as the pivot. A better-balanced tree.
12 QUICKSORT PERFORMANCE
A poor choice of pivots can lead to quadratic run time. Assuming a uniform distribution of values in the array, if we pick a pivot at random, then 50% of the time it will split the array no worse than 25% to 75%. Hence good splits are obtained frequently, and the algorithm has an expected run time of O(N log N).
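A short Python sketch of quicksort with a uniformly random pivot follows. It is not an in-place implementation, and it adds a third group for elements equal to the pivot (an assumption beyond the two subarrays on the slide) so that duplicates are handled.

```python
import random


def randomized_quicksort(values, rng=random):
    """Return a sorted copy of `values`, choosing each pivot uniformly at random."""
    if len(values) <= 1:
        return list(values)  # recursion terminates at size 0 or 1
    pivot = values[rng.randrange(len(values))]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    # Assemble the sorted list as the recursion returns.
    return randomized_quicksort(smaller, rng) + equal + randomized_quicksort(larger, rng)


print(randomized_quicksort([5, 2, 9, 1, 5, 6]))
```

The output is always correctly sorted; only the run time depends on the random pivot choices, which makes this a Las Vegas algorithm.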
13 MOTIF FINDING
Given a collection of T strings, find the best pattern of length L that appears in all sequences. Best could mean:
- exactly contained
- close to a specified sequence profile
The idea is to recast the problem in terms of a randomized profile alignment. For example, the first nucleotides in HIV-1 protease are highly conserved and can be thought of as a motif.
(Figure: nucleotide profile with rows A, C, G, T.)
14 SEQUENCE PROFILE
We can build a matrix giving the probability of finding a given nucleotide in the i-th position of a motif, to allow some mismatches. E.g. CCCATTAGTC is the consensus decamer at the beginning of HIV-1 protease. Some positions allow more variability than others (e.g. 2 vs ...).
(Figure: profile matrix with rows A, C, G, T.)
15 SEQUENCE PROFILE
The profile defines a probability distribution that can be used to score other motifs. To compute the score of a motif, we evaluate the probability that it was generated by the profile. E.g.
    Pr(CCCATTAGTC) = 0.82
    Pr(CCTATTAGTC) = 0.07
    Pr(AAAAAAAAAA) = ...
(Figure: profile matrix with rows A, C, G, T.)
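Scoring a motif against a profile amounts to multiplying one probability per position. The sketch below uses a small three-column profile invented for illustration (it is not the HIV-1 protease profile from the slide); `motif_probability` is a hypothetical name.

```python
from fractions import Fraction
from math import prod

# Hypothetical 3-column profile, invented for illustration only.
PROFILE = [
    {"A": Fraction(7, 10), "C": Fraction(1, 10), "G": Fraction(1, 10), "T": Fraction(1, 10)},
    {"A": Fraction(1, 10), "C": Fraction(8, 10), "G": Fraction(1, 20), "T": Fraction(1, 20)},
    {"A": Fraction(1, 4), "C": Fraction(1, 4), "G": Fraction(1, 4), "T": Fraction(1, 4)},
]


def motif_probability(motif, profile):
    """Probability that `profile` generates `motif` (product over positions)."""
    if len(motif) != len(profile):
        raise ValueError("motif and profile must have the same length")
    return prod(column[letter] for letter, column in zip(motif, profile))


print(motif_probability("ACG", PROFILE))  # close to the consensus of this toy profile
print(motif_probability("TTT", PROFILE))  # poor match
```

Exact fractions are used here only to keep the arithmetic easy to check by hand; floating-point probabilities would work the same way.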
16 P-MOST PROBABLE L-MER
Given a sequence profile P on L letters and a string of length N, we define the P-most probable L-mer of the string as the one that has the highest probability as measured by the profile P.
- Scan the string left to right, considering all L-mers.
- Compute the probability of generating each L-mer using the profile.
- Select the one with the highest score.
This can be computed in time O(LN).
Note: in practice, zero probabilities for some letters in a given position will be replaced with small numbers. For example, if none of 1000 training strings had a C in the third position, instead of assigning Pr(C,3) = 0, we may instead set Pr(C,3) = 1/1001.
17 P-MOST PROBABLE L-MERS IN MANY SEQUENCES
Find the P-most probable L-mer in each of the sequences.
P =
         1     2     3     4     5     6
    A   1/2   7/8   3/8    0    1/8    0
    C   1/8    0    1/2   5/8   3/8    0
    T   1/8   1/8    0     0    1/4   7/8
    G   1/4    0    1/8   3/8   1/4   1/8
Sequences:
    ctataaacgttacatc
    atagcgattcgactg
    cagcccagaaccct
    cggtataccttacatc
    tgcattcaatagctta
    tatcctttccactcac
    ctccaaatcctttaca
    ggtcatcctttatcct
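The scan described for the P-most probable L-mer can be sketched as below, applied to the first two sequences of this slide. The T row of P is filled in here so that every column sums to 1, which is an assumption where the transcription is damaged; `most_probable_lmer` is a hypothetical name, and ties are resolved in favor of the leftmost L-mer.

```python
from fractions import Fraction as F

# Profile P from the slide, one list per nucleotide (columns 1..6).
# Assumption: the T row is completed so that each column sums to 1.
P = {
    "A": [F(1, 2), F(7, 8), F(3, 8), F(0), F(1, 8), F(0)],
    "C": [F(1, 8), F(0), F(1, 2), F(5, 8), F(3, 8), F(0)],
    "T": [F(1, 8), F(1, 8), F(0), F(0), F(1, 4), F(7, 8)],
    "G": [F(1, 4), F(0), F(1, 8), F(3, 8), F(1, 4), F(1, 8)],
}


def most_probable_lmer(text, profile):
    """Return (0-based start, L-mer, probability) of the P-most probable L-mer."""
    length = len(profile["A"])
    best = None
    for start in range(len(text) - length + 1):
        lmer = text[start:start + length]
        p = F(1)
        for i, letter in enumerate(lmer):
            p *= profile[letter][i]
        if best is None or p > best[2]:
            best = (start, lmer, p)
    return best


print(most_probable_lmer("CTATAAACGTTACATC", P))
print(most_probable_lmer("ATAGCGATTCGACTG", P))
```

Each of the N - L + 1 windows costs L multiplications, which matches the O(LN) bound quoted for this procedure.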
18 Initial profile
P-most probable L-mers, one per sequence:
    1   a a a c g t
    2   a t a g c g
    3   a a c c c t
    4   g a a c c t
    5   a t a g c t
    6   g a c c t g
    7   a t c c t t
    8   t a c c t t
Profile computed from these eight L-mers:
         1     2     3     4     5     6
    A   5/8   5/8   4/8    0     0     0
    C    0     0    4/8   6/8   4/8    0
    T   1/8   3/8    0     0    3/8   6/8
    G   2/8    0     0    2/8   1/8   2/8
Sequences:
    ctataaacgttacatc
    atagcgattcgactg
    cagcccagaaccct
    cggtgaaccttacatc
    tgcattcaatagctta
    tgtcctgtccactcac
    ctccaaatcctttaca
    ggtctacctttatcct
Generate an updated profile using the new set of P-most probable L-mers. On the slide, red marks an increase in frequency and blue a decrease.
Profile P from the previous slide, for comparison:
         1     2     3     4     5     6
    A   1/2   7/8   3/8    0    1/8    0
    C   1/8    0    1/2   5/8   3/8    0
    T   1/8   1/8    0     0    1/4   7/8
    G   1/4    0    1/8   3/8   1/4   1/8
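Building a profile from a set of L-mers is a column-wise frequency count. The sketch below applies it to the eight L-mers listed on this slide; `build_profile` is a hypothetical name, and no pseudocounts are added.

```python
from fractions import Fraction


def build_profile(lmers, alphabet="ACGT"):
    """Return {letter: [frequency in column 1, column 2, ...]} for equal-length L-mers."""
    if len({len(s) for s in lmers}) != 1:
        raise ValueError("all L-mers must have the same length")
    columns = list(zip(*lmers))  # one tuple of letters per motif position
    return {
        letter: [Fraction(col.count(letter), len(lmers)) for col in columns]
        for letter in alphabet
    }


# The eight L-mers from the slide, in upper case.
LMERS = ["AAACGT", "ATAGCG", "AACCCT", "GAACCT", "ATAGCT", "GACCTG", "ATCCTT", "TACCTT"]
profile = build_profile(LMERS)
for letter in "ACTG":
    print(letter, " ".join(str(f) for f in profile[letter]))
```

The printed fractions are shown in lowest terms (for example 4/8 appears as 1/2), but they are the same values as in the matrix computed from the eight L-mers.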
19 GREEDY PROFILE MOTIF SEARCH
Use P-most probable L-mers to adjust start positions until we reach a best profile; this is the motif.
1. Select random starting positions.
2. Create a profile P from the substrings at these starting positions.
3. Find the P-most probable L-mer a in each sequence and change the starting position to the starting position of a.
4. Compute a new profile based on the new starting positions after each iteration and proceed until we cannot increase the score.
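The four steps above can be sketched as follows. Several details are assumptions made for this illustration and are not stated on the slide: the score is the consensus score (sum over columns of the count of the most frequent letter), the profile uses add-one pseudocounts, and every sequence is assumed to be at least L letters long. All function names are hypothetical.

```python
import random
from collections import Counter


def profile_with_pseudocounts(lmers, alphabet="ACGT"):
    """Per-column letter probabilities with add-one pseudocounts (an assumption)."""
    total = len(lmers) + len(alphabet)
    return [{a: (col.count(a) + 1) / total for a in alphabet} for col in zip(*lmers)]


def most_probable_start(seq, profile):
    """0-based start of the profile-most probable L-mer in seq (leftmost on ties)."""
    length = len(profile)

    def prob(start):
        p = 1.0
        for letter, column in zip(seq[start:start + length], profile):
            p *= column[letter]
        return p

    return max(range(len(seq) - length + 1), key=prob)


def consensus_score(lmers):
    """Sum over columns of the count of the most frequent letter."""
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*lmers))


def greedy_profile_motif_search(seqs, length, rng=random):
    """Return (starts, score) after iterating until the score stops increasing."""
    starts = [rng.randrange(len(s) - length + 1) for s in seqs]  # step 1
    score = consensus_score([s[i:i + length] for s, i in zip(seqs, starts)])
    while True:
        lmers = [s[i:i + length] for s, i in zip(seqs, starts)]
        profile = profile_with_pseudocounts(lmers)                       # step 2
        new_starts = [most_probable_start(s, profile) for s in seqs]     # step 3
        new_score = consensus_score(
            [s[i:i + length] for s, i in zip(seqs, new_starts)])
        if new_score <= score:                                           # step 4
            return starts, score
        starts, score = new_starts, new_score


print(greedy_profile_motif_search(["ACGT", "ACGT", "ACGT"], 4))
```

The loop terminates because the score is an integer that strictly increases on every accepted iteration and cannot exceed L times the number of sequences. In practice one would call the function many times with different random starts and keep the best result.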
20 PERFORMANCE? Since we choose starting positions randomly, there is little chance that our guess will be close to an optimal motif, meaning it will take a very long time to find the optimal motif. It is unlikely that the random starting positions will lead us to the correct solution at all. In practice, this algorithm is run many times with the hope that random starting positions will be close to the optimum solution simply by chance.
21 GIBBS SAMPLING
We can improve the algorithm by introducing Gibbs sampling, an iterative procedure that discards one L-mer at each iteration and replaces it with a new one. Gibbs sampling proceeds more slowly and chooses new L-mers at random, increasing the odds that it will converge to the correct solution. Gibbs sampling is a general class of sampling procedures used for approximating complex, difficult-to-compute distributions: in our case, the distribution of profiles P.
22 HOW GIBBS SAMPLING WORKS
1. Randomly choose starting positions s = (s_1, ..., s_T) and form the set of L-mers associated with these starting positions.
2. Randomly choose one of the T sequences.
3. Create a profile P from the L-mers in the other T-1 sequences.
4. For each position in the excluded sequence, calculate the probability that the L-mer starting at that position was generated by P.
5. Choose a new starting position for the excluded sequence at random based on the probabilities from step 4.
6. Repeat steps 2-5 until there is no improvement.
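A compact Python sketch of steps 1 to 6 is given below. Two simplifications are assumptions of this sketch rather than part of the slide: the loop runs for a fixed number of iterations instead of testing for "no improvement", and the profile uses add-one pseudocounts so that every candidate position has a nonzero weight. The name `gibbs_motif_sampler` is hypothetical.

```python
import random


def gibbs_motif_sampler(seqs, length, iterations=200, rng=random, alphabet="ACGT"):
    """Return 0-based motif start positions, one per sequence."""
    # Step 1: random starting positions.
    starts = [rng.randrange(len(s) - length + 1) for s in seqs]
    for _ in range(iterations):
        # Step 2: pick one sequence to exclude.
        excluded = rng.randrange(len(seqs))
        # Step 3: profile from the L-mers of the other sequences (add-one pseudocounts).
        others = [s[i:i + length]
                  for k, (s, i) in enumerate(zip(seqs, starts)) if k != excluded]
        total = len(others) + len(alphabet)
        profile = [{a: (col.count(a) + 1) / total for a in alphabet}
                   for col in zip(*others)]
        # Step 4: probability of each L-mer in the excluded sequence.
        seq = seqs[excluded]
        weights = []
        for start in range(len(seq) - length + 1):
            p = 1.0
            for letter, column in zip(seq[start:start + length], profile):
                p *= column[letter]
            weights.append(p)
        # Step 5: sample a new start in proportion to those probabilities.
        starts[excluded] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


SEQS = ["GTAAACAATATTTATAGC", "AAAATTTACCTTAGAAGG", "CCGTACTGTCAAGCGTGG",
        "TGAGTAAACGACGTCCCA", "TACTTAACACCCTGTCAA"]
print(gibbs_motif_sampler(SEQS, 8, rng=random.Random(0)))
```

`random.choices` normalizes the weights internally, so the explicit "divide by the sum" step is not needed in code. Because the new position is sampled rather than chosen greedily, the sampler can leave a poor local optimum that the greedy search would be stuck in.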
23 GIBBS SAMPLING: AN EXAMPLE
Input: T = 5 sequences, motif length L = 8
1. GTAAACAATATTTATAGC
2. AAAATTTACCTTAGAAGG
3. CCGTACTGTCAAGCGTGG
4. TGAGTAAACGACGTCCCA
5. TACTTAACACCCTGTCAA
24 STEP 1
Randomly choose starting positions.
1. GTAAACAATATTTATAGC (7)
2. AAAATTTACCTTAGAAGG (11)
3. CCGTACTGTCAAGCGTGG (9)
4. TGAGTAAACGACGTCCCA (4)
5. TACTTAACACCCTGTCAA (1)
25 STEP 2
Exclude one sequence at random (here, sequence 2).
1. GTAAACAATATTTATAGC (7)
2. AAAATTTACCTTAGAAGG (11) (excluded)
3. CCGTACTGTCAAGCGTGG (9)
4. TGAGTAAACGACGTCCCA (4)
5. TACTTAACACCCTGTCAA (1)
26 STEP 3
Create the octamer profile from sequences 1, 3, 4, 5.
         1     2     3     4     5     6     7     8
    1    A     A     T     A     T     T     T     A
    3    T     C     A     A     G     C     G     T
    4    G     T     A     A     A     C     G     A
    5    T     A     C     T     T     A     A     C

    A   1/4   2/4   2/4   3/4   1/4   1/4   1/4   2/4
    C    0    1/4   1/4    0     0    2/4    0    1/4
    T   2/4   1/4   1/4   1/4   2/4   1/4   1/4   1/4
    G   1/4    0     0     0    1/4    0    2/4    0
Consensus string: T A A A T C G A
27 STEP 4
Calculate the probability of every octamer from the excluded sequence (2), AAAATTTACCTTAGAAGG.
    Start   Octamer     Probability
      1     AAAATTTA    48/4^8 = 0.000732
      2     AAATTTAC     8/4^8 = 0.000122
      3     AATTTACC    0
      4     ATTTACCT    0
      5     TTTACCTT    0
      6     TTACCTTA    0
      7     TACCTTAG    0
      8     ACCTTAGA     8/4^8 = 0.000122
      9     CCTTAGAA    0
     10     CTTAGAAG    0
     11     TTAGAAGG    0
Normalize (divide by the sum) to obtain a proper probability distribution.
28 STEP 5
Select the starting position of the octamer in string 2 using the just-computed probabilities:
    P(selecting starting position 1): .750
    P(selecting starting position 2): .125
    P(selecting starting position 8): .125
Go back to Step 2 until there is no change in the profile P.
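Steps 3 to 5 of the example can be recomputed directly from the four octamers listed in step 3. In this sketch the profile is counted from those octamers without pseudocounts; column 7 contains T, G, G, A, so G has frequency 2/4 there. Variable names are hypothetical.

```python
import random
from fractions import Fraction

OCTAMERS = ["AATATTTA", "TCAAGCGT", "GTAAACGA", "TACTTAAC"]  # sequences 1, 3, 4, 5
EXCLUDED = "AAAATTTACCTTAGAAGG"                               # sequence 2
L = 8

# Step 3: column-wise letter frequencies of the four octamers.
profile = [{a: Fraction(col.count(a), len(OCTAMERS)) for a in "ACGT"}
           for col in zip(*OCTAMERS)]

# Step 4: probability of every octamer in the excluded sequence.
weights = []
for start in range(len(EXCLUDED) - L + 1):
    p = Fraction(1)
    for letter, column in zip(EXCLUDED[start:start + L], profile):
        p *= column[letter]
    weights.append(p)

# Normalize over 1-based starting positions, keeping only nonzero entries.
total = sum(weights)
distribution = {start + 1: w / total for start, w in enumerate(weights) if w}
print(distribution)

# Step 5: sample the new starting position for sequence 2.
new_start = random.choices(list(distribution),
                           weights=[float(w) for w in distribution.values()])[0]
print(new_start)
```

Only starting positions 1, 2 and 8 receive nonzero probability, so the sampled position is always one of those three; position 1 is the most likely but is not guaranteed, which is what distinguishes Gibbs sampling from the greedy update.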
29 GIBBS SAMPLER IN PRACTICE
- Gibbs sampling needs to be modified when applied to samples with unequal distributions of nucleotides (relative entropy approach).
- Gibbs sampling often converges to locally optimal motifs rather than globally optimal motifs.
- It needs to be run with many randomly chosen seeds to achieve good results.
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
More informationData Structures and Data Manipulation
Data Structures and Data Manipulation What the Specification Says: Explain how static data structures may be used to implement dynamic data structures; Describe algorithms for the insertion, retrieval
More informationUniversal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.
Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we
More informationPredicting daily incoming solar energy from weather data
Predicting daily incoming solar energy from weather data ROMAIN JUBAN, PATRICK QUACH Stanford University - CS229 Machine Learning December 12, 2013 Being able to accurately predict the solar power hitting
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
More informationPrevious Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles
B-Trees Algorithms and data structures for external memory as opposed to the main memory B-Trees Previous Lectures Height balanced binary search trees: AVL trees, red-black trees. Multiway search trees:
More informationThinking of a (block) cipher as a permutation (depending on the key) on strings of a certain size, we would not want such a permutation to have many
Fixed points of permutations Let f : S S be a permutation of a set S. An element s S is a fixed point of f if f(s) = s. That is, the fixed points of a permutation are the points not moved by the permutation.
More informationOutline. Introduction Linear Search. Transpose sequential search Interpolation search Binary search Fibonacci search Other search techniques
Searching (Unit 6) Outline Introduction Linear Search Ordered linear search Unordered linear search Transpose sequential search Interpolation search Binary search Fibonacci search Other search techniques
More informationRoots of Equations (Chapters 5 and 6)
Roots of Equations (Chapters 5 and 6) Problem: given f() = 0, find. In general, f() can be any function. For some forms of f(), analytical solutions are available. However, for other functions, we have
More informationMotivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information a
CSE 220: Handout 29 Disjoint Sets 1 Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information about relations
More informationExam study sheet for CS2711. List of topics
Exam study sheet for CS2711 Here is the list of topics you need to know for the final exam. For each data structure listed below, make sure you can do the following: 1. Give an example of this data structure
More informationBLAST. Anders Gorm Pedersen & Rasmus Wernersson
BLAST Anders Gorm Pedersen & Rasmus Wernersson Database searching Using pairwise alignments to search databases for similar sequences Query sequence Database Database searching Most common use of pairwise
More informationLab 11. Simulations. The Concept
Lab 11 Simulations In this lab you ll learn how to create simulations to provide approximate answers to probability questions. We ll make use of a particular kind of structure, called a box model, that
More informationRotation Operation for Binary Search Trees Idea:
Rotation Operation for Binary Search Trees Idea: Change a few pointers at a particular place in the tree so that one subtree becomes less deep in exchange for another one becoming deeper. A sequence of
More informationData Structures and Algorithms
Data Structures and Algorithms Computational Complexity Escola Politècnica Superior d Alcoi Universitat Politècnica de València Contents Introduction Resources consumptions: spatial and temporal cost Costs
More informationR-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants
R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions
More information