# New Hash Function Construction for Textual and Geometric Data Retrieval

Save this PDF as:

Size: px
Start display at page:

Download "New Hash Function Construction for Textual and Geometric Data Retrieval"

## Transcription

1 Latest Trends on Computers, Vol., pp , ISBN , ISSN 79-45, CSCC conference, Corfu, Greece, New Hash Function Construction for Textual and Geometric Data Retrieval Václav Skala, Jan Hrádek, Martin Kuchař University of West Bohemia Department of Computer Science and Engineering CZ 36 4 Plzen, Czech Republic ABSTRACT Techniques based on hashing are heavily used in many applications, e.g. information retrieval, geometry processing, chemical and medical applications etc. and even in cryptography. Traditionally the hash functions are considered in a form of h(v) = f(v) mod m, where m is considered as a prime number and f(v) is a function over the element v, which is generally of unlimited dimensionality and/or of unlimited range of values. In this paper a new approach for a hash function construction is presented which offers unique properties for textual and geometric data. Textual data have a limited range of values (the alphabet size) and unlimited dimensionality (the string length), while geometric data have unlimited range of values (usually (-, ) ), but limited dimensionality (usually or 3). Construction of the hash function differs for textual and geometric data and the proposed hash construction has been verified on non-trivial data sets. Categories and Subject Descriptors H.3. [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; [Theory of Computation] Miscelaneous General Terms Your general terms must be any of the following 6 designated terms: Algorithms, Performance, Experimentation, Verification, Theory. Keywords Hash function, textual data, geometrical data, indexing, data retrieval,. INTRODUCTION Hashing techniques are very popular and used in many applications. Their main advantage is a fast retrieval, in the ideal case with O() complexity. Unfortunately, in text processing a geometrical data processing is quite different as hash functions can form long buckets that cause unacceptable run-time in real situations. In geometric data processing we need to process 6 8 of points consisting usually of three, i.e. < x, y, z> coordinates, or more coordinates. The situation is even more complicated by the fact that range of values for each coordinate is generally unlimited, i.e. the interval ( -, ) has to be considered. In the text processing case, the dimensionality is unlimited, i.e. a string can be very long, e.g. the titin protein is described by the word Methionylthreonylthreonylglutaminylarginyl...isoleucine which consist of 89,89 characters, but the range of values for each dimension is limited to the size of the actual alphabet. It can be seen that there are significantly different requirements from those two applications to the hash function construction and hashing method in general. In the following text we explain how the hash function is constructed in general and then how the hash function is constructed for textual and geometric data. Experiments are described and obtained results are evaluated.. HASHING TECHNIQUE Hashing technique is based on an idea, that there is a hash function h(v) which gives a unique address to a table for the given primitive v with an acceptable sparsicity of the table. This is the idea of perfect hashing [] or nearly perfect hashing [9], which is not usable for larger data sets. In practical use hash functions [3-4], [6-7] do not return unique addresses for different primitives. It results to buckets, which might be pretty long and lead to unacceptable results in indexing and retrieving items from data sets. There are actually two problems that should be resolved if the hash function is to be designed, i.e.: We should avoid the overflow operation in hash function computation. It is quite severe requirement which is quite hard to implement in general case. We should use the whole size of the table and reduce the bucket/cluster lengths as much as possible. The maximal and average cluster length should be as low as possible (cluster is usually implemented as a list of primitives for the cases when the hash function gives the same value). The hash function must be as simple as possible in order to have very fast evaluation. There are several additional requirements that differs from application to application, e.g. distributed hashing etc. In the following we show the approach taken on geometrical data case first and then how construction of the hash function hould be made for textual data.

2 Latest Trends on Computers, Vol., pp , ISBN , ISSN 79-45, CSCC conference, Corfu, Greece, 3. HASH FUNCTION CONSTRUCTION 3. Geometric case Let us consider the geometrical case first. In this case the dimensionality of the given item is given, usually (<x,y>) or 3 (<x,y,z>), but the range of values is unlimited. Let us consider recommended hash function for geometrical purposes []. The original hash function was defined as where: int is the conversion to integer - the fraction part of the float is removed, Q defines sensitivity - number of valid decimal digits (numerical error elimination) - for 3 decimal digits set Q =., m is the size of the hash table that is determined as described later, but generally as k for fast evaluation of the modulo and division operations, X, Y, Z are co-ordinates of a vertex. It should be noted that in the graphical data case the number of processed primitives, e.g. points, can easily reach 6 8. The hash function shown above uses very simple formula that is recommended in many publications for small or medium data sets. Nevertheless when the property of the hash function was experimentally verified for elimination of duplicities of points, it has not proved good properties for larger data sets, see Tab.. and Fig. -. Fig.3-4 presents rendered data sets. The experiments proved that the function has relatively stable properties nearly without significant influence of the coefficient Q. Table. Typical characteristic of the original hash function File Number of triangles Original number of vertices Final number of vertices Maximal cluster length CTHead.stl Gener.stl Teapot.stl Figure. Number of clusters for precision Q=7 decimal points for Teapot data set One disadvantage of this hash function is that the coefficient Q depends on the data and can lead to mixing some vertices together. The second disadvantage is that the argument of the hash function can easily overflow Figure. Number of clusters for precision Q=9 decimal points for Teapot data set Figure 3. Teapot data set Figure 4. CT Head data set Data analysis proved that it is not reasonable to remove the fraction part from the co-ordinate value as it helps us to distinguish co-ordinates better, it is necessary to remove all coefficients that depends on data set somehow it increases the application stability the available memory has to be used as much as possible to get larger hash table, the hash function should not be static one - it should be dynamic according to currently available memory, but generally the size of the hash table can be fixed. Taking into account required properties of the hash function, several functions have been derived in the general form. Where: α, β and γ are coefficients of the hash function, e.g. 3, 5 and 7 (not necessarily integer numbers), C coefficient is a scaling coefficient set so that the full range of integer values is used, i.e. maximum range of the interval <, 3 -> or <, 64 -> is used. In the following we consider the interval <, 3 -> only. For a simplicity assume that all co-ordinates X are from the <, X max > interval, similarly for others. Then we can compute maximal value ξ that can be obtained from the formula as ξ = α * X max + β * Y max + γ * Z max Because the overflow operation must be avoided and also we must use the whole size of the table, the C coefficient must be determined as C = min { C, C } where: C * ξ <= 3 C = 3 - k In order to get a maximal flexibility of the hash function we must use the whole address space interval (in our case <, 3 >),

4 Latest Trends on Computers, Vol., pp , ISBN , ISSN 79-45, CSCC conference, Corfu, Greece, The maximal value of f(x) can be easily determined as Where is generally the highest value representing the last character in the given alphabet on the i-th position of the given string (string can consists of different alphabets on each string position). In majority of cases, the will be the same and representing Z character. Now we have to determine the constant C so that value will be again from the interval <, 3 -> or <, 64 ->. To be able to compare different hash functions for textual data it is necessary to introduce some general criteria. Let us assume that there are already N items stored in the data structure and I is the cluster length. Three basic situations can occur when a new item, i.e. a string, is inserted to the structure:. The item is not stored in the data structure and the appropriate cluster is empty. The item is inserted to this cluster. The cost of this operation for all such items can be expressed as Q =. The item is not stored in the data structure so the whole cluster is to be searched and the item is to be inserted to an appropriate cluster. The cost of this operation for all such items can be expressed as (because the cluster of the length I must be searched for all items in this cluster I-times, value I is powered by two) I m Q = I C I I = 3. The item is stored already in the data structure so the corresponding cluster is to be searched and the item is not inserted to the appropriate cluster. Because only half of the cluster is to be searched on average, the cost of this operation for all such items can be expressed as I m Q3 = I C I I = It is necessary to point out that the cost of the hash function evaluation has not been considered, as it is the same for all cases. The cost of item insertion to a cluster was omitted as the item is inserted to the front of the bucket. The final criterion can be expressed as Q = Q + Q + Q 3 3 = I m I = I C I Empty clusters are not considered by this criterion because the hash table length m depends on the number of items stored. It can be seen that the criterion Q depends on the number of items. We used a relative criterion Q to evaluate properties of hash functions for different data sets with different sizes defined as Q Q ' = N Several experiments with coefficients α, β, γ for the hash function were made recently. These coefficients were taken as decimal numbers and hash function behavior was tested for large data sets with very good results being obtained for geometrical applications. These results encouraged us to apply the hash functions to large dictionaries. In this paper two dictionaries, Czech and English, were used [5]. The proposed approach was also experimentally proved on French, German, Hebrew and Russian dictionaries as well. There were several significant results from the previous experiments with large geometrical data sets. The most important assumptions of our approach have been: large available memory for applications is considered, the load factor f (will be defined later in this paper) should be smaller than or equal to.5, the hash function value should be in the interval of < ; 3 - > before the modulo operation is used to get a better spread for all considered items, the expected number of items to be stored is in the range < 5 ; 7 > or higher, hash functions used for strings must be different from hash function used in geometrical applications, because strings can have various length, i.e. non-constant dimensionality, and characters are taken from a discrete value set, non-uniform distribution of characters in dictionaries will not be considered in order to obtain reasonable generality and simplicity. Both hash functions were tested using Czech and English ISPELL dictionaries [5]. The data sets for both languages were taken from the ispell package for spell checking. The Czech dictionary contains approx..5 * 6 words and the English dictionary contains approx..3 * 5 words. Several experiments were made including slight modifications for the Czech language. Czech dictionary (end),,4,6,8 q Figure 8. Relative criterion Q for Czech dictionary (I a : min..5, I m : min. 6) I a is an average cluster length, I m is a maximum cluster length

5 Latest Trends on Computers, Vol., pp , ISBN , ISSN 79-45, CSCC conference, Corfu, Greece, Figure 9. Relative criterion Q for English dictionary (I a : min.3, I m : min 4) The properties of the hash function are very poor at the interval (;.3). This behavior is caused by the non-uniform frequency of characters in words. The perfect overlapping of some words causes the peaks for the multipliers of.5. For example: the strings ab, aac, aaae, will have exactly the same value. To improve the properties of the hash function and reduce overlapping the coefficient q should be an irrational number. However it is not possible to store an irrational number on the computer in the usual way. Thus in the actual implementation the irrational coefficient q is approximated using the closest available number on a computer with 64 bit double type. In this instance 7 decimal places were used for the representation of q. The experiments were repeated again for irrational values of q in the interval q (,4;,9). Results of this experiment are presented in Fig. -. It can be seen from the graphs that the properties of the hash function were improved and the relative criterion Q has lower value.,6475,647,6465,646,6455,645 English dictionary (start),5 q Czech dictionary (end),6445,4,5,6,7,8 q,9 Figure. Relative criterion Q for Czech dictionary,638,637,636,635,634,633,63,63,63,69 Figure. Relative criterion Q for English Varying the size of the hash table Some small improvement can be expected if the size of the hash table is increased and dramatic changes in behavior can be expected if the hash table is shorter than a half of the designed length. Such behavior has been proved in another experiments, see FIG. 6-7; the mark X is used for comparison only. Note that the table sizes were changed accordingly of the hash table varied in interval <,48,576; 33,554,43> for the Czech dictionary and <65,536;,97,5> for the English dictionary.,8,6,4,,8,6,4, English dictionary (start),4,5,6,7,8,9 q Czech dictionary (end) 3 4 multiple of the length of the hash table Figure. Relative criterion Q for Czech dictionary when varying size of the hash table (I a :.-.58, I m : 4-),8,6,4,,8,6,4, English dictionary (start) 3 4 multiple of the length of the hash table Figure 3. Relative criterion Q for English dictionary when varying size of the hash table (I a :.-.36, I m : 3-)

7 Latest Trends on Computers, Vol., pp , ISBN , ISSN 79-45, CSCC conference, Corfu, Greece, Influence of the table length for different data length *T *T T T/ T/ *T Figure A.. Maximal bucket length dependence on the number of vertices and the hash table length Parameters: (αα, β, γ ) = ( π, e, ), T is the recommended table length

### Lecture 1: Elementary Number Theory

Lecture 1: Elementary Number Theory The integers are the simplest and most fundamental objects in discrete mathematics. All calculations by computers are based on the arithmetical operations with integers

### MLR Institute of Technology

MLR Institute of Technology DUNDIGAL 500 043, HYDERABAD COMPUTER SCIENCE AND ENGINEERING Computer Programming Lab List of Experiments S.No. Program Category List of Programs 1 Operators a) Write a C program

### Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

### Chapter II Binary Data Representation

Chapter II Binary Data Representation The atomic unit of data in computer systems is the bit, which is actually an acronym that stands for BInary digit. It can hold only 2 values or states: 0 or 1, true

### Pythagorean Theorem. Overview. Grade 8 Mathematics, Quarter 3, Unit 3.1. Number of instructional days: 15 (1 day = minutes) Essential questions

Grade 8 Mathematics, Quarter 3, Unit 3.1 Pythagorean Theorem Overview Number of instructional days: 15 (1 day = 45 60 minutes) Content to be learned Prove the Pythagorean Theorem. Given three side lengths,

### Actually, if you have a graphing calculator this technique can be used to find solutions to any equation, not just quadratics. All you need to do is

QUADRATIC EQUATIONS Definition ax 2 + bx + c = 0 a, b, c are constants (generally integers) Roots Synonyms: Solutions or Zeros Can have 0, 1, or 2 real roots Consider the graph of quadratic equations.

### Static Program Transformations for Efficient Software Model Checking

Static Program Transformations for Efficient Software Model Checking Shobha Vasudevan Jacob Abraham The University of Texas at Austin Dependable Systems Large and complex systems Software faults are major

### Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

### Symbol Tables. IE 496 Lecture 13

Symbol Tables IE 496 Lecture 13 Reading for This Lecture Horowitz and Sahni, Chapter 2 Symbol Tables and Dictionaries A symbol table is a data structure for storing a list of items, each with a key and

### CHAPTER 3 Numbers and Numeral Systems

CHAPTER 3 Numbers and Numeral Systems Numbers play an important role in almost all areas of mathematics, not least in calculus. Virtually all calculus books contain a thorough description of the natural,

### An Introduction to Galois Fields and Reed-Solomon Coding

An Introduction to Galois Fields and Reed-Solomon Coding James Westall James Martin School of Computing Clemson University Clemson, SC 29634-1906 October 4, 2010 1 Fields A field is a set of elements on

### Students will understand 1. use numerical bases and the laws of exponents

Grade 8 Expressions and Equations Essential Questions: 1. How do you use patterns to understand mathematics and model situations? 2. What is algebra? 3. How are the horizontal and vertical axes related?

### Computer Graphics CS 543 Lecture 12 (Part 1) Curves. Prof Emmanuel Agu. Computer Science Dept. Worcester Polytechnic Institute (WPI)

Computer Graphics CS 54 Lecture 1 (Part 1) Curves Prof Emmanuel Agu Computer Science Dept. Worcester Polytechnic Institute (WPI) So Far Dealt with straight lines and flat surfaces Real world objects include

### 6 EXTENDING ALGEBRA. 6.0 Introduction. 6.1 The cubic equation. Objectives

6 EXTENDING ALGEBRA Chapter 6 Extending Algebra Objectives After studying this chapter you should understand techniques whereby equations of cubic degree and higher can be solved; be able to factorise

### (Refer Slide Time: 00:00:56 min)

Numerical Methods and Computation Prof. S.R.K. Iyengar Department of Mathematics Indian Institute of Technology, Delhi Lecture No # 3 Solution of Nonlinear Algebraic Equations (Continued) (Refer Slide

### Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms 1, Counting Distinct Elements 2, 3, counting distinct elements problem formalization input: stream of elements o from some universe U e.g. ids from a set

### Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),

### Georgia Standards of Excellence Curriculum Map. Mathematics. GSE 8 th Grade

Georgia Standards of Excellence Curriculum Map Mathematics GSE 8 th Grade These materials are for nonprofit educational purposes only. Any other use may constitute copyright infringement. GSE Eighth Grade

### Degree Reduction of Interval SB Curves

International Journal of Video&Image Processing and Network Security IJVIPNS-IJENS Vol:13 No:04 1 Degree Reduction of Interval SB Curves O. Ismail, Senior Member, IEEE Abstract Ball basis was introduced

### Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

### Fractions and Decimals

Fractions and Decimals Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles December 1, 2005 1 Introduction If you divide 1 by 81, you will find that 1/81 =.012345679012345679... The first

### Universal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.

Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we

### 1. Relational database accesses data in a sequential form. (Figures 7.1, 7.2)

Chapter 7 Data Structures for Computer Graphics (This chapter was written for programmers - option in lecture course) Any computer model of an Object must comprise three different types of entities: 1.

### Homework 5 Solutions

Homework 5 Solutions 4.2: 2: a. 321 = 256 + 64 + 1 = (01000001) 2 b. 1023 = 512 + 256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = (1111111111) 2. Note that this is 1 less than the next power of 2, 1024, which

### Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Shape, Space, and Measurement- Primary A student shall apply concepts of shape, space, and measurement to solve problems involving two- and three-dimensional shapes by demonstrating an understanding of:

### Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

### CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

### Polynomials. Dr. philippe B. laval Kennesaw State University. April 3, 2005

Polynomials Dr. philippe B. laval Kennesaw State University April 3, 2005 Abstract Handout on polynomials. The following topics are covered: Polynomial Functions End behavior Extrema Polynomial Division

### A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

PLDI 03 A Static Analyzer for Large Safety-Critical Software B. Blanchet, P. Cousot, R. Cousot, J. Feret L. Mauborgne, A. Miné, D. Monniaux,. Rival CNRS École normale supérieure École polytechnique Paris

### ALGEBRA I A PLUS COURSE OUTLINE

ALGEBRA I A PLUS COURSE OUTLINE OVERVIEW: 1. Operations with Real Numbers 2. Equation Solving 3. Word Problems 4. Inequalities 5. Graphs of Functions 6. Linear Functions 7. Scatterplots and Lines of Best

### Intersection of a Line and a Convex. Hull of Points Cloud

Applied Mathematical Sciences, Vol. 7, 213, no. 13, 5139-5149 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/1.12988/ams.213.37372 Intersection of a Line and a Convex Hull of Points Cloud R. P. Koptelov

### CHAPTER 2 Data Representation in Computer Systems

CHAPTER 2 Data Representation in Computer Systems 2.1 Introduction 47 2.2 Positional Numbering Systems 48 2.3 Converting Between Bases 48 2.3.1 Converting Unsigned Whole Numbers 49 2.3.2 Converting Fractions

### A New Polygon Based Algorithm for Filling Regions

Tamkang Journal of Science and Engineering, Vol. 2, No. 4, pp. 175-186 (2) 175 A New Polygon Based Algorithm for Filling Regions Hoo-Cheng *, Mu-Hwa Chen*, Shou-Yiing Hsu**, Chaoyin Chien*, Tsu-Feng Kuo*

### Positional Numbering System

APPENDIX B Positional Numbering System A positional numbering system uses a set of symbols. The value that each symbol represents, however, depends on its face value and its place value, the value associated

### Performance Level Descriptors Grade 6 Mathematics

Performance Level Descriptors Grade 6 Mathematics Multiplying and Dividing with Fractions 6.NS.1-2 Grade 6 Math : Sub-Claim A The student solves problems involving the Major Content for grade/course with

### Recall that in base 10, the digits of a number are just coefficients of powers of the base (10):

Binary Number System 1 Base 10 digits: 0 1 2 3 4 5 6 7 8 9 Base 2 digits: 0 1 Recall that in base 10, the digits of a number are just coefficients of powers of the base (10): 417 = 4 * 10 2 + 1 * 10 1

### Grade 6 Mathematics Assessment. Eligible Texas Essential Knowledge and Skills

Grade 6 Mathematics Assessment Eligible Texas Essential Knowledge and Skills STAAR Grade 6 Mathematics Assessment Mathematical Process Standards These student expectations will not be listed under a separate

### Notes on Factoring. MA 206 Kurt Bryan

The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

### Binary Representation and Computer Arithmetic

Binary Representation and Computer Arithmetic The decimal system of counting and keeping track of items was first created by Hindu mathematicians in India in A.D. 4. Since it involved the use of fingers

### POLYNOMIAL RINGS AND UNIQUE FACTORIZATION DOMAINS

POLYNOMIAL RINGS AND UNIQUE FACTORIZATION DOMAINS RUSS WOODROOFE 1. Unique Factorization Domains Throughout the following, we think of R as sitting inside R[x] as the constant polynomials (of degree 0).

### In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target

### Domain Essential Question Common Core Standards Resources

Middle School Math 2016-2017 Domain Essential Question Common Core Standards First Ratios and Proportional Relationships How can you use mathematics to describe change and model real world solutions? How

### APPLICATIONS OF THE ORDER FUNCTION

APPLICATIONS OF THE ORDER FUNCTION LECTURE NOTES: MATH 432, CSUSM, SPRING 2009. PROF. WAYNE AITKEN In this lecture we will explore several applications of order functions including formulas for GCDs and

### Number Systems and. Data Representation

Number Systems and Data Representation 1 Lecture Outline Number Systems Binary, Octal, Hexadecimal Representation of characters using codes Representation of Numbers Integer, Floating Point, Binary Coded

### For example, estimate the population of the United States as 3 times 10⁸ and the

CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number

### Continued fractions and good approximations.

Continued fractions and good approximations We will study how to find good approximations for important real life constants A good approximation must be both accurate and easy to use For instance, our

### C programming: exercise sheet L2-STUE (2011-2012)

C programming: exercise sheet L2-STUE (2011-2012) Algorithms and Flowcharts Exercise 1: comparison Write the flowchart and associated algorithm that compare two numbers a and b. Exercise 2: 2 nd order

### BM307 File Organization

BM307 File Organization Gazi University Computer Engineering Department 9/24/2014 1 Index Sequential File Organization Binary Search Interpolation Search Self-Organizing Sequential Search Direct File Organization

### Copyrighted Material. Chapter 1 DEGREE OF A CURVE

Chapter 1 DEGREE OF A CURVE Road Map The idea of degree is a fundamental concept, which will take us several chapters to explore in depth. We begin by explaining what an algebraic curve is, and offer two

### IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE

### Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

Chapter Objectives Chapter 9 Search Algorithms Data Structures Using C++ 1 Learn the various search algorithms Explore how to implement the sequential and binary search algorithms Discover how the sequential

### Data Warehousing und Data Mining

Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data

### Integer Operations. Overview. Grade 7 Mathematics, Quarter 1, Unit 1.1. Number of Instructional Days: 15 (1 day = 45 minutes) Essential Questions

Grade 7 Mathematics, Quarter 1, Unit 1.1 Integer Operations Overview Number of Instructional Days: 15 (1 day = 45 minutes) Content to Be Learned Describe situations in which opposites combine to make zero.

### Binary Number System. 16. Binary Numbers. Base 10 digits: 0 1 2 3 4 5 6 7 8 9. Base 2 digits: 0 1

Binary Number System 1 Base 10 digits: 0 1 2 3 4 5 6 7 8 9 Base 2 digits: 0 1 Recall that in base 10, the digits of a number are just coefficients of powers of the base (10): 417 = 4 * 10 2 + 1 * 10 1

### North Carolina Math 2

Standards for Mathematical Practice 1. Make sense of problems and persevere in solving them. 2. Reason abstractly and quantitatively 3. Construct viable arguments and critique the reasoning of others 4.

### Datenbanksysteme II: Hashing. Ulf Leser

Datenbanksysteme II: Hashing Ulf Leser Content of this Lecture Hashing Extensible Hashing Linear Hashing Ulf Leser: Implementation of Database Systems, Winter Semester 2012/2013 2 Sorting or Hashing Sorted

### CSCI 230 Class Notes Binary Number Representations and Arithmetic

CSCI 230 Class otes Binary umber Representations and Arithmetic Mihran Tuceryan with some modifications by Snehasis Mukhopadhyay Jan 22, 1999 1 Decimal otation What does it mean when we write 495? How

### Image Compression through DCT and Huffman Coding Technique

International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

### Algebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 2012-13 school year.

This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Algebra

### MACM 101 Discrete Mathematics I

MACM 101 Discrete Mathematics I Exercises on Combinatorics, Probability, Languages and Integers. Due: Tuesday, November 2th (at the beginning of the class) Reminder: the work you submit must be your own.

### Presented By: Ms. Poonam Anand

Presented By: Ms. Poonam Anand Know the different types of numbers Describe positional notation Convert numbers in other bases to base 10 Convert base 10 numbers into numbers of other bases Describe the

### Keywords: Dynamic Load Balancing, Process Migration, Load Indices, Threshold Level, Response Time, Process Age.

Volume 3, Issue 10, October 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Load Measurement

### Programming Challenges

Programming Challenges Week 4 - Arithmetic and Algebra Claus Aranha caranha@cs.tsukuba.ac.jp College of Information Sciences May 11, 2014 Some Quick Notes (1) ICPC Programming Contest 2014 Website: http://icpc.iisf.or.jp/2014-waseda/

### What are the place values to the left of the decimal point and their associated powers of ten?

The verbal answers to all of the following questions should be memorized before completion of algebra. Answers that are not memorized will hinder your ability to succeed in geometry and algebra. (Everything

### Data Structure. Lecture 3

Data Structure Lecture 3 Data Structure Formally define Data structure as: DS describes not only set of objects but the ways they are related, the set of operations which may be applied to the elements

### Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm

Journal of Al-Nahrain University Vol.15 (2), June, 2012, pp.161-168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College

### Figure 2.1: Center of mass of four points.

Chapter 2 Bézier curves are named after their inventor, Dr. Pierre Bézier. Bézier was an engineer with the Renault car company and set out in the early 196 s to develop a curve formulation which would

### Common Core State Standard I Can Statements 8 th Grade Mathematics. The Number System (NS)

CCSS Key: The Number System (NS) Expressions & Equations (EE) Functions (F) Geometry (G) Statistics & Probability (SP) Common Core State Standard I Can Statements 8 th Grade Mathematics 8.NS.1. Understand

### Computer Organization and Architecture

Computer Organization and Architecture Chapter 9 Computer Arithmetic Arithmetic & Logic Unit Performs arithmetic and logic operations on data everything that we think of as computing. Everything else in

### Chapter 2 Numeric Representation.

Chapter 2 Numeric Representation. Most of the things we encounter in the world around us are analog; they don t just take on one of two values. How then can they be represented digitally? The key is that

### COGNITIVE TUTOR ALGEBRA

COGNITIVE TUTOR ALGEBRA Numbers and Operations Standard: Understands and applies concepts of numbers and operations Power 1: Understands numbers, ways of representing numbers, relationships among numbers,

### MATHEMATICS Grade 6 Standard: Number, Number Sense and Operations

Standard: Number, Number Sense and Operations Number and Number C. Develop meaning for percents including percents greater than 1. Describe what it means to find a specific percent of a number, Systems

### Lecture 4 Online and streaming algorithms for clustering

CSE 291: Geometric algorithms Spring 2013 Lecture 4 Online and streaming algorithms for clustering 4.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line

### Prentice Hall Algebra 2 2011 Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

Content Area: Mathematics Grade Level Expectations: High School Standard: Number Sense, Properties, and Operations Understand the structure and properties of our number system. At their most basic level

### Greater Nanticoke Area School District Math Standards: Grade 6

Greater Nanticoke Area School District Math Standards: Grade 6 Standard 2.1 Numbers, Number Systems and Number Relationships CS2.1.8A. Represent and use numbers in equivalent forms 43. Recognize place

### Prentice Hall Mathematics: Course 1 2008 Correlated to: Arizona Academic Standards for Mathematics (Grades 6)

PO 1. Express fractions as ratios, comparing two whole numbers (e.g., ¾ is equivalent to 3:4 and 3 to 4). Strand 1: Number Sense and Operations Every student should understand and use all concepts and

### ISSN: 2319-5967 ISO 9001:2008 Certified International Journal of Engineering Science and Innovative Technology (IJESIT) Volume 2, Issue 3, May 2013

Transistor Level Fault Finding in VLSI Circuits using Genetic Algorithm Lalit A. Patel, Sarman K. Hadia CSPIT, CHARUSAT, Changa., CSPIT, CHARUSAT, Changa Abstract This paper presents, genetic based algorithm

### This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition

### The finite field with 2 elements The simplest finite field is

The finite field with 2 elements The simplest finite field is GF (2) = F 2 = {0, 1} = Z/2 It has addition and multiplication + and defined to be 0 + 0 = 0 0 + 1 = 1 1 + 0 = 1 1 + 1 = 0 0 0 = 0 0 1 = 0

### Solution for Homework 2

Solution for Homework 2 Problem 1 a. What is the minimum number of bits that are required to uniquely represent the characters of English alphabet? (Consider upper case characters alone) The number of

### 3. INNER PRODUCT SPACES

. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

### Integer roots of quadratic and cubic polynomials with integer coefficients

Integer roots of quadratic and cubic polynomials with integer coefficients Konstantine Zelator Mathematics, Computer Science and Statistics 212 Ben Franklin Hall Bloomsburg University 400 East Second Street

### CS 248 Assignment 2 Polygon Scan Conversion. CS248 Presented by Josh Wiseman Stanford University October 19, 2005

CS 248 Assignment 2 Polygon Scan Conversion CS248 Presented by Josh Wiseman Stanford University October 19, 2005 Announcements First thing: read README.animgui animgui.. It should tell you everything you

### Digital Logic. The Binary System is a way of writing numbers using only the digits 0 and 1. This is the method used by the (digital) computer.

Digital Logic 1 Data Representations 1.1 The Binary System The Binary System is a way of writing numbers using only the digits 0 and 1. This is the method used by the (digital) computer. The system we

### Solution of Linear Systems

Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

### A Tool for Generating Partition Schedules of Multiprocessor Systems

A Tool for Generating Partition Schedules of Multiprocessor Systems Hans-Joachim Goltz and Norbert Pieth Fraunhofer FIRST, Berlin, Germany {hans-joachim.goltz,nobert.pieth}@first.fraunhofer.de Abstract.

### Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration. Sina Meraji sinamera@ca.ibm.com

Towards Fast SQL Query Processing in DB2 BLU Using GPUs A Technology Demonstration Sina Meraji sinamera@ca.ibm.com Please Note IBM s statements regarding its plans, directions, and intent are subject to

### 8 Square matrices continued: Determinants

8 Square matrices continued: Determinants 8. Introduction Determinants give us important information about square matrices, and, as we ll soon see, are essential for the computation of eigenvalues. You

### CHAPTER 1 Splines and B-splines an Introduction

CHAPTER 1 Splines and B-splines an Introduction In this first chapter, we consider the following fundamental problem: Given a set of points in the plane, determine a smooth curve that approximates the

### Interactive Math Glossary Terms and Definitions

Terms and Definitions Absolute Value the magnitude of a number, or the distance from 0 on a real number line Additive Property of Area the process of finding an the area of a shape by totaling the areas

### North Carolina Math 1

Standards for Mathematical Practice 1. Make sense of problems and persevere in solving them. 2. Reason abstractly and quantitatively 3. Construct viable arguments and critique the reasoning of others 4.

### Representation of Data

Representation of Data In contrast with higher-level programming languages, C does not provide strong abstractions for representing data. Indeed, while languages like Racket has a rich notion of data type

### MATHEMATICAL INDUCTION. Mathematical Induction. This is a powerful method to prove properties of positive integers.

MATHEMATICAL INDUCTION MIGUEL A LERMA (Last updated: February 8, 003) Mathematical Induction This is a powerful method to prove properties of positive integers Principle of Mathematical Induction Let P

### Polygon Scan Conversion and Z-Buffering

Polygon Scan Conversion and Z-Buffering Rasterization Rasterization takes shapes like triangles and determines which pixels to fill. 2 Filling Polygons First approach:. Polygon Scan-Conversion Rasterize

### Prentice Hall: Middle School Math, Course 1 2002 Correlated to: New York Mathematics Learning Standards (Intermediate)

New York Mathematics Learning Standards (Intermediate) Mathematical Reasoning Key Idea: Students use MATHEMATICAL REASONING to analyze mathematical situations, make conjectures, gather evidence, and construct

### Big Data & Scripting storage networks and distributed file systems

Big Data & Scripting storage networks and distributed file systems 1, 2, in the remainder we use networks of computing nodes to enable computations on even larger datasets for a computation, each node

### PA Common Core Standards Standards for Mathematical Practice Grade Level Emphasis*

Habits of Mind of a Productive Thinker Make sense of problems and persevere in solving them. Attend to precision. PA Common Core Standards The Pennsylvania Common Core Standards cannot be viewed and addressed