Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware




Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware
Aalap Tripathy, Suneil Mohan, Rabi Mahapatra
Embedded Systems and Codesign Lab, codesign.cse.tamu.edu
(Presented at ICSC 2011, September 19, 2011 in Palo Alto, CA)

Overview
- Introduction
- Current vs. future technologies
- Key steps in computation
- Description of architecture
- Experimental Setup & Results
- Conclusion

Introduction

Search is a key activity on the Internet: 3 billion queries a month (3500/sec), growing rapidly (38% annually).

Search engines need to be precise: users expect more from search results - not 100 links but a few relevant documents.

Search engines need to be scalable: search engines are deployed as distributed systems, and newer methods make more computational demands.

Search engines need to consume low energy: tens of megawatts today; a coarse-grained, task-parallel approach is insufficient.

Objective: deploy meaning-based search to enhance search quality while consuming less energy and meeting time constraints.

Current Search Engine

[Architecture diagram: a query q enters the Front-End Web App and is passed to the Query Processor, which consults the Index Servers (pools of index shards I_0 ... I_N, Pool 0 ... Pool Np); (score, docid) results go to the Document Processor, which consults the Doc Servers and returns (URL, snippet) to the front end]

Current Search Paradigm: Vector Method

"The American man ate Indian food."
D = s_American·V_American + s_Man·V_Man + s_ate·V_ate + s_Indian·V_Indian + s_Food·V_Food

D is the descriptor representing the meaning of the entire text; V_American is the basis vector representing the term "American"; s_American is the scalar weight/coefficient denoting the relative importance of the term.

The vector method cannot differentiate between two documents containing the same keywords:
- "American man ate Indian food" vs. "Indian man ate American food"
- Produces hundreds of irrelevant results - no precision
- Hundreds of redundant operations performed in the process
- Tens of MW of power consumed in the process

Future Search Paradigm - Tensor Method

"The American man ate Indian food" vs. "The Indian man ate American food"

T1 = s1·abc + s2·ac + s3·bc + s4·ab + s5·a + s6·b + s7·c (from 3 terms)
Basis vector terms: a = "man", b = "American", c = "ate", ab = "man American", bc = "American ate", abc = "man American ate"

T2 = s'1·aec + s'2·ac + s'3·ec + s'4·ae + s'5·e + s'6·c + s'7·a (from 3 terms)
Basis vector terms: a = "man", e = "Indian", c = "ate", ae = "man Indian", ec = "Indian ate", aec = "man Indian ate"

The tensor method differentiates documents containing the same keywords:
- Captures the relationships between terms
- At what cost? An exponentially larger number of terms

Semantic Comparison using Tensors

("The American man ate Indian food") Tensor T1 = s1·abc + s2·ac + s3·bc + s4·ab + s5·a + s6·b + s7·c
("The Indian man ate American food") Tensor T2 = s'1·aec + s'2·ac + s'3·ec + s'4·ae + s'5·e + s'6·c + s'7·a

Similarity(T1, T2) = T1·T2 = the sum of coefficient products over the common basis vectors (ac, a, c), a value < 1

1. Identify common basis vectors
2. Multiply scalar coefficients
3. Find the sum of all products
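The three comparison steps above can be sketched on the CPU as a sparse dot product over shared basis terms. This is a minimal reference sketch, not the paper's implementation; the term labels and coefficient values are illustrative stand-ins:

```python
# CPU reference sketch of the three steps above. A tensor is stored as a
# mapping from basis-term label to scalar coefficient; labels and weights
# here are illustrative, not the paper's data.

def tensor_similarity(t1, t2):
    """Dot product of two sparse tensors over their common basis terms."""
    common = t1.keys() & t2.keys()                 # 1. identify common basis vectors
    products = [t1[b] * t2[b] for b in common]     # 2. multiply scalar coefficients
    return sum(products)                           # 3. sum all the products

# "The American man ate Indian food" vs. "The Indian man ate American food"
T1 = {"abc": 0.4, "ac": 0.3, "bc": 0.3, "ab": 0.3, "a": 0.2, "b": 0.2, "c": 0.2}
T2 = {"aec": 0.4, "ac": 0.3, "ec": 0.3, "ae": 0.3, "a": 0.2, "e": 0.2, "c": 0.2}

# Only "ac", "a", and "c" are shared, so the similarity stays well below 1.
print(tensor_similarity(T1, T2))
```

Because only the common basis terms contribute, the two sentences score far below a self-comparison, which is exactly the discrimination the vector method lacks.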

Overview
- Introduction
- Current vs. future technologies
- Key steps in computation
- Challenges
- Description of architecture
- Experimental Setup & Results
- Conclusion

Key Steps in Semantic Computation

For two tensors of sizes n1, n2, search is O(n1·n2) or O(n1·log n2). Can we improve upon this?

Bloom Filters

A Bloom filter enables a compact representation of a set.
- Parameters: the number of elements to be inserted (m), the size of the Bloom filter (size_n), and the number of indices used to represent each element (k)
- The probability of false positives can be controlled
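A minimal CPU sketch of the structure just described: a bit array of size_n bits with k bit indices per element. The use of salted MD5 for index derivation is an assumption for illustration, not the paper's hash choice:

```python
# CPU sketch of the Bloom filter described above: a bit array of size_n bits,
# with k bit indices per element. Salted MD5 is an illustrative stand-in for
# the paper's hash functions.
import hashlib

class BloomFilter:
    def __init__(self, size_n, k):
        self.size_n, self.k = size_n, k
        self.bits = [0] * size_n

    def _indices(self, element):
        # k bit positions derived from k salted hashes of the element
        return [int(hashlib.md5(f"{i}:{element}".encode()).hexdigest(), 16) % self.size_n
                for i in range(self.k)]

    def insert(self, element):
        for idx in self._indices(element):
            self.bits[idx] = 1

    def contains(self, element):
        # never a false negative; the false-positive probability is
        # controlled by the choice of size_n and k
        return all(self.bits[idx] == 1 for idx in self._indices(element))

bf = BloomFilter(size_n=1024, k=7)
bf.insert("man american")
print(bf.contains("man american"))  # a member always tests positive
```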

Details of Comparison

Coefficient table of the Query Tensor (Table 1): for each tensor id (Id_1 ... Id_i ... Id_n1), a coefficient Coeff_i and a set of BF bit indices {x_i : 0 ≤ x_i ≤ size_n}, e.g., {0, 2, 5}.

Coefficient table of the Doc Tensor (Table 2): for each tensor id (Id_1 ... Id_j ... Id_n2), a coefficient and a set of BF bit indices, plus the Bloom filter of the Doc Tensor (a bit array of length size_n).

1. Identify common basis vectors (filtered)
2. Multiply coefficients
3. Compute the sum of products: T1·T2 = ... + s_i·s'_j + ...

Overview
- Introduction
- Current vs. future technologies
- Key steps in computation
- Challenges
- Description of architecture
- Experimental Setup & Results
- Conclusion

Architecture Description - CUDA

CUDA: Compute Unified Device Architecture
- A device architecture spec
- An extension to C (libraries, API, compiler)

GPGPUs use a heterogeneous parallel computing model:
- Kernels are called by the host and run by the GPU
- Each SIMD processor executes the same instruction over different data elements in parallel
- Can process thousands of threads simultaneously
- The number of logical threads and thread blocks surpasses the number of physical execution units

Architecture Description - CUDA Memory Model

- Thread: registers (per thread); local memory (off-chip)
- Block: shared memory between threads
- Device: global memory between kernels; constant memory (read-only, stores invariants); texture memory (limited, but can cache parts of global memory)

Semantic Comparison using CUDA

- Phase A: copy Table 1 from the host PC to CUDA global memory; copy Table 2 from the host PC to CUDA global memory
- Phase B: encode Table 2 in the Bloom filter
- Phase C: encode Table 1; test presence in the Bloom filter
- Phase D: compute semantic similarity using the filtered elements

CUDA programming model:
- Split a task into subtasks
- Divide the input data into chunks that fit global memory
- Load a data chunk from global memory into shared memory
- Each data chunk is processed by a thread block
- Copy results from shared memory back to global memory

Optimization 1: Maximize independent parallelism

Phase A: Copy Data from Host to GPU

Copy the two tables to be compared into CUDA global memory.
- Data has to be explicitly copied into CUDA global memory
- Optimization 2: The data structure is flattened to increase coalesced memory accesses
- Maximize the available PCIe bandwidth (76.8 GB/s for the NVIDIA C870)

Phase B: Encode Coefficient Table in Bloom Filter

Encode Table 2 (the document basis coefficient table) in the Bloom filter.
- The i-th Doc_Basis term is hashed using two hash functions
- k additional Bloom filter indices are generated from the two hash values
- Every resulting index position is set in the BF bit array in CUDA texture memory
- Optimization 3: At least n2 threads are launched. Limit the number of blocks and increase the number of threads per block; this increases shared memory reuse.
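The formula for expanding the two hash values into k indices did not survive the transcription. A standard scheme consistent with the slide's two hash functions is Kirsch-Mitzenmacher double hashing, g_i = (h1 + i·h2) mod size_n; the sketch below uses MD5 and SHA-1 as stand-ins for whichever two hash functions the paper actually used:

```python
# Sketch of one way to expand two base hashes into k Bloom-filter indices
# (Kirsch-Mitzenmacher double hashing): g_i = (h1 + i*h2) mod size_n.
# MD5/SHA-1 are stand-ins for the paper's (unspecified) hash functions.
import hashlib

def bloom_indices(term, k, size_n):
    h1 = int(hashlib.md5(term.encode()).hexdigest(), 16)
    h2 = int(hashlib.sha1(term.encode()).hexdigest(), 16)
    return [(h1 + i * h2) % size_n for i in range(k)]

idx = bloom_indices("man american ate", k=7, size_n=10240)
print(idx)  # k positions, each in [0, size_n)
```

This scheme matters for a GPU implementation: only two real hashes are computed per term, and the remaining k-2 indices are cheap modular arithmetic, which suits a thread computing all k indices for one term.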

Phase C: Encode & Test Table 1 using the Bloom Filter

Encode Table 1 in the Bloom filter and test.
- The i-th Query_Basis term is hashed using the same two hash functions
- k additional Bloom filter indices are generated
- Those index positions are tested in the BF bit array in CUDA texture memory
- At least n1 threads are launched
- If all indices are 1, Query_Basis_i is a filtered element; store Index(i)
- Optimization 4: Shared memory is inaccessible after the end of a kernel, so this data is transferred to global memory at the end of each thread block

Phase D: Compute Semantic Similarity using the Filtered Elements

Extract the corresponding scalar coefficients, multiply, and sum.
- The index of the potential match in Table 1 is used to look up Coeff_1
- The corresponding match in Table 2 is used to look up Coeff_2
- The same kernel performs the multiplication (interim products)
- Intermediate products from multiple threads are summed in parallel
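Phase D's multiply-then-sum has the shape of a tree reduction, the pattern GPU kernels use to sum partial products in parallel. A CPU sketch of that shape, with made-up filtered coefficient pairs:

```python
# CPU sketch of Phase D: multiply the coefficients of each filtered match,
# then sum the partial products pairwise, level by level, the way a parallel
# reduction proceeds on the GPU. The coefficient pairs are made-up examples.

def tree_reduce_sum(values):
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        if len(vals) % 2:               # pad odd-length levels with a zero
            vals.append(0.0)
        # one addition per "thread" at each level
        vals = [vals[2 * i] + vals[2 * i + 1] for i in range(len(vals) // 2)]
    return vals[0]

# (Coeff from Table 1, Coeff from Table 2) for each filtered basis term
filtered = [(0.24, 0.11), (0.31, 0.05), (0.12, 0.42)]
partial_products = [c1 * c2 for c1, c2 in filtered]
print(tree_reduce_sum(partial_products))
```

Each level halves the number of pending additions, so p partial products are summed in O(log p) parallel steps rather than p sequential ones.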

Other Algorithmic Optimizations

Partitioning the computation to keep all stream cores busy:
- Optimization 5: Multiple threads and multiple thread blocks in constant use

Monitoring per-processor resource utilization:
- Optimization 6: Low utilization per thread block allows multiple active blocks per multiprocessor

Overview
- Introduction
- Current vs. future technologies
- Key steps in computation
- Description of architecture
- Experimental Setup & Results
- Conclusion

Experimental Setup

Device characteristics (NVIDIA Tesla C870):
- Stream processors / cores: 128 / 16
- Core frequency: 600 MHz
- CUDA Toolkit: 3.x
- Interface: 16x PCI-Express
- Memory clock: 1.6 GHz
- Global memory: 1.6 GB
- Constant memory: 65 KB
- Shared memory per block: 16 KB
- Registers per block: 8192
- Threads per block: 512
- Warp size (threads per thread processor): 32
- Memory bus bandwidth: 76.8 GB/s, 384-bit GDDR3
- Host CPU: P4, Ubuntu 9.10

Experimental parameters: table size (N); similarity between tables (c); CUDA parameters (num_blocks, threads per block).

Experimental measurements: execution time; power/energy; throughput (comparisons/sec).

Results - Execution Time Profiling

[Chart: execution time (ms) vs. number of entries (terms) in Tables 1 & 2 (n1 = n2 = N), N from 1000 to 250000, for GPU_Exec_Time (ms) and CPU_Exec_Time (ms)]

- Exponential increase in CPU execution time for large tables
- The same dataset on a GPU is up to 4x faster (for a fixed similarity c)

Results - Power Profiling

[Chart: dynamic power (W) vs. number of entries (terms) in Tables 1 & 2 (n1 = n2 = N), N from 1000 to 250000, for GPU and CPU dynamic power]

CPU-GPU power characteristics (measured using a WattsUp power meter at the mains):
- System base power: 5 W
- System idle power (GPU cold shutdown): 150 W
- System idle power (GPU awake, idle): 186 W
- GPU idle power: 36 W

- GPU dynamic power is lower, but approaches that of a CPU for N > 250000
- GPUs are known to be energy-efficient but not necessarily power-efficient

Results - Energy Saved per Comparison

Table Size (N) | CPU Exec Time (s) | CPU Avg Power (W) | GPU Exec Time (s) | GPU Avg Power (W) | Energy Saved (%)
5k | 0.8 | 3 | 0.05 | 59 | 79.65
10k | 0.74 | 39 | 0. | 56 | 77.64
50k | 0.0 | 4 | 4.93 | 88 | 77.7
100k | 8.4 | 46 | 9.57 | 7 | 77.96
250k | 85.3 | 5 | 43.83 | 33 | 78.04

Computing energy saved (Wh%):
- Experiments over 5000 < N < 250000; similarity between tables: c = 75%
- Energy savings of ~78% per comparison
- A future semantic search engine can either reduce its energy footprint or increase throughput within the same footprint

Results - Profiling the Semantic Comparator Kernel on the GPU

[Chart: per-phase breakdown of kernel time vs. number of entries (terms) in Tables 1 & 2 (n1 = n2 = N)]

- Data copy from CPU to GPU (Phase A) ceases to be a bottleneck for N > 5k
- Extracting the scalar coefficients (Phase D) becomes the bottleneck
- Computing the hash functions and insertion into the Bloom filter (Phases B, C) are computationally negligible

Results - Throughput Improvement

Table Size (N) | CPU Throughput (comparisons/s) | GPU Throughput (comparisons/s) | Improvement
5k | 53996.9 | 73097.43 | 3.0
10k | 3458.9 | 4675.69 | 3.47
50k | 499.40 | 05.8 | 4.05
100k | .39 | 50.85 | 4.0
250k | 53.94 | 8. | 4.

- Experiments ran with randomly varying similarity between the tables for a given N
- Throughput was defined as the inverse of the averaged execution time for a given N
- The GPU throughput improvement is higher for larger values of N
- For smaller values of N, the overhead of data transfer from CPU to GPU dominates

Overview
- Introduction
- Current vs. future technologies
- Key steps in computation
- Description of architecture
- Experimental Setup & Results
- Conclusion

Conclusions

Semantic search requires introducing fine-grained parallelism at the compute nodes.

- Search engine precision: use the tensor method for meaning representation (meaning-based search)
- Search engine scalability: handle the explosive growth of coefficient tables within the compute node; leverage off-the-shelf hardware such as GPUs as coprocessors
- Search engine energy consumption: the GPU-based semantic comparator has extraordinary energy efficiency (low energy consumption)

We have designed a GPU-based coprocessor that provides 4x speedup and 78% energy savings over a traditional CPU for semantic search.

Thank You. Q&A.
Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware

Appendix (Extra Slides)
Optimizing a Semantic Comparator using CUDA-enabled Graphics Hardware

Comparison with Prior Art

Characteristic | Traditional Microprocessor | GPGPU | ASIC
Time/Cycles | Worst performing | Medium | 4 cycles @ a realizable clock frequency
Energy Savings | Worst performing | Moderately high | Very high
Adoption Cost | Low | Intermediate | High fabrication, development, and integration costs; I/O issues not addressed
Overall characterization | Low speed, low cost | Balanced cost and speed | High speed, high cost

Future Work

Memory I/O issues:
- Transmit only the hashed dataset to the GPU: reduces the dataset from N×40×2 to N×8×2 bytes per table (5 times smaller)
- Transmit only one hash instead of two to the GPU, and compute the second set of hashes on the GPU from the first: reduces the dataset from N×8×2 to N×8×1
- One kernel cannot be called from another: control has to pass through the CPU

Vary the GPU parameters:
- Experiment with multiple grids (in this paper a single grid was used)
- Further experimentation with varying the number of blocks and the number of threads per block

References

S. Mohan, A. Tripathy, A. Biswas, and R. Mahapatra, "Parallel Processor Core for Semantic Search Engines," presented at the Workshop on Large-Scale Parallel Processing (LSPP), held at the IEEE International Parallel and Distributed Processing Symposium (IPDPS'11), Anchorage, Alaska, USA, 2011.

S. Mohan, A. Biswas, A. Tripathy, J. Panigrahy, and R. Mahapatra, "A parallel architecture for meaning comparison," presented at the IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 2010.

A. Biswas, S. Mohan, A. Tripathy, J. Panigrahy, and R. Mahapatra, "Semantic Key for Meaning Based Searching," in 2009 IEEE International Conference on Semantic Computing (ICSC 2009), 14-16 September 2009, Berkeley, CA, USA.

A. Biswas, S. Mohan, and R. Mahapatra, "Search Co-ordination by Semantic Routed Network," in 18th International Conference on Computer Communications and Networks (ICCCN 2009), 2009, San Francisco, CA, USA.

A. Biswas, S. Mohan, J. Panigrahy, A. Tripathy, and R. Mahapatra, "Representation of complex concepts for semantic routed network," in 10th International Conference on Distributed Computing and Networking (ICDCN 2009), Hyderabad, 2009.

A. Biswas, S. Mohan, and R. Mahapatra, "Optimization of Semantic Routing Table," in 17th International Conference on Computer Communications and Networks (ICCCN 2008), 2008, U.S. Virgin Islands.

Challenge: Explosive Growth in the Number of Terms with the Tensor Model

Representation of intent | Index terms, vector method | Index terms, tensor method
a b c | 3 | 7
a b c d e | 5 | 31
a b c d e f g h | 8 | 255
a b c d e f g h i | 9 | 511
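The tensor-method column grows as 2^n - 1 for n vector (leaf) terms: one tensor term for every non-empty combination of the terms, which is the "exponentially larger number of terms" the earlier slide warns about. A quick check of the table's values:

```python
# The tensor-method term counts above equal 2**n - 1: one tensor term per
# non-empty combination (subset) of the n vector (leaf) terms.

def tensor_term_count(n):
    return 2 ** n - 1

for n in (3, 5, 8, 9):
    print(n, tensor_term_count(n))  # 7, 31, 255, 511
```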

Current Search Paradigm: Vector-Based Model

- Assign weights to keywords
- Compute similarity using the dot product

"The sales manager took the order."
D = s_sales·V_sales + s_manager·V_manager + s_took·V_took + s_order·V_order

D is the descriptor representing the meaning of the entire text; s_sales is the scalar weight/coefficient denoting the presence of the term; V_sales is the basis vector representing the term "sales".

Comparing Current vs. Future Methods

Creating the coefficient table:
- The first column shows terms; the second column shows coefficients

The tensor method introduces two additional steps:
- Concept tree and tensor form
- More computation, but increased precision

Future: A Semantic Search Engine

[Architecture diagram: query q → Front-End Web App → Query Processor and Document Processor → (URL, snippet), with reorganized Index Server and Doc Server pools (index shards I_0 ... I_N, Pool 0 ... Pool Np)]

Reorganize the index shards of a search engine:
- Small-world network
- Reduces the query rate per shard to a value much smaller than Q
- Query resolution is guaranteed within an average of 3 hops
- What is the downside?

Approach

Use a tensor-based representation for meaning. Meaning comparison is based on the dot product of the tensors.

"The American man ate Indian food"
T = s1·abc + s2·ac + s3·bc + s4·ab + s5·a + s6·b + s7·c

Basis vector terms: a = "man", b = "American", c = "ate", ab = "man American", bc = "American ate", abc = "man American ate"

Conversion of a Tree to a Tensor

[Figures: an example of a specific concept tree; a generic concept tree with C child nodes, depth D, and L leaves]

Salient features:
- Concept tree: a hierarchical acyclic directed n-ary tree
- Leaf nodes represent terms, whereas the tree describes their interrelationships

Expansion of the tree:
- Bottom-up; make all possible polyadic combinations
- Generic case (assume an intermediate node I with C child nodes, where one child node P contains L leaves): the number of terms at I due to P will be 2^L - 1
- Each of the C child nodes contributes terms in the same way

Tensor Comparison

("The American man ate Indian food") Tensor T1 = s1·abc + s2·ac + s3·bc + s4·ab + s5·a + s6·b + s7·c
("The Indian man ate American food") Tensor T2 = s'1·aec + s'2·ac + s'3·ec + s'4·ae + s'5·e + s'6·c + s'7·a

Similarity(T1, T2) = T1·T2 = the sum of coefficient products over the common basis vectors (ac, a, c), a value < 1

1. Identify common basis vectors
2. Multiply scalar coefficients
3. Find the sum of all products

Dot Product Challenges

("The American man ate Indian food") T1 = s1·abc + s2·ac + s3·bc + s4·ab + s5·a + s6·b + s7·c

When tensors are large, identification of the common basis vectors is time-consuming. For two tensors of sizes n1, n2, search is O(n1·n2) or O(n1·log n2). Can we improve upon this?

Bloom Filters

Compact representation of a set:
- An m-bit-long bit vector
- k hash functions

[Figure: an element X is hashed by hash functions F1(Id_i), F2(Id_i), ..., Fk(Id_i) = j, each selecting a position in the bit array indexed 0 ... m-1]

Bloom Filter: Insertion

[Figure: the k hashed positions of element X, F1(Id_i), F2(Id_i), ..., Fk(Id_i) = j, are set to 1 in the bit array]

Bloom Filter: Testing for Presence (Membership Test)

[Figure: element X is hashed with the same k hash functions and the corresponding bit positions are checked]

- Can have false positives
- Never has false negatives
- The false-positive rate can be reduced by choosing a large m and an optimal k value

For n = 10^3 elements, k = 7, m = 10240 bits: probability of a false positive ≈ 8×10^-3
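The quoted rate is consistent with the standard Bloom filter false-positive estimate p ≈ (1 - e^(-kn/m))^k. Assuming the slide's parameters are n = 1000 elements, k = 7 indices, and m = 10240 bits, this evaluates to roughly 7×10^-3, on the order of the quoted 8×10^-3:

```python
# Standard Bloom-filter false-positive estimate p = (1 - e^(-k*n/m))**k,
# evaluated at the slide's parameters (assumed n = 1000, k = 7, m = 10240).
import math

def false_positive_rate(n, k, m):
    return (1 - math.exp(-k * n / m)) ** k

p = false_positive_rate(n=1000, k=7, m=10240)
print(p)  # roughly 0.007, on the order of the quoted 8e-3
```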

Data Structures

("The American man ate Indian food")

Element ac: Id = MD5("ac")

Bit index generation: hash functions F1(Id_i), F2(Id_i), ..., Fk(Id_i) = j produce the set of BF bit indices {x_i : 0 ≤ x_i ≤ m}, which are set to 1 in the Bloom filter's bit array.

Coefficient table: each tensor id (Id_1 ... Id_i ... Id_n1) stores its coefficient Coeff_i and its set of BF bit indices, e.g., {0, 2, j}.

[Figure: the Bloom filter bit array alongside the coefficient table]