TEXT SEARCHING ALGORITHM USING BOYER MOORE HORSPOOL (BMH) ALGORITHM

SITI NURAFIQAH BINTI JAAFAR

A thesis submitted in fulfillment of the requirements for the award of the degree of Bachelor of Computer Science (Graphics and Multimedia Technology)

Faculty of Computer Systems & Software Engineering
University Malaysia Pahang

DECEMBER 2015

ABSTRACT

Text searching is a widely used technique for finding data or text in a system quickly. A search algorithm is also known as a universal problem-solving mechanism. We study the Boyer-Moore-Horspool algorithm, which is used to search for the occurrences of a pattern (keyword) in arbitrary text in a medical database. The match between the pattern and the text is analyzed by displaying an accuracy percentage. The objective of this study is to develop a searching function using the Boyer-Moore-Horspool algorithm based on a keyword in a medical database. We therefore intend to provide an automatic search routine in a medical first aid application and to evaluate the performance of the proposed searching algorithm through character match percentage. Based on the study, the average analysis result for this algorithm is 93%, which is a strong match.

ABSTRAK

Text searching is a well-known technique used to find data or text in a system for faster searching. A search algorithm is also known as a universal problem-solving mechanism. We study the selected technique, the Boyer-Moore-Horspool algorithm, which is used to search for the occurrences of a pattern (keyword) in arbitrary text in a medical database. The similarity between the pattern and the text is analyzed in order to display its accuracy percentage. The objective of this study is to develop a searching function using the Boyer-Moore-Horspool algorithm based on a keyword in a medical database. We therefore intend to provide an automatic search routine in a medical first aid application and to evaluate the performance of the proposed searching algorithm through the match percentage. Based on this study, the average analysis result for this algorithm is 93%, which is a strong match.

TABLE OF CONTENTS

DECLARATION OF THESIS AND COPYRIGHT
DECLARATION
SUPERVISOR DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
EQUATION
LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.0 Introduction
1.1 Problem Background
1.2 Problem Statement
1.3 Objective
1.4 Scope
1.5 Thesis Organization

CHAPTER 2: LITERATURE REVIEW
2.0 Introduction
2.1 Search algorithm techniques
2.1.1 Boyer Moore Algorithm
2.1.2 Boyer Moore Horspool Algorithm
2.1.3 Brute-Force Algorithm
2.1.4 Knuth-Morris Pratt Algorithm
2.1.5 Rabin-Karp Algorithm
2.2 Comparison of search algorithm
2.3 Analysis for accuracy
2.3.1 Word Match Percentage
2.3.2 Character Match Percentage
2.4 Existing system on application of search algorithm
2.4.1 English dictionary
2.4.2 Intrusion detection system (IDS)
2.4.3 DNA pattern searching
2.5 Comparison of existing system
2.6 Methodology
2.6.1 Rapid application development (RAD) Design Model
2.6.1.1 RAD Requirements or Planning Phase
2.6.1.2 RAD Design Phase
2.6.1.3 RAD Construction Phase
2.6.1.4 RAD Cutover Phase
2.7 RAD advantage and disadvantage
2.8 Hardware and software
2.8.1 Hardware tools
2.8.2 Software tools
2.9 Chapter summary

CHAPTER 3: METHODOLOGY
3.0 Introduction
3.1 Framework project
3.2 Requirements or Planning Phase
3.3 Design phase
3.3.1 Boyer Moore Horspool (BMH) Algorithm
3.3.1.1 Details description on Boyer Moore Horspool algorithm
3.3.2 Accuracy result
3.3.3 System prototype
3.4 Construction phase
3.5 Cutover phase
3.6 Chapter summary

CHAPTER 4: IMPLEMENTATION
4.1 Implementation on model
4.2 The function
4.2.1 Step 1: Read data from user input
4.2.2 Step 2: Prepare preprocessing
4.2.1 Step 3: Prepare searching
4.2.2 Step 4: Display intended data
4.3 Chapter summary

CHAPTER 5: TESTING AND RESULT DISCUSSION
5.1 Result and discussion
5.1.1 Result analysis on each dataset
5.1.2 Total average accuracy of dataset
5.2 Discussion
5.3 Chapter summary

CHAPTER 6: CONCLUSION
6.1 Conclusion
6.2 Lesson learnt
6.3 Research constraint
6.3 Future work

REFERENCE

APPENDICES
Appendix A Project Gantt Chart
Appendix B Sample data

LIST OF FIGURES

Figure 2.1 RAD Model
Figure 3.1 RAD model graphic
Figure 3.2 Project Framework
Figure 3.3 Boyer Moore Horspool Algorithm
Figure 3.4 Bad shift 1
Figure 3.5 Bad shift 2
Figure 3.6 Bad shift 3
Figure 3.7 Bad shift 4
Figure 3.8 Bad shift 5
Figure 3.9 Flowchart
Figure 3.10 BMH preprocessing
Figure 3.11 BMH searching
Figure 4.1 Searching interface
Figure 4.2 About us interface
Figure 4.3 Codes for database connection
Figure 4.4 Codes for preprocessing
Figure 4.5 Bad shift 1
Figure 4.6 Bad shift 2
Figure 4.7 Bad shift 3
Figure 4.8 Bad shift 4
Figure 4.9 Codes for searching
Figure 4.10 Codes for database query
Figure 4.11 Codes for accuracy
Figure 4.12 Searching interface
Figure 4.13 Information interface

LIST OF TABLES

Table 2.1 Comparison of Advantages and Disadvantages
Table 2.2 Comparison of Advantages and Disadvantages
Table 2.3 Comparison of Performance (Mustafa Abdulsahib Naser, 2010)
Table 2.4 Word Match Percentage
Table 2.5 Character Match Percentage
Table 2.6 Comparison of Existing System
Table 2.7 RAD Advantages and Disadvantages
Table 2.8 Lists of hardware requirement
Table 2.9 Lists of software requirement
Table 3.1 Horspool bad shift table
Table 3.2 Horspool bad shift table formula
Table 4.1 Horspool bad shift table
Table 4.2 Horspool bad shift table calculation
Table 5.1 Analysis dataset 1
Table 5.2 Analysis dataset 2
Table 5.3 Analysis dataset 3
Table 5.4 Analysis dataset 4
Table 5.5 Analysis dataset 5
Table 5.6 Total average accuracy of dataset

EQUATION

(2.1) WMP
(2.2) CMP
(3.1) CMP

LIST OF ABBREVIATIONS

BM - Boyer-Moore algorithm
BMH - Boyer-Moore-Horspool algorithm
BF - Brute-Force algorithm
KMP - Knuth-Morris-Pratt algorithm
RK - Rabin-Karp algorithm
T - Text
P - Pattern

CHAPTER 1

INTRODUCTION

1.0 INTRODUCTION

A search algorithm is known as a universal problem-solving mechanism. It is an algorithm for searching for an item (the pattern) within a collection of items (the text). Its outcome is normally a list of data or documents that match predefined matching criteria entered through a search form. In this study, the algorithm is used to retrieve useful information from the available dataset in a medical database. For example, a search algorithm is needed to search for a word or phrase in a document, for a student's name in a student list, or for an approximate file name in a directory. There are several text searching algorithms, including the Boyer-Moore algorithm (BM), the Boyer-Moore-Horspool algorithm (BMH), the Brute-Force algorithm (BF), the Knuth-Morris-Pratt algorithm (KMP), and the Rabin-Karp algorithm (RK).

This research is concerned with pattern matching problems. Pattern matching is used to check the similarity between a pattern and the text from a database. In many fields, such as computer engineering, computer science, and bioscience database querying, text matching is essential and therefore applied frequently. To check whether the pattern is a substring of the text, the algorithm compares a short string, called the pattern, with a long string, called the text.

1.1 PROBLEM BACKGROUND

The most important problem in text searching is exact text matching [1]. It can be described as finding the text that exactly matches a pattern entered by a user, who expects relevant output from the search in the shortest possible time. In the text matching problem, we are given a text T of length n and a pattern P of length m. The output of a matching algorithm is either an indication that P does not exist in T, or the starting index in T of a substring that matches P. In other words, we want to determine whether P is a substring of T starting at some index.
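The input and output of the exact matching problem described above can be illustrated with a small sketch. It assumes Python and uses the built-in `str.find` purely as a reference for the expected behaviour; the function name `find_start_index` and the sample strings are illustrative and are not taken from the thesis.

```python
def find_start_index(text: str, pattern: str) -> int:
    """Return the first index in text where pattern starts, or -1 if absent.

    str.find is Python's built-in substring search; it is used here only
    to show the contract of the exact matching problem.
    """
    return text.find(pattern)


# Hypothetical sample data, not from the thesis database.
print(find_start_index("patient has fever", "fever"))  # 12
print(find_start_index("patient has fever", "cough"))  # -1
```

Any of the algorithms reviewed in Chapter 2 could replace the built-in call while keeping the same contract.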

1.2 PROBLEM STATEMENT

Conventional search algorithms raise some problems, such as inefficient pattern search techniques. The problems consist of:

1.2.1 The amount of memory space needed by the computations.
1.2.2 The matching accuracy between the pattern input and the text in the database.

1.3 OBJECTIVES

Based on the problem statement, the objectives of this study are:

i. To develop a searching function using the Boyer-Moore-Horspool algorithm based on a keyword in a database.
ii. To provide an automatic search routine in any searching application.
iii. To evaluate the performance of the proposed searching algorithm through character match percentage.

1.4 SCOPE

The scope of this research project is:

i. Search algorithm for pattern matching in a text
Study and explore the details of the search algorithm to find the advantages and disadvantages of the selected technique, the Boyer-Moore-Horspool algorithm. The keyword input is limited to a single word, and the search covers only the symptoms field.

ii. Applying search algorithm techniques in a medical database
The selected technique is applied to a medical database, and the character match between the input and the displayed output is analyzed.

iii. Medical database collection
For this study, we collected 1000 records for the medical database. We divided these records into 5 datasets of 200 records each.

1.5 THESIS ORGANIZATION

Chapter 1 gives an introduction that explains the project overview and the main objectives of the project, which are derived from the problem statement. It also defines the scope on which the system focuses.

Chapter 2 offers a literature review. It focuses on the study and discussion of existing systems and extracts ideas from them that will be used in this system. The comparison of those systems is presented in this chapter, together with the software and hardware used.

Chapter 3 outlines the methodology used in developing the project. It reviews all the technologies and methods that are used to develop and implement the project.

Chapter 4 describes the implementation of the project, focusing on the construction phase.

Chapter 5 presents the testing and a discussion of the results. It shows the experimental results of the selected technique based on its accuracy.

Chapter 6 concludes the research on the searching algorithm and discusses future work.

CHAPTER 2

LITERATURE REVIEW

2.0 INTRODUCTION

The outcome of a search algorithm is a list of data or documents that match predefined matching criteria entered through a search form. Searching is also related to the act of seeking more valuable information from an available dataset [2]. According to [3], without search algorithms it would not be possible to find different phrases based on a given pattern.

This chapter presents a description of a few standard algorithms used in text searching. The task is to find, within a text T, a match for a pattern P, where the text, the pattern, and the match can be interpreted in different ways. This research focuses on the case where the text is fixed but not known in advance. Text searching is commonly used to retrieve documents that are relevant to the search question intended by the user. According to [1], a matching algorithm scans the text with the aid of a sliding window mechanism.

In this research, different text search techniques are explored and defined. The focus is on finding the first, or all, occurrences of a keyword in a text. String searching arises in many computing applications and can be applied in a medical database. This search technique is so fundamental that most large computer programs use it in one form or another. It is used to find the matches between the text and a specified pattern that the user enters to search for selected terms in the medical database. The selected techniques are therefore compared to find the best search algorithm to implement in a medical database.

There are several basic techniques for search algorithms based on text searching [2]:

A. Full text search
This type of search automatically indexes all words in each document, in contrast with keyword search. It is widely used for internet information retrieval in automated search engines. Because every document is indexed completely, this search has the clear advantage that relevant documents are rarely overlooked.

B. Keyword search
Keyword search allows the user to find phrases and words located anywhere in a record. It is widely used in library management systems.
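The idea of indexing every word of every document, as in full text search, can be sketched as a small inverted index. This is a minimal illustration in Python, assuming whitespace-separated, case-insensitive words; the names `build_full_text_index` and `lookup` and the sample records are hypothetical and do not come from the thesis.

```python
from collections import defaultdict


def build_full_text_index(documents):
    """Map each lower-cased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, word):
    """Return the sorted ids of the documents that contain the given word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical records standing in for a symptoms field.
records = {1: "fever and cough", 2: "cough with headache"}
index = build_full_text_index(records)
print(lookup(index, "cough"))  # [1, 2]
print(lookup(index, "rash"))   # []
```

Because every word is indexed up front, a lookup does not need to scan the documents again, which is the trade-off that distinguishes this approach from scanning the text with a pattern matching algorithm at query time.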

2.1 SEARCH ALGORITHM TECHNIQUES

Text matching algorithms are based on comparisons between the characters in the text and the characters in the pattern. Five techniques are compared in this study. The following are some examples of search algorithms.

2.1.1 Boyer Moore Algorithm

The Boyer-Moore (BM) algorithm was published in 1977 and is considered the most efficient string matching algorithm in usual applications. To calculate the next comparing position, the Boyer-Moore algorithm uses both a good-suffix function and a bad-character function [6]. The pattern is compared with the text from right to left, and the good-suffix function uses the suffix that has already matched to choose the shift. The bad-character shift consists of aligning the mismatched text character with its rightmost occurrence in P. This algorithm normally works fast when the alphabet is large.

The Boyer-Moore algorithm requires O(nm) time for the searching phase. The pre-processing phases for the two heuristics require O(m + σ) time and space, where σ is the size of the alphabet [5]. The time complexity is O(n/m) in the best case and O(mn) in the worst case.
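The bad-character shift described above can be sketched as follows. This is a simplified illustration in Python that covers the bad-character rule only and omits the good-suffix function; the function names and the sample pattern are assumptions made for this example, not the thesis implementation.

```python
def last_occurrence(pattern):
    """Record the rightmost index of each character in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bad_character_shift(pattern, j, text_char):
    """Shift suggested by the bad-character rule.

    j is the pattern index where the mismatch occurred and text_char is the
    text character that caused it. The pattern is moved so that text_char
    lines up with its rightmost occurrence in the pattern, and by at least
    one position so that the search always advances.
    """
    last = last_occurrence(pattern).get(text_char, -1)
    return max(1, j - last)


# Mismatch at the last character of "fever" (index 4):
print(bad_character_shift("fever", 4, "v"))  # 2, aligns 'v' at index 2
print(bad_character_shift("fever", 4, "x"))  # 5, 'x' is not in the pattern
```

In a full Boyer-Moore search the larger of the bad-character shift and the good-suffix shift would be taken at each mismatch.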

2.1.2 Boyer Moore Horspool Algorithm

The Boyer-Moore-Horspool (BMH) algorithm is a simplification of the Boyer-Moore algorithm. It solves the string matching problem, which is to find an occurrence of a pattern (a string) in a text (another string), or to decide that none exists. The Boyer-Moore-Horspool algorithm is efficient and serves as a standard benchmark in the practical string search literature. The pattern is scanned from right to left when it is matched against the text [4].

The algorithm uses only the bad-character function of the rightmost character of the window to compute the shift of the BM algorithm [3]. In the case of a mismatch, the shift value is determined by the bad-character value of the last character of the window instead of the character that caused the mismatch, so that a larger jump is achieved. The good-suffix concept of the original Boyer-Moore algorithm is removed, which makes the algorithm easy to implement and is intended to improve search performance. In practice, BMH is often reported to be faster on average than the Boyer-Moore algorithm.

The BMH algorithm requires O(m + σ) time in the preprocessing phase, where σ is the size of the alphabet, and O(mn) time in the searching phase in the worst case. In the best case, its time complexity is O(n/m).
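The preprocessing and searching phases described above can be sketched as follows. This is a minimal Python illustration of the general Horspool technique, assuming a dictionary as the shift table with a default shift of m for characters that do not occur in the pattern; the function names and the sample strings are hypothetical and this is not the code from the thesis.

```python
def build_shift_table(pattern):
    """Preprocessing: for every character except the last one in the pattern,
    store the distance from its rightmost occurrence to the pattern's end."""
    m = len(pattern)
    table = {}
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table


def horspool_search(text, pattern):
    """Searching: return the start indices of all occurrences of pattern."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0  # left edge of the current window in the text
    while pos <= n - m:
        j = m - 1
        # Compare the window with the pattern from right to left.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift by the value of the window's last character, whether or not
        # that character caused the mismatch.
        pos += shift.get(text[pos + m - 1], m)
    return matches


print(build_shift_table("fever"))                       # {'f': 4, 'e': 1, 'v': 2}
print(horspool_search("the fever and the cough", "the"))  # [0, 14]
```

The shift table needs one entry per distinct pattern character, which reflects the O(m + σ) preprocessing bound when a full alphabet-sized array is used instead of a dictionary.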

2.1.3 Brute-Force Algorithm

The Brute-Force algorithm is the most basic approach to the pattern matching problem. It is also known as the Naive String Matching algorithm. It is simple and easy to follow: the pattern is compared with the text from left to right, one character at a time, until a mismatching character is found. It requires no preprocessing phase and no extra space [1][4]. This algorithm can be the best choice for finding a small pattern in a small text [3].

For each candidate position in the text, the Brute-Force algorithm takes up to O(m) time to check whether the pattern occurs there. Hence, the time complexity of this algorithm is O(nm). Once a mismatch occurs, the sliding window shifts one position to the right and matching restarts from the first position of the pattern, which makes this algorithm inefficient [1].
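The sliding-window behaviour described above can be sketched in a few lines. This is a minimal Python illustration; the function name and the sample strings are assumptions made for this example.

```python
def brute_force_search(text, pattern):
    """Return the first index where pattern occurs in text, or -1 if absent."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):      # slide the window one position right
        j = 0
        # Compare left to right until a mismatch or the end of the pattern.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:
            return start
    return -1


print(brute_force_search("headache and nausea", "nausea"))  # 13
print(brute_force_search("headache and nausea", "fever"))   # -1
```

The two nested loops make the O(nm) bound visible: the outer loop tries up to n - m + 1 positions and the inner loop compares up to m characters at each one.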

2.1.4 Knuth-Morris-Pratt Algorithm

The Knuth-Morris-Pratt algorithm (Knuth et al, 1977) was introduced by Don Knuth, Jim Morris, and Vaughan Pratt. It is an improvement on the Brute-Force (BF) algorithm [1] and was developed to speed up exact pattern matching by increasing the length of the shift [4]. It is considered the first linear-time string matching algorithm and uses a shift function based on the prefixes of the pattern. It is the classical algorithm that efficiently implements the left-to-right scan strategy, which is quite similar to the Brute-Force approach [6].

The largest shift that avoids redundant comparisons is given by the next function. To compute it, Knuth-Morris-Pratt preprocesses the pattern to find matches of prefixes of the pattern with the pattern itself. The algorithm requires O(m) time and space for the preprocessing phase and O(n + m) time for the searching phase [1]. Its main weakness is that it works less well as the alphabet size increases [3], since a larger alphabet leads to a higher chance of mismatch.
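The preprocessing of the pattern against itself and the left-to-right scan can be sketched as follows. This is a minimal Python illustration of the standard prefix-function formulation of Knuth-Morris-Pratt; it is assumed here to play the role of the "next function" mentioned above, and the function names and sample strings are illustrative.

```python
def prefix_function(pattern):
    """fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    m = len(pattern)
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the first index where pattern occurs in text, or -1 if absent."""
    if not pattern:
        return 0
    fail = prefix_function(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never moves back.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(prefix_function("ababaca"))          # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababaca", "ababaca"))  # 2
```

Because the text index only moves forward, the search phase is linear in the text length after the O(m) preprocessing.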