Blaise Bulamambo. BSc Computer Science 2007/2008


Concert Life Database for Natural Language

Blaise Bulamambo
BSc Computer Science 2007/2008

The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others.

I understand that failure to attribute material which is obtained from another source may be considered as plagiarism.

(Signature of Student)

Summary

The aim of this project is to design a system that allows users to query a database using natural language. A system, concertdatabase, was built to achieve this aim, and some extensions were added to it.

Acknowledgements

I would like to thank the United Kingdom, and especially its government, for giving me the opportunity to continue my studies. I also thank my Yvonne, my special friend, for all the support she has given me since the beginning of my academic studies. I also thank my supervisor, Eric Atwell, for all his advice and support during this project. Last but not least, I thank my tutor, Dr Martyn Clark, for all the encouragement he has offered since the beginning of my university life.

Contents

1. Introduction
1.1 Project Background
1.2 Project Objectives and Minimum Requirements
1.3 Project Challenges
1.4 Project Schedule
2. Background Research
2.1 Design Methodologies
2.1.1 The Traditional Life Cycle
2.1.2 Rapid Application Development (RAD)
2.1.3 Rational Unified Process (RUP)
2.1.4 Choice of Methodology
2.2 Natural Language Processing
2.2.1 MontyLingua
2.2.2 Natural Language Toolkit (NLTK)
2.2.3 General Architecture for Text Engineering (GATE)
2.2.4 Choice of Natural Language Processing Toolkit
2.3 Querying with Natural Language
2.3.1 Building Natural Language System
2.3.2 Syntax-based Systems
2.3.3 Semantic Grammar Systems
2.4 Database Model
2.4.1 Relational Database
2.4.2 Flat-File Database
3. Requirements Analysis
3.1 Methods Used to Gather Requirements
3.2 Gathering Requirements
3.3 Functional Requirements
4. System Design
4.1 System Architecture
4.2 Database Design
4.2.1 Flat-File Format
4.3 Parser
4.4 Keyword Extraction
4.5 User Interface Design
Implementation and Testing
Database Implementation
Natural Language Toolkit
Choice of Programming Language
User Interface Implementation
Parser Implementation
Keyword Extraction
Database Search
Evaluation
Meeting the Functional Requirements
Meeting Possible Extensions
Conclusion
References
Appendices
Appendix A: Personal Reflection
Appendix B: Project Schedule

1. Introduction

1.1 Project Background

The aim of this project is to contribute to the development of a corpus of texts and associated mark-up capturing expert analysis, documenting the rise of the music concert industry in nineteenth-century London. Researchers in the School of Music have already undertaken a pilot project to collect a database of records documenting concert life in nineteenth-century London. The source documents included posters, newspaper adverts and reviews, concert programmes and similar material; these were keyed or scanned into an Oracle database. This project focuses on porting the data from the Oracle database to an open-source corpus toolkit such as the Natural Language Toolkit (NLTK), and then building a natural language interface that allows users to query the database in a natural language such as English.

1.2 Project Objective and Minimum Requirements

The source documents collected for the 19th-century concerts are stored in a full-fledged relational database. Accessing the data stored in this database requires an artificial structured query language such as SQL. The main objective of this project is to create a natural language processing system that allows users to query the database using natural language. The system will accept a user query in natural language, process the query using natural language processing tools, then search the database and output the results.

To achieve this objective, the following minimum requirements need to be satisfied. The possible extensions should only be included if time permits.

1. Minimum requirements:
To convert the data stored in the Oracle Concert Database into a different format and to access the converted data using the open-source corpus toolkit.
To search the converted data using natural language.
To implement an input/output interface that allows users to enter natural language queries and displays the results.

2. Possible extensions:
1. An extension to the user interface to include features such as:
When a user enters words that are not covered by the grammar, a feedback message is displayed showing the invalid words.
A Help command which, when selected, displays all the available help commands, such as "type %display_grammar to view grammar", "type %exit to close the system", etc.
2. A Graphical User Interface (GUI) that includes all the features above.

1.3 Project Challenges

The main challenge encountered during this project was that access to the Oracle database was not permitted, in order to protect the integrity and security of the stored data. Instead, some files containing data from the database were provided in Excel format. The files contained many fields holding numbers, alphanumeric data and compound names, as well as many empty cells. A significant amount of time was needed to understand the data in each file and then to modify the files manually so that they could be processed properly.

1.4 Project Schedule

The Gantt chart in Appendix B shows the details of how the project was managed. The project was divided into small, manageable tasks. In addition, milestones were set to help control the progress of each phase. Most of the first semester was spent on understanding the project requirements and doing background research. In the second semester some tasks had to be rescheduled because of the time taken to design the system. Designing and implementing the system took a large amount of time because of the author's inexperience with the Python programming language. As a result, the write-up and draft chapter were postponed until the design and implementation phase was at least half completed.

2. Background Research

2.1 Design Methodologies

A design methodology is an organised collection of procedures, techniques, tools and documentation aids that must be followed in order to control the process of developing an information system. Various design methodologies have been created, each with its own strengths and weaknesses. This section defines a couple of methodologies that could be applied to this project, and selects the most appropriate one.

2.1.1 The Traditional Life Cycle

The Traditional Life Cycle, often referred to as the Waterfall Model, outlines the series of steps that should occur when building an information system. These steps usually occur in a predefined order, with a review at the end of each stage before the next can be started. Although there are many variants, the basic structure of the Waterfall Model is illustrated in Figure 1 [1].

Figure 1: Traditional waterfall life cycle model.

Dividing the development of a system into an orderly sequence of phases, each subdivided into more manageable tasks, keeps the development process under control. The Traditional Life Cycle is criticised for being inflexible, slow, costly and cumbersome because of its significant structure and tight control. The outputs that the system is meant to produce are usually decided very early in the development process, meaning that changes in the users' requirements can only be made at the end of the project [2].

2.1.2 Rapid Application Development (RAD)

The underlying objective of RAD is to produce high-quality systems quickly through the use of iterative prototyping. Prototyping is an iterative process in which users suggest modifications before further prototypes and the final information system are built. RAD is seen as a possible solution to the problems and pressures of the Traditional Life Cycle. It is mostly applied to projects in which the requirements are not fully known. Despite the proposed advantages of the RAD approach, its speedy processes and lower cost may lead to lower overall system quality [1, 2].

2.1.3 Rational Unified Process (RUP)

The Rational Unified Process is an iterative process which advocates an increasing understanding of the problem through successive refinements, and an incremental growth of an effective solution over multiple cycles. It incorporates the flexibility to accommodate new requirements or tactical changes to business objectives. Risks are usually identified and resolved sooner rather than later [4]. The Rational Unified Process consists of the following four phases:

1. Inception: identify the scope and initial plan of the project.
2. Elaboration: capture the requirements and design the system architecture.
3. Construction: build the first operational version of the system.
4. Transition: deliver an almost risk-free system to the end users.

Each phase consists of a series of iterations. An iteration represents a complete development cycle that results in an executable release, which grows from iteration to iteration to become the final system [4]. The Rational Unified Process is most appropriate for large projects where requirements are not well understood, or are changing because of external changes, changing expectations, budget changes or rapidly changing technology [5].

2.1.4 Choice of Methodology

To identify which methodology is appropriate for this project, certain characteristics had to be considered. The characteristics of this project are as follows:

The School of Music has provided documents which will be used to gather the requirements for this project [5]. Once gathered, these requirements are likely to stay stable during the system development life cycle. Therefore, there will be no need for further investigation or iterations to discover new requirements.
The project objectives and deliverables are stated from the start of the project. The system developer will arrange meetings with the users for further inquiries about the project if need be.
This project involves only one person, who is not fully experienced and has other commitments.

Since the human resources are restricted to one person, and the project requirements and objectives are known from the start, the Waterfall Model with some flexibility is the appropriate option for this project. The Rational Unified Process and Rapid Application Development methodologies are mostly used when requirements are not well understood from the start and are likely to change during the system development life cycle, so that more iterations are needed as users' requirements come to light. The orderly sequence of the Waterfall Model phases allows strict control of the project and ensures progress of the system development process.

2.2 Natural Language Processing

Natural language processing (NLP) can be described as an attempt to automatically manipulate and analyse natural (human) languages using computers. It encompasses tasks such as text processing, speech processing and many more. Many natural language processing tools exist that can be used to process natural languages automatically, each with its own strengths and weaknesses. Many of these tools inherit techniques largely from Linguistics and Artificial Intelligence, and are also influenced by newer areas such as Machine Learning, Computational Statistics and Cognitive Science [13]. The following sections explore some of the tools used in natural language processing tasks.

2.2.1 MontyLingua

MontyLingua is a freeware natural language processing tool developed in the Python programming language. It is an entire suite of individual tools that can be used to process natural language text, from raw text through to the semantic interpretation of that text. It is an end-to-end natural language processor and works straight out of the box. When an English sentence is entered into MontyLingua, the output is a semantic interpretation of that sentence, as shown in Figure 2.

Figure 2: MontyLingua interface [19]

Some of the tasks that MontyLingua can perform are outlined below:

MontyTokenizer: splits a raw English sentence into its constituent tokens.
MontyTagger: part-of-speech tagging enriched with common sense.
MontyChunker: lightning-fast regular expression chunker.
MontyExtractor: extracts phrases and subject/verb/object triplets from sentences [19].

2.2.2 Natural Language Toolkit

Like MontyLingua, the Natural Language Toolkit (NLTK) is open-source software written in the Python programming language. It is a collection of modules and corpora that allows students to learn and conduct research in natural language processing. NLTK includes programming libraries that can be used to write programs that perform natural language tasks. Some of the tasks that can be performed using the NLTK libraries are listed below:

Tokenization: the process of splitting a sentence into its constituent tokens. This can be difficult for languages such as Chinese and Arabic, since these languages do not have explicit word boundaries.
Part-of-speech (POS) tagging: the task of labelling each word in a sentence with its appropriate POS tag.
Parsing: the process of constructing a parse tree for a given sentence.

NLTK modules contain source code, written in Python, that can be used or modified to accomplish natural language tasks [13].

2.2.3 General Architecture for Text Engineering (GATE)

GATE is a general Java-based toolkit that provides an infrastructure for building language engineering systems. It provides resources for performing all sorts of natural language processing tasks, an architecture which describes how language processing components connect to each other, and a graphical environment. GATE is free open-source software and comes with a set of modules that are used for information extraction and text mining [20]. Figure 3 shows GATE's environment.

Figure 3: GATE's development environment [20].

2.2.4 Choosing a Natural Language Processing Tool

This section compares the natural language processing tools described above, and then selects the most appropriate one for this project. The main requirement for this work is to port the data from the Oracle database to an open-source natural language processing toolkit. The ultimate goal is to allow users to query the database using natural language. GATE is analysed first, followed by MontyLingua, and finally NLTK.

The GATE framework is more of a template for building language engineering systems than a natural language processing tool. It provides a graphical environment that allows users to manipulate collections and documents easily. Figure 3 shows GATE's development environment with its information extraction components loaded. The use of GATE is limited to the Java programming language [20]. Python is the language chosen to build the system that will fulfil the requirements of this project. Python was chosen because of its ease of use compared with other programming languages. Python programs are easy to port and can easily be modified to run on different platforms. For these reasons the GATE framework was not suited to the requirements of this project.

Both MontyLingua and NLTK were developed to process English text using Python, but NLTK was written with the aim of teaching, allowing students to learn and to extend the existing components in their own linguistic tasks. Unlike MontyLingua, NLTK comes with large programming libraries that can be used and modified to accomplish natural language tasks [13]. For these reasons, the Natural Language Toolkit was selected as the most suitable tool for this work. NLTK is free open-source software licensed under the GNU General Public License. The current version of NLTK and its installation guide can be downloaded from the project's website.

Below are the main NLTK modules used to build the concertdatabase:

WordTokenizer: for splitting text into tokens.
cfg (Context Free Grammar): terminal and nonterminal symbols dictating how constituents are expanded into other constituents.
trees: for representing hierarchical structures according to some context-free grammar, including drawing trees graphically.
Parsers: for implementing parsing algorithms [13].

2.3 Querying with Natural Language

A natural language query system allows a user to access stored data by entering queries in a natural language such as English. This method of accessing data stored in a database is easy and convenient, because the user does not need to learn a complicated query language such as SQL. On the other hand, natural language systems have some disadvantages that users have to cope with. Most of them can only understand the grammar that is stored in the system domain. If the query contains words that are not covered by the system domain grammar, interpretation of the query fails. Users' queries are therefore limited by the system domain, where only certain types of queries can be interpreted [11].

2.3.1 Building Natural Language Systems

One of the main components to consider when building a natural language system is the parser. A parser breaks a sentence down into its component parts of speech (POS) using a provided grammar, and then constructs a parse tree. For example, the sentence "The children ate the cake" would be parsed as shown in Figure 2, given here in bracketed form. The meaning of each node label (i.e. part of speech) is listed after the parse tree [12].

(S (NP (AT The) (NNS children)) (VP (VBD ate) (NP (AT the) (NN cake))))

S ---> sentence
AT ---> article
NN ---> noun
NNS ---> noun, plural
VBD ---> verb, past tense
VP ---> verb phrase
NP ---> noun phrase

Figure 2: Each node is assumed to be a constituent [12].
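As a small illustration of the parse tree described above, the sketch below stores the tree for "The children ate the cake" as nested Python tuples and walks it. The tuple representation and the helper names (`leaves`, `pos_tags`, `bracketed`) are assumptions made for this example; they are not taken from NLTK or from the project code.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings (the words of the sentence).
TREE = ("S",
        ("NP", ("AT", "The"), ("NNS", "children")),
        ("VP", ("VBD", "ate"),
               ("NP", ("AT", "the"), ("NN", "cake"))))


def leaves(node):
    """Return the words of the sentence in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def pos_tags(node):
    """Return (word, tag) pairs, one for each node that directly holds a word."""
    if isinstance(node, str):
        return []
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(pos_tags(child))
    return pairs


def bracketed(node):
    """Render the tree in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [bracketed(c) for c in node[1:]]) + ")"


print(bracketed(TREE))
print(pos_tags(TREE))
```

Each internal node is a constituent, so the same traversal can collect a whole phrase (for example every NP) simply by testing the label before descending.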

Most parsers require a grammar to construct a parse tree. There are two main types of grammar used for parsing: syntactic grammar and semantic grammar. A syntax-based system is built using a syntactic grammar, whereas a semantic grammar system is built using a semantic grammar [11]. This section analyses the differences between these two types of grammar. One of them may be applied in the implementation of this project's concertdatabase system.

2.3.2 Syntax-based System

In syntax-based systems the user's query is parsed using a grammar which describes the possible syntactic structures of the entered query. If the sentence is valid according to the syntactic grammar, a parse tree of the sentence is constructed; otherwise parsing fails. An example of a syntactic grammar is shown below.

S ---> NP VP
NP ---> Det N
VP ---> V N
Det ---> which | what | when
N ---> piano | artist | guitar | violin
V ---> play | plays | sing | find

The grammar above describes the possible structures of a valid sentence. It says that a sentence (S) consists of a noun phrase (NP) followed by a verb phrase (VP), that a determiner (Det) can be "which", "what" or "when", and so on. A syntax-based system uses this grammar to construct the possible structure of the entered sentence. If the sentence contains words that are not covered by the grammar in the system domain, the system will not be able to parse it. For example, the sentence "which artist plays piano" will be parsed as shown in Figure 3, whereas the sentence "which artists play piano" will not be parsed. This is because the noun "artists" is not covered by the system grammar. Even though "artist", the singular of "artists", is in the domain, to the system these are two distinct words [11].
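The behaviour described above can be sketched with a small top-down parser written in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the dictionary `GRAMMAR` encodes the example grammar from this section, and the functions `expand`, `expand_sequence`, `parse` and `unknown_words` are hypothetical names introduced here. NLTK provides its own grammar and parser classes; they are deliberately not used so that the sketch stays self-contained.

```python
# The example syntactic grammar: nonterminal -> list of alternative productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "N"]],
    "Det": [["which"], ["what"], ["when"]],
    "N": [["piano"], ["artist"], ["guitar"], ["violin"]],
    "V": [["play"], ["plays"], ["sing"], ["find"]],
}


def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens[pos:]."""
    if symbol not in GRAMMAR:  # terminal: must equal the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for production in GRAMMAR[symbol]:
        yield from expand_sequence(symbol, production, tokens, pos)


def expand_sequence(label, symbols, tokens, pos):
    """Match the symbols of one production in order, keeping all partial trees."""
    partial = [((label,), pos)]
    for sym in symbols:
        partial = [(tree + (sub,), nxt)
                   for tree, p in partial
                   for sub, nxt in expand(sym, tokens, p)]
    yield from partial


def parse(sentence):
    """Return a parse tree covering the whole sentence, or None if parsing fails."""
    tokens = sentence.lower().split()
    for tree, end in expand("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


def unknown_words(sentence):
    """List the words of the sentence that the grammar does not cover."""
    vocabulary = {sym for productions in GRAMMAR.values()
                  for production in productions
                  for sym in production if sym not in GRAMMAR}
    return [w for w in sentence.lower().split() if w not in vocabulary]


print(parse("which artist plays piano"))
print(parse("which artists play piano"))          # None: "artists" is not covered
print(unknown_words("which artists play piano"))
```

Because the grammar has no left-recursive rules, this simple backtracking strategy terminates; a grammar with left recursion would need a chart-based parser instead.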

(S (NP (Det Which) (N artist)) (VP (V plays) (N piano)))

Figure 3: A syntax-based parse tree

2.3.3 Semantic Grammar Systems

A semantic grammar system relies mainly on semantic concepts rather than on the syntactic classification of words. These semantic concepts, which are often made up of more than one word, are combined to build larger concepts until a specific predefined sentence type is reached. Semantic grammars are built to contain knowledge that is specific to a particular domain. If the knowledge base changes, the semantic grammar has to be changed as well to reflect these changes, otherwise the grammar would be useless for the new knowledge domain. For example, the question "which rock contains magnesium" would be parsed as shown in figure 5 using the semantic grammar in figure 4.

S ---> Specimen_question | Spacecraft_question
Specimen_question ---> Specimen_spec Emits_info | Specimen_spec Contains_info
Specimen_spec ---> which rock | which specimen
Emits_info ---> emits Radiation
Radiation ---> radiation | light
Contains_info ---> contains Substance
Substance ---> magnesium | calcium
Spacecraft_question ---> Spacecraft Depart_info | Spacecraft Arrive_info
Spacecraft ---> which vessel | which spacecraft
Depart_info ---> was launched on Date | departed on Date
Arrive_info ---> returns on Date | arrives on Date

Figure 4: A semantic grammar [11].
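To make the idea of domain-specific concepts concrete, the sketch below encodes the specimen half of the semantic grammar and enumerates every question it can generate. It is an illustration only: the spacecraft rules are left out because the `Date` symbol is not defined in the grammar shown, the rule for `Specimen_question` is assumed to use `Specimen_spec`, and the names `SEMANTIC_GRAMMAR`, `generate` and `concept_of` are hypothetical.

```python
from itertools import product

# Specimen rules of the semantic grammar; multi-word terminals are kept as
# single strings because a semantic concept may span several words.
SEMANTIC_GRAMMAR = {
    "S": [["Specimen_question"]],
    "Specimen_question": [["Specimen_spec", "Emits_info"],
                          ["Specimen_spec", "Contains_info"]],
    "Specimen_spec": [["which rock"], ["which specimen"]],
    "Emits_info": [["emits", "Radiation"]],
    "Radiation": [["radiation"], ["light"]],
    "Contains_info": [["contains", "Substance"]],
    "Substance": [["magnesium"], ["calcium"]],
}


def generate(symbol):
    """Return every phrase derivable from `symbol` (the grammar has no recursion)."""
    if symbol not in SEMANTIC_GRAMMAR:
        return [symbol]
    phrases = []
    for production in SEMANTIC_GRAMMAR[symbol]:
        parts = [generate(s) for s in production]
        phrases.extend(" ".join(combo) for combo in product(*parts))
    return phrases


def concept_of(question):
    """Return the top-level concept that covers the question, or None."""
    normalised = " ".join(question.lower().split())
    for production in SEMANTIC_GRAMMAR["S"]:
        concept = production[0]
        if normalised in generate(concept):
            return concept
    return None


print(generate("S"))
print(concept_of("Which rock contains magnesium"))
print(concept_of("which rock contains gold"))  # None: outside the domain
```

Enumerating the language is only practical because this grammar is tiny and finite. It does show why a change in the knowledge base (for example a new substance) forces a change in the grammar itself.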

(S (Specimen_question (Specimen_spec Which rock) (Contains_info contains (Substance magnesium))))

Figure 5: Parse tree built with semantic grammar [11]

2.4 Database Models

This section describes relational and flat-file databases and outlines the strengths and weaknesses of these two database models. Other models, such as the hierarchical model and the network model, also exist. Relational and flat-file databases are considered here because the system being built needs to port the data stored in an Oracle database, which is a relational database. A flat-file database is considered as an alternative method for storing the data from the Oracle database.

2.4.1 Relational Database

A database can be described as a collection of records or data stored on a computer system. A relational database is a structured collection of data stored in tables, or relations. Each table or relation has a unique name. In a relational database, tables can be joined together so that a search is performed on, and results are displayed from, the joined tables. Software called a Database Management System (DBMS) controls the organisation, storage and retrieval of data in a relational database [17]. Relational databases have some advantages and disadvantages compared with other models.

Advantages:

Security: a relational database provides better protection of stored data against outside intrusion.
Concurrency control: a mechanism that ensures that multiple users can access data concurrently in a controlled manner without compromising the integrity of the stored data.
Accessibility of data: users can retrieve, store and update data in the database in an organised manner [17].

Disadvantages:

A relational database system requires a large amount of storage space for the data and for the software that controls the retrieval and manipulation of data. It also requires users to learn an artificial structured language such as SQL in order to retrieve and manage stored data. Many natural language processing queries cannot be expressed in SQL, which makes it difficult to manipulate data stored in a relational database using a natural language processing toolkit such as NLTK [17].

2.4.2 Flat-file Database

A flat-file database can be described as a simple database system that is designed around a single table. In contrast to relational databases, in which data can be stored in multiple tables, flat-file databases store all data in one single table or list. The fields in a record can be delimited by whitespace, fixed widths, tabs, commas (CSV) or other characters [18].

Data stored in flat-file databases is prone to corruption because there is no control mechanism to manage access to, and modification of, the data. Searching a flat-file database requires going through every record until the required record is found. As the amount of data in the database grows, accessing it becomes very challenging. There is no mechanism to prevent the duplication of data or records. These disadvantages are not present in relational databases. Despite them, flat-file databases have some advantages over relational databases.

Advantages over relational databases:

Available and versatile: a flat-file database can be implemented on any operating system. There is no need to install additional software to manage the flat files stored in the database. Flat files can be read by various programs.
Smaller and easier: flat files use less space than a relational database, are easy to create, and are particularly useful for making data available and accessible to other programs [17].

The Python programming language comes with a CSV module for reading flat files saved in CSV format, which facilitates the manipulation of the data stored in the flat-file database. Many natural language queries can be formulated to access data stored in CSV format, which makes CSV a suitable candidate format for storing the flat files for this project.
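The record-by-record search described above can be sketched with Python's standard `csv` module. The column names and the three sample rows are invented placeholders standing in for a file such as people.csv, and the `search` function is a hypothetical helper, not the project's code.

```python
import csv
import io

# Hypothetical stand-in for a flat file; the real files have different columns.
SAMPLE = """id,name,role
1,Clara Example,pianist
2,John Sample,violinist
3,Ann Placeholder,pianist
"""


def search(csv_file, keyword):
    """Return every record (as a dict) that has a field containing `keyword`.

    The whole file is scanned record by record, which is the linear search
    that a flat-file database requires.
    """
    keyword = keyword.lower()
    reader = csv.DictReader(csv_file)
    return [row for row in reader
            if any(keyword in (value or "").lower() for value in row.values())]


matches = search(io.StringIO(SAMPLE), "pianist")
for row in matches:
    print(row["id"], row["name"])
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by `open(path, newline="")`, which is the way the `csv` documentation recommends opening CSV files.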

3. Requirements Analysis

This section covers the first phase of the Waterfall Model methodology used for this project. It involves methods for gathering and defining user requirements for the system. This phase is crucial to ensure that user requirements are not misunderstood from the beginning of the project. It looks at the requirements for the new system and at why the existing system cannot meet them.

3.1 Method Used to Gather Requirements

A number of techniques exist for gathering system requirements, including observation, interviewing, questionnaires and studying documentation [2]. Studying documentation was the method used in this project to gather the requirements for the new system. This method was chosen because the project was assigned by an external client. Researchers from the Leeds School of Music had already undertaken a pilot project to collect a database of records documenting concert life in nineteenth-century London and had designed a website that allows users to access the information stored in the database. A couple of meetings were arranged with Rachel Cowgill from the School of Music, who is responsible for running the entire project. She provided some documentation and advised us to visit the pilot website in order to gain more insight into how the system works.

3.2 Gathering Requirements

From the provided documentation and the website it was established that the existing system was designed to store data collected from journals, newspaper adverts and reviews documenting concert life in nineteenth-century London, and to allow users access to the stored data. The collected data was stored in an Oracle relational database. To access the stored data, users are required to learn SQL. The aim of this project is to contribute to the development of the project by allowing users to access the collected data using natural language processing methods. Since the existing database system was designed using the relational model, accessing data using natural language processing methods would be impossible unless an interface that translates natural language queries into SQL queries was built. Building such a system would be very difficult and computationally expensive.

3.3 Functional Requirements

This section lists the functional requirements gathered from reading the documentation provided by Rachel Cowgill and from observing the pilot concertlife project website. They are categorised as essential or desirable. Those classified as essential need to be achieved in order to meet the minimum requirements for this project. Those classified as desirable are the possible extensions for this project, which add more capabilities to the system.

No.  Requirement                                                          Type
1    A different format concertlife database                              Essential
2    User interface that links to stored database                         Essential
3    Allows users to query database using natural language (e.g. English) Essential
4    Display results from the database                                    Essential
5    Display syntax errors from query                                     Desirable
6    Display welcome message                                              Desirable
7    Option to display help commands                                      Desirable
8    Option to display parse tree for the query                           Desirable
9    Graphical User Interface (GUI)                                       Desirable

Table 1: Functional Requirements [15]

4. System Design

This section describes the design of the system that I propose to build in order to meet the system requirements discussed in the previous section. The system architecture is shown in Figure 5. As mentioned in the previous sections, the purpose of this project is to build a system that accepts users' queries in natural language and processes them to search the database.

Building a system that fully understands natural language queries is very difficult because of the syntactic ambiguities that a sentence may have. For example, in "old men and women", does "old" have a wider scope than "and", or is it the other way around? As another example, in "I saw the man with a telescope", who has the telescope? Building a system that represents all the semantic meanings of such sentences is very complex because of the vast amount of information that needs to be recorded in order to parse the sentences [15, 16].

In this project I have proposed to build a system that restricts user queries to a small domain. In this approach I have created a syntactic grammar that is used to parse users' queries. If the parsing succeeds, a parse tree is generated according to the grammar specification, and a keyword is then extracted from the parse tree and used to search the database. If parsing fails, an error message is displayed informing the user that the system does not cover some words in the query.

The description of the proposed system is structured as follows:

System Architecture: gives an overview of the concertdatabase system.
Database Design: describes the database model used to store the data converted from the Oracle database.
Parser: describes how queries are parsed using a syntactic grammar.
Keyword Extraction: describes how the keyword used to search the database is extracted from the parse tree.
User Interface Design: describes the design of the interface that allows users to enter their queries and displays output from the database.
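The flow described in this section (parse the query, extract a keyword, search the database, report the result or an error) can be sketched as a chain of small functions. Everything in the sketch is an assumption made for illustration: a word-category lookup in `LEXICON` stands in for the real grammar-based parser, the rule "use the last noun as the keyword" is invented, and `RECORDS` holds placeholder rows rather than data from the concert database.

```python
# Hypothetical word categories standing in for the syntactic grammar.
LEXICON = {
    "which": "Det", "what": "Det",
    "artist": "N", "piano": "N", "violin": "N",
    "plays": "V", "play": "V", "find": "V",
}

# Placeholder rows standing in for the converted database.
RECORDS = [
    {"name": "Clara Example", "instrument": "piano"},
    {"name": "John Sample", "instrument": "violin"},
]


def parse(query):
    """Tag each word with its category, or fail listing the uncovered words."""
    words = query.lower().split()
    unknown = [w for w in words if w not in LEXICON]
    if unknown:
        raise ValueError("words not covered by the grammar: " + ", ".join(unknown))
    return [(w, LEXICON[w]) for w in words]


def extract_keyword(tagged):
    """Assumed rule for this sketch: the last noun of the query is the keyword."""
    nouns = [word for word, category in tagged if category == "N"]
    return nouns[-1] if nouns else None


def search_database(keyword, records):
    """Return the records that have a field equal to the keyword."""
    return [r for r in records if keyword in r.values()]


def answer(query, records=RECORDS):
    """Run the whole pipeline and return results or a message for the user."""
    try:
        tagged = parse(query)
    except ValueError as err:
        return "Error: " + str(err)
    keyword = extract_keyword(tagged)
    results = search_database(keyword, records) if keyword else []
    return results or "No match found."


print(answer("which artist plays piano"))
print(answer("which artists play piano"))
```

Keeping each stage as a separate function mirrors the architecture: the parser can later be replaced by a grammar-based one without touching the search or output stages.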

23 Query Parser Keyword Extraction Search Database Query Results Figure 5: ConcertDatabase System Design 4.1 System Architecture The system architecture shown in figure 5 is designed to accept users query, parse the query, extract the keyword from the query and use it to search the database, and then display the results to the user. 4.2 Database Design Database models were discussed in previous section. This section describes the database model chosen for this system. As mentioned in previous section data stored in the Oracle database can only be accessed using SQL. It was established that the data stored in the Oracle database had to be converted into a different format to allow users to access the data using natural language (e.g. English). The candidate database model to consider is the Flat-File database. Relational databases have some functions 18

24 that allow connection to multiple tables in the database in a single connection. Flatfile database can combine tables together to emulate such behaviour. The problem with combining tables to allow is that the database will contain different kinds of data stored in one big table making the search process very slow. Another possibility would be to save each table from the Oracle database separately in a flat-file format. This approach will result in a new file open operation for each table [17]. Despite these disadvantages, flat-file database offers a solution to needed to build the ConcertLifeDatabase system cheaply Flat-File Format Compared to other flat-file formats such as tab, fixed-width, Comma Separated Value (CSV) format is the most common format for transferring data between database applications. They are easy to create and maintain. Python programming language comes with a CSV module that can implement classes to read or write CSV file format. Since Python is the programming language of choice to implement the concertdatabase system for this project and also for reasons mentioned above, it would seem reasonable to use the CSV file format to create the Flat-file database. In its simplest form a CSV file consists of records and fields. Each field is delimited by a comma, and the records are separated by suitable end of file character such as a carriage return. Below is an example of a CSV file [18]. Artists, concert life database, , 182, 222 Instruments like piano, 3033, 7890, , 1, blaise, UK, 2, 4 The following CSV file example has three records and five fields. If a field contains some leading commas, space or carriage character, the field would be enclosed in double quotes. Oracle databases possess some functionalities to import or export data to CSV format. 
The Oracle export facility was not used in this project, since Rachel Cowgill provided some files in Excel format, which were then converted to CSV format by saving them with the .csv extension, as in people.csv. Figure 6 shows one of the files provided by Rachel Cowgill after conversion to CSV format.
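To make the CSV description above concrete, the following is a minimal sketch of writing and reading records with Python's standard csv module. The sample rows are hypothetical and are not taken from the project files; an in-memory buffer stands in for a file such as people.csv.

```python
import csv
import io

# Hypothetical sample rows (not from the project data). One field is empty,
# and one field contains a comma, so the writer must quote it.
rows = [
    ["name", "country", "notes"],
    ["blaise", "UK", ""],
    ["Smith, John", "France", "plays piano"],
]

# Write the rows to an in-memory buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Read the records back; quoted fields are restored to their original values.
parsed = list(csv.reader(io.StringIO(text)))
```

The writer adds double quotes only where a field needs them, which matches the quoting rule described in the text.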

Figure 6: people.csv

The flat-file database is created on the server where the Python interpreter resides. If the database is created in a different location from the Python interpreter, a path to the database must be specified so that the Python program can locate the files. It is therefore much easier to create the flat-file database in the same location as the Python interpreter.

4.3 Parser

This section describes the methods used to parse users' queries in order to extract a keyword with which to search the database. Syntactic and semantic grammars were described in a previous section. Semantic grammars are mostly used to map the meaning of a query to some predefined representation of the query in the system domain. To extract a keyword from the user's query, a syntax-based grammar is the most appropriate, as it classifies each word in the query by its part of speech and then builds a parse tree using the syntactic grammar. A keyword can then be extracted from this tree.

4.4 Keyword Extraction

To search the database, a keyword is extracted from the parse tree. How the keyword is extracted depends on the type of query entered by the user: the keyword extracted from a "which" query is expected to differ from the one extracted from a "find" or "retrieve" query.

4.5 User Interface Design

The user interface allows users to enter queries in natural language and displays the results from the database. It displays a welcome message and a message prompting the user to enter a query or another command, such as the help command or the exit command. Figure 7 shows what the implemented user interface looks like.

Figure 7: User Interface for the concert Database system.

The concertdatabase has been tested on the Windows command prompt and on a UNIX shell. To run the program, enter python concertdatabase.py at the command prompt and press Enter. A screen similar to the one shown in Figure 7 should appear with the welcome messages. Below is a list of the concertdatabase features:

When a user enters a valid query, the system should parse the query, extract a keyword, search the database, and then display the results if there is a match, or a "no match" message if there is none.
When a user enters an invalid query, the system should display a message containing the invalid word(s) and an active prompt so that the user can enter a new query.
When a user enters the %show_grammar command, the syntax grammar used for this system should be displayed.
When a user enters the %help command, a list of available commands should be displayed.
When a user enters the %exit command, the concertdatabase should be closed.
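The command handling listed above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the names handle_input and repl, the message texts, and the placeholder grammar string are hypothetical and are not taken from the project's source code. The query handler is passed in as run_query so that the sketch stays independent of the parser.

```python
HELP_TEXT = "Available commands: %help, %show_grammar, %exit"
GRAMMAR_TEXT = "S -> FindQ | WHQ | SHOW"  # placeholder, not the full grammar


def handle_input(line, run_query):
    """Return (message, keep_running) for one line typed at the prompt."""
    command = line.strip()
    lowered = command.lower()
    if lowered == "%exit":
        return "Goodbye.", False
    if lowered == "%help":
        return HELP_TEXT, True
    if lowered == "%show_grammar":
        return GRAMMAR_TEXT, True
    # Anything else is treated as a natural language query.
    return run_query(command), True


def repl(run_query, read=input, write=print):
    """Show a welcome message, then prompt until %exit is entered."""
    write("Welcome to the concert database.")
    keep_running = True
    while keep_running:
        message, keep_running = handle_input(read("query> "), run_query)
        write(message)
```

Passing read and write as parameters is a design choice of this sketch: it lets the loop be exercised with scripted input instead of a live terminal.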

5. Implementation and Testing

This section discusses the implementation and testing of the system, including the software needed for the concertdatabase system to run properly. The first requirement of this project was to reproduce the data stored in the Oracle relational database in a different format; the reproduced database was then to be ported to an open-source natural language toolkit so that users could query the database using natural language. This section covers:

Database implementation
Natural Language Toolkit
Choice of programming language
User interface implementation
Parser implementation
Keyword extraction implementation
Database search

5.1 Database Implementation

As explained in previous sections, the flat-file database was the appropriate model for storing the data from the Oracle database. The flat-file database contains files saved in comma separated value (CSV) format, for the reasons given in section 4. The files provided by Rachel Cowgill were concert_items.xls, people.xls, venues.xls and works.xls. These files were converted into CSV format and saved on the server where the Python interpreter and the Natural Language Toolkit (described below) reside. The four files contain many empty fields and a large amount of alphanumeric data. A screenshot of one of the files is shown in Figure 8.

Since a keyword is used to search the database, a sequential search is performed across the entire database until a match is found. If a match is found, the system outputs the entire matching row(s) in CSV format. If there is no match, the system outputs a message informing the user that there is no match.
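The sequential search described in section 5.1 could look roughly like the sketch below. It is an assumption that a case-insensitive substring match over every field is used; the project's actual matching rule is not shown in the text. The function names and the sample data are hypothetical.

```python
import csv
import io


def search_rows(rows, keyword):
    """Return every row in which any field contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [row for row in rows if any(needle in field.lower() for field in row)]


def search_csv_text(text, keyword):
    """Search CSV data held in a string; useful for small examples."""
    return search_rows(csv.reader(io.StringIO(text)), keyword)


def search_files(paths, keyword):
    """Scan each CSV file in turn and collect all matching rows."""
    matches = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            matches.extend(search_rows(csv.reader(handle), keyword))
    return matches


# Hypothetical sample data, not taken from the project files.
sample = "1,blaise,UK\n2,Anna,Russian pianist\n3,Marie,French singer\n"
hits = search_csv_text(sample, "russian")
message = hits if hits else "No match found in the database."
```

In the project's setting, search_files would be called with the four converted files, for example people.csv, and an empty result would trigger the "no match" message.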

Figure 8: Flat-file in CSV format.

5.2 Natural Language Toolkit

The section on the choice of Natural Language Processing toolkit discussed why NLTK was selected for this project, so those reasons are not repeated here. NLTK was downloaded from its website. It is released under an open-source licence and is available for students to modify and use in their Natural Language Processing tasks. NLTK was saved in the same location as the Python interpreter to make it easier for Python programs to import the modules needed to perform NLP tasks. An NLTK module must be imported before it can be used.

5.3 Choice of Programming Language

Python was chosen to implement the concertdatabase partly because it is a more relaxed programming language than, for example, Java. In Java, variable types always need to be declared before use, whereas in Python they do not. Python programs are also much easier to use and modify for natural language processing tasks. The main reason for using Python, however, was that NLTK is written in Python, so it made perfect sense to choose it.

5.4 User Interface Implementation

The concertdatabase was designed to run at a Windows command prompt or in a UNIX shell. To run the program, enter python concertdatabase.py at the command prompt, as shown in Figure 9.

Figure 9: User Interface for the concertdatabase.

If a user enters %help to view the available commands, the output is as shown in Figure 10.

Figure 10: Output of %help command.

If a user enters %show_grammar, the result is as shown in Figure 11.

Figure 11: Result of %show_grammar command.

If a user enters %exit, the result is as shown in Figure 12.

Figure 12: Result of %exit command.

5.5 Parser Implementation

Before a keyword is extracted, a syntactic parser needs to process the query and build the parse tree. NLTK comes with a module that contains parsers; this module has to be imported into the concertdatabase before the parser can be used. In addition, a syntactic grammar needs to be created for the parser to use. Figure 13 shows the syntactic grammar that was created for the concertdatabase system.

    grammar = parse_fcfg('''
    # sentences
    S -> FindQ
    S -> WHQ
    S -> SHOW
    # A which-question
    WHQ -> WH N Vbd NP
    WHQ -> WH N Vbd N
    # A find, show, retrieve question
    FindQ -> VB PP ADJ N
    SHOW -> VB PP N
    VB -> "find" | "show" | "retrieve"
    NP -> Det N
    WH -> "which"
    PP -> Prep Det
    ADJ -> "russian" | "french" | "english" | "italian" | "swedish" | "france" | "german"
    Det -> "the" | "a" | "an"
    Prep -> "all" | "to" | "for" | "by" | "before" | "after" | "during"
    Vbd -> "played" | "performed" | "sang"
    N -> "artist" | "artists" | "performers" | "piano" | "guitar" | "clarinet" | "brown" | "violin" | "violinist"
    N -> "composer" | "france" | "germany" | "publisher" | "singer"
    ''')

Figure 13: Syntactic grammar used to parse users' queries.

The following piece of code creates a function parse(sentence) that takes a sentence as its parameter, splits the sentence into tokens, and then returns the parse trees of the sentence.

    import nltk  # the calls below use the NLTK API of the release used for this project (it predates NLTK 3)

    def parse(sentence):
        # split the query into whitespace-separated tokens
        tokens = sentence.split()
        # bottom-up chart parser over the grammar of Figure 13
        parser = nltk.ChartParser(grammar, nltk.parse.BU_STRATEGY)
        return parser.nbest_parse(tokens)

For example, if a user enters the query "Find all the Russian artist", the query is parsed as shown in Figure 14.

Figure 14: A parse tree for the query "Find all the Russian artist".

If a user enters words that are not covered by the syntactic grammar, the system displays an error message informing the user of the invalid words, as shown in Figure 15. In this example the user enters the query "find all the frenchs artis", which contains the invalid words "frenchs" and "artis", and the system displays the error message "Grammar does not cover some of the input words: frenchs, artis". This means that the parser failed to parse the user's query because some of the input words are not in the grammar.

Figure 15: Output displaying error message.
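The coverage check behind the error message in Figure 15 can be illustrated without NLTK. The sketch below reproduces the behaviour with a plain set lookup; it is not NLTK's own implementation, the vocabulary is only a subset of the terminals in Figure 13, and the function names are hypothetical. Lower-casing the query before the lookup is also an assumption.

```python
# A subset of the terminal words from the grammar in Figure 13.
VOCABULARY = {
    "find", "show", "retrieve", "which", "all", "the", "a", "an",
    "russian", "french", "artist", "artists", "played", "piano",
}


def uncovered_words(query, vocabulary=VOCABULARY):
    """Return the words of the query that are not in the vocabulary, in input order."""
    return [word for word in query.lower().split() if word not in vocabulary]


def coverage_message(query):
    """Return an error message listing uncovered words, or None if all are covered."""
    missing = uncovered_words(query)
    if missing:
        return "Grammar does not cover some of the input words: " + ", ".join(missing)
    return None
```

With this subset, the query "find all the frenchs artis" yields the same message text as the one quoted above, while "Find all the Russian artist" passes the check.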

5.6 Keyword Extraction Implementation

The following code extracts a keyword from the parse tree, depending on the type of query entered by the user. The function interpret takes a sentence, parses it, and calls eval_s on each resulting parse tree (there may be more than one). eval_s examines the first child of the root of the tree in order to work out what type of query the user has entered. Once the type of query is established, the appropriate function is called to extract a keyword with which to search the database.

    def interpret(sentence):
        # evaluate every parse tree returned for the sentence
        return [eval_s(tree) for tree in parse(sentence)]

    def eval_s(tree):
        # which question
        if tree[0].node == 'WHQ':
            output = eval_whq(tree[0])
        # find question
        elif tree[0].node == 'FindQ':
            output = eval_findq(tree[0])
        # show question (eval_show is not included in this listing)
        elif tree[0].node == 'SHOW':
            output = eval_show(tree[0])
        else:
            output = None
        return output

    # return a noun
    def eval_whq(tree):
        noun = eval_n(tree[3])
        return noun

    def eval_n(tree):
        return tree[0]

    # return an adjective
    def eval_findq(tree):
        adj = eval_j(tree[2])
        return adj

    def eval_j(tree):
        return tree[0]

5.7 Database Search

To search the database, a keyword must first be extracted from the parse tree. The database is then searched for the keyword. If a match is found, the row(s) in which the keyword appears are displayed in the shell, as shown in Figure 16.
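To show how the dispatch in section 5.6 produces the keyword that section 5.7 searches for, the sketch below repeats the idea on plain nested tuples instead of NLTK tree objects. Everything here is illustrative: the tuple representation, the function names, and the rule used for SHOW queries (the listing does not include eval_show) are assumptions. Unlike the listing, the which-question branch takes the last word of the final constituent, so that it also returns a noun when that constituent is an NP.

```python
def leaves(tree):
    """Return the words under a tree given as (label, child, ...) tuples."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def extract_keyword(tree):
    """Pick the search keyword according to the query type under the S node."""
    query = tree[1]                      # first child of S
    kind, parts = query[0], query[1:]
    if kind == "WHQ":
        return leaves(parts[3])[-1]      # noun of the final N or NP
    if kind == "FindQ":
        return leaves(parts[2])[0]       # the adjective
    if kind == "SHOW":
        return leaves(parts[2])[0]       # the noun (assumed rule)
    return None


# Hand-built trees for two example queries, following the grammar in Figure 13.
find_tree = ("S", ("FindQ", ("VB", "find"),
                   ("PP", ("Prep", "all"), ("Det", "the")),
                   ("ADJ", "russian"), ("N", "artist")))
which_tree = ("S", ("WHQ", ("WH", "which"), ("N", "artist"), ("Vbd", "played"),
                    ("NP", ("Det", "the"), ("N", "piano"))))

keywords = [extract_keyword(find_tree), extract_keyword(which_tree)]
```

The extracted keyword would then be handed to the database search routine.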

Figure 16: Result from database

If there is no match in the database, a message reflecting this is displayed, as shown in Figure 17.

Figure 17: No match in the database.

6. Evaluation

This section evaluates the implementation of the concertdatabase system to establish whether it has met the minimum requirements set at the outset of this project. The most obvious way to evaluate the built system is against those minimum requirements. This section also evaluates the system against the possible extensions that were to be built, time permitting. The aim of this project was to port the data from the Oracle database to an open-source natural language toolkit such as the Natural Language Toolkit.

6.1 Meeting the Functional Requirements

The first objective was to reproduce the data in the Oracle database in a different format in order to allow users to query the database using natural language. Since it was not possible to access the Oracle database, Rachel Cowgill from the School of Music provided the author with some files in Excel format containing the data from the Oracle database. A flat-file database was created, with the files saved in Comma Separated Value format to make them easier to access from Python. The Natural Language Toolkit was the open-source Natural Language Processing toolkit chosen for this project. It was downloaded and used to build the concertdatabase system. I believe that I have achieved this first objective.

The second objective was to search the reproduced database using natural language. Again, I believe that I achieved this objective. The third objective was to implement a user interface that allows users to enter their queries and displays the results. This objective was also achieved.

6.2 Meeting Possible Extensions

Following the feedback on my mid-term report, which suggested that the possible extensions might not be sufficient, I decided to modify them to include the following.

An extension to the user interface to include features such as:

When a user enters words that are not covered by the grammar, a message is displayed showing the invalid words.
A help command which, when selected, displays all the available commands, such as %display_grammar, %display_tree, %exit, etc.

I believe that the system I have built includes these extensions. The GUI extension was not developed due to lack of time.

6.3 Conclusion

Building systems that can query databases using natural language is a very difficult task, because languages contain many ambiguous words. The approach used in this project was to restrict the types of queries that the system can accept. If a query is not covered by the system domain, it is rejected as invalid. Using this approach, the author feels a sense of pride in having built a system that uses natural language to query the database. The built system is somewhat simplistic, but it can be extended by enlarging the system domain so that it accepts more queries. Future work could include developing a graphical user interface and perhaps a semantic grammar as well.

References

[1] P. Bocij, D. Chaffey, A. Greasley, and S. Hickie. Business Information Systems. Prentice Hall, third edition.
[2] D.E. Avison and G. Fitzgerald. Information Systems Development: Methodologies, Techniques and Tools. McGraw-Hill, second edition.
[3] NLTK. Introduction to Natural Language Processing. Last accessed 5/12/2007.
[4] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modelling Language User Guide. Addison Wesley, second edition.
[5] CMS. Selecting a Development Approach. proach.pdf. Last accessed 5/12/2007.
[6] R. Dale, H. Moisl, and H. Somers. Handbook of Natural Language Processing. Marcel Dekker, Inc.
[7] J. Dibble and B. Zon. Nineteenth-Century British Music Studies. Ashgate, volume 2.
[8] Concert Life in Nineteenth-Century London Database. School of Art, Publishing and Music, Oxford Brookes University.
[9] Concertlifeproject. Concert Life in 19th-Century London Database and Research Project. Last accessed 7/12/2007.
[10] T. Connolly and C. Begg. Database Systems: A Practical Approach to Design, Implementation, and Management. Addison Wesley, fourth edition.
[11] I. Androutsopoulos, G.D. Ritchie, and P. Thanisch. Natural Language Interfaces to Databases: An Introduction. Last accessed 12/03/
[12] C.D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. Massachusetts Institute of Technology.
[13] Crossroads: Getting Started on Natural Language Processing with Python. Last accessed 20/04/2008.
[14] A. Zamel: Structured Sentences Text Editor. Last accessed 10/03/2008.
[15] Natural Language Toolkit: Last accessed 20/04/2008.
[16] ELF Software Documentation Series:
[17] Web Site Owner: Database Types. Last accessed 21/04/2008.
[18] Paul Bourke: CSV Comma Separated Value.
[19] MontyLingua. A free commonsense-enriched Natural Language Understander for English. Last accessed 20/04/2008.
[20] Advanced Knowledge Technologies: General Architecture for Text Engineering. Last accessed 19/04/2008.
[21] Pendar. Last accessed 15/03/2008.
[22] The University of Edinburgh School of Informatics. Last accessed 7/03/

Appendix A

Personal Reflection

At the beginning of this project I did not understand very well what I was supposed to build. I spent a significant amount of time trying to understand what the project was all about. Then "Eureka!" I shouted; I thought I finally understood it. How wrong I was. I spent a long time searching for information that had nothing to do with my project, and it took me quite some time to realise this. I only realised that I had not understood the problem when I wrote a program in Python to perform what I thought was part of my project. The results were so minimal that I asked myself, "Is that it?" The program contained about 30 or 40 lines of code. Following this I decided to sit down and go carefully through the project description, and that is when I realised that I had completely misunderstood the project requirements.

With the mid-term report deadline approaching, I had to spend sleepless nights doing my background research and writing up my mid-term report as quickly as possible. To add fuel to the fire, I fell ill, and I had to struggle to complete the report. During the second semester I spent a lot of time building the system. I am not particularly good at programming, and it took me ages to build the system.

I have learned a lot from this experience, particularly about time management, research methods and discipline. If I had to restart the project I would definitely start early, choose a project methodology as early as possible, and design a timetable with milestones. Sticking to the schedule and following the project methodology are the most important parts of the project. I feel a sense of pride in having done this project. I would strongly advise future students to spend more time understanding the project description and to start their background research as soon as possible. This experience has taught me a lesson, and I have benefited from it.


More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1

Motivation. Korpus-Abfrage: Werkzeuge und Sprachen. Overview. Languages of Corpus Query. SARA Query Possibilities 1 Korpus-Abfrage: Werkzeuge und Sprachen Gastreferat zur Vorlesung Korpuslinguistik mit und für Computerlinguistik Charlotte Merz 3. Dezember 2002 Motivation Lizentiatsarbeit: A Corpus Query Tool for Automatically

More information

Course Scheduling Support System

Course Scheduling Support System Course Scheduling Support System Roy Levow, Jawad Khan, and Sam Hsu Department of Computer Science and Engineering, Florida Atlantic University Boca Raton, FL 33431 {levow, jkhan, samh}@fau.edu Abstract

More information

Transaction-Typed Points TTPoints

Transaction-Typed Points TTPoints Transaction-Typed Points TTPoints version: 1.0 Technical Report RA-8/2011 Mirosław Ochodek Institute of Computing Science Poznan University of Technology Project operated within the Foundation for Polish

More information

1 File Processing Systems

1 File Processing Systems COMP 378 Database Systems Notes for Chapter 1 of Database System Concepts Introduction A database management system (DBMS) is a collection of data and an integrated set of programs that access that data.

More information
