Classification of Natural Language Interfaces to Databases based on the Architectures

Volume 1, No. 11, ISSN 2278-1080
The International Journal of Computer Science & Applications (TIJCSA)
RESEARCH PAPER
Available Online at http://www.journalofcomputerscience.com/

S. AQUTER BABU
Asst. Professor, Dept. of Computer Science, Dravidian University, Kuppam, India
s_a_babu1@yahoo.co.in

D. MABUNI
Asst. Professor, Dept. of Computer Science, Dravidian University, Kuppam, India
mabuni.d@gmail.com

Prof. C. LOKANATHA REDDY
Professor, Dept. of Computer Science, Dravidian University, Kuppam, India

Abstract

A Natural Language Interface to Database (NLITDB) system is an interface to a database through which a user submits a request to retrieve information in a natural language such as English. An NLITDB system accepts questions in natural language and generates results. Generally, users have to learn a query language such as Structured Query Language (SQL) to formulate a query and retrieve information from a database, and learning such a language is difficult for many non-technical database users. One solution to this problem is to use an NLITDB to retrieve information from the database. NLITDB systems have gained importance because non-technical users increasingly interact with databases. Many NLITDB systems have been developed since the 1960s. Each NLITDB system uses an architecture to process the natural language question submitted by the user. In this paper, we classify and review existing NLITDB systems based on the architectures they adopted. The outcome of this classification is intended to help researchers working on NLITDB systems see which architectures were used most often in the development of NLITDB systems.

Keywords- Databases, Natural Language Interface to Database (NLITDB), Architecture, Structured Query Language (SQL).

2012, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved

1. Introduction

One of the main functions of a Database Management System (DBMS) is to allow users to create and maintain a database. A database is an organized collection of logically related data. Nowadays, many non-technical people also interact with databases. DBMSs provide a query language such as Structured Query Language (SQL) with which users formulate queries and retrieve information from a database. Non-technical people find it difficult to formulate a query in a language such as SQL because they lack knowledge of the database structure, SQL syntax, and so on.

Natural Language Interface to Database (NLITDB) systems have been developed since the 1960s to solve this problem. NLITDB systems allow users to submit their requests for information in a natural language such as English. An NLITDB system accepts questions in natural language and translates them into a query language such as SQL; the resulting queries are processed by the DBMS to retrieve the answers.

In this paper, we classify and review existing NLITDB systems based on the architectures they adopted. The outcome of this classification is intended to help researchers working on NLITDB systems see which architectures were used most often in the development of NLITDB systems.

The rest of the paper is organized as follows: Section 2 presents an overview of the different architectures adopted by NLITDB systems. Section 3 discusses the classification of NLITDB systems based on the architectures they adopted. Section 4 presents the results and, finally, Section 5 concludes the paper.

2. Types of Architectures

The following four types of architectures were used in the development of many NLITDB systems [1]. Each architecture reflects different choices about what information is applied and in what manner.

    Pattern-Matching systems
    Syntax-based systems
    Semantic Grammar systems
    Intermediate Representation Languages (IRL)

2.1) Pattern-Matching systems

Some of the early NLITDB systems relied on pattern-matching techniques to answer the user's questions. To illustrate a simplistic pattern-matching approach, consider a database table holding information about countries:

    Countries table
    Country   Capital   Language
    -------   -------   --------
    France    Paris     French
    Italy     Rome      Italian
    ...       ...       ...

A primitive pattern-matching system could use rules like:

    pattern: ... "capital" ... <country>
    action:  Report Capital of row where Country = <country>

The above rule says that if a user's request contains the word "capital" followed by a country name (i.e. a name appearing in the Country column), then the system should locate the row which contains the country name, and print the corresponding capital.
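The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of any system surveyed here: the in-memory `COUNTRIES` list stands in for the database table, and the function name `answer` and the regular expression are hypothetical choices made for this sketch.

```python
import re

# Toy stand-in for the Countries table (assumption: a real system would
# query the database instead of an in-memory list).
COUNTRIES = [
    {"Country": "France", "Capital": "Paris", "Language": "French"},
    {"Country": "Italy", "Capital": "Rome", "Language": "Italian"},
]


def answer(question):
    """Apply the rule: ... "capital" ... <country>  ->  report Capital."""
    text = question.lower()
    for row in COUNTRIES:
        country = re.escape(row["Country"].lower())
        # The word "capital", then any text, then a name from the Country column.
        if re.search(r"\bcapital\b.*\b" + country + r"\b", text):
            return row["Capital"]
    return None  # no rule matched the request


for q in ("What is the capital of Italy?", "Print the capital of Italy."):
    print(answer(q))  # each call prints "Rome"
```

Because the rule only looks for the keyword and a known country name, any phrasing that contains both in that order produces the same answer, and a request that does not fit the pattern is simply left unanswered.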

If, for example, the user typed "What is the capital of Italy?", the system would use the above pattern rule and report "Rome". The same rule would allow the system to handle "Print the capital of Italy.", "Could you please tell me what is the capital of Italy?", etc. In all cases the same response would be generated.

The main advantage of the pattern-matching approach is its simplicity: no elaborate parsing and interpretation modules are needed, and the systems are easy to implement. The pattern-matching architecture was used in the NLITDB system SAVVY.

2.2) Syntax-based systems

Syntax based systems are based on the idea of extending syntactic parsers with semantic labels. A sentence is parsed using certain grammar rules, resulting in a syntactic tree. Some of the nodes in the tree are then mapped to their semantic meanings, and these semantic meanings are combined to produce the corresponding query in a database query language such as SQL.

The main advantage of syntax based approaches is that they provide detailed information about the structure of a sentence. A parse tree contains a lot of information about the sentence structure: starting from a single word and its part of speech, it shows how words are grouped together to form a phrase and how phrases are grouped together to form more complex phrases, until a complete sentence is built. With this information, semantic meanings can be mapped to certain production rules (or nodes in a parse tree). The syntax based architecture was used in NLITDB systems such as LUNAR and NALIX.

2.3) Semantic Grammar systems

A semantic grammar system is very similar to a syntax based system, in that the query result is obtained by mapping the parse tree of a sentence to a query in a database query language such as SQL. The basic idea of a semantic grammar system is to simplify the parse tree as much as possible, by removing unnecessary nodes or combining some nodes together. As a result, the semantic grammar system can better reflect the semantic representation without complex parse tree structures, and a production rule in a semantic grammar system does not necessarily correspond to general syntactic concepts. Besides yielding smaller structures, the semantic grammar approach also provides a way to assign a name to a particular node in the tree, which results in less ambiguity than the syntax based approach. The semantic grammar architecture was used in NLITDB systems such as PLANES, LADDER and REL.

2.4) Intermediate Representation Languages (IRL)

Intermediate representation systems were proposed because of the difficulty of translating a sentence directly into a general database query language using a syntax based approach. The idea is to map a sentence into a logical query language first, and then translate this logical query into a general database query language such as SQL. The process may involve more than one intermediate meaning representation language. The following figure shows a possible architecture of an intermediate representation language system.

[Figure: a possible architecture of an intermediate representation language system]
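The two-stage translation described for intermediate representation systems can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of any system surveyed here: the `LogicalQuery` class, the functions `to_logical` and `to_sql`, and the toy question pattern are hypothetical, and the Countries table of Section 2.1 is recreated in an in-memory SQLite database purely for demonstration.

```python
import re
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalQuery:
    """Hypothetical intermediate form: report `target` where `attribute` = `value`."""
    target: str
    attribute: str
    value: str


def to_logical(question):
    """Stage 1: natural language -> intermediate representation (toy rule)."""
    m = re.search(r"\b(capital|language) of (\w+)", question, re.IGNORECASE)
    if m is None:
        return None
    return LogicalQuery(m.group(1).capitalize(), "Country", m.group(2).capitalize())


def to_sql(lq):
    """Stage 2: intermediate representation -> parameterized SQL."""
    # Column names come from the fixed rule above, never from free user text.
    sql = f"SELECT {lq.target} FROM Countries WHERE {lq.attribute} = ?"
    return sql, (lq.value,)


# Demonstration database holding the Countries table from Section 2.1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Countries (Country TEXT, Capital TEXT, Language TEXT)")
conn.executemany(
    "INSERT INTO Countries VALUES (?, ?, ?)",
    [("France", "Paris", "French"), ("Italy", "Rome", "Italian")],
)

lq = to_logical("What is the capital of Italy?")
sql, params = to_sql(lq)
print(lq)                                  # the intermediate representation
print(sql, params)                         # the generated SQL and its parameter
print(conn.execute(sql, params).fetchall())
```

Keeping the logical form separate from the SQL generator means that only the second stage would need to change if the target query language changed, which is the practical appeal of this architecture.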

The Intermediate Representation Languages (IRL) architecture was used in NLITDB systems such as CHAT-80, PHILIQA and TEAM.

3. Classification of NLITDB Systems

Each NLITDB system uses an architecture to process the natural language question submitted by the user. We collected information about twenty-one existing NLITDB systems from published research papers available on the Internet. After studying and analyzing these systems, we classified them into categories based on the architectures they adopted.

The following NLITDB system adopted the Pattern-Matching systems architecture.

    SAVVY

The following NLITDB systems adopted the Syntax-based systems architecture.

    LUNAR
    NALIX

The following NLITDB systems adopted the Semantic-Grammar systems architecture.

    LADDER
    RENDEZVOUS
    PLANES
    REL
    EUFID
    ELF
    EASYASK
    ENGLISH QUERY

The following NLITDB systems adopted the Intermediate-Representation Languages (IRL) architecture.

    PHILIQA
    CHAT-80
    TEAM
    IRUS
    Ginsparg's
    JANUS
    LOQUI
    MASQUE/SQL
    EDITE
    CLE
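As a quick cross-check of the classification above, the lists can be transcribed into a small Python dictionary and tallied. The variable names are illustrative only; the system names are copied from the lists in this section.

```python
# Classification from this section, transcribed as architecture -> systems.
CLASSIFICATION = {
    "Pattern-Matching systems": ["SAVVY"],
    "Syntax-based systems": ["LUNAR", "NALIX"],
    "Semantic-Grammar systems": [
        "LADDER", "RENDEZVOUS", "PLANES", "REL",
        "EUFID", "ELF", "EASYASK", "ENGLISH QUERY",
    ],
    "Intermediate-Representation Languages": [
        "PHILIQA", "CHAT-80", "TEAM", "IRUS", "Ginsparg's",
        "JANUS", "LOQUI", "MASQUE/SQL", "EDITE", "CLE",
    ],
}

# Number of systems per architecture, and the overall total.
counts = {arch: len(systems) for arch, systems in CLASSIFICATION.items()}
total = sum(counts.values())

for arch, n in counts.items():
    print(f"{arch}: {n}")
print("Total:", total)
```

The tally gives 1, 2, 8 and 10 systems for the four architectures respectively, which adds up to the twenty-one systems studied.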

The following graph shows the above classification.

[Figure: Classification of NLITDB Systems. Bar chart; vertical axis: Number of NLITDB Systems (0 to 12); horizontal axis: Architectures (Pattern-Matching systems, Syntax-based systems, Semantic-Grammar systems, Intermediate-Representation Languages)]

4. Results

Our study covered twenty-one existing NLITDB systems and their architectures. Of these, eight adopted the Semantic-Grammar systems architecture and ten adopted the Intermediate-Representation Languages architecture, whereas only one adopted the Pattern-Matching systems architecture and two adopted the Syntax-based systems architecture. Most NLITDB systems have therefore adopted either the Semantic-Grammar systems or the Intermediate-Representation Languages architecture.

5. Conclusion

A Natural Language Interface to Database (NLITDB) system allows database users to formulate questions in a natural language such as English to retrieve information from a database. Users' questions are translated into a database query language such as SQL, and the resulting query is processed by a DBMS to return the answer. Many NLITDB systems with different architectures have been developed since the 1960s. In this paper, we have classified twenty-one existing NLITDB systems based on the four main architectures they adopted. Based on our study and analysis, we conclude that most NLITDB systems have adopted the Semantic-Grammar systems and Intermediate-Representation Languages architectures.

References

[1] I. Androutsopoulos, G.D. Ritchie, and P. Thanisch, Natural Language Interfaces to Databases: An Introduction, Journal of Natural Language Engineering 1 Part 1 (1995), 29-81.
[2] Eric Brill, Transformation Based Error Driven Learning and Natural Language Processing: A Case Study in Part of Speech Tagging, ACL (1995).
[3] Eugene Charniak, A maximum-entropy-inspired parser, North American Association for Computational Linguistics (2000), 132-139.
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, Introduction to Algorithms, Second Edition, MIT Press and McGraw-Hill, 2001.
[5] D.R. Dowty, R.E. Wall, and S. Peters, Introduction to Montague Semantics, D. Reidel Publishing Company, Dordrecht, Holland, 1981.
[6] G. Hendrix, E. Sacerdoti, D. Sagalowicz, and J. Slocum, Developing a Natural Language Interface to Complex Data, ACM Transactions on Database Systems (1978), 105-147.
[7] Daniel Jurafsky and James H. Martin, Speech and Natural Language Processing, Prentice-Hall Inc., Upper Saddle River, New Jersey, 2000.
[8] Rohit J. Kate and Raymond J. Mooney, Using String-Kernels for Learning Semantic Parsers, COLING ACL (2006).
[9] Yunyao Li, Huahai Yang, and H.V. Jagadish, Nalix: an Interactive Natural Language Interface for Querying XML, SIGMOD (2005).
[10] Yunyao Li, Huahai Yang, and H.V. Jagadish, Constructing a Generic Natural Language Interface for an XML Database, EDBT (2006).
[11] Raymond J. Mooney, Learning Language from Perceptual Context: A Challenge Problem for AI, American Association for Artificial Intelligence (2006).
[12] Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates, Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability, COLING (2004).
[13] W. Woods, An experimental parsing system for transition network grammars, in Natural Language Processing, R. Rustin, Ed., Algorithmic Press, New York, 1973.
[14] B.J. Grosz, TEAM: A Transportable Natural Language Interface System, in Proceedings of the 1st Conference on Applied Natural Language Processing, Santa Monica, California, (1983), pp 39-45.
[15] P. Resnik, Access to Multiple Underlying Systems in JANUS, BBN report 7142, Bolt Beranek and Newman Inc., Cambridge, Massachusetts, (September, 1989).
* * *