Research Portfolio. Beáta B. Megyesi January 8, 2007
Research Activities

My research activities focus mainly on four areas:

Natural language processing

During the last ten years, since I started my academic career in computational linguistics, my main research topics have been corpus linguistics, part-of-speech tagging, morphological analysis, and shallow syntactic analysis (e.g. chunking, parsing), mainly using machine learning techniques for Swedish, English, and Hungarian. I am also interested in using rule-based finite-state techniques to build a shallow syntactic analyzer for Swedish that provides both phrase-structure and dependency analysis. My work within NLP has been published at both national and international conferences; see papers 1, 2, 3, 4, 5, 8, 9, 10, 14, and 15 under Publications.

Speech research

Within speech research at KTH, the main purpose was to improve speech synthesis for Swedish. For a better-sounding text-to-speech system, in-depth analysis is needed to find out the relationship between prosodic and linguistic structure. Therefore, I studied the relationship between prosody, in terms of prosodic breaks, and linguistic structure in various speaking styles, in both spontaneous and non-spontaneous speech in different communicative situations. The results are published mainly at well-known international conferences; see papers 6, 7, 11, 12, 13, 15, 16, and 17 under Publications.

Parallel corpora and machine translation

For the last two years, I have been working on the development of a parallel corpus between Swedish and Turkish that can be used for machine translation as well as for linguistic analysis of the languages involved. This work has been carried out within the project Supporting research environment for minor languages (Classic, Turkish and Hindi) and serves as a pilot project aiming at developing methods to automatically build parallel corpora between various language pairs that might belong to different language types. The outcome of the pilot project is presented in paper 19 under Publications.
Text categorization

During the last year, I have begun to look into the automatic classification of texts into genres and text types using machine learning techniques, with a focus on knowledge representation to explore linguistic features, both semantic (by means of automatically extracted keywords) and morpho-syntactic (e.g. part-of-speech, syntactic phrases and depth); see papers 18, 20 and 21 under Publications.

Publications

Reviewed Papers

The papers are available at bea/

1. Megyesi, B. Brill's PoS Tagger with Extended Lexical Templates for Hungarian. Workshop (W01) on Machine Learning in Human Language Technology, ACAI 99, Chania, Crete, Greece, July 5 - July 16.
2. Megyesi, B. Improving Brill's PoS Tagger for an Agglutinative Language. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC 99), University of Maryland, USA, June 21-22.
3. Megyesi, B. & Rydin, S. Towards a Finite-State Parser for Swedish. In Proceedings of NoDaLiDa 1999, Trondheim, Norway, December 9-10.
4. Berthelsen, H. & Megyesi, B. Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora. In Proceedings of the Third International Workshop on TEXT, SPEECH and DIALOGUE, Brno, Czech Republic, September 13-16. Springer-Verlag in LNCS/LNAI series.
5. Megyesi, B. Data-Driven Methods for PoS tagging and Chunking of Swedish. In Proceedings of NoDaLiDa 2001, Uppsala, Sweden, May 21-22.
6. Gustafson-Čapková, S. & Megyesi, B. A Comparative Study of Pauses in Dialogues and Read Speech. In Proceedings of Eurospeech 2001, Volume 2, Aalborg, Denmark, September 3-7.
7. Megyesi, B. & Gustafson-Čapková, S. Pausing in Dialogues and Read Speech: Speakers' Production and Listeners' Interpretation. In Proceedings of the Workshop on Prosody in Speech Recognition and Understanding, New Jersey, USA, October 22-24.
8. Megyesi, B. Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), Carnegie Mellon University, Pittsburgh, PA, USA, June 3-4.
9. Megyesi, B. Phrasal Parsing by Using Data-Driven PoS Taggers. In Proceedings of the Conference on Recent Advances in Natural Language Processing, Euro Conference RANLP-2001, Tzigov Chark, Bulgaria, September 5-7.
10. Megyesi, B. Shallow Parsing with PoS Taggers and Linguistic Features. Journal of Machine Learning Research: Special Issue on Shallow Parsing, JMLR (2), MIT Press.
11. Gustafson-Čapková, S. & Megyesi, B. Silence and Discourse Context in Read Speech and Dialogues in Swedish. In Proceedings of the Speech Prosody 2002 conference, Bernard Bel & Isabelle Marlien (eds.), Aix-en-Provence, France, April 11-13.
12. Carlson, R., Granström, B., Heldner, M., House, D., Megyesi, B., Strangert, E. & Swerts, M. Boundaries and groupings - the structuring of speech in different communicative situations: a description of the GROG project. In Proceedings of Fonetik 2002, TMH-QPSR Volume 44, Stockholm, Sweden, May 29-31.
13. Megyesi, B. & Gustafson-Čapková, S. Production and Perception of Pauses and their Linguistic Context in Read and Spontaneous Speech in Swedish. In Proceedings of ICSLP, International Conference on Spoken Language Processing, Denver, USA, September 16-20.
14. Megyesi, B. & Carlson, R. Data-Driven Methods for Building a Swedish Treebank. Extended abstract to the Swedish Treebank Symposium, November 2002, Växjö University, Sweden.
15. Megyesi, B. Data-Driven Syntactic Analysis Methods and Applications for Swedish. Doctoral Dissertation, Department of Speech, Music and Hearing, Kungliga Tekniska Högskolan.
16. Heldner, M. & Megyesi, B. Exploring the Prosody-Syntax Interface. In Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), 2-9 August 2003, Barcelona, Spain.
17. Heldner, M. & Megyesi, B. The Acoustic and Morpho-Syntactic Context of Prosodic Boundaries in Dialogs. In Proceedings of Fonetik 2003, 2-3 June 2003, Umeå, Sweden.
18. Wastholm, P., Kusma, A., & Megyesi, B. Using Linguistic Data for Genre Classification. In Proceedings of Swedish Artificial Intelligence and Learning Systems SAIS-SSLS, Mälardalen University, Västerås, Sweden.
19. Bandmann Megyesi, B., Sågvall Hein, A. and Csató Johansson, É. (2006). Building a Swedish-Turkish Parallel Corpus. In Proceedings of Language Resources and Evaluation Conference LREC, May 22-28, Genoa, Italy.
20. Hulth, A. & Megyesi, B. (2006). A Study on Automatically Extracted Keywords in Text Categorization. In Proceedings of Association for Computational Linguistics ACL 2006, June 17-23, Sydney, Australia.

Papers 1 and 2, as well as 16 and 17, can be considered the same papers. The PhD thesis, number 15, is partly based on the previously published papers.

Other material

1. Megyesi, B. A Short Descriptive Grammar for Hungarian. Dept. of Linguistics, Stockholm University.
2. Megyesi, B. Brill's Transformation-Based Tagger. Dept. of Linguistics, Stockholm University.

Supervised Research

Over the years, I have supervised several master's theses in computational linguistics and tutored and supervised project work in my courses (see also the Pedagogical portfolio for a more detailed description). One of the projects, on automatic text categorization, was of high quality, and together with my students I extended their work and wrote the paper Using Linguistic Data for Genre Classification by Wastholm, Kusma and Megyesi in 2005; see under Publications.

I arranged a Ph.D. course in Perl programming at the Department of Linguistics, Stockholm University, during Spring. The work included planning the course and part of the lecturing. I was an invited speaker at the Swedish National Graduate School of Language Technology in 2003 and gave a talk about Phrasal Parsing by Using Data-Driven PoS Taggers. I also participated as a lecturer at the International PhD course on treebanks, arranged by Stockholm University.

Cooperations and funding

As I mentioned in the introduction, my research activities have been carried out partly by myself and partly in cooperation with other researchers. My work on data-driven tagging and chunking of Swedish was performed by myself alone.

I worked with prof. Ralph Grishman and Roman Yangarber at New York University in 1999, where I was a visiting researcher working on information extraction.
I participated in the implementation of a new domain (natural disasters) in the Proteus information extraction system. The visit was supported by STINT, the Swedish Foundation for International Cooperation in Research and Higher Education.

At Stockholm University, I cooperated with my colleagues Sara Rydin (1999) on rule-based shallow parsing of Swedish, with Harald Berthelsen (2000) on automatically finding and filtering annotation errors in English by using ensemble methods, and with Sofia Gustafson-Čapková on the relation between prosodic (in terms of pausing) and linguistic structure (on the morphological, syntactic and discourse levels). In all of this work, both authors participated fully in the projects.

At CTT, TMH, KTH I worked with prof. Rolf Carlson, prof. Björn Granström, Dr David House, Dr Mattias Heldner, with prof. Eva Strangert at Umeå University, and with Dr Marc Swerts (2002) within the project Gräns och gruppering - Strukturering av talet i olika kommunikativa situationer (Boundaries and groupings - the structuring of speech in different communicative situations), led by prof. Eva Strangert and financed by the Swedish Research Council. My role in the project was to build a corpus by collecting the material and annotating it with prosodic phrases as well as on various linguistic levels, e.g. part-of-speech and phrase structure information, and to run statistical analyses to determine the relationship between the prosodic and linguistic structure.

I was one of the initiators of the Swedish Treebank Symposium and the Nordic treebank network in 2002 and 2003, together with prof. Joakim Nivre and prof. Martin Volk. Unfortunately, I was not able to follow the project, as I became a mother to twins and was on parental leave from September 2003 to September.

Furthermore, I worked with Dr Anette Hulth on text categorization, where my main role was to provide the linguistic analysis needed in the knowledge representation phase for the categorization, and to run the machine learning algorithm to build and evaluate the models.

Currently, I participate in the project Supporting research environment for minor languages (Classic, Turkish and Hindi), supported by the Swedish Research Council and the Faculty of Languages at Uppsala University. I work 10% of my time together with prof. Anna Sågvall Hein, prof. Éva Csató Johanson and Dr Bengt Dahlqvist on building a Swedish-Turkish parallel corpus to be used in linguistic research and machine translation. Besides administrative work such as maintaining the project page, my work includes corpus collection, normalization, annotation, and alignment. Prof. Kemal Oflazer at Sabanci University in Istanbul is also connected to the project, as he provides the morpho-syntactic analysis of the Turkish material.

I am also working in the newly founded project Methods and tools for automatic grammar extraction (Metoder och verktyg för automatisk grammatikextraktion), supported by the Swedish Research Council during the period 2006 to 2009, with prof. Anna Sågvall Hein (project leader) and prof. Joakim Nivre.

Presentations

Papers presented at international conferences/workshops:

- Brill's PoS Tagger with Extended Lexical Templates for Hungarian. Workshop (W01) on Machine Learning in Human Language Technology, ACAI 99, Greece.
- Towards a Finite-State Parser for Swedish. NoDaLiDa 1999, Norway.
- Ensemble of Classifiers for Noise Detection in PoS Tagged Corpora. Third International Workshop on TEXT, SPEECH and DIALOGUE, Brno, Czech Republic.
- Data-Driven Methods for PoS tagging and Chunking of Swedish. NoDaLiDa 2001, Sweden.
- Pausing in Dialogues and Read Speech: Speakers' Production and Listeners' Interpretation. Workshop on Prosody in Speech Recognition and Understanding, NJ, USA.
- Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish. Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), Carnegie Mellon University, PA, USA.
- Phrasal Parsing by Using Data-Driven PoS Taggers. Conference on Recent Advances in Natural Language Processing, Euro Conference RANLP-2001, Bulgaria.
- Silence and Discourse Context in Read Speech and Dialogues in Swedish. Speech Prosody 2002 conference, France.
- Production and Perception of Pauses and their Linguistic Context in Read and Spontaneous Speech in Swedish. ICSLP, International Conference on Spoken Language Processing, Colorado, USA.
- Data-Driven Methods for Building a Swedish Treebank. Swedish Treebank Symposium, November 2002, Växjö University, Sweden.
- Building a Swedish-Turkish Parallel Corpus. Language Resources and Evaluation Conference LREC 2006, May 22-28, Genoa, Italy.

Between October 2000 and June 2003, I gave talks about the ongoing research within Natural Language Processing at CTT, KTH for CTT's industrial partners, researchers, and international reviewers at least two or three times a year.

Invited Lectures

- DSV, Stockholm University, 1999
- New York University, New York, 1999
- Copenhagen Business School, Denmark, 2001
- Swedish National Graduate School of Language Technology, 2003
- International PhD course on treebanks, Stockholm University, 2004
- Lund University
Professional Activities

- Session chair for Named Entity Recognition at the Conference on Recent Advances in Natural Language Processing, Euro Conference RANLP-2001, Tzigov Chark, Bulgaria, September 5-7, 2001
- Member of the program and organizing committee of the Swedish Treebank Symposium, November 2002, Växjö University, Sweden; with Joakim Nivre (chair), Rolf Carlson, Lars Ahrenberg, and Lars Borin
- Member of the program committee for the 41st Annual Meeting of the Association for Computational Linguistics (ACL) conference, 2003
- Member of the program committee for Nordiska Datalingvistdagarna, Nodalida 2003
- Member of the program committee for the Conference on Recent Advances in Natural Language Processing (RANLP) 2003
- Reviewer for the Journal of Natural Language Engineering, 2004
- Member of the organizing and program committee for the Language Technology Conference, October 20-21, 2005, Uppsala University

Academic Honors

Young Researcher Award for the paper entitled Phrasal Parsing by Using Data-Driven PoS Taggers, received at the Euro Conference Recent Advances in Natural Language Processing, RANLP, September 2001, Tzigov Chark, Bulgaria.
More informationTerminology Extraction from Log Files
Terminology Extraction from Log Files Hassan Saneifar 1,2, Stéphane Bonniol 2, Anne Laurent 1, Pascal Poncelet 1, and Mathieu Roche 1 1 LIRMM - Université Montpellier 2 - CNRS 161 rue Ada, 34392 Montpellier
More informationAnnotation and Evaluation of Swedish Multiword Named Entities
Annotation and Evaluation of Swedish Multiword Named Entities DIMITRIOS KOKKINAKIS Department of Swedish, the Swedish Language Bank University of Gothenburg Sweden dimitrios.kokkinakis@svenska.gu.se Introduction
More informationTHE ICSI/SRI/UW RT04 STRUCTURAL METADATA EXTRACTION SYSTEM. Yang Elizabeth Shriberg1;2 Andreas Stolcke1;2 Barbara Peskin1 Mary Harper3
Liu1;3 THE ICSI/SRI/UW RT04 STRUCTURAL METADATA EXTRACTION SYSTEM Yang Elizabeth Shriberg1;2 Andreas Stolcke1;2 Barbara Peskin1 Mary Harper3 1International Computer Science Institute, USA2SRI International,
More informationThe SweDat Project and Swedia Database for Phonetic and Acoustic Research
2009 Fifth IEEE International Conference on e-science The SweDat Project and Swedia Database for Phonetic and Acoustic Research Jonas Lindh and Anders Eriksson Department of Philosophy, Linguistics and
More informationZeynep Azar. English Teacher, Açı Private Primary School, Istanbul, Turkey Azar, E.Z.
Zeynep Azar Date/Place of birth : 13 November 1988, Bursa, Turkey Nationality : Turkish Address : Bisschop Zwijsenstraat 103-01 Zipcode, Residence : 5021KB, Tilburg, Netherlands Phone number : +31 (0)
More informationT U R K A L A T O R 1
T U R K A L A T O R 1 A Suite of Tools for Augmenting English-to-Turkish Statistical Machine Translation by Gorkem Ozbek [gorkem@stanford.edu] Siddharth Jonathan [jonsid@stanford.edu] CS224N: Natural Language
More informationUNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE
UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE A.J.P.M.P. Jayaweera #1, N.G.J. Dias *2 # Virtusa Pvt. Ltd. No 752, Dr. Danister De Silva Mawatha, Colombo 09, Sri Lanka * Department of Statistics
More informationLIUM s Statistical Machine Translation System for IWSLT 2010
LIUM s Statistical Machine Translation System for IWSLT 2010 Anthony Rousseau, Loïc Barrault, Paul Deléglise, Yannick Estève Laboratoire Informatique de l Université du Maine (LIUM) University of Le Mans,
More informationAudience response system based annotation of speech
Edlund, Al Moubayed, Tånnander & Gustafson Audience response system based annotation of speech Jens Edlund 1, Samer Al Moubayed 1, Christina Tånnander 2 1 & Joakim Gustafson 1 KTH Speech, Music and Hearing,
More informationBuilding A Vocabulary Self-Learning Speech Recognition System
INTERSPEECH 2014 Building A Vocabulary Self-Learning Speech Recognition System Long Qin 1, Alexander Rudnicky 2 1 M*Modal, 1710 Murray Ave, Pittsburgh, PA, USA 2 Carnegie Mellon University, 5000 Forbes
More informationHow To Complete The Danish Masters Program In Lct
European Masters Program in Language and Communication Technologies (LCT) Modules Handbook for Prospective Students European Masters Program in LCT - Modules Handbook Page ii Chapter 1 Study Program The
More informationAspects of North Swedish intonational phonology. Bruce, Gösta
Aspects of North Swedish intonational phonology. Bruce, Gösta Published in: Proceedings from Fonetik 3 ; Phonum 9 Published: 3-01-01 Link to publication Citation for published version (APA): Bruce, G.
More informationA System for Labeling Self-Repairs in Speech 1
A System for Labeling Self-Repairs in Speech 1 John Bear, John Dowding, Elizabeth Shriberg, Patti Price 1. Introduction This document outlines a system for labeling self-repairs in spontaneous speech.
More informationOpen Domain Information Extraction. Günter Neumann, DFKI, 2012
Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for
More informationThe Evalita 2011 Parsing Task: the Dependency Track
The Evalita 2011 Parsing Task: the Dependency Track Cristina Bosco and Alessandro Mazzei Dipartimento di Informatica, Università di Torino Corso Svizzera 185, 101049 Torino, Italy {bosco,mazzei}@di.unito.it
More informationDigital Communication and Interoperability - A Case Study
CLARIN: a pan-european research infrastructure for language resources Martin Wynne Martin.wynne@it.ox.ac.uk Oxford e-research Centre & IT Services (formerly OUCS) & Faculty of Linguistics, Philology and
More informationSpeech Processing Applications in Quaero
Speech Processing Applications in Quaero Sebastian Stüker www.kit.edu 04.08 Introduction! Quaero is an innovative, French program addressing multimedia content! Speech technologies are part of the Quaero
More informationWord Completion and Prediction in Hebrew
Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology
More informationPost-doctoral researcher, Faculty of Translation Studies, University College Ghent
Lieve Macken Faculty of Translation Studies Groot-Brittanniëlaan 45 B-9000, Ghent Belgium email: lieve.macken@hogent.be url: lt3.hogent.be/en/people/lieve-macken/ Born: June 17, 1968 Belgium Nationality:
More informationTRANSLATION OF TELUGU-MARATHI AND VICE- VERSA USING RULE BASED MACHINE TRANSLATION
TRANSLATION OF TELUGU-MARATHI AND VICE- VERSA USING RULE BASED MACHINE TRANSLATION Dr. Siddhartha Ghosh 1, Sujata Thamke 2 and Kalyani U.R.S 3 1 Head of the Department of Computer Science & Engineering,
More informationEngaging high school students in interdisciplinary studies through the Computational Linguistics Olympiad
Engaging high school students in interdisciplinary studies through the Computational Linguistics Olympiad Dragomir Radev, University of Michigan radev@umich.edu Lori Levin, Carnegie Mellon University lsl@cs.cmu.edu
More informationSemantic annotation of requirements for automatic UML class diagram generation
www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute
More informationÄSSA12, No English Translation Available, 30 credits Svenska som andraspråk 1 A, gy, 30 högskolepoäng First Cycle / Grundnivå
Faculties of Humanities and Theology ÄSSA12, No English Translation Available, 30 credits Svenska som andraspråk 1 A, gy, 30 högskolepoäng First Cycle / Grundnivå Details of approval The syllabus was approved
More informationTowards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives
Towards a RB-SMT Hybrid System for Translating Patent Claims Results and Perspectives Ramona Enache and Adam Slaski Department of Computer Science and Engineering Chalmers University of Technology and
More informationShallow Parsing with PoS Taggers and Linguistic Features
Journal of Machine Learning Research 2 (2002) 639 668 Submitted 9/01; Published 3/02 Shallow Parsing with PoS Taggers and Linguistic Features Beáta Megyesi Centre for Speech Technology (CTT) Department
More informationThe PALAVRAS parser and its Linguateca applications - a mutually productive relationship
The PALAVRAS parser and its Linguateca applications - a mutually productive relationship Eckhard Bick University of Southern Denmark eckhard.bick@mail.dk Outline Flow chart Linguateca Palavras History
More informationONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS
ONLINE RESUME PARSING SYSTEM USING TEXT ANALYTICS Divyanshu Chandola 1, Aditya Garg 2, Ankit Maurya 3, Amit Kushwaha 4 1 Student, Department of Information Technology, ABES Engineering College, Uttar Pradesh,
More informationETL Ensembles for Chunking, NER and SRL
ETL Ensembles for Chunking, NER and SRL Cícero N. dos Santos 1, Ruy L. Milidiú 2, Carlos E. M. Crestana 2, and Eraldo R. Fernandes 2,3 1 Mestrado em Informática Aplicada MIA Universidade de Fortaleza UNIFOR
More informationCollecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
More informationThe course is included in the CPD programme for teachers II.
Faculties of Humanities and Theology LLYU72, Swedish as a Second Language for Upper Secondary School Teachers, 60 credits Svenska som andraspråk för lärare i gymnasieskolan, 60 högskolepoäng First Cycle
More informationParsing Software Requirements with an Ontology-based Semantic Role Labeler
Parsing Software Requirements with an Ontology-based Semantic Role Labeler Michael Roth University of Edinburgh mroth@inf.ed.ac.uk Ewan Klein University of Edinburgh ewan@inf.ed.ac.uk Abstract Software
More informationMASTER OF PHILOSOPHY IN ENGLISH AND APPLIED LINGUISTICS
University of Cambridge: Programme Specifications Every effort has been made to ensure the accuracy of the information in this programme specification. Programme specifications are produced and then reviewed
More informationSYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 SYSTRAN 混 合 策 略 汉 英 和 英 汉 机 器 翻 译 系 CWMT2011 技 术 报 告
SYSTRAN Chinese-English and English-Chinese Hybrid Machine Translation Systems for CWMT2011 Jin Yang and Satoshi Enoue SYSTRAN Software, Inc. 4444 Eastgate Mall, Suite 310 San Diego, CA 92121, USA E-mail:
More informationA General Evaluation Framework to Assess Spoken Language Dialogue Systems: Experience with Call Center Agent Systems
Conférence TALN 2000, Lausanne, 16-18 octobre 2000 A General Evaluation Framework to Assess Spoken Language Dialogue Systems: Experience with Call Center Agent Systems Marcela Charfuelán, Cristina Esteban
More information