Data-Driven Spell Checking: The Synergy of Two Algorithms for Spelling Error Detection and Correction

1 Data-Driven Spell Checking: The Synergy of Two Algorithms for Spelling Error Detection and Correction
Eranga Jayalatharachchi, Asanka Wasala*, Ruvan Weerasinghe
University of Colombo School of Computing, 35, Reid Avenue, Colombo 00700, Sri Lanka
*Localisation Research Centre, CSIS Department, University of Limerick, Limerick, Ireland

2 Contents
1. Introduction
2. Background
   Sinhala Language
   Work on Indian Languages
   Work on Sinhala
3. Methodology
   Subasa v1
   Subasa v2
4. Evaluation
5. Conclusions & Future Work
6. Demonstration

3 Introduction
Spell Checking: the task of identifying and flagging incorrectly spelled words in a document written in a natural language.
Spell Correcting: the process of replacing misspelled words with the most likely intended ones.
Applications: word processing, optical character recognition (OCR), character recognition, speech recognition, computer-aided language learning (CALL), etc.

4 Introduction
Misspelled Words
Non-word errors: "It was teh wind"
Real-word errors: "My sun is a doctor"
Automatic Spelling Error Detection and Correction (Kukich 1992):
1. Non-word error detection
2. Isolated-word error correction
3. Context-dependent error correction

5 Introduction
About 80% of all misspelled English words (non-word errors) in human typewritten text are due to single-error misspellings (Damerau 1964).
Single-error misspellings of "the":
ther: insertion
teh: transposition
th: deletion
thw: substitution
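The four single-error types shown for "the" can be told apart mechanically. The following is a minimal sketch for illustration only; it is not part of Subasa, and the function name `single_error_type` is hypothetical.

```python
from typing import Optional


def single_error_type(intended: str, typed: str) -> Optional[str]:
    """Name the single edit that turns `intended` into `typed`.

    Returns None when the words are equal or differ by more than one edit.
    """
    if intended == typed:
        return None
    n, m = len(intended), len(typed)
    # Skip the prefix that both words share.
    i = 0
    while i < min(n, m) and intended[i] == typed[i]:
        i += 1
    if m == n + 1 and intended[i:] == typed[i + 1:]:
        return "insertion"      # one extra letter was typed
    if m == n - 1 and intended[i + 1:] == typed[i:]:
        return "deletion"       # one letter was left out
    if m == n:
        if intended[i + 1:] == typed[i + 1:]:
            return "substitution"   # one letter was replaced
        if (i + 1 < n
                and intended[i] == typed[i + 1]
                and intended[i + 1] == typed[i]
                and intended[i + 2:] == typed[i + 2:]):
            return "transposition"  # two adjacent letters were swapped
    return None


for typo in ["ther", "teh", "th", "thw"]:
    print(typo, single_error_type("the", typo))
```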

6 Introduction
Correction Techniques (Kukich 1992)
1. Minimum edit distance techniques
2. Similarity key techniques
3. Rule-based techniques
4. N-gram-based techniques
5. Probabilistic techniques
6. Neural nets

7 Introduction
Objective
To enhance Subasa, the only documented spell checker available to date for Sinhala (Wasala et al. 2010; Wasala et al. 2011).
Subasa v1: n-gram
Subasa v2: n-gram + edit distance

8 Introduction
N-grams
An n-gram is a sub-sequence of n items from a given sequence.
Word: intention
Letter unigrams: i n t e n t i o n
Letter bi-grams: in nt te en nt ti io on
Letter tri-grams: int nte ten ent nti tio ion

9 Introduction
N-gram Generating Algorithm
function get_n_grams(word, n) returns n_grams_list
    l ← length(word) - n
    n_grams_list ← empty()
    for i from 0 to l do
        n_grams_list ← append(n_grams_list, substring(word, i, n))
    return n_grams_list
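The n-gram generating algorithm on this slide translates almost directly into Python. This is a sketch rather than the Subasa implementation. It treats each Python character (Unicode code point) as one letter, which is an assumption: a Sinhala letter with a vowel sign spans several code points, so a real system would probably segment the word into letters first.

```python
from typing import List


def get_n_grams(word: str, n: int) -> List[str]:
    """Return all contiguous letter n-grams of `word`, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    # The last n-gram starts at index len(word) - n, so there are
    # len(word) - n + 1 start positions (none if the word is too short).
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(get_n_grams("intention", 2))
# ['in', 'nt', 'te', 'en', 'nt', 'ti', 'io', 'on']
print(get_n_grams("intention", 3))
# ['int', 'nte', 'ten', 'ent', 'nti', 'tio', 'ion']
```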

10 Introduction
Minimum Edit-Distance
The minimum number of editing operations required to transform one string into another (Wagner 1974):
Insertions
Deletions
Substitutions

11 Introduction
Editing Operations
Source: i n t e n t i o n
Target: e x e c u t i o n
Cost of edit operations:
Insertion = 1
Deletion = 1
Substitution = Deletion + Insertion = 1 + 1 = 2
Alignment 1: 5 substitutions. Cost = 5 x 2 = 10
Alignment 2: 1 deletion, 3 substitutions, 1 insertion. Cost = 1 + (3 x 2) + 1 = 8
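The cost arithmetic on this slide can be checked with a few lines of Python. The operation costs come from the slide; the names `COSTS` and `alignment_cost` are hypothetical and used only for illustration.

```python
# Operation costs as given on the slide.
COSTS = {"insert": 1, "delete": 1, "substitute": 2}


def alignment_cost(operations):
    """Sum the costs of a sequence of edit operations."""
    return sum(COSTS[op] for op in operations)


# intention -> execution using five substitutions.
five_substitutions = ["substitute"] * 5
# intention -> execution using one deletion, three substitutions, one insertion.
mixed_alignment = ["delete"] + ["substitute"] * 3 + ["insert"]

print(alignment_cost(five_substitutions))  # 10
print(alignment_cost(mixed_alignment))     # 8
```

The second alignment is cheaper, which is why a minimum edit-distance algorithm has to search over alignments instead of comparing letters position by position.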

12 Introduction
Minimum Edit Distance Calculation Algorithm
A dynamic programming algorithm for minimum edit-distance computation creates an edit-distance matrix M with one column for each symbol in the target sequence and one row for each symbol in the source sequence.
function minimum_edit_distance(source, target) returns min_distance
    m ← length(source)
    n ← length(target)
    create distance matrix M[n+1, m+1]
    M[0,0] ← 0
    for each column i from 1 to n do
        M[i,0] ← M[i-1,0] + cost_insert(target_i)
    for each row j from 1 to m do
        M[0,j] ← M[0,j-1] + cost_delete(source_j)
    for each column i from 1 to n do
        for each row j from 1 to m do
            M[i,j] ← min( M[i-1,j] + cost_insert(target_i),
                          M[i-1,j-1] + cost_substitute(source_j, target_i),
                          M[i,j-1] + cost_delete(source_j) )
    min_distance ← M[n,m]
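A runnable version of this dynamic programming algorithm might look like the sketch below. It assumes the costs from the editing-operations slide (insertion 1, deletion 1, substitution 2) and a substitution cost of 0 when the two letters are already equal. The boundary initialisation and the parameter names are choices made for this sketch; the Subasa edit-distance module may differ.

```python
def minimum_edit_distance(source: str, target: str,
                          ins_cost: int = 1, del_cost: int = 1,
                          sub_cost: int = 2) -> int:
    """Minimum cost of transforming `source` into `target`."""
    m, n = len(source), len(target)
    # M[i][j]: distance between the first i target letters
    # and the first j source letters.
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # build the target from an empty source
        M[i][0] = M[i - 1][0] + ins_cost
    for j in range(1, m + 1):          # delete every source letter
        M[0][j] = M[0][j - 1] + del_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Matching letters cost nothing to "substitute".
            substitution = 0 if source[j - 1] == target[i - 1] else sub_cost
            M[i][j] = min(M[i - 1][j] + ins_cost,
                          M[i - 1][j - 1] + substitution,
                          M[i][j - 1] + del_cost)
    return M[n][m]


print(minimum_edit_distance("intention", "execution"))  # 8
```

The result for "intention" and "execution" agrees with the cheaper of the two alignment costs given on the editing-operations slide.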

13 Introduction
Edit Distance Matrix
[Figure: edit-distance matrix. Rows (source, bottom to top): # i n t e n t i o n. Columns (target): # e x e c u t i o n.]
Each cell M[i,j] contains the minimum edit distance between the first i characters of the target and the first j characters of the source.

14 Introduction
Edit Distance Matrix
[Figure: edit-distance matrix, same axes and caption as slide 13.]

15 Background
Sinhala Language & Script
Majority language of Sri Lanka
Sinhala script is a derivative of Brahmi script
Sinhala script is a syllabic script
5 pre-nasalized stops & 2 unique vowels (Nandasara, 2009)
Sinhala is a phonetic language
na-na-la-la dissention
Conjunct letters

16 Background
Work on Indic Languages
Non-word spelling correction for Assamese (Das et al. 2002): uses similarity-key and minimum edit distance techniques
Rule-cum-dictionary-based approach for spell checking Malayalam (Santhosh et al. 2002)
Spelling correction for Tamil (Dhanabalan et al. 2003): non-word error detection using simple dictionary lookups
Spell checking for Bangla (Chaudhuri 2002): an adaptation of a similarity-key-based technique

17 Background
Work on Sinhala Language
Thibus: commercial-grade
Mozilla Firefox Extension (addons.mozilla.org): dictionary-based
OpenOffice Extension (openoffice.org): uses Hunspell
Microsoft Office Word 2007 (microsoft.com): via Language Interface Pack (LIP) for Sinhala
Subasa (v1) (Wasala et al. 2009; Wasala et al. 2010): n-gram based; phonetic errors

18 Methodology: Subasa v1
The Process
[Diagram: with the phoneme class (k, c), the input word "kat" yields the candidates "kat" and "cat".]

19 Methodology: Subasa v1
The Process (contd.)
Candidates: kat, cat
Letter bi-grams: kat: ka, at; cat: ca, at
Bi-gram frequencies: ka = 10, ca = 20, at = 5
Scores: kat = ka + at = 10 + 5; cat = ca + at = 20 + 5
Selected word: cat
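The toy example on these two slides (phoneme class (k, c), candidates "kat" and "cat", bi-gram frequencies 10, 20 and 5) can be reproduced with the sketch below. It assumes that a candidate's score is the sum of its letter bi-gram frequencies, as the slide's arithmetic suggests. The names `PHONEME_CLASSES`, `BIGRAM_FREQ`, `candidates`, `score` and `best_candidate` are hypothetical, and the real Subasa v1 scoring may differ.

```python
from itertools import product

# Toy data taken from the slide; real tables would come from the corpus.
PHONEME_CLASSES = [{"k", "c"}]
BIGRAM_FREQ = {"ka": 10, "ca": 20, "at": 5}


def candidates(word, classes):
    """Generate every spelling obtained by swapping letters within a phoneme class."""
    options = []
    for letter in word:
        # Letters outside any class can only stand for themselves.
        letter_class = next((c for c in classes if letter in c), {letter})
        options.append(sorted(letter_class))
    return ["".join(letters) for letters in product(*options)]


def score(word, freq, n=2):
    """Sum the frequencies of the word's letter n-grams (unseen n-grams count 0)."""
    return sum(freq.get(word[i:i + n], 0) for i in range(len(word) - n + 1))


def best_candidate(word, classes, freq):
    """Pick the candidate spelling with the highest n-gram score."""
    return max(candidates(word, classes), key=lambda w: score(w, freq))


print(candidates("kat", PHONEME_CLASSES))                   # ['cat', 'kat']
print(score("kat", BIGRAM_FREQ), score("cat", BIGRAM_FREQ))  # 15 25
print(best_candidate("kat", PHONEME_CLASSES, BIGRAM_FREQ))   # cat
```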

20 Methodology: Subasa v1
Phoneme Classes
[Table with columns "Graphemes" and "Phoneme class"; the grapheme column did not survive transcription.]
Phoneme classes: /k/, /g/, /tʃ/, /dʒ/, /ʈ/, /ɖ/, /t /, /d /, /p/, /b/, /n/, /l/, /s/ or /ʃ/, /ɲ/

21 Methodology: Subasa v1
Example
UCSC Corpus: 10 Mn words
Word unigrams (440,021)
Letter bi-grams (46,878)
Letter tri-grams (166,460)
Dictionary of Sinhala Spelling (Koparahewa 2006)

23 Methodology: Subasa v2
The Process

24 Methodology: Subasa v2
The Process: Edit Distance Module

25 Methodology: Subasa v2
Data
UCSC Corpus: 10 Mn words
Word unigrams (440,021)
Letter bi-grams (46,878)
Letter tri-grams (166,460)
Dictionary of Sinhala Spelling (Koparahewa 2006)
Word unigrams (spell checked by Subasa v1)
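Frequency tables of the kind listed on this slide (word unigrams, letter bi-grams, letter tri-grams) could be built from a tokenised corpus roughly as follows. This is an assumption-laden sketch: `build_ngram_tables` is a hypothetical name, the input is assumed to be a plain list of word tokens, and how the UCSC corpus was actually processed is not stated on the slide.

```python
from collections import Counter


def build_ngram_tables(words):
    """Count word unigrams and letter bi-/tri-grams over a list of word tokens."""
    word_unigrams = Counter(words)
    bigrams, trigrams = Counter(), Counter()
    for word, count in word_unigrams.items():
        # Each n-gram is weighted by how often its word occurs in the corpus.
        for i in range(len(word) - 1):
            bigrams[word[i:i + 2]] += count
        for i in range(len(word) - 2):
            trigrams[word[i:i + 3]] += count
    return word_unigrams, bigrams, trigrams


unigrams, bigrams, trigrams = build_ngram_tables(["cat", "cat", "at"])
print(unigrams["cat"], bigrams["at"], trigrams["cat"])  # 2 3 2
print(len(unigrams), len(bigrams), len(trigrams))       # 2 2 1
```

The length of each table gives the number of distinct entries, while the stored values give occurrence counts.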

26 Methodology: Subasa v2
New Phoneme Classes

28 Evaluation
Compared with:
Microsoft Word 2007: Sinhala Language Interface Pack 2007 for Microsoft Office
OpenOffice.org 3.2 Writer: based on Hunspell
Subasa v1: based on n-grams from UCSC Corpus
Manual inspection: by a linguist
Test cases:
Test 1: Public Sinhala newspaper
Test 2: Sinhala blog syndicator

29 Evaluation
Results: Test 1
6155 words from a public Sinhala newspaper
[Table: numeric values did not survive transcription.]
System      Incorrect Words Detected   Correct Words Detected
Word        %                          %
Writer      %                          %
Subasa v1   %                          %
Subasa v2   %                          %
Manual      %                          %

30 Evaluation
Results: Test 2
4117 words extracted from a Sinhala blog syndicator
[Table: numeric values did not survive transcription.]
System      Incorrect Words Detected   Correct Words Detected
Word        %                          %
Writer      %                          %
Subasa v1   %                          %
Subasa v2   %                          %
Manual      %                          %

31 Conclusions and Future Work
Conclusions
Subasa v2 performs much closer to manual inspection
N-gram + edit distance is better than the n-gram-only approach
Data-driven
Good for languages with limited resources

32 Conclusions and Future Work
Future Work
Larger dictionary
Optimizations to the edit distance module
Candidate correction ranking
Word boundary analysis
Morphological analysis

33 Demonstration &

34 Improved Detections
[Figure: Subasa v1 compared with Subasa v2]

35 Improved Corrections
[Figure: Subasa v1 compared with Subasa v2]

A Mixed Trigrams Approach for Context Sensitive Spell Checking

A Mixed Trigrams Approach for Context Sensitive Spell Checking A Mixed Trigrams Approach for Context Sensitive Spell Checking Davide Fossati and Barbara Di Eugenio Department of Computer Science University of Illinois at Chicago Chicago, IL, USA dfossa1@uic.edu, bdieugen@cs.uic.edu

More information

! # % & (() % +!! +,./// 0! 1 /!! 2(3)42( 2

! # % & (() % +!! +,./// 0! 1 /!! 2(3)42( 2 ! # % & (() % +!! +,./// 0! 1 /!! 2(3)42( 2 5 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 15, NO. 5, SEPTEMBER/OCTOBER 2003 1073 A Comparison of Standard Spell Checking Algorithms and a Novel

More information

Implementation of Internet Domain Names in Sinhala

Implementation of Internet Domain Names in Sinhala Implementation of Internet Domain Names in Sinhala Harsha Wijayawardhana, Asanka Wasala, Ruvan Weerasinghe and Chamila Liyanage University of Colombo School of Computing 35, Reid Avenue, Colombo 00700

More information

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University

Grammars and introduction to machine learning. Computers Playing Jeopardy! Course Stony Brook University Grammars and introduction to machine learning Computers Playing Jeopardy! Course Stony Brook University Last class: grammars and parsing in Prolog Noun -> roller Verb thrills VP Verb NP S NP VP NP S VP

More information

The Design of a Proofreading Software Service

The Design of a Proofreading Software Service The Design of a Proofreading Software Service Raphael Mudge Automattic Washington, DC 20036 raffi@automattic.com Abstract Web applications have the opportunity to check spelling, style, and grammar using

More information

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics

More information

LuitPad: A fully Unicode compatible Assamese writing software

LuitPad: A fully Unicode compatible Assamese writing software LuitPad: A fully Unicode compatible Assamese writing software Navanath Saharia 1,3 Kishori M Konwar 2,3 (1) Tezpur University, Tezpur, Assam, India (2) University of British Columbia, Vancouver, Canada

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

Machine Translation. Agenda

Machine Translation. Agenda Agenda Introduction to Machine Translation Data-driven statistical machine translation Translation models Parallel corpora Document-, sentence-, word-alignment Phrase-based translation MT decoding algorithm

More information

Your single-source partner for corporate product communication. Transit NXT Evolution. from Service Pack 0 to Service Pack 8

Your single-source partner for corporate product communication. Transit NXT Evolution. from Service Pack 0 to Service Pack 8 Transit NXT Evolution from Service Pack 0 to Service Pack 8 April 2009: Transit NXT Service Pack 0 (Version 4.0.0.671) Additional versions of DTP programs supported: InDesign CS3 and FrameMaker 9 Additional

More information

Turkish Radiology Dictation System

Turkish Radiology Dictation System Turkish Radiology Dictation System Ebru Arısoy, Levent M. Arslan Boaziçi University, Electrical and Electronic Engineering Department, 34342, Bebek, stanbul, Turkey arisoyeb@boun.edu.tr, arslanle@boun.edu.tr

More information

Text-To-Speech Technologies for Mobile Telephony Services

Text-To-Speech Technologies for Mobile Telephony Services Text-To-Speech Technologies for Mobile Telephony Services Paulseph-John Farrugia Department of Computer Science and AI, University of Malta Abstract. Text-To-Speech (TTS) systems aim to transform arbitrary

More information

Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA

Tibetan For Windows - Software Development and Future Speculations. Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Tibetan For Windows - Software Development and Future Speculations Marvin Moser, Tibetan for Windows & Lucent Technologies, USA Introduction This paper presents the basic functions of the Tibetan for Windows

More information

Processing: current projects and research at the IXA Group

Processing: current projects and research at the IXA Group Natural Language Processing: current projects and research at the IXA Group IXA Research Group on NLP University of the Basque Country Xabier Artola Zubillaga Motivation A language that seeks to survive

More information

Q&As: Microsoft Excel 2013: Chapter 2

Q&As: Microsoft Excel 2013: Chapter 2 Q&As: Microsoft Excel 2013: Chapter 2 In Step 5, why did the date that was entered change from 4/5/10 to 4/5/2010? When Excel recognizes that you entered a date in mm/dd/yy format, it automatically formats

More information

Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh

Bangla Localization of OpenOffice.org. Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Bangla Localization of OpenOffice.org Asif Iqbal Sarkar Research Programmer BRAC University Bangladesh Localization L10n is the process of adapting the text and applications of a product or service to

More information

www.sdl.com SDL Trados Studio 2015 Translation Memory Management Quick Start Guide

www.sdl.com SDL Trados Studio 2015 Translation Memory Management Quick Start Guide www.sdl.com SDL Trados Studio 2015 Translation Memory Management Quick Start Guide SDL Trados Studio 2015 Translation Memory Management Quick Start Guide Copyright Information Copyright 2011-2015 SDL Group.

More information

MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK

MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK MEMBERSHIP LOCALIZATION WITHIN A WEB BASED JOIN FRAMEWORK 1 K. LALITHA, 2 M. KEERTHANA, 3 G. KALPANA, 4 S.T. SHWETHA, 5 M. GEETHA 1 Assistant Professor, Information Technology, Panimalar Engineering College,

More information

Reading Readiness Online

Reading Readiness Online 4433 Bissonnet Bellaire, Texas 77401 713.664.7676 f: 713.664.4744 Reading Readiness Online Lesson 1: Introduction Prerequisite Reading Skills What is Reading? Reading is a process in which symbols on paper

More information

Working Note FIRE 2013

Working Note FIRE 2013 Working Note FIRE 2013 FAQ retrieval using noisy queries Divyesh Sanjay Kothari Abhinav Saraswat Sarang Kapoor ISM DHANBAD ISM DHANBAD ISM DHANBAD Anjaney Pandey ISM DHANBAD Sukomal Pal ISM DHANBAD mailto:divyesh2506@gmail.com

More information

Problems with the current speling.org system

Problems with the current speling.org system Problems with the current speling.org system Jacob Sparre Andersen 22nd May 2005 Abstract We out-line some of the problems with the current speling.org system, as well as some ideas for resolving the problems.

More information

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY

SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute

More information

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE

UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE UNKNOWN WORDS ANALYSIS IN POS TAGGING OF SINHALA LANGUAGE A.J.P.M.P. Jayaweera #1, N.G.J. Dias *2 # Virtusa Pvt. Ltd. No 752, Dr. Danister De Silva Mawatha, Colombo 09, Sri Lanka * Department of Statistics

More information

ECDL / ICDL Word Processing Syllabus Version 5.0

ECDL / ICDL Word Processing Syllabus Version 5.0 ECDL / ICDL Word Processing Syllabus Version 5.0 Purpose This document details the syllabus for ECDL / ICDL Word Processing. The syllabus describes, through learning outcomes, the knowledge and skills

More information

A POS-based Word Prediction System for the Persian Language

A POS-based Word Prediction System for the Persian Language A POS-based Word Prediction System for the Persian Language Masood Ghayoomi 1 Ehsan Daroodi 2 1 Nancy 2 University, Nancy, France masood29@gmail.com 2 Iran National Science Foundation, Tehran, Iran darrudi@insf.org

More information

Keyboards for inputting Japanese language -A study based on US patents

Keyboards for inputting Japanese language -A study based on US patents Keyboards for inputting Japanese language -A study based on US patents Umakant Mishra Bangalore, India umakant@trizsite.tk http://umakant.trizsite.tk (This paper was published in April 2005 issue of TRIZsite

More information

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov

Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or

More information

A Natural Language Query Processor for Database Interface

A Natural Language Query Processor for Database Interface A Natural Language Query Processor for Database Interface Mrs.Vidya Dhamdhere Lecturer department of Computer Engineering Department G.H.Raisoni college of Engg.(Pune University) vidya.dhamdhere@gmail.com

More information

English to Arabic Transliteration for Information Retrieval: A Statistical Approach

English to Arabic Transliteration for Information Retrieval: A Statistical Approach English to Arabic Transliteration for Information Retrieval: A Statistical Approach Nasreen AbdulJaleel and Leah S. Larkey Center for Intelligent Information Retrieval Computer Science, University of Massachusetts

More information

AUTOLEX: An Automatic Lexicon Builder for Minority Languages Using an Open Corpus

AUTOLEX: An Automatic Lexicon Builder for Minority Languages Using an Open Corpus PACLIC 24 Proceedings 63 AUTOLEX: An Automatic Lexicon Builder for Minority Languages Using an Open Corpus Evan Liz C. Buhay a, Marie Joy P. Evardone a, Hansel B. Nocon a, Davis Muhajereen D. Dimalen a,

More information

Designing forms for auto field detection in Adobe Acrobat

Designing forms for auto field detection in Adobe Acrobat Adobe Acrobat 9 Technical White Paper Designing forms for auto field detection in Adobe Acrobat Create electronic forms more easily by using the right elements in your authoring program to take advantage

More information

Module 9 The CIS error profiling technology

Module 9 The CIS error profiling technology Florian Fink Module 9 The CIS error profiling technology 2015-09-15 1 / 24 Module 9 The CIS error profiling technology Florian Fink Centrum für Informations- und Sprachverarbeitung (CIS) Ludwig-Maximilians-Universität

More information

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction

Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction Improving Data Driven Part-of-Speech Tagging by Morphologic Knowledge Induction Uwe D. Reichel Department of Phonetics and Speech Communication University of Munich reichelu@phonetik.uni-muenchen.de Abstract

More information

Micro blogs Oriented Word Segmentation System

Micro blogs Oriented Word Segmentation System Micro blogs Oriented Word Segmentation System Yijia Liu, Meishan Zhang, Wanxiang Che, Ting Liu, Yihe Deng Research Center for Social Computing and Information Retrieval Harbin Institute of Technology,

More information

Word processing software

Word processing software Unit 244 Word processing software UAN: Level: 2 Credit value: 4 GLH: 30 Assessment type: Relationship to NOS: Assessment requirements specified by a sector or regulatory body: Aim: R/502/4628 Portfolio

More information

Oracle Database 11g SQL

Oracle Database 11g SQL AO3 - Version: 2 19 June 2016 Oracle Database 11g SQL Oracle Database 11g SQL AO3 - Version: 2 3 days Course Description: This course provides the essential SQL skills that allow developers to write queries

More information

GCE. Computing. Mark Scheme for January 2011. Advanced Subsidiary GCE Unit F452: Programming Techniques and Logical Methods

GCE. Computing. Mark Scheme for January 2011. Advanced Subsidiary GCE Unit F452: Programming Techniques and Logical Methods GCE Computing Advanced Subsidiary GCE Unit F452: Programming Techniques and Logical Methods Mark Scheme for January 2011 Oxford Cambridge and RSA Examinations OCR (Oxford Cambridge and RSA) is a leading

More information

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890

Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Speech Recognition on Cell Broadband Engine UCRL-PRES-223890 Yang Liu, Holger Jones, John Johnson, Sheila Vaidya (Lawrence Livermore National Laboratory) Michael Perrone, Borivoj Tydlitat, Ashwini Nanda

More information

Phonetic Models for Generating Spelling Variants

Phonetic Models for Generating Spelling Variants Phonetic Models for Generating Spelling Variants Rahul Bhagat and Eduard Hovy Information Sciences Institute University Of Southern California 4676 Admiralty Way, Marina Del Rey, CA 90292-6695 {rahul,

More information

SMSFR: SMS-Based FAQ Retrieval System

SMSFR: SMS-Based FAQ Retrieval System SMSFR: SMS-Based FAQ Retrieval System Partha Pakray, 1 Santanu Pal, 1 Soujanya Poria, 1 Sivaji Bandyopadhyay, 1 Alexander Gelbukh 2 1 Computer Science and Engineering Department, Jadavpur University, Kolkata,

More information

Evaluating grapheme-to-phoneme converters in automatic speech recognition context

Evaluating grapheme-to-phoneme converters in automatic speech recognition context Evaluating grapheme-to-phoneme converters in automatic speech recognition context Denis Jouvet, Dominique Fohr, Irina Illina To cite this version: Denis Jouvet, Dominique Fohr, Irina Illina. Evaluating

More information

IENG2004 Industrial Database and Systems Design. Microsoft Access I. What is Microsoft Access? Architecture of Microsoft Access

IENG2004 Industrial Database and Systems Design. Microsoft Access I. What is Microsoft Access? Architecture of Microsoft Access IENG2004 Industrial Database and Systems Design Microsoft Access I Defining databases (Chapters 1 and 2) Alison Balter Mastering Microsoft Access 2000 Development SAMS, 1999 What is Microsoft Access? Microsoft

More information

PARLIAMENT OF THE DEMOCRATIC SOCIALIST REPUBLIC OF SRI LANKA

PARLIAMENT OF THE DEMOCRATIC SOCIALIST REPUBLIC OF SRI LANKA PARLIAMENT OF THE DEMOCRATIC SOCIALIST REPUBLIC OF SRI LANKA FINANCE (AMENDMENT) ACT, No. 8 OF 2008 [Certified on 29th February, 2008] Printed on the Order of Government Published as a Supplement to Part

More information

Review of Hashing: Integer Keys

Review of Hashing: Integer Keys CSE 326 Lecture 13: Much ado about Hashing Today s munchies to munch on: Review of Hashing Collision Resolution by: Separate Chaining Open Addressing $ Linear/Quadratic Probing $ Double Hashing Rehashing

More information

How To Write A Phonetic Spelling Checker For Brazilian Pruirosa Pessoa

How To Write A Phonetic Spelling Checker For Brazilian Pruirosa Pessoa Towards a Phonetic Brazilian Portuguese Spell Checker Lucas Vinicius Avanço Magali Sanches Duran Maria das Graças Volpe Nunes avanco89@gmail.com, magali.duran@uol.com.br, gracan@icmc.usp.br, Interinstitutional

More information

DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH

DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH DEVNAGARI DOCUMENT SEGMENTATION USING HISTOGRAM APPROACH Vikas J Dongre 1 Vijay H Mankar 2 Department of Electronics & Telecommunication, Government Polytechnic, Nagpur, India 1 dongrevj@yahoo.co.in; 2

More information

PDF Accessibility Overview

PDF Accessibility Overview Contents 1 Overview of Portable Document Format (PDF) 1 Determine the Accessibility Path for each PDF Document 2 Start with an Accessible Document 2 Characteristics of Accessible PDF files 4 Adobe Acrobat

More information

Training Needs Analysis

Training Needs Analysis Training Needs Analysis Microsoft Office 2007 Access 2007 Course Code: Name: Chapter 1: Access 2007 Orientation I understand how Access works and what it can be used for I know how to start Microsoft Access

More information

Synergy Controller Application Note 4 March 2012, Revision F Tidal Engineering Corporation 2012. Synergy Controller Bar Code Reader Applications

Synergy Controller Application Note 4 March 2012, Revision F Tidal Engineering Corporation 2012. Synergy Controller Bar Code Reader Applications Synergy Controller Bar Code Reader Applications Synergy Controller with Hand Held Products Bar Code Scanner OCR-A Labeled Part Introduction The value of the ubiquitous Bar Code Scanner for speeding data

More information

Quality Companion 3 by Minitab

Quality Companion 3 by Minitab Quality Companion 3 by Minitab Contents Part 1. Introduction to Quality Companion 3 Part 2. What's New Part 3. Known Problems and Workarounds Important: The Quality Companion Dashboard is no longer available.

More information

2 Analysis of Texting Forms

2 Analysis of Texting Forms An Unsupervised Model for Text Message Normalization Paul Cook Department of Computer Science University of Toronto Toronto, Canada pcook@cs.toronto.edu Suzanne Stevenson Department of Computer Science

More information

Content Management System

Content Management System OIT Training and Documentation Services Content Management System End User Training Guide OIT TRAINING AND DOCUMENTATION oittraining@uta.edu http://www.uta.edu/oit/cs/training/index.php 2009 CONTENTS 1.

More information

LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*

LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM* Jonathan Yamron, James Baker, Paul Bamberg, Haakon Chevalier, Taiko Dietzel, John Elder, Frank Kampmann, Mark Mandel, Linda Manganaro, Todd Margolis,

More information

Recognizing Non-Translatable Symbols in a Multi-Lingual Computer-Assisted Translation System for DTP Documents

Recognizing Non-Translatable Symbols in a Multi-Lingual Computer-Assisted Translation System for DTP Documents AUTOMATYKA 2010 Tom 14 Zeszyt 3/1 Szymon Grabowski*, Cezary Draus**, Wojciech Bieniecki* Recognizing Non-Translatable Symbols in a Multi-Lingual Computer-Assisted Translation System for DTP Documents 1.

More information

Microsoft Office PowerPoint 2003. Identify components of the PowerPoint window. Tutorial 1 Creating a Presentation

Microsoft Office PowerPoint 2003. Identify components of the PowerPoint window. Tutorial 1 Creating a Presentation Microsoft Office PowerPoint 2003 Tutorial 1 Creating a Presentation 1 Identify components of the PowerPoint window You will recognize some of the features of the PowerPoint window that are common to Windows

More information

Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC

Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC Paper 073-29 Using Edit-Distance Functions to Identify Similar E-Mail Addresses Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT Version 9 of SAS software has added functions which can efficiently

More information

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR

Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University

More information

Excel 2002. What you will do:

Excel 2002. What you will do: What you will do: Explore the features of Excel 2002 Create a blank workbook and a workbook from a template Format a workbook Apply formulas to a workbook Create a chart Import data to a workbook Share

More information

May 2013. Training Guide

May 2013. Training Guide May 2013 Training Guide Contents Introduction... 5 1. 2. 3. 4. 5. 6. 7. 8. 9. Getting started... 6 Exercise 1 Starting Read&Write 11 Gold... 6 Exercise 2 Positioning the toolbar... 7 Exercise 3 Understanding

More information

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2014/15. Jens Teubner Data Warehousing Winter 2014/15 1

Data Warehousing. Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de. Winter 2014/15. Jens Teubner Data Warehousing Winter 2014/15 1 Jens Teubner Data Warehousing Winter 2014/15 1 Data Warehousing Jens Teubner, TU Dortmund jens.teubner@cs.tu-dortmund.de Winter 2014/15 Jens Teubner Data Warehousing Winter 2014/15 152 Part VI ETL Process

More information

HIT THE GROUND RUNNING MS WORD INTRODUCTION

HIT THE GROUND RUNNING MS WORD INTRODUCTION HIT THE GROUND RUNNING MS WORD INTRODUCTION MS Word is a word processing program. MS Word has many features and with it, a person can create reports, letters, faxes, memos, web pages, newsletters, and

More information

Internationalized Domain Names -

Internationalized Domain Names - Internationalized Domain Names - Getting them to work Gihan Dias LK Domain Registry What is IDN? Originally DNS names were restricted to the characters a-z (letters), 0-9 (digits) and '-' (hyphen) (LDH)

More information

RA MODEL VISUALIZATION WITH MICROSOFT EXCEL 2013 AND GEPHI

RA MODEL VISUALIZATION WITH MICROSOFT EXCEL 2013 AND GEPHI RA MODEL VISUALIZATION WITH MICROSOFT EXCEL 2013 AND GEPHI Prepared for Prof. Martin Zwick December 9, 2014 by Teresa D. Schmidt (tds@pdx.edu) 1. DOWNLOADING AND INSTALLING USER DEFINED SPLIT FUNCTION

More information

The Re-emergence of Data Capture Technology

The Re-emergence of Data Capture Technology The Re-emergence of Data Capture Technology Understanding Today s Digital Capture Solutions Digital capture is a key enabling technology in a business world striving to balance the shifting advantages

More information

Towards Unsupervised Word Error Correction in Textual Big Data

Towards Unsupervised Word Error Correction in Textual Big Data Towards Unsupervised Word Error Correction in Textual Big Data Joao Paulo Carvalho 1 and Sérgio Curto 1 1 INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol 9, Lisboa, Portugal

More information

Er is door mij gebruik gemaakt van dia s uit presentaties van o.a. Anastasios Kesidis, CIL, Athene Griekenland, en Asaf Tzadok, IBM Haifa Research Lab

Er is door mij gebruik gemaakt van dia s uit presentaties van o.a. Anastasios Kesidis, CIL, Athene Griekenland, en Asaf Tzadok, IBM Haifa Research Lab IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Er is door mij gebruik gemaakt van dia s uit presentaties

More information

Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty

Understanding Video Lectures in a Flipped Classroom Setting. A Major Qualifying Project Report. Submitted to the Faculty 1 Project Number: DM3 IQP AAGV Understanding Video Lectures in a Flipped Classroom Setting A Major Qualifying Project Report Submitted to the Faculty Of Worcester Polytechnic Institute In partial fulfillment

More information

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke

Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models. Alessandro Vinciarelli, Samy Bengio and Horst Bunke 1 Offline Recognition of Unconstrained Handwritten Texts Using HMMs and Statistical Language Models Alessandro Vinciarelli, Samy Bengio and Horst Bunke Abstract This paper presents a system for the offline

More information
