Package franc. R topics documented: November 12, 2015. Title Detect the Language of Text Version 1.1.1



Similar documents
Package uptimerobot. October 22, 2015

Package whoapi. R topics documented: June 26, Type Package Title A 'Whoapi' API Client Version Date Author Oliver Keyes

Package retrosheet. April 13, 2015

Package searchconsoler

Package urltools. October 11, 2015

Package R4CDISC. September 5, 2015

Package OECD. R topics documented: January 17, Type Package Title Search and Extract Data from the OECD Version 0.2.

Package bizdays. March 13, 2015

Package TSfame. February 15, 2013

Package dunn.test. January 6, 2016

Package sendmailr. February 20, 2015

Package RIGHT. March 30, 2015

Package fimport. February 19, 2015

Package erp.easy. September 26, 2015

Package decompr. August 17, 2016

Package brewdata. R topics documented: February 19, Type Package

Package pdfetch. R topics documented: July 19, 2015

Package sjdbc. R topics documented: February 20, 2015

Migration Manual (For Outlook Express 6)

Package AzureML. August 15, 2015

Package plan. R topics documented: February 20, 2015

Package syuzhet. February 22, 2015

Package translater. R topics documented: February 20, Type Package

Package RCassandra. R topics documented: February 19, Version Title R/Cassandra interface

Package benford.analysis

Migration Manual (For Outlook 2010)

Package hazus. February 20, 2015

Package httprequest. R topics documented: February 20, 2015

Package empiricalfdr.deseq2

Core Fittings C-Core and CD-Core Fittings

Package hoarder. June 30, 2015

HTML Form Widgets. Review: HTML Forms. Review: CGI Programs

Package tagcloud. R topics documented: July 3, 2015

Package GSA. R topics documented: February 19, 2015

Product Internationalization of a Document Management System

Package jrvfinance. R topics documented: October 6, 2015

Package pinnacle.api

Package LexisPlotR. R topics documented: April 4, Type Package

<option> eggs </option> <option> cheese </option> </select> </p> </form>

Package cgdsr. August 27, 2015

Package CRM. R topics documented: February 19, 2015

Package ggrepel. February 8, 2016

Package DCG. R topics documented: June 8, Type Package

Package pxr. February 20, 2015

Package packrat. R topics documented: March 28, Type Package

Package survpresmooth

Package optirum. December 31, 2015

Package ChannelAttribution

Package smoothhr. November 9, 2015

Package RedditExtractoR

Package HadoopStreaming

Check list for web developers

TinyUrl (v1.2) 1. Description. 2. Configuration Commands

Package WikipediR. January 13, 2016

Package fastghquad. R topics documented: February 19, 2015

Package wordnet. January 6, 2016

SEARCH G2. Paul Taylor, Solutions Architect October 29, 2014

Package MBA. February 19, Index 7. Canopy LIDAR data

Package instar. November 10, 2015

Cross Site Scripting (XSS) and PHP Security. Anthony Ferrara NYPHP and OWASP Security Series June 30, 2011

Blackbox Reversing of XSS Filters

vtiger CRM Database UTF-8 Configuration (For MySQL)

Package hive. January 10, 2011

Package dsstatsclient

Barracuda Spam Firewall User s Guide

Package EasyHTMLReport

Package tvm. R topics documented: August 21, Type Package Title Time Value of Money Functions Version Author Juan Manuel Truppia

Package lunar. R topics documented: February 20, Type Package

Playing with Web Application Firewalls

Package xtal. December 29, 2015

2.0. Specification of HSN 2.0 JavaScript Static Analyzer

Package MDM. February 19, 2015

BARRACUDA SPAM FIREWALL FAQ

HTML Web Page That Shows Its Own Source Code

Package tablaxlsx. R topics documented: June 9, 2016

Package mpa. R topics documented: February 15, Type Package. Title CoWords Method. Version Date

Package bigrf. February 19, 2015

Installing and Sending with DocuSign for NetSuite v2.2

Package missforest. February 20, 2015

MS ACCESS DATABASE DATA TYPES

Barracuda Spam Firewall User s Guide

Playing with Web Application Firewalls

Package SHELF. February 5, 2016

Package changepoint. R topics documented: November 9, Type Package Title Methods for Changepoint Detection Version 2.

Merchant Service Provider Guide for Mobilpenge Based Acquiring

Softek Software Ltd. Softek Barcode Reader Toolkit for Android. Product Documentation V7.5.1

Package saccades. March 18, 2015

Package MixGHD. June 26, 2015

PIKA µfirewall Cloud Management Guide

Package date. R topics documented: February 19, Version Title Functions for handling dates. Description Functions for handling dates.

Input Validation Vulnerabilities, Encoded Attack Vectors and Mitigations OWASP. The OWASP Foundation. Marco Morana & Scott Nusbaum

Transcription:

Title Detect the Language of Text Version 1.1.1 Package franc November 12, 2015 Author Gabor Csardi, Titus Wormer, Maciej Ceglowski, Jacob R. Rideout, and Kent S. Johnson Maintainer Gabor Csardi <gcsardi@mango-solutions.com> With no external dependencies and support for 335 languages; all languages spoken by more than one million speakers. 'Franc' is a port of the 'JavaScript' project of the same name, see <https://github.com/wooorm/franc>. License MIT + file LICENSE LazyData true URL https://github.com/mangothecat/franc BugReports https://github.com/mangothecat/franc/issues Suggests testthat RoxygenNote 5.0.1 Encoding UTF-8 Imports jsonlite Collate 'distances.r' 'expressions.r' 'franc.r' 'ngrams.r' 'normalize.r' 'script.r' 'speakers.r' 'trigrams.r' NeedsCompilation no Repository CRAN Date/Publication 2015-11-12 13:08:48 R topics documented: franc............................................. 2 franc_all........................................... 3 speakers........................................... 4 Index 5 1

2 franc franc Detect the language of a string Detect the language of a string franc(text, min_speakers = 1e+06, whitelist = NULL, blacklist = NULL, min_length = 10, max_length = 2048) Arguments text A string constant. Should be at least min_length characters long, this is 10 characters by default. Only the first max_length characters are used (2048 by default), to make the detection reasonably fast. min_speakers Languages with at least this many speakers are checked. By default this is one million. Set it to zero to include all languages known by franc. See also speakers. whitelist List of three letter language codes to check against. blacklist List of three letter language codes not to check againts. min_length Minimum number of characters required in the text. max_length Maximum number of characters used from the text. By default only the first 2048 characters are used. Value A three letter ISO-639-3 language code, the detected language of the text. "und" is returned for too short input. See Also franc_all for scores against many languages, speakers. Examples ## afr franc("alle menslike wesens word vry") ## nno franc("alle mennesker er født frie og") ## Too short, und franc("the") ## You can change what s too short (default: 10), sco franc("the", min_length = 3)

franc_all 3 franc_all List of probably languages for a text Returns the scores for all languages that use the same script as the input text, in decreasing order of probability. The score is calculated from the distances of the trigram distributions in the input text and in the language model. The closer the languages, the higher the score. Scores are scaled, so that the closest language will have a score of 1. franc_all(text, min_speakers = 1e+06, whitelist = NULL, blacklist = NULL, min_length = 10, max_length = 2048) Arguments Value text A string constant. Should be at least min_length characters long, this is 10 chracters by default. Only the first max_length characters are used (2048 by default), to make the detection reasonably fast. min_speakers Languages with at least this many speakers are checked. By default this is one million. Set it to zero to include all languages known by franc. See also speakers. whitelist blacklist min_length max_length List of three letter language codes to check against. List of three letter language codes not to check againts. Minimum number of characters required in the text. Maximum number of characters used from the text. By default only the first 2048 characters are used. A data frame with columns language and score. The language column contains the three letter ISO-639-3 language codes. The score column contains the scores. See Also franc if you only want the top result, speakers. Examples head(franc_all("o Brasil caiu 26 posições")) ## Provide a whitelist: franc_all("o Brasil caiu 26 posições", whitelist = c("por", "src", "glg", "spa")) ## Provide a blacklist:

4 speakers head(franc_all("o Brasil caiu 26 posições", blacklist = c("src", "glg", "lav"))) speakers Number of speakers for 370 languages Format This is a superset of all languages detected by franc. Numbers were collected by Titus Wormer. To quote him: Painstakingly crawled by hand from OHCHR, the numbers are (in some cases, very) rough estimates or out-of-date.. speakers A data frame with columns: language Three letter language code. speakers Number of speakers. name Full name of language. iso6391 ISO 639-1 codes. See more at http://en.wikipedia.org/wiki/iso_639. iso6392 ISO 639-2T codes. See more at http://en.wikipedia.org/wiki/iso_639.

Index Topic datasets speakers, 4 franc, 2, 3 franc_all, 2, 3 speakers, 2, 3, 4 5