Applications of Evolutionary Computation in the Analysis of Factors Influencing the Evolution of Human Language. Alex Decker December 5, 2003


Abstract

This paper describes the development of a framework designed to analyze the factors that influence the evolution of human languages, along with their interaction and significance. To do this, a simplified language generation system was developed that allows agents to communicate predefined concepts in an evolving way. Evolutionary Computation was used to drive this evolution. The framework tries to encapsulate many of the unique aspects of human language in a flexible way. It is hoped that it can then be used to analyze very complex scenarios.

Keywords: Evolutionary Computation, Human Language

Contents

1 Introduction
  1.1 Motivation
  1.2 Background
2 Research and Experimental Setup
  2.1 Agents
    2.1.1 Vocabulary
    2.1.2 Community
    2.1.3 Parameters
  2.2 Evolutionary Approach
    2.2.1 Genetic Operators: Mutation and Recombination
    2.2.2 Phase One
    2.2.3 Phase Two
3 Results
4 Conclusion
5 Future Work

1 Introduction

The study of human language has long been a topic of fascination to me. However, there are many barriers to the effective study of language, not the least of which are the ethical concerns of using actual human subjects and placing them in controlled linguistic environments. The research contained herein was conducted with the hope of creating a framework to facilitate complex and meaningful linguistic experiments. The framework and the motivations behind its design will be detailed, and a number of possible future experiments will be discussed.

1.1 Motivation

The existence of this framework should be a boon to researchers interested not only in analyzing how our language got here, but also in where it might go in the future. Developing agents which mirror the linguistic complexity of actual humans is a goal which, to my knowledge, has never really been attempted. Admittedly, this research is only the first step toward realizing that goal, but it should provide not only a quality platform for research, but also a motivation to continue refining these techniques.

1.2 Background

The purpose of this research is to analyze the way different factors contribute to linguistic evolution in a controlled environment whose evolutionary patterns are as realistic as possible. Research into the use of computers for linguistic ends has been divided into two main camps. One camp attempts to encode technical parameters and analyze language in a very deterministic way. Simon Kirby addressed this side of the issue somewhat in his paper [1], where he introduced alternatives to evolution as explanations for language. The other camp evolves new languages in the context of solving other problems, allowing agents within those systems to communicate. This approach was taken by Bruce MacLennan [2]. The problem with this second method is that these languages are meaningless outside the context of the agents' environment and are therefore useless for research applicable to real languages. In this research, I took some elements from each of these camps and attempted to dovetail and extend them to produce a more realistic model.

Research Questions

The research questions that inspired this research were:

What are the major factors influencing the evolution of a single language over time?

How can Evolutionary Computation be utilized to explore the dynamics of these factors?

How do changes observed in the artificial world correlate to those observed in the real world?

Can these correlations be used to automate the tuning of the EA to produce more accurate changes?

Can the resulting methodologies then be used to simulate the interaction of different languages over time?

2 Research and Experimental Setup

As already mentioned, the goal of this research was the development of an experimental framework for analyzing language evolution. The basis for this framework is a core population of individuals, or agents (section 2.1), which are able to communicate using a preprogrammed set of statement and phrase types, drawing on a vocabulary database (section 2.1.1). All aspects of communication will be evolved using a two-tiered evolutionary algorithm (EA), performing evolution both on linguistic elements and on communities (section 2.1.2), which form based on parameters (section 2.1.3) internal to each agent.

2.1 Agents

Agents are the core element of the framework. Each agent represents a single communicating individual within a large dynamic community. Each agent has a unique capacity for both communicating and comprehending, defined by its internal parameters. The entire population of agents will initially have identical language states. Over time, each agent's internal parameters (as well as the social clique these parameters cause the individual to fall into) will influence the evolution of its vocabulary uniquely. This divergence of vocabulary is restricted in two ways. First, since agents communicate constantly with others inside their community, they will be influenced by differences introduced by others in the population. Second, if a change introduced by one agent cannot be understood by a large number of the agents in its community, it will likely be rejected.
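As a rough illustration of the agent model described above, the following Python sketch shows one way an agent could be represented. The three parameter names come from section 2.1.3; the class layout, the dictionary-based vocabulary, the treatment of Acceptance as a value between 0 and 1, and the helper `make_population` are all assumptions made for this example and are not taken from the framework's actual code. Each agent starts with its own copy of the same vocabulary and initially counts every other agent as a member of its community.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """One communicating individual (hypothetical layout)."""
    ident: int
    adaptability: float       # P(respond to an inexact but interpretable statement)
    impressionability: float  # P(adopt an understood difference)
    acceptance: float         # willingness to deal with outsiders (assumed 0..1)
    vocabulary: dict = field(default_factory=dict)
    community: set = field(default_factory=set)  # directed: ids this agent accepts


def make_population(size, base_vocabulary, rng):
    """Create agents with random parameters and identical starting vocabularies."""
    agents = []
    for i in range(size):
        agents.append(Agent(
            ident=i,
            adaptability=rng.random(),
            impressionability=rng.random(),
            acceptance=rng.random(),
            vocabulary=dict(base_vocabulary),  # private copy, free to diverge
            community={j for j in range(size) if j != i},  # everyone else at first
        ))
    return agents


population = make_population(5, {"apple orchard": "apple orchard"}, random.Random(0))
```

Because the community is stored per agent, membership is directed: one agent can later drop another from its set without the reverse being true.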

2.1.1 Vocabulary

The vocabulary of each agent is made up of several interacting knowledge bases. First of all, each agent has a predefined set of statement types that it knows how to communicate and interpret. An example would be a query, which would consist of a phrase classified as inquiring ("What is"), an article ("an"), and a noun ("apple orchard"). The system can be extended easily to support any number of statement types. To support these statement types, different classes of phrases are needed. Each phrase is encoded with three pieces of information:
its syllabic breakdown, its phonic structure, and its semantic context. Syllabic structure is critical in the language evolution process. Phonic structure is used as a basis for logical evolutionary choices and is also necessary for the interpretation of communication from other agents. The phonic encoding is of my own design and represents each word as a single core phonic for each of its syllables. For now, more complex forms such as diphthongs will be ignored, as they are fairly subtle and not critical to a useful model.

2.1.2 Community

Just as in real life the concept of community is critical to a person's identity (not to mention that it also determines many of the person's environmental influences), this framework was designed to develop dynamic communities of agents. These communities evolve naturally based on the agents' own internal parameters. Initially, all agents consider all others to be members of their clique. Over time, natural tendencies and linguistic divergence will isolate the agents into subgroups. The dividing lines between communities are also unique to each agent; you may consider someone a member of your community while they do not consider you a member of theirs. This allows some interesting dynamics to evolve and is how communities interact. It also exerts a stabilizing effect on the language, not allowing dialects to become completely separate (although some experiments may allow this as an experimental parameter).

2.1.3 Parameters

As the agent is the main element of the framework, and its parameters are the expression of its individuality (and, consequently, of its experimental usefulness), the selection of parameters for the agents is critical. Although the framework was developed to allow easy extension of agent parameters and interactions, the current parameters are as follows.

Adaptability represents the willingness of an agent to accept reasonable linguistic differences when interpreting communication from other agents. It is represented as a value between 0 and 1, indicating the probability that a response will be made to an interpreted statement with no exact match in the agent's vocabulary. This is only applicable if a possible interpretation is achieved.

Impressionability indicates how likely an agent is to adopt understood differences into its vocabulary. It is represented as a value between 0 and 1, indicating the probability that an interpreted phrase will apply its changes to that agent's vocabulary. This is only applicable if a possible interpretation is achieved.

Acceptance reflects how willing an agent is to associate with those outside its community. An agent who chooses not to accept another will not respond to messages received from that agent. This will impact the other agent,
as this lack of response is considered a failure to understand by that agent and will drive its evolution accordingly. This is only applicable if a possible interpretation is achieved.

There are also three dynamic parameters that are used to evaluate the agents in the evolutionary cycle:

Prestige is an indicator of how well liked an agent is within its own community. It is a weighted average of the percentage of correct responses the agent receives, to any communication it generates, from those who have that agent in their community.

Respectability is a measure of how well those outside its community like it. It is a weighted average of the responses made to the agent by those who do not consider it part of their community.

Individuality is a measure of the agent's uniqueness. It is a weighted average of the percentage of sentences matched by the agent which still contain some differences.

2.2 Evolutionary Approach

The evolutionary approach taken in this research is somewhat unusual. Instead of optimizing to find a single candidate which maximizes fitness, each agent is optimized individually toward a unique optimum which is related to its internal parameters and to its interactions with other evolving agents (since they are the ones responsible for judging the agent). The purpose of the evolutionary cycle in this research is to guide the development of the agents' vocabularies. Agents try to achieve a vocabulary which maximizes the average value of their dynamic statistics.

2.2.1 Genetic Operators: Mutation and Recombination

Mutation operates on vocabulary on multiple levels. Consonants at the beginning and the end of a word can be mutated in various ways. Phrases with similar phonic forms can have words swapped between them. Words can be replaced with random words with identical phonics, crafted according to loose rules.
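The first of these mutation operators can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: it works on plain spelling rather than on the phonic encoding, whose details are not given here, and the names `CONSONANTS` and `mutate_edge_consonant`, as well as the choice to leave "y" out of the consonant set, are hypothetical.

```python
import random

# Assumed consonant inventory; "y" is left out to keep the example simple.
CONSONANTS = "bcdfghjklmnpqrstvwxz"


def mutate_edge_consonant(word, rng):
    """Replace the first or last letter of `word` if it is a consonant.

    One edge is picked at random; the word is returned unchanged when the
    letter at that edge is not a consonant.
    """
    if not word:
        return word
    index = rng.choice([0, len(word) - 1])
    if word[index].lower() not in CONSONANTS:
        return word
    # Pick a different consonant so the mutation is always visible.
    choices = [c for c in CONSONANTS if c != word[index].lower()]
    return word[:index] + rng.choice(choices) + word[index + 1:]


rng = random.Random(42)
mutant = mutate_edge_consonant("orchard", rng)
```

For "orchard" only the final "d" is eligible, so roughly half of the calls return the word unchanged. A fuller operator would presumably also cover the phrase-swapping and word-replacement mutations described above.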

Statement forms can be combined using crossover at locations with specific phrase types to create entirely new types of statements.

2.2.2 Phase One

As hinted at, the goal of this research is only a framework for experimentation. The functionality the system currently possesses (creating a population of agents which conform to some provided suitability function and allowing them to communicate) will only be the first phase of a larger experiment. This first phase will be reasonably similar for all runs and will consist of three stages:

1. Population generation: The creation of the initial pool of agents is random by default, but a heuristic function can be supplied to guide the creation of more specific agents.
2. Classification: The deterministic process of agents creating their communities. This happens slowly, over time. Because of this, phase one will not terminate until all agents have reached a specified maturation age.
3. Normalization: The act of killing off agents which do not exhibit desired properties.

The basic steps of a phase-one cycle follow:

1. The agent selects a statement form or creates one through crossover.
2. The agent selects phrases to satisfy this form.
3. The agent may perform mutation on these phrases.
4. The agent communicates the completed statement to the other agents, who respond.
5. The response consists of what was understood, reduced back to the vocabulary as it existed at the beginning (the vocabulary that the agents once had in common).
6. The originating agent determines how many agents understood its statement.
7. The agent updates its internal dynamic statistics accordingly.
8. The agent may refine its community definition based on this new perspective.

After this, the main system determines whether any agents are unacceptable and kills them if so. If the population becomes too small, new agents may be generated to replace the killed ones. This process continues until the population meets the provided parameters.

2.2.3 Phase Two

The second phase will differ based on the experiment. It will use the evolved and classified populations to analyze the effects of various environmental factors on the agents' vocabularies. Some possible future studies include:

Cultural Analysis: Cultural differences indicate interactions between previously isolated populations after a given interval.

Individual Analysis: Individual differences are those explained by the cognitive perspective of each entity and will be simulated by a continual evolutionary cycle throughout an entity's lifetime, influenced by environment and interaction.

Socio-Economic Analysis: Socio-economic differences will be simulated by differing mutation levels and different tolerances for syllabic and semantic constructs.

Realistic Model: A complex environment with multiple populations over multiple generations, each with distinct internal and relative economic differences.

3 Results

As the primary purpose of this research was the development of the framework, all effort to this point has gone into fleshing out the phase-one mechanisms and confirming the makeup of the resulting communities of agents. Also important is the ability to craft the agent parameters to achieve different effects, so this was addressed as well.

Figure 1: Example graph generated with only ten agents (n0 through n9) to illustrate the forming of communities. In the directed graph, nodes are connected to the nodes that they consider to be a part of their community.

You can tell from the unbalanced nature of the nodes that the normalization stage is particularly necessary for small populations. In general, a large enough population should create reasonable communities.

Control over the agent makeup at the completion of phase one is achieved in
a variety of ways. First, there are a few additional optional parameters that can be specified to impose constraints on the final population. These are:

community size represents a permissible range for the average community size.

avg prestige (and avg respect/avg indiv) represent the ranges in which the average Prestige, Respectability and Individuality are allowed to reside.

To achieve maximum flexibility, you can also specify a function, taking an agent and returning a boolean, which is called on every agent at the end of a cycle; if it returns false, the agent will be rejected, a new one inserted, and the process repeated.

4 Conclusion

In conclusion, this framework has proven its ability to produce dynamic and controlled sets of agents, able to communicate effectively using independent, changing vocabularies. The novel application of EC has proven itself a viable simulator of natural evolution insofar as the agents' stability was always based on their fraternal makeup, not on any evolutionary factors. Only time will prove the final worth of the system, but it should provide a wealth of new possibilities to researchers for some time.

5 Future Work

Having satisfied the goals of the framework, we must look to the future. While many changes are planned to make the framework more flexible and extensible, other avenues of thought may also have merit. In the interest of bringing the linguistic evolution even closer to what is seen in nature, I hope to implement a learning classifier system to evolve the rule set which describes how mutation and recombination take place. By combining this with initial and goal vocabulary states, you could train the system to produce very realistic changes.

Also, the concept of communities, while vital and useful, could easily be extended. For example, the current community concept consists of those agents that an agent considers to be its friends. You could easily set up parallel communities which ignore likes and dislikes and require agents to communicate, thus simulating (for example) a workplace environment where, regardless of other considerations, communication must take place and succeed.

Clearly, there are a great many interesting possibilities for the future. Only time will tell where the true usefulness of this framework will lie, but that it will be useful is without a doubt.
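To make the parallel-community idea concrete, the following Python sketch shows one possible way an imposed group could override an agent's own community when deciding whether to respond. This extension is not implemented in the framework; the function `will_respond`, the `friends` and `workplaces` structures, and the simplification of Acceptance to a plain membership test are all assumptions made for illustration.

```python
def will_respond(receiver, sender, friends, workplaces):
    """Decide whether `receiver` answers a message from `sender`.

    friends:    dict mapping an agent id to the set of ids it considers friends
    workplaces: list of sets of agent ids that are required to communicate
    """
    # An imposed group overrides likes and dislikes.
    if any(receiver in group and sender in group for group in workplaces):
        return True
    # Otherwise fall back to the receiver's own (directed) community.
    return sender in friends[receiver]


# Agent 0 likes agent 1, agent 1 likes nobody, agent 2 likes agent 0.
friends = {0: {1}, 1: set(), 2: {0}}
# Agents 1 and 2 share a workplace.
workplaces = [{1, 2}]

replies = {
    (receiver, sender): will_respond(receiver, sender, friends, workplaces)
    for receiver in friends
    for sender in friends
    if receiver != sender
}
```

In this sketch agent 1 ignores agent 0 but must answer agent 2, which is the kind of forced interaction the workplace example describes.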

References

[1] S. Kirby. Language evolution without natural selection: From vocabulary to syntax in a population of learners. Technical report, Language Evolution and Computation Research Unit, University of Edinburgh, 1998.

[2] B. MacLennan. The emergence of communication through synthetic evolution. Technical report, October 20, 1999. To appear in Advances in Evolutionary Synthesis of Neural Systems, edited by Vasant Honavar, Mukesh Patel, and Karthik Balakrishnan, MIT Press.