The Best Binary Split Algorithm: A deterministic method for dividing vowel inventories into contrastive distinctive features
The Best Binary Split Algorithm: A deterministic method for dividing vowel inventories into contrastive distinctive features

Master's Thesis
Presented to the Language and Linguistics Program, Brandeis University
James Pustejovsky, Advisor

In Partial Fulfillment of the Requirements for the Degree: Master of Arts in Computational Linguistics

By Kobey Shwayder
May 2009
Acknowledgments

Thank you to Andrew Nevins for his support and advice. Thank you, Mindy.
Abstract

The Best Binary Split Algorithm: A deterministic method for dividing vowel inventories into contrastive distinctive features

A thesis presented to the Language and Linguistics Program, Graduate School of Arts and Sciences, Brandeis University, Waltham, Massachusetts, by Kobey Shwayder

The task of finding the distinctive features to best describe a vowel inventory seems nondeterministic. The set of features must describe the inventory in as few features as possible, while providing feature contrasts between vowels to satisfy the active phonological processes in the language. Several previous algorithms have been able to define contrastive vowels for a vowel inventory with a pre-specified set of distinctive features, but these algorithms fail without prior knowledge of the feature set. In this thesis, I have created and implemented the Best Binary Split algorithm, a deterministic algorithm which is able to find the set of distinctive features and the set of contrastive vowel pairs for a language using only the vowel inventory of that language, a global feature hierarchy, and a deterministic process for reranking that hierarchy for each language based on the patterning of natural classes of vowels in that language.
Table of Contents

1. Introduction
2. Previous Algorithms to find division and contrast
   2.1 Pairwise Algorithm
   2.2 The Successive Division Algorithm
   2.3 Motivation for new algorithm
3. Best Binary Split Algorithm
   3.1 Best Binary Split, first attempt
   3.2 Breaking a feature tie with a global hierarchy
   3.3 The Best Binary Split Algorithm, with Global Feature Hierarchy
   3.4 Problems with the determinism of a global hierarchy
   3.5 The Best Binary Split Algorithm, with active Natural Class patterning
   3.6 Difficulties in transcription systems
4. Implementation
   4.1 Basic Implementation Information
   4.2 The Vowel Class and Inventories
   4.3 The Algorithm
   4.4 Reranking
5. Conclusion
Appendix: Python Code for Algorithm
Bibliography
1. Introduction

Distinctive feature theory is widely celebrated for its ability to describe large phonemic inventories with a small number of parameters and to easily describe the possible contrasts between phonemes that often play a role in the phonological processes of a language. For example, using distinctive features to divide a vowel space allows for a concise description of both the vowels in that inventory and the specific contrasts between them. Without the use of distinctive features, each vowel in an inventory must be compared to every other vowel, creating C(n,2) distinctions (for n vowels). However, distinctive features can significantly reduce the number of distinctions. For example, the Turkish vowel system, /i,y,e,ø,ɯ,u,a,o/, can be described as a set of 28 binary relations (a relation between each pair of vowels in the inventory), or, using the features [high, back, round], as the interaction of three binary relations. Mielke (2008) demonstrates this change graphically:

[Figure: Reducing twenty-eight binary relations to three (Mielke 2008, p. 31, Fig. 1.8)]
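The arithmetic above can be checked directly. The following sketch (mine, not from the thesis) contrasts the C(8,2) pairwise relations for Turkish with the 2^3 value combinations of three binary features:

```python
from itertools import combinations, product

# Turkish vowel inventory, as listed in the text
turkish = ["i", "y", "e", "ø", "ɯ", "u", "a", "o"]

# Without features: one binary relation per unordered pair of vowels
pairwise_relations = len(list(combinations(turkish, 2)))
print(pairwise_relations)  # C(8,2) = 28

# With the three features [high, back, round]: 2**3 = 8 value
# combinations, enough to give each of the 8 vowels a unique description
feature_combos = len(list(product("+-", repeat=3)))
print(feature_combos)  # 8
```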
Not only does this change allow for a simpler description of the system, it also allows for the description of pairs of vowels with minimal contrasts, that is, segments that differ in only one feature. For example, /i/ differs from /ɯ/ only in the feature [back], from /y/ only in the feature [round], and from /e/ only in the feature [high]. While there has been some previous work to develop algorithms to find contrasts in phonemic inventories and to divide vowels using features in underspecification theories, there have, to my knowledge, been no attempts to codify a deterministic algorithm for creating the proper distinctive feature set for a language. That is to say, all previous theories assume that the distinctive features for a specific inventory have already been chosen and use those pre-specified features to divide that inventory and find the contrasts among its members. In this paper, I propose the Best Binary Split Algorithm, a deterministic algorithm which takes as input the inventory of a language and outputs the distinctive feature set as well as the contrasts in the inventory for each feature; I have implemented this algorithm as a computer program. For this project, I have limited the domain to vowel inventories, in part to limit the scope of the project, but also because there is general consensus about the binary nature of features for vowels, whereas it is unclear whether the articulatory features of consonants are truly binary in nature. In section 2, I discuss two previous algorithms, each of which does some part of the task. In section 3, I describe the Best Binary Split Algorithm and some additional methods needed to make the algorithm flexible enough to handle many languages while remaining deterministic. Section 4 discusses the implementation of the algorithm, the code of which can be found in the Appendix.
2. Previous Algorithms to find division and contrast

2.1 Pairwise Algorithm

The first algorithm to find contrasts in a vowel system is the Pairwise Algorithm (Archangeli 1988):

Pairwise Algorithm (Archangeli 1988, p. 192)
1. Fully specify all segments.
2. Isolate all pairs of segments.
3. Determine which segment pairs differ by a single feature specification.
4. Designate such feature specifications as contrastive on the members of that pair.
5. Once all pairs have been examined and appropriate feature specifications have been marked contrastive, delete all unmarked feature specifications on each segment.

The Pairwise Algorithm works by examining each pair of segments (there will be C(n,2) pairs for n vowels) to see if any are a minimal pair (differ in only one feature). If so, it uses that differing feature to contrast the segments. Archangeli gives the example of Maranungku, whose inventory is /i,ɑ,ə,æ,ʊ/, with the features [high, low, back] (Archangeli 1988):

Full Specification
      high  low  back
  i    +    −    −
  ɑ    −    +    +
  ə    −    −    +
  æ    −    +    −
  ʊ    +    −    +

The contrasts are:
  High: (ə,ʊ)
  Low: (ɑ,ə)
  Back: (i,ʊ), (ɑ,æ)

Contrastive Specification
      high  low  back
  i              −
  ɑ         +    +
  ə    −    −
  æ              −
  ʊ    +         +
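The five steps above can be sketched in Python. The dict encoding and function name below are mine (hypothetical), not Archangeli's; segments are dicts mapping each feature to "+" or "-":

```python
# A minimal sketch of the Pairwise Algorithm: examine every pair of
# segments and record the differing feature of each minimal pair.
from itertools import combinations

def pairwise_contrasts(inventory):
    """Return {feature: list of minimal pairs contrasting in it}."""
    contrasts = {}
    for v1, v2 in combinations(inventory, 2):
        s1, s2 = inventory[v1], inventory[v2]
        differing = [f for f in s1 if s1[f] != s2[f]]
        if len(differing) == 1:  # a minimal pair: exactly one feature differs
            contrasts.setdefault(differing[0], []).append((v1, v2))
    return contrasts

# Maranungku, fully specified for [high, low, back] as in the text
maranungku = {
    "i": {"high": "+", "low": "-", "back": "-"},
    "ɑ": {"high": "-", "low": "+", "back": "+"},
    "ə": {"high": "-", "low": "-", "back": "+"},
    "æ": {"high": "-", "low": "+", "back": "-"},
    "ʊ": {"high": "+", "low": "-", "back": "+"},
}
print(pairwise_contrasts(maranungku))
# {'back': [('i', 'ʊ'), ('ɑ', 'æ')], 'low': [('ɑ', 'ə')], 'high': [('ə', 'ʊ')]}
```

Running the same function on the three-vowel /i,a,u/ system discussed next returns an empty dict, reproducing Dresher's criticism that the algorithm finds no contrasts when no pair is minimal.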
One problem with the Pairwise Algorithm is that it assumes the features for the system are already known, and it simply finds the contrasting pairs within those features. So while it is useful for finding the contrasts in a specified system, it lacks the capability of finding that system without the nondeterministic choice of features. Dresher (In Press) also criticizes the Pairwise Algorithm for breaking if too many features are inputted. For example, the Pairwise Algorithm fails to find any contrastive segments in a three vowel system, /i,a,u/, if the given features are [high, round, back] (Dresher In Press):

Full Specification
      high  round  back
  i    +     −      −
  a    −     −      +
  u    +     +      +

There are no contrasts.

Contrastive Specification
      high  round  back
  i
  a
  u

With these features, each of the three vowels contrasts with every other vowel in two features, so there are no minimal pairs, and therefore, no contrasts.

2.2 The Successive Division Algorithm

Dresher's own solution to dividing vowel systems into contrastive features is the Successive Division Algorithm (Dresher In Press, 2003a, 2003b). The Successive Division Algorithm begins with the assumption that all sounds are allophones of a single phoneme. If the set of sounds contains more than one contrasting member, then a feature is selected and the set is divided by that feature. The algorithm then recurses on each subset until every set has only one member. More explicitly:
Successive Division Algorithm (Dresher In Press)
a. In the initial state, all tokens in inventory I are assumed to be variants of a single member. Set I = S, the set of all members.
b. i) If S is found to have more than one member, proceed to (c).
   ii) Otherwise, stop. If a member, M, has not been designated contrastive with respect to a feature, G, then G is redundant for M.
c. Select a new n-ary feature, F, from the set of distinctive features. F splits members of the input set, S, into n sets, F1…Fn, depending on what value of F is true of each member of S.
d. i) If all but one of F1…Fn is empty, then loop back to (c). (That is, if all members of S have the same value of F, then F is not contrastive in this set.)
   ii) Otherwise, F is contrastive for all members of S.
e. For each set Fi, loop back to (b), replacing S by Fi.

For example, one way to divide up the inventory of Turkish, /i,y,e,ø,ɯ,u,a,o/, is to first divide on [back], then divide each subset on [high], and then each remaining subset on [round]. However, there is a problem with this approach. Since the algorithm repeats independently for each subset (algorithm step e), there is nothing to prevent a different choice of feature for one of the subsets, e.g. choosing [low] for the division of {a,o}.

[Figures: One way to divide up Turkish using the Successive Division Algorithm; Another way to divide up Turkish using the Successive Division Algorithm]

This choice fails to adequately represent the fact that Turkish has a beautifully symmetric system of three features, with each vowel contrastive with another vowel for every
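Steps (a) through (e) can be sketched as a recursive function. The encoding below is mine, not Dresher's, and it is restricted to binary features; crucially, the caller must supply the order in which features are tried, which is exactly the nondeterministic choice the text goes on to criticize:

```python
# A sketch of the Successive Division Algorithm for binary features.
def successive_division(segments, specs, feature_order):
    """Return a list of (feature, plus_set, minus_set) divisions."""
    if len(segments) <= 1:
        return []  # step (b-ii): a singleton set needs no division
    for f in feature_order:
        plus = [s for s in segments if specs[s][f] == "+"]
        minus = [s for s in segments if specs[s][f] == "-"]
        # step (d-i): if one side is empty, f is not contrastive; try next
        if plus and minus:  # step (d-ii): f is contrastive for this set
            return ([(f, plus, minus)]
                    + successive_division(plus, specs, feature_order)
                    + successive_division(minus, specs, feature_order))
    return []  # no remaining feature splits this set

# Turkish with [high, back, round], fully specified
turkish = {
    "i": {"high": "+", "back": "-", "round": "-"},
    "y": {"high": "+", "back": "-", "round": "+"},
    "e": {"high": "-", "back": "-", "round": "-"},
    "ø": {"high": "-", "back": "-", "round": "+"},
    "ɯ": {"high": "+", "back": "+", "round": "-"},
    "u": {"high": "+", "back": "+", "round": "+"},
    "a": {"high": "-", "back": "+", "round": "-"},
    "o": {"high": "-", "back": "+", "round": "+"},
}
tree = successive_division(list(turkish), turkish, ["back", "high", "round"])
print([f for f, _, _ in tree])  # seven divisions; the first is on "back"
```

Passing a different `feature_order` to the recursive calls would yield a different tree for the same inventory, which is the nondeterminism at issue.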
features. However, the main problem with the Successive Division Algorithm is that it is nondeterministic. The choice of features for each subset must correctly reflect the contrasts present in a language's phonological processes in order to produce a correct tree for that language's inventory. Dresher does not present any way of automating this choice, so there is nothing to prevent a deviant analysis that starts dividing the Turkish inventory with a feature not active in Turkish, such as [low] or [ATR].

2.3 Motivation for new algorithm

The new algorithm proposed in this paper will solve the problems of both the Pairwise and Successive Division algorithms. The way to solve the feature specification problem is to start without any knowledge of the features which have been posited for a language, but with the vowel inventory for that language fully specified for all possible features, and to determine not only the set of distinctive features for that language, but also the contrasting sets of vowels for each feature. By solving this problem deterministically, we can write an efficient algorithm to allow a computer to calculate the features and contrasts of any given language.
3. Best Binary Split Algorithm

For this algorithm, I am following the ideas presented in Nevins' forthcoming book, Locality in Vowel Harmony: all segments are fully specified for the distinctive features relevant to their language, and each phonological process may be parameterized with the option to look at any segment with a certain feature value (e.g. palatalize before any [+high]) or only those segments that are contrastive for a feature (e.g. palatalize before any segment contrastive for [+high]). With a fully specified inventory, finding the contrastiveness of a segment for a feature is trivial: a segment is contrastive for a feature if there is a segment in the inventory identical to it except with the opposite value for the specified feature (this is like a reverse running of the Pairwise Algorithm). With this framework in mind, the goal of this new algorithm is to take a vowel set and return the minimal set of distinctive features that is able to distinguish each vowel, along with the specifications for the features of each vowel.

3.1 Best Binary Split, first attempt

The basic idea of this algorithm is to recursively divide a maximally specified vowel set until each vowel is in a singleton set by choosing the best binary split feature, that is, the feature that splits all of the sets as close to in half as possible. In this framework, all segments start fully specified, not just for the features relevant to their language, but for all possible distinctive features. This can be thought of as starting with each segment reflecting the features that directly correlate to its phonetic value. The feature which splits the set of segments as closely in half as possible is then used to divide the set. Then the algorithm recursively divides the sets, optimizing for the best
binary split for every set; that is, the best feature to choose is the one that divides all subsets of segments as closely in half as possible. When dealing with more than one set of vowels, the algorithm chooses the feature which best splits every set in half. In direct contrast to the Successive Division Algorithm, the chosen feature divides every set, and is applied to every vowel, resulting in the set of vowels with full specification for all the features that are relevant to the full set. One way to think of this algorithm is to draw the vowels and try to divide them all using lines across the entire inventory. For example, a 4-vowel inventory, say /i,u,e,o/, can be divided with the feature set [high, back]. The Successive Division Algorithm would need to draw three lines, since it only works within the sets which have been previously created: once the [high] line has been drawn, the [+high] set and the [−high] set must be divided separately. The Best Binary Split Algorithm, on the other hand, would only draw two lines, since lines can be drawn across the entire inventory.

[Figure: The Successive Division Algorithm divides /i,u,e,o/ by [high] and then divides each remaining set by [back] separately; the Best Binary Split Algorithm cuts the whole set in half with [high] and then again with [back].]

The basic binary division algorithm:
Best Binary Split Algorithm (version 1, no resolution for tied features):

Input: Vowels
Sets = [], RelevantFeatures = [], GlobalFeatures = set of all possible distinctive features
1. Fully specify each vowel in Vowels for all features in GlobalFeatures
2. Add the set of all vowels in Vowels to Sets
While Sets is not empty, loop (3-6):
    NewSets = []
    3. (Nondeterministically) Find feature F which best divides all sets in Sets in half.
       F = FindBestFeature(GlobalFeatures, RelevantFeatures, Sets)
    4. For each set s in Sets:
           subsetplus = all vowels in s that are [+F]
           subsetminus = all vowels in s that are [−F]
           If subsetplus has more than one vowel, add subsetplus to NewSets
           If subsetminus has more than one vowel, add subsetminus to NewSets
    5. Sets = NewSets
    6. Add F to RelevantFeatures
7. Specify each vowel in Vowels with only the features in RelevantFeatures
8. A vowel V is contrastive for a feature F if there exists a vowel U in Vowels identical to V in all RelevantFeatures, but opposite for F

Subroutine FindBestFeature(GlobalFeatures, RelevantFeatures, Sets):
    For each set s in Sets:
        For each feature f in GlobalFeatures − RelevantFeatures:
            subsetplus = all vowels in s that are [+f]
            subsetminus = all vowels in s that are [−f]
            best-split-val = maximum of { length(subsetplus) / length(s), length(subsetminus) / length(s) }
            /* best-split-val will be between .5 (perfect split) and 1 (no split) */
    For each feature f in GlobalFeatures − RelevantFeatures:
        score = average of each best-split-val for sets in Sets
    Return feature with lowest score /* i.e. score closest to .5 */

In trying to make the Best Binary Split Algorithm deterministic, one of the first problems we run into is what to do in the case of a tie for best split. For example, if we try to work with the Turkish inventory, /i,y,e,ø,ɯ,u,a,o/, we start by specifying each vowel for all possible distinctive features. I am assuming that the set of all features is [high, low, front, back, ATR, round].
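The FindBestFeature scoring can be sketched directly. The function name and dict encoding here are my own, not the thesis's implementation; running it on the fully specified Turkish inventory makes the four-way tie discussed next visible:

```python
# Score each candidate feature by its average best-split-val over all
# current sets (.5 = perfect split of every set, 1 = no split).
def split_scores(sets, specs, candidates):
    scores = {}
    for f in candidates:
        vals = []
        for s in sets:
            plus = sum(1 for v in s if specs[v][f] == "+")
            vals.append(max(plus, len(s) - plus) / len(s))
        scores[f] = sum(vals) / len(vals)
    return scores

# Turkish, fully specified for [high, low, front, back, ATR, round]
turkish = {
    "i": dict(high="+", low="-", front="+", back="-", ATR="+", round="-"),
    "y": dict(high="+", low="-", front="+", back="-", ATR="+", round="+"),
    "e": dict(high="-", low="-", front="+", back="-", ATR="+", round="-"),
    "ø": dict(high="-", low="-", front="+", back="-", ATR="+", round="+"),
    "ɯ": dict(high="+", low="-", front="-", back="+", ATR="+", round="-"),
    "u": dict(high="+", low="-", front="-", back="+", ATR="+", round="+"),
    "a": dict(high="-", low="+", front="-", back="+", ATR="-", round="-"),
    "o": dict(high="-", low="-", front="-", back="+", ATR="+", round="+"),
}
features = ["high", "low", "front", "back", "ATR", "round"]
scores = split_scores([list(turkish)], turkish, features)
print(scores)  # high, front, back, round tie at .5; low and ATR score .875
```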
      high  low  front  back  ATR  round
  i    +    −     +     −     +     −
  y    +    −     +     −     +     +
  e    −    −     +     −     +     −
  ø    −    −     +     −     +     +
  ɯ    +    −     −     +     +     −
  u    +    −     −     +     +     +
  a    −    +     −     +     −     −
  o    −    −     −     +     +     +

Now we find the best split feature with the subroutine FindBestFeature, which calculates the best-split-val for each feature as the maximum of the number of pluses per number of vowels or the number of minuses per number of vowels:

  high = max(4/8, 4/8) = .5
  low = max(1/8, 7/8) = .875
  front = max(4/8, 4/8) = .5
  back = max(4/8, 4/8) = .5
  ATR = max(7/8, 1/8) = .875
  round = max(4/8, 4/8) = .5

We have a four-way tie between [high], [front], [back], and [round]. The current version of the algorithm has no way to break a tie between features, so one needed to be added.

3.2 Breaking a feature tie with a global hierarchy

The way I chose to resolve a tie between features is to impose a hierarchy of features. There is some precedent for such a hierarchy, and the existence of a hierarchy may reflect some sort of feature salience, or a universal tendency to choose one feature over another. In Principles of Phonology, Trubetzkoy sets up four classes of vowel systems: those in which only the aperture feature (height) is distinctive; those which are distinctive in aperture and timbre (backness, roundness); those which are distinctive in aperture, timbre, and intensity (ATR); and those which are distinctive in those three and
pitch (tone). From this Trubetzkoy decided that aperture was primary and, with timbre, basic in any vowel system (see Baltaxe's explanation of Trubetzkoy's Principles of Phonology). Jakobson, in Fundamentals of Language, posits a temporal series of distinctions that children acquire, which orders the vowel features from first to last as: narrow vs. wide (height), palatal vs. velar (backness), rounded vs. unrounded. From these conclusions, I started the hierarchy as: low/high > front/back > round > ATR. Following the general trend of linguists, I assigned low > high and back > front as defaults. This was validated by correct results for several vowel sets. Thus, a default hierarchy for features was determined:

Default Feature Hierarchy: low > high > back > front > round > ATR

Any other vowel features, e.g. creaky, are presumably ranked lower than ATR, since they are typologically rare. To return to the Turkish example, this hierarchy can now be used to break ties. The four-way tie of high, front, back, and round is broken by choosing the feature highest in the hierarchy, resulting in [high] being returned from the subroutine FindBestFeature. The set of vowels is now divided into [+high] and [−high] sets, and the algorithm loops with the set of these two sets. So now we have RelevantFeatures = [high] and these two sets:

[+high]
      low  front  back  ATR  round
  i    −    +     −     +     −
  y    −    +     −     +     +
  ɯ    −    −     +     +     −
  u    −    −     +     +     +

[−high]
      low  front  back  ATR  round
  e    −    +     −     +     −
  ø    −    +     −     +     +
  a    +    −     +     −     −
  o    −    −     +     +     +
So again we find the best split feature with the subroutine FindBestFeature:

           [+high] set            [−high] set            Average
  low    max(0/4, 4/4) = 1     max(1/4, 3/4) = .75     .875
  front  max(2/4, 2/4) = .5    max(2/4, 2/4) = .5      .5
  back   max(2/4, 2/4) = .5    max(2/4, 2/4) = .5      .5
  ATR    max(4/4, 0/4) = 1     max(3/4, 1/4) = .75     .875
  round  max(2/4, 2/4) = .5    max(2/4, 2/4) = .5      .5

Now we have a tie between front, back, and round. We appeal to the hierarchy to break the tie, so the best feature is [back]. So now we have RelevantFeatures = [high, back] and these four sets:

{i,y}:                      {ɯ,u}:
      low  front  ATR  round      low  front  ATR  round
  i    −    +     +     −     ɯ    −    −     +     −
  y    −    +     +     +     u    −    −     +     +

{e,ø}:                      {a,o}:
      low  front  ATR  round      low  front  ATR  round
  e    −    +     +     −     a    +    −     −     −
  ø    −    +     +     +     o    −    −     +     +

So again we find the best split feature with the subroutine FindBestFeature:

           {i,y}  {ɯ,u}  {e,ø}  {a,o}  Average
  low       1      1      1      .5     .875
  front     1      1      1      1      1
  ATR       1      1      1      .5     .875
  round     .5     .5     .5     .5     .5

The best binary feature is [round]. Now there are no sets of vowels left, so the algorithm is finished. The relevant features for Turkish are [high, back, round].
      high  back  round
  i    +    −     −
  y    +    −     +
  e    −    −     −
  ø    −    −     +
  ɯ    +    +     −
  u    +    +     +
  a    −    +     −
  o    −    +     +

With these features, we can now use an algorithm to find the contrastive sets.

Find Contrasts: VowelSet, FeatureSet
    specify vowels in VowelSet only for FeatureSet
    for each feature F in FeatureSet:
        for each vowel V in VowelSet:
            vowel U = vowel V
            switch feature F in U to the opposite value
            if U is in VowelSet:
                (V,U) is a contrastive pair for F

By running the Turkish inventory through the Find Contrasts algorithm, we find that the contrastive pairs for Turkish are:

  High: (i,e), (y,ø), (ɯ,a), (u,o)
  Back: (i,ɯ), (y,u), (e,a), (ø,o)
  Round: (i,y), (e,ø), (ɯ,u), (a,o)

Thus, with the addition of a hierarchy for choosing the best feature from a set of tied features, the Best Binary Split algorithm is now deterministic.
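The Find Contrasts step above can be sketched as follows. The encoding is mine, not the thesis's implementation: restrict each vowel to the chosen features, flip one value at a time, and look for a matching vowel in the inventory:

```python
# A sketch of Find Contrasts over a fully specified inventory.
def find_contrasts(specs, features):
    """Return {feature: set of frozenset pairs contrastive for it}."""
    reduced = {v: tuple(specs[v][f] for f in features) for v in specs}
    by_values = {vals: v for v, vals in reduced.items()}
    contrasts = {f: set() for f in features}
    for v, vals in reduced.items():
        for i, f in enumerate(features):
            flipped = list(vals)
            flipped[i] = "-" if vals[i] == "+" else "+"  # switch F to opposite
            u = by_values.get(tuple(flipped))
            if u is not None:  # a vowel identical except for F exists
                contrasts[f].add(frozenset((v, u)))
    return contrasts

# Turkish, specified only for the features the algorithm found
turkish = {
    "i": {"high": "+", "back": "-", "round": "-"},
    "y": {"high": "+", "back": "-", "round": "+"},
    "e": {"high": "-", "back": "-", "round": "-"},
    "ø": {"high": "-", "back": "-", "round": "+"},
    "ɯ": {"high": "+", "back": "+", "round": "-"},
    "u": {"high": "+", "back": "+", "round": "+"},
    "a": {"high": "-", "back": "+", "round": "-"},
    "o": {"high": "-", "back": "+", "round": "+"},
}
contrasts = find_contrasts(turkish, ["high", "back", "round"])
print(contrasts["round"])  # the four rounding pairs listed in the text
```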
3.3 The Best Binary Split Algorithm, with Global Feature Hierarchy

Best Binary Split Algorithm (version 2, with Feature Hierarchy):

Input: Vowels
Sets = [], RelevantFeatures = [], GlobalFeatures = set of all possible distinctive features
1. Fully specify each vowel in Vowels for all features in GlobalFeatures
2. Add the set of all vowels in Vowels to Sets
While Sets is not empty, loop (3-6):
    NewSets = []
    3. Find feature F which best divides all sets in Sets in half.
       F = FindBestFeature(GlobalFeatures, RelevantFeatures, Sets)
    4. For each set s in Sets:
           subsetplus = all vowels in s that are [+F]
           subsetminus = all vowels in s that are [−F]
           If subsetplus has more than one vowel, add subsetplus to NewSets
           If subsetminus has more than one vowel, add subsetminus to NewSets
    5. Sets = NewSets
    6. Add F to RelevantFeatures
7. Specify each vowel in Vowels with only the features in RelevantFeatures
8. A vowel V is contrastive for a feature F if there exists a vowel U in Vowels identical to V in all RelevantFeatures, but opposite for F

Subroutine FindBestFeature(GlobalFeatures, RelevantFeatures, Sets):
    FeatureHierarchy: low > high > back > front > round > ATR
    For each set s in Sets:
        For each feature f in GlobalFeatures − RelevantFeatures:
            subsetplus = all vowels in s that are [+f]
            subsetminus = all vowels in s that are [−f]
            best-split-val = maximum of { length(subsetplus) / length(s), length(subsetminus) / length(s) }
            /* best-split-val will be between .5 (perfect split) and 1 (no split) */
    For each feature f in GlobalFeatures − RelevantFeatures:
        score = average of each best-split-val for sets in Sets
    If there is a single feature with the lowest score, return that feature
    Else: resolve the tie by choosing the highest tied feature on the FeatureHierarchy

With the addition of the global feature hierarchy and the algorithm for finding contrastiveness, the Best Binary Split algorithm can deterministically divide the vowel space and find the contrasts for many languages.
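Version 2 can be sketched as a runnable whole. The encoding and names below are mine, not the thesis's implementation; the tie-break works by scanning the remaining features in hierarchy order, since Python's `min` returns the first of several equally scored items:

```python
# A sketch of Best Binary Split, version 2, with hierarchy tie-breaking.
HIERARCHY = ["low", "high", "back", "front", "round", "ATR"]

def best_binary_split(specs, hierarchy=HIERARCHY):
    """Return the relevant features for `specs`, in the order chosen.

    Assumes every vowel is distinguishable by the hierarchy's features.
    """
    relevant, sets = [], [list(specs)]
    while sets:
        remaining = [f for f in hierarchy if f not in relevant]

        def score(f):
            # average best-split-val over all current sets
            vals = []
            for s in sets:
                plus = sum(1 for v in s if specs[v][f] == "+")
                vals.append(max(plus, len(s) - plus) / len(s))
            return sum(vals) / len(vals)

        best = min(remaining, key=score)  # ties fall to the hierarchy order
        new_sets = []
        for s in sets:
            for val in "+-":
                subset = [v for v in s if specs[v][best] == val]
                if len(subset) > 1:  # singletons need no further division
                    new_sets.append(subset)
        sets = new_sets
        relevant.append(best)
    return relevant

turkish = {
    "i": dict(low="-", high="+", back="-", front="+", round="-", ATR="+"),
    "y": dict(low="-", high="+", back="-", front="+", round="+", ATR="+"),
    "e": dict(low="-", high="-", back="-", front="+", round="-", ATR="+"),
    "ø": dict(low="-", high="-", back="-", front="+", round="+", ATR="+"),
    "ɯ": dict(low="-", high="+", back="+", front="-", round="-", ATR="+"),
    "u": dict(low="-", high="+", back="+", front="-", round="+", ATR="+"),
    "a": dict(low="+", high="-", back="+", front="-", round="-", ATR="-"),
    "o": dict(low="-", high="-", back="+", front="-", round="+", ATR="+"),
}
chosen = best_binary_split(turkish)
print(chosen)  # ['high', 'back', 'round']
```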
Finnish, for example, famously has back harmony in which /i/ and /e/ do not participate. In Radical Underspecification theory, this would mean that /i/ and /e/ lack a specification for [±back]; however, Finnish has an
assibilation process in which /t/ becomes [s] across a morpheme boundary when followed by /i/, that is: /t/ → [s] / _ # [+high, −back]. If /i/ lacks specification for [±back], it is unable to participate in this process. However, using the framework of full specification and contrastiveness, the Best Binary Split algorithm can find the feature set for Finnish and show that /i/ and /e/ are not contrastive for [back]. We start by specifying all of the vowels in Finnish, /i,e,æ,y,ø,ɑ,o,u/, for every possible feature, and finding the best split:

      low  high  back  front  round  ATR
  i    −    +     −     +      −      +
  e    −    −     −     +      −      +
  æ    +    −     −     +      −      +
  y    −    +     −     +      +      +
  ø    −    −     −     +      +      +
  ɑ    +    −     +     −      −      −
  o    −    −     +     −      +      +
  u    −    +     +     −      +      +

  Score: low .75, high .625, back .625, front .625, round .5, ATR .875

The best split for the entire inventory is [round]. This divides the inventory into two sets, /i,e,æ,ɑ/ and /y,ø,o,u/. The best split among these two sets is [high] (tied with [back] and [front]; the hierarchy prefers [high]):

          {i,e,æ,ɑ}  {y,ø,o,u}  Average
  low       .5         1          .75
  high      .75        .5         .625
  back      .75        .5         .625
  front     .75        .5         .625
  ATR       .75        1          .875

This split makes /i/ a singleton set, so it does not get considered in future splits.
Now there are three sets: /e,æ,ɑ/, /y,u/, and /ø,o/. The best average split for these sets is [back] (tied with [front]; the hierarchy prefers [back]):

{e,æ,ɑ}:                 {y,u}:                  {ø,o}:
      low  back  front  ATR      low  back  front  ATR      low  back  front  ATR
  e    −    −     +     +    y    −    −     +     +    ø    −    −     +     +
  æ    +    −     +     +    u    −    +     −     +    o    −    +     −     +
  ɑ    +    +     −     −

          {e,æ,ɑ}  {y,u}  {ø,o}  Average
  low      .67      1      1      .89
  back     .67      .5     .5     .56
  front    .67      .5     .5     .56
  ATR      .67      1      1      .89

This leaves most of the vowels in singleton sets, with only /e,æ/ still in one set:

      low  front  ATR
  e    −    +      +
  æ    +    +      +

  Score: low .5, front 1, ATR 1

The final set /e,æ/ is broken up by splitting on [low]. Thus, the feature set for Finnish is determined to be [round, high, back, low], so the specifications for each vowel are:

      round  high  back  low
  i     −     +     −    −
  y     +     +     −    −
  e     −     −     −    −
  ø     +     −     −    −
  æ     −     −     −    +
  ɑ     −     −     +    +
  o     +     −     +    −
  u     +     +     +    −

Along with these specifications, the Find Contrasts algorithm finds the pairs which are identical except for having the opposite value for [back]:

  Pairs contrastive for [back]: (y,u), (ø,o), (æ,ɑ)

With a harmony paradigm that is restricted to contrastive pairs, the Best Binary Split algorithm finds the correct features and contrastive pairs for Finnish.
3.4 Problems with the determinism of a global hierarchy

The benefit of having a deterministic system is that the results can be automatically computed without any knowledge outside of the program input. However, the Best Binary Split Algorithm takes only a vowel inventory as input, and therefore, due to its deterministic nature, will always return the same feature set and contrasts for the same input. In most computations, this is the goal: a system that multiplies two numbers should always return 15 when given 3 and 5. However, among languages with the same inventories, there can be different features that are used to divide that inventory. One example of this difference in feature specification is presented by Dresher (In Press). The languages Latin and Artshi both share the same vowel inventory, /i,e,a,o,u/. For this inventory, the Best Binary Split Algorithm outputs the features [high, back, low]. The first feature chosen is [high]:

      low  high  back  front  round  ATR
  i    −    +     −     +      −      +
  e    −    −     −     +      −      +
  a    +    −     +     −      −      −
  o    −    −     +     −      +      +
  u    −    +     +     −      +      +

  Score: low .8, high .6, back .6, front .6, round .6, ATR .8

The next is [back]:

{i,u}:                      {e,a,o}:
      low  back  front  round  ATR      low  back  front  round  ATR
  i    −    −     +      −      +   e    −    −     +      −      +
  u    −    +     −      +      +   a    +    +     −      −      −
                                    o    −    +     −      +      +

          {i,u}  {e,a,o}  Average
  low      1      .67      .83
  back     .5     .67      .58
  front    .5     .67      .58
  round    .5     .67      .58
  ATR      1      .67      .83
Finally, the algorithm chooses [low]:

      low  front  round  ATR
  a    +    −      −      −
  o    −    −      +      +

  Score: low .5, front 1, round .5, ATR .5

For the feature set [high, back, low], the algorithm finds the following contrasting pairs:

Final Specifications for Latin
      high  back  low
  i    +    −     −
  e    −    −     −
  a    −    +     +
  o    −    +     −
  u    +    +     −

  high: (i,e), (o,u)
  back: (i,u), (e,o)
  low: (a,o)

This feature set, [high, back, low], and these contrasts are correct for Latin, but not for Artshi. Artshi has an active round contrast in part of its consonant inventory, but all of these consonants appear as [+round] before /o/ or /u/ (Dresher, In Press). This indicates that [round] needs to be one of the features in the inventory, despite the fact that Artshi has a vowel inventory identical to Latin's. Like Dresher's treatment of this phenomenon, the way to solve this problem is to allow each language to rerank the feature hierarchy. For example, if Artshi ranks round above low, then the Best Binary Split algorithm succeeds.

Artshi Feature Hierarchy: round > low > high > back > front > ATR
First, the best feature is [round]:

      round  low  high  back  front  ATR
  i     −    −    +     −     +      +
  e     −    −    −     −     +      +
  a     −    +    −     +     −      −
  o     +    −    −     +     −      +
  u     +    −    +     +     −      +

  Score: round .6, low .8, high .6, back .6, front .6, ATR .8

Next, [high]:

{o,u}:                      {i,e,a}:
      low  high  back  front  ATR      low  high  back  front  ATR
  o    −    −     +     −      +   i    −    +     −     +      +
  u    −    +     +     −      +   e    −    −     −     +      +
                                   a    +    −     +     −      −

          {o,u}  {i,e,a}  Average
  low      1      .67      .83
  high     .5     .67      .58
  back     1      .67      .83
  front    1      .67      .83
  ATR      1      .67      .83

Finally, [low]:

      low  back  front  ATR
  e    −    −     +      +
  a    +    +     −      −

  Score: low .5, back .5, front .5, ATR .5

With this new hierarchy, Artshi has the feature set [round, high, low], which is exactly what theoretical work posits.

Final Specifications for Artshi        Final Specifications for Latin
      round  high  low                       high  back  low
  i     −     +    −                     i    +    −     −
  e     −     −    −                     e    −    −     −
  a     −     −    +                     a    −    +     +
  o     +     −    −                     o    −    +     −
  u     +     +    −                     u    +    +     −
The reranking of features solves the problem of multiple feature sets arising from the same inventory, but it also breaks the determinism of the Best Binary Split Algorithm, because there needs to be a decision about what ranking a language should use. One simple solution would be to input the ranking along with the vowel inventory. However, this still requires the decision of ranking; it just removes it from the algorithm. Instead, the algorithm can take as input a set of vowel contrasts which correspond to the classes that pattern together. That is, for Artshi, instead of inputting only /i,e,a,o,u/, the algorithm also takes the contrast (o,u) vs. (i,e,a), which is the contrast apparent in the language. With this information, the algorithm can deterministically extract the relevant features and rerank the features for this particular language.

3.5 The Best Binary Split Algorithm, with active Natural Class patterning

In order to find the active features in a language from a set of contrasting natural classes, we need an algorithm to find out what features the classes are contrastive for, and then promote those features in the hierarchy.

Rerank Features: ContrastiveClassSets /* a set of pairs of natural classes of vowels */
    Hierarchy = low > high > back > front > round > ATR
    For each pair of natural classes ClassPair in ContrastiveClassSets:
        for each feature F:
            if all vowels in one member of ClassPair are [αF] and all vowels of the other member of ClassPair are [βF], where α,β ∈ {+,−} and α ≠ β:
                move F to the top of the Hierarchy
    return Hierarchy

To show how this works, let us input the active classes for Artshi, [(o,u) vs. (i,e,a)]. There is exactly one pair of natural classes in the contrastive set, whose members are
(o,u) and (i,e,a). The algorithm finds the features that are [+F] for all vowels of one member and [−F] for all vowels of the other member:

(o,u):
      high  low  front  back  ATR  round
  u    +    −     −     +     +     +
  o    −    −     −     +     +     +

(i,e,a):
      high  low  front  back  ATR  round
  i    +    −     +     −     +     −
  e    −    −     +     −     +     −
  a    −    +     −     +     −     −

In this case, the only feature which is [+F] in one set and [−F] in the other is [round]. Round gets promoted to the top of the hierarchy, leaving Artshi with the hierarchy:

round > low > high > back > front > ATR

which was demonstrated above to find the correct feature set for Artshi. By adding this reranking algorithm and natural class pair input to the Best Binary Split algorithm, the algorithm can now correctly find the distinctive feature hierarchy for many more languages. By adding the hierarchy as input to the FindBestFeature subroutine, the algorithm is able to calculate the language-specific hierarchy with the RerankFeatures subroutine and then use the new feature hierarchy to find the best feature to split the vowel sets. One example of this part of the algorithm working is Nawuri, whose inventory is /i,ɪ,e,ɛ,u,ʊ,o,ɔ,a/. Version 2 of the Best Binary Split algorithm (the version without contrastive set input) finds the feature set [high, back, ATR, low] for these vowels. However, Nawuri has rounding harmony, and therefore the vowels must be specified for [round].
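The Rerank Features subroutine can be sketched as follows. The encoding is mine, not the thesis's implementation: a feature is promoted to the top of the hierarchy when one natural class is uniformly [+F] and the other uniformly [−F]:

```python
# A sketch of RerankFeatures over fully specified vowels.
def rerank_features(class_pairs, specs, hierarchy):
    hierarchy = list(hierarchy)
    for class_a, class_b in class_pairs:
        for f in list(hierarchy):
            vals_a = {specs[v][f] for v in class_a}
            vals_b = {specs[v][f] for v in class_b}
            # uniform within each class, and opposite between the classes
            if len(vals_a) == len(vals_b) == 1 and vals_a != vals_b:
                hierarchy.remove(f)
                hierarchy.insert(0, f)  # promote F to the top
    return hierarchy

# Artshi /i,e,a,o,u/, fully specified
artshi = {
    "i": dict(low="-", high="+", back="-", front="+", round="-", ATR="+"),
    "e": dict(low="-", high="-", back="-", front="+", round="-", ATR="+"),
    "a": dict(low="+", high="-", back="+", front="-", round="-", ATR="-"),
    "o": dict(low="-", high="-", back="+", front="-", round="+", ATR="+"),
    "u": dict(low="-", high="+", back="+", front="-", round="+", ATR="+"),
}
default = ["low", "high", "back", "front", "round", "ATR"]
artshi_hierarchy = rerank_features([(("o", "u"), ("i", "e", "a"))], artshi, default)
print(artshi_hierarchy)
# ['round', 'low', 'high', 'back', 'front', 'ATR']
```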
26 Best Binary Split Algorithm (version 3, with Natural Class patterning input): Input: Vowels, ContrastiveClassSets Sets = [], RelevantFeatures = [], GlobalFeatures = set of all possible distinctive features 1. Fully specify each vowel in Vowels for all features in GlobalFeatures 2. find FeatureHierarchy by running subroutine RerankFeatures(ContrastiveClassSets) 3. Add the set of all vowels in Vowels to Sets While Sets is not empty, loop (3-6): NewSets = [] 4. Find feature F which best divides all sets in Sets in half. F = FindBestFeature(FeatureHierarchy, GlobalFeatures, RelevantFeatures, Sets) 5. For each set s in Sets: subsetplus = all vowels in s that are [+F] subsetminus = all vowels in s that are [ F] If subsetplus has more than one vowel, Add subsetplus to NewSets If subsetminus has more than one vowel, Add subsetminus to NewSets 6. Sets = NewSets 7. Add F to RelevantFeatures 8. Specify each vowel in Vowels with only the features in RelevantFeatures 9. A vowel V is contrastive for a feature F if there exists a vowel U in Vowels identical to V in all RelevantFeatures, but opposite for F Subroutine FindBestFeature(FeatureHierarchy, GlobalFeatures, RelevantFeatures, Sets): For each set s in Sets: For each feature f in GlobalFeatures RelevantFeatures: subsetplus = all vowels in s that are [+f] subsetminus = all vowels in s that are [ f] best-split-val = maximum of { length(subsetplus) / length(s), length(subsetminus) / length(s) } /*best-split-val will be between.5 (perfect split) and 1 (no split)*/ For each feature f in GlobalFeatures RelevantFeatures: score = average of each best-split-val for sets in Sets If there is a single feature with lowest score, return that feature else: resolve ties by choosing the highest tied feature on the FeatureHierarchy Subroutine RerankFeatures(ContrastiveClassSets): /* ContrastiveClassSets is a set of pairs of natural classes of vowels */ Hierarchy = low > high > back > front > round > ATR For each pair of natural classes 
ClassPair in ContrastiveClassSets:
        For each feature F:
            If all vowels in one member of ClassPair are [αF] and all vowels of
            the other member of ClassPair are [βF], where α,β ∈ {+,-} and α ≠ β:
                move F to the top of the Hierarchy
    return Hierarchy

Because of the rounding harmony in Nawuri, the [+round] vowels, /u,ʊ,o,ɔ/, appear on the surface together, and the [-round] vowels, /i,ɪ,e,ɛ,a/, appear together, for
example (from Nevins, In Press, pp. ): 7

    gɪ-sɪbɪta    'sandal'
    gi-ke:li:    'kapok tree'
    gu-jo        'yam'
    gʊ-wʊrʊ      'hat'
    gʊ-lɔ        'illness'

If we input the pair of vowel sets that pattern together, (u,ʊ,o,ɔ) vs. (i,ɪ,e,ɛ,a), the algorithm tries to rerank the features for Nawuri by finding what feature this pair of vowel sets contrasts on:

        high  low  front  back  ATR  round
u        +    -     -      +    +     +
ʊ        +    -     -      +    -     +
o        -    -     -      +    +     +
ɔ        -    -     -      +    -     +

        high  low  front  back  ATR  round
i        +    -     +      -    +     -
ɪ        +    -     +      -    -     -
e        -    -     +      -    +     -
ɛ        -    -     +      -    -     -
a        -    +     -      +    -     -

The first set agrees on [low, front, back, round], but the second set agrees only on [round], and the two sets contrast on [round], so [round] is promoted in Nawuri's feature hierarchy, resulting in:

round > low > high > back > front > ATR

This hierarchy is now used to find the best splitting features. The best split for the entire inventory is [round].

        round  low  high  back  front  ATR
i         -     -    +     -     +      +
ɪ         -     -    +     -     +      -
e         -     -    -     -     +      +
ɛ         -     -    -     -     +      -
a         -     +    -     +     -      -
o         +     -    -     +     -      +
ɔ         +     -    -     +     -      -
u         +     -    +     +     -      +
ʊ         +     -    +     +     -      -

        Score
round    .55
low      .88
high     .55
back     .55
front    .55
ATR      .55

7 Note that Nawuri also has [ATR] harmony, so only vowels with the same [ATR] value will show up together. I am ignoring this in this example to demonstrate how the patterning variant works. If we input the [ATR] pattern, the reranking would also promote [ATR], but the resulting distinctive features would be the same.
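The scoring step shown in the table above can be reproduced in a few lines of Python 3. This is a standalone sketch using my own names, not the appendix code; ASCII stand-ins (I, E, O, U) replace ɪ, ɛ, ɔ, ʊ, and Python's `round` gives .56 and .89 where the thesis table truncates to .55 and .88.

```python
def best_split_val(vowel_set, feature):
    """Max of the two subset proportions: .5 is a perfect split, 1.0 is no split."""
    plus = sum(1 for v in vowel_set if v[feature] == '+')
    return max(plus, len(vowel_set) - plus) / len(vowel_set)

# Nawuri's nine vowels, fully specified (I/E/O/U stand in for the lax vowels)
V = {
    'i': dict(high='+', low='-', front='+', back='-', ATR='+', round='-'),
    'I': dict(high='+', low='-', front='+', back='-', ATR='-', round='-'),
    'e': dict(high='-', low='-', front='+', back='-', ATR='+', round='-'),
    'E': dict(high='-', low='-', front='+', back='-', ATR='-', round='-'),
    'a': dict(high='-', low='+', front='-', back='+', ATR='-', round='-'),
    'o': dict(high='-', low='-', front='-', back='+', ATR='+', round='+'),
    'O': dict(high='-', low='-', front='-', back='+', ATR='-', round='+'),
    'u': dict(high='+', low='-', front='-', back='+', ATR='+', round='+'),
    'U': dict(high='+', low='-', front='-', back='+', ATR='-', round='+'),
}
inventory = list(V.values())
scores = {f: round(best_split_val(inventory, f), 2)
          for f in ['high', 'low', 'front', 'back', 'ATR', 'round']}
print(scores)   # low scores 0.89; every other feature scores 0.56
```

With every feature except [low] tied at the lowest score, the tie is resolved by the reranked hierarchy, which puts [round] on top.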
The best split between these two sets is [high].

{u,ʊ,o,ɔ}:
        low  high  back  front  ATR
u        -    +     +     -      +
ʊ        -    +     +     -      -
o        -    -     +     -      +
ɔ        -    -     +     -      -

{i,ɪ,e,ɛ,a}:
        low  high  back  front  ATR
i        -    +     -     +      +
ɪ        -    +     -     +      -
e        -    -     -     +      +
ɛ        -    -     -     +      -
a        +    -     +     -      -

              low   high  back  front  ATR
{u,ʊ,o,ɔ}     1.0   .5    1.0   1.0    .5
{i,ɪ,e,ɛ,a}   .8    .6    .8    .8     .6
Average       .9    .55   .9    .9     .55

([high] and [ATR] tie at .55; the hierarchy, round > low > high > back > front > ATR, ranks high above ATR.)

Among the sets remaining from splitting [round] and [high], the best split is [ATR].

{u,ʊ}:
        low  back  front  ATR
u        -    +     -      +
ʊ        -    +     -      -

{o,ɔ}:
        low  back  front  ATR
o        -    +     -      +
ɔ        -    +     -      -

{i,ɪ}:
        low  back  front  ATR
i        -    -     +      +
ɪ        -    -     +      -

{e,ɛ,a}:
        low  back  front  ATR
e        -    -     +      +
ɛ        -    -     +      -
a        +    +     -      -

           low   back  front  ATR
{u,ʊ}      1.0   1.0   1.0    .5
{o,ɔ}      1.0   1.0   1.0    .5
{i,ɪ}      1.0   1.0   1.0    .5
{e,ɛ,a}    .67   .67   .67    .67
Average    .92   .92   .92    .54

And finally, the best remaining division, for the set {ɛ,a}, is [low].

        low  back  front
ɛ        -    -     +
a        +    +     -

        Score
low      0.5
back     0.5
front    0.5

(All three tie at 0.5; [low] is the highest of the three on the hierarchy.)

Thus, the algorithm finds [round, high, ATR, low], and the correct contrastive pairs for rounding.

Round contrasts: (i,u), (ɪ,ʊ), (e,o), (ɛ,ɔ)
Another example of this algorithm working with natural class pairings is Classical Manchu. Classical Manchu has an unusual vowel inventory, /i,u,ʊ,ɔ,ə,a/:

Classical Manchu:  i    u ʊ
                   ə    ɔ
                     a

with active phonological processes of [ATR] harmony, with pairs (a, ə) and (ʊ, u), and round harmony (among low vowels), with /ɔ/ contrasting with /a/. The Best Binary Split algorithm without Natural Class patterning produces the feature set [high, back, low, ATR] for Classical Manchu, with the only ATR contrast being (ʊ, u). However, using the additional input of the pairs (u, ə) vs. (a, ʊ) and (ɔ) vs. (a), the algorithm reranks the hierarchy as: ATR > round > low > high > back > front. 8 The algorithm then finds the feature set [ATR, round, high], which correctly divides the vowels of Classical Manchu and correctly provides the following contrasts:

ATR:   (a, ə), (ʊ, u)
round: (a, ɔ), (i, u)

*Note that because rounding harmony is restricted to [-high] vowels, the (i,u) contrast never has a phonological effect.

8 In the current algorithm, the order of promotion of features depends on the order of input of the natural class pairs. One possible fix would be to collect all features which need to be promoted and move them up in the order of the default hierarchy. Another would be to choose between the features based on which ranking gives the smallest feature set for the language (i.e., run every variant and choose among them). I have not solved this problem in the current variant of the code because in all of the examples I have tested, it does not seem to matter which active feature is ranked above the other, because both show up in the final specifications.
3.6 Difficulties in transcription systems

One of the difficulties of writing an algorithm that takes a vowel system fully specified for all possible distinctive features and divides it is that it relies upon the specific information in the IPA symbol chosen by the transcriber. There are many pairs of vowels that are generally considered to be the same in a phonological system (rather than a phonetic system) because the features that would distinguish them are not distinctive in that particular language. This seems to be especially true for low back vowels. However, whether a non-high non-front vowel is actually [+low] or [-low] and [+back] or [-back] can significantly change the result of a deterministic algorithm (especially when run on a computer). Some of the common symbol mismatches I encountered while writing this algorithm were:

/ə/ and /ʌ/:
        high  low  front  back  ATR  round
ə        -    -     -      -    +     -
ʌ        -    -     -      +    -     -

The difference here is [±back]. However, for a fully specified system, trying to compare /a/ or /ɑ/ (or a confusion of the two, see below) to one of these results in a single difference for one but two differences for the other.

/ɔ/ and /ɒ/:
        high  low  front  back  ATR  round
ɔ        -    -     -      +    -     +
ɒ        -    +     -      +    -     +

Whether the low back round vowel of a language is actually [+low] does not matter if the language only distinguishes height on [±high],
however in trying to figure out the distinctions, this difference can be crucial.

/a/ and /ɑ/:
        high  low  front  back  ATR  round
a        -    +     -      -    -     -
ɑ        -    +     -      +    -     -

Many typesetters use the typewriter a, /a/, because it is easier to type. However, the difference between a back and a central vowel can prove critical.

/ɯ/ and /ɨ/:
        high  low  front  back  ATR  round
ɯ        +    -     -      +    +     -
ɨ        +    -     -      -    +     -

Many linguists use these symbols relatively interchangeably because both are [+high, -front, -round] vowels. However, the actual backness values of these vowels can fool a computer system into deciding on the wrong splits.

In testing the Best Binary Split algorithm, there were several cases of languages where the initial attempt produced an incorrect feature set, but the correct answer was obtained by changing one of the symbols to a relatively equivalent one, that is, replacing a vowel with one which is phonetically different but phonologically the same for that language's distinctive features. Some of the languages that demonstrate the common vowel replacements are Woleaian, Sanjiazi Manchu, Turkish, and Nez Perce.
Nevins (In Press) reports the Woleaian inventory to be /i,y,u,e,ø,o,a,ɔ/. 9

Woleaian: i y u e ø o a ɔ/ɒ

However, the phonology of Woleaian considers /ɔ/ to be [+low], whereas phonetically, the symbol <ɔ> usually stands for a [-low] vowel. By changing this vowel to /ɒ/, the Best Binary Split algorithm is able to correctly distinguish the vowels of Woleaian.

Sanjiazi Manchu and Turkish suffer from the conflation of /ɯ/ and /ɨ/.

Sanjiazi Manchu: i y ɨ/ɯ u æ a ɔ/ɒ
Turkish: i y ɨ/ɯ u e ø o a

Both languages have a single backness distinction, so phonologically it does not really matter which symbol is used to represent the [+high, -round] vowel that is back (either [+back] or [-front]). In both cases, the Best Binary Split algorithm finds the correct contrasts, but it states the feature as [±back] or [±front] depending on the input of /ɯ/ or /ɨ/. Sanjiazi Manchu also has the confusion of /ɔ/ and /ɒ/. With the input of /ɔ/, the contrast becomes [±high], whereas with /ɒ/, the contrast is returned as [±low].

Nez Perce provides an example of needing to specify the backness of /a/.

Nez Perce: i u ɔ æ a/ɑ

With just the inventory, /i,æ,a,ɔ,u/, and the ATR contrast pairs (u,ɔ) and (a,æ), when the orthographic <a> is treated as a true back a, that is, /ɑ/, the feature set comes out correctly, with [ATR, low, back], but /æ/ and /a/ fail to

9 Nevins, In Press, p.
contrast for ATR. However, by inputting a phonetically central a, that is, /a/, the feature set and the contrasts are output correctly.

A feature that would be useful for a future version of the Best Binary Split algorithm would be the ability to perform a fuzzy match on a vowel inventory by running the algorithm with commonly interchangeable vowels, so long as the final feature set did not distinguish these vowels. This fuzzy match would work by allowing vowels to shift values in one or possibly two categories, perhaps limited to commonly confused categories (such as [back] or [low]), creating many possible similar inventories for the language. The Best Binary Split algorithm would then run on each inventory, and some heuristic would choose the best. The heuristic might involve choosing the analysis with the smallest feature set, or the one that had the active natural classes ranked highest in the hierarchy. In either case, the addition of a fuzzy search would greatly increase the theoretical runtime of the algorithm, but the search space is small enough (only a few vowels and a few features) that performance would probably not be seriously compromised. The fuzzy search feature, however, is not part of the current implementation of the Best Binary Split algorithm.
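The variant-generation step of this proposed fuzzy search could be sketched roughly as follows. This is purely hypothetical code: the function name and the choice of confusable categories are my own assumptions, and nothing like it exists in the thesis implementation.

```python
CONFUSABLE = ['back', 'low']   # assumed commonly confused categories

def inventory_variants(inventory):
    """The original inventory plus every variant obtained by flipping one
    confusable feature on one vowel (a one-flip fuzzy neighborhood)."""
    variants = [inventory]
    for i, vowel in enumerate(inventory):
        for feat in CONFUSABLE:
            flipped = dict(vowel)
            flipped[feat] = '-' if vowel[feat] == '+' else '+'
            variants.append(inventory[:i] + [flipped] + inventory[i + 1:])
    return variants

# a toy 5-vowel inventory gives 1 + 5*2 = 11 candidate inventories
toy = [dict(back='+', low='-') for _ in range(5)]
print(len(inventory_variants(toy)))   # 11
```

Running the algorithm over each candidate and keeping, say, the analysis with the smallest feature set would implement the heuristic described above; with only a few vowels and two confusable features, the candidate space stays small.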
4. Implementation

In this section I discuss how the implementation of the Best Binary Split algorithm works and the representation decisions that were made.

4.1 Basic Implementation Information

I implemented the Best Binary Split algorithm in Python 2. Although there is nothing inherently Python-specific about the algorithm, Python provides good string and list processing capabilities as well as the ability to use object-oriented programming. Python is also generally considered easy to read, so the choice of Python will hopefully facilitate the readability of the code.

4.2 The Vowel Class and Inventories

The first decision in building a system to work with a vowel inventory is how to represent a vowel and how to represent the features of a vowel. I chose to create a Vowel class so that every vowel is represented as an object. Each vowel object has an internal symbol, which is the name of the vowel, e.g. barred_i, and a dictionary of features to represent its fully specified variant. The feature dictionary simply maps the string name of each feature, e.g. 'high' for [high], to an element of the Feature struct. The Feature struct has values Feature.Plus and Feature.Minus to represent [+F] and [-F], as well as Feature.NotSpecified, which is currently used only for initializing the values of a vowel but is available for use if this representation for vowels were to be used for an algorithm following radical underspecification theory. The Vowel class also provides methods for comparing two vowel objects by checking if their feature
dictionaries map each feature to the same value. With vowels represented as objects, a vowel inventory for a language is simply a list of Vowel objects.

4.3 The Algorithm

The Python code can be found in the Appendix. What follows is a brief prose description of each method of the implementation. The algorithm is divided into several methods for ease of reading and testing. The top-level method is called distinguish, which takes as input the vowel sets, the list of unassigned global features, and the set of relevant features (the features already determined to be contrastive for the current vowel set). The distinguish method creates a dictionary mapping every feature to its maximum split score for each set and then passes these feature/score pairings to the findbestsplit method, which returns the best splitting feature. With the new best splitting feature, the relevant features set is updated, and the distinguish method recurses on the new vowel sets obtained by splitting the current vowel sets by the new best split feature with the splitset method. The splitset method simply takes the list of vowel sets and a feature and returns a list containing two lists for each vowel set: the [+F] list and the [-F] list. The recursion of the distinguish method ends when there are no more vowel sets left to be split.

The findbestsplit method averages the scores across sets for each feature and passes these averaged scores to the FindBestSingle method, which finds the best single feature for the split. The FindBestSingle method simply orders the averaged feature/score pairings by score (lowest to highest, since the minimum is .5, signifying a perfect binary split, and
the maximum is 1, signifying no split) and returns the lowest. If there is a tie, it calls upon the rankfeatures method to break the tie. The rankfeatures method takes as input a set of tied features and a feature hierarchy (or uses the global hierarchy in Version 2 of the algorithm), which is simply a dictionary that maps each feature to a score, and returns the highest ranked of the tied features.

To find the contrasts in a vowel inventory after discovering the feature set, the algorithm iterates through the relevant features running FindContrasts, which takes the vowel inventory and a feature. For each vowel in the inventory, the vowel is copied into a new vowel, and then the value for the current feature is switched. If this new vowel is in the vowel inventory, then the pair is added to the set of contrastive vowel pairs for that language.

4.4 Reranking

For the final version of the algorithm (Version 3), there is the additional input of pairs of patterning vowel classes. These vowels are used to find the important features in the language and rank them higher than in the default feature hierarchy. This happens in a way very similar to the FindContrasts implementation. For each pair of patterning classes, the algorithm finds the features that one side shares and the other lacks by creating an ideal partner for each vowel (copying the vowel and then changing the value of the current feature) and checking to see if that partner is in the other class. If a feature has this property, then it is moved to the top of the hierarchy. This hierarchy is then passed into the distinguish method and passed down to the rankfeatures
tiebreaker method, which uses the inputted hierarchy instead of the default global hierarchy.

Because of the difficulties in transcription mentioned in section 3.6 above, I implemented a small amount of fuzziness into the match-finding process for the contrast-finding step. Often, vowels that should be paired (say /a/ and /ɔ/ for roundness) also differ in one other feature (in this case, [low]), so these vowels would not normally be considered a match. Adding fuzziness to the match allows a matched vowel to deviate from its partner, but only by up to two features. This fuzziness value of two is arbitrary, but worked in the test cases. With more data, however, it may need to be changed.
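The two-feature tolerance can be sketched as a simple feature-distance check. This is a standalone Python 3 sketch with my own names, not the appendix's findfuzzy implementation.

```python
def feature_distance(v, w, features):
    """Number of features on which two fully specified vowels disagree."""
    return sum(1 for f in features if v[f] != w[f])

def fuzzy_match(v, w, flipped_feature, features, fuzziness=2):
    """Accept w as v's partner for flipped_feature if they differ on that
    feature and on at most `fuzziness` other features."""
    if v[flipped_feature] == w[flipped_feature]:
        return False
    rest = [f for f in features if f != flipped_feature]
    return feature_distance(v, w, rest) <= fuzziness

# /a/ and /open-o/ differ on [round] and also on [low]; with the default
# fuzziness of 2 they still count as a roundness pair.
a      = dict(high='-', low='+', front='-', back='+', ATR='-', round='-')
open_o = dict(high='-', low='-', front='-', back='+', ATR='-', round='+')
feats = ['high', 'low', 'front', 'back', 'ATR', 'round']
print(fuzzy_match(a, open_o, 'round', feats))   # True
```

With fuzziness set to 0 the same pair is rejected, which is exactly the strict behavior the transcription mismatches in section 3.6 make problematic.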
5. Conclusion

The Best Binary Split Algorithm is able to deterministically find the distinctive feature set for a language's vowel inventory, as well as predict the contrasting vowel pairs for any of the features in that set. Creating a deterministic process allows for an analysis without outside knowledge. This is a benefit because the process can then be implemented as a computer program, which allows for efficient and accurate testing of new data to check whether the algorithm holds. In addition, a problem solved by a deterministic process does not need full human intelligence to find the solution. The human ability to choose nondeterministically allows us to solve many problems which computers cannot. However, it seems unlikely that a child, who has already figured out the phonology of his or her native language at a very young age, is able to solve such complex nondeterministic problems before being able to form full sentences. The fact that figuring out the contrastive distinctive features of a vowel inventory can be done deterministically, and therefore by a computer, suggests that children do not need fully formed nondeterministic problem-solving skills to be able to decide how their language's vowel system functions. The Best Binary Split Algorithm may not be the way children actually acquire the contrasts and features of their inventory, but it is a deterministic solution that allows a computer to perform the seemingly difficult task of dividing an undistinguished set of vowels into distinctive features and finding the contrastive pairs.
Appendix: Python Code for Algorithm

#The Best Binary Split Algorithm
# Version 2: Global Feature Hierarchy
# MA Thesis, Computational Linguistics, Brandeis University    Kobey Shwayder

from copy import deepcopy

class Feature:
    '''This is the struct to define what + and - mean for a feature'''
    Plus = 1
    Minus = 2
    NotSpecified = 5

def FeatureOpposite(featval):
    '''This method returns the opposite value of a feature'''
    if featval == 1:
        return 2
    elif featval == 2:
        return 1
    else:
        return featval

GlobalFeats = ['high', 'low', 'front', 'back', 'round', 'ATR']

class Vowel:
    '''the vowel class'''
    def __init__(self, char, high = Feature.NotSpecified,
                 low = Feature.NotSpecified, front = Feature.NotSpecified,
                 back = Feature.NotSpecified, round = Feature.NotSpecified,
                 ATR = Feature.NotSpecified, long = Feature.Minus):
        self._features = {}
        self._features['high'] = high
        self._features['front'] = front
        self._features['back'] = back
        self._features['low'] = low
        self._features['round'] = round
        self._features['ATR'] = ATR
        self._features['long'] = long
        self.symbol = char
    def __repr__(self):
        '''the string representation of the Vowel'''
        toreturn = '<Vowel-\'%s\': ' % self.symbol.encode('utf-8')
        keys = self._features.keys()
        keys.sort()
        for f in keys:
            if(self._features[f] == Feature.Plus):
                toreturn += "+%s, " % f
            elif(self._features[f] == Feature.Minus):
                toreturn += "-%s, " % f
            else:   #Feature.NotSpecified
                pass
        return toreturn[:-2] + ">"   #remove last comma and space

    def __getitem__(self, key):
        '''Vowel[key]'''
        return self._features[key]

    def __setitem__(self, key, value):
        '''Vowel[key] = value'''
        self._features[key] = value

    def __eq__(self, other):
        '''Vowel == other'''
        if not isinstance(other, self.__class__):
            return False
        for feat in self._features.keys():
            if self._features[feat] != other._features[feat]:
                return False
        return True

    def __ne__(self, other):
        '''Vowel != other'''
        return not self.__eq__(other)

def FindContrasts(vowels, feat, featset, trace = 0):
    '''Given the vowels, a featureset, and a feature, finds all the
    pairs of vowels that are contrastive for that feature'''
    pairs = []
    visited = []
    for vowel in [vow for vow in vowels if vow not in visited]:
        partner = deepcopy(vowel)
        partner[feat] = FeatureOpposite(vowel[feat])
        for v in [vow for vow in vowels if vow not in visited]:
            match = reduce(lambda x,y: x and y,
                           map(lambda feat: v[feat] == partner[feat], featset))
            if trace > 0:
                print vowel.symbol, "?=", v.symbol, match
            if match:
                pairs.append((vowel, v))
                visited.append(vowel)
                visited.append(v)
    return pairs
def rankfeatures(tiedfeatures):
    '''Splits ties using the global hierarchy
    low > high > back > front > round > ATR'''
    ranks = {'low' : 0, 'high' : 1, 'back' : 2, 'front' : 3,
             'round' : 4, 'ATR' : 5}
    ranked = [(ranks[f], f) for f in tiedfeatures]
    ranked.sort()
    return [p[1] for p in ranked]

def FindBestSingle(matrix, relevantfeats):
    '''Finds the best feature to split by, breaking ties with rankfeatures'''
    splits = {}
    for f in matrix:
        if not splits.has_key(matrix[f]):
            splits[matrix[f]] = []
        splits[matrix[f]].append(f)
    for k in splits:
        if len(splits[k]) > 1:
            splits[k] = rankfeatures(splits[k])
    splits = [(k, splits[k]) for k in splits]
    splits.sort()
    return splits[0][1][0]

def findbestsplit(sets, features, relevantfeats):
    '''Finds the best split by scoring each feature and passing
    the scores to FindBestSingle'''
    if len(sets) == 1:
        return FindBestSingle(sets[0], relevantfeats)
    else:
        scores = {}
        for feature in features:
            scorelist = []
            for set in sets:
                if set.has_key(feature):
                    scorelist.append(set[feature])
                else:
                    scorelist = []
                    break
            if scorelist != []:
                scores[feature] = scorelist
        for key in scores:
            scores[key] = sum(scores[key])/len(scores[key])
        return FindBestSingle(scores, relevantfeats)
def splitset(vowelsets, feature):
    '''splits a vowelset on the feature by dividing each subset into two
    subsets: the [+feature] and the [-feature] subset'''
    newvowelset = []
    for vowels in vowelsets:
        if isinstance(vowels, list):
            plus = []
            minus = []
            for vowel in vowels:
                if vowel[feature] == Feature.Plus:
                    plus.append(vowel)
                if vowel[feature] == Feature.Minus:
                    minus.append(vowel)
            for sign in [plus, minus]:
                if len(sign) > 1:
                    newvowelset.append(sign)
    return newvowelset

def distinguish(vowelsets, features = GlobalFeats, relevantfeats = []):
    '''the main method for distinguishing a vowel set'''
    results = []
    for vowels in vowelsets:
        if isinstance(vowels, list):
            print [v.symbol for v in vowels]
            matrix = {}
            for f in features:
                matrix[f] = [v[f] for v in vowels]
            for f in matrix:
                matrix[f] = \
                    max(float(matrix[f].count(Feature.Plus)) /
                        len(matrix[f]),
                        float(matrix[f].count(Feature.Minus)) /
                        len(matrix[f]))
            results.append((vowels, matrix))
    if results == []:
        return relevantfeats
    bestsplitfeat = findbestsplit([pairs[1] for pairs in results],
                                  features, relevantfeats)
    features.remove(bestsplitfeat)
    relevantfeats.append(bestsplitfeat)
    print "best split:", bestsplitfeat, "\n###################\n"
    return distinguish(splitset(vowelsets, bestsplitfeat),
                       features, relevantfeats)
if __name__ == "__main__":
    #defining the vowels
    i = Vowel("i", high = Feature.Plus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    y = Vowel("y", high = Feature.Plus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Plus, ATR = Feature.Plus)
    e = Vowel("e", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    slash_o = Vowel('slsh-o', high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Plus, ATR = Feature.Plus)
    epsilon = Vowel("epsil", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Minus)
    oe = Vowel("oe", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Plus, ATR = Feature.Minus)
    ash = Vowel("ae", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Plus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    u = Vowel("u", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Plus)
    unround_u = Vowel("unrd-u", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Plus)
    o = Vowel("o", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Plus)
    baby_gamma = Vowel("b-gam", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Plus)
    open_o = Vowel('opn-o', high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Minus)
    wedge = Vowel("wedge", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Minus)
    schwa = Vowel("schwa", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    lax_i = Vowel("lax-i", high = Feature.Plus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Minus)
    lax_u = Vowel("lax-u", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Minus)
    a = Vowel("a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Minus)
    type_a = Vowel("typ-a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Minus)
    round_a = Vowel("rnd-a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Minus)
    atr_a = Vowel("atr-a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Plus)
    barred_i = Vowel("bar-i", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)

    #define the vowel set here:
    vowels = [i, u, open_o, a]

    #this distinguishes the vowels
    distinctivefeats = distinguish([vowels])

    #pretty print the features
    print "       ",
    for f in distinctivefeats:
        print f, "\t",
    print ""
    for vowel in vowels:
        print "%(sym)6s " % {'sym': vowel.symbol.encode('utf-8')},
        for f in distinctivefeats:
            if vowel[f] == Feature.Plus:
                print "+",
            elif vowel[f] == Feature.Minus:
                print "-",
            print "\t",
        print ""

    #print the contrastive pairs for each feature
    print "\ncontrasts:"
    for feat in distinctivefeats:
        print feat, ":",
        pairs = FindContrasts(deepcopy(vowels), feat, distinctivefeats)
        print [(x.symbol, y.symbol) for (x,y) in pairs]
#The Best Binary Split Algorithm
# Version 3: Reranking using Natural Class patterning
# MA Thesis, Computational Linguistics, Brandeis University    Kobey Shwayder

from copy import deepcopy

class Feature:
    '''This is the struct to define what + and - mean for a feature'''
    Plus = 1
    Minus = 2
    NotSpecified = 5

def FeatureOpposite(featval):
    '''This method returns the opposite value of a feature'''
    if featval == 1:
        return 2
    elif featval == 2:
        return 1
    else:
        return featval

GlobalFeats = ['high', 'low', 'front', 'back', 'round', 'ATR']
globaldefaultranks = {'low' : 0, 'high' : 1, 'back' : 2, 'front' : 3,
                      'round' : 4, 'ATR' : 5}

class Vowel:
    '''the vowel class'''
    def __init__(self, char, high = Feature.NotSpecified,
                 low = Feature.NotSpecified, front = Feature.NotSpecified,
                 back = Feature.NotSpecified, round = Feature.NotSpecified,
                 ATR = Feature.NotSpecified, long = Feature.Minus):
        self._features = {}
        self._features['high'] = high
        self._features['front'] = front
        self._features['back'] = back
        self._features['low'] = low
        self._features['round'] = round
        self._features['ATR'] = ATR
        self._features['long'] = long
        self.symbol = char

    def __repr__(self):
        '''the string representation of the Vowel'''
        toreturn = '<Vowel-\'%s\': ' % self.symbol.encode('utf-8')
        keys = self._features.keys()
        keys.sort()
        for f in keys:
            if(self._features[f] == Feature.Plus):
                toreturn += "+%s, " % f
            elif(self._features[f] == Feature.Minus):
                toreturn += "-%s, " % f
            else:   #Feature.NotSpecified
                pass
        return toreturn[:-2] + ">"   #remove last comma and space
    def __getitem__(self, key):
        '''Vowel[key]'''
        return self._features[key]

    def __setitem__(self, key, value):
        '''Vowel[key] = value'''
        self._features[key] = value

    def __eq__(self, other):
        '''Vowel == other'''
        if not isinstance(other, self.__class__):
            return False
        for feat in self._features.keys():
            if self._features[feat] != other._features[feat]:
                return False
        return True

    def __ne__(self, other):
        '''Vowel != other'''
        return not self.__eq__(other)

def rankfeatures(tiedfeatures, ranks = globaldefaultranks):
    '''Given the tied features and a ranking, split the tie'''
    ranked = [(ranks[f], f) for f in tiedfeatures]
    ranked.sort()
    return [p[1] for p in ranked]

def FindBestSingle(matrix, ranking, relevantfeats):
    '''Finds the best feature to split by, breaking ties with rankfeatures'''
    splits = {}
    for f in matrix:
        if not splits.has_key(matrix[f]):
            splits[matrix[f]] = []
        splits[matrix[f]].append(f)
    for k in splits:
        if len(splits[k]) > 1:
            splits[k] = rankfeatures(splits[k], ranks = ranking)
    splits = [(k, splits[k]) for k in splits]
    splits.sort()
    return splits[0][1][0]
def findbestsplit(sets, ranking, features, relevantfeats):
    '''Finds the best split by scoring each feature and passing
    the scores to FindBestSingle'''
    if len(sets) == 1:
        return FindBestSingle(sets[0], ranking, relevantfeats)
    else:
        scores = {}
        for feature in features:
            scorelist = []
            for set in sets:
                if set.has_key(feature):
                    scorelist.append(set[feature])
                else:
                    scorelist = []
                    break
            if scorelist != []:
                scores[feature] = scorelist
        for key in scores:
            scores[key] = sum(scores[key])/len(scores[key])
        return FindBestSingle(scores, ranking, relevantfeats)

def splitset(vowelsets, feature):
    '''splits a vowelset on the feature by dividing each subset into two
    subsets: the [+feature] and the [-feature] subset'''
    newvowelset = []
    for vowels in vowelsets:
        if isinstance(vowels, list):
            plus = []
            minus = []
            for vowel in vowels:
                if vowel[feature] == Feature.Plus:
                    plus.append(vowel)
                if vowel[feature] == Feature.Minus:
                    minus.append(vowel)
            for sign in [plus, minus]:
                if len(sign) > 1:
                    newvowelset.append(sign)
    return newvowelset
def distinguish(vowelsets, ranks = globaldefaultranks,
                features = GlobalFeats, relevantfeats = []):
    '''the main method for distinguishing a vowel set'''
    results = []
    for vowels in vowelsets:
        if isinstance(vowels, list):
            print [v.symbol for v in vowels]
            matrix = {}
            for f in features:
                matrix[f] = [v[f] for v in vowels]
            for f in matrix:
                matrix[f] = \
                    max(float(matrix[f].count(Feature.Plus)) /
                        len(matrix[f]),
                        float(matrix[f].count(Feature.Minus)) /
                        len(matrix[f]))
            results.append((vowels, matrix))
    if results == []:
        return relevantfeats
    bestsplitfeat = findbestsplit([pairs[1] for pairs in results],
                                  ranks, features, relevantfeats)
    features.remove(bestsplitfeat)
    relevantfeats.append(bestsplitfeat)
    print "best split:", bestsplitfeat, "\n###################\n"
    return distinguish(splitset(vowelsets, bestsplitfeat), ranks = ranks,
                       features = features, relevantfeats = relevantfeats)

def FindContrasts(vowels, feat, featset, trace = 0):
    '''Given the vowels, a featureset, and a feature, finds all the
    pairs of vowels that are contrastive for that feature'''
    pairs = []
    visited = []
    for vowel in [vow for vow in vowels if vow not in visited]:
        partner = deepcopy(vowel)
        partner[feat] = FeatureOpposite(vowel[feat])
        for v in [vow for vow in vowels if vow not in visited]:
            match = reduce(lambda x,y: x and y,
                           map(lambda feat: v[feat] == partner[feat], featset))
            if trace > 0:
                print vowel.symbol, "?=", v.symbol, match
            if match:
                pairs.append((vowel, v))
                visited.append(vowel)
                visited.append(v)
    return pairs
def FindShared(Class):
    '''Finds the common features that divide the class'''
    allshared = []
    for group in Class:
        shared = {}
        for feat in GlobalFeats:
            if reduce(lambda x,y: x and y,
                      map(lambda p: p[feat] == group[0][feat], group)):
                shared[feat] = group[0][feat]
        allshared.append(shared)
    assert len(allshared) == 2
    feats = []
    for feat in allshared[0]:
        if allshared[1].has_key(feat) and \
           allshared[1][feat] == FeatureOpposite(allshared[0][feat]):
            feats.append(feat)
    return feats

def findfuzzy(candidates, target, immutable, fuzziness):
    '''finds the candidate closest to target, making sure the immutable
    feature is the same, within the fuzziness limit'''
    candidates = [c for c in candidates if c[immutable] == target[immutable]]
    for candidate in candidates:
        fuzzyfactor = sum([1 for feat in target._features
                           if candidate[feat] != target[feat]])
        if fuzzyfactor <= fuzziness:
            return candidate

def FindPairs(Class, feat):
    '''finds the pairs from each group that contrast by feat,
    i.e. ATR: (u, lax_u)'''
    pairs = []
    fuzziness = 0
    while len(Class[0]) > 1 and fuzziness < 3:
        #print "fuzzy = ", fuzziness
        for phon in Class[0]:
            partner = deepcopy(phon)
            partner[feat] = FeatureOpposite(phon[feat])
            fuzzymatch = findfuzzy(Class[1], partner, feat, fuzziness)
            if fuzzymatch:
                #fix the symbol on partner (which is currently == phon):
                partner = Class[1][Class[1].index(fuzzymatch)]
                pairs.append((phon, partner))
                Class[1].remove(partner)
                Class[0].remove(phon)
            else:
                fuzziness += 1
    #print feat, pairs
    return pairs
def FindActive(Classes):
    '''Finds active classes and returns a re-ranking of that language's
    features to reflect those classes'''
    pairs = {}   # feature => list of tuples of phonemes
    ranking = globaldefaultranks
    bestrank = -1
    for Class in Classes:
        feats = FindShared(Class)
        for feat in feats:
            pairs[feat] = FindPairs(deepcopy(Class), feat)
            ranking[feat] = bestrank
            bestrank = bestrank - 1
    return (ranking, pairs)

if __name__ == "__main__":
    #defining the vowels
    i = Vowel("i", high = Feature.Plus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    y = Vowel("y", high = Feature.Plus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Plus, ATR = Feature.Plus)
    e = Vowel("e", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    slash_o = Vowel('slsh-o', high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Plus, ATR = Feature.Plus)
    epsilon = Vowel("epsil", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Minus)
    oe = Vowel("oe", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Plus, ATR = Feature.Minus)
    ash = Vowel("ae", high = Feature.Minus, front = Feature.Plus,
              low = Feature.Plus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    u = Vowel("u", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Plus)
    unround_u = Vowel("unrd-u", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Plus)
    o = Vowel("o", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Plus)
    baby_gamma = Vowel("b-gam", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Plus)
    open_o = Vowel('opn-o', high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Minus)
    wedge = Vowel("wedge", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Minus)
    schwa = Vowel("schwa", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)
    lax_i = Vowel("lax-i", high = Feature.Plus, front = Feature.Plus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Minus)
    lax_u = Vowel("lax-u", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Minus)
    a = Vowel("a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Plus,
              round = Feature.Minus, ATR = Feature.Minus)
    central_a = Vowel("ctr-a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Minus)
    round_a = Vowel("rnd-a", high = Feature.Minus, front = Feature.Minus,
              low = Feature.Plus, back = Feature.Plus,
              round = Feature.Plus, ATR = Feature.Minus)
    barred_i = Vowel("bar-i", high = Feature.Plus, front = Feature.Minus,
              low = Feature.Minus, back = Feature.Minus,
              round = Feature.Minus, ATR = Feature.Plus)

    vowels = [i, u, y, unround_u, e, o, slash_o, a]

    #Classes is a list of sets of vowels that pattern together
    # so if a~e and i~u then Classes = [[[a,u], [i,e]]]
    Classes = []

    #Find the new ranking for this language
    ranking = FindActive(Classes)  #returns (rankings, pairs)

    #pretty print the rankings
    featureranks = [(ranking[0][feat], feat) for feat in ranking[0]]
    featureranks.sort()
    toprint = "Rankings: "
    for r in featureranks:
        toprint += r[1] + ' > '
    print toprint[:-3], '\n'
    #this distinguishes the vowels
    distinctivefeats = distinguish([vowels], ranks = ranking[0])

    #pretty print the features
    print " ",
    for f in distinctivefeats:
        print f, "\t",
    print ""
    for vowel in vowels:
        print "%(sym)6s " % {'sym': vowel.symbol.encode('utf-8')},
        for f in distinctivefeats:
            if vowel[f] == Feature.Plus:
                print "+",
            elif vowel[f] == Feature.Minus:
                print "-",
            print "\t",
        print ""
    print "\n"

    #print the contrastive pairs for each feature
    print "Contrasts:"
    for feat in distinctivefeats:
        print feat, ":",
        pairs = FindContrasts(deepcopy(vowels), feat, distinctivefeats)
        print [(x.symbol, y.symbol) for (x, y) in pairs]
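The reranking step inside `FindActive` can also be illustrated on its own. The following minimal sketch (modern Python; `rerank` and the sample rank values are illustrative, not the thesis API) mirrors the `bestrank` counter above: each feature implicated in an active natural-class pattern is assigned a successively more negative rank, so that the ascending sort used in the pretty-printer lists the promoted features ahead of every default-ranked one.

```python
def rerank(default_ranks, active_feats):
    """Promote the active features above the default hierarchy by
    assigning them negative ranks, mirroring FindActive's bestrank."""
    ranking = dict(default_ranks)
    best = -1
    for feat in active_feats:
        ranking[feat] = best
        best -= 1
    return ranking

# Hypothetical default hierarchy for illustration.
defaults = {'high': 1, 'low': 2, 'back': 3, 'round': 4, 'ATR': 5}
new_ranks = rerank(defaults, ['ATR', 'round'])

# Ascending sort on rank now lists the promoted features first:
order = [f for _, f in sorted((r, f) for f, r in new_ranks.items())]
print(order)  # ['round', 'ATR', 'high', 'low', 'back']
```

Note that because later promotions receive more negative ranks, the most recently promoted feature sorts first; this matches the behavior of the `bestrank = bestrank - 1` loop in `FindActive`.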
Bibliography

Archangeli, Diana and Douglas Pulleyblank. Grounded Phonology. Cambridge, MA: MIT Press.

Archangeli, Diana. Aspects of Underspecification Theory. Phonology, Vol. 5, No. 2.

Baltaxe, Christiane. Foundations of Distinctive Feature Theory. Baltimore: University Park Press.

Brakel, Arthur. Phonological Markedness and Distinctive Features. Bloomington, IN: Indiana University Press.

Chomsky, Noam and Morris Halle. The Sound Pattern of English. New York: Harper & Row.

Clements, G. N. Feature Economy in Sound Systems. Phonology, Vol. 20, No. 3.

Crothers, John. Typology and Universals of Vowel Systems. In Joseph Greenberg, ed., Universals of Human Language, Vol. 2: Phonology. Stanford: Stanford University Press.

Dresher, B. Elan, Glyne Piggott and Keren Rice. Contrast in Phonology: Overview. In C. Dyck, ed., Toronto Working Papers in Linguistics 13, 1, iii-xvii.

Dresher, B. Elan. 2003a. Contrast and Asymmetries in Inventories. In Anna-Maria di Sciullo, ed., Asymmetry in Grammar, Vol. 2: Morphology, Phonology, Acquisition. Amsterdam: John Benjamins.

. 2003b. Determining Contrastiveness: A Missing Chapter in the History of Phonology. In Sophie Burelle and Stanca Somesfalean, eds., Proceedings of the CLA 2002.

. 2003c. The Contrastive Hierarchy in Phonology. In Daniel Currie Hall, ed., Toronto Working Papers in Linguistics (Special Issue on Contrast in Phonology), 20.

. On the Acquisition of Phonological Contrasts. In Jacqueline van Kampen and Sergio Baauw, eds., Proceedings of GALA 2003, Volume 1 (LOT Occasional Series 3). Utrecht: LOT.
. In Press. The Contrastive Hierarchy in Phonology. Cambridge: Cambridge University Press.

Hayes, Bruce. Introductory Phonology. Oxford: Wiley-Blackwell.

Jakobson, Roman and Morris Halle. Fundamentals of Language. The Hague: Mouton & Co.

Kenstowicz, Michael. Phonology in Generative Grammar. Oxford: Blackwell.

Liljencrants, Johan and Björn Lindblom. Numerical Simulation of Vowel Quality Systems: The Role of Perceptual Contrast. Language, Vol. 48, No. 4 (Dec. 1972).

Mielke, Jeff. The Emergence of Distinctive Features. New York: Oxford University Press.

Nevins, Andrew. In Press. Locality in Vowel Harmony. Linguistic Inquiry Monographs #55. Cambridge, MA: MIT Press.

Parker, Steve. Central vs. Back Vowels. Working Papers of the Summer Institute of Linguistics, University of North Dakota Session, Vol. 44.

Singh, Sadanand. Distinctive Features: Theory and Validation. Baltimore: University Park Press.

Trubetzkoy, N. S. Principles of Phonology. Christiane Baltaxe, trans. Berkeley: University of California Press.