Easy-First Chinese POS Tagging and Dependency Parsing¹
ABSTRACT

Ji Ma, Tong Xiao, Jingbo Zhu, Feiliang Ren
Natural Language Processing Laboratory, Northeastern University, Shenyang, China
[email protected], [email protected], [email protected], [email protected]

The easy-first non-directional dependency parser has demonstrated its advantage over transition-based dependency parsers, which parse a sentence from left to right. This work investigates the easy-first method for Chinese POS tagging, dependency parsing, and joint tagging and dependency parsing. In particular, we generalize the easy-first dependency parsing algorithm to a general framework and apply this framework to Chinese POS tagging and dependency parsing. We then propose the first joint tagging and dependency parsing algorithm under the easy-first framework. We train the joint model with both a supervised objective and an additional loss which relates to only one of the individual tasks (either tagging or parsing). In this way, we can bias the joint model towards the preferred task. Experimental results show that both the tagger and the parser achieve state-of-the-art accuracy and run fast, and our joint model achieves a tagging accuracy which is the best result reported so far.

KEYWORDS: dependency parsing, POS tagging, perceptron, easy-first

¹ This work was supported in part by the National Science Foundation of China, the Specialized Research Fund for the Doctoral Program of Higher Education, and the Fundamental Research Funds for the Central Universities.
1 Introduction

For sequential labelling problems such as POS tagging or incremental parsing, traditional approaches 1) decompose the input sequence into individual items, each of which corresponds to a token of the input sequence; and 2) predict these items in a fixed left-to-right (or right-to-left) order (Collins, 2002; Ratnaparkhi, 1996). The drawback of such a fixed-order approach is that when predicting one item, only the labels on the left side can be used, while the labels on the right side are still unavailable. Goldberg and Elhadad (2010) proposed the easy-first dependency parsing algorithm to relax the fixed left-to-right order and to incorporate more structural features from both sides of the attachment point. Compared with a deterministic transition-based parser which parses the sentence left to right, their deterministic easy-first parser achieves significantly better accuracy. The key idea behind their parsing algorithm is to always process the easy attachments in the early stages. The hard ones are delayed to later stages, until more structural features on both sides of the attachment point become accessible, so that more informed decisions can be made.

In this work, we further generalize the algorithm of Goldberg and Elhadad (2010) to a general sequential labelling framework. We apply this framework to Chinese POS tagging and dependency parsing, which have never been studied under the easy-first framework before. One characteristic of Chinese dependency parsing is that parsing performance can be dramatically affected by the quality of the POS tags of the input sentence (Li et al., 2010). Recent work (Li et al., 2010; Hatori et al., 2011; Bohnet and Nivre, 2012) empirically verified that solving POS tagging and dependency parsing jointly can boost the performance of both tasks. To further improve tagging and parsing accuracy, we also solve the two tasks jointly.
While previous joint methods are either graph-based or transition-based, in this work we propose the first joint tagging and dependency parsing algorithm under the easy-first framework. In addition, we adopt a different training strategy to learn the model parameters. Previous approaches all train their joint model with the objective of minimizing the loss between the reference and the predicted output. Those methods make no distinction between the loss caused by POS tagging and the loss caused by dependency parsing. Though such an objective is proper for minimizing the total loss, it may not be optimal in terms of pursuing the best result on a certain individual task (either tagging or dependency parsing). To this end, our training method also incorporates an additional loss which relates to only one of the individual tasks, and the losses are iteratively optimized on the training set. In this way we can bias the joint model towards the preferred task. Similar techniques have been used in parser domain adaptation (Hall et al., 2011). However, to our knowledge, no one has done this in joint tagging and dependency parsing before.

Experimental results show that under the easy-first framework, even a deterministic tagger and parser achieve quite promising performance. For Chinese POS tagging, the tagger achieves an accuracy which is among the top Chinese taggers reported so far. Moreover, the tagging speed is about 2000 sentences per second, much faster than state-of-the-art taggers (Li et al., 2010; Hatori et al., 2011). For Chinese dependency parsing, when the input sentence is equipped with automatically assigned POS tags, the parser achieves a competitive unlabelled score and runs at a speed of more than 300 sentences per second.
Such accuracy is comparable to state-of-the-art transition-based dependency parsers, even though those parsers are enhanced with beam search (Section 4).² For joint tagging and dependency parsing, we achieve significant improvement on both of the two sub-tasks. In particular, we achieve a tagging accuracy which is the highest score reported on the same data set so far.

² Joint methods achieve higher tagging accuracy on the same data set than the best standalone tagger reported so far, but they are not directly comparable.

2 Easy-first POS tagging, dependency parsing and joint tagging and parsing

In this section, we first describe a generalized easy-first sequential labelling framework. Then we show how to apply this framework to the tasks of POS tagging and dependency parsing. Finally, we propose the joint POS tagging and dependency parsing algorithm under this framework.

2.1 Generalized easy-first sequential labelling algorithm

For solving sequential labelling problems, the first step is to decompose the input sequence into a list of small items t_1, t_2, ..., t_n. The items are then labelled separately. This list of items is the input of the generalized algorithm. In addition, the algorithm also requires a set of task-specific labels {l_1, l_2, ..., l_m} and a labelling function. Take POS tagging for example: t_i corresponds to the i-th word of the input sentence and the label set corresponds to the set of POS tags. The labelling function associates a word with a certain POS tag.

The pseudo-code of the algorithm is shown in Algorithm 1. Initially, all the items are marked as unprocessed (line 2). Then, the algorithm solves each unprocessed item t by labelling t with a suitable label l. After that, t is marked as processed. This process repeats until no item is left unprocessed (lines 3 to 8). One thing to note is that the items are not processed in a t_1, t_2, ..., t_n order. Instead, at each step the algorithm automatically chooses an item-label pair according to a scoring function (line 4). This function is not only responsible for selecting a correct label for a certain item but also for determining the order in which the items are processed.
Ideally, the scoring function prefers to process easy items in the early stages and delays the hard ones to later stages. When dealing with the hard ones, the label information built on both sides of the current item becomes accessible. In this way, more informed predictions can be made and the extent of error propagation can be limited (Goldberg and Elhadad, 2010). Algorithm 1 is a generalization of the easy-first parsing algorithm of Goldberg and Elhadad (2010). This generalized version can be naturally instantiated for a wide variety of applications.

2.2 Easy-first POS tagging

In this section, we show how to instantiate Algorithm 1 as a POS tagger. The problem of POS tagging is to find a sequence of POS tags for the input sentence. For this problem, t_i corresponds to the i-th word, w_i, of the input sentence. L corresponds to the POS tag set, POS. We use x to denote a certain tag in POS. The labelling function, labelling_pos(w_i, x), sets x as the POS tag of w_i.

A concrete example of tagging the sentence 中国/China 对/to 外/outside world 开放/open 稳步/steadily 前行/advance ("China's opening up steadily advances") is shown in Figure 1. The challenging part of this sentence is w_4, 开放/opening up, which can be either a VV (verb) or an NN (noun). If we process this sentence in a left-to-right order, the word 开放 would be quite difficult. In the easy-first framework, this can be easily handled with the following steps.
Algorithm 1: Generalized Easy-First Sequential Labelling
Input: T = t_1, t_2, ..., t_n: a sequence of items to be labelled
       L = {l_1, l_2, ..., l_m}: a set of task-specific labels
       labelling: task-specific labelling function
1   processed ← ∅
2   set each t_i as unprocessed
3   repeat
4       (t, l) ← argmax over unprocessed t and l ∈ L of score(t, l)
5       labelling(t, l)
6       set t as processed
7       processed ← processed ∪ {t}
8   until processed = T

Initially (step 0), all six words w_1, ..., w_6 are marked as unprocessed. At each step, the tagger enumerates all possible (w, x) pairs and chooses the most confident one according to the scoring function. Since the word 中国/China always occurs as an NR (proper noun) in the training set, at step 1 (中国, NR) is selected³ and the tagger tags 中国 with NR. After this step, 中国 is marked as processed. At step 2, for the unprocessed words, the tagger re-computes the scores of all (w, x) pairs based on the local context (surrounding words and tags within a window) and selects the one with the highest score. This time, (外, NN) is selected and the tagger tags 外 as an NN. Similarly, at step 3, the tagger assigns the tag P (preposition) to the word 对/to, and so on for the following steps. At the last step (step 6), the only unprocessed word is w_4, 开放. For Chinese, an adverb always precedes the verb which it modifies and follows the noun which is the subject. Therefore, based on the following tags o_5:AD and o_6:VV, the tagger can easily infer that the correct tag for 开放 is NN. After 开放 is tagged, all words are processed and the tagging procedure stops.

We see that Algorithm 1 can easily be instantiated for POS tagging, a relatively simple task. In the next sections, we show how to apply Algorithm 1 to more complicated tasks.

2.3 Easy-first dependency parsing

The easy-first dependency parsing algorithm was originally proposed by Goldberg and Elhadad (2010); we re-describe it here for completeness and also to illustrate how Algorithm 1 can be instantiated for dependency parsing.
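To make the loop concrete, here is a minimal Python sketch of Algorithm 1, instantiated as a toy POS tagger. The words, scorer and score values are invented for illustration; a real system would use the learned linear model described in Section 3.

```python
def easy_first_label(items, labels, score, apply_label):
    """Generalized easy-first labelling (Algorithm 1): repeatedly pick the
    highest-scoring (item, label) pair among unprocessed items."""
    output = [None] * len(items)           # labels assigned so far
    unprocessed = set(range(len(items)))
    while unprocessed:
        i, lab = max(((i, lab) for i in unprocessed for lab in labels),
                     key=lambda p: score(p[0], p[1], output))
        apply_label(output, i, lab)        # e.g. tag word i with lab
        unprocessed.discard(i)
    return output

# Toy instantiation as a tagger. Unambiguous words score high and are
# processed early; "open" (mimicking the 开放 example) is only confidently
# tagged NN once the tags on its right side are available.
WORDS = ["China", "open", "steadily", "advance"]
TAGS = ["NR", "NN", "VV", "AD"]

def toy_score(i, tag, output):
    table = {("China", "NR"): 3.0, ("steadily", "AD"): 3.0,
             ("advance", "VV"): 2.5}
    s = table.get((WORDS[i], tag), 0.0)
    # bidirectional feature: a word directly before an AD+VV context is a noun
    if WORDS[i] == "open" and tag == "NN" and output[i + 1:i + 3] == ["AD", "VV"]:
        s = 2.0
    return s

def assign(output, i, tag):
    output[i] = tag

tags = easy_first_label(WORDS, TAGS, toy_score, assign)
```

Note how the processing order is driven entirely by the scores: "open" is left for last, by which time both right-hand tags are available to the scorer.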
Given an input sentence of n words, the task of dependency parsing is: for each word, find the lexical head which it modifies. One exception is the head word of the sentence, which does not modify any other word. All the others each modify exactly one word, and no word modifies itself.

For dependency parsing: t_i corresponds to the i-th word, w_i, of the input sentence. L corresponds to an action set ACT which contains three actions {attach_left, attach_right, set_root}. Note that for dependency parsing, each label corresponds to an action which is designed to manipulate a list of partial analyses p_1, ..., p_n, called pending (Goldberg and Elhadad, 2010). p_i records the dependency tree rooted at w_i, and p_i is initialized with w_i.

³ At each step, the selected item is boldfaced. The underlined items are those marked as processed.

FIGURE 1 A trace of the easy-first tagger for the sentence 中国对外开放稳步前行 ("China's opening up steadily advances"). o_i denotes the POS tag of w_i.

The main loop of easy-first dependency parsing is shown in Figure 2. At each step, the parser computes the scores of all possible (p, a) pairs based on the local context (surrounding partial analyses within a window), and then selects the best pair to feed to the labelling function, labelling_dep, which performs the given action. In particular, labelling_dep(p_i, attach_left) sets p_i as the child of its left neighbour in the pending list, and labelling_dep(p_i, attach_right) sets p_i as the child of its right neighbour in the pending list. After that, the selected partial analysis p is marked as processed and removed from the pending list. Since at each step one partial analysis is removed from the pending list, after n−1 steps the pending list contains only one item, say p_h, which is the dependency tree of the whole sentence. In such a situation, (p_h, set_root) is enforced to be the only choice, and labelling_dep(p_h, set_root) sets the root of p_h, which is w_h, as the head of the sentence.

An example of parsing the sentence 中国对外开放稳步前行 is shown in Figure 3. At the first step, (p_3, attach_left) is selected and fed to the labelling_dep function. The labelling function sets p_3 as the child of its left neighbour p_2, as shown in Figure 3 (1). p_3 is then removed from the pending list. Note that, after p_3 is removed, p_2 and p_4 become neighbours.
The following steps are executed in a similar way, except for step 6, where only p_6 is left in the pending list. Thus, at step 6, (p_6, set_root) is the only choice. The labelling_dep function sets w_6, 前行, as the head of the sentence. As all partial analyses are marked as processed, the parsing process stops. Head-modifier relationships can be directly read off the dependency tree: a parent is the head of its children.

FIGURE 2 Main loop of the easy-first parsing algorithm:
1   pending ← p_1, ..., p_n
2   processed ← ∅
3   set each p_i as unprocessed
4   repeat
5       (p, a) ← argmax over unprocessed p and a ∈ ACT of score(p, a)
6       labelling_dep(p, a)
7       set p as processed
8       remove p from pending
9   until processed = T
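The pending-list manipulation described above can be sketched in a few lines of Python; `Partial` and `attach` are illustrative names, not the paper's implementation.

```python
class Partial:
    """A partial analysis: a dependency tree rooted at `word`."""
    def __init__(self, word):
        self.word = word
        self.children = []

def attach(pending, i, action, heads):
    """Apply attach_left/attach_right at pending position i, recording the
    head of the attached tree's root word. The attached tree leaves the
    pending list, so its former left and right neighbours become adjacent."""
    child = pending.pop(i)
    if action == "attach_left":
        parent = pending[i - 1]       # left neighbour
    elif action == "attach_right":
        parent = pending[i]           # right neighbour (shifted into slot i)
    else:
        raise ValueError(action)
    parent.children.append(child)
    heads[child.word] = parent.word

# Mirror the first attachments of a 对/外/开放 fragment.
pending = [Partial(w) for w in ["对", "外", "开放"]]
heads = {}
attach(pending, 1, "attach_left", heads)   # 外 becomes a child of 对
attach(pending, 0, "attach_right", heads)  # 对 becomes a child of 开放
```

After the two attachments, only one partial analysis remains in the pending list; set_root would then mark its root word as the head of the sentence.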
FIGURE 3 A trace of the easy-first parser for the sentence 中国对外开放稳步前行 ("China's opening up steadily advances").

2.4 The proposed easy-first joint POS tagging and dependency parsing method

In this section, we describe the joint tagging and dependency parsing method under the easy-first framework. Before we go into the details of the algorithm, one thing should be mentioned: for dependency parsing, POS tags serve as indispensable information (Li et al., 2010). On the one hand, parsing based merely on bi-lexical features without POS tags hurts performance dramatically. On the other hand, if we tag the whole sentence first and then do dependency parsing, then syntactic features, which are demonstrated to be effective for tagging disambiguation, cannot be utilized. In order to provide POS tags for syntactic disambiguation and syntactic features for tagging disambiguation, we add a simple constraint to control the inference order: dependency relationships can only be extracted between two tagged words. Under such a constraint, we can guarantee that at least the POS tags of the pair of words from which a dependency relation is to be extracted are already available. Also, syntactic features can be utilized to process the untagged words. Under such a constraint, the joint method can be constructed by a simple combination of easy-first POS tagging and easy-first dependency parsing.

Similar to the previous tasks, for the joint task t_i still corresponds to the i-th word of the input sentence, w_i. The pending list is also used to record the partial dependency tree structures. The label set of the joint task is the union of POS and ACT. The labelling function of the joint task, labelling_joint, behaves differently on the two types of labels. In particular, labelling_joint(p_i, x) calls labelling_pos(w_i, x), which associates w_i, the root of p_i, with POS tag x, while labelling_joint(p_i, a) calls labelling_dep(p_i, a), which performs action a on p_i.

FIGURE 4 Main part of the joint tagging and dependency parsing algorithm:
1   pending ← p_1, ..., p_n
2   processed ← ∅
3   set each p_i as unprocessed
4   repeat
5       initialize the valid set of (p, l) pairs
6       (p, l) ← argmax over the valid set of score(p, l)
7       labelling_joint(p, l)
8       if l ∈ ACT
9           set p as processed and remove p from pending
10  until processed = T

The main part of the joint method is shown in Figure 4. Note that, to satisfy the constraint described above, at the beginning of each step a valid set of (p, l) pairs is initialized (line 5). For a partial analysis p whose root word is not yet tagged, only pairs (p, x), where x is a POS tag, are considered valid. This enforces that a partial analysis must be tagged first. For a p which is already tagged, if its neighbour is also tagged, then attachments are allowed between the two neighbours. After initializing the valid set, the joint algorithm computes the scores of the (p, l) pairs in the valid set. Then the one with the highest score is selected and fed to labelling_joint. A partial analysis p is marked as processed only when p is attached to one of its neighbours or set as the root of the sentence (lines 8 to 9).

FIGURE 5 A trace of easy-first joint tagging and parsing for 对外开放:
(0) p_1:对, p_2:外, p_3:开放; valid set: {(p_1, P), (p_1, NN), (p_2, P), (p_2, NN), (p_3, P), (p_3, NN)}
(1) p_1:对_P, p_2:外, p_3:开放; valid set: {(p_2, P), (p_2, NN), (p_3, P), (p_3, NN)}
(2) p_1:对_P, p_2:外_NN, p_3:开放; valid set: {(p_1, attach_right), (p_2, attach_left), (p_3, P), (p_3, NN)}
(3) p_1:对_P (with child p_2:外_NN), p_3:开放
(4) ...
Figure 5 gives a toy example of the joint tagging and parsing procedure. The valid set of each step is also shown in the figure. For simplicity, we suppose that the POS tag set contains only the two tags NN (noun) and P (preposition).

In summary, the key idea behind the easy-first method is to always choose the easy items to process. The degree of easiness is determined by the scoring function. In the next section, we show how to learn the scoring function from training data.
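The valid-set constraint of the joint method can be sketched as below; the tiny two-tag setting follows the Figure 5 example, and all names are illustrative.

```python
POS_TAGS = ["P", "NN"]   # the simplified tag set of the Figure 5 example

def valid_pairs(pending, tags):
    """pending: root-word ids in pending-list order; tags: id -> POS tag for
    roots already tagged. An untagged root may only receive a POS tag;
    attachments are allowed only between two tagged neighbours."""
    valid = []
    for idx, p in enumerate(pending):
        if p not in tags:                        # must be tagged first
            valid += [(p, t) for t in POS_TAGS]
            continue
        if idx > 0 and pending[idx - 1] in tags:
            valid.append((p, "attach_left"))
        if idx + 1 < len(pending) and pending[idx + 1] in tags:
            valid.append((p, "attach_right"))
    return valid

# State (2) of Figure 5: 对 (id 1) and 外 (id 2) are tagged, 开放 (id 3) is not.
v = valid_pairs([1, 2, 3], {1: "P", 2: "NN"})
```

At the initial state (no word tagged) only the six tagging pairs are valid; at state (2) the two attachments between the tagged neighbours join the valid set, matching the trace in Figure 5.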
3 Training

The scoring functions used in this work are all based on a linear model: score(t, l) = w · φ(t, l). Here, w is the model's parameter (weight) vector and φ(t, l) is the feature vector extracted from the (t, l) pair. In this section, we first describe a general training algorithm which can be used to train models for all the tasks mentioned above, and then introduce our training method, which is specially designed for the joint model.

3.1 Easy-first training

The generalized easy-first training algorithm is shown in Algorithm 2, which is based on the structured perceptron. Starting from a zero weight vector, the training process makes several iterations through the training set. In each iteration, Algorithm 2 is used to process the training instances. A training instance consists of a sequence of items T together with its gold reference R. Here, R takes different forms in different tasks. For example, in POS tagging, R is the gold tag sequence of the input sentence; in dependency parsing, R is the gold dependency tree of the input sentence. When the chosen (t, l) pair is not compatible⁴ with R, the training algorithm updates the current parameters by punishing the features fired in the chosen (t, l) pair and rewarding the features fired in the (t, l) pair which is compatible with R. The algorithm then moves on to the next instance. Note that there might be more than one pair that is compatible with R. For example, at step (2) of Figure 1, (w_2, P), (w_3, NN), (w_4, NN), (w_5, AD) and (w_6, VV) are all compatible with the gold tag sequence.

Algorithm 2: Easy-First Training
Input: T = t_1, ..., t_n; R: gold reference of T; labelling: task-specific function;
       L = {l_1, ..., l_m}; w: feature weight vector; isallowed: task-specific function
Output: w: feature weight vector
1   set each t_i as unprocessed
2   repeat
3       compatible ← {(t, l) : t unprocessed, l ∈ L, isallowed(t, l, R)}
4       (t, l) ← argmax over unprocessed t and l ∈ L of w · φ(t, l)
5       if (t, l) ∈ compatible
6           labelling(t, l)
7           set t as processed
8       else
9           (t′, l′) ← argmax over compatible of w · φ(t′, l′)
10          w ← w − φ(t, l) + φ(t′, l′)
11          break
12  until all items in T are processed
13  return w

⁴ A (t, l) pair is compatible with R means that in R, t is labelled with l.

Thus, at the beginning of each step, a compatible set of (t, l) pairs is initialized and
the one with the highest score is chosen to boost (Goldberg and Elhadad, 2010). The function isallowed is used to check whether a candidate (t, l) pair belongs to the compatible set. Similar to the labelling function, isallowed can easily be instantiated for different tasks. For POS tagging, isallowed_pos(w_i, x, R) checks whether x is the gold tag of w_i. For dependency parsing, isallowed_dep(p_i, a, R) checks: whether all of p_i's children required by the gold dependency tree have already been attached to p_i, and whether action a attaches p_i to the correct partial analysis. For joint POS tagging and dependency parsing, isallowed_joint behaves differently according to the type of label: isallowed_joint(p_i, x, R) returns isallowed_pos(w_i, x, R), and isallowed_joint(p_i, a, R) returns isallowed_dep(p_i, a, R).

3.2 Training with additional loss

Algorithm 2 is a variant of the structured perceptron algorithm. The objective is to minimize the loss, usually the Hamming loss (Daumé III et al., 2009), between the gold reference and the predicted output. Whenever there is a positive loss, the parameters are updated. For POS tagging, Algorithm 2 minimizes loss(ŷ, y), where ŷ and y are the predicted and gold tag sequences, respectively. For dependency parsing, Algorithm 2 minimizes loss(d̂, d), where d̂ and d are the predicted and gold dependency trees, respectively. For the joint task, Algorithm 2 minimizes loss((ŷ, d̂), (y, d)), where (ŷ, d̂) denotes the pair of predicted tag sequence and dependency tree. Such a loss makes no distinction between tagging and parsing: either a tagging error or a parsing error will lead to a non-zero loss and a parameter update. This loss may not be optimal in terms of achieving the best result on the individual tasks. Inspired by Hall et al. (2011), we slightly modify Algorithm 2 to incorporate an additional loss which relates to only one of the individual tasks, so as to train the joint model towards that task. Algorithm 3 shows the pseudo-code.
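The update at the heart of Algorithm 2 can be sketched as a single perceptron step; the sparse-dict feature vectors and the toy feature function below are illustrative assumptions, not the paper's implementation.

```python
def perceptron_step(w, candidates, phi, is_allowed):
    """One easy-first perceptron step: if the model's top (item, label) pair
    is compatible with the gold reference, no update is made; otherwise the
    chosen pair's features are punished and those of the highest-scoring
    compatible pair are rewarded."""
    def score(c):
        return sum(w.get(f, 0.0) * v for f, v in phi(c).items())
    best = max(candidates, key=score)
    if is_allowed(best):
        return True                        # correct choice: no update
    gold = max((c for c in candidates if is_allowed(c)), key=score)
    for f, v in phi(best).items():         # punish the wrongly chosen pair
        w[f] = w.get(f, 0.0) - v
    for f, v in phi(gold).items():         # reward the best compatible pair
        w[f] = w.get(f, 0.0) + v
    return False                           # mistake: move to the next instance

# Toy run: the model prefers label "a", but only "b" is compatible with gold.
w = {"bias:a": 1.0}
phi = lambda c: {"bias:" + c[1]: 1.0}
ok = perceptron_step(w, [(0, "a"), (0, "b")], phi, lambda c: c[1] == "b")
```

After the step, the weight that fired on the wrong pair is decreased and the weight for the compatible pair is increased, so the same mistake scores lower on the next pass.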
The basic idea is that at each iteration of the training process, one of the three losses {loss_joint, loss_pos, loss_dep} is selected and the parameters are updated accordingly: if loss_joint is selected, then both tagging and parsing errors cause parameter updates; if loss_pos is selected, then only tagging errors cause parameter updates. This can be done by adding all valid attach actions to the compatible set, regardless of whether those actions are indeed compatible with the gold reference (lines 5 to 6 of Algorithm 3). For loss_dep, only parsing errors cause parameter updates, which can be achieved similarly to the case of loss_pos.

Algorithm 3: Training with additional loss for joint POS tagging and parsing
Input: T = t_1, ..., t_n; R: gold reference of T; w; labelling_joint; isallowed_joint;
       loss: which can be loss_joint, loss_pos or loss_dep
Output: w: feature weight vector
1   set each p_i as unprocessed
2   repeat
3       initialize the valid set of (p, l) pairs
4       compatible ← {(p, l) in the valid set : isallowed_joint(p, l, R)}
5       if loss = loss_pos
6           add all valid attach actions to compatible
7       if loss = loss_dep
8           add all valid tagging pairs to compatible
9       (the following steps are exactly the same as in Algorithm 2)

4 Experiments

To compare with previous work, we use Penn Chinese Treebank 5.1 (CTB5) (Xue et al., 2005) to evaluate our method. The training, development and test sets follow the standard split of CTB5 described in (Duan et al., 2007). The head-finding rules of (Zhang and Clark, 2008b) are used to convert the constituent trees into dependency trees. An Intel Core machine is used for evaluation. For POS tagging and dependency parsing, the number of training iterations is selected according to the model's performance on the development set. The model which achieves the highest score on the development set is selected to run on the test set.

4.1 POS tagging

Following Zhang and Clark (2008a), in the experiments we also use a dictionary to limit the number of candidate tags for each word. That is, for a word which occurs more than M times in the training data, the tagger only considers the POS tags that co-occurred with that word in the training data. We found M = 6 to be the optimal value on the development set.

The feature templates we used in the experiments are shown in Table 1. Z&C08 denotes the feature templates of (Zhang and Clark, 2008a), which include uni-gram and bi-gram as well as some character-based features. In addition to Z&C08, we also incorporate bi-directional POS tag features; the resulting feature set is denoted BD.
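The dictionary-based tag pruning described above can be sketched as follows; the counts and tag sets are invented for illustration.

```python
M = 6  # frequency threshold found optimal on the development set

def candidate_tags(word, counts, word_tags, full_tagset):
    """For a word seen more than M times in training, consider only the tags
    observed with that word; otherwise fall back to the full tag set."""
    if counts.get(word, 0) > M:
        return word_tags[word]
    return full_tagset

counts = {"开放": 7, "罕见": 2}        # toy training-set frequencies
word_tags = {"开放": {"NN", "VV"}}     # tags co-occurring with 开放 in training
full = {"NN", "VV", "NR", "AD", "P"}
pruned = candidate_tags("开放", counts, word_tags, full)
```

Pruning shrinks the (w, x) search space at every step of the easy-first loop, which is one reason the deterministic tagger runs so fast.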
TABLE 1 Feature templates for easy-first POS tagging.
Z&C08:  w_i, w_i+1, w_i w_i+1, w_i w_i-1, FC(w_i), LC(w_i), TS(FC(w_i)), TS(LC(w_i)),
        C_n(w_i) (n = 2 ... len(w_i)−1), LC(w_i-1) w_i FC(w_i+1),
        FC(w_i) C_n(w_i) (n = 2 ... len(w_i)), LC(w_i) C_n(w_i) (n = 1 ... len(w_i)−1),
        C_n(w_i) (if C_n(w_i) = C_n+1(w_i)), tag_i-1, tag_i-1 tag_i-2
BD:     Z&C08 + tag_i+1, tag_i+1 tag_i+2, tag_i-1 tag_i+1
OP_POS: BD + w_i tagged_i+1, w_i tagged_i-1, w_i tagged_i-2, w_i tagged_i+2,
        w_i tagged_i-2 tagged_i-1, w_i tagged_i-1 tagged_i+1, w_i tagged_i+1 tagged_i+2
Here w_i denotes the i-th word of the input sentence, tag_i denotes the POS tag of w_i, FC(w_i)/LC(w_i) denote the first/last character of w_i, C_n(w_i) is the n-th character of w_i, TS(c) is the set of POS tags that co-occurred with character c in the dictionary, and tagged_k denotes whether w_k has already been tagged.

For feature set OP, we include some order-preference features, with the goal of signalling to the tagger that some words should be delayed to later stages, until more surrounding tags are available and a more informed prediction can be made. Table 2 shows the performance of the different feature sets and also gives state-of-the-art results on the same data set. Li-10 denotes the performance of the tagger of (Li et al., 2010) and Hatori-11 denotes the performance of the tagger of (Hatori et al., 2011). From Table 2, we can see
that bi-directional features are effective for improving tagging performance. With bi-directional features, the tagger achieves an accuracy higher than Li-10 and slightly lower than Hatori-11. By incorporating order-preference features, tagging accuracy increases further, better than both Li-10 and Hatori-11. Moreover, rather than using Viterbi or beam search, our easy-first tagger is deterministic, which is easy to implement and runs at high speed, more than 2000 sentences per second.

TABLE 2 POS tagging performance (total/seen/unseen accuracy on the development and test sets, and tagging speed⁵) for Z&C08, BD, OP, Li-10 and Hatori-11.

4.2 Dependency parsing

We use root accuracy, complete match rate and word accuracy (dependency accuracy) to evaluate the parser's performance. Feature templates for easy-first dependency parsing are shown in Table 3. G&E10 denotes the feature templates used in (Goldberg and Elhadad, 2010), with a slight modification: the feature templates in the last row of G&E10 were originally designed to deal with English PP attachment ambiguity.
TABLE 3 Feature templates used for dependency parsing.
G&E10:
  for p in p_i-2, p_i-1, p_i, p_i+1, p_i+2, p_i+3: len(p), nc(p)
  for p, q in (p_i-2, p_i-1), (p_i-1, p_i), (p_i, p_i+1), (p_i+1, p_i+2), (p_i+2, p_i+3): dis(p, q), dis(p, q) tag_p tag_q
  for p in p_i-2, p_i-1, p_i, p_i+1, p_i+2, p_i+3: tag_p, w_p, tag_p lc_p, tag_p rc_p, tag_p lc_p rc_p
  for p, q in (p_i, p_i+2), (p_i-1, p_i), (p_i, p_i+1), (p_i-1, p_i+2), (p_i+1, p_i+2): tag_p tag_q, tag_p tag_q lc_p lc_q, w_p w_q, tag_p w_q, w_p tag_q, tag_p tag_q lc_p rc_q, tag_p tag_q rc_p lc_q, tag_p tag_q rc_p rc_q
  w_pi-1 w_pi rc_pi, tag_pi-1 w_pi rcw_pi, w_pi-1 w_pi+1 rc_pi+1, tag_pi-1 w_pi+1 rcw_pi+1, w_pi w_pi+1 rc_pi+1, tag_pi w_pi+1 rcw_pi+1, w_pi+1 w_pi+2 rc_pi+2, tag_pi+1 w_pi+2 rcw_pi+2, w_pi w_pi+2 rc_pi+2
VTT_DEP:
  for p in p_i-2, p_i-1, p_i, p_i+1, p_i+2: w_p vl(p), w_p vr(p), tag_p vl(p), tag_p vr(p), r2cw_p, r2c_p, l2cw_p, l2c_p, tag_p lc_p rc_p
  tag_pi-2 tag_pi-1 tag_pi, tag_pi-1 tag_pi tag_pi+1, tag_pi tag_pi+1 tag_pi+2
OP_DEP:
  for p, q in (p_i, p_i-1), (p_i, p_i-2), (p_i, p_i+1), (p_i, p_i+2), (p_i, p_i+3): w_p nc(q), tag_p nc(q), tag_p tag_q nc(q)
  for p, q, r in (p_i, p_i-1, p_i-2), (p_i, p_i-1, p_i+1), (p_i, p_i+1, p_i+2): w_p nc(q) nc(r), tag_p nc(q) nc(r)
For a partial dependency tree p, len(p) is the number of words in p; nc(p) denotes whether p is a leaf node; w_p and tag_p denote p's root word and its POS tag; lc_p/rc_p denote the POS tag of p's left-/right-most child; lcw_p/rcw_p denote the word form of p's left-/right-most child; l2c_p/r2c_p denote the POS tag of p's second left-/right-most child; l2cw_p/r2cw_p denote the word form of p's second left-/right-most child.

⁵ Since the experiments in this work and those in other work are carried out on different machines, speed is for reference only.

Those templates are limited to be used only when
either of the two words involved is a preposition. For Chinese, PP attachment ambiguity is not as prevalent as in English (Huang et al., 2009), and we found that using these features without any limitation yields better results. VTT includes valence features, tri-gram features and third-order features, which were proved useful for transition-based parsers (Zhang and Nivre, 2011). For OP, some additional order-preference feature templates are added.

Parsing results are shown in Table 4. GoldPOS denotes input with gold-standard POS tags. AutoPOS denotes that the training set is assigned gold-standard POS tags while the test set is tagged by our easy-first tagger. J-N denotes that we use 10-fold jackknifing to train the model. Tag Acc denotes the test set tagging accuracy.

TABLE 4 Parsing performance (word, root and complete match accuracy under GoldPOS, AutoPOS and J-N, plus tagging accuracy and parsing speed). H&S-10 and Z&N-11 denote the parsers of Huang and Sagae (2010) and Zhang and Nivre (2011), respectively. H&S-H and Z&N-H denote Hatori et al. (2011)'s re-implementations of H&S-10 and Z&N-11, respectively. Li-10-O2/O3 denote the 2nd-/3rd-order graph-based models of Li et al. (2010).

From Table 4, we can see that valence and tri-gram features are also effective for the easy-first parser. For GoldPOS, word accuracy is boosted; for AutoPOS and J-N, word accuracy also increases by about 0.2 and 0.3 points, respectively. Order-preference features are not as effective as they are for tagging: after adding these features, parsing performance barely changed. One reason might be that some of the features of G&E10 already capture order information (Goldberg and Elhadad, 2010). Compared with other state-of-the-art parsers on this data set, the performance of the easy-first parser is still lower. This may be due to the fact that our parser is greedy, and thus more vulnerable to error propagation. Interestingly, for AutoPOS, the easy-first parser achieves the highest complete match rate.
This is consistent with (Goldberg and Elhadad, 2010). One thing that should be mentioned is that the deterministic easy-first parser is both easy to implement and very fast, parsing more than 350 sentences per second.

4.3 Joint tagging and parsing

Similar to (Hatori et al., 2011), for the joint task we choose the model which performs best on the development set in terms of word accuracy to run on the test set. Feature templates for joint POS tagging and dependency parsing are shown in Table 5. The feature set is the union of the feature set used for POS tagging and the feature set used for dependency parsing. Besides, inspired by (Hatori et al., 2011), we also incorporate some syntactic features which aim to improve tagging accuracy; these features are denoted by Syn.

TABLE 5 Feature templates for joint POS tagging and dependency parsing.
Syn:       nc(p_i-1), nc(p_i-1) nc(p_i+1), nc(p_i+1), nc(p_i-1) w_i, nc(p_i-1) nc(p_i+1) w_i, nc(p_i+1) w_i
VTT_Joint: Syn + VTT_DEP + OP_POS

Table 6 shows the results for the joint model with different losses. Parsing performance, especially word-level accuracy, is largely affected by the different loss settings. When training the joint model with a single loss, word-level accuracy is lower; when training with the additional parsing-related loss, word-level accuracy increases. These results demonstrate that our training method can bias the joint model towards the desired task. However, as we try different losses, tagging accuracy rarely changes. This may be because the tagging accuracy is already very high, and it is quite difficult to achieve further improvement.

Compared with previous results on joint POS tagging and dependency parsing, our method achieves the best result in terms of complete match rate and POS tagging accuracy.⁶ The word accuracy is still below the best result. This may be due to the fact that our joint decoder is deterministic and thus suffers more from error propagation than beam-search-based or dynamic-programming-based decoders. However, our joint method can also be enhanced with beam search, and we leave this to future work.

TABLE 6 Joint tagging and parsing results (POS, word, root and complete match accuracy, and speed) for the joint model under different losses, the pipeline baseline, and other methods. B&N-12 denotes the result of (Bohnet and Nivre, 2012). Li-10-V1-O2 and Li-10-V2-O3 denote results from (Li et al., 2010). Z&N-Hatori and H&S-Hatori denote results from (Hatori et al., 2011).
* denotes a statistically significant difference (p < 0.05 by McNemar's test) compared with the pipeline method.
6 Compared with the easy-first tagger, the joint model does better at disambiguating NN-VV and DEC-DEG, while doing worse at JJ-NN. This result is similar to that of (Hatori et al., 2011). Due to limited space, we omit the confusion matrix.
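The deterministic easy-first inference used throughout this paper can be sketched with a small toy example. The sketch below is a minimal illustration only: the scorer, the tag set and the weight table are invented for the example and do not correspond to our actual features or trained model.

```python
def easy_first_tag(words, tags, score):
    """Greedy easy-first tagging: repeatedly commit the single most
    confident (position, tag) decision among all pending positions,
    so later (harder) decisions can condition on tags on BOTH sides."""
    assigned = [None] * len(words)
    pending = set(range(len(words)))
    while pending:
        # Score every candidate tag at every unresolved position and
        # take the globally easiest (highest-scoring) decision first.
        best = max((score(words, assigned, i, t), i, t)
                   for i in pending for t in tags)
        _, i, t = best
        assigned[i] = t          # commit; neighbours now see this tag
        pending.discard(i)
    return assigned

# Toy linear scorer: a word-identity weight table, plus a small bonus
# when an already-committed neighbouring tag is available as context.
WEIGHTS = {("the", "DT"): 3.0, ("dog", "NN"): 2.0, ("dog", "VV"): 1.0,
           ("barks", "VV"): 1.5, ("barks", "NN"): 0.5}

def toy_score(words, assigned, i, tag):
    s = WEIGHTS.get((words[i], tag), 0.0)
    if i > 0 and assigned[i - 1] == "DT" and tag == "NN":
        s += 1.0                 # context feature from the left neighbour
    return s

print(easy_first_tag(["the", "dog", "barks"], ["DT", "NN", "VV"], toy_score))
```

Because the most confident decisions are committed first, the ambiguous positions are decided later, when committed tags on both sides are already available as features; this is the property that lets the easy-first framework outperform fixed left-to-right decoding.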
4 Related Work

4.1 Easy-first and bi-directional sequential labelling

The easy-first dependency parsing algorithm was first proposed by (Goldberg and Elhadad, 2010). They applied the algorithm to English dependency parsing and, by combining its results with those of a transition based parser and a graph based parser, achieved state-of-the-art performance. This work generalizes their algorithm; we apply the generalized algorithm to Chinese POS tagging and dependency parsing and also achieve good results in terms of both speed and accuracy. We also propose the first easy-first joint tagging and parsing algorithm. By incorporating additional losses during training, our method achieves the best tagging accuracy reported so far.

Shen et al. (2007) proposed a bi-directional POS tagging algorithm which achieves state-of-the-art accuracy on English POS tagging. Compared with their method, the tagging algorithm in this paper is much simpler, and we are the first to use order preference features in POS tagging. This is also the first work that applies easy-first tagging to Chinese.

4.2 Joint POS tagging and parsing

For joint POS tagging and (unlabelled, projective) dependency parsing, Li et al. (2010) proposed the first graph based algorithm. Lee et al. (2011) proposed a graphical model to solve the joint problem. Hatori et al. (2011) proposed the first transition based algorithm. Bohnet and Nivre (2012) extended Hatori et al. (2011) to labelled non-projective dependency parsing. Different from the works discussed above, our method is based on the easy-first framework. In addition, all previous joint methods optimize a single loss in the training phase, while we are the first to train the joint model with additional losses.

4.3 Multiple loss training

Hall et al. (2011) proposed augmented-loss training for dependency parsers, which aims at adapting a parser to other domains or to downstream tasks such as MT reordering.
They extended the structured perceptron with multiple losses, each of which is associated with an external training set. Our method is directly inspired by Hall et al. (2011). However, rather than domain adaptation, our method aims at training the joint model to pursue the best result on one of the individual tasks. Moreover, our method optimizes all losses on a single training set.

5 Conclusion

In this paper, we generalize the method of (Goldberg and Elhadad, 2010) to a general framework and apply the framework to Chinese POS tagging and dependency parsing. We also propose the first joint tagging and dependency parsing algorithm under the easy-first framework. We show that by using order preference features, an easy-first POS tagger can achieve state-of-the-art accuracy. We show that a deterministic easy-first parser can surpass a transition-based parser when the input is associated with automatically generated tags. We also illustrate that by incorporating additional losses in the training process, we can bias the joint model towards the desired task.

Acknowledgements: We would like to thank Professor ChangNing Huang, Mu Li, Nan Yang, Zhanghua Li, Yoav Goldberg and Jun Hatori for frequent discussions. We also thank Muhua Zhu, Wenliang Chen, Nan Yang and three anonymous reviewers for their insightful suggestions on earlier drafts of this paper.
References

Bohnet, B. and Nivre, J. (2012) A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing. (EMNLP 2012).

Collins, M. (2002) Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. (EMNLP 2002), pages 1-8.

Daumé III, H., Langford, J. and Marcu, D. (2009) Search-based structured prediction. Machine Learning, 75(3).

Duan, X., Zhao, J. and Xu, B. (2007) Probabilistic parsing action models for multilingual dependency parsing. (EMNLP-CoNLL 2007).

Goldberg, Y. and Elhadad, M. (2010) An Efficient Algorithm for Easy-First Non-Directional Dependency Parsing. (NAACL 2010).

Hall, K., McDonald, R., Katz-Brown, J. and Ringgaard, M. (2011) Training dependency parsers by jointly optimizing multiple objectives. (EMNLP 2011).

Hatori, J., Matsuzaki, T., Miyao, Y. and Tsujii, J. (2011) Incremental Joint POS Tagging and Dependency Parsing in Chinese. (IJCNLP 2011), Chiang Mai, Thailand.

Huang, L., Jiang, W. and Liu, Q. (2009) Bilingually-Constrained (Monolingual) Shift-Reduce Parsing. (EMNLP 2009), Singapore.

Huang, L. and Sagae, K. (2010) Dynamic programming for linear-time incremental parsing. (ACL 2010).

Lee, J., Naradowsky, J. and Smith, D. A. (2011) A discriminative model for joint morphological disambiguation and dependency parsing. (ACL 2011).

Li, Z., Zhang, M., Che, W., Liu, T., Chen, W. and Li, H. (2010) Joint Models for Chinese POS Tagging and Dependency Parsing. (EMNLP 2010), Edinburgh.

Ratnaparkhi, A. (1996) A Maximum Entropy Part-Of-Speech Tagger. (EMNLP 1996).

Shen, L., Satta, G. and Joshi, A. K. (2007) Guided Learning for Bidirectional Sequence Classification. (ACL 2007), Prague.

Tsuruoka, Y. and Tsujii, J. (2005) Bidirectional inference with the easiest-first strategy for tagging sequence data. (EMNLP 2005).

Xue, N., Xia, F., Chiou, F. and Palmer, M. (2005) The Penn Chinese Treebank: Phrase structure annotation of a large corpus. Natural Language Engineering, volume 11.

Zhang, Y. and Clark, S. (2008a) Joint Word Segmentation and POS Tagging Using a Single Perceptron. (ACL 2008), Ohio.

Zhang, Y. and Clark, S. (2008b) A tale of two parsers: investigating and combining graph-based and transition-based dependency parsing using beam-search. (EMNLP 2008), Hawaii.

Zhang, Y. and Nivre, J. (2011) Transition-based Dependency Parsing with Rich Non-local Features. (ACL 2011), Portland.