Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

Li Dong*, Furu Wei, Ming Zhou, Ke Xu
State Key Lab of Software Development Environment, Beihang University, Beijing, China
Microsoft Research, Beijing, China
donglixp@gmail.com  {fuwei,mingzhou}@microsoft.com  kexu@nlsde.buaa.edu.cn

Abstract

Recursive neural models have achieved promising results in many natural language processing tasks. The main difference among these models lies in the composition function, i.e., how to obtain the vector representation for a phrase or sentence from the representations of the words it contains. This paper introduces a novel Adaptive Multi-Compositionality (AdaMC) layer for recursive neural models. The basic idea is to use more than one composition function and to select among them adaptively depending on the input vectors. We present a general framework that models each semantic composition as a distribution over these composition functions. The composition functions and the parameters used for adaptive selection are learned jointly from data. We integrate AdaMC into existing recursive neural models and conduct extensive experiments on the Stanford Sentiment Treebank. The results illustrate that AdaMC significantly outperforms state-of-the-art sentiment classification methods. It helps push the best accuracy of sentence-level negative/positive classification from 85.4% up to 88.5%.

Introduction

Recursive Neural Models (RNMs), which utilize the recursive structure of the input (e.g., a sentence), are one family of popular deep learning models. They are particularly effective for many Natural Language Processing (NLP) tasks due to the compositional nature of natural language. Recently, many promising results have been reported on semantic relationship classification (Socher et al. 2012), syntactic parsing (Socher et al. 2013a), sentiment analysis (Socher et al. 2013b), and so on. The main difference among RNMs lies in the semantic composition method, i.e., how to obtain the vector representation for a phrase or sentence from the representations of the words and phrases it contains. For instance, we can compute the vector of the phrase "not good" from the vectors of the words "not" and "good". For many tasks, we even need to obtain vector representations for whole sentences. The composition algorithm is therefore the key to making vector representations go beyond words to phrases and sentences.

*Contribution during internship at Microsoft Research.
Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

There have been several attempts in the literature to address semantic composition for RNMs. Specifically, the RNN (Socher et al. 2011) uses a global matrix to linearly combine the elements of the child vectors, while the RNTN (Socher et al. 2013b) employs a global tensor to model products of their dimensions. It is sometimes challenging to find a single powerful function that models semantic composition well. Intuitively, we can employ multiple composition functions instead of only a single global one. Rather than searching for more complex composition functions, MV-RNN (Socher et al. 2012) assigns a matrix to every word to make the compositions specific. However, the number of composition matrices equals the vocabulary size, which makes the number of parameters quite large; the model easily overfits the training data and is difficult to optimize. Moreover, MV-RNN needs another global matrix to linearly combine the composition matrices for phrases, which still makes these compositions non-specific. To overcome these shortcomings while keeping the compositions specific, it is better to use a fixed number of composition functions and to embed the role-sensitive (linguistic and semantic) information into the word vectors, so that compositions are selected adaptively based on these vectors rather than on concrete words.
The example "not (so good)" in sentiment analysis illustrates this point. To obtain the polarity of this phrase, we first combine the words "so" and "good", and then combine "not" with "so good". Specifically, the first combination is a strengthening composition which makes the sentiment polarity stronger, and the second step is a negation composition which turns the positive polarity into a negative one.

In this paper, we introduce a novel Adaptive Multi-Compositionality (AdaMC) method for RNMs. AdaMC consists of more than one composition function and adaptively selects among them depending on the input vectors. The model learns to embed the semantic categories of words into their corresponding word vectors, and uses them to choose the composition functions adaptively. Specifically, we propose a parametrization method to compute the probability distribution over the functions given the child vectors. We also introduce a hyper-parameter to model the adaptive preferences over the different composition functions, and show three special cases of AdaMC. By adjusting this hyper-parameter, there is a continuous transition between these three special cases. Moreover, all the composition functions and the way they are selected are learned automatically from supervision, instead of being chosen heuristically or by manually defined rules. Hence, task-specific composition information can be embedded into the representations and used in the task.
We have conducted extensive experiments on the Stanford Sentiment Treebank. The experimental results illustrate that our approach significantly improves over the baseline methods and yields state-of-the-art accuracy on the sentiment classification task.

The main contributions of this paper are three-fold:
• We introduce an Adaptive Multi-Compositionality approach into recursive neural models to better perform semantic compositions;
• We propose a parametrization method to calculate the probabilities of different composition functions given two input vectors;
• We present empirical results on the publicly available Stanford Sentiment Treebank. AdaMC significantly outperforms state-of-the-art sentiment classification results.

Related Work

Semantic composition has attracted extensive attention in vector-based semantic models. Let a and b be two words, represented by the vectors a and b. The goal of semantic composition is to compute a vector c that represents the phrase "ab". Most previous work focuses on defining different composition functions to obtain c. Landauer and Dumais (1997) use the average of a and b as the representation of "ab". Mitchell and Lapata (2008; 2010) suggest using weighted addition (c = αa + βb) and element-wise multiplication (c = a ⊙ b) to compute c. Another operation is the tensor product (Smolensky 1990; Aerts and Czachor 2004; Clark, Coecke, and Sadrzadeh 2008; Widdows 2008), such as the outer product. The outer product of two vectors produces a matrix, which makes the representations grow exponentially larger. An improvement over the outer product is circular convolution (Plate 1991; Jones and Mewhort 2007), which compresses the matrix produced by the outer product back into a vector. (These classical operations are sketched in code at the end of this section.) Baroni and Zamparelli (2010) represent nouns as vectors and adjectives as matrices, and apply matrix-by-vector multiplication for adjective-noun composition. Instead of using vector representations, Rudolph and Giesbrecht (2010) and Yessenalina and Cardie (2011) use matrices to represent phrases and define composition as matrix multiplication. However, most of these settings are unsupervised, without supervision from a specific task, and are evaluated by comparing the similarity between short phrases (e.g., adjective-noun pairs). For example, "very good" and "very bad" are regarded as similar phrases, which is sometimes undesirable for specific tasks (e.g., sentiment analysis).

Recently, several works have used semantic composition in recursive neural networks to build deep models for NLP tasks and have achieved promising results. Socher et al. (2011) learn a matrix W to combine the vectors a and b; the composition result is W[a; b]. Socher et al. (2012) assign matrix-vector pairs (A, a) and (B, b) to the words a and b, where A, B are matrices and a, b are vector representations. When two phrases are combined, the composition result is W[Ba; Ab], where W is a global linear mapping matrix. Yu, Deng, and Seide (2013) and Socher et al. (2013b) use a tensor layer to model the interactions between different dimensions of the combined vectors. Grefenstette and Sadrzadeh (2011) and Grefenstette et al. (2013) present a categorical compositional model which employs formal semantics (grammatical structure) to guide the composition sequence. Hermann and Blunsom (2013) and Polajnar, Fagarasan, and Clark (2013) use parses based on Combinatory Categorial Grammar (CCG) to guide semantic composition. Socher et al. (2013a) propose to compute the composition vectors depending on the Part-Of-Speech (POS) tags of the two phrases. Our work is significantly different from these approaches. To begin with, we do not employ any syntactic categories (such as CCG combinators and types, or POS tags), which are assumed to have been obtained by existing parsers. In addition, syntactic categories are sometimes unhelpful for guiding composition: both "not good" and "very good" are adverb-adjective pairs, yet the former is a negation composition which flips the polarity, while the latter is an amplifier in the sentiment analysis task. Our proposed method instead learns to select the composition functions adaptively, depending on the current vectors. Moreover, our model can also leverage such syntactic knowledge in a unified way by treating it as features, instead of using it heuristically or through manually crafted rules.
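As a reading aid, here is a minimal sketch (ours, not from any of the cited papers) of the classical composition operations surveyed above, written in NumPy; all function names are illustrative.

```python
import numpy as np

def average(a, b):
    # Landauer and Dumais (1997): represent "ab" by the mean of a and b.
    return (a + b) / 2.0

def weighted_add(a, b, alpha=1.0, beta=1.0):
    # Mitchell and Lapata (2008): c = alpha * a + beta * b.
    return alpha * a + beta * b

def elementwise_mul(a, b):
    # Mitchell and Lapata (2008): element-wise product, c = a (.) b.
    return a * b

def outer_product(a, b):
    # Tensor (outer) product: the result is a matrix, so representations
    # grow with every further composition.
    return np.outer(a, b)

def circular_convolution(a, b):
    # Plate (1991): compresses the outer product back to a vector;
    # computed efficiently via the Fourier transform.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
```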
Recursive Neural Models

Recursive neural models represent phrases and words as D-dimensional vectors. The models perform compositions over binary trees and obtain the vector representations in a bottom-up way. Notably, the word vectors at the leaf nodes are regarded as parameters and are updated according to the supervision. The vectors of phrases are computed by composing their child vectors: the vector of node i is calculated via

    v_i = f(g(v_l, v_r))    (1)

where v_l, v_r are the vectors of its left and right children, g is the composition function, and f is the nonlinearity function (such as tanh, sigmoid, softsign, etc.). As illustrated in Figure 1, the representation of "so good" is calculated by composing "so" and "good", and the representation of the trigram "not so good" is recursively obtained from the vectors of "not" and "so good".

The learned representations are then fed into a classifier to predict the labels of the nodes. The softmax layer is used as a standard component, as shown in Figure 1. The k-th element of softmax(z) is exp{z_k} / Σ_j exp{z_j}; it outputs a probability distribution over the K classes for a given input. To be specific, the prediction distribution of node i is calculated via

    y^i = softmax(U v_i)    (2)

where U ∈ ℝ^{K×D} is the classification matrix, v_i is the vector representation of node i, and y^i is the prediction distribution for node i.
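The following is a minimal sketch (not the authors' code) of the bottom-up procedure in Equations (1) and (2), assuming a simple binary Node structure; all identifiers are our own illustrative choices.

```python
import numpy as np

def softmax(z):
    # Equation (2); subtracting the max is a standard numerical safeguard.
    e = np.exp(z - np.max(z))
    return e / e.sum()

class Node:
    """Binary parse-tree node; leaves carry word vectors (model parameters)."""
    def __init__(self, word_vec=None, left=None, right=None):
        self.word_vec, self.left, self.right = word_vec, left, right

def compose(node, g, f=np.tanh):
    # Equation (1): v_i = f(g(v_l, v_r)), evaluated bottom-up over the tree.
    if node.word_vec is not None:        # leaf node: return its word vector
        return node.word_vec
    v_l = compose(node.left, g, f)
    v_r = compose(node.right, g, f)
    return f(g(v_l, v_r))

def predict(node, g, U):
    # y^i = softmax(U v_i): class distribution for the node's phrase.
    return softmax(U @ compose(node, g))
```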
[Figure 1: The composition process for "not so good". Here g is a single global composition function shared by all nodes of the recursive neural model.]

The main difference between recursive neural models lies in the design of the composition function. We describe two mainstream models and their composition functions.

RNN: Recursive Neural Network

The RNN (Socher et al. 2011) is a standard member of the recursive neural model family. Every dimension of the parent vector is a weighted linear combination of the dimensions of the child vectors. The vector representation of node i is obtained via

    v_i = f(W [v_l; v_r] + b)    (3)

where W ∈ ℝ^{D×2D} is the composition matrix, b is the bias vector, v_l and v_r are the left and right child vectors respectively, and f is the nonlinearity function. The dimension of v_i is the same as that of its child vectors, so v_i can be used recursively in the next composition.

RNTN: Recursive Neural Tensor Network

The RNTN (Socher et al. 2013b) uses more parameters and a more powerful composition function than the RNN. The main idea of the RNTN is to employ a tensor to model the interactions between every pair of dimensions of the child vectors. The vector of node i is computed via

    v_i = f([v_l; v_r]^T T^{[1:D]} [v_l; v_r] + W [v_l; v_r] + b)    (4)

where T^{[1:D]} is a tensor, W ∈ ℝ^{D×2D} is a linear composition matrix, b is the bias vector, and f is the nonlinearity function. The tensor term is the D-dimensional vector of bilinear forms whose d-th element is

    ([v_l; v_r]^T T^{[1:D]} [v_l; v_r])_d = [v_l; v_r]^T T^{[d]} [v_l; v_r]    (5)

where T^{[d]} ∈ ℝ^{2D×2D} is the d-th slice of T^{[1:D]}. When all the elements of T^{[1:D]} are zero, the model reduces to the RNN.

Adaptive Multi-Compositionality for Recursive Neural Models

We illustrate the main idea of the proposed Adaptive Multi-Compositionality method in Figure 2. We use a composition pool consisting of C composition functions {g_1, ..., g_C}. To obtain the vector of "so good", we first feed its child vectors ("so" and "good") into a classifier that yields the probability P(g_h | "so", "good") of using each composition function g_h. Intuitively, the model should choose composition functions that strengthen the polarity of "good" and predict "so good" as very positive. Similarly, we compute the probability P(g_h | "not", "so good") for every composition function; ideally, the model should select negation compositions, and the resulting polarity should be negative. As this example shows, it is more reasonable to use multiple composition functions than to search for a single complex one.

[Figure 2: The composition pool consists of multiple composition functions g_1, ..., g_C. The model selects functions depending on the input child vectors, and produces the composition result using more than one composition function.]

Generally, we define the composition result as

    v_i = f( Σ_{h=1}^{C} P(g_h | v_l, v_r) · g_h(v_l, v_r) )    (6)

where f is the nonlinearity function, g_1, ..., g_C are the composition functions, and P(g_h | v_l, v_r) is the probability of employing g_h given the child vectors v_l, v_r. For the composition functions, we use the same forms as in the RNN (Equation (3)) and the RNTN (Equation (4)). The key point is how to select among them properly depending on the child vectors, i.e., how to define P(g_h | v_l, v_r). We define a parametrization approach (named AdaMC) and show three special cases of it; by adjusting the parameter of AdaMC, there is a continuous transition between these special cases.
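The two composition functions above can be written compactly; the following sketch (our illustration, not the original implementation) expresses Equation (3) and Equations (4)-(5) in NumPy, storing the tensor T as a (D, 2D, 2D) array of slices.

```python
import numpy as np

def rnn_compose(v_l, v_r, W, b):
    # Equation (3): g(v_l, v_r) = W [v_l; v_r] + b, with W of shape (D, 2D).
    return W @ np.concatenate([v_l, v_r]) + b

def rntn_compose(v_l, v_r, T, W, b):
    # Equations (4)-(5): entry d of the tensor term is [v_l; v_r]^T T[d] [v_l; v_r].
    c = np.concatenate([v_l, v_r])                # [v_l; v_r] in R^{2D}
    bilinear = np.einsum('i,dij,j->d', c, T, c)   # one bilinear form per slice
    return bilinear + W @ c + b                   # reduces to the RNN when T == 0

# Illustrative shapes for D-dimensional word vectors:
D = 25
W = np.random.uniform(-0.01, 0.01, (D, 2 * D))
b = np.zeros(D)
T = np.zeros((D, 2 * D, 2 * D))                   # all-zero tensor: RNTN == RNN
parent = np.tanh(rntn_compose(np.ones(D), np.ones(D), T, W, b))
```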
AdaMC: Adaptive Multi-Compositionality

To start with, we define the β-softmax function as

    β-softmax(z) = (1 / Σ_k exp{β z_k}) [exp{β z_1}, ..., exp{β z_K}]^T    (7)

where z = [z_1 ... z_K]^T is a vector. This function corresponds to the Boltzmann distribution and the Gibbs measure (Georgii 2011), which are widely used in statistical mechanics. When β = 0, the β-softmax function produces a uniform distribution; when β = 1, it is the same as the softmax function; and as β → ∞, it activates only the dimension with the maximum weight, setting its probability to 1.

We then employ this function to compute the probability distribution over the composition functions via

    [P(g_1 | v_l, v_r), ..., P(g_C | v_l, v_r)]^T = β-softmax(S [v_l; v_r])    (8)

where S ∈ ℝ^{C×2D} is the matrix used to determine which composition function to use, and v_l, v_r are the left and right child vectors.

Avg-AdaMC: Average AdaMC

We can simply average the composition results; this is the special case of AdaMC with β = 0. The probability of using g_h is

    P(g_h | v_l, v_r) = 1/C    (9)

where h = 1, ..., C, and C is the number of composition functions.

Weighted-AdaMC: Weighted Average AdaMC

Another special case of AdaMC is obtained by setting β = 1. It uses the softmax probabilities to perform a weighted average:

    [P(g_1 | v_l, v_r), ..., P(g_C | v_l, v_r)]^T = softmax(S [v_l; v_r])    (10)

where S ∈ ℝ^{C×2D} is the parameter matrix.

Max-AdaMC: Max Output AdaMC

If we let β → ∞ in AdaMC, we obtain a greedy selection algorithm that outputs only the composition result with the maximum softmax probability.
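A short sketch (illustrative, under our own naming) of the β-softmax and the adaptive selection of Equations (6)-(8); it also shows how the three special cases fall out of the single parameter β.

```python
import numpy as np

def beta_softmax(z, beta):
    # Equation (7). Shifting by the max leaves the distribution unchanged
    # but avoids overflow for large beta.
    e = np.exp(beta * (z - np.max(z)))
    return e / e.sum()

def adamc_compose(v_l, v_r, S, fns, beta, f=np.tanh):
    # Equations (6) and (8): mix the C composition functions g_h with
    # probabilities beta-softmax(S [v_l; v_r]).
    c = np.concatenate([v_l, v_r])
    p = beta_softmax(S @ c, beta)                 # P(g_h | v_l, v_r)
    return f(sum(p_h * g(v_l, v_r) for p_h, g in zip(p, fns)))

z = np.array([1.0, 2.0, 0.5])
print(beta_softmax(z, 0))    # uniform 1/C weights: Avg-AdaMC
print(beta_softmax(z, 1))    # ordinary softmax weights: Weighted-AdaMC
print(beta_softmax(z, 50))   # nearly one-hot: approaches Max-AdaMC
```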
Model Training

We use the softmax classifier to predict the probability of each class and compare the prediction with the ground truth. We define the target vector t^i for node i as a binary vector: if the correct label is k, then t^i_k is set to 1 and all other elements to 0. Our goal is to minimize the cross-entropy error between the predicted distribution y^i and the target distribution t^i. For each sentence, the objective function is defined as

    min_θ E(θ) = − Σ_i Σ_j t^i_j log y^i_j + λ ‖θ‖²₂    (11)

where θ represents the parameters and the second term is an L2-regularization penalty. We employ the back-propagation algorithm (Rumelhart, Hinton, and Williams 1986) to propagate the errors from the top node down to the leaf nodes; the derivatives are computed and gathered to update the parameters. The details can be found in the supplemental material due to space limitations. AdaGrad (Duchi, Hazan, and Singer 2011) is used to solve this non-convex optimization problem.
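For concreteness, here is a minimal sketch of the per-node loss in Equation (11) and an AdaGrad update (Duchi, Hazan, and Singer 2011). The gradient itself comes from back-propagation through the tree structure, which is omitted here, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def node_loss(y_pred, gold_label):
    # Cross-entropy term of Equation (11): -log y_k for the gold class k.
    return -np.log(y_pred[gold_label])

class AdaGrad:
    """Per-parameter learning rates from accumulated squared gradients."""
    def __init__(self, theta, lr=0.01, eps=1e-8):
        self.theta, self.lr, self.eps = theta, lr, eps
        self.hist = np.zeros_like(theta)

    def step(self, grad, lam=1e-4):
        grad = grad + 2.0 * lam * self.theta   # gradient of the L2 penalty
        self.hist += grad ** 2
        self.theta -= self.lr * grad / (np.sqrt(self.hist) + self.eps)
```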
Experiments

Dataset Description

We evaluate the models on the Stanford Sentiment Treebank¹. This corpus contains labels for syntactically plausible phrases, which allows us to train compositional models on the parse trees. The treebank is built upon 10,662 critic reviews from Rotten Tomatoes², originally used for sentence-level sentiment classification (Pang and Lee 2005). The Stanford Parser (Klein and Manning 2003) is used to parse all these reviews into parse trees, from which 215,154 phrases are extracted. Workers on Amazon Mechanical Turk then annotated polarity levels for all these phrases. Most of the shorter phrases are annotated as neutral, while longer phrases tend to carry stronger polarity. All the sentiment scales are merged into five categories (very negative, negative, neutral, positive, very positive).

¹ http://nlp.stanford.edu/sentiment/treebank.html
² http://www.rottentomatoes.com

Experiment Settings

We use the standard dataset splits (train: 8,544, dev: 1,101, test: 2,210) in all the experiments. For all models, we tune the parameters on the dev set. We use the mini-batch version of AdaGrad with a batch size between 20 and 30. We employ f = tanh as the nonlinearity function, as it is significantly better than using no nonlinearity (Socher et al. 2011). Each word in the vocabulary is assigned a vector representation. To initialize the parameters, we randomly sample values from a uniform distribution U(−ε, +ε), where ε is a small value. It should be noted that the word vectors are regarded as parameters and are updated in the training process.

Evaluation

We compare different methods on the Sentiment Treebank to evaluate the effectiveness of our approach.

SVM. The Support Vector Machine (SVM) achieves good performance on the sentiment classification task (Pang and Lee 2005). We use bag-of-words features in our experiments.

MNB/bi-MNB. As indicated by Wang and Manning (2012), Multinomial Naïve Bayes (MNB) often outperforms SVM for sentence-level sentiment classification. MNB uses uni-gram features, and bi-MNB additionally uses bi-gram features.

VecAvg. This model (Landauer and Dumais 1997) averages the child vectors to obtain the parent vector. It ignores word order when performing compositions.

RNN/RNTN. The Recursive Neural Network (Socher et al. 2011) computes the parent vector as a weighted linear combination of the child vectors' dimensions. The Recursive Neural Tensor Network (Socher et al. 2013b) employs tensors to model the interactions between different dimensions of the child vectors. We use the same settings as in the original papers.

MV-RNN. This model (Socher et al. 2012) assigns a composition matrix to each word. Besides performing compositions to obtain the vector representations, the model uses another global matrix to combine these composition matrices for phrases. The above baseline results are those reported in (Socher et al. 2013b).

AdaMC. Compared with the original models, AdaMC-RNN and AdaMC-RNTN employ multiple composition functions and determine the composition functions from the child vectors at every step. We use 15 composition functions, and set the size of the word vectors to 25 for AdaMC-RNN and 15 for AdaMC-RNTN.

Avg/Max/Weighted-AdaMC. The special cases of AdaMC applied to RNN and RNTN. The number of composition functions and the dimension of the word vectors are the same as for AdaMC-RNN and AdaMC-RNTN.

Table 1: Results of the evaluation on the Sentiment Treebank. The top three methods are in bold and the best is also underlined. Our methods (AdaMC-RNN, AdaMC-RNTN) achieve the best performance when β is set to 2.

Method               | Fine-grained All | Fine-grained Root | Pos./Neg. All | Pos./Neg. Root
SVM                  | 64.3 | 40.7 | 84.6 | 79.4
MNB                  | 67.2 | 41.0 | 82.6 | 81.8
bi-MNB               | 71.0 | 41.9 | 82.7 | 83.1
VecAvg               | 73.3 | 32.7 | 85.1 | 80.1
MV-RNN               | 78.7 | 44.4 | 86.8 | 82.9
RNN                  | 79.0 | 43.2 | 86.1 | 82.4
Avg-AdaMC-RNN        | 80.1 | 43.4 | 89.1 | 84.9
Max-AdaMC-RNN        | 80.3 | 43.8 | 91.0 | 85.6
Weighted-AdaMC-RNN   | 80.7 | 45.4 | 93.6 | 86.5
AdaMC-RNN            | 80.8 | 45.8 | 93.4 | 87.1
RNTN                 | 80.7 | 45.7 | 87.6 | 85.4
Avg-AdaMC-RNTN       | 80.6 | 45.7 | 89.7 | 86.3
Max-AdaMC-RNTN       | 80.3 | 45.6 | 91.3 | 86.6
Weighted-AdaMC-RNTN  | 81.0 | 46.3 | 93.8 | 88.4
AdaMC-RNTN           | 81.1 | 46.7 | 94.1 | 88.5

Table 1 shows the evaluation results of the different models. SVM and MNB are two effective baselines for sentiment classification (Wang and Manning 2012). We notice that VecAvg performs better than the bag-of-words methods when evaluated on all nodes, while its performance on root nodes is worse. This indicates that VecAvg achieves good results on short phrases but loses sentiment information in the composition process for long phrases. We also conclude that it is more difficult to obtain good composition results for long phrases than for short ones, so comparing evaluation results on long fragments is more meaningful. Compared with VecAvg, RNN improves performance on both short and long phrases. MV-RNN uses more composition matrices and improves the accuracy over RNN. Moreover, RNTN, which employs tensors to model the interactions between different semantic dimensions, achieves better results than MV-RNN, RNN, and the bag-of-words models. This illustrates that more powerful composition functions help capture complex semantic compositions, especially for longer phrases, and confirms the effectiveness of recursive neural models.

We then compare our Adaptive Multi-Compositionality (AdaMC) method with the baselines. First of all, we apply our approach to RNN and observe significant gains across the evaluation metrics. Specifically, the all-node and root-node fine-grained accuracies of AdaMC-RNN increase by 1.8% and 2.6% respectively over RNN, which employs only one global composition function. For polarity (positive/negative) classification, the all-node and root-node accuracies of our method rise by 7.3% and 4.7% respectively. Moreover, we find that AdaMC-RNN surpasses MV-RNN. MV-RNN assigns a specific composition matrix to every word; it easily overfits the training data, because the number of composition matrices is the same as the vocabulary size, and it needs another global matrix to compute the composition matrices for long phrases, which makes the compositions non-specific. AdaMC-RNN, in contrast, employs a fixed number of composition functions and adaptively selects them depending on the combined vectors. The experimental results illustrate that AdaMC-RNN is a better way to achieve specific compositions than MV-RNN.
Furthermore, the fine-grained accuracies of AdaMC-RNN are comparable with RNTN, and its positive/negative accuracies are better than RNTN's. Notably, although AdaMC-RNN employs fewer parameters than RNTN, it achieves better performance without employing tensors. The results indicate that using multiple composition functions is another good way to improve semantic composition, besides searching for more powerful composition functions (such as tensors).

We also apply our method to RNTN and obtain state-of-the-art performance. Compared with RNTN, the all-node and root-node fine-grained accuracies of AdaMC-RNTN rise by 0.4% and 1.0% respectively. For positive/negative classification, the all-node and root-node accuracies of our method increase by 6.5% and 3.1% respectively over RNTN. This demonstrates that our method can boost performance even when powerful composition functions are already in use.

Finally, we evaluate the special cases (β = 0, β = 1, and β → ∞) of the AdaMC models; all of them help improve the results. The performance of Weighted-AdaMC (β = 1) is the closest to AdaMC, and is better than Avg-AdaMC (β = 0) and Max-AdaMC (β → ∞) for both RNN and RNTN. To be specific, the results of Max-AdaMC-RNN surpass Avg-AdaMC-RNN; however, the fine-grained accuracies of Avg-AdaMC-RNTN are slightly better than Max-AdaMC-RNTN's, while the positive/negative accuracies show the opposite. We further explore the differences between these special cases in the next section.

Effects of β

We compare different values of β for AdaMC as defined in Equation (8). Different values of this parameter lead to different composition selection schemes. When β = 1, the model directly employs the probabilities output by the softmax classifier as weights to combine the composition results. Setting β = 0 makes the probabilities obey a uniform distribution, while β → ∞ results in a maximum-probability selection algorithm.
[Figure 3: (a) Root fine-grained accuracy; (b) root pos./neg. accuracy. The curves show the accuracy for root nodes as β = 0, 2^0, 2^1, ..., 2^6 increases. AdaMC-RNN and AdaMC-RNTN achieve the best results at β = 2^1.]

As demonstrated in Figure 3, the overall conclusion is that the optimal β tends to lie between the settings of Weighted-AdaMC and Max-AdaMC. Both AdaMC-RNN and AdaMC-RNTN achieve the best root fine-grained and positive/negative accuracies at β = 2, and they show a similar trend. Specifically, Weighted-AdaMC (β = 1) performs better than Avg-AdaMC (β = 0) and Max-AdaMC (β → ∞). This indicates that adaptive (role-sensitive) compositionality selection is useful for modeling the compositions. Avg-AdaMC does not consider role-sensitive information, while Max-AdaMC does not use multiple composition functions to obtain smoothed results. Weighted-AdaMC employs the probabilities obtained by the softmax classifier to trade off between them, and hence achieves better performance. Based on this observation, we introduce the parameter β and employ the Boltzmann distribution in AdaMC to adjust the balance between these two perspectives.

Adaptive Compositionality Examples

To analyze the adaptive compositionality selection, we compute the probability distribution (as in Equation (8)) for every composition, i.e., [P(g_1 | a, b) ... P(g_C | a, b)]^T, where a, b are the child vectors. As demonstrated in Table 2, we query some composition examples, using cosine similarity as the similarity metric.

Table 2: We use the cosine similarity of the composition selection vectors (Equation (8)) of phrases to query the nearest compositions.

Query                       | Nearest compositions
really bad                  | very bad / only dull / much bad / extremely bad / (all that) bad
(is n't) (necessarily bad)  | (is n't) (painfully bad) / not mean-spirited / not (too slow) / not well-acted / (have otherwise) (been bland)
great (Broadway play)       | great (cinematic innovation) / great subject / great performance / energetic entertainment / great (comedy filmmaker)
(arty and) jazzy            | (Smart and) fun / (verve and) fun / (unique and) entertaining / (gentle and) engrossing / (warmth and) humor

To be specific, "really bad" is a strengthening composition which makes the sentiment polarity stronger, and the most similar compositions are of the same type. Notably, the bi-gram "all that" is detected as an intensification indicator. The second example is a negation composition. "(have otherwise) (been bland)" is regarded as a similar composition, which illustrates that the combination of "have" and "otherwise" keeps the negation semantics of "otherwise". The third case and its similar compositions are combinations of a sentiment adjective and a noun phrase; this demonstrates that our model embeds word-category information in the word vectors to select the composition functions. The last example is two sentiment words connected by "and"; the results illustrate that our model recognizes this kind of conjunction, which joins two non-contrasting items. Across all these cases, we find that compositions of similar types lie close to each other, and that our method learns to distinguish the different composition types according to the supervision of the specific task.
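A sketch of how such a nearest-composition query can be computed (our illustration; the paper does not publish code): each composition step is summarized by its selection vector from Equation (8), and candidates are ranked by cosine similarity.

```python
import numpy as np

def beta_softmax(z, beta):
    e = np.exp(beta * (z - np.max(z)))   # Equation (7), as sketched earlier
    return e / e.sum()

def selection_vector(v_l, v_r, S, beta):
    # [P(g_1 | v_l, v_r), ..., P(g_C | v_l, v_r)] from Equation (8).
    return beta_softmax(S @ np.concatenate([v_l, v_r]), beta)

def cosine(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def nearest_compositions(query_vec, pool, k=5):
    # pool maps a phrase pair (e.g., "(arty and) jazzy") to its selection vector.
    sims = {pair: cosine(query_vec, vec) for pair, vec in pool.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]
```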
Conclusion and Future Work

We propose an Adaptive Multi-Compositionality (AdaMC) method for recursive neural models to achieve better semantic compositions. AdaMC uses more than one composition function and adaptively selects among them depending on the input vectors. We present a general framework that models each composition as a distribution over the composition functions. We integrate AdaMC into existing popular recursive neural models (such as RNN and RNTN) and conduct experiments on sentence-level sentiment analysis tasks. Experimental results on the Stanford Sentiment Treebank show that AdaMC significantly improves over the baselines with fewer parameters. We further compare the distribution similarities of the composition functions for phrase pairs, and the results verify the effectiveness of AdaMC in modeling and leveraging the semantic categories of words and phrases during composition.

There are several interesting directions for further research. For instance, we can evaluate our method on other NLP tasks. Moreover, external information (such as part-of-speech tags) can be used as features to select the composition functions. In addition, we can mix different types of composition functions (such as the linear combination approach of RNN and the tensor-based approach of RNTN) to achieve more flexible choices in the adaptive composition method.

Acknowledgments

We gratefully acknowledge helpful discussions with Richard Socher. This research was partly supported by the National 863 Program of China (No. 2012AA011005), the fund of SKLSDE (Grant No. SKLSDE-2013ZX-06), and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20111102110019).

References

Aerts, D., and Czachor, M. 2004. Quantum aspects of semantic analysis and symbolic artificial intelligence. Journal of Physics A: Mathematical and General 37(12):L123.

Baroni, M., and Zamparelli, R. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In EMNLP, 1183-1193.

Clark, S.; Coecke, B.; and Sadrzadeh, M. 2008. A compositional distributional model of meaning. In Proceedings of the Second Quantum Interaction Symposium, 133-140.

Duchi, J.; Hazan, E.; and Singer, Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR 12:2121-2159.

Georgii, H. 2011. Gibbs Measures and Phase Transitions. De Gruyter Studies in Mathematics. De Gruyter.

Grefenstette, E., and Sadrzadeh, M. 2011. Experimental support for a categorical compositional distributional model of meaning. In EMNLP, 1394-1404.

Grefenstette, E.; Dinu, G.; Zhang, Y.-Z.; Sadrzadeh, M.; and Baroni, M. 2013. Multi-step regression learning for compositional distributional semantics. In Proceedings of the 10th International Conference on Computational Semantics.

Hermann, K. M., and Blunsom, P. 2013. The role of syntax in vector space models of compositional semantics. In ACL, 894-904.

Jones, M. N., and Mewhort, D. J. 2007. Representing word meaning and order information in a composite holographic lexicon. Psychological Review 114(1):1.

Klein, D., and Manning, C. D. 2003. Accurate unlexicalized parsing. In ACL, 423-430.

Landauer, T. K., and Dumais, S. T. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 211-240.

Mitchell, J., and Lapata, M. 2008. Vector-based models of semantic composition. In ACL, 236-244.

Mitchell, J., and Lapata, M. 2010. Composition in distributional models of semantics. Cognitive Science 34(8):1388-1439.

Pang, B., and Lee, L. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, 115-124.

Plate, T. 1991. Holographic reduced representations: Convolution algebra for compositional distributed representations. In IJCAI, 30-35.

Polajnar, T.; Fagarasan, L.; and Clark, S. 2013. Learning type-driven tensor-based meaning representations. CoRR abs/1312.5985.

Rudolph, S., and Giesbrecht, E. 2010. Compositional matrix-space models of language. In ACL, 907-916.

Rumelhart, D.; Hinton, G.; and Williams, R. 1986. Learning representations by back-propagating errors. Nature 323(6088):533-536.

Smolensky, P. 1990. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46(1-2):159-216.
Socher, R.; Lin, C. C.; Ng, A. Y.; and Manning, C. D. 2011. Parsing natural scenes and natural language with recursive neural networks. In ICML.

Socher, R.; Huval, B.; Manning, C. D.; and Ng, A. Y. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP-CoNLL, 1201-1211.

Socher, R.; Bauer, J.; Manning, C. D.; and Ng, A. Y. 2013a. Parsing with compositional vector grammars. In ACL.

Socher, R.; Perelygin, A.; Wu, J. Y.; Chuang, J.; Manning, C. D.; Ng, A. Y.; and Potts, C. 2013b. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, 1631-1642.

Wang, S., and Manning, C. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In ACL, 90-94.

Widdows, D. 2008. Semantic vector products: Some initial investigations. In Second AAAI Symposium on Quantum Interaction.

Yessenalina, A., and Cardie, C. 2011. Compositional matrix-space models for sentiment analysis. In EMNLP, 172-182.

Yu, D.; Deng, L.; and Seide, F. 2013. The deep tensor neural network with applications to large vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 21(2):388-396.