JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012)

Semantic Link Analysis for Finding Answer Experts*

YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG XU 1,3

1 School of Computer Science and Technology, University of Science and Technology of China, Hefei, 230026 P.R. China
2 Department of Computer Science, City University of Hong Kong, HKSAR, P.R. China
3 Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute, Suzhou, 215123 P.R. China
4 School of Computer and Information Engineering, Shanghai University of Electronic Power, Shanghai, 200090 P.R. China

Recommending unanswered questions to answer experts is an important mechanism in User-Interactive Question Answering (UIQA) services and helps to reduce askers' waiting time and obtain high-quality answers. In this paper, we address the task of identifying answer experts in UIQA services with semantic information extracted from user interaction behaviors. We first construct the user question-answer interaction graph through direct semantic links and latent links extracted from the records of question sessions and user profiles. After that, two expert-finding approaches are developed by employing the semantic information in the so-called propagation link analysis method and in the language model, respectively. Experimental results on the Yahoo! Answers dataset show that the extracted semantic information indeed improves the performance of both the propagation method and the language model for the task of finding answer experts.

Keywords: user-interactive question answering, answer expert finding, semantics, link analysis, language model

1. INTRODUCTION

User-Interactive Question Answering (UIQA) [1] is now a popular social network application in the age of Web 2.0. It provides a platform for people in online communities to seek information and share knowledge by means of questions and answers. People can ask for aid by posting questions and help others by answering their questions.
The content of questions and answers in UIQA communities gives users a good way to acquire the information they need in the form of answers rather than lists of documents from a search engine. However, in existing UIQA services, users need to wait passively for other users to access the portal website, read questions and provide answers. It may take several hours or even a few days before askers receive any answer. On the other hand, different answerers have different expertise levels; some users who provide answers may just want to earn incentives and may not be experts on the questions they are involved in. Therefore, automatically recommending unsolved questions to appropriate answer experts can be an important mechanism in UIQA services to reduce askers' waiting time and improve answer quality.

Received February 14, 2011; revised August 9, 2011; accepted August 31, 2011. Communicated by Irwin King.
* The work was supported by the Natural Science Foundation of China (No. 60863001, No. 61073038, No. 61073189) and Innovation Program of Shanghai Municipal Education Commission (No. 10ZZ115).

With the increasing spread of Q&A communities, the task of finding answer experts has become one of the most important issues in UIQA services. Various measures for identifying experts in forum systems or UIQA services already exist. Liu et al. [2] define an answer expert as a person who has previously answered questions similar to the new one in the system. According to this definition, the problem of expert finding can be cast as an information retrieval (IR) problem. However, the question-answering relationships between users, which are helpful for improving the performance of expert finding, are ignored. Therefore, some researchers attempt to exploit the user ask-answer interaction in the task of expert finding. Jurczyk and Agichtein [3, 4] apply link structure-based algorithms, such as PageRank [5] and HITS [6], to the task of ranking answer experts. Each user obtains a value, where a higher value corresponds to more expertise. However, in traditional link analysis the relationship between users is a plain question-answering link, and every relationship is treated with equal weight. In practice, a high-quality answer and a spam answer should lead to different weights in the user question-answer interaction graph. Intuitively, a user who provides high-quality answers has more expertise than one who provides spam answers.

In this paper, we propose to find answer experts for each category by mining the semantic information in UIQA. A user question-answer interaction graph is first constructed based on user interaction behaviours in UIQA. Different sources of semantic information are extracted from the user interaction behaviours and answer contents in question sessions.
After the construction of the graph, a link analysis approach called propagation [7] is performed to generate the first expert-finding approach. Meanwhile, another approach, the semantic language model, is proposed by incorporating the extracted semantic information into the traditional language model. Experimental results on the Yahoo! Answers collection demonstrate the effectiveness of the two expert-finding approaches and also provide evidence of the usefulness of the extracted semantic information.

The remainder of the paper is organized as follows. In Section 2, we briefly review the related work on expert finding. In Section 3, the notations, the construction of user profiles, and the formal definition of the user question-answer interaction graph are given first. After that, details of the expert finding algorithms are introduced. We present the evaluations of the two methods and discussions in Section 4. Finally, we draw the conclusion and discuss future work in Section 5.

2. RELATED WORKS

The task of expert finding was first introduced in TREC 2005 [8]. It aims to find the most suitable persons with the appropriate skills and knowledge. One line of previous work on expert finding mainly relies on language models (LM) and treats the problem as an information retrieval (IR) task. Apart from this, link-based analysis has also been widely used. In this section, we briefly review the related work from three perspectives: expert finding based on language models (LM), expert finding based on link analysis, and hybrids of the two approaches.
2.1 Language Model-based Expert Finding

Cao et al. [9] propose a two-stage language model for expert finding at the enterprise track of TREC 2005. The language model consists of two parts, a co-occurrence model and a relevance model, both of which are based on statistical language modelling. Liu et al. [2] define an expert in UIQA services as a person who has previously answered questions similar to a given one. A person's expertise is characterized by a user profile derived from the previously answered questions. The experiment is implemented with three different language models and with different ways of building user profiles. From the experimental results, they conclude that user profiles built from all previously answered questions produce the best performance. Moreover, the results also reveal that the three language models achieve equivalent performance in the expert finding task. Zhang et al. [10] propose a mixture model for expert finding which is built on Probabilistic Latent Semantic Analysis (PLSA). The proposed mixture model considers the latent topics between terms and documents and therefore contains more semantic information. Their experiments indicate that the proposed mixture model with semantics can achieve better performance than the conventional language model.

2.2 Link Analysis-based Expert Finding

Jurczyk and Agichtein [3, 4] adopt HITS for estimating user authority in the UIQA service. In their methods, the asking and answering behaviours between users generate a user question-answer relation graph, in which each node represents a user and each edge represents the question-answering relationship between users. For each user, both the hub and authority values are calculated, where a high hub value indicates a good authority and a low value a poor one. The experiment on the Yahoo! Answers dataset shows that link analysis is promising for estimating the authority of users in UIQA services. Zhang et al. [7] propose a propagation-based approach for finding experts in a social network.
In their method, both personal information and relationships among persons are taken into consideration for finding experts in a social network. They first use the personal information to calculate an initial expert score for each person. Next, according to the relationships among persons, the propagation approach runs iteratively to obtain a more accurate result.

2.3 Hybrid Approaches

Zhou et al. [11] propose a framework for routing a given question to the top-k potential experts in a forum effectively. In their method, they first calculate the expertise of users according to the content information. After that, they re-rank the users by utilizing the structural relations among users in the forum system. Finally, they integrate the results from the above two steps in a probabilistic model and obtain a final ranking score for each user. Experiments on real forum data reveal that the hybrid approach is effective at identifying and ranking answer experts. Kao et al. [12] also propose a hybrid approach to find answer experts in a specific category in question answering websites. Their method takes user subject relevance, user reputation and authority into consideration. The subject relevance indicates the relevance between the user's domain knowledge and the target question. User reputation is calculated as the ratio of best answers the user provides, and user authority is derived by applying link analysis to the asker-answerer network. Their experimental results demonstrate that the hybrid approach performs better than other conventional methods.

3. ANSWER EXPERTS FINDING

In this section, we first describe how user profiles are constructed from the historical questions asked and answered by a user in the Q&A community. Afterwards, we give a formal definition of the user question-answer interaction graph, and then construct the graph based on the question sessions and user profiles. In the graph construction procedure, two kinds of semantic information are considered. One is the direct relation link, which involves semantic information extracted from question sessions. The other is the latent link discovered from user profiles. After constructing the graph, a link analysis method called propagation [7] is employed to rank the experts in descending order. Users who earn a higher value are regarded as having a higher expertise level, and the top ones are chosen as answer experts. Furthermore, we also give details of the semantic language model based on the extracted semantic information. We first list the notations used in this paper in Table 1.

3.1 User Profiles Construction

User profiles, which reflect a user's expertise background, are acquired from the question sessions the user is involved in. How to construct the user profile is a crucial step in expert finding. In a UIQA service, a user can post a question with a question subject and additional question details (optional) to start a question session. Other users browse the questions from categories they are interested in and answer them. The question session is closed when the question is resolved or the lifetime of the question session expires.
From the description, we can see that a whole question session contains information on (1) who posted the question; (2) the question subject, question details and the category it belongs to; (3) the answerers and the answers they provided; (4) the best answer chosen for this question by the asker. We build the profiles of user $u_i$ for each category $c_k$ with question texts only (including the question subject and question detail information) from the user's previously asked and answered questions, denoted as $UP_i^k$. Each user profile $UP_i^k$ contains two aspects: (1) all the questions posted by user $u_i$ in the category $c_k$, denoted as $uq_i^k$; (2) all the questions answered by user $u_i$ in the category $c_k$, denoted as $ua_i^k$. The two parts of the user profile, $uq_i^k$ and $ua_i^k$, which have been pre-processed with stop word removal and word stemming, are represented using the Vector Space Model (VSM) as $(w_1, w_2, \ldots, w_i, \ldots, w_m)$, where $w_i$ is the weight for term $t_i \in T$ in the user's asked questions profile or answered questions profile, and $T$ is the vocabulary of all the terms occurring in all the question and answer texts in the system. The weight $w_i$ is measured by the TFIDF weighting scheme.
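The profile construction above can be illustrated with a minimal sketch. The paper only names the TFIDF scheme, so the concrete variant used here (raw term frequency times log(N/df), with each profile counted as one document) is an assumption, as are the function names and the toy tokens. The cosine helper shows the kind of comparison that Section 3.4 applies to these vectors.

```python
import math
from collections import Counter


def tfidf_profiles(tokens_by_profile):
    """Turn pre-processed token lists into sparse TF-IDF vectors.

    tokens_by_profile maps a profile id (e.g. "ua_1") to a list of tokens
    that are assumed to be stop-word filtered and stemmed already.
    Assumed variant: weight = raw term frequency * log(N / df).
    """
    n_profiles = len(tokens_by_profile)
    doc_freq = Counter()
    for tokens in tokens_by_profile.values():
        doc_freq.update(set(tokens))  # count each term once per profile

    vectors = {}
    for pid, tokens in tokens_by_profile.items():
        term_freq = Counter(tokens)
        vectors[pid] = {
            term: count * math.log(n_profiles / doc_freq[term])
            for term, count in term_freq.items()
        }
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy profiles: one answered-questions profile, two asked-questions profiles.
profiles = tfidf_profiles({
    "ua_1": ["python", "list", "sort"],
    "uq_2": ["python", "sort", "dict"],
    "uq_3": ["novel", "author"],
})
```

With this variant, a term that occurs in every profile receives weight zero, so only terms that discriminate between profiles contribute to the similarity.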
Table 1. Notations and their descriptions used in our approach.
$UP_i^k$: User profiles of user $u_i$ in category $c_k$.
$uq_i^k$: All the questions posted by user $u_i$ in the category $c_k$.
$ua_i^k$: All the questions answered by user $u_i$ in the category $c_k$.
$G$: User question-answer interaction digraph.
$U$: All users in the UIQA system.
$R$: All relationships between users in the UIQA system.
$r_{ij}$: Relationship from user $u_i$ to $u_j$.
$sr_{ij}$: Semantic relation link between asker $u_j$ and answerer $u_i$.
$qid$: Unique identifier for each question $q$.
$qd_{qid}$: The difficulty level of question $qid$.
$ar_i$: Relevance score of the $i$th answer.
$aq_i$: Quality score of the $i$th answer.
$A_{qid}$: Answer set to the question $qid$.
$T_{qid}$: Date time of question $qid$ being posted.
$T_{a_i}$: Date time of the $i$th answer being replied.
$T_{avg}$: Average time of the question being answered (measured in seconds).
$M_{qid}$: Unigram language model for question $qid$.
$M_{a_i}$: Unigram language model for the $i$th answer $a_i$.
$R_{BA}$: Asker rating for the best answer.
$V_{a_i}$: Voting score evaluation for the $i$th answer.
$N_{pos}$: Number of positive votes received.
$N_{neg}$: Number of negative votes received.
$N_{all}$: Number of total votes received.
$\lambda$: Parameter to determine the weight of best answer quality (set as 0.6).
$\theta$: Threshold for distinguishing latent links.
$Q_c^{u_i}$: All the questions user $u_i$ answered in category $c$.
$w_i$: Weight of term $t_i$.
$\delta$: Threshold for distinguishing answer experts.
$S(u_i)^p$: Expert score of user $u_i$ in the $p$th iteration.
$\varepsilon$: Threshold for the terminal condition.
$sw_{u_i}^q$: Semantic weight of the answer provided by user $u_i$ to the question $q$.

3.2 User Question-Answer Interaction Graph

In UIQA services, user behaviours are reflected in different aspects. Generally speaking, a user plays three roles in the UIQA system: asker, answerer and evaluator. Users with different roles in the same question session interact with each other. These interaction relations are utilized to construct the user question-answer interaction graph. The graph is defined as a directed graph $G = (U, R)$, where $u_i \in U$ represents a user and $r_{ij} \in R$ represents the relationship from user $u_i$ to $u_j$.
In traditional link analysis methods, the interaction relationship is only a simple link that does not take any semantic information into account. Taking the different context and interaction information into consideration, we extract several kinds of semantic information for the link analysis algorithm. In the following two sub-sections, we elaborate the construction of the user question-answer interaction graph from two aspects: direct relation links and latent relation links.
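One possible in-memory representation of the graph G = (U, R) is sketched below. It is only an illustration: the class and field names are hypothetical, and the edge direction (from answerer to asker) follows the convention stated for direct links in Section 3.3. Parallel links between the same pair of users are kept, since a pair can be connected by both a direct and a latent link.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    kind: str      # "direct" or "latent" (hypothetical labels)
    weight: float  # semantic weight; 1.0 mimics a traditional unweighted link


class InteractionGraph:
    """Directed multigraph: answerer -> asker -> list of links."""

    def __init__(self):
        self._out = defaultdict(lambda: defaultdict(list))

    def add_link(self, answerer, asker, kind, weight=1.0):
        self._out[answerer][asker].append(Link(kind, weight))

    def links(self, answerer, asker):
        # .get() avoids creating empty entries for unknown users
        return list(self._out.get(answerer, {}).get(asker, []))

    def users(self):
        found = set(self._out)
        for targets in self._out.values():
            found.update(targets)
        return found


graph = InteractionGraph()
graph.add_link("alice", "bob", "direct", 0.8)  # alice answered bob's question
graph.add_link("alice", "bob", "latent", 0.3)  # profile similarity suggests she could answer more
```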
3.3 Direct Relation Links

The direct user relationship links can be obtained directly from the UIQA system based on the information of each question session. For example, if user $u_i$ is the direct answerer of a question asked by $u_j$, there will be an edge from $u_i$ to $u_j$. However, this kind of link is a pure question-answer relationship without semantic information. Therefore, we propose a new semantic link to replace the pure question-answer link. In a question session $qid$, the semantic relation link $sr_{ij}$ between asker $u_j$ and answerer $u_i$ is defined as a four-tuple structure $sr_{ij} = (qid, qd_{qid}, ar_i, aq_i)$, where $qid$ is the unique identifier for each question, $qd_{qid}$ indicates the difficulty level of question $qid$, and $ar_i$ and $aq_i$ represent the relevance and quality of the $i$th answer replied by user $u_i$, respectively. These kinds of semantic information cannot be obtained directly; instead, they are extracted from the context and other interaction information of the question session.

(1) Question Difficulty Levels

The difficulty of each question varies greatly. An easy question can be answered by many users, while a hard one may get few answers. Specifically, we consider the answer time interval when measuring question difficulty. The average time taken by all the answers to the question is calculated. The longer the time cost, the more difficult the question is. The difficulty level $qd_{qid}$ of question $qid$ can be calculated according to the following formulas:

$$qd_{qid} = \frac{1}{1 + |A_{qid}|} + \left(1 - e^{-\tau T_{avg}}\right), \qquad (1)$$

$$T_{avg} = \frac{\sum_{a_i \in A_{qid}} \left(T_{a_i} - T_{qid}\right)}{|A_{qid}|}, \qquad (2)$$

where $A_{qid}$ is the answer set corresponding to question $qid$; $T_{qid}$ is the date time of question $qid$ being posted and $T_{a_i}$ is the date time of the $i$th answer $a_i$ being submitted; $T_{avg}$ is the average time, in seconds, of the question being answered; $\tau$ is a tuneable parameter which is set as 1/3600 to avoid dropping too fast.

(2) Answer Relevance

Answer relevance reflects the degree of correlation between a question and its answers.
Among the various answers, some might have low relevance or even be spam. Therefore, answer relevance is an important indicator of answer quality and of the answer provider's expertise level. Here we use the KL-divergence language model [13] to calculate the relevance score between question $qid$ and its $i$th answer $a_i$:

$$ar_i = Relevance(a_i, qid) = e^{-KL(M_{a_i} \| M_{qid})}, \qquad (3)$$

$$KL(M_{a_i} \| M_{qid}) = \sum_{w} p(w \mid M_{a_i}) \log \frac{p(w \mid M_{a_i})}{p(w \mid M_{qid})}, \qquad (4)$$

where $M_{qid}$ and $M_{a_i}$ are the unigram language models for question $qid$ and answer $a_i$, respectively; $KL(M_{a_i} \| M_{qid})$ represents the KL-divergence between $M_{qid}$ and $M_{a_i}$; $w$ ranges over the words occurring in both the question and the answers. The higher the KL-divergence score, the lower the relevance score, and vice versa.

(3) Answer Quality

Answer quality is another indicator of an answerer's expertise level. Generally, the best answer chosen by the asker should have a higher quality than the others. On the other hand, other users usually act as evaluators by voting on answers. Hence, the voting information for answers also reflects answer quality. We adopt Eq. (5) to calculate the answer quality $aq_i$ for the $i$th answer $a_i$ of question $qid$.

$$aq_i = \begin{cases} \lambda \cdot \dfrac{R_{BA} + 2}{5} + V_{a_i}, & \text{if } a_i \text{ is the best answer}, \\ \dfrac{1 - \lambda}{|A_{qid}| - 1} + V_{a_i}, & \text{otherwise}, \end{cases} \qquad (5)$$

$$V_{a_i} = \frac{1}{|A_{qid}|} + \frac{N_{pos} - N_{neg}}{N_{all}}, \qquad (6)$$

where $R_{BA}$ is the asker rating for the best answer; $\lambda$ is a parameter to determine the weight of the best answer, which is set as 0.6 in this paper to make sure that the best answer gets the maximum value among the answers to the question; $V_{a_i}$ is the voting score for $a_i$, which is derived by Eq. (6), in which $N_{pos}$, $N_{neg}$ and $N_{all}$ represent the number of positive, negative and total votes that answer $a_i$ received. If there is no voting for the answer, the second term of the sum in Eq. (6) is 0.

After extracting all the useful semantic information from all question sessions in one category, we can build the user interaction semantic links in the user question-answer interaction graph.

3.4 Discovering Latent Relation Links

Up to this point, the relationship link discussed between users is only a direct question-answer relationship. However, since all the users answer questions randomly in the Q&A community, considering only the direct question-answer relationship may not reflect the real relationship between users. For example, if a user C has answered questions similar to user B's, or C could answer B's question, but this relationship is not explicitly recorded, there will be no link between users B and C.
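Before turning to how latent links are found, the semantic quantities attached to a direct link in Section 3.3 can be made concrete with a short sketch. It assumes the following readings of the formulas: difficulty is 1/(1 + number of answers) plus 1 - exp(-tau * average answer delay), relevance is exp(-KL) between smoothed unigram models, and the voting score is 1/(number of answers) plus the net vote ratio. The additive smoothing constant, the use of the union vocabulary, and the choice of N_all = N_pos + N_neg are assumptions made so that the code runs; all function names are hypothetical.

```python
import math
from collections import Counter


def question_difficulty(posted_at, answer_times, tau=1.0 / 3600):
    """Difficulty from answer count and average answer delay (seconds)."""
    if not answer_times:
        raise ValueError("a direct link requires at least one answer")
    t_avg = sum(t - posted_at for t in answer_times) / len(answer_times)
    return 1.0 / (1 + len(answer_times)) + (1.0 - math.exp(-tau * t_avg))


def unigram_model(tokens, vocab, mu=0.5):
    """Additively smoothed unigram model over vocab (smoothing is assumed)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def answer_relevance(answer_tokens, question_tokens):
    """exp(-KL(M_answer || M_question)); 1.0 means identical models."""
    vocab = set(answer_tokens) | set(question_tokens)
    m_a = unigram_model(answer_tokens, vocab)
    m_q = unigram_model(question_tokens, vocab)
    kl = sum(p * math.log(p / m_q[w]) for w, p in m_a.items())
    return math.exp(-kl)


def voting_score(n_answers, n_pos, n_neg):
    """1/|A| plus the net vote ratio; the vote term is 0 without votes."""
    n_all = n_pos + n_neg  # assumption: total votes = positive + negative
    vote_part = (n_pos - n_neg) / n_all if n_all else 0.0
    return 1.0 / n_answers + vote_part


question = ["sort", "list", "python"]
on_topic = answer_relevance(["sort", "list"], question)
off_topic = answer_relevance(["buy", "cheap", "pills"], question)
```

In this toy case the on-topic answer scores higher than the off-topic one, and a question answered ten times within a minute is rated far easier than one that received a single answer after two hours.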
Since latent links cannot be obtained directly from the question sessions as direct links can, we propose to discover those latent links from user profiles, such as the association relation between users [14]. Latent link analysis [15] is used to find the latent relationship links among users who have no direct links. The basic idea for finding the latent link between two users, $u_i$ and $u_j$ (i.e., whether user $u_i$ is a latent answerer of user $u_j$), is to measure the similarity between user $u_i$'s answered question profile $ua_i^k$ and $u_j$'s asked question profile $uq_j^k$ in category $c_k$. The similarity is denoted as $Latent\_Relation(u_i, u_j \mid c_k)$ and can be calculated using the cosine measure as follows,

$$Latent\_Relation(u_i, u_j \mid c_k) = Sim(ua_i^k, uq_j^k) = \frac{ua_i^k \cdot uq_j^k}{\|ua_i^k\| \, \|uq_j^k\|}, \qquad (7)$$

where $ua_i^k$ and $uq_j^k$ are vectors of user profiles as introduced in Section 3.1. If the latent relation link score between $u_i$ and $u_j$ is higher than the threshold $\theta$, we consider that there is a latent question-answer relationship link from $u_i$ to $u_j$, and we add an edge from $u_i$ to $u_j$ in the user question-answer interaction graph.

Since there are a great number of users in each category, it is time consuming to discover the latent relationships between all pairs of users. In this paper, we only identify the latent answer links from candidate experts to all users. Those candidate experts are estimated by a co-occurrence language model. Each candidate's expertise score is estimated by the probability of a user $u_i$ being an expert for a given category $c$, i.e., $P(u_i \mid c)$. It can be calculated as the sum of the probabilities of user $u_i$ being an expert on each of the questions $Q_c^{u_i}$ he answered in category $c$, as Eq. (8) describes.

$$P(u_i \mid c) = \sum_{q \in Q_c^{u_i}} P(u_i \mid q) \qquad (8)$$

Based on the co-occurrence model proposed in [9], $P(u_i \mid q)$ can be calculated as Eq. (9).

$$P(u_i \mid q) = \sum_{t \in T} P(u_i, t \mid q) = \sum_{t \in T} P(t \mid q) \, P(u_i \mid t, q), \qquad (9)$$

where $T$ represents the vocabulary of all the terms. $P(t \mid q)$, which indicates the relevance between $t$ and $q$, can be calculated by Eq. (10). After that, Eq. (9) can be simplified as Eq. (11).

$$P(t \mid q) = \begin{cases} 0, & \text{if } t \notin q, \\ 1, & \text{if } t \in q, \end{cases} \qquad (10)$$

$$P(u_i \mid q) = \sum_{t \in q} P(u_i \mid t, q) = \sum_{t \in q} P(u_i \mid t). \qquad (11)$$

From the above, we estimate the user expert score as Eq. (12).

$$P(u_i \mid c) = \sum_{q \in Q_c^{u_i}} P(u_i \mid q) = \sum_{q \in Q_c^{u_i}} \sum_{t \in q} P(u_i \mid t) \qquad (12)$$

Based on the Bayes rule, we can calculate the conditional probability of user $u_i$ given term $t$ for each question $q$ in category $c$ as Eq. (13) shows, where $P(u_i, t)$ represents the co-occurrence probability of the user $u_i$ and term $t$, and $P(t)$ indicates the probability of term $t$ occurring in all user profiles of category $c$.
$$P(u_i \mid t) = \frac{P(u_i, t)}{P(t)} \qquad (13)$$

Afterwards, on the basis of the initial expert scores obtained, we rank all the users in descending order and choose the top $\delta$ as answer expert candidates, following the conclusion of Bouguessa et al. [16]. The parameter setting is discussed in Section 4.2. After identifying the latent links, we obtain a new user question-answer interaction graph with both direct semantic links and latent links. We present an example of the graph in Fig. 1.

Fig. 1. Example of user question-answer interaction graph with both direct semantic links and latent links.

3.5 Semantic Propagation for Finding Answer Experts

After generating the whole user question-answer interaction graph with direct semantic links and latent links, we employ a propagation-based algorithm [7] to rank answer experts for each category. The basic idea underlying the propagation method is that a user will have a higher expertise level if he answers many experts' questions. In the generated user question-answer interaction graph for each category $c_k$, we use different weights to represent the importance of different kinds of relationships. The weight of each link indicates how well the expert score of a user propagates to its neighbours and back. At the beginning, the expert score of each user in the graph is set to 1. The propagation process runs in iterations. In each iteration, the expert score of each user is calculated from the expert scores of the user and his neighbours in the previous iteration. After that, each expert score is normalized by dividing it by the maximal expert score of the current iteration. The expert score $S(u_i)^{p+1}$ of user $u_i$ in iteration $p + 1$ is computed from $S(u_i)^p$ in iteration $p$ as follows,

$$S(u_i)^{p+1} = S(u_i)^p + \sum_{u_j \in U} \sum_{e \in r_{ij}} w((u_i, u_j), e) \, S(u_j)^p, \qquad (14)$$

where $w((u_i, u_j), e)$ represents the propagation coefficient and $e \in r_{ij}$ is one kind of relationship from user $u_i$ to $u_j$; $U$ stands for all neighbouring nodes answered by $u_i$ in the graph; $r_{ij}$ stands for the two relationships from user $u_i$ to $u_j$, i.e., the direct semantic question-answer relation link and the latent question-answer relation link.
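The iteration described above can be sketched as follows, under stated assumptions: edge weights are supplied as plain non-negative numbers (how they are derived from the semantic information is not modelled here), each edge points from an answerer to the asker whose question was answered, and the loop stops once the largest score change falls below the threshold, with a hypothetical `max_iter` guard added for safety. The function name and toy data are illustrative only.

```python
def propagate(users, edges, eps=1e-3, max_iter=1000):
    """Iteratively propagate expert scores along weighted answer links.

    users: iterable of user ids (must cover every id used in edges).
    edges: list of (answerer, asker, weight); parallel edges are allowed,
           e.g. one direct and one latent link for the same pair.
    """
    users = list(users)
    scores = {u: 1.0 for u in users}  # every user starts with score 1
    for _ in range(max_iter):
        updated = dict(scores)
        for answerer, asker, weight in edges:
            # the answerer gains a share of the asker's previous score
            updated[answerer] += weight * scores[asker]
        top = max(updated.values())
        updated = {u: s / top for u, s in updated.items()}  # normalize by the maximum
        change = max(abs(updated[u] - scores[u]) for u in users)
        scores = updated
        if change < eps:
            break
    return scores


# Toy graph: a answers b and c, b answers c.
toy_scores = propagate(
    ["a", "b", "c"],
    [("a", "b", 0.9), ("a", "c", 0.4), ("b", "c", 0.5)],
)
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

In the toy graph, user a, who answers both other users, ends with the top normalized score, followed by b and then c.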
Therefore, the propagation coefficient is the weight of the edge in the user question-answer interaction graph, which is calculated as Eq. (15) shows.

$$w((u_i, u_j), e) = \begin{cases} qd_{qid} \cdot (ar_i^2 + aq_i^2)/2, & \text{if } e \text{ is a direct semantic link}, \\ Latent\_Relation(u_i, u_j \mid c_k), & \text{if } e \text{ is a latent link}. \end{cases} \qquad (15)$$

The propagation stops when the maximal change of the expert score is below a threshold $\varepsilon$ (which we set here as 0.001). Based on the propagation theory introduced in [17], each expert score will converge to a constant value. After the propagation, new expert scores for users in each category are obtained. Sorting the scores in descending order, we obtain the new expert ranking in each category.

3.6 Semantic Language Model

The language model (LM) based approach of finding answer experts measures a user's expertise level mainly based on the term occurrence between user profiles and their answered questions. To the best of our knowledge, semantic information extracted from question sessions has never been used in LM-based approaches before. Therefore, we propose the semantic language model (SLM), which incorporates the proposed semantic information into the traditional language model. Specifically, SLM weights the probability $P(u_i \mid q)$ in Eq. (8) by the semantic weight mentioned in Eq. (15), as given in Eqs. (16) and (17).

$$P(u_i \mid c) = \sum_{q \in Q_c^{u_i}} sw_{u_i}^q \cdot P(u_i \mid q), \qquad (16)$$

$$sw_{u_i}^q = qd_{qid} \cdot (ar_i^2 + aq_i^2)/2, \qquad (17)$$

where $sw_{u_i}^q$ is the semantic weight of the answer provided by user $u_i$ to question $q$.

4. EXPERIMENTS AND EVALUATION

To evaluate the performance of the proposed methods, five different evaluation criteria are introduced in Section 4.1. After that, we choose the best parameter setting in Section 4.2. Finally, we compare our proposed methods with the baseline methods and discuss the results in Section 4.3.

4.1 Evaluation Criteria

In our experiment, we evaluate the effectiveness of the proposed answer expert finding methods on the Yahoo! Answers dataset provided by Liu et al. [18]. The experiment, conducted on the whole dataset, aims to find experts for each category.
For evaluation, we obtain an expert rank list for each category as the ground truth based on the scoring rules of the Yahoo! Answers portal. In this expert rank list, the higher the score a user obtains, the more expertise he has. We choose the top $\delta$ users as the experts in this category. The evaluation is conducted based on the following five evaluation metrics used for the expert finding task in the TREC Enterprise Track: Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), Precision@N (P@N), R-Precision and bpref [19].
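For reference, four of these metrics can be computed per category as in the sketch below, which uses their standard definitions; MAP and MRR are then the means of average precision and reciprocal rank over all categories. bpref is omitted for brevity. The function names and the toy ranking are hypothetical, and `relevant` stands for the set of ground-truth experts of one category.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n ranked users that are ground-truth experts."""
    return sum(1 for user in ranked[:n] if user in relevant) / n


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where an expert appears."""
    hits, total = 0, 0.0
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first ground-truth expert, or 0 if none is ranked."""
    for rank, user in enumerate(ranked, start=1):
        if user in relevant:
            return 1.0 / rank
    return 0.0


def r_precision(ranked, relevant):
    """Precision at R, where R is the number of ground-truth experts."""
    r = len(relevant)
    return precision_at_n(ranked, relevant, r) if r else 0.0


# Toy category: the system ranks four users, two of whom are true experts.
ranked_users = ["u3", "u1", "u7", "u2"]
true_experts = {"u1", "u2"}
```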
4.2 Parameters Setting

In this section, we discuss how to set the parameters in the proposed method. The cutoff value for identifying answer expert candidates is an important parameter in the first step of discovering latent links. Taking the category Books & Authors as an example, the statistics of the number of best answers users received are shown in Fig. 2. As observed in the figure, most users receive only 1 to 5 best answers. According to the conclusion in [16], authoritative answerers account for only about 0.6%-0.7% of the total users in each category. Therefore, both in the determination of expert candidates and in the identification of experts in the final ranking, we set the parameter $\delta$ to 1%.

Another important parameter to be discussed is the threshold $\theta$ for distinguishing latent links. In the process of discovering latent relationship links, links with lower weights have only a weak impact on the link analysis. Hence, choosing an appropriate threshold is critical in the latent link discovery procedure. For this evaluation, we test values of $\theta$ from 0 to 0.5. As shown in Fig. 3, the task of expert finding obtains the best performance when $\theta$ is set to 0.01. In addition, as the value of $\theta$ increases, fewer latent links are discovered and the performance goes down, which indicates that latent relationship links are effective in the expert finding task. The chosen value of $\theta$ also demonstrates that not all the latent links are helpful for the expert finding task; some of them may introduce noise. Therefore, an appropriate threshold for distinguishing latent links is important for finding answer experts.

Fig. 2. Histogram of user best answer counts in the category Books & Authors.

Fig. 3. Performance evaluation of different values of parameter θ.

4.3 Evaluation of Semantic Propagation and Semantic Language Model

We compare our proposed approach, which considers different kinds of interaction information between users, with the traditional link analysis method (baseline) in the task of expert finding.
In the traditional link analysis method (TL), we consider only the direct ask-answer relation between users, in which the weight of each link is equal. Unlike the traditional direct link, the direct semantic link method (DSL) incorporates the semantic information extracted from user interaction into the weight of each link. Latent link analysis (LL), as a special semantic link, reflects potential links between users extracted from their answer contents.

The overall experimental results are shown in Fig. 4, from which we can see that considering the semantic information extracted from the user interaction is effective in improving the precision of answer expert finding. Compared with the traditional link analysis method, the direct semantic link method improves the precision of the answer expert finding task. After incorporating the latent links into the direct semantic method, the precision of answer expert identification is improved further. From the results and analysis, we find that the expertise level of a user depends on the difficulty of the questions he/she has answered and on the relevance and quality of the answers he/she has provided. Therefore, the most desirable expert is the person who answers many difficult questions and provides many high-quality answers. In addition, taking the latent link relation into account can further improve the accuracy.

To evaluate the effectiveness of the semantic language model (SLM), we choose the LM-based approach used to generate answer expert candidates in Section 3.4 as the baseline method and compare it with SLM. The experimental result is shown in Fig. 5. From the figure, we can see that the LM-based approach for expert finding is also improved after incorporating the semantic information. Since the semantic information is obtained from question sessions and user profiles in UIQA, it can be regarded as important background knowledge. Therefore, it is reasonable to see traditional methods for the answer expert finding task improve when such background knowledge is included.

Fig. 4. Performance evaluation of comparing different ranking methods.

Fig. 5. Comparison of semantic language model and the traditional language model approaches for answer expert finding task.

5. CONCLUSION AND FUTURE WORK

In this paper, we introduce two approaches, semantic propagation and the semantic language model, for the answer expert finding task, which combine different kinds of semantic information extracted from user interaction in the UIQA system. Experimental results on the Yahoo! Answers collection demonstrate the effectiveness of the two expert-finding approaches and also provide evidence of the usefulness of the extracted semantic information. Since the semantic information mentioned in this paper is calculated in a straightforward way from the question sessions, different aspects of the semantic information can be further studied in the future.
SEMANTIC LINK ANALYSIS FOR FINDING ANSWER EXPERTS 63 REFERENCES 1. W. Y. Lu, T. Y. Hao, W. Chen, and M. Feng, A web-based platform for user-nteractve queston-answerng, World Wde Web: Internet and Web Informaton Systems, Vol. 12, 2009, pp. 107-124. 2. X. Y. Lu, W. B. Croft, and M. Koll, Fndng experts n communty-based queston answerng servces, n Proceedngs of ACM 14th Conference on Informaton and Knowledge Management, 2005, pp. 315-316. 3. P. Jurczyk and E. Agchten, Hts on queston answer portals: Exploraton of lnk analyss for author rankng, n Proceedngs of the 30th Annual Internatonal ACM SIGIR Conference, 2007, pp. 845-846. 4. P. Jurczyk and E. Agchten, Dscoverng authortes n queston answer communtes by usng lnk analyss, n Proceedngs of ACM 17th Conference on Informaton and Knowledge Management, 2007, pp. 919-922. 5. L. Page, S. Brn, R. Motwan, and T. Wnograd, The PageRank ctaton rankng: brngng order to the web, Stanford Dgtal Lbrary, workng paper SIDL-WP-1999-0120, 1999. 6. J. M. Klenberg, Authortatve sources n a hyperlnked Envronment, n Proceedngs of the 9th Annual ACM-SIAM Symposum on Dscrete Algorthms, 1998, pp. 668-677. 7. J. Zhang, J. Tang, and J. Z. L, Expert fndng n a socal network, n Proceedngs of the 12th Internatonal Conference on Database Systems for Advanced Applcaton, 2007, pp. 1066-1069. 8. N. Craswell, A. P. de Vres, and I. Soboroff, Overvew of the TREC-2005 enterprse track, n Proceedngs of the 14th Text REtreval Conference, NIST Specal Publcaton: SP 500-266, 2005. 9. Y. B. Cao, J. J. Lu, S. H. Bao, and H. L, Research on expert search at enterprse track of TREC 2005, n Proceedngs of the 14th Text REtreval Conference, NIST Specal Publcaton: SP 500-266, 2005. 10. J. Zhang, J. Tang, L. Lu, and J. Z. L, A mxture model for expert fndng, n Proceedngs of the 12th Pacfc-Asa Conference on Knowledge Dscovery and Data Mnng, 2008, pp. 466-478. 11. Y. H. Zhou, G. Cong, B. Cu, C. S. Jensen, and J. J. 
Yao, Routing questions to the right users in online communities, in Proceedings of the 25th International Conference on Data Engineering, 2009, pp. 700-711.
12. W. C. Kao, D. R. Liu, and S. W. Wang, Expert finding in question-answering websites: A novel hybrid approach, in Proceedings of the 25th ACM Symposium on Applied Computing Conference, 2010, pp. 867-871.
13. B. X. Wang, X. L. Wang, C. J. Sun, B. Q. Liu, and L. Sun, Modelling semantic relevance for question-answer pairs in web social communities, in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010, pp. 1230-1238.
14. X. F. Luo, Z. Xu, J. Yu, and X. Chen, Building association link network for semantic link on web resources, IEEE Transactions on Automation Science and Engineering, Vol. 8, 2011, pp. 482-494.
15. Y. Lu, X. J. Quan, X. L. Ni, W. Y. Liu, and Y. L. Xu, Latent link analysis for expert
finding in user-interactive question answering services, in Proceedings of the 5th International Conference on Semantics, Knowledge and Grid, 2009, pp. 54-59.
16. M. Bouguessa, B. Dumoulin, and S. R. Wang, Identifying authoritative actors in question-answering forums: The case of Yahoo! Answers, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008, pp. 866-874.
17. P. F. Felzenszwalb and D. P. Huttenlocher, Efficient belief propagation for early vision, International Journal of Computer Vision, Vol. 70, 2006, pp. 41-54.
18. Y. Liu, J. Bian, and E. Agichtein, Predicting information seeker satisfaction in community question answering, in Proceedings of the 31st Annual International ACM SIGIR Conference, 2008, pp. 483-490.
19. C. Buckley and E. M. Voorhees, Retrieval evaluation with incomplete information, in Proceedings of the 27th Annual International ACM SIGIR Conference, 2004, pp. 25-32.

Yao Lu is currently a Ph.D. student in the School of Computer Science and Technology, University of Science and Technology of China. He also joined the collaborative Ph.D. education scheme of the City University of Hong Kong in 2008. He received his B.S. degree from the Anhui Agriculture University in 2007. His research interests include information retrieval, machine learning, data mining and question answering.

Xiaojun Quan is currently a Ph.D. student in the Department of Computer Science, City University of Hong Kong. He received the B.E. degree in Computer Science from Chang'an University in 2005 and the M.E. degree in Computer Science from the University of Science and Technology of China in 2008. His research interests include data mining, information retrieval, question answering and anti-phishing.

Jingsheng Lei is currently a professor with the School of Computer and Information Engineering, Shanghai University of Electronic Power. His research interests include web information retrieval, machine learning, data mining, and cloud computing. He received his B.S.
in Mathematics from Shanxi Normal University in 1987, and M.S. and Ph.D. in Computer Science from Xinjiang University in 2000 and 2003, respectively. Currently, he is leading a group of research students doing research on cloud computing.
Xingliang Ni is a Ph.D. student in the School of Computer Science and Technology at the University of Science and Technology of China. He has also joined the collaborative Ph.D. education scheme of the City University of Hong Kong. He received his B.S. degree from the Hefei University of Technology in 2006. His research interests include information retrieval, machine learning and natural language processing.

Wenyin Liu is an assistant professor in the computer science department at the City University of Hong Kong. Before that, he was a full-time researcher at Microsoft Research China/Asia. His research interests include question answering, anti-phishing, graphics recognition, and performance evaluation. He has a B.Eng. and M.Eng. in computer science from Tsinghua University, Beijing, and a DSc from the Technion, Israel Institute of Technology, Haifa. In 2003, he was awarded the International Conference on Document Analysis and Recognition Outstanding Young Researcher Award by the International Association for Pattern Recognition (IAPR). He was TC10 chair of IAPR for 2006-2010 and has been a guest professor of the University of Science and Technology of China (USTC) since 2005. He is a Fellow of IAPR and a senior member of IEEE.

Yinlong Xu received his B.S. in Mathematics from Peking University in 1983, and M.S. and Ph.D. in Computer Science from the University of Science and Technology of China (USTC) in 1989 and 2004, respectively. He is currently a professor with the School of Computer Science and Technology at USTC. Prior to that, he served the Department of Computer Science and Technology at USTC as an assistant professor, a lecturer, and an associate professor. Currently, he is leading a group of research students doing networking and high-performance computing research. His research interests include network coding, wireless networks, combinatorial optimization, design and analysis of parallel algorithms, parallel programming tools, etc. He received the Excellent Ph.D. Advisor Award of the Chinese Academy of Sciences in 2006.