
10 Content-based Recommendation Systems

Michael J. Pazzani¹ and Daniel Billsus²

¹ Rutgers University, ASBIII, 3 Rutgers Plaza, New Brunswick, NJ
² FX Palo Alto Laboratory, Inc., 3400 Hillview Ave, Bldg. 4, Palo Alto, CA

Abstract. This chapter discusses content-based recommendation systems, i.e., systems that recommend an item to a user based upon a description of the item and a profile of the user's interests. Content-based recommendation systems may be used in a variety of domains, including recommending web pages, news articles, restaurants, television programs, and items for sale. The details of various systems differ. However, all content-based recommendation systems share three components:

- a means for describing the items that may be recommended;
- a means for creating a profile of the user that describes the types of items the user likes;
- a means of comparing items to the user profile to determine what to recommend.

The profile is often created and updated automatically in response to feedback on the desirability of items that have been presented to the user.

10.1 Introduction

A common scenario for modern recommendation systems is a Web application with which a user interacts. Typically, a system presents a summary list of items to a user. The user then selects among the items to receive more details on an item or to interact with the item in some way. For example, online news sites present web pages with headlines (and occasionally story summaries) and allow the user to select a headline to read a story. E-commerce sites often present a page with a list of individual products. They then allow the user to see more details about a selected product and to purchase the product. The web server transmits HTML and the user sees a web page. Behind this, the web server typically has a database of items and dynamically constructs web pages with a list of items.
There are often many more items available in a database than would easily fit on a web page. It is therefore necessary to select a subset of items to display to the user, or to determine an order in which to display the items.

Content-based recommendation systems analyze item descriptions to identify items that are of particular interest to the user. The details of recommendation systems differ based on the representation of items, so this chapter first discusses alternative item representations. Next, it discusses the recommendation algorithms suited to each representation. The chapter concludes with a discussion of:

- variants of the approaches;
- the strengths and weaknesses of content-based recommendation systems;
- directions for future research and development.

10.1.1 Item Representation

Items that can be recommended to the user are often stored in a database table. Table 10.1 shows a simple database with records (i.e., "rows") that describe three restaurants. The column names (e.g., Cuisine or Service) are properties of restaurants. Different publications also call these properties "attributes," "characteristics," "fields," or "variables." Each record contains a value for each attribute. A unique identifier (ID in Table 10.1) allows items with the same name to be distinguished. It also serves as a key to retrieve the other attributes of the record.

Table 10.1. A restaurant database

ID   Name             Cuisine   Service   Cost
…    Mike's Pizza     Italian   Counter   Low
…    Chris's Cafe     French    Table     Medium
…    Jacques Bistro   French    Table     High

The database depicted in Table 10.1 could be used to drive a web site that lists and recommends restaurants. This is an example of structured data, which has three properties:

- there is a small number of attributes;
- each item is described by the same set of attributes;
- there is a known set of values that the attributes may have.

In this case, many machine learning algorithms may be used to learn a user profile. Alternatively, a menu interface can easily be created to allow a user to create a profile. The next section of this chapter discusses several approaches to creating a user profile from structured data. Of course, a web page typically has more information than is shown in Table 10.1, such as a text description of the restaurant, a restaurant review, or even a menu.
These may easily be stored as additional fields in the database. A web page can then be created with templates that display the text fields (as well as the structured data). However, free text data creates a number of complications when learning a user profile. For example, a profile might indicate that there is an 80% probability that a particular user would like a French restaurant. This might be added to the profile because the user gave a positive review of four out of five French restaurants. However, unrestricted text fields are typically unique. There would be no opportunity to provide feedback on five restaurants described as "A charming café with attentive staff overlooking the river." An extreme example of unstructured data may occur in news articles. Table 10.2 shows an example of part of a news article. The entire article can be treated as a large unrestricted text field.

Table 10.2. Part of a newspaper article

Lawmakers Fine-Tuning Energy Plan

SACRAMENTO, Calif. -- With California's energy reserves remaining all but depleted, lawmakers prepared to work through the weekend fine-tuning a plan Gov. Gray Davis says will put the state in the power business for "a long time to come." The proposal involves partially taking over California's two largest utilities and signing long-term contracts of up to 10 years to buy electricity from wholesalers.

Unrestricted texts such as news articles are examples of unstructured data. Unlike structured data, unstructured data has no attribute names with well-defined values. Furthermore, the text field may contain the full complexity of natural language. This includes polysemous words (the same word may have several meanings) and synonyms (different words may have the same meaning). For example, in the article in Table 10.2, "Gray" is a name rather than a color, and "power" and "electricity" refer to the same underlying concept.

Many domains are best represented by semi-structured data, in which some attributes have a set of restricted values and others are free-text fields. A common approach to dealing with free text fields is to convert the free text to a structured representation. For example, each word may be viewed as an attribute. Its value may be a Boolean indicating whether the word is in the article, or an integer giving the number of times the word appears in the article.

Many personalization systems that deal with unrestricted text create a structured representation using a technique that originated with text search systems [34]. In this formalism, the root forms of words are typically used rather than the words themselves; these are created through a process called stemming [30]. The goal of stemming is to create a term that reflects the common meaning behind words such as "compute," "computation," "computer," "computes," and "computers." The value of the variable associated with a term is a real number that represents the term's importance or relevance. This value is called the tf*idf weight (term frequency times inverse document frequency).
The tf*idf weight, $w(t,d)$, of a term $t$ in a document $d$ is a function of three quantities: the frequency of $t$ in the document ($tf_{t,d}$), the number of documents that contain the term ($df_t$), and the number of documents in the collection ($N$).¹

¹ In the description of tf*idf weights, the word "document" is traditionally used, since the original motivation was to retrieve documents. This chapter sticks with the original terminology, but in a recommendation system the documents correspond to text descriptions of items to be recommended. The equations here are representative of the class of formulae called tf*idf. In general, tf*idf systems have weights that increase monotonically with term frequency and decrease monotonically with document frequency.

$$
w(t,d) = \frac{tf_{t,d}\,\log\left(N/df_t\right)}{\sqrt{\sum_{i}\left(tf_{t_i,d}\right)^2 \log^2\left(N/df_{t_i}\right)}} \tag{10.1}
$$

Table 10.3 shows the tf*idf representation (also called the vector space representation) of the complete article excerpted in Table 10.2. The terms are ordered by tf*idf weight. The intuition behind the weight is that the highest-weighted terms occur more often in that document than in the other documents. They are therefore more central to the topic of the document. Note that terms such as util (a stem of "utility"), power, and megawatt are among the highest-weighted terms, and they capture the meaning of the article.

Table 10.3. tf*idf representation of the article in Table 10.2 (terms in descending order of weight)

util, power, megawatt, electr, energ, california, debt, lawmak, state, wholesal, partial, consum, alert, scroung, advoc (0.09), test, bail-out, crisi, amid, price, long, bond, plan, term (0.08), grid, reserv, blackout, bid, market, fine, deregul (0.07), spiral, deplet, liar

Of course, this representation does not capture the context in which a word is used, and it loses the relationships between words in the description. For example, a description of a steak house might contain the sentence "There is nothing on the menu that a vegetarian would like." Meanwhile, the description of a vegetarian restaurant might mention "vegan" rather than "vegetarian." In a manually created structured database, a cuisine attribute with the value "vegetarian" would indicate that the restaurant is indeed a vegetarian one. In contrast, consider converting an unstructured text description to structured data. The presence of the word "vegetarian" does not always indicate that a restaurant is vegetarian. Likewise, the absence of the word "vegetarian" does not always indicate that the restaurant is not vegetarian. As a consequence, techniques for creating user profiles from structured data need to differ somewhat from techniques that deal with unstructured data, or with unstructured data automatically and imprecisely converted to structured data.

One variant on using words as terms is to use sets of contiguous words as terms.
For example, in the article in Table 10.2, terms such as "energy reserves" and "power business" might describe the content better than the same words treated as individual terms. Of course, terms such as "all but" would also be included. However, one would expect these to have very low weights, in the same way that "all" and "but" individually have low weights and are not among the most important terms in Table 10.3.
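The weighting in Eq. 10.1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any system cited in this chapter, and it makes several assumptions:

- The helper names `terms` and `tfidf_vectors` are hypothetical.
- Tokenization is naive whitespace splitting, and no stemming is applied.
- The optional `use_bigrams` flag adds contiguous word pairs as extra terms, as in the "energy reserves" example above.
- The natural logarithm is used. Each vector is divided by its length, so the choice of logarithm base does not change the final weights.

```python
# Illustrative sketch of Eq. 10.1; function names are hypothetical.
import math
from collections import Counter


def terms(text, use_bigrams=False):
    """Tokenize naively: lower-case, split on whitespace, strip punctuation.

    Assumption: no stemming; a real system would first map words to stems [30].
    With use_bigrams=True, contiguous word pairs are appended as extra terms.
    """
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    words = [w for w in words if w]
    if use_bigrams:
        words += [a + " " + b for a, b in zip(words, words[1:])]
    return words


def tfidf_vectors(docs, use_bigrams=False):
    """Return one {term: weight} dict per document, following Eq. 10.1."""
    tfs = [Counter(terms(d, use_bigrams)) for d in docs]  # tf_{t,d}: raw term counts
    n = len(docs)                                         # N: documents in the collection
    df = Counter(t for tf in tfs for t in tf)             # df_t: documents containing t
    vectors = []
    for tf in tfs:
        raw = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))  # denominator of Eq. 10.1
        vectors.append({t: v / norm for t, v in raw.items()} if norm else raw)
    return vectors
```

The raw `Counter` for each document is exactly the integer-count representation described earlier. A term that appears in every document, such as "the", gets weight 0 because log(N/N) = 0.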

10.2 User Profiles

Most recommendation systems use a profile of the user's interests. This profile may consist of a number of different types of information. Here, we concentrate on two types of information:

1. A model of the user's preferences, i.e., a description of the types of items that interest the user. There are many possible representations of this description. One common representation is a function that, for any item, predicts the likelihood that the user is interested in that item. For efficiency, this function may be used to retrieve the n items most likely to be of interest to the user.
2. A history of the user's interactions with the recommendation system. This may include storing the items that a user has viewed, together with other information about the user's interaction (e.g., whether the user has purchased the item, or a rating that the user has given the item). Other types of history include saved queries typed by the user (e.g., that a user searched for an Italian restaurant in a particular zip code).

The history of user interactions has several uses. First, the system can simply display recently visited items so that the user can easily return to them. Second, the system can filter out of its recommendations an item that the user has already purchased or read.² Another important use of the history in content-based recommendation systems is to serve as training data for a machine learning algorithm that creates a user model. The next section discusses several different approaches to learning a user model. Here, we briefly describe two approaches in which the information used by recommendation systems is provided manually: user customization and rule-based recommendation systems.

In user customization, a recommendation system provides an interface that allows users to construct a representation of their own interests.
Often, check boxes allow a user to select from the known values of attributes. Examples include the cuisine of restaurants, the names of favorite sports teams, the favorite sections of a news site, or the genre of favorite movies. In other cases, a form allows a user to type words that occur in the free text descriptions of items, e.g., the name of a musician or author that interests the user. Once the user has entered this information, a simple database matching process finds items that meet the specified criteria and displays them to the user.

User customization systems have several limitations:

- They require effort from the user, and it is difficult to get many users to make this effort. This is particularly true when the user's interests change. For example, a user may not follow football during the season but then become interested in the Super Bowl.
- They do not provide a way to determine the order in which to present items, and they can find either too few or too many matching items to display.

² Of course, in some situations it is appropriate to recommend an item the user has already purchased, and in other situations it is not. For example, a system should continue to recommend an item that wears out or is used up, such as a razor blade or print cartridge. In contrast, there is little value in recommending a CD or DVD that a user already owns.

Figure 10.1 shows book recommendations at Amazon.com. Amazon.com is usually thought of as a good example of collaborative recommendation (see Chapter 9 of this book [35]). However, parts of the user's profile can be viewed as a content-based profile. For example, Amazon has a feature called "favorites" that represents the categories of items preferred by users. These favorites are either calculated by keeping track of the categories of items purchased by users, or set manually by the user. Figure 10.2 shows an example of a user customization interface in which a user can select the categories.

In rule-based recommendation systems, the recommendation system has rules to recommend other products based on the user history. For example, a system may contain a rule that recommends the sequel to a book or movie to people who have purchased the earlier item in the series. Another rule might recommend a new CD by an artist to users who purchased earlier CDs by that artist. Rule-based systems may capture several common reasons for making recommendations. However, they do not offer the same detailed personalized recommendations that are available with other recommendation systems.

Fig. 10.1. Book recommendations by Amazon.com.
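The two manual approaches described above are user customization and rule-based recommendation. The following Python sketch illustrates both, together with the history-based filtering mentioned earlier in this section. It makes these assumptions:

- The `Item` record, its `series`, `number`, and `consumable` fields, and all function names are hypothetical.
- The exact attribute lookup stands in for the "simple database matching process."
- The sequel rule is just one example of a rule.

```python
# Illustrative sketch; Item and all function names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    attrs: dict = field(default_factory=dict)  # structured attributes, e.g. {"cuisine": "French"}
    series: str = ""                           # hypothetical: series a book or movie belongs to
    number: int = 0                            # hypothetical: position within that series
    consumable: bool = False                   # e.g. razor blades or print cartridges


def customization_matches(items, checked):
    """User customization: keep items whose attributes match every checked box.

    `checked` maps an attribute name to the set of values the user ticked.
    """
    return [it for it in items
            if all(it.attrs.get(name) in values for name, values in checked.items())]


def sequel_rule(items, purchased_ids):
    """Rule-based recommendation: suggest the next item in any series the user bought from."""
    owned = [it for it in items if it.item_id in purchased_ids]
    return [it for it in items
            if any(it.series and it.series == prev.series and it.number == prev.number + 1
                   for prev in owned)]


def drop_already_owned(candidates, purchased_ids):
    """Use the interaction history to hide owned items, except ones that get used up."""
    return [it for it in candidates
            if it.consumable or it.item_id not in purchased_ids]
```

Note that `customization_matches` only filters. The ordering problem noted above (too few or too many matches, and no ranking) is left unsolved by this kind of matching.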

Fig. 10.2. User customization in Amazon.com.

10.3 Learning a User Model

Creating a model of the user's preferences from the user history is a form of classification learning. The training data of a classification learner is divided into categories, e.g., the binary categories "items the user likes" and "items the user doesn't like." The labels come from one of two sources:

- explicit feedback, in which the user rates items via some interface for collecting feedback;
- implicit feedback, gathered by observing the user's interactions with items.

For example, if a user purchases an item, that is a sign that the user likes the item. If the user purchases and then returns the item, that is a sign that the user doesn't like it. In general, there is a tradeoff. Implicit methods can collect a large amount of data, but with some uncertainty as to whether the user actually likes the item. In contrast, when the user explicitly rates items, there is little or no noise in the training data. However, users tend to provide explicit feedback on only a small percentage of the items they interact with.

Figure 10.3 shows an example of a recommendation system with explicit user feedback. The recommender MyBestBets by ChoiceStream is a web-based interface to a television recommendation system. Users can click on the thumbs up or thumbs down buttons to indicate whether they like the program that is recommended. This system requires explicit feedback by necessity. It is not integrated with a television [1], so it cannot infer the user's interests by observing the user's behavior.
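As a sketch of how a history can be turned into training data, the following Python function labels items from a hypothetical event log:

- a purchase counts as implicit positive evidence;
- a purchase followed by a return counts as negative evidence;
- thumbs up or down counts as explicit feedback.

The event names, and the rule that explicit feedback overrides implicit signals, are assumptions made for illustration. They are not part of any system described here.

```python
# Illustrative sketch; event names and the override policy are assumptions.
def label_from_events(events):
    """Turn one user's interaction log into binary training labels.

    `events` is a time-ordered list of (item_id, action) pairs, where action is one
    of the hypothetical strings "thumbs_up", "thumbs_down", "purchase", "return".
    Returns {item_id: True/False}; True means "the user likes the item".
    """
    labels = {}
    explicit = set()
    for item_id, action in events:
        if action in ("thumbs_up", "thumbs_down"):
            labels[item_id] = action == "thumbs_up"  # explicit feedback: little or no noise
            explicit.add(item_id)
        elif item_id in explicit:
            continue                                 # assumption: explicit ratings win
        elif action == "purchase":
            labels[item_id] = True                   # implicit: a purchase suggests a like
        elif action == "return":
            labels[item_id] = False                  # implicit: purchase then return suggests a dislike
        # any other action (e.g. "view") produces no label in this sketch
    return labels
```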

Fig. 10.3. A recommendation system using explicit feedback.

The next section reviews a number of classification learning algorithms. Such algorithms are the key component of content-based recommendation systems, because they learn a function that models each user's interests. Given a new item and the user model, the function predicts whether the user would be interested in the item. Many classification learning algorithms create a function that estimates the probability that a user will like an unseen item. This probability may be used to sort a list of recommendations. Alternatively, an algorithm may create a function that directly predicts a numeric value, such as the degree of interest. Some of the algorithms below are traditional machine learning algorithms designed to work on structured data. When they operate on free text, the text is first converted to structured data by selecting a small subset of the terms as attributes. In contrast, other algorithms are designed to work in high-dimensional spaces and do not require a feature selection step beforehand.

10.4 Decision Trees and Rule Induction

Decision tree learners such as ID3 [31] build a decision tree by recursively partitioning training data (in this case, text documents) into subgroups until each subgroup contains only instances of a single class. A partition is formed by a test on
some feature. In text classification, this is typically the presence or absence of an individual word or phrase. Expected information gain is a commonly used criterion for selecting the most informative features for the partition tests [38].

Decision trees have been studied extensively with structured data such as that shown in Table 10.1. Given feedback on the restaurants, a decision tree can easily represent and learn a profile of someone who prefers to eat in expensive French restaurants or inexpensive Mexican restaurants. Arguably, however, the decision tree bias is not ideal for unstructured text classification tasks [29]. Because decision tree learners use information-theoretic splitting criteria, their inductive bias is a preference for small trees with few tests. However, experiments show that text classification tasks frequently involve a large number of relevant features [17]. A decision tree's tendency to base classifications on as few tests as possible can therefore lead to poor performance on text classification. When there are only a small number of structured attributes, by contrast, the performance, simplicity and understandability of decision trees are all advantages for content-based models. Kim et al. [18] describe an application of decision trees for personalizing advertisements on web pages.

RIPPER [9] is a rule induction algorithm closely related to decision trees. It operates in a similar fashion to the recursive data partitioning approach described above. Despite the problematic inductive bias, RIPPER performs competitively with other state-of-the-art text classification algorithms. In part, this performance can be attributed to a sophisticated post-pruning algorithm that optimizes the fit of the induced rule set with respect to the training data as a whole. Furthermore, RIPPER supports multi-valued attributes, which gives a natural representation for text classification tasks: the individual words of a text document can be represented as multiple values of a single feature.
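The expected-information-gain criterion used to choose partition tests can be sketched in Python for the common text case, in which each test is the presence or absence of a single word. This is a generic ID3-style illustration under simplifying assumptions:

- documents are represented as sets of words;
- there is no pruning;
- the sketch chooses only one split rather than building a full recursive tree;
- the function names are hypothetical.

```python
# Illustrative ID3-style split selection; function names are hypothetical.
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, word):
    """Expected reduction in entropy from testing for the presence of `word`.

    `docs` is a list of sets of words; `labels` gives each document's class.
    """
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without_w = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    remainder = (len(with_w) / n) * entropy(with_w) + (len(without_w) / n) * entropy(without_w)
    return entropy(labels) - remainder


def best_split(docs, labels):
    """Choose the word with the highest information gain (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    return max(sorted(vocabulary), key=lambda w: information_gain(docs, labels, w))
```

A full learner would apply `best_split` recursively to the two resulting subgroups until each contains a single class. As the text notes, this greedy preference for a few highly informative tests is also the source of the bias that can hurt text classification.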
For unstructured text documents, this multi-valued representation is essentially a convenience. For semi-structured text documents, however, it can lead to more powerful classifiers. For example, the separate fields of an e-mail message, such as sender, subject, and body text, can be represented as separate multi-valued features. This allows the algorithm to take advantage of the document's structure in a natural fashion. Cohen [10] shows how RIPPER can classify e-mail messages into user-defined categories.

10.5 Nearest Neighbor Methods

The nearest neighbor algorithm simply stores all of its training data in memory; here, the training data are textual descriptions of implicitly or explicitly labeled items. To classify a new, unlabeled item, the algorithm compares it to all stored items using a similarity function. It then determines the "nearest neighbor" or the k nearest neighbors. The class label or numeric score for the previously unseen item is then derived from the class labels of the nearest neighbors. The similarity function used by the nearest neighbor algorithm depends on the type of data. For structured data, a Euclidean distance metric is often used. When using the vector space model, the cosine similarity measure is often used [34]. In the Euclidean
distance function, a feature that has a small value in both examples is treated the same as a feature that has a large value in both examples. In contrast, the cosine similarity function will not have a large value if corresponding features of two examples have small values. This makes it appropriate for text, where we want two documents to be similar when they are both about the same topic, but not when neither is about a given topic. The vector space approach and the cosine similarity function have been applied to several text classification applications ([11], [39], [2]). Despite the algorithm's unquestionable simplicity, it performs competitively with more complex algorithms.

The Daily Learner system uses the nearest neighbor algorithm to create a model of the user's short-term interests [7]. Gixo, a personalized news system, also uses text similarity as a basis for recommendation (Figure 10.4). Each headline is preceded by an icon with two bars. The first bar indicates how popular the item is. The second indicates how similar the story is to stories the user has read before. The fact that these bars differ shows the value of personalizing to the individual.

Fig. 10.4. Gixo presents personalized news based on similarity to articles that have previously been read.

10.6 Relevance Feedback and Rocchio's Algorithm

In the vector space model, the success of document retrieval depends on the user's ability to construct queries by selecting a set of representative keywords [34]. For this reason, much research has focused on methods that help users incrementally refine queries based on previous search results. These methods are commonly referred to as relevance feedback. The general principle is to let users rate the documents returned by the retrieval system with respect to their information need. This feedback can then be used to incrementally refine the initial query. In a
manner analogous to rating items, relevance feedback data can be collected by explicit or implicit means.

Rocchio's algorithm [33] is a widely used relevance feedback algorithm that operates in the vector space model. The algorithm modifies an initial query using differently weighted prototypes of relevant and non-relevant documents. It forms two document prototypes by taking the vector sum over all relevant documents and over all non-relevant documents. The following formula summarizes the algorithm formally:

$$
Q_{i+1} = \alpha Q_i + \beta \sum_{rel} \frac{D_i}{|D_i|} - \gamma \sum_{nonrel} \frac{D_i}{|D_i|} \tag{10.2}
$$

Here, $Q_i$ is the user's query at iteration $i$. The parameters $\alpha$, $\beta$, and $\gamma$ control the influence of the original query and the two prototypes on the resulting modified query. The intuition behind the formula is to incrementally move the query vector towards clusters of relevant documents and away from irrelevant documents. This goal gives an intuitive justification for Rocchio's algorithm. However, there is no theoretically motivated basis for the formula, so neither performance nor convergence can be guaranteed. Nevertheless, empirical experiments have demonstrated that the approach leads to significant improvements in retrieval performance [33].

In more recent work, researchers have used a variation of Rocchio's algorithm in a machine learning context, i.e., for learning a user profile from unstructured text ([15], [3], [29]). The goal in these applications is to automatically induce a text classifier that can distinguish between classes of documents. In this context, no initial query is assumed to exist. Instead, the algorithm forms a prototype for each class, analogously to Rocchio's approach, as the vector sum over the documents belonging to that class. The result of the algorithm is a set of weight vectors. The proximity of these vectors to unlabeled documents can be used to assign class membership.
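Both Rocchio variants described above can be sketched with sparse dictionary vectors, such as those produced by tf*idf weighting. The sketch is illustrative only:

- All function names are hypothetical.
- The default α, β, and γ values are arbitrary choices; the chapter does not prescribe any.
- Proximity is measured with the cosine similarity discussed for nearest neighbor methods. The same function is reused in a small k-nearest-neighbor classifier for comparison.

```python
# Illustrative sketch of Eq. 10.2 and Rocchio-style classification; names are hypothetical.
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def normalized_sum(docs):
    """Sum of document vectors after scaling each to unit length (the D_i / |D_i| terms)."""
    total = {}
    for d in docs:
        norm = math.sqrt(sum(w * w for w in d.values()))
        if norm:
            for t, w in d.items():
                total[t] = total.get(t, 0.0) + w / norm
    return total


def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """One relevance-feedback step following Eq. 10.2 (default weights are arbitrary)."""
    rel, non = normalized_sum(relevant), normalized_sum(nonrelevant)
    return {t: alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
            for t in set(query) | set(rel) | set(non)}


def rocchio_classify(doc, examples):
    """Rocchio-style classifier: one prototype per class; pick the most cosine-similar.

    `examples` maps each class label to a list of document vectors of that class.
    """
    prototypes = {c: normalized_sum(vecs) for c, vecs in examples.items()}
    return max(prototypes, key=lambda c: cosine(doc, prototypes[c]))


def knn_classify(doc, labeled, k=3):
    """Nearest-neighbor alternative: majority label among the k most cosine-similar items."""
    nearest = sorted(labeled, key=lambda pair: cosine(doc, pair[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

The two classifiers differ in what they keep. `rocchio_classify` stores only one summed prototype per class. `knn_classify` keeps every training example and compares against all of them at classification time.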
Like the relevance feedback version of Rocchio's algorithm, the Rocchio-based classification approach has no theoretical underpinnings, and there are no performance or convergence guarantees.

10.7 Linear Classifiers

Algorithms that learn linear decision boundaries, i.e., hyperplanes separating instances in a multi-dimensional space, are referred to as linear classifiers. A large number of algorithms fall into this category, and many of them have been successfully applied to text classification tasks [20]. All linear classifiers can be described in a common representational framework. In general, the outcome of the learning process is an n-dimensional weight vector w. Its dot product with an n-dimensional instance (e.g., a text document represented in the vector space model) gives a numeric score prediction. Retaining the numeric prediction leads to a linear regression approach. However, a threshold can be used to convert continuous
predictions to discrete class labels. This general framework holds for all linear classifiers, but the algorithms differ in the training methods used to derive the weight vector w. For example, the equation below is known as the Widrow-Hoff rule, delta rule, or gradient descent rule. It derives the weight vector w by incremental vector movements in the direction of the negative gradient of the example's squared error [37]. This is the direction in which the error falls most rapidly.

$$
\mathbf{w}_{i+1} = \mathbf{w}_i - 2\eta\,(\mathbf{w}_i \cdot \mathbf{x}_i - y_i)\,\mathbf{x}_i \tag{10.3}
$$

The equation shows how the weight vector w can be derived incrementally:

1. The inner product of instance x and weight vector w is the algorithm's numeric prediction for instance x.
2. The prediction error is obtained by subtracting the instance's known score, y, from the predicted score.
3. The error is multiplied by the original instance vector x, the learning rate η, and a factor of 2 that comes from differentiating the squared error.
4. Subtracting the resulting vector from w moves w towards the correct prediction for instance x.

The learning rate η controls how much each additional instance affects the previous weight vector.

An alternative algorithm, the exponentiated gradient (EG) algorithm, has been shown experimentally to outperform the approach above on text classification tasks with many features. Kivinen and Warmuth [19] prove a bound on EG's error that depends only logarithmically on the number of features. This result offers a theoretical argument for EG's performance on text classification problems, which are typically high-dimensional. An important advantage of these learning schemes for linear algorithms is that they can be performed on-line, i.e., the current weight vector can be modified incrementally as new instances become available. This is a crucial advantage for applications that operate under real-time constraints. Finally, these approaches tend to converge on hyperplanes that separate the training data accurately. However, the generalization performance of such a hyperplane might not be optimal.
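The two online update rules can be compared in a short Python sketch that uses dense weight lists:

- Eq. 10.3 is implemented directly.
- The exponentiated-gradient step follows the normalized, positive-weight form of EG associated with Kivinen and Warmuth [19]. Its details (learning rate, normalization, no bias term) should be treated as assumptions of this sketch.
- The function names are hypothetical.

```python
# Illustrative sketch of Eq. 10.3 and an EG-style update; names are hypothetical.
import math


def dot(w, x):
    """Inner product: the linear model's numeric prediction for instance x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def widrow_hoff_step(w, x, y, eta=0.05):
    """One delta-rule update, Eq. 10.3: w <- w - 2*eta*(w.x - y)*x."""
    err = dot(w, x) - y  # predicted score minus known score
    return [wi - 2 * eta * err * xi for wi, xi in zip(w, x)]


def eg_step(w, x, y, eta=0.05):
    """One exponentiated-gradient update for positive weights that sum to one.

    Each weight is multiplied by exp(-2*eta*(w.x - y)*x_j), and the vector is then
    renormalized, so the update is multiplicative rather than additive.
    """
    err = dot(w, x) - y
    scaled = [wi * math.exp(-2 * eta * err * xi) for wi, xi in zip(w, x)]
    total = sum(scaled)
    return [s / total for s in scaled]


def train_online(examples, w, step, epochs=200, eta=0.05):
    """Apply `step` to each (x, y) pair in turn; the weights change after every instance."""
    for _ in range(epochs):
        for x, y in examples:
            w = step(w, x, y, eta)
    return w
```

Both rules process one instance at a time, which is the on-line property noted above. The difference is that the Widrow-Hoff step shifts weights additively, while the EG step rescales them multiplicatively.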
A related approach aimed at improving generalization performance is known as support vector machines [36]. The central idea underlying support vector machines is to maximize the classification margin. The margin is the distance between the decision boundary and the closest training instances, the so-called support vectors. A series of empirical experiments on a variety of benchmark data sets indicated that linear support vector machines perform particularly well on text classification tasks [17]. The main reason is that margin maximization is an inherent, built-in protection against overfitting. A reduced tendency to overfit training data is particularly useful for text classification, because high-dimensional concepts must often be learned from limited training data, a scenario prone to overfitting.
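The quantity that a support vector machine maximizes can be made concrete with a short Python function. This does not train an SVM, which requires solving a quadratic optimization problem, usually with a dedicated library. It only measures the margin of a given hyperplane. The two-dimensional data set and the function name are made up for illustration.

```python
# Illustrative margin computation; data and function name are hypothetical.
import math


def geometric_margin(w, b, data):
    """Smallest value of y * (w.x + b) / ||w|| over the labeled points (y is -1 or +1).

    A positive result means the hyperplane w.x + b = 0 separates the data; its size
    is the distance to the closest training instances (the would-be support vectors).
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x, y in data)


data = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([0.0, 0.0], -1), ([-1.0, 0.0], -1)]
# Both hyperplanes separate the data; an SVM would prefer the one with the larger margin.
margin_diagonal = geometric_margin([1.0, 1.0], -2.0, data)  # about 1.414
margin_vertical = geometric_margin([1.0, 0.0], -1.0, data)  # 1.0
```

Both hyperplanes classify every training point correctly. The margin criterion prefers the diagonal one because it leaves more room between the classes.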

10.8 Probabilistic Methods and Naïve Bayes

The vector space model lacks theoretical justification, but there has been much work on probabilistic text classification approaches. This section describes one such example, the naïve Bayesian classifier. Maron [24] reported early work on a probabilistic classifier and its text classification performance. Today, this algorithm is commonly referred to as a naïve Bayesian classifier [13]. Researchers have recognized naïve Bayes as an exceptionally well-performing text classification algorithm and have frequently adopted it in recent work ([27], [28], [25]). The algorithm's popularity and performance for text classification have prompted researchers to empirically evaluate and compare the different variations of naïve Bayes that have appeared in the literature (e.g., [26], [21]).

In summary, McCallum and Nigam [26] note two frequently used formulations of naïve Bayes: the multivariate Bernoulli model and the multinomial model. Both models share the following principles. Text documents are assumed to be generated by an underlying generative model, specifically a parameterized mixture model:

$$
P(d_i \mid \theta) = \sum_{j=1}^{|C|} P(c_j \mid \theta)\, P(d_i \mid c_j; \theta) \tag{10.4}
$$

Here, each class $c_j$ corresponds to a mixture component that is parameterized by a disjoint subset of $\theta$. The sum of total probability over all mixture components determines the likelihood of a document. Once the parameters $\theta$ have been learned from training data, Bayes' rule gives the posterior probability of class membership given the evidence of a test document:

$$
P(c_j \mid d_i; \hat{\theta}) = \frac{P(c_j \mid \hat{\theta})\, P(d_i \mid c_j; \hat{\theta})}{P(d_i \mid \hat{\theta})} \tag{10.5}
$$

These principles hold for naïve Bayes classification in general. The multivariate Bernoulli and multinomial models differ in the way $P(d_i \mid c_j; \theta)$ is estimated from training data. The multivariate Bernoulli formulation was derived with structured data in mind. For text classification tasks, it assumes that each document is represented as a binary vector over the space of all words from a vocabulary $V$.
Each element $B_{it}$ in this vector indicates whether word $w_t$ appears at least once in document $d_i$. The naïve Bayes assumption is that the probability of each word occurring in a document is independent of the other words, given the class label. Under this assumption, $P(d_i \mid c_j; \theta)$ can be expressed as a simple product:

$$
P(d_i \mid c_j; \theta) = \prod_{t=1}^{|V|} \left( B_{it}\, P(w_t \mid c_j; \theta) + (1 - B_{it})\left(1 - P(w_t \mid c_j; \theta)\right) \right) \tag{10.6}
$$

Bayes-optimal estimates for $P(w_t \mid c_j; \theta)$ can be determined by counting word occurrences over the data:

$$
P(w_t \mid c_j; \theta) = \frac{1 + \sum_{i=1}^{|D|} B_{it}\, P(c_j \mid d_i)}{2 + \sum_{i=1}^{|D|} P(c_j \mid d_i)} \tag{10.7}
$$

In contrast to the binary document representation of the multivariate Bernoulli model, the multinomial formulation captures word frequency information. This model assumes that documents are generated by a sequence of independent trials drawn from a multinomial probability distribution. Again, the naïve Bayes independence assumption allows $P(d_i \mid c_j; \theta)$ to be determined from individual word probabilities:

$$
P(d_i \mid c_j; \theta) = P(|d_i|)\, |d_i|! \prod_{t=1}^{|V|} \frac{P(w_t \mid c_j; \theta)^{N_{it}}}{N_{it}!} \tag{10.8}
$$

Here, $N_{it}$ is the number of occurrences of word $w_t$ in document $d_i$. Taking word frequencies into account, Laplace-smoothed estimates for $P(w_t \mid c_j; \theta)$ can be derived from training data:

$$
P(w_t \mid c_j; \theta) = \frac{1 + \sum_{i=1}^{|D|} N_{it}\, P(c_j \mid d_i)}{|V| + \sum_{s=1}^{|V|} \sum_{i=1}^{|D|} N_{is}\, P(c_j \mid d_i)} \tag{10.9}
$$

Empirically, the multinomial naïve Bayes formulation was shown to outperform the multivariate Bernoulli model. This effect is particularly noticeable for large vocabularies (McCallum and Nigam [26]). The naïve Bayes assumption of class-conditional attribute independence is clearly violated in text classification, yet naïve Bayes performs very well. Domingos and Pazzani [12] offer a possible explanation for this paradox. They show that class-conditional feature independence is not a necessary condition for the optimality of naïve Bayes. The naïve Bayes classifier has been used in several content-based recommendation systems, including Syskill & Webert [29].
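Both naïve Bayes variants can be sketched in Python for the common case of hard training labels. In that case, $P(c_j \mid d_i)$ in Eqs. 10.7 and 10.9 is 1 for a document's own class and 0 otherwise. The sketch also makes these choices:

- Class priors are estimated from class frequencies.
- Scores are computed in log space to avoid numeric underflow.
- As an assumption of this sketch, words not seen in training are ignored at classification time.
- Function names are hypothetical.

```python
# Illustrative naive Bayes sketch (hard labels); function names are hypothetical.
import math
from collections import Counter


def train_multinomial_nb(docs, labels):
    """Multinomial model: priors and Laplace-smoothed word probabilities (Eq. 10.9)."""
    vocab = {t for d in docs for t in d}
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, c in zip(docs, labels):
        counts[c].update(d)  # N_it summed over the documents of class c
    cond = {}
    for c in classes:
        total = sum(counts[c].values())
        cond[c] = {t: (1 + counts[c][t]) / (len(vocab) + total) for t in vocab}
    return prior, cond


def classify_multinomial(doc, prior, cond):
    """argmax over classes of log P(c) + sum of log P(w|c) over the document's tokens.

    The P(|d|) |d|! / prod N_t! factor of Eq. 10.8 is the same for every class, so it is dropped.
    """
    scores = {c: math.log(prior[c]) + sum(math.log(cond[c][t]) for t in doc if t in cond[c])
              for c in prior}
    return max(scores, key=scores.get)


def train_bernoulli_nb(docs, labels):
    """Multivariate Bernoulli model: document-occurrence counts, add-one smoothing (Eq. 10.7)."""
    vocab = {t for d in docs for t in d}
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        in_class = [set(d) for d, y in zip(docs, labels) if y == c]
        cond[c] = {t: (1 + sum(t in d for d in in_class)) / (2 + len(in_class)) for t in vocab}
    return prior, cond


def classify_bernoulli(doc, prior, cond):
    """Eq. 10.6 in log space: absent vocabulary words contribute log(1 - P(w|c))."""
    present = set(doc)
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for t, p in cond[c].items():
            score += math.log(p) if t in present else math.log(1 - p)
        scores[c] = score
    return max(scores, key=scores.get)
```

The two models differ in how they treat words. The Bernoulli classifier scores every vocabulary word, including absent ones. The multinomial classifier looks only at the tokens that occur, counting repeats. This matches the representational difference between Eqs. 10.6 and 10.8.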

10.9 Trends in Content-Based Filtering

Belkin & Croft [5] surveyed some of the first content-based recommendation systems. They noted that these systems used technology related to information retrieval, such as tf*idf and Rocchio's method. Indeed, some of the early work on content-based recommendation used the term "query" to refer to user models. In this view, a user model is a saved query (or a set of saved queries) that can retrieve additional or new information of interest to the user. Representative early systems include:

- a system at Bellcore [14] that found new technical reports related to previously read reports;
- LyricTime [22], which recommended songs in a multimedia player based on a profile learned from the user's feedback on songs played earlier.

The creation and rapid growth of the World Wide Web in the mid-1990s made vast amounts of information accessible. It also created the problem of locating and identifying personally relevant information. Some researchers in the machine learning community applied traditional machine learning methods to modeling users' document interests. These methods reduced the text training data to a few hundred highly relevant words, using techniques such as information theory or tf*idf. Representative systems included WebWatcher [16] and Syskill & Webert [29].

Fig. 10.5. The Syskill & Webert system learns a model of the user's preference for web pages.

10.10 Limitations and Extensions

There are different approaches to learning a model of the user's interests with content-based recommendation. However, no content-based recommendation system can give good recommendations if the content does not contain enough information to distinguish items the user likes from items the user doesn't like. When recommending some items, e.g., jokes or poems, there often isn't enough information in the word frequencies to model the user's interests. It would be possible to tell a lawyer joke from a chicken joke based upon word frequencies. It would be difficult, though, to distinguish a funny lawyer joke from other lawyer jokes. Other recommendation technologies, such as collaborative recommenders [35], should therefore be used in such situations.

In some situations, e.g., recommending movies, restaurants, or television programs, there is structured information that a content-based system can use, such as the genre of a movie and its actors and directors. This information might also be supplemented by the opinions of other users. One way to include those opinions in the frameworks discussed in Section 10.2 is to add data to the representation of the examples. For example, Basu et al. [4] add features to examples that indicate the identifiers of other users who like an item. The rule learner Ripper [9] was applied to the resulting data. It could learn profiles with both collaborative and content-based features (e.g., a user might like a science fiction movie if USER-109 likes it). Although this is not strictly a content-based system, it uses the same technology as content-based recommenders to learn a user model. Indeed, Billsus and Pazzani [6] have shown that any machine learning algorithm may be used as the basis for collaborative filtering by transforming user ratings into attributes. Chapter 12 of this book [8] discusses a variety of other approaches to combining content and collaborative information in recommendation systems.

A final usage of content in recommendations is worth noting.
Simple content-based rules may be used to filter the results of other methods, such as collaborative filtering. For example, people who buy dolls might also buy adult videos, yet a particular application may still need to avoid recommending adult items. Similarly, some systems might not recommend items that are out of stock, although this check is not strictly content-based.

Summary

Content-based recommendation systems recommend an item to a user based upon a description of the item and a profile of the user's interests. A user may enter a profile directly, but it is more commonly learned from the feedback the user provides on items. A variety of learning algorithms have been adapted to learning user profiles. The choice of learning algorithm depends upon the representation of content.
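The feature-augmentation idea attributed to Basu et al. [4] in Section 10.10 can be sketched in Python. Each item's content words are combined with features naming the other users who like it. The result is then turned into binary attribute vectors that any learner could consume. The learner itself is left out: Basu et al. used Ripper, and this sketch does not reproduce it. The following details are all hypothetical:

- the `word:`/`liked_by:` naming scheme;
- the ratings format (a mapping from each user to the set of item ids that user likes);
- the function names.

```python
def hybrid_features(item_id, content_words, liked_by_user, target_user):
    """Content words plus one 'liked_by' feature per other user who likes the item."""
    features = {f"word:{w}" for w in content_words}
    for user, liked_items in liked_by_user.items():
        # The target user's own opinion is the label, so it must not become a feature.
        if user != target_user and item_id in liked_items:
            features.add(f"liked_by:{user}")
    return features

def to_binary_rows(feature_sets):
    """Map each feature set to a 0/1 row over a shared, sorted feature list."""
    names = sorted(set().union(*feature_sets))
    index = {name: i for i, name in enumerate(names)}
    rows = []
    for features in feature_sets:
        row = [0] * len(names)
        for f in features:
            row[index[f]] = 1
        rows.append(row)
    return names, rows
```

For instance, if USER-109 likes a science fiction movie, that movie's row gains a `liked_by:USER-109` attribute next to its content words. A rule learner could then pick that attribute up.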
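The rule-based post-filtering mentioned in Section 10.10 can be sketched in Python as a pass over a ranked list. The list is assumed to come from some other recommender, for example a collaborative filter, which is not shown. The `Item` record, the "adult" category label, and the in-stock flag are hypothetical stand-ins for whatever metadata an application keeps. As the text notes, the stock check is not strictly content-based.

```python
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    category: str
    in_stock: bool

def apply_rules(ranked_ids, catalog, blocked_categories=frozenset({"adult"})):
    """Drop recommendations that violate simple rules, keeping the original order."""
    kept = []
    for item_id in ranked_ids:
        item = catalog.get(item_id)
        if item is None:
            continue  # no metadata to check, so the item is skipped
        if item.category in blocked_categories or not item.in_stock:
            continue
        kept.append(item_id)
    return kept
```

Keeping the filter separate from the ranking model lets the rules change without retraining the recommender.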

References

1. Ali, K., van Stam, W.: TiVo: Making Show Recommendations Using a Distributed Collaborative Filtering Architecture. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle, WA (2004)
2. Allan, J., Carbonell, J., Doddington, G., Yamron, J., Yang, Y.: Topic Detection and Tracking Pilot Study Final Report. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop. Lansdowne, VA (1998)
3. Balabanovic, M., Shoham, Y.: FAB: Content-based, Collaborative Recommendation. Communications of the Association for Computing Machinery 40(3) (1997)
4. Basu, C., Hirsh, H., Cohen, W.: Recommendation as Classification: Using Social and Content-Based Information in Recommendation. In: Proceedings of the 15th National Conference on Artificial Intelligence, Madison, WI (1998)
5. Belkin, N., Croft, B.: Information Filtering and Information Retrieval: Two Sides of the Same Coin? Communications of the ACM 35(12) (1992)
6. Billsus, D., Pazzani, M.: Learning Collaborative Information Filters. In: Proceedings of the International Conference on Machine Learning. Morgan Kaufmann Publishers. Madison, WI (1998)
7. Billsus, D., Pazzani, M., Chen, J.: A Learning Agent for Wireless News Access. In: Proceedings of the International Conference on Intelligent User Interfaces (2002)
8. Burke, R.: Hybrid Web Recommender Systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.): The Adaptive Web: Methods and Strategies of Web Personalization. Lecture Notes in Computer Science, Vol. Springer-Verlag, Berlin Heidelberg New York (2007), this volume
9. Cohen, W.: Fast Effective Rule Induction. In: Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA (1995)
10. Cohen, W.: Learning Rules that Classify E-mail. In: Papers from the AAAI Spring Symposium on Machine Learning in Information Access (1996)
11. Cohen, W., Hirsh, H.: Joins that Generalize: Text Classification Using WHIRL. In: Proceedings of the Fourth International Conference on Knowledge Discovery & Data Mining, New York, NY (1998)
12. Domingos, P., Pazzani, M.: Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier. Machine Learning 29 (1997)
13. Duda, R., Hart, P.: Pattern Classification and Scene Analysis. New York, NY: Wiley and Sons (1973)
14. Foltz, P., Dumais, S.: Personalized Information Delivery: An Analysis of Information Filtering Methods. Communications of the ACM 35(12) (1992)
15. Ittner, D., Lewis, D., Ahn, D.: Text Categorization of Low Quality Images. In: Symposium on Document Analysis and Information Retrieval, Las Vegas, NV (1995)
16. Joachims, T., Freitag, D., Mitchell, T.: WebWatcher: A Tour Guide for the World Wide Web. In: Proceedings of the 15th International Joint Conference on Artificial Intelligence. Nagoya, Japan (1997)
17. Joachims, T.: Text Categorization With Support Vector Machines: Learning with Many Relevant Features. In: European Conference on Machine Learning, Chemnitz, Germany (1998)
18. Kim, J., Lee, B., Shaw, M., Chang, H., Nelson, W.: Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts. International Journal of Electronic Commerce 5(3) (2001)
19. Kivinen, J., Warmuth, M.: Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation 132(1) (1997) 1-63
20. Lewis, D., Schapire, R., Callan, J., Papka, R.: Training Algorithms for Linear Text Classifiers. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Konstanz, Germany (1996)
21. Lewis, D.: Naïve (Bayes) at Forty: The Independence Assumption in Information Retrieval. In: European Conference on Machine Learning, Chemnitz, Germany (1998)
22. Loeb, S.: Architecting Personal Delivery of Multimedia Information. Communications of the ACM 35(12) (1992)
23. Mandel, M., Poliner, G., Ellis, D.: Support Vector Machine Active Learning for Music Retrieval. ACM Multimedia Systems Journal 12(1) (2006)
24. Maron, M.: Automatic Indexing: An Experimental Inquiry. Journal of the Association for Computing Machinery 8(3) (1961)
25. McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.: Improving Text Classification by Shrinkage in a Hierarchy of Classes. In: Proceedings of the International Conference on Machine Learning. Morgan Kaufmann Publishers. Madison, WI (1998)
26. McCallum, A., Nigam, K.: A Comparison of Event Models for Naive Bayes Text Classification. In: AAAI/ICML-98 Workshop on Learning for Text Categorization, Technical Report WS-98-05, AAAI Press (1998)
27. Mitchell, T.: Machine Learning. McGraw-Hill (1997)
28. Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Learning to Classify Text from Labeled and Unlabeled Documents. In: Proceedings of the 15th International Conference on Artificial Intelligence, Madison, WI (1998)
29. Pazzani, M., Billsus, D.: Learning and Revising User Profiles: The Identification of Interesting Web Sites. Machine Learning 27(3) (1997)
30. Porter, M.: An Algorithm for Suffix Stripping. Program 14(3) (1980)
31. Quinlan, J.: Induction of Decision Trees. Machine Learning 1 (1986)
32. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann (1993)
33. Rocchio, J.: Relevance Feedback in Information Retrieval. In: G. Salton (ed.). The SMART System: Experiments in Automatic Document Processing. NJ: Prentice Hall (1971)
34. Salton, G.: Automatic Text Processing. Addison-Wesley (1989)
35. Schafer, B., Frankowski, D., Herlocker, J., Sen, S.: Collaborative Filtering Recommender Systems. In: Brusilovsky, P., Kobsa, A., Nejdl, W. (eds.): The Adaptive Web: Methods and Strategies of Web Personalization. Lecture Notes in Computer Science, Vol. Springer-Verlag, Berlin Heidelberg New York (2007), this volume
36. Vapnik, V.: The Nature of Statistical Learning Theory. Springer: New York (1995)
37. Widrow, B., Hoff, M.: Adaptive Switching Circuits. WESCON Convention Record 4 (1960)
38. Yang, Y., Pedersen, J.: A Comparative Study on Feature Selection in Text Categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN (1997)
39. Yang, Y.: An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval 1(1) (1999) 67-88


More information

Learning from Multiple Outlooks

Learning from Multiple Outlooks Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel [email protected] [email protected]

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal [email protected] Peter Möhl, PTV AG,

More information

BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK. 0688, [email protected]

BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK. 0688, dskim@ssu.ac.kr Proceedngs of the 41st Internatonal Conference on Computers & Industral Engneerng BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK Yeong-bn Mn 1, Yongwoo Shn 2, Km Jeehong 1, Dongsoo

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression

A Novel Methodology of Working Capital Management for Large. Public Constructions by Using Fuzzy S-curve Regression Novel Methodology of Workng Captal Management for Large Publc Constructons by Usng Fuzzy S-curve Regresson Cheng-Wu Chen, Morrs H. L. Wang and Tng-Ya Hseh Department of Cvl Engneerng, Natonal Central Unversty,

More information

Cloud-based Social Application Deployment using Local Processing and Global Distribution

Cloud-based Social Application Deployment using Local Processing and Global Distribution Cloud-based Socal Applcaton Deployment usng Local Processng and Global Dstrbuton Zh Wang *, Baochun L, Lfeng Sun *, and Shqang Yang * * Bejng Key Laboratory of Networked Multmeda Department of Computer

More information

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy

Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy Fnancal Tme Seres Analyss Patrck McSharry [email protected] www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton

More information

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing A Replcaton-Based and Fault Tolerant Allocaton Algorthm for Cloud Computng Tork Altameem Dept of Computer Scence, RCC, Kng Saud Unversty, PO Box: 28095 11437 Ryadh-Saud Araba Abstract The very large nfrastructure

More information

Simple Interest Loans (Section 5.1) :

Simple Interest Loans (Section 5.1) : Chapter 5 Fnance The frst part of ths revew wll explan the dfferent nterest and nvestment equatons you learned n secton 5.1 through 5.4 of your textbook and go through several examples. The second part

More information

Web Object Indexing Using Domain Knowledge *

Web Object Indexing Using Domain Knowledge * Web Object Indexng Usng Doman Knowledge * Muyuan Wang Department of Automaton Tsnghua Unversty Bejng 100084, Chna (86-10)51774518 Zhwe L, Le Lu, We-Yng Ma Mcrosoft Research Asa Sgma Center, Hadan Dstrct

More information

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce

WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce WSE-ntegrator: An Automatc ntegrator of Web Search nterfaces for E-Commerce Ha He, Wey Meng Dept. of Computer Scence SUNY at Bnghamton Bnghamton, NY 13902 {hahe,meng}@cs.bnghamton.edu Clement Yu Dept.

More information

Using Association Rule Mining: Stock Market Events Prediction from Financial News

Using Association Rule Mining: Stock Market Events Prediction from Financial News Usng Assocaton Rule Mnng: Stock Market Events Predcton from Fnancal News Shubhang S. Umbarkar 1, Prof. S. S. Nandgaonkar 2 1 Savtrba Phule Pune Unversty, Vdya Pratshtan s College of Engneerng, Vdya Nagar,

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING

FORMAL ANALYSIS FOR REAL-TIME SCHEDULING FORMAL ANALYSIS FOR REAL-TIME SCHEDULING Bruno Dutertre and Vctora Stavrdou, SRI Internatonal, Menlo Park, CA Introducton In modern avoncs archtectures, applcaton software ncreasngly reles on servces provded

More information

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns

A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns A study on the ablty of Support Vector Regresson and Neural Networks to Forecast Basc Tme Seres Patterns Sven F. Crone, Jose Guajardo 2, and Rchard Weber 2 Lancaster Unversty, Department of Management

More information

A Programming Model for the Cloud Platform

A Programming Model for the Cloud Platform Internatonal Journal of Advanced Scence and Technology A Programmng Model for the Cloud Platform Xaodong Lu School of Computer Engneerng and Scence Shangha Unversty, Shangha 200072, Chna [email protected]

More information