Spam Detection in Voice-over-IP Calls through Semi-Supervised Clustering
Yu-Sung Wu (1), Saurabh Bagchi (1), Navjot Singh (2), Ratsameetip Wita (3)
(1) School of Electrical & Computer Eng., Purdue University, West Lafayette, IN 47907
(2) Avaya Labs, 233 Mt. Airy Rd., Basking Ridge, NJ 07920
(3) Chulalongkorn University, Thailand
{yswu,sbagchi}@purdue.edu, singh@avaya.com, Ratsameetip.W@Student.chula.ac.th

Abstract

In this paper, we present an approach for detection of spam calls over IP telephony, called SPIT, in VoIP systems. SPIT detection is different from spam detection in email in that the process has to be soft real-time, fewer features are available for examination due to the difficulty of mining voice traffic at runtime, and the signaling traffic of legitimate and malicious callers is similar. Our approach differs from existing work in its adaptability to new environments without the need for laborious and error-prone manual parameter configuration. We use clustering based on the call parameters, using optional user feedback for some calls, which users mark as SPIT or non-SPIT. We improve on a popular algorithm for semi-supervised learning, called MPCK-Means, to make it scalable to a large number of calls and operable at runtime. Our evaluation on captured call traces shows a fifteen-fold reduction in computation time, with improvement in detection accuracy.

Keywords: Voice-over-IP systems, spam detection, SPIT detection, semi-supervised learning, clustering.

1. Introduction

As the popularity of VoIP systems increases, they are being subjected to different kinds of security threats [1]. A large class of the threats, such as call rerouting, toll fraud, and conversation hijacking, incur deviations in the protocol state machines and can be detected through monitoring the protocol state transitions [2],[3]. Additionally, cryptographically secure versions of the common VoIP protocols, such as Secure SIP and Secure RTP, address many of the attacks presented in the literature. However, spam calls in VoIP [4], commonly called SPIT, are becoming an increasing nuisance. The ease with which automated SPIT calls can be launched can derail the adoption of VoIP as a critical infrastructure element.
Existing monitoring and cryptographic solutions are not immediately applicable to SPIT detection. In this paper, we address the problem of detection of SPIT calls.

Detection of spam emails is a mature field and there are some similarities to our problem. In both domains, users can provide feedback about an individual email or call, for the latter through a built-in button in some commercially available VoIP phones. However, there exist significant differences: VoIP traffic is real-time and the detection should ideally be real-time as well; some features are expensive to extract in real-time, especially those in voice traffic; the signaling patterns are likely similar in legitimate and malicious calls, rendering content-based filtering on signaling traffic ineffective; and features from multiple protocols used in VoIP may be relevant.

In this paper, we present the design of a system that uses semi-supervised machine learning for detection of SPIT calls. It builds on the notion of clustering whereby calls with similar features are placed in a cluster for SPIT or legitimate calls. Call features include those extracted directly from signaling traffic, those extracted from media traffic, such as proportion of silence in the call, and those derived from calls. However, previous approaches that use thresholds [5] on the call features are difficult to use in practice since the nature of SPIT calls varies widely. Therefore, we learn the features to use and their relative importance in clustering through runtime observations, which include user feedback. The popular semi-supervised clustering algorithm called MPCK-Means [6] scales as O(N^3), where N is the number of calls. This would generally be too expensive for real-time operation. We modify it to create our algorithm called eMPCK-Means, using VoIP-specific features to reduce the cost to O(N). Such specialization includes the early use of user feedback and prior knowledge of the number of clusters. Additionally, we create an incremental protocol called pMPCK-Means that can perform the detection as soon as the call is established.
We evaluate the protocols using four call traces with different characteristics of SPIT and non-SPIT calls, over different proportions of user feedback and accuracy of the user feedback. With a batch of 4 calls, eMPCK-Means is 15 times faster than MPCK-Means, while achieving better detection coverage in terms of true and false positives. Since pMPCK-Means can examine a limited set of call
features, it works well only with a large fraction of calls with accurate user feedback.

2. Related work

Rosenberg [4] details the problem of VoIP SPIT and gives various high-level conceptual solutions. The solutions can be placed in three categories [7]: (1) Non-intrusive methods based on the exchange and analysis of signaling messages; (2) Interaction methods that create inconveniences for the caller by requesting them to pass a checking procedure before the call is established; (3) Callee interaction methods that exchange information with the callee on each call. An example work in category 1 is [8], where the authors look at the SIP signaling traffic pattern to detect SPIT. However, they do not provide quantitative data on the detection accuracy. Our experimental results indicate that relying solely on SIP message patterns will give low detection coverage. The work by Quittek [7] generates a greeting sound or faked ring tone to the caller right after the call is established and monitors the response voice patterns from the caller to differentiate between a human caller and a SPIT generator. This falls in category 2. In comparison, our work encompasses categories 1 and 3.

Kolan [9] presents an approach which maintains the trust information for each caller. The information can be automatically built up through user feedback, or through a propagation of reputation via social networks. The approach can be used in our system, where we can embed the caller's trust as one of the call features. However, the reputation database may grow large and a reputation system can be gamed by false praise or false blame.

Clustering is a way to learn a classification from the data [10], especially with unlabeled data. Clustering techniques have been used for detecting e-mail spam in [11],[12]. On the other hand, classification techniques such as SVM [13] are popular for data classification. However, they typically require labeled data and do not take unlabeled data into consideration. Recent developments in semi-supervised classification techniques [14], such as semi-supervised SVM [15], incorporate both labeled and unlabeled data.

3. Design

3.1 Structure of VoIP calls

There are typically three phases involved in a VoIP phone call [16]. The first phase is call establishment through a three-way handshake, which involves (1) the caller sending a SIP INVITE message to the proxy server and the server forwarding the INVITE message to the callee, (2) the callee replying with a SIP OK message, and (3) the caller sending a SIP ACK message to complete the call establishment phase. The second phase is the conversation, which contains the media stream (voice) transmitted between the caller and the callee, typically using RTP/RTCP [17]. The last phase is the call tear down phase, which can be initiated by either the caller or the callee sending a SIP BYE message followed by SIP OK and SIP ACK messages.

3.2 Characteristics of VoIP SPIT calls

A blacklist-based approach can be used at the call establishment phase based on source IP or From URI to drop calls from known SPIT sources. In the media stream phase, a typical pattern one can imagine for SPIT calls is that the caller speaks more than the callee. Another pattern is that the length of the media stream phase, i.e., the call duration, is shorter in the case of calls answered by a live person since SPIT calls are generally undesirable. Also, one can assume that it is more likely that for a SPIT call, a call termination will be initiated by the callee, i.e., the callee sends the SIP BYE message.

Since SPIT calls are usually large volume calls made by some spitter within a period of time, we found that it is also useful to look for patterns in a batch of calls. Certain features are available when looking at the collective set of calls, such as the inter-arrival time between calls. Also, statistical learning can only occur with a batch of calls.

3.3 Detection scheme

A VoIP environment typically consists of multiple domains, with each domain composed of a few proxy servers and phones belonging to end users. Figure 1 shows an example VoIP environment consisting of two domains. In a VoIP environment, a proxy server's main function is to route the signaling messages.
For the specific example we show, Proxy #1 is used to route the signaling messages among phones {A,B,C}. Similarly, Proxy #2 is used to route the signaling messages among phones {E,F}. Cross-domain phone calls between {A,B,C} and {E,F} are collaboratively handled by Proxy #1 and Proxy #2. Once a phone call is established, subsequent messages (signaling and voice) can travel directly between phones without involving the proxies. However, an ISP can mandate that all traffic pass through the proxies, which is often the case for billing and security purposes.

Figure 1. Detecting SPIT Calls in a VoIP Environment (legend: SIP-based VoIP proxy servers #1 and #2, server-side detector, client-side detectors, SPIT Detector, normal users, spitters)

Our approach in detecting SPIT calls involves placing local detectors at the SIP proxies and the phones in the managed domain. The domains that have our detection mechanism are called managed domains and others are called unmanaged domains. Essentially, the detectors require observability of the signaling and the media streams within the managed domain. A spitter can exist as any phone in a VoIP environment, whether within a managed domain (phone B) or an unmanaged domain (phone E). The embedded detectors collect the information of the phone calls and send it to the SPITDetector, where the logic for differentiating SPIT calls from non-SPIT calls executes. The decoding of the traffic and calculation of the call features are handled by the respective server-side/client-side detectors and only a digest of the necessary information is forwarded up to the detector, thus minimizing network traffic.

SPITDetector supports two modes of detection:

Mode A: Look at each phone call with early detection. In this mode, the SPITDetector has to determine whether a call is SPIT or not before the media stream of the call is established. This means that the detection has to be completed before the callee picks up the phone. This mode is useful from an end user's point of view since SPIT calls can potentially be blocked without further annoyance.

Mode B: Look at the whole batch of phone calls. With Mode B, we assume received calls are kept in a collection, which is then presented in a batch to our semi-supervised clustering algorithm. This mode provides higher detection accuracy than Mode A due to the availability of complete call feature information. Mode B is attractive to a service provider, rather than to an end user.

4. SPIT Detection using Semi-Supervised Clustering

4.1 Background

In our problem context, each VoIP call is regarded as one data point. We are interested in clustering call data points into two clusters, one containing the SPIT calls, and the other containing the non-SPIT calls. In general, there may be multiple sub-clusters within each cluster corresponding to radically different kinds of SPIT or non-SPIT calls. We explore this approach of multiple sub-clusters further in Sec. 4.7.

Semi-supervised clustering [18], [19], [6] is a recent development in the data clustering research community that aims to address the issue of selecting the proper criteria for clustering. Semi-supervised clustering allows the use of optional labeled data for a subset of the runtime observations to progressively modify the clustering criteria. This means that one does not need to determine a priori which features of the data points should be used for clustering. The clustering criteria will be trained into generating clusters that obey the user-labeled data as faithfully as possible [6]. The implicit assumption is that user feedback is perfectly accurate. In our work here, we evaluate the impact of noise in the user feedback.

4.2 VoIP call features for clustering

We construct a data point from each VoIP call based on 17 features: 1-2. From/To URI, 3. Start time, 4. Duration, 5. # of SIP INVITE messages, 6. # of ACK messages, 7-8. # of BYE messages from caller/callee, 9. Time since the last call from the originator of the current call, 10-15. # of 1xx, 2xx, 3xx, 4xx, 5xx, and 6xx SIP Response messages, 16. Call frequency of the originator of the current call, 17. Ratio of non-silence duration of the callee to the caller media streams. For Mode A early detection, only features 1, 2, 3, and 9 are available. Feature 17 is derived from the RTP media stream by client-side detectors if the media streams are configured to flow directly between clients [20], or it can be provided by the server-side detector if the media streams are configured to flow through the SIP Proxy.
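As a minimal sketch of how one call could be held as a data point, the record below mirrors the 17 features in the order listed above. The class name, field names, time units, and the example values are hypothetical and chosen only for illustration; the paper does not prescribe a representation.

```python
from dataclasses import dataclass, fields


@dataclass
class CallFeatures:
    """One data point per call; comments give the feature numbers from the text."""
    from_uri: str            # feature 1
    to_uri: str              # feature 2
    start_time: float        # feature 3 (time unit is an assumption)
    duration: float          # feature 4
    n_invite: int            # feature 5
    n_ack: int               # feature 6
    n_bye_caller: int        # feature 7
    n_bye_callee: int        # feature 8
    since_last_call: float   # feature 9
    n_1xx: int               # features 10-15: SIP response classes
    n_2xx: int
    n_3xx: int
    n_4xx: int
    n_5xx: int
    n_6xx: int
    call_frequency: float    # feature 16
    nonsilence_ratio: float  # feature 17 (callee relative to caller)


# Features that exist before the media stream is established (Mode A).
MODE_A_FIELDS = ("from_uri", "to_uri", "start_time", "since_last_call")


def mode_a_view(call: CallFeatures) -> dict:
    """Return only the features usable for early, per-call detection."""
    return {name: getattr(call, name) for name in MODE_A_FIELDS}


# Hypothetical example call, for illustration only.
example = CallFeatures(
    "sip:alice@example.org", "sip:bob@example.org", 0.0, 42.0,
    1, 1, 0, 1, 300.0, 1, 1, 0, 0, 0, 0, 0.2, 0.1,
)
```

Keeping the Mode A subset as an explicit tuple makes it easy to see which part of a data point is missing when a call has only just been set up.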
We select the universe of features using our domain knowledge, to cover different facets of a VoIP call and to limit the number of features so that online clustering is feasible.

4.3 Labeled data via user feedback

Phone calls received in the managed domain can have optional user feedback information indicating whether a call is a SPIT call or a non-SPIT call. The corresponding data point will be labeled with a SPIT or a non-SPIT tag and fed into the semi-supervised clustering process. Such a data point will be used for adjusting the clustering criteria.

4.4 Extended K-Means for semi-supervised clustering: MPCK-Means

For this work, we select the semi-supervised clustering algorithm called MPCK-Means [6].

\[
\mathcal{J}_{mpckm} = \sum_{x_i \in X} \left( \|x_i - \mu_{l_i}\|^2_{A_{l_i}} - \log(\det(A_{l_i})) \right)
+ \sum_{(x_i, x_j) \in M} w_{ij}\, f_M(x_i, x_j)\, \mathbb{1}[l_i \neq l_j]
+ \sum_{(x_i, x_j) \in C} \bar{w}_{ij}\, f_C(x_i, x_j)\, \mathbb{1}[l_i = l_j]
\tag{1}
\]
\[
\|x_i - \mu_{l_i}\|^2_{A_{l_i}} = (x_i - \mu_{l_i})^T A_{l_i} (x_i - \mu_{l_i})
\tag{2}
\]

\[
f_M(x_i, x_j) = \tfrac{1}{2}\|x_i - x_j\|^2_{A_{l_i}} + \tfrac{1}{2}\|x_i - x_j\|^2_{A_{l_j}}
\tag{3}
\]

\[
f_C(x_i, x_j) = \|x'_{l_i} - x''_{l_i}\|^2_{A_{l_i}} - \|x_i - x_j\|^2_{A_{l_i}}
\tag{4}
\]

\[
A_h = |X_h| \Big( \sum_{x_i \in X_h} (x_i - \mu_h)(x_i - \mu_h)^T
+ \sum_{(x_i, x_j) \in M_h} \tfrac{1}{2} w_{ij} (x_i - x_j)(x_i - x_j)^T\, \mathbb{1}[l_i \neq l_j]
+ \sum_{(x_i, x_j) \in C_h} \bar{w}_{ij} \big( (x'_h - x''_h)(x'_h - x''_h)^T - (x_i - x_j)(x_i - x_j)^T \big)\, \mathbb{1}[l_i = l_j] \Big)^{-1}
\tag{5}
\]

Eq. (1) is the objective function that MPCK-Means minimizes. $l_i$ is the cluster that point $x_i$ is associated with. The main idea is the same as K-Means, where intra-cluster distance is being minimized. However, the Euclidean distance metric in MPCK-Means is weighted by a cluster-specific matrix $A_{l_i}$ (one can also use the same matrix across all clusters) [6]. $A_{l_i}$ is modified based on user feedback and points in cluster $l_i$ following Eq. (5). The user-labeled data in MPCK-Means is supplied in the form of clustering constraints M (must-link set) and C (cannot-link set). Here the M set specifies pairs of data points that should be put in the same cluster while the C set specifies those pairs of data points that should not be put in the same cluster. In Eq. (1), the last two terms are used to add penalty to the objective function from the violation of these constraints. The function $f_M$ returns a value proportional to the distance between the two points that are in different clusters. The function $f_C$ returns a penalty that decreases as the distance between two points in the same cluster grows. The points $x'_{l_i}$ and $x''_{l_i}$ represent the two farthest data points in $X_{l_i}$ with respect to their distance computed using $A_{l_i}$. The pseudo code for MPCK-Means is listed as Algorithm 1 below.

Input: Set of data points $X = \{x_i\}_{i=1}^{N}$, set of must-link constraints $M = \{(x_i, x_j)\}$, set of cannot-link constraints $C = \{(x_i, x_j)\}$, number of clusters $K$, sets of constraint costs $W$ and $\bar{W}$, $t = 0$.
Output: Disjoint $K$-partitioning $\{X_h\}_{h=1}^{K}$ of $X$ such that objective function $\mathcal{J}_{mpckm}$ is locally minimized.
Method:
1. Initialize clusters:
   1.1. Create the $\lambda$ neighborhoods $\{N_p\}_{p=1}^{\lambda}$ from $M$ and $C$.
        If $\lambda \ge K$: initialize $\{\mu_h^{(0)}\}_{h=1}^{K}$ using weighted farthest-first traversal starting from the largest $N_p$.
        Else ($\lambda < K$): initialize $\{\mu_h^{(0)}\}_{h=1}^{\lambda}$ with centroids of $\{N_p\}_{p=1}^{\lambda}$; initialize remaining clusters at random.
2. Repeat until convergence:
   2.1. For each data point $x_i \in X$:
        $h^* = \arg\min_h \big( \|x_i - \mu_h^{(t)}\|^2_{A_h} - \log(\det(A_h)) + \sum_{(x_i, x_j) \in M} w_{ij} f_M(x_i, x_j)\, \mathbb{1}[h \neq l_j] + \sum_{(x_i, x_j) \in C} \bar{w}_{ij} f_C(x_i, x_j)\, \mathbb{1}[h = l_j] \big)$
        Assign $x_i$ to $X_{h^*}^{(t+1)}$.
   2.2. For each cluster $X_h$: $\mu_h^{(t+1)} \leftarrow \frac{1}{|X_h^{(t+1)}|} \sum_{x \in X_h^{(t+1)}} x$
   2.3. Update_metrics $A_h$ for all clusters $\{X_h\}_{h=1}^{K}$ (Eq. (5)).
   2.4. $t \leftarrow t + 1$
Algorithm 1. MPCK-Means (adapted from [6])

4.4.1 Mapping user feedback to pair-wise constraints in MPCK-Means

The system keeps two sets: $F_S$ (data points of SPIT calls from feedback) and $F_N$ (data points of non-SPIT calls from feedback). For a data point $x_i$ which has user feedback, the user indicates $x_i \in F_S$ or $x_i \in F_N$. With respect to the MPCK-Means algorithm, must-link constraints M are derived online from pairs of points $(x_i, x_j)$ with $x_i, x_j \in F_S$ or $x_i, x_j \in F_N$. Similarly, cannot-link constraints C are created online from $(x_i, x_j)$, where $x_i \in F_S$ and $x_j \in F_N$. For ease of exposition, we initially discuss the case with 2 clusters, one each for SPIT and non-SPIT calls. We discuss the extension to multiple clusters in Sec. 4.7.

4.4.2 Building detection predicate

Given a cluster $X_h$ from the clustering algorithm, we use the number of data points with different user feedback in the cluster to determine the association of the cluster. If $|X_h \cap F_S| > |X_h \cap F_N|$, the calls in $X_h$ will be considered SPIT calls; else, they will be considered non-SPIT calls.

4.5 Efficient MPCK-Means

In the cluster assignment step of MPCK-Means (Step 2.1), iterating through the must-link/cannot-link peers of point $x_i$ is an O(N) operation. X is the whole set of data points supplied to the clustering algorithm and N = |X| is the number of data points. The determination of the maximally separated points $x'$ and $x''$ used in $f_C(\cdot)$ (Step 2.1 of Algorithm 1) and update_metrics (Step 2.3) has time complexity O(N^2). This implies MPCK-Means is O(N^3) since the operation has to be done for each data point (actually O(cN^3), where c is a small fixed number of iterations till convergence). Thus, MPCK-Means does not scale well with large data sets. For our application, where N can be hundreds for a small-sized domain or thousands for a mid-sized domain, it turns out to be prohibitive time-wise to apply the original MPCK-Means directly.

Therefore, we adapt MPCK-Means into the eMPCK-Means (efficient MPCK-Means) algorithm (Algorithm 2). In it, the maximally separated points are estimated through an O(1) approximation algorithm. We use an O(N)
implementation for the neighborhood creation process in the cluster initialization step of MPCK-Means. Additionally, the general practical experience with a K-Means based algorithm is that it converges within a small number of iterations for the main loop (Step 2 in MPCK-Means). Combined, these make eMPCK-Means O(N), and the constant is small for a range of VoIP call traces.

4.5.1 eMPCK-Means: Initialize clusters

The eMPCK-Means algorithm creates the initial neighborhoods directly from the user feedback sets $F_S$ and $F_N$. Specifically, it creates w neighborhoods $\{F_S, F_N, n_3, n_4, \ldots, n_w\}$, where $\{n_3, n_4, \ldots, n_w\} = X - F_S - F_N$ is the set of data points not covered by the user feedback. The complexity of this step is O(N). We use the same weighted farthest-first traversal as in MPCK-Means, which is O(N) when the number of clusters K is a constant. Overall, the initialize-clusters step in eMPCK-Means has O(N) complexity.

4.5.2 eMPCK-Means: efficient estimation of maximally separated points $(x', x'')$

In MPCK-Means, finding the exact maximally separated points $(x', x'')$ used in Eq. (4) and in the matrix update [6] requires evaluating the distance $\|x_i - x_j\|_A$ for every pair of points $(x_i, x_j) \in X$, which is an O(N^2) operation. Since the A matrix is updated in each iteration of the loop of Step 2 in Algorithm 1, this evaluation has to be repeated as well. In eMPCK-Means, we estimate the maximally separated points by first putting data points from X into an array R[1..N] in a random ordering. We then iterate through consecutive elements R[i] and R[i+1] in the array. We set $(x', x'')$ to the pair (R[i'], R[i'+1]) that gives the maximal value of $\|R[i'] - R[i'+1]\|_A$. This operation (Step 2 in Algorithm 2) is performed once right after the cluster initialization step and is done K times, once for each cluster. The time complexity of this step is O(N). However, since the A matrix is updated in each iteration of MPCK-Means (Step 2.3, Algorithm 1), the estimate $(x', x'')$ has to be updated accordingly as well. We embed the updating process into the calculation of the parameterized Euclidean distance $\|x_i - x_j\|_A$ (Eq. (2)). The parameterized Euclidean distance is calculated in Eq. (3) and Eq. (4) as well.
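The randomized estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation evaluated in the paper: the metric is simplified to a diagonal weight vector instead of a full A matrix, and the function names (`weighted_sq_dist`, `estimate_farthest_pair`, `refine_estimate`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def weighted_sq_dist(x: Point, y: Point, weights: Sequence[float]) -> float:
    """Squared distance under a diagonal metric (assumption: A is diagonal)."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


def estimate_farthest_pair(
    points: List[Point], weights: Sequence[float], rng: random.Random
) -> Tuple[Tuple[Point, Point], float]:
    """One O(N) pass over consecutive elements of a random ordering."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    order = list(points)
    rng.shuffle(order)
    best_pair = (order[0], order[1])
    best_dist = weighted_sq_dist(order[0], order[1], weights)
    for a, b in zip(order[1:], order[2:]):
        d = weighted_sq_dist(a, b, weights)
        if d > best_dist:
            best_pair, best_dist = (a, b), d
    return best_pair, best_dist


def refine_estimate(best_pair, best_dist, x: Point, y: Point, weights):
    """O(1) update performed whenever a distance is computed anyway."""
    d = weighted_sq_dist(x, y, weights)
    if d > best_dist:
        return (x, y), d
    return best_pair, best_dist
```

The initial pass can miss the true farthest pair because only N-1 of the N(N-1)/2 pairs are inspected; `refine_estimate` is what lets later distance computations improve the estimate at no extra asymptotic cost.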
The idea here is that when a pair of points $(x_i, x_j)$ is found to have a greater distance than the current estimate $(x', x'')$ at the time of evaluating the parameterized Euclidean distance, we set the maximally separated points estimate to $(x_i, x_j)$. The advantage of this approach is that it is an O(1) operation and does not increase the order of complexity of eMPCK-Means. However, this is an approximation. Suppose that, in the loop to iterate through all the points, we are at point A and are calculating $\|A - B\|$. The point C is to be considered in a later iteration and (A, C) happens to be the farthest pair of points. Then, the computation for point A will not have the accurate distance for the farthest pair of points. Hereafter, when we refer to Euclidean distance computation, we mean that it has maximally separated point estimation embedded within it. To ensure that the $f_C(\cdot)$ function (Eq. (4)) does not evaluate to negative values with our approximated estimation of $(x', x'')$, we enforce that the second term is always evaluated before the first term so that there is an opportunity to update $(x', x'')$.

4.5.3 Use only a fixed number of constraints in cluster assignment step

In the cluster assignment step of MPCK-Means (Step 2.1, Algorithm 1), rather than iterating through the complete must-link/cannot-link peers of $x_i$, which makes Step 2.1 O(N^2), we choose a fixed-sized subset of them. This corresponds to Step 3.1 in eMPCK-Means. This optimization is hinted at by the fact that the must-link/cannot-link information in our domain has significant redundancy. A set of $k_1$ and $k_2$ calls placed, through user feedback, in the SPIT and non-SPIT categories generates on the order of $k_1^2 + k_2^2$ must-link and $k_1 k_2$ cannot-link constraints. On the other hand, we see from experimental results in [6] that MPCK-Means can work reasonably well even with a limited number of constraints. The cluster assignment step thus becomes O(N). In general, this can negatively affect the clustering quality.
However, we believe it is a trade-off that is necessary in an effort to make the detection scheme scalable.

4.5.4 Pre metrics update on the starting cluster(s)

In MPCK-Means, the first update metrics step (Step 2.3) occurs only after the first iteration of the cluster assignment step (Step 2.1). In the first iteration of the cluster assignment, a default identity matrix is assigned to A, which directly affects the quality of the generated clusters from the first iteration and has a long-term effect on the quality of the eventual clusters, as we see empirically. Therefore, in eMPCK-Means we conduct a metrics update (Step 1.2, eMPCK-Means, Algorithm 2) early on, right after the initial clusters are generated from the cluster initialization step. Intuitively, the user feedback is available at the outset and this optimization allows the A matrix to immediately adapt to the user feedback, which results in more accurate clustering. Additionally, it improves the convergence speed, as we see later (Table 1).

Input: Set of data points $X = \{x_i\}_{i=1}^{N}$, set of must-link constraints $M = \{(x_i, x_j)\}$, set of cannot-link constraints $C = \{(x_i, x_j)\}$, number of clusters $K$, sets of constraint costs $W$ and $\bar{W}$, optional initial cluster centroids $\{\mu_h^{(0)}\}_{h=1}^{K}$, $t = 0$.
Output: Disjoint $K$-partitioning $\{X_h\}_{h=1}^{K}$ of $X$ such that objective function $\mathcal{J}_{mpckm}$ is locally minimized.
Method:
1. If initial cluster centroids $\{\mu_h^{(0)}\}_{h=1}^{K}$ are not given in the input:
   1.1. Create the $\lambda$ neighborhoods $\{N_p\}_{p=1}^{\lambda}$ with steps from Sec. 4.5.1.
        If $\lambda \ge K$: use weighted farthest-first traversal to select $K$ neighborhoods $\{N_{p(h)}\}_{h=1}^{K}$; assign the data points $X_h^{(0)} \leftarrow N_{p(h)}$; initialize $\mu_h^{(0)}$ as the mean of $X_h^{(0)}$.
        Else: initialize $\{\mu_h^{(0)}\}_{h=1}^{\lambda}$ from $\{N_p\}_{p=1}^{\lambda}$; initialize remaining clusters at random.
   1.2. Update metrics $A_h$ for all clusters $\{X_h\}_{h=1}^{K}$ ([6]).
2. Initialization of maximally separated points $(x'_h, x''_h)$ with respect to each $A_h$.
3. Repeat until convergence:
   3.1. For each $x_i \in X$:
        Randomly select $M_i \subseteq \{(x_i, x_j) \in M\}$ with $|M_i| = ctsSize$, and $C_i \subseteq \{(x_i, x_j) \in C\}$ with $|C_i| = ctsSize$.
        $h^* = \arg\min_h \big( \|x_i - \mu_h^{(t)}\|^2_{A_h} - \log(\det(A_h)) + \sum_{(x_i, x_j) \in M_i} w_{ij} f_M(x_i, x_j)\, \mathbb{1}[h \neq l_j] + \sum_{(x_i, x_j) \in C_i} \bar{w}_{ij} f_C(x_i, x_j)\, \mathbb{1}[h = l_j] \big)$
        Assign $x_i$ to $X_{h^*}^{(t+1)}$.
   3.2. For each cluster $X_h$: $\mu_h^{(t+1)} \leftarrow \frac{1}{|X_h^{(t+1)}|} \sum_{x \in X_h^{(t+1)}} x$
   3.3. Update_metrics $A_h$ for all clusters $\{X_h\}_{h=1}^{K}$ ([6]).
   3.4. $t \leftarrow t + 1$
Algorithm 2. eMPCK-Means

Algorithm 2 shows the proposed eMPCK-Means with the above modifications to MPCK-Means. Step 1 decides the starting centroids (means) for the clusters through the use of initial user feedback. For the specific case of the user flagging calls as SPIT or non-SPIT, K = 2. Step 2 initializes the maximally separated points estimation. Step 3.1 performs the cluster assignment. Step 3.2 updates the mean. Note that the mean can be updated in constant time by keeping the sum of the data points and performing an addition/subtraction when a data point is associated with/unassociated from a cluster. Step 3.3 updates the A matrix for each cluster. The goal of this process is to pick A's such that the objective function (Eq. (1)) is minimized for the cluster assignment done in the current iteration of Step 3. Conceptually, this process will result in A's that put higher weights on those features which are consistent among data points in the same cluster and lower weights on those that are less consistent.

4.6 Progressive MPCK-Means

The eMPCK-Means algorithm assumes that the data points are available in a batch, and is thus suited for Mode B (batch mode) detection (Sec. 3.3).
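Both the batch and the progressive variants rely on the cluster mean being maintained in constant time per data point, by keeping a running sum and adjusting it when a point joins or leaves a cluster. A minimal sketch of that bookkeeping follows; the class name `RunningMean` and its methods are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


class RunningMean:
    """Keeps the per-feature sum so the mean is O(1) per add/remove
    (for a fixed number of features)."""

    def __init__(self, dim: int) -> None:
        self.total: List[float] = [0.0] * dim
        self.count = 0

    def add(self, x: Sequence[float]) -> None:
        # Called when a data point is associated with the cluster.
        for k, value in enumerate(x):
            self.total[k] += value
        self.count += 1

    def remove(self, x: Sequence[float]) -> None:
        # Called when a data point is unassociated from the cluster.
        for k, value in enumerate(x):
            self.total[k] -= value
        self.count -= 1

    def mean(self) -> List[float]:
        if self.count == 0:
            raise ValueError("cluster is empty")
        return [s / self.count for s in self.total]
```

With this structure, reassigning a point between two clusters costs one `remove` and one `add`, independent of the cluster sizes.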
To support Mode A per-call early detection, we create a variant called progressive MPCK-Means (pMPCK-Means). The pseudo code is given as Algorithm 3. The idea here is that when a new call comes in, pMPCK-Means performs only the cluster assignment step and only for the new data point. The features From URI, To URI, Start time, and Time from the last call by the same caller are available at the beginning of the phone call and are used in pMPCK-Means. For the features that are not available, pMPCK-Means fills the data point with the mean values from the cluster to which this point's distance is being computed. This is implicitly carried out in Step 4 of Algorithm 3. In pMPCK-Means, the update metrics operation only occurs occasionally when the cluster means have changed significantly (exceeding a given threshold $d_{threshold}$). Estimating the mean is an O(1) operation for each new data point. This amortizes over many calls the cost of computation and the cost of re-clustering all existing data points. However, a cost has to be paid in advance, which is that we require reasonably sized cluster(s) to be grown on the initial data points ($|X| > t_{threshold}$) through eMPCK-Means. The reason is that we want the initial A matrix to be as accurate as possible.

Algorithm: pMPCK-Means
Input: New data point $x_t$, disjoint $K$-partitioning $\{X_h^{(t-1)}\}_{h=1}^{K}$ of $X^{(t-1)} = \{x_1, x_2, \ldots, x_{t-1}\}$.
Output: The cluster association $l_t$ for the point $x_t$; disjoint $K$-partitioning $\{X_h^{(t)}\}_{h=1}^{K}$ of $X^{(t)} = \{x_1, x_2, \ldots, x_t\}$.
Internal variables: cluster means $\{\mu_h\}_{h=1}^{K}$ and the means $\{\mu'_h\}_{h=1}^{K}$ recorded at the last call to eMPCK-Means.
Method:
1. If $t < t_{threshold}$: $X^{(t)} \leftarrow X^{(t-1)} \cup \{x_t\}$; Return.
2. If all clusters $\{X_h\}_{h=1}^{K}$ are empty: call eMPCK-Means on $X^{(t)}$ to generate $\{X_h\}_{h=1}^{K}$ and $\{\mu_h\}_{h=1}^{K}$; $\mu'_h \leftarrow \mu_h$; Return.
3. Randomly select $M_t \subseteq \{(x_t, x_j) \in M\}$ with $|M_t| = ctsSize$, and $C_t \subseteq \{(x_t, x_j) \in C\}$ with $|C_t| = ctsSize$.
4. $h^* = \arg\min_h \big( \|x_t - \mu_h\|^2_{A_h} - \log(\det(A_h)) + \sum_{(x_t, x_j) \in M_t} w_{tj} f_M(x_t, x_j)\, \mathbb{1}[h \neq l_j] + \sum_{(x_t, x_j) \in C_t} \bar{w}_{tj} f_C(x_t, x_j)\, \mathbb{1}[h = l_j] \big)$
5. $X_{h^*}^{(t)} \leftarrow X_{h^*}^{(t-1)} \cup \{x_t\}$; update $\mu_{h^*}$.
6. If $\|\mu_h - \mu'_h\| / \|x'_h - x''_h\| > d_{threshold}$, where $x'_h, x''_h$ are the maximally separated points of $X_h$ with respect to $A_h$: call eMPCK-Means with initial centroids $\{\mu_h\}_{h=1}^{K}$ on $X^{(t)}$ to generate $\{X_h^{(t)}\}_{h=1}^{K}$; $\mu'_h \leftarrow \mu_h$.
Algorithm 3. pMPCK-Means

4.7 Multi-Class eMPCK clustering

We create a variant of eMPCK in which the initial clusters are split into sub-clusters based on the call types: calls going to voice mail, calls terminated immediately after the call is established, and the remaining calls. These three types exhibit different patterns in the non-silence call duration ratio (feature 17, Sec. 4.2). The sub-clusters are formed for both SPIT and non-SPIT calls. This is an attempt to guide the clustering process through expert knowledge. The user feedback, however, is only able to differentiate between SPIT and non-SPIT calls, and cannot place a call into a sub-cluster.

5. Experiments and Results

5.1 Testbed

We set up a two-domain testbed with a topology similar to Figure 1, one of the domains being protected by our detection technique. We use Asterisk as the VoIP proxy servers and MjSip for the phone clients. Each domain has 9 phones acting as non-spitters and 6 phones acting as spitters. We use the Poisson distribution to model call arrival times and the Exponential distribution to model call durations. The generation of call traces was done by only one of the co-authors without providing any information about the nature of non-SPIT and SPIT calls to the rest of the team. This was done by design so that the team working on the detection system does not have any prior knowledge of the call mix. Ideally we would have liked to perform the evaluation on third-party call traces. However, at the time of writing, no such call trace is publicly available.

5.2 Summary of call trace dataset

We collected four call traces from our testbed with varying call characteristics as follows (call trace name, Non-SPIT call length average, Non-SPIT call inter-arrival time average, SPIT call length average, SPIT call inter-arrival time average, Number of SPIT calls in trace, Number of non-SPIT calls in trace): (v4, 5, 3,,,, 7), (v5, 5,,,, 45, 338), (v6, 5, 3,,, 94, 89), (v7, 5, 3, 5,, 8, 3). The time unit is minute.
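To illustrate the kind of call generation described for the testbed, the sketch below draws call start times from a Poisson arrival process (equivalently, exponentially distributed inter-arrival gaps) and call durations from an exponential distribution. The function name `generate_calls` and the mean values in the usage note are placeholders chosen for illustration, not the parameters used for traces v4 to v7.

```python
import random
from typing import List, Tuple


def generate_calls(
    n_calls: int,
    mean_interarrival: float,
    mean_duration: float,
    rng: random.Random,
) -> List[Tuple[float, float]]:
    """Return (start_time, duration) pairs in minutes.

    Arrivals form a Poisson process, so gaps between consecutive calls
    are exponential with the given mean; durations are exponential too.
    """
    calls = []
    now = 0.0
    for _ in range(n_calls):
        now += rng.expovariate(1.0 / mean_interarrival)
        duration = rng.expovariate(1.0 / mean_duration)
        calls.append((now, duration))
    return calls
```

For example, `generate_calls(50, 3.0, 5.0, random.Random(1))` yields 50 calls with placeholder means of 3 minutes between calls and 5 minutes per call; separate invocations with different means could stand in for spitter and non-spitter phones.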
In terms of similarity between SPIT and non-SPIT calls, in decreasing order, the call traces are v5, v7, v6, and v4. There are other characteristics which are shared by the four call traces. Examples include a 6% chance of a call being hung up by the caller for a non-SPIT call and a % chance of being hung up by the caller (spitter) for a SPIT call. The media streams for a SPIT call are dominated by the spitter while for a non-SPIT call, the non-silence durations on the caller and the callee media streams are about the same on average.

Other experimental parameter settings are: at most 5 must-link and 5 cannot-link constraints are used. The pMPCK-Means algorithm uses data points initially with eMPCK-Means before commencing incremental operation. Each data point in the experiment is based on the average from 5 runs with the same parameter settings.

5.3 Effect of proportion of user feedback

We evaluate the effect of the proportion of calls that come with user feedback. We assume the same ratio for both SPIT and non-SPIT calls. We assume the feedback is perfectly accurate.

Figure 2 shows the clustering quality with respect to four different algorithms proposed on call trace v4 in terms of the F-Measure [6]. A larger F-Measure value means better quality clustering. From Sec. 5.2, we know that call trace v4 exhibits a very clear distinction between SPIT and non-SPIT calls in terms of call duration and call inter-arrival time. This makes eMPCK perform well with user feedback ratio as low as.. The original MPCK-Means achieves the same level but with a higher user feedback ratio of.. The improved result of eMPCK is due to the pre-metrics update (Sec. 4.5.4), which creates a more accurate weight matrix based on user feedback, prior to iterating over the data points. The F-Measure from eMPCK Multi Class drops with increasing user feedback ratio because we break the cluster into sub-clusters based on the call types. As a result, eMPCK Multi Class will put different types of SPIT and non-SPIT calls into different sub-clusters. Both will hurt the F-Measure since, by definition of F-Measure, these calls should be clustered into the same cluster.
This negative effect grows stronger as the user feedback ratio increases.

Figure 3 and Figure 4 show the true positive (TP) and false positive (FP) rates of SPIT detection on call trace v4. What we can see here is that eMPCK Multi Class actually performs well despite the poor F-Measure. eMPCK Multi Class performs worse than eMPCK at low user feedback ratio because breaking the initial cluster into sub-clusters reduces the number of call data points with feedback in each sub-cluster. This results in poor clustering and hence low detection accuracy. Compared to eMPCK, MPCK's detection accuracy lags behind due to the lack of pre-metrics updating. pMPCK performs rather poorly even with call trace v4. However, it is still in the usable range (e.g., .63 True Positive with a user feedback ratio of.). pMPCK's poor performance is due to the limited features available before the media stream is established.
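The gap between F-Measure and detection rates for the Multi Class variant can be made concrete with a small sketch. It assumes a pairwise F-Measure (precision and recall computed over pairs of points placed in the same cluster), which is our reading of the measure cited from [6]; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def tp_fp_rates(is_spit: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """True positive rate over SPIT calls, false positive rate over non-SPIT calls."""
    tp = sum(1 for t, p in zip(is_spit, flagged) if t and p)
    fp = sum(1 for t, p in zip(is_spit, flagged) if (not t) and p)
    n_spit = sum(1 for t in is_spit if t)
    n_legit = len(is_spit) - n_spit
    # Rates are reported as 0.0 when a class is absent from the trace.
    return (tp / n_spit if n_spit else 0.0, fp / n_legit if n_legit else 0.0)


def pairwise_f_measure(classes: Sequence[int], clusters: Sequence[int]) -> float:
    """Harmonic mean of pairwise precision and recall (assumed definition)."""
    same_cluster = same_class = same_both = 0
    for i, j in combinations(range(len(classes)), 2):
        in_cluster = clusters[i] == clusters[j]
        in_class = classes[i] == classes[j]
        same_cluster += in_cluster
        same_class += in_class
        same_both += in_cluster and in_class
    if same_both == 0:
        return 0.0
    precision = same_both / same_cluster
    recall = same_both / same_class
    return 2 * precision * recall / (precision + recall)
```

With classes `[1, 1, 0, 0]`, a clustering `[0, 1, 2, 2]` that splits the two SPIT calls into separate sub-clusters scores a pairwise F-Measure of 2/3, even though both sub-clusters could still be labeled SPIT and detected correctly.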
Due to space constraints, we show only the True Positive curves for call traces v5, v6, and v7 in Figure 5, Figure 6, and Figure 7 respectively. All the algorithms perform worse with call trace v5 due to the same inter-arrival time of SPIT and non-SPIT calls. This makes the time since last call from the same caller and call frequency (features 9 and 16 in Sec. 4.2) much less useful. Another factor is that the number of SPIT calls in the call trace is decreased to 45 (compared to v4), which further lowers the clustering quality and detection accuracy.

Figure 8 summarizes the True Positive rates from eMPCK across the four call traces. This basically corresponds to how salient the differences between SPIT calls and non-SPIT calls in the call traces are. In order, the easiest one is v4, followed closely by v6, and then v7. The hardest is v5. In v5, SPIT calls are almost indistinguishable from short-duration non-SPIT calls.

We show error bars (± std.) for eMPCK in Figure 2. They are omitted in the rest of the figures for presentation clarity. The general trend is that the errors diminish with increasing ratio of user feedback. We observe less than ± 5% error across the experiments on call traces v4, v6, and v7 when user ratio is set beyond.. For call trace v5, the error is higher (up to ± 5% at. ratio).

Figure 2. Call trace v4 / F-Measure (F-Measure vs. ratio of calls with feedback)
Figure 3. Call trace v4 / TP (True Positive Rate vs. ratio of calls with feedback)
Figure 4. Call trace v4 / FP (False Positive Rate vs. ratio of calls with feedback)
Figure 5. Call trace v5 / TP (True Positive Rate vs. ratio of calls with feedback)
Figure 6. Call trace v6 / TP (True Positive Rate vs. ratio of calls with feedback)
Figure 7. Call trace v7 / TP (True Positive Rate vs. ratio of calls with feedback)
Figure 8. Compare eMPCK True Positive Rate across call traces (True Positive vs. ratio of calls with feedback, for v4, v5, v6, v7)
Figure 9. TP vs. Noise in User Feedback (True Positive Rate vs. noise level)
Figure 10. FP vs. Noise in User Feedback (False Positive Rate vs. noise level)

5.4 Scalability of execution time

In this experiment we compare the running times of MPCK and eMPCK by varying the number of call data points.
Call trace v7 is used for this experiment. For MPCK, we apply exact optimizations which do not cause loss of accuracy. For example, the maximally separated points evaluation is re-executed only when the matrix is changed. The results are based on code compiled with MS VC++ 8.0 with default optimization level running on Windows XP, Intel E64.3 GHz CPU. As Figure 11 shows, MPCK exhibits non-linear growth in the running time as the number of call data points increases (error bars are ± std.). eMPCK, on the other hand, exhibits a linear growth in the running time. Also, MPCK takes significantly longer to run compared with eMPCK: 15 times longer for a batch of 4 calls. Looking at the number of iterations that each algorithm takes to converge (Table 1), eMPCK fares better. The running time advantage of eMPCK comes from the lower number of iterations as well as the lower running time of each iteration. The lower number of iterations is explained by eMPCK's update of the metrics on the initialized clusters. For call trace v5, the similarity in SPIT and non-SPIT calls renders the initialization ineffective and the number of iterations is roughly equal for both algorithms.

Figure 11. Running Time (x-axis: number of data points; y-axes: MPCK time (ms) and eMPCK time (ms))
Table 1. Number of iterations to convergence (columns: MPCK, eMPCK; rows: one per call trace, plus average)

5.5 Effect of noise in user feedback
We evaluated different algorithms with various noise levels in the user feedback. When we say the noise level is c, it means that a fraction c of the user feedback is false, i.e., a SPIT call is reported as non-SPIT and vice-versa. We show the result with call trace 6 for this experiment. The user feedback ratio is fixed at .3. Figure 9 shows that the true positive rate decreases as the noise level increases. Observing the false positive rates in Figure 10, we conclude that pMPCK is completely unusable through the whole noise level range while the other algorithms are usable at low noise levels. We conclude that pMPCK is usable only for a high proportion of accurate user feedback. Beyond noise level .5, eMPCK performance drops below that of MPCK due to our design of the detection predicate (Sec. 4.4.), namely, considering the cluster that contains more calls marked by the user as SPIT than non-SPIT to be the SPIT cluster.
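The noise model and the detection predicate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of `None` for calls without feedback, and the toy clustering are all hypothetical.

```python
import random
from collections import Counter


def add_feedback_noise(feedback, noise_level, rng=random):
    """Flip a fraction `noise_level` of the feedback marks (True = SPIT)."""
    n_flip = round(noise_level * len(feedback))
    flipped = set(rng.sample(range(len(feedback)), n_flip))
    return [(not mark) if i in flipped else mark
            for i, mark in enumerate(feedback)]


def spit_clusters(cluster_ids, feedback):
    """Detection predicate: a cluster is treated as SPIT when more of its
    calls are marked SPIT than non-SPIT. `None` means no feedback given."""
    spit_marks, legit_marks = Counter(), Counter()
    for cid, mark in zip(cluster_ids, feedback):
        if mark is None:
            continue
        if mark:
            spit_marks[cid] += 1
        else:
            legit_marks[cid] += 1
    return {cid for cid in set(cluster_ids)
            if spit_marks[cid] > legit_marks[cid]}


# Toy data (hypothetical): a perfect two-cluster split, cluster 0 holds the SPIT calls.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
truth = [True] * 4 + [False] * 4

rng = random.Random(0)
clean = spit_clusters(cluster_ids, add_feedback_noise(truth, 0.0, rng))
inverted = spit_clusters(cluster_ids, add_feedback_noise(truth, 1.0, rng))
print(clean, inverted)
```

With noise-free feedback the predicate selects the true SPIT cluster; with fully inverted feedback it selects the legitimate cluster instead. The better the clustering follows the (wrong) feedback, the more cleanly this inversion happens.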
With noise level above .5, the user feedback is wrong more often than right and the negative effect is more pronounced in eMPCK than MPCK, since it did a better job of clustering on the user feedback than MPCK. As an example of a usable operating point, consider that at noise levels . or below, eMPCK has both true positive and true negative above .8.

5.6 Evaluation with noise and feedback ratio
Here we perform an evaluation of all four proposed algorithms with respect to the four call traces. Our evaluation methodology considers the combined effect of proportion of user feedback and the noise level and the results are shown in Figure 12, Figure 13, and Figure 14. In the 3D plot, the Z-axis corresponds to TP-FP, the difference between True Positive rate and False Positive rate, with respect to each pair of feedback ratio and noise level. Intuitively, if TP-FP is greater than zero, it means the detection gives more correct results than incorrect results and can be regarded as a valid operating point where the detection is useful. Due to page length limitation, we show the 3D plots only for call trace 6.

Figure 12. MPCK (TP - FP) for call trace v6 (axes: noise level, ratio of feedback, TP - FP)
Figure 13. eMPCK (TP - FP) for call trace v6 (axes: noise level, ratio of feedback, TP - FP)
Figure 14. pMPCK (TP - FP) for call trace v6 (axes: noise level, ratio of feedback, TP - FP)

A general trend we can see in the 3D plots is that when fixing the noise level, the TP-FP value climbs to a peak and then goes down when varying the feedback ratio from 0 to 1. There is no sharp breakdown of performance for any of the algorithms. If the user feedback is accurate, then even with low ratio of user feedback, the performance is good for MPCK and eMPCK. The performance of pMPCK on the other hand is acceptable only close to the extreme region of almost perfect user feedback for almost all calls. To
give an overall quantification of the detection quality, we define the volume metric based on the integral (Eq. (6)). In the ideal case where TP-FP is maintained at 1 through the entire range of noise levels and feedback ratio values, the volume will be 0.9. Table 2 shows the volume for each combination of algorithm and call trace. Call trace v5 gives the lowest volume corresponding to the worst performance for all algorithms. Averaged over the entire range, we see that eMPCK performs best followed by eMPCK (Multi Class), MPCK, and pMPCK.

\[ \mathrm{Volume} = \int_{n=0}^{1} \int_{f=0.1}^{1} (TP - FP)\, df\, dn \qquad (6) \]
n: noise level, f: feedback ratio

Table 2. Summary of TP-FP volume comparisons (columns: v4, v5, v6, v7, avg.; rows: MPCK, eMPCK (Multi Class), eMPCK, pMPCK)

6. CONCLUSION
In this paper, we proposed a new approach to detect SPIT calls in a VoIP environment. We map each phone call into a data point based on an extendable set of call features, derived from the signaling as well as the media protocols. This converts the problem of SPIT detection into a data classification problem, where a classic solution is the use of clustering. We apply semi-supervised clustering, which allows for the optional use of user feedback for more accurate classification. This corresponds to users flagging some calls as SPIT and others as legitimate. We create a new algorithm called eMPCK-Means, based on a previous algorithm called MPCK-Means, which provides linear time performance with the number of calls. eMPCK-Means includes a premetrics-update step, which contributes to high (> 90%) detection true positive rates with less than % user feedback data points for three of the four call traces used here. We found that it is difficult to attain high detection accuracy based only on features available in the call establishment phase, which would enable a SPIT call to be dropped without the user needing to answer the call. This algorithm, pMPCK, performs well only with accurate user feedback for a majority of calls.

7. REFERENCES
[1] VOIPSA, "VoIP Threat Taxonomy," 2008.
[2] Y. S. Wu, S. Bagchi, S. Garg, and N. Singh, "SCIDIVE: a stateful and cross protocol intrusion detection architecture for voice-over-IP environments," in DSN, 2004.
[3] H. Sengar, D. Wijesekera, H. Wang, and S. Jajodia, "VoIP Intrusion Detection Through Interacting Protocol State Machines," in DSN, 2006.
[4] J. Rosenberg and C. Jennings, "RFC 5039: The Session Initiation Protocol (SIP) and Spam," 2008.
[5] D. Shin, J. Ahn, and C. Shim, "Progressive Multi Gray-Leveling: A Voice Spam Protection Algorithm," IEEE Network, vol. 20, pp. 18-24, 2006.
[6] M. Bilenko, S. Basu, and R. J. Mooney, "Integrating constraints and metric learning in semi-supervised clustering," in ICML, 2004.
[7] J. Quittek, S. Niccolini, S. Tartarelli, M. Stiemerling, M. Brunner, and T. Ewald, "Detecting SPIT Calls by Checking Human Communication Patterns," in ICC, 2007.
[8] R. MacIntosh and D. Vinokurov, "Detection and mitigation of spam in IP telephony networks using signaling protocol analysis," in IEEE/Sarnoff Symposium on Advances in Wired and Wireless Communication, 2005.
[9] P. Kolan and R. Dantu, "Socio-technical defense against voice spamming," ACM Transactions on Autonomous and Adaptive Systems (TAAS), vol. 2, 2007.
[10] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, p. 4.
[11] P. Haider, U. Brefeld, and T. Scheffer, "Supervised clustering of streaming data for email batch detection," in ICML, 2007.
[12] M. Sasaki and H. Shinnou, "Spam Detection Using Text Clustering," in International Conference on Cyberworlds, 2005.
[13] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.
[14] G. Druck, C. Pal, A. McCallum, and X. Zhu, "Semi-supervised classification with hybrid generative/discriminative methods," in KDD, 2007.
[15] K. Bennett and A. Demiriz, "Semi-supervised support vector machines," Advances in Neural Information Processing Systems, 1999.
[16] J. Rosenberg, "RFC 3261 - SIP: Session Initiation Protocol," 2002.
[17] H. Schulzrinne, "RFC 1889 - RTP: A Transport Protocol for Real-Time Applications," 1996.
[18] N. Grira, M. Crucianu, and N. Boujemaa, "Unsupervised and Semi-supervised Clustering: a Brief Survey," A Review of Machine Learning Techniques for Processing Multimedia Content, Report of the MUSCLE European Network of Excellence (FP6), 2004.
[19] T. Finley and T. Joachims, "Supervised clustering with support vector machines," in ICML, 2005.
[20] voip-info.org, "Asterisk SIP Media Path."
More informationA Design Method of High-availability and Low-optical-loss Optical Aggregation Network Architecture
A Desgn Method of Hgh-avalablty and Low-optcal-loss Optcal Aggregaton Network Archtecture Takehro Sato, Kuntaka Ashzawa, Kazumasa Tokuhash, Dasuke Ish, Satoru Okamoto and Naoak Yamanaka Dept. of Informaton
More informationForecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
More informationAllocating Time and Resources in Project Management Under Uncertainty
Proceedngs of the 36th Hawa Internatonal Conference on System Scences - 23 Allocatng Tme and Resources n Project Management Under Uncertanty Mark A. Turnqust School of Cvl and Envronmental Eng. Cornell
More informationMETHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS
METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng
More informationHP Mission-Critical Services
HP Msson-Crtcal Servces Delverng busness value to IT Jelena Bratc Zarko Subotc TS Support tm Mart 2012, Podgorca 2010 Hewlett-Packard Development Company, L.P. The nformaton contaned heren s subject to
More informationFREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan
More informationIntra-year Cash Flow Patterns: A Simple Solution for an Unnecessary Appraisal Error
Intra-year Cash Flow Patterns: A Smple Soluton for an Unnecessary Apprasal Error By C. Donald Wggns (Professor of Accountng and Fnance, the Unversty of North Florda), B. Perry Woodsde (Assocate Professor
More informationCHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES
CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable
More informationIncome Tax Statistics Analysis: A Comparison of Microsimulation Versus Group Simulation
INTERNAONAL JOURNAL OF MICROSIMULAON (009) (1) 3-48 Income Tax Statstcs Analyss: A Comparson of Mcrosmulaton Versus Group Smulaton Heko Müller 1 and Caren Suret 1 Rur-Unversty of Bocum, Faculty of Economcs,
More informationA Performance Analysis of View Maintenance Techniques for Data Warehouses
A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao
More informationCHAPTER-II WATER-FLOODING. Calculating Oil Recovery Resulting from Displ. by an Immiscible Fluid:
CHAPTER-II WATER-FLOODING Interfacal Tenson: Energy requred ncreasng te area of te nterface by one unt. Te metods of measurng IFT s nclude a rng tensometer, pendant drop and spnnng drop tecnques. IFT s
More informationStatistical Methods to Develop Rating Models
Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and
More information