Data Clustering Analysis in a Multidimensional Space

A. Bouguettaya and Q. Le Viet
Queensland University of Technology
School of Information Systems
Brisbane, Qld 4001, Australia

Abstract

Cluster analysis techniques are used to classify objects into groups based on their similarities. There is a wide choice of methods with different requirements in computer resources. We present the result of a fairly exhaustive study to evaluate three commonly used clustering algorithms, namely, single linkage, complete linkage, and centroid. The cluster analysis study is conducted in the 2-dimensional space. Three types of statistical distribution are used. Two different types of distances to compare lists of objects are also used. The results point to some startling similarities in the behavior and stability of all clustering methods.

1 Introduction and Motivation

Cluster analysis is a generic name for multivariate analysis techniques that create groups of objects based on their degree of association [21]. In simple words, cluster analysis classifies items into groups based on their similarities. In databases, cluster analysis has been used to re-allocate stored information based on predefined criteria with the goal of improving the efficiency of data retrieval operations. The standard way of evaluating the degree of similarity varies from one application to another. By re-allocating data, related information is physically stored as close together as possible to reduce the number of disk accesses. In a distributed environment, clustering is even more important because of the impact on the response time if the requested data is physically located at different sites. The need to do clustering is clear. There are many issues that need to be addressed:

1. Calculation of the degree of association between different types of objects.
2. Determination of an acceptable criterion to evaluate the "goodness" of clustering methods.
3. Adaptability of the clustering methods to different distributions of data: randomly scattered, skewed or concentrated around certain regions, etc.

Several clustering methods have been proposed that differ in the approach taken to perform the clustering, some of which are: hierarchical versus partitional, agglomerative versus divisive, extrinsic versus intrinsic, etc. [7][10][8][14][22][23][25][18][1]. In that respect, each clustering method has a different theoretical basis and is applicable to particular fields. We propose a fairly
exhaustive study of well-known clustering techniques in the 2-D space. Previous studies, which are less exhaustive in their analysis, have focused on the 1-D space [5]. Our experiments include a variety of environment settings to test the sensitivity and behavior of the clustering techniques. The results can be of paramount importance across a wide range of applications.

1.1 Definitions

We present a high level description of the different ontologies used in the research literature and this paper. The ontologies we cover are the following: cluster analysis, objects, clusters, distance and similarity, coefficient of correlation, and standard deviation.

Cluster analysis is about the generation of groups of entities that fit a set of definitions. The group which forms a cluster should have a higher degree of association among group members than with members of different groups. At a high level of abstraction, a cluster can be viewed as a group of "similar" objects. Cluster analysis is sometimes referred to as automatic classification. Cluster analysis has a property that makes it different from other classification methods, namely, information about the classes of groupings is not known prior to the processing. Items are grouped into clusters that are defined by the members of those clusters.

Objects (or items) are used in a broad sense. They can be anything that needs to be classified based on certain criteria. The object may represent a single attribute in a relational database or a complex object in an object-oriented database, provided that it can be represented as a point in a measurement space. Obviously, all objects to be clustered should be defined in the same measurement space. In a 1-D (one dimensional) environment, an object is represented as a point belonging to a segment defined by the interval [a,b] where a and b are arbitrary numbers.

Clusters are groups of objects linked together according to some rules. The goal of clustering is to find groups containing objects that are most homogeneous within these groups. Homogeneity refers to the common properties of the objects to be clustered. There are two ways to represent clusters in a measurement space: as a hypothetical point which is not an object in the cluster, or as an existing object in the cluster called the centroid or cluster representative. Clusters are represented in the measurement space in the same form as the objects they contain. To distinguish between an object and a cluster, additional information is needed: the number of objects in the cluster. From that point of view, a single object is a cluster containing exactly one object. An example of clusters in a 1-D environment is { }, { }, { , , }, etc.

Distance and Similarity: To cluster items in a database or in any other environment, some means of quantifying the degree of association between items is needed. It may be a measure of distances or similarities. There are a number of similarity measures available and the choice may have an effect on the results obtained. Objects which have more than one dimension may use relative or normalized weights to convert their distance to an arbitrary scale so they can be compared. Once the objects are defined in the same measurement space as the points, it is a straightforward exercise to compute the distance or similarity. The smaller the distance, the more similar two objects are. The most popular choice in computing distance is the Euclidean distance with:
\[ d(i,j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \ldots + (x_{in} - x_{jn})^2} \]

where $n$ is the number of dimensions. For the 1-D space, the distance becomes:

\[ d(i,j) = |x_i - x_j| \]

There is also the Manhattan distance or city block concept, which is represented as follows:

\[ d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \ldots + |x_{in} - x_{jn}| \]

The distance between two clusters involves some or all items of the two clusters and is calculated differently depending on the clustering method.

Standard Deviation is the measurement of the fluctuation of values as compared to the mean value. In this study, standard deviation is used to show the acceptability of the results. The standard deviation of a random variable $X$ is given by

\[ \sigma(X) = \sqrt{E(X^2) - E^2(X)} \]

Coefficient of Correlation is the measurement of the strength of the relationship between two variables $X$ and $Y$. It essentially answers the question "how similar are X and Y?". The values of the coefficients of correlation range from 0 to 1, where the value 0 points to no similarity and the value 1 points to high similarity. The coefficient of correlation is used to find the similarity among objects. The correlation $r$ of two random variables $X$ and $Y$, where $X = (x_1, x_2, x_3, \ldots, x_n)$ and $Y = (y_1, y_2, y_3, \ldots, y_n)$, is given by the formula:

\[ r = \frac{|E(X,Y) - E(X)E(Y)|}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}} \]

where $E(X) = \frac{\sum_{i=1}^{n} x_i}{n}$, $E(Y) = \frac{\sum_{i=1}^{n} y_i}{n}$, and $E(X,Y) = \frac{\sum_{i=1}^{n} x_i y_i}{n}$.

1.2 Related Work

Cluster analysis has been used in several fields of science to determine taxonomy relationships among entities, including fields dealing with species, psychiatric profiles, census and survey data, and chemical structures [18][27][14][22].

Clustering applications range from databases (e.g., data clustering and data mining) to machine learning to data compression [11]. Our application domain is in the area of databases. In the case of database clustering, the ability to categorize items into groups allows the re-allocation of related data in a database to improve the performance of DBMSs. Data records which are frequently referenced together are moved into close proximity to reduce access time. To reach this goal, cluster analysis is used to form clusters based on the similarities of data items. Data may be
re-allocated based on values of an attribute, a group of attributes, or on accessing patterns. These criteria determine the measuring distance between data items.

With the advent of OODBs, the need for efficient clustering techniques becomes crucial. Some OODBs use some limited form of clustering to improve their performance. However, they are mostly static in nature [4]. The case of OODBs is unique in that the underlying model provides a testbed for dynamic clustering. This is the reason why clustering takes on a whole new meaning with OODBs. There has been a surge in the number of studies of database clustering [13][17][24][3][2]. In particular, there have recently been a number of studies which investigate adaptive clustering techniques, i.e., clustering techniques which can cope with changing access patterns and perform clustering on-line [5][26].

In database mining and knowledge discovery, the primary goal is the search for, and the discovery of, previously unknown patterns in large datasets stored in databases [19][9]. The patterns are then used to predict the model of data classification. There is a wide range of benefits to "mining" data to find interesting associations. Data warehouses become valuable in terms of understanding, managing, and using previously unknown relationships between sets of data. Our experiments are meant to provide a generic view on how data is clustered. In this regard, the experiments do not target specific applications and are applicable to a wide range of domains.

This research builds upon previous work that we have conducted using a different set of environment settings. We investigated three commonly used clustering algorithms: Slink, Clink, and Average, with different settings [5]. The preliminary findings seem to indicate that the choice of clustering method becomes irrelevant in terms of final outcomes. The study presented here extends our previous work to include several other settings [5]. The new environment settings include additional parameters: a new clustering method, a statistical distribution, larger input sizes, and space dimension. The aim is to provide a basis for a more categorical argument as to the behavior and sensitivity of the considered clustering methods.

1.3 Statistical Distributions

The objects used in this study consist of points lying in the interval [0,1]. There are numerous statistical distributions, and we selected the three that closely model real world distributions [6]. Our aim is to see whether clustering is dependent on the way objects are generated. The first one is the uniform distribution, the second one is the piecewise distribution, and the third one is the Gaussian distribution. In what follows, we describe the statistical distributions that we used in our experiments.

Uniform Distribution

The respective distribution function is the following: $F(x) = x$.
The density function of this distribution is $f(x) = F'(x) = 1$ $\forall x$ such that $0 \le x \le 1$.

Piecewise (Skewed) Distribution

The respective distribution function is the following:
\[ F(x) = \begin{cases} 0.05 & \text{if } 0 \le x < 0.37 \\ 0.475 & \text{if } 0.37 \le x < 0.62 \\ 0.525 & \text{if } 0.62 \le x < 0.743 \\ 0.95 & \text{if } 0.743 \le x < 0.89 \\ 1 & \text{if } 0.89 \le x \le 1 \end{cases} \]

The density function of this distribution is: $f(x) = \frac{F(b) - F(a)}{b - a}$ $\forall x$ such that $a \le x < b$.

Gaussian (Normal) Distribution

The respective distribution function is the following:

\[ F(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

This is a two-parameter ($\sigma$ and $\mu$) distribution, where $\mu$ is the mean of the distribution and $\sigma^2$ is the variance.

\[ f(x) = F'(x) = \frac{1}{\sqrt{2\pi}} \, \frac{\mu - x}{\sigma^3} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } 0.0 \le x \le 1 \]

In producing samples for the Gaussian distribution, we choose $\mu = $ and $\sigma = $.

\[ F(x) = \begin{cases} 0.00132 & \text{if } 0.1 \le x < 0.2 \\ 0.02277 & \text{if } 0.2 \le x < 0.3 \\ 0.15867 & \text{if } 0.3 \le x < 0.4 \\ 0.49997 & \text{if } 0.4 \le x < 0.5 \end{cases} \]

For values of x that are in the range [ ,1], the distribution is symmetric.

Following is an outline of the paper. In section 2, the clustering methods used in this study are described. In section 3, we detail the experiments conducted in this study. In section 4, we provide the interpretations of the experiment results. In section 5, we provide some concluding remarks.

2 Clustering Methods

There are different ways to classify clustering methods according to the type of cluster structure they produce. The simple non-hierarchical methods divide the data set of N objects into M clusters, where no overlap is allowed. They are also known as partitioning methods. Each item is a member only of the cluster to which it is most similar, and the cluster may be represented by a centroid or cluster representative that represents the characteristics of all contained objects. This method is heuristically based and mostly applied in social sciences.

Hierarchical methods produce a nested data set in which pairs of items or clusters are successively linked until every item in the data set is linked to form one cluster. Hierarchical methods can be:

- either agglomerative, with N-1 pairwise joins from an unclustered data set. In other words, from N clusters of one object, this method gradually forms one cluster of N objects. At each step, clusters or objects are joined together into larger and larger clusters, ending with one big cluster containing all objects.
- or divisive, in which all objects belong to a single cluster at the beginning, then they are divided into smaller clusters over and over until the last cluster containing two objects has been broken apart into its two atomic constituents.

In both cases, the result of the procedure is a hierarchical tree, hence the name "hierarchical methods". The hierarchical tree may be presented as a dendrogram, in which the pairwise coupling of the objects in the data set is shown and the length of the branches (vertices) or the value of the similarity is represented numerically. Divisive methods are less commonly used and only agglomerative methods will be discussed in this paper.

2.1 Hierarchical Clustering Techniques

This section describes the hierarchical agglomerative clustering methods and their characteristics. These methods have been used in the experiments presented in this paper.

Single linkage clustering method (Slink): The distance between two clusters is the smallest of all distances between two items $(x, y)$, denoted $d_{x,y}$, such that $x$ is a member of one cluster and $y$ is a member of another cluster. This method is also known as the nearest neighbor method. The distance $D_{X,Y}$ is computed as follows:

\[ D_{X,Y} = \min\{d_{x,y}\} \text{ with } x \in X, y \in Y \]

where $(X, Y)$ are clusters, and $(x, y)$ are objects in the corresponding clusters. This method is the simplest among all clustering methods. It has some attractive theoretical properties [12]. However, it tends to form long or chaining clusters. This method may not be very suitable for objects concentrated around some centers in the measurement space.

Complete linkage clustering method (Clink): The similarity coefficient is the longest distance between any pair of objects, denoted $d_{x,y}$, taken from two clusters. This method is also called the furthest neighbor clustering method [16][22][8]. The distance is computed as follows:

\[ D_{X,Y} = \max\{d_{x,y}\} \text{ with } x \in X, y \in Y \]

Centroid/median method: Clusters in this method are represented by a "centroid", a point in the middle of the cluster. The distance between two clusters is the distance between their centroids. This method also has a special case in which the centroid of the smaller group is leveled to the larger one.

2.2 General Algorithm

We provide examples using the three clustering methods. More examples can be found in [22][8]. As a first step, objects are generated using a random number generator. In our case, these objects are points in the interval [0,1]. After these objects are created, they are compared to each other by measuring the distance. The distance between two clusters is computed using the similarity coefficient. The way objects and clusters of objects are joined together to form larger clusters varies with the approach used. We outline a generic algorithm that is applicable to all
clustering methods. Essentially, it consists of two phases. The first phase records the similarity coefficients. The second phase computes the minimum and then performs the clustering. Initially every cluster consists of exactly one object.

1. Scan all clusters and record all similarity coefficients.
2. Compute the minimum of all similarity coefficients and then join the corresponding clusters.
3. If exactly one cluster remains then stop.
4. Go to (1).

There is a case when using the Clink method where ambiguity may arise. Suppose that when performing Step 1, three successive clusters are to be joined (they all have the minimum similarity value). When performing Step 2, the first two clusters are joined. However, when computing the similarity coefficient between this new cluster and the third cluster, the similarity value may now be different from the minimum value. The question now is what the next step should be. There are essentially two possibilities:

- Either proceed by joining clusters using a recomputation of the similarity coefficient each time in Step 2.
- Or join all those clusters that have the same similarity coefficient at once and do not recompute the similarity in Step 2.

In general, there is no evidence that one is better than the other [22]. For our study, we selected the first alternative.

2.3 Examples

Following are examples of how data is clustered, to give an idea of how different clustering methods work with the same set of data. Example 1 uses the Slink method, Example 2 uses the Clink method, and Example 3 uses Centroid. The sample data has 10 items and each item has an identification and a value on which the distance is calculated. The uniform distribution was used to generate this set of objects. For the sake of simplicity, we consider the 1-D space for generating data objects.

Example 1: Slink

1. Join clusters {10} and {1} at distance ...
2. Join clusters {6} and {10,1} at ...
3. Join clusters {4} and {8} at ...
4. Join clusters {3} and {7} as one cluster and {2}, {6,10,1}, {5} and {9} as another cluster at ...
5. Join clusters {4,8} and {3,7} and {2,6,10,1,5,9} as one cluster at ...

Table 1: Example of a sample data list (columns: Id, Value)

Figure 1: Clustering Tree using Slink

Example 2: Clink

1. Join clusters {1} and {10} at distance ...
2. Join clusters {4} and {8} at ...
3. Join clusters {3} and {7} as one cluster, and {2} and {6} as another cluster, and {5} and {9} as another cluster at ...
4. Join clusters {4,8} and {3,7} at ...
5. Join clusters {2,6} and {1,10} at ...
6. Join clusters {2,6,10,1} and {5,9} at ...
7. Join clusters {4,8,3,7} and {2,6,10,1,5,9} at ...

Figure 2: Clustering Tree using Clink

Example 3: Centroid

1. Join clusters {10} and {1} at distance of ... and form the central point at ...
2. Join clusters {5} and {9} at 0.058852 and form the central point at ...
3. Join clusters {2} and {6} as one cluster, and {3} and {7} as another cluster, and {4} and {8} as another cluster at ... and form the central points at ..., ..., and ...
4. Join clusters {1,10} and {2,6} at ... and form the central point at ...
5. Join clusters {3,7} and {4,8} at ... and form the central point at ...
6. Join clusters {1,10,2,6} and {5,9} at ... and form the central point at ...
7. Join clusters {1,10,2,6,5,9} and {3,7,4,8} at ...

Figure 3: Clustering Tree using Centroid
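
The generic two-phase algorithm of Section 2.2 and the three inter-cluster distances of Section 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the function names `cluster_distance` and `agglomerate` are hypothetical, the centroid is assumed to be the arithmetic mean of the cluster members, the data is 1-D as in the examples above, and the sample values are made up for illustration. Ties are resolved by recomputing the coefficients after every join, which corresponds to the first alternative discussed in Section 2.2.

```python
def cluster_distance(a, b, method):
    """Similarity coefficient between two clusters of 1-D values."""
    if method == "slink":      # smallest pairwise distance (nearest neighbor)
        return min(abs(x - y) for x in a for y in b)
    if method == "clink":      # largest pairwise distance (furthest neighbor)
        return max(abs(x - y) for x in a for y in b)
    if method == "centroid":   # distance between centroids (assumed: the mean)
        return abs(sum(a) / len(a) - sum(b) / len(b))
    raise ValueError(f"unknown method: {method}")


def agglomerate(values, method):
    """Join clusters until one remains; return the list of joins performed."""
    clusters = [[v] for v in values]   # initially one object per cluster
    merges = []
    while len(clusters) > 1:
        # Phase 1: scan all clusters and record all similarity coefficients.
        coeffs = [
            (cluster_distance(clusters[i], clusters[j], method), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Phase 2: take the minimum and join the corresponding clusters.
        d, i, j = min(coeffs)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical sample values in [0, 1]; not the data of Table 1.
sample = [0.0, 0.125, 0.375, 1.0]
for name in ("slink", "clink", "centroid"):
    print(name, [round(d, 4) for _, _, d in agglomerate(sample, name)])
```

Each run performs N-1 joins, and on this small sample the three methods join the same clusters in the same order but at different distances.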

3 Experiment Description

In this study, the hierarchical tree is implemented as a minimum spanning tree (MST). An MST is a tree in which N objects are linked by N-1 connections and there is no cycle in the tree. The MST has the following properties that make it suitable to represent the hierarchical tree.

- Any isolated item can be connected to a nearest neighbor.
- Any isolated fragment (sub-set of an MST) can in any case be connected to a nearest neighbor by the available link.

Data used in this paper is drawn from a two-dimensional (2-D) space. The values range from 0 to 1 inclusive and the sizes range from 100 to 500. The linear congruential algorithm is used to generate data objects [15][20]. The seed is the system time. Each experiment followed these steps:

- Generate lists of objects.
- Carry out the clustering process with three different clustering methods.
- Calculate the coefficient of correlation for each clustering method.

Each experiment is repeated 100 times and the standard deviation of the coefficients of correlation is calculated. The least square approximation (LSA) is used to evaluate the acceptability of the approximation. If a coefficient of correlation obtained using the LSA falls within the segment defined by the corresponding standard deviation, the approximation is deemed to be acceptable.

Types of distances in comparing trees: There are two ways of comparing two trees, obtained from lists of objects, to compute their coefficient of correlation. The distance used in the coefficient of correlation could, for instance, be computed using the actual linear (Euclidean) difference between two objects. The other method of computing the distance is to use the minimum number of edges (of a tree) needed to join two objects. The latter has an advantage over the former in that it provides a more "natural" implementation of a correlation. We call the first type of distance linear distance and the second edge distance. Once we choose a distance type, we compute the coefficient of correlation by selecting one pair of identifiers in the second list (the shorter list) and computing its distance, and then looking for the same pair in the first list and computing its distance. We repeat the same process for all remaining pairs in the second list.

There are several types of coefficients of correlation. This stems from the fact that several parameters have been used. For instance, the clustering method is one parameter. There are 3*2*3 = 18 possible ways to compute the coefficient of correlation for two lists of objects. Indeed, we have the following choices:

first parameter: Slink, Clink, Centroid
second parameter: linear distance, edge distance
third parameter: uniform distribution, piecewise distribution, Gaussian distribution

The other dimension of this comparison study that has a direct influence on the clustering is the data input. This determines what kind of data is to be compared and what its size is. The following cases have been identified to check the sensitivity of each clustering method with regard to the input data. For every type of coefficient of correlation mentioned above, eleven (11) types of situations (hence, eleven coefficients of correlation) have been isolated. It is our belief that these cases closely represent what may influence the choice of a clustering method.

1. The coefficient of correlation is between pairs of objects drawn from a set S and pairs of objects drawn from the first half of the same set S. The first half of S is used before the set is sorted.
2. The coefficient of correlation is between pairs of objects drawn from S and pairs of objects drawn from the second half of S. The second half of S is used before the set is sorted.
3. The coefficient of correlation is between pairs of objects drawn from the first half of S, say S2, and pairs of objects drawn from the first half of another set S', say S'2. The two sets are given ascending identifiers after being sorted. The first object of S2 is given as identifier the number 1 and so is the first object of S'2. The second object of S2 is given as identifier the number 2 and so is the second object of S'2, and so on.
4. The coefficient of correlation is between pairs of objects drawn from the second half of S, say S2, and pairs of objects drawn from the second half of S', say S'2. The two sets are given ascending identifiers after being sorted in the same way as in the previous case.
5. The coefficient of correlation is between pairs of objects drawn from S and pairs of objects drawn from the union of a set X and S. The set X contains 10% new randomly generated objects.
6. The coefficient of correlation definition is the same as the fifth coefficient of correlation except that X now contains 20% new randomly generated objects.
7. The coefficient of correlation definition is the same as the fifth coefficient of correlation except that X now contains 30% new randomly generated objects.
8. The coefficient of correlation definition is the same as the fifth coefficient of correlation except that X now contains 40% new randomly generated objects.
9. The coefficient of correlation is between pairs of objects drawn from S using the uniform distribution and pairs of objects drawn from S' using the piecewise distribution.
10. The coefficient of correlation is between pairs of objects drawn from S using the uniform distribution and pairs of objects drawn from S' using the Gaussian distribution.
11. The coefficient of correlation is between pairs of objects drawn from S using the Gaussian distribution and pairs of objects drawn from S' using the piecewise distribution.

In a nutshell, the above coefficients of correlation are meant to analyze different situations in the evaluation of results. We partition these situations into three groups, represented by three blocks of coefficients of correlation:

First Block: The first, second, third and fourth coefficients of correlation are used to check the influence of the context on how objects are clustered.
Second Block: The fifth, sixth, seventh and eighth coefficients of correlation are used to check the influence of the data size.
Third Block: The ninth, tenth, and eleventh coefficients of correlation are used to check the relation which may exist between two lists obtained using two different distributions.

To ensure the statistical representativity of the results, the averages of 100 coefficient of correlation and standard deviation values (of the same type) are computed. The least square approximation is then applied to obtain the following equation:

\[ f(x) = ax + b \]

The criterion for a good approximation (or acceptability) is given by the inequality:

\[ |y_i - f(x_i)| \le \sigma(y_i) \text{ for all } i \]

where $y_i$ is the coefficient of correlation, $f$ is the approximation function and $\sigma$ is the standard deviation for $y_i$. If this inequality is satisfied, $f$ is then a good approximation. The least square approximation, if acceptable, helps predict the behavior of clustering methods for data points beyond the range considered in our experiments.

4 Results and their Interpretations

As stated earlier, the aim of this study is to conduct experiments to determine the stability of clustering methods and how they compare to each other. For the sake of readability, a shorthand notation is used to indicate all possible cases. A similar notation has been used in our previous findings [5]. For instance, to represent the input with the following parameters: Slink, uniform distribution and linear distance, the abbreviation SUL is used.

Results are presented in figures and tables. The figures describe the different types of coefficients of correlation. The tables describe the least square approximations of the coefficients of correlation.
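
The least square approximation $f(x) = ax + b$ and the acceptability criterion $|y_i - f(x_i)| \le \sigma(y_i)$ described in Section 3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `least_squares_line` and `is_acceptable` are hypothetical, the fit is the ordinary closed-form least squares line, and the sizes, coefficients, and standard deviations in the usage lines are invented numbers, not results from the study.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the ordinary least squares line f(x) = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def is_acceptable(xs, ys, sigmas):
    """True if |y_i - f(x_i)| <= sigma(y_i) holds for every point."""
    a, b = least_squares_line(xs, ys)
    return all(abs(y - (a * x + b)) <= s for x, y, s in zip(xs, ys, sigmas))


# Hypothetical averaged coefficients of correlation for five input sizes.
sizes = [100, 200, 300, 400, 500]
coefficients = [0.61, 0.60, 0.62, 0.60, 0.61]
deviations = [0.05, 0.05, 0.05, 0.05, 0.05]

slope, intercept = least_squares_line(sizes, coefficients)
print(slope, intercept, is_acceptable(sizes, coefficients, deviations))
```

A slope close to zero, as in this made-up data, is what the later discussion of Tables 5 to 7 refers to when it describes lines almost parallel to the x-axis.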

Term               Shorthand
Slink              S
Clink              C
Centroid           O
Uniform Distr.     U
Gaussian Distr.    G
Piecewise Distr.   P
Linear Distance    L
Edge Distance      E

Table 2: List of Abbreviations

First and Second Blocks
Coefficient of Correlation   First & Fifth   Second & Sixth   Third & Seventh   Fourth & Eighth
Line style                   dashed          solid            dotted            dash-dotted

Third Block
Coefficient of Correlation   Tenth    Ninth   Eleventh
Line style                   dashed   solid   dotted

Table 3: Graphical Representation of all Types of Correlation Coefficients and Standard Deviations

4.1 Analysis of the Stability and Sensitivity of the Clustering Methods

We first look at the different clustering methods and analyze how stable and sensitive they are to the various parameters. In essence, we are interested in knowing how each clustering method reacts to changes in parameter values.

4.1.1 Slink: Results Interpretation

We look at the behavior of the 3 blocks of coefficients of correlation values as defined in section 3. We then provide an interpretation of the corresponding results.

First block of coefficients of correlation

As previously mentioned, the first 4 coefficients of correlation are meant to test the influence of the context. Fig. 4 represents the first 4 coefficients of correlation. The (small) difference between L and E values is consistently the same across all experiments, with the exception of those experiments comparing different distributions (Fig. 6, Fig. 9, and Fig. 12). We note that the values using E are consistently smaller than the values using L. The reason lies in the difference in computing distances. When L is used, the distance between the members of two clusters is the same for all members. However, when E is used, this may not be true (e.g., a tree that is not height balanced), since the distance is equal to the number of edges connecting two members belonging to different clusters. In the case of Fig. 6, Fig. 9, and Fig. 12 the difference is attenuated due to the use of different
distributions.

[Figure 4: Slink: First Block of Coefficient of Correlation. Panels: Single linkage with Uniform, Piecewise, and Gaussian distributions, each with Linear and Edge distance.]

When the values in L and E are compared against each other, the trend among the 4 coefficients of correlation is almost the same. This points to the fact that the distance type does not play a major role in the final clustering.

The values of the first and second types of correlation are larger than those of the third and fourth types of correlation because of the corresponding intrinsic semantics. The first and second types of correlation compare data objects drawn from the same initial set. The third and fourth types of correlation compare data objects drawn from different sets. One should expect the former data objects to be more related than the latter data objects. The standard deviation values exhibit roughly the same kind of behavior as their corresponding coefficient of correlation values. This points to the fact that the different types of correlation behave in a uniform and predictable fashion.

Since the coefficient of correlation values are greater than .5 in all the above cases, this points to the important observation that the data context does not seem to play a significant role in the final data clustering. Likewise, the data set size does not seem to have any substantial influence on the final clustering. Note that the slope value is almost equal to zero. This is also confirmed by
the uniform behavior of the standard deviation values as described above.

Second block of coefficients of correlation

The next 4 coefficients of correlation check the influence of the data size on clustering. Fig. 5 depicts the next block of coefficients of correlation.

[Figure 5: Slink: Second Block of Coefficient of Correlation. Panels: Single linkage with Uniform, Piecewise, and Gaussian distributions, each with Linear and Edge distance.]

When the values in L and E are compared, there is no substantial difference, which is indicative of the independence of the clustering from the type of distance used. As in the previous case, the standard deviation values exhibit the same behavior as the corresponding coefficient of correlation values. This is reminiscent of a uniform and predictable behavior of the different types of correlations.

The high values indicate that the contexts have little effect on how data is clustered. As in the previous case, the data size does not seem to influence the final clustering outcome as the slope is nearly equal to zero. Likewise, the data set size does not seem to have any substantial influence
on the final clustering. Note that the slope value is almost equal to zero. This is also confirmed by the uniform behavior of the standard deviation values as described above.

Third block of coefficients of correlation

The next 3 coefficients of correlation check the influence of the distribution for L and E. All other parameters are fixed and are the same for the pairs of sets of objects to be compared. Fig. 6 depicts the last three types of coefficients of correlation.

[Figure 6: Slink: Third Block of Coefficient of Correlation. Panels: Single linkage with Linear and Edge distance; plotted: std / cc (9-11).]

The curve representing the case for UP (Uniform and Piecewise distributions), in either the L or E case, shows values that are a bit lower than the values in the curves representing UG (Uniform and Gaussian distributions) and GP (Gaussian and Piecewise distributions). This can be explained by the problem of bootstrapping the random number generator. This is a constant in most of the experiments conducted in this study. The first concurrent experiments (Slink using L and E) exhibit a behavior that is a little different from the other experiments.

When the values in the case of L and E are compared, no substantial difference is observed. This is indicative of the independence of the clustering from the types of distances used. As in the previous case, the standard deviation values exhibit the same behavior as the corresponding coefficient of correlation values. This is reminiscent of a uniform and predictable behavior of the different types of correlations.

Since the values converge to the value .5, this indicates that the distributions do not influence the clustering one way or the other. As in the previous case, the data size does not seem to influence the final clustering outcome as the slope is nearly equal to zero. Likewise, the data set size does not seem to have any substantial influence on the final clustering. Note that the slope value is almost equal to zero. This is also confirmed by the uniform behavior of the standard deviation values as described above.

4.1.2 Clink: Results Interpretation

In essence, the experiments for the Clink clustering method follow the same type of pattern and behavior as the Slink method. Fig. 7, Fig. 8, and Fig. 9 depict the first, second, and third blocks
of coefficients of correlation.

[Figure 7: Clink: First Block of Coefficient of Correlation. Panels: Complete linkage with Uniform, Piecewise, and Gaussian distributions, each with Linear and Edge distance.]

The interpretations that apply for the Slink method also apply for the Clink clustering method, as both follow a very similar pattern of behavior and the values for both the coefficients of correlation and the standard deviation are quite similar.

4.1.3 Centroid: Results Interpretation

As was the case for Slink and Clink, the experiments for the Centroid clustering method follow the same type of pattern and behavior. The only difference lies in the values for the different correlations obtained using different parameters. Fig. 10, Fig. 11, and Fig. 12 depict the first, second, and third blocks of coefficients of correlation.

The interpretations that apply for the previous clustering methods also apply for the Centroid clustering method, as it follows a similar pattern of behavior. Similarly, the values for both the coefficients of correlation and the standard deviation are also similar.
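
The L and E values compared throughout Section 4.1 rest on two ways of measuring the distance between a pair of objects, and on the coefficient of correlation $r$ defined in Section 1.1. The sketch below illustrates both distances and the formula for $r$. It is an assumption-laden illustration rather than the code behind the reported results: the names `correlation`, `edge_distance`, and `linear_distance` are hypothetical, the tree is stored as an adjacency dictionary, and the four points and the tree linking them are invented.

```python
from collections import deque
from math import sqrt


def correlation(xs, ys):
    """r = |E(XY) - E(X)E(Y)| / (sqrt(E(X^2)-E^2(X)) * sqrt(E(Y^2)-E^2(Y)))."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    exy = sum(x * y for x, y in zip(xs, ys)) / n
    sx = sqrt(sum(x * x for x in xs) / n - ex * ex)
    sy = sqrt(sum(y * y for y in ys) / n - ey * ey)
    return abs(exy - ex * ey) / (sx * sy)


def edge_distance(tree, a, b):
    """Minimum number of tree edges needed to join objects a and b (BFS)."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in tree[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("objects are not connected")


def linear_distance(points, a, b):
    """Euclidean distance between two 2-D objects given by identifier."""
    (x1, y1), (x2, y2) = points[a], points[b]
    return sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


# Hypothetical 2-D objects and a hypothetical tree linking them in a chain.
points = {1: (0.0, 0.0), 2: (0.0, 0.5), 3: (0.5, 0.5), 4: (1.0, 0.5)}
tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(edge_distance(tree, 1, 4), linear_distance(points, 1, 4))
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

In an unbalanced tree such as this chain, the edge distance between two objects depends on how many joins separate them rather than on their coordinates, which is the effect the discussion of Fig. 4 attributes the smaller E values to.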

[Figure 8: Clink: Second Block of Coefficient of Correlation. Panels: Complete linkage with Uniform, Piecewise, and Gaussian distributions, each with Linear and Edge distance.]

[Figure 9: Clink: Third Block of Coefficient of Correlation. Panels: Complete linkage with Linear and Edge distance; plotted: std / cc (9-11).]

[Figure 10: Centroid: First Block of Coefficient of Correlation. Panels: Centroid linkage with Uniform, Piecewise, and Gaussian distributions, each with Linear and Edge distance.]

4.2 Summary of Results

Table 4 shows a summary of the results that averages out the different computed coefficients of correlation. Block 1, Block 2, and Block 3 correspond respectively to the first block of 4 coefficients of correlation, the second block of 4 coefficients of correlation, and the third block of 3 coefficients of correlation.

4.3 Acceptability of the Least Square Approximation

Table 5, Table 6, and Table 7 (see Appendix) represent the least square approximations for all the curves shown in this study. The acceptability of an approximation depends on whether all the coefficient of correlation values fall within the interval delimited by the approximating function and the standard deviation. If this is the case, then we say that the approximation is good. Otherwise,

[Figure 11: Centroid: Second Block of Coefficient of Correlation. Panels: Centroid linkage with Uniform, Piecewise, and Gaussian distributions, each with Linear and Edge distance.]

[Figure 12: Centroid: Third Block of Coefficient of Correlation. Panels: Centroid linkage with Linear and Edge distance; plotted: std / cc (9-11).]

[Table 4: Summary of Results. Columns: Slink, Clink, Centroid. Rows: Block 1 and Block 2, each by distance type (L, E) and distribution (U, G, P); Block 3 by distance type (L, E). Legible entries, in order of appearance: .6, .5, .8, .7, .65, .8, .45.]

we look at how many points do not fall within the boundaries and determine the goodness of the function. Using these functions will enable us to predict the behavior of the clustering methods with larger data set sizes.

As Table 5, Table 6, and Table 7 (see Appendix) show, the values of the slopes are all very small. This points to the stability of all results. All approximations yield lines almost parallel to the x-axis.

The acceptability test was run and all points passed the test satisfactorily. Therefore all the approximations listed in the tables mentioned above are good approximations.

4.4 Comparison of Results across Clustering Methods

In what follows, we compare the different clustering methods against each other using the different parameters used in this study. We rely on the results obtained and the general observations that can be drawn from the experiments shown in the different figures and tables.

1. The results show that across space dimensions, the context does not completely hide the sets. For instance, the first and second types of coefficients of correlation (as shown in all figures) are a little different from the third and fourth types of coefficient of correlation (as shown in all figures). The values clearly show what kinds of coefficients are computed.
2. The results show that given the same distribution and type of distance, all clustering methods exhibit the same behavior and yield approximately the same values.
3. Slink, Clink, and Centroid seem to have very similar behavior. The coefficients of correlation values are also very close. An explanation for the similarity in behavior between Slink, Clink, and Centroid is that these methods are based on one single object per cluster to determine similarity distances.
4. The second block of coefficients of correlation for all clustering methods demonstrates that the context does not influence the data clustering, because all coefficients of correlation are close to the value 1.

5. The results also show that all clustering methods are equally stable. This finding comes as a surprise, as intuitively one expects one clustering method to be more stable than the others.

6. The results show that the data distribution does not significantly affect the clustering techniques, because the values obtained are very similar to each other. That is a relatively major finding, as the results strongly point to the independence of the distribution and the data clustering.

7. The third block of coefficients of correlation across all clustering methods shows that the three methods are little or not perturbed even in a noisy environment, since there are no significant differences in results from the uniform, piecewise, and Gaussian distributions.

8. The type of distance (linear or edge) does not influence the clustering process, as there are no significant differences between the coefficients of correlation obtained using either linear or edge distances.

The results obtained in this study confirm those from [5], which used a one-dimensional (1-D) data sample and fewer parameters. The results point very strongly to the conclusion that, in general, no clustering technique is better than another. What this essentially means is that there is an inherent way for data objects to cluster, independently of the techniques used.

The other important result is that the only discriminator for selecting a clustering method is its computational attractiveness and nothing else. This is a very important result, as in the past there was never any evidence that clustering methods had very similar behavior.

The results presented here are compelling evidence that clustering methods do not seem to influence the outcome of the clustering process. Indeed, all clustering methods considered here exhibit a behavior that is almost constant regardless of the parameters being used in comparing them.

5 Conclusion

In this exhaustive study, we considered three clustering methods. The experiments involved a wide range of parameters to test the stability and sensitivity of the clustering methods. These experiments were conducted for objects that are in the 2-D space. The results obtained overwhelmingly point to the stability of each clustering method and its low sensitivity to noise. The most startling finding of this study, however, is that all clustering methods exhibit almost identical behavior, regardless of the parameters used. The fact that data objects are drawn from different data spaces does not change the above findings [5].

The above findings have important implications. In that regard, one of the most important results is that objects have a natural tendency to cluster themselves. Clustering methods do not seem to play a major role in the final shape of the clustering. This also means that the only criterion that should be used to select one clustering method is the attractiveness of the computational complexity of the clustering algorithm.
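The kind of comparison summarized above can be illustrated with a small sketch. It is not the experimental code of this study: the function names (`agglomerate`, `rand_index`), the sample size, and the number of clusters are all assumptions made for illustration, and the Rand index is used here as a simple stand-in agreement measure in place of the coefficients of correlation used in the study. The sketch clusters uniformly distributed 2-D points with single linkage, complete linkage, and centroid, then reports how often two methods agree on whether a pair of points belongs together.

```python
import random
from itertools import combinations
from math import dist


def centroid(points, members):
    """Mean position of the points whose indices are in `members`."""
    n = len(members)
    return (sum(points[i][0] for i in members) / n,
            sum(points[i][1] for i in members) / n)


def cluster_distance(points, a, b, method):
    """Distance between clusters a and b (lists of point indices)."""
    if method == "centroid":
        return dist(centroid(points, a), centroid(points, b))
    pairwise = [dist(points[i], points[j]) for i in a for j in b]
    # single linkage uses the closest pair, complete linkage the farthest
    return min(pairwise) if method == "single" else max(pairwise)


def agglomerate(points, k, method):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                points, clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def rand_index(n, part_a, part_b):
    """Fraction of point pairs on which two partitions agree (assumed measure)."""
    def labels(partition):
        return {i: c for c, members in enumerate(partition) for i in members}
    la, lb = labels(part_a), labels(part_b)
    pairs = list(combinations(range(n), 2))
    agree = sum((la[i] == la[j]) == (lb[i] == lb[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical experiment: 30 uniformly distributed points, 3 clusters.
random.seed(0)
sample = [(random.random(), random.random()) for _ in range(30)]
methods = ("single", "complete", "centroid")
partitions = {m: agglomerate(sample, 3, m) for m in methods}
for m1, m2 in combinations(methods, 2):
    print(m1, m2, round(rand_index(len(sample), partitions[m1], partitions[m2]), 3))
```

The naive merge loop is cubic in the number of points, which is acceptable for a small illustration; the point of the sketch is only that the three methods differ in a single function, `cluster_distance`.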
Acknowledgments

We would like to thank Mostefa Golea and Alex Delis for their insightful and detailed comments.

References

[1] M. S. Alderfer and R. K. Blashfield. Cluster Analysis. Sage Publication, California, 1984.
[2] Jay Banerjee, Won Kim, Sung-Jo Kim, and Jorge Garza. Clustering a DAG for CAD databases. IEEE Transactions on Software Engineering, 14(11):1684-1699, 1988.
[3] Veronique Benzaken. An evaluation model for clustering strategies in the O2 object-oriented database system. In ICDT, 1990.
[4] Veronique Benzaken and Claude Delobel. Enhancing performance in a persistent object store: Clustering strategies in O2. In PODS, 1990.
[5] A. Bouguettaya. On-line Clustering. IEEE Transactions on Knowledge and Data Engineering, (2), April 1996.
[6] A. Delis and V. R. Basili. Data Binding Tool: a Tool for Measurement Based Ada Source Reusability and Design Assessment. International Journal of Software Engineering and Knowledge Engineering, 3(3):287-318, November.
[7] R. Dubes and A. K. Jain. Clustering Methodologies in Exploratory Data Analysis. Advances in Computers, 19, 1980.
[8] B. Everitt. Cluster Analysis. Heinemann Educational Books, Yorkshire, England, 1977.
[9] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, Menlo Park, CA, 1996.
[10] J. A. Hartigan. Clustering Algorithms. John Wiley & Sons, London, 1975.
[11] Anil K. Jain, Jianchang Mao, and K. M. Mohiuddin. Artificial neural networks. Computer, 29(3):31-44, 1996.
[12] N. Jardine and R. Sibson. Mathematical Taxonomy. John Wiley & Sons, London, 1971.
[13] Jia-bing, R. Cheng, and A. R. Hurson. Effective clustering of complex objects in object-oriented databases. In SIGMOD, 1991.
[14] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data, an Introduction to Cluster Analysis. John Wiley & Sons, London, 1990.
[15] D. E. Knuth. The Art of Computer Programming. Addison-Wesley, Reading, MA, 1971.
[16] G. N. Lance and W. T. Williams. A General Theory for Classification Sorting Strategy. The Computer Journal, 9(..):373-386.
[17] William J. McIver and Roger King. Self-adaptive, on-line reclustering of complex object data. In SIGMOD, 1994.
[18] F. Murtagh. A Survey of Recent Advances in Hierarchical Clustering Algorithms. The Computer Journal, 26(4):354-358, 1983.
[19] G. Piatetsky-Shapiro and W. J. Frawley, editors. Knowledge Discovery in Databases. AAAI Press, Menlo Park, CA, 1991.
[20] W. H. Press. Numerical Recipes in C: The Art of Scientific Programming. Cambridge University Press, 2nd edition, 1992.
[21] E. Ramussen. Clustering Algorithms in Information Retrieval. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, NJ, 1990.
[22] H. C. Romesburg. Cluster Analysis for Researchers. Krieger Publishing Company, Malabar, FL, 1990.
[23] R. C. Tryon and D. E. Bailey. Cluster Analysis. McGraw-Hill, New York, 1970.
[24] Manolis M. Tsangaris and Jeffrey F. Naughton. A stochastic approach for clustering in object bases. In SIGMOD, 1991.
[25] P. Willett. Recent Trends in Hierarchic Document Clustering, a Critical Review. Information Processing and Management, 9(24):577-597, 1988.
[26] C. T. Yu, C. Suen, K. Lam, and M. K. Siu. Adaptive Record Clustering. ACM Transactions on Database Systems, 2(10):180-204, June.
[27] J. Zupan. Clustering of Large Data Sets. Research Studies Press, Letchworth, England, 1982.
Table 5: Function Approximation of the First Block of Coefficients of Correlation (columns: Type, First Correlation, Second Correlation, Third Correlation, Fourth Correlation; rows: SUL, SUE, SPL, SPE, SGL, SGE, CUL, CUE, CPL, CPE, CGL, CGE, OUL, OUE, OPL, OPE, OGL, OGE)

Table 6: Function Approximation of the Second Block of Coefficients of Correlation (columns: Type, Fifth Correlation, Sixth Correlation, Seventh Correlation, Eighth Correlation; rows: SUL, SUE, SPL, SPE, SGL, SGE, CUL, CUE, CPL, CPE, CGL, CGE, OUL, OUE, OPL, OPE, OGL, OGE)

Table 7: Function Approximation of the Third Block of Coefficients of Correlation (columns: Type, Ninth Correlation, Tenth Correlation, Eleventh Correlation; rows: SL, SE, CL, CE, OL, OE)
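The tables above list linear approximations in X, that is, functions of the form aX + b. As a hedged illustration of how such an approximation and a simple acceptability check might be computed, the sketch below fits a least-squares line and counts the points that fall outside a band around it. The helper names `fit_line` and `points_outside`, the tolerance, and the sample values are assumptions made for illustration only; they are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def points_outside(xs, ys, slope, intercept, tolerance):
    """Count the points lying farther than `tolerance` from the fitted line."""
    return sum(abs(y - (slope * x + intercept)) > tolerance
               for x, y in zip(xs, ys))


# Made-up illustration: coefficient of correlation against data set size.
sizes = [100, 200, 300, 400, 500]
coefficients = [0.62, 0.61, 0.63, 0.62, 0.61]

slope, intercept = fit_line(sizes, coefficients)
print("slope:", slope, "intercept:", intercept)
print("points outside band:",
      points_outside(sizes, coefficients, slope, intercept, tolerance=0.05))
```

A slope close to zero corresponds to a line almost parallel to the x-axis, which is the sense in which the text describes the results as stable.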
