Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics
Davide Barbieri (1), Daniele Braga (1), Stefano Ceri (1), Emanuele Della Valle (1), Yi Huang (2), Volker Tresp (2), Achim Rettinger (3), and Hendrik Wermser (4)

(1) Dip. di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
(2) Siemens AG, Corporate Technology, Munich, Germany
(3) Karlsruhe Institute of Technology, Karlsruhe, Germany
(4) Technical University of Munich, Munich, Germany

Abstract. Knowledge evolution is a major challenge in knowledge management. When knowledge evolution is conveyed in the form of data streams, a combined approach of deductive and inductive reasoning can leverage the clear separation between the evolving (streaming) and the static parts of the knowledge at the conceptual and technological level. In particular, the notion of RDF streams is the glue for the interoperability of inductive and deductive reasoning techniques with methods and systems for processing data streams. In this paper, we show our combined approach applied to social network analysis, we give experimental evidence of good performance, and we demonstrate the effectiveness of the approach for extracting trends from microblogs and feeds.

Keywords: stream reasoning, online learning, social media analytics, deductive reasoning, inductive reasoning, RDF streams, SPARQL, C-SPARQL

1 Introduction

What are the hottest topics under discussion on Twitter? Which topics have my close friends discussed in the last hour? Which movie is my friend most likely to watch next? Which Tuscany red wine should I recommend to them? The information required to answer these queries is becoming available on the Web, as many popular social networks publish microblogs and feeds. This trend is often referred to as the Twitter phenomenon. These feeds are continuous flows of information where recent items are typically more relevant than older ones, properly representing a stream. However, their interpretation requires rich background knowledge to fulfill meaningful reasoning tasks, beyond standard stream processing capabilities. Stream processing has been studied by both the database [dsmsbook] and data mining communities.
Specialized Data Stream Management Systems (DSMS) are available on the market and DSMS features are appearing in major database products, such as Oracle and DB2. Online stream mining is applied in many contexts, e.g. to computer network traffic for intrusion detection, to Web searches for online recommendations [SuYYT10], and to sensor data for automated real-time decision making. These applications represent a paradigm change in information processing techniques, as data streams are processed on the fly, without being stored, and processing units produce their results without explicit invocation. DSMS are designed to process real-time parallel queries over possibly bursty data, but they cannot perform reasoning tasks as complex as those required to answer the queries raised in the first paragraph. Understanding and interpreting the Twitter phenomenon (and many other data streams) requires
connecting quick and concise streams, representing changes occurring in the real world, to rich background knowledge bases. This join is crucial, for instance, for (1) understanding the topics discussed in the streams with the help of topic taxonomies, (2) recommending the most attractive movies to particular user profiles, and (3) predicting future behaviors based on the analysis of past behaviors (movie attendance). These examples show the need for connecting data stream processing techniques with reasoning methods, of both inductive and deductive nature, in order to support social media analytics. We believe that this is a good example of how a novel and high-impact research area, which we call stream reasoning [IEEEISSR], enables the merging of data streams and rich background knowledge.

Extending reasoning methods to support changing knowledge is a known challenge for the reasoning community. In deductive reasoning, various methods exist to revise beliefs based on recent information. In inductive reasoning, a body of research in data mining and machine learning already supports online data analysis. However, little work has been done on the application of machine learning to streams as rich and structured as those considered here. We believe that data streams are an ideal model for changes occurring in the real world, as well as a suitable means to delimit the source and the nature of change, clearly separating the static part of knowledge from the part of knowledge that changes in real time.

This separation is both conceptual and technological, and allows the use of existing systems for data stream management on one hand and for inductive or deductive reasoning on the other. Also, data streams easily allow a novel interconnection of deductive and inductive reasoning. In our approach, the glue for this interconnection is the notion of RDF streams, together with an extension of the SPARQL language for continuous queries.
The next section introduces the notion of stream reasoning and briefly describes our previous work on the development of a stream reasoning platform. Section 3 describes the data streams generated by the social network Glue, used for our experiments. Section 4 shows examples of applications of stream reasoning to the running example. An evaluation of our approach is presented in Section 5, while Section 6 draws some conclusions and provides an outlook on future work.

2 Stream Reasoning

Deductive and inductive stream reasoning, described next in more detail, extend the following notion.

Stream Reasoning: reasoning in real time on huge and possibly noisy data streams, to support a large number of concurrent decision processes.

We now characterize stream reasoning with respect to the notions of streams, windows, and continuous processing, three key concepts in stream processing [babu01continuous].

Streams. Data streams are unbounded sequences of time-varying data elements that form a continuous flow of information. Recent elements are more relevant as they describe the current state of a dynamic system. Since RDF is the data interchange format for reasoners, we start from the notion of RDF streams as the fuel for stream reasoning. RDF streams are defined as ordered sequences of pairs, made of RDF triples and their timestamps T_i:

    (<subj_i, pred_i, obj_i>, T_i)
    (<subj_{i+1}, pred_{i+1}, obj_{i+1}>, T_{i+1})
Timestamps can be considered as annotations of RDF triples. They are monotonically non-decreasing in the stream (T_i ≤ T_{i+1}), and adjacent triples may have the same timestamp, if occurring at the same time.

Windows. Traditional reasoning problems assume that all the available information should be considered for solving a problem. Stream reasoning, instead, restricts processing to a certain window of concern, focusing on a subset of recent statements in the stream, while previous statements are ignored. However, the cumulative effect of past windows (processed in the past) and present windows can be taken into account.

Continuous Processing. Traditional reasoning approaches have well-defined beginnings and ends for reasoning tasks, respectively when tasks are presented to the reasoner and when results are delivered. Stream reasoning moves from this processing model to a continuous model, where tasks are registered and continuously evaluated by the reasoner against flowing data.

We are pushing our Stream Reasoning vision within the LarKC project [ICSC08], whose main goal is to develop a platform for reasoning on massive heterogeneous information such as social media data. The platform has a pluggable architecture to exploit techniques and heuristics from diverse areas such as databases, machine learning, and the Semantic Web.

Deductive Stream Reasoning. In [EDBT2010, ESWC2010], we specified a general and flexible architecture for reasoning over data streams and rich background knowledge, within the LarKC conceptual architecture [ICSC08], leveraging existing DSMS and SPARQL engines. We introduced C-SPARQL (Continuous SPARQL) as an extension to SPARQL to query RDF streams [WWW2009, EDBT2010], and in [ESWC2010] we elaborated on the deductive reasoning support to C-SPARQL, proposing an efficient incremental technique that exploits the transient nature of streams for maintaining the materialization of their ontological entailments.
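The notions of RDF stream and window introduced in this section can be illustrated with a small sketch. It is not the C-SPARQL engine: it only assumes that stream elements are (triple, timestamp) pairs with non-decreasing timestamps, and that a window is described by a range and a step, both in seconds. The names `sliding_windows`, `_snapshot`, `range_s` and `step_s` are hypothetical and chosen for this illustration only.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

# Hypothetical types for this sketch: a triple is (subject, predicate, object)
# and a stream element pairs it with a timestamp expressed in seconds.
Triple = Tuple[str, str, str]
Element = Tuple[Triple, int]


def _snapshot(window: Deque[Element], eval_time: int, range_s: int) -> List[Element]:
    """Evict elements that fell out of the range, then copy what is left."""
    while window and window[0][1] <= eval_time - range_s:
        window.popleft()
    return list(window)


def sliding_windows(
    stream: Iterable[Element], range_s: int, step_s: int
) -> Iterator[Tuple[int, List[Element]]]:
    """Yield (evaluation_time, window_content) every step_s seconds.

    Each window holds the elements whose timestamp lies in (t - range_s, t].
    """
    window: Deque[Element] = deque()
    next_eval = None
    last_ts = None
    for triple, ts in stream:
        if last_ts is not None and ts < last_ts:
            raise ValueError("timestamps must be monotonically non-decreasing")
        last_ts = ts
        if next_eval is None:
            # Assumption: the first evaluation happens one step after the first element.
            next_eval = ts + step_s
        while ts > next_eval:
            yield next_eval, _snapshot(window, next_eval, range_s)
            next_eval += step_s
        window.append((triple, ts))
    if next_eval is not None:
        # Evaluate once more so that the last elements are also reported.
        yield next_eval, _snapshot(window, next_eval, range_s)
```

Calling `list(sliding_windows(stream, 1800, 300))` would mimic a 30-minute window that slides every 5 minutes; older triples are simply dropped when they leave the range, which is the restriction to "recent statements" described above.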
Inductive Stream Reasoning. The challenges here are, first, the large amount of information that needs to be processed in a given time window; second, the structured multi-relational nature of the data; third, the sparsity of the typically high-dimensional data; and, fourth, the fact that the data is often incomplete. In [IRMLeS2009] a machine learning approach has been described that is suitable for this challenging data situation and that has been termed the statistical unit node set (SUNS) learning approach. In [ILP2010], the approach was extended for online inductive reasoning.

[Figure 1 diagram. Components: Window Selector (DSMS), Abstracter (DSMS), Deductive Reasoner, Abstracter with Long-Term Matrix, Abstracter with Hype Matrix, two Inductive Reasoners, Social Media Analytics. Legend: data stream, RDF stream, RDF graph, C-SPARQL query (C), SPARQL with Probability (P).]

Figure 1. Architecture of a simple Stream Reasoner applied to Social Media Analysis.
Figure 1 shows the architecture of a simple Stream Reasoner, built by pipelining a set of specialized plug-ins within the LarKC platform. A selection plug-in extracts the relevant data in each input stream by exploiting the window processing ability of a DSMS. The window content is fed into a second plug-in that abstracts from fine-grain data streams into aggregated events and produces RDF streams as output. A SPARQL engine able to operate under different entailment regimes constitutes the deductive reasoner plug-in. C-SPARQL queries are directly registered in the deductive reasoner. The results can be of immediate use or used by two more sub-workflows, each consisting of an abstracter and an inductive reasoner. As explained in Section 4.3, the inferences of the two inductive reasoners can be queried using an extended version of SPARQL which supports probabilities [IRMLeS2009]. Note that this simple setup can be extended by arbitrarily combining and iterating the deductive and inductive reasoners. For example, it might be helpful to feed the findings of the inductive reasoner back to the deductive reasoner to deduce further knowledge.

3 Experimental Data

We have based our experiments on Glue, a social network that allows users to connect to each other and share Web navigation experiences. In addition, Glue uses semantic recognition techniques to identify books, movies, and other similar topics, and publishes them in the form of data streams. Users can observe the streams and receive recommendations on interesting findings by their friends.

Both the social network data and the real-time streams are accessible via REST APIs. Our experiments build on adapters [SDOW2009] that export Glue data as RDF streams. The entities and relationships considered in the experiments are described in UML in Figure 2.

Users have names, and know and follow other users. Well-known Semantic Web vocabularies [BojarsBPTD08] are used: the Friend of a Friend vocabulary (FOAF) for the user names and the knows relationship, and the Semantically-Interlinked Online Communities vocabulary (SIOC) for the follows relationship.
Objects represent entities of the real world (such as movies or books), with name and category. Resources represent sources of information that describe the actual objects, such as Web pages about a particular movie or book. As vocabularies, we used rdfs:label for the names and skos:subject to link an object to its category, by means of the subject attribute. Moreover, category identifiers are taken from YAGO.

[Figure 2 diagram. Background knowledge: Resource (URL, rdfs:label), Object (URL, rdfs:label, skos:subject, owl:sameAs), User (URL, foaf:name, foaf:knows, sioc:follows), with the relationships links and describes. Data stream: the relationships accesses, likes, dislikes.]

Figure 2. Entities and relationships in our experiments.
The information described so far is static background knowledge, i.e., in the experiments we assumed that it is stable in a period comparable with the size of a window. Of course, updates of this information are also allowed, but they do not interfere with window processing. We also have streaming information, namely the notifications of the behavior of users with respect to resources (and, transitively, to objects). The accesses, likes, and dislikes relationships represent the events occurring when users access resources or express opinions about them. In the rest of this paper, we refer to this vocabulary with the prefix sd. Quite straightforwardly, every interaction of a generic user U with a resource R generates a triple of the form <U, sd:accesses, R>, and selected interactions generate triples of the form <U, sd:likes, R> and <U, sd:dislikes, R>. An example of possible triples in a stream is:

    (<:Giulia, sd:accesses, :Avatar>, T13:18:05)
    (<:John, sd:accesses, :Twilight>, T13:36:23)
    (<:Giulia, sd:likes, :Avatar>, T13:42:07)

4 Stream Reasoning at Work

In this section, we progressively apply the simple Stream Reasoner presented in Section 2 to the experimental data described in Section 3. We start with an example of social media analysis performed by a C-SPARQL query under simple RDF entailment. Then, we explain how complex conditions can be expressed using OWL RL and how our Deductive Stream Reasoner can efficiently answer C-SPARQL queries under OWL 2 RL entailment. The last part of this section is dedicated to Inductive Stream Reasoning performed on top of RDF streams generated by C-SPARQL queries registered in the Deductive Stream Reasoner.

4.1 C-SPARQL Under Simple RDF Entailment

C-SPARQL [WWW2009, EDBT2010] is an extension of SPARQL for expressing continuous queries over RDF graphs and RDF streams. Like SPARQL, it can be executed under multiple entailment regimes. C-SPARQL under simple RDF entailment does not require reasoning, but it is already very useful. For instance, it can be used to discover causal relationships between different users' actions in Glue.
The following query identifies users who are opinion makers, i.e., who are likely to influence the behavior of their followers.

    1. REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
    2. CONSTRUCT { ?opinionMaker sd:about ?resource }
    3. FROM STREAM <...> [RANGE 30m STEP 5m]
    4. WHERE {
    5.   ?opinionMaker ?opinion ?resource .
    6.   ?follower sioc:follows ?opinionMaker .
    7.   ?follower ?opinion ?resource .
    8.   FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
    9.            && ?opinion != sd:accesses )
    10. }
    11. HAVING ( COUNT(DISTINCT ?follower) > 3 )

Lines 1 and 3 tell the C-SPARQL engine to register the continuous query on the stream of interactions generated by Glue, considering a window of 30 minutes that slides every 5 minutes. Line 2 tells the engine to generate an RDF stream as output. The basic triple pattern (BTP) at line 5 matches interactions of potential opinion makers with the resources. Line 6 matches the followers of the opinion makers, and line 7 matches their interactions with the resources. The FILTER clause uses the custom value testing function
cs:timestamp, which returns the timestamp of the RDF triple producing the binding (see footnote 5). It checks whether the interactions of the followers occur on the same resource after those of the opinion maker. Note that timestamps are taken from variables that occur only once in patterns applied to streaming triples, thus avoiding ambiguity. Also, the query filters out actions of type "accesses", which are normally required before expressing an opinion such as "like" or "dislike". Finally, the HAVING clause distinguishes potential opinion makers from actual opinion makers, checking that more than three followers imitated their behavior.

Two approaches, alternative to C-SPARQL, exist: Streaming SPARQL [StreamingSPARQL] and Time-Annotated SPARQL [TASPARQL]. Both languages introduce the concept of windows, but only C-SPARQL brings the notion of continuous processing, typical of stream processing, into the language. All other proposals rely on permanent storing of the stream and process it by one-shot queries. Moreover, only C-SPARQL proposes an extension to SPARQL to support aggregates. This extension permits optimizations [EDBT2010] that push, whenever possible, aggregate computation as close as possible to the raw data streams.

4.2 C-SPARQL and Deductive Stream Reasoning

Running C-SPARQL queries under expressive OWL reasoning regimes widens the spectrum of analysis that the Stream Reasoner can perform. For instance, we may define a movie opinion maker as an opinion maker who recently liked only movies. The definition in OWL of users who like only movies is

    Class( sd:UserOnlyInterestInMovies complete
      intersectionOf( sd:User
        restriction( sd:likes allValuesFrom(yago:movie) ) ) )

This ontological definition can be used in the following C-SPARQL query:

    1. REGISTER STREAM MovieOpinionMakers COMPUTED EVERY 5m AS
    2. CONSTRUCT { ?opinionMaker sd:about ?resource }
    3. FROM STREAM <...> [RANGE 30m STEP 5m]
    4. WHERE {
    5.   ?opinionMaker a sd:UserOnlyInterestInMovies .
    6.   ?opinionMaker ?opinion ?resource .
    7.   ?follower sioc:follows ?opinionMaker .
    8.   ?follower ?opinion ?resource .
    9.   FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
    10.           && ?opinion != sd:accesses )
    11. }
    12. HAVING ( COUNT(DISTINCT ?follower) > 3 )

For instance, if a window contains the triples shown below, then Giulia is an instance of UserOnlyInterestInMovies, while John is not (he also liked a book).

    (<:Giulia, sd:likes, :Avatar>, T13:18:05)
    (<:John, sd:likes, :StarWars>, T13:36:23)
    (<:John, sd:likes, :WutheringHeights>, T13:38:07)
    (<:Giulia, sd:likes, :AliceInWonderland>, T13:42:07)

Footnote 5: If the variable gets bound multiple times, the function returns the most recent timestamp value relative to the query evaluation time.
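The Giulia and John example can be replayed with a short sketch. It is only an approximation of what a reasoner does: it reads the allValuesFrom restriction in a closed-world way over the content of one window, and it takes the set of movies as given background knowledge. The function name `users_only_interested_in_movies` and the plain-string encoding of triples are assumptions made for this illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding for this sketch: triples are plain string tuples.
Triple = Tuple[str, str, str]


def users_only_interested_in_movies(
    window: Iterable[Triple], movies: Set[str]
) -> Set[str]:
    """Return the users whose sd:likes triples in the window all point to movies.

    Assumption: the restriction is checked only against the window content,
    and users with no sd:likes triple in the window are not reported.
    """
    liked: Dict[str, Set[str]] = {}
    for subject, predicate, obj in window:
        if predicate == "sd:likes":
            liked.setdefault(subject, set()).add(obj)
    # A user qualifies when every liked object is known to be a movie.
    return {user for user, objects in liked.items() if objects <= movies}


# Window content taken from the example in the text (timestamps omitted).
example_window = [
    (":Giulia", "sd:likes", ":Avatar"),
    (":John", "sd:likes", ":StarWars"),
    (":John", "sd:likes", ":WutheringHeights"),
    (":Giulia", "sd:likes", ":AliceInWonderland"),
]
# Background knowledge: which objects are movies (Wuthering Heights is a book).
example_movies = {":Avatar", ":StarWars", ":AliceInWonderland"}
```

With these inputs, `users_only_interested_in_movies(example_window, example_movies)` contains :Giulia but not :John, matching the classification described in the text.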
The evaluation of the query requires reasoning both on the triples in the window and on the background knowledge about objects described in Glue. In particular, the reasoner must check if users match the ontological definition before checking if they are opinion makers. Note that the Deductive Stream Reasoner must know the ontological definition, but also has to combine the RDF stream with relevant background knowledge about movies and books (i.e., it has to know that Wuthering Heights is a book while the other items are movies).

Performing this reasoning task efficiently is not obvious. Possible existing techniques are: incremental maintenance of materialized views in logic [VolzSM05], graph databases [CDE98], extensions of the RETE algorithm for incremental rule-based reasoning [VLDB93], and recent attempts at incremental reasoning in description logics [ISWC2007]. All these methods operate incrementally, but none of them is explicitly dedicated to data streams. In [ESWC2010], we proposed a technique for efficiently computing this class of C-SPARQL queries, which incrementally maintains a materialization of ontological entailments exploiting the transient nature of streaming data. By adding expiration time information to each RDF triple, we show that it is possible to compute a new complete and correct materialization whenever the window slides, by dropping explicit statements and entailments that are expired, and then only adding the deductions that depend on the new triples that entered the window.

4.3 Inductive Stream Reasoning using C-SPARQL

Still considering Giulia, we may ask which movies Giulia will like the most, even if she has not seen them yet. The answer is built for an ad hoc query; the system uses the last window in the stream in order to determine the predicted probability.

    1. SELECT ?movie ?prob
    2. FROM STREAM <...> [RANGE 30m STEP 5m]
    3. WHERE { :Giulia sd:likes ?movie . WITH PROB ?prob
    4.   ?movie a yago:movie .
    5.   FILTER ( ?prob > 0 && ?prob < 1 )
    6. } ORDER BY DESC(?prob)

At line 3, the construct WITH PROB extends SPARQL with the ability to query an induced model. The variable ?prob assumes the value 1 for the movies she has watched and assumes the estimated probabilities between zero and one for the next movies she would like to watch. The clause ORDER BY is used to return movies sorted by decreasing probability. The query answer includes pairs of movie title and predicted likelihood as follows:

    (:WutheringHeightsTvMovie, )
    (:StarWars, )

For running inductive reasoning on semantic data we use the SUNS learning approach mentioned in Section 2. In the approach, we first define the statistical unit, the population, the sampling procedure and the features. A statistical unit is an object of a certain type, e.g., a user. The population is the set of statistical units under consideration. For instance, in the experiments described in this paper the population is defined as the users of the Glue social network. For training models we sample a subset from the population. Then, based on the sample, the SUNS approach constructs data matrices by transforming the set of RDF triples related to statistical units into matrices. The rows in the matrix stand for instances of a statistical unit and columns represent their features derived from the associated RDF graph. The binary entries one and zero represent the truth values "true" and "unknown" of the corresponding triples. Suppose that rows are users
and columns are movies. A one in the (i, j) entry of the matrix indicates that the i-th user rates the j-th movie as liked; otherwise it is unknown whether or not that user likes that movie.

After the transformation, a multivariate analysis of the data matrices is performed. Multivariate prediction methods are especially suited for the challenging data situation: large scale, multi-relational, high-dimensional and highly sparse. The multivariate modeling problem can be solved via singular value decomposition (SVD), non-negative matrix factorization (NNMF) [NNMF], and latent Dirichlet allocation (LDA) [LDA]. All three approaches estimate unknown matrix entries via a low-rank matrix approximation. NNMF is a decomposition under the constraint that all terms in the factoring matrices are non-negative, while LDA is based on a Bayesian treatment of a generative topic model (see footnote 6). After matrix completion, the zero entries are replaced with certainty values representing the likelihood that the corresponding triples are true. In [ILP2010] we have investigated the performance of these methods in offline as well as in online settings, following different sampling strategies. In this context, online setting means that the trained model is applied to predict relationships between entities at query time, including new entities unseen in the training dataset.

In our example the user is the main entity of interest (and the reasoner's statistical unit). Each user is involved in a number of relationships such as interests in movies, books and other items, the friendship relationships and the follows relationship. All data refers to users and is described by RDF triples, expressing that users relate to objects. C-SPARQL continuously delivers new windows of (aggregated) features to the inductive reasoner, and the results of C-SPARQL are transformed into a data matrix, which becomes the input for the inductive reasoner.
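The idea of estimating unknown entries through a low-rank approximation can be illustrated on a tiny users-by-movies matrix. The sketch below is not the SUNS implementation: it computes only a rank-1 approximation, using plain power iteration to find the dominant singular direction, as a dependency-free stand-in for a truncated SVD. The function name `rank1_approximation` and the toy matrix are assumptions made for this illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def rank1_approximation(matrix: Matrix, iterations: int = 100) -> Matrix:
    """Return the best rank-1 approximation of a matrix (dominant singular triplet).

    Power iteration on A^T A converges to the top right singular vector v;
    A v then equals sigma * u, so the outer product (A v) v^T is sigma * u * v^T.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols  # starting vector, assumed not orthogonal to the solution
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # All-zero matrix: nothing to approximate.
            return [[0.0] * cols for _ in range(rows)]
        v = [x / norm for x in w]
    scaled_u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return [[scaled_u[i] * v[j] for j in range(cols)] for i in range(rows)]


# Toy data: rows are users, columns are movies, 1 = liked, 0 = unknown.
likes = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],  # it is unknown whether the third user likes the third movie
]
completed = rank1_approximation(likes)
```

In `completed`, the unknown entry of the third user receives a score strictly between zero and one, which can be read as a certainty value in the sense used above; a realistic setting would use a higher rank and a proper SVD routine.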
At predefined time intervals, a learning module applies a multivariate analysis to the data matrices. A second learning module, called the hype model, monitors rapid changes (see Section 5.2). The two data matrices contain, respectively, the long-term information and the short-term trends or "hypes". The "hype matrix" is simply populated with the current window content, whereas the long-term matrix is continuously updated and evolves over time.

5 Evaluation

As evaluation, we first present a stress test to show the scalability of our approach, and then evaluate the application to a real case. We show that each architectural component separately applies orthogonal optimizations, yielding an efficient solution when one system's output is fed as input to the next system.

5.1 Performance and Scalability Evaluation

As in [SDOW2009], we compared the execution time of a C-SPARQL query in our Deductive Stream Reasoner to the execution time of an equivalent SPARQL query on ARQ with inference support. We show the tests for the query presented in Section 4.2, run on a Pentium Core 2 Quad 2.0 GHz with 2 GB of RAM.

Footnote 6: Recently, we developed a regularized SVD which is insensitive to the rank used in the matrix factorization step.
[Figure 3 plot. Title: Response Time C-SPARQL vs. SPARQL. Vertical axis: ms. Horizontal axis: number of triples in the window (C-SPARQL) or in the repository (SPARQL). Series: SPARQL, Linear (SPARQL), C-SPARQL 200 t/s, Linear (C-SPARQL 200 t/s).]

Figure 3. The window-based selection of C-SPARQL outperforms the FILTER-based selection of SPARQL.

A small change to the schema used to represent interactions allows writing an equivalent SPARQL query.

    1. CONSTRUCT { ?opinionMaker sd:about ?resource }
    2. FROM <...>
    3. WHERE { ?opinionMaker ?opinion [ :about ?resource ;
    4.                                  dc:created ?dateOpinionM ] .
    5.   ?opinionMaker a sd:UserOnlyInterestInMovies .
    6.   ?follower sioc:follows ?opinionMaker .
    7.   ?follower ?opinion [ :about ?resource ;
    8.                        dc:created ?dateOpinionF ] .
    9.   FILTER ( ?opinion != sd:accesses &&
    10.           ?dateOpinionM > "...T13:00:00Z"^^xsd:dateTime &&
    11.           ?dateOpinionM < "...T13:30:00Z"^^xsd:dateTime &&
    12.           ?dateOpinionF > "...T13:00:00Z"^^xsd:dateTime &&
    13.           ?dateOpinionF < "...T13:30:00Z"^^xsd:dateTime &&
    14.           ?dateOpinionF > ?dateOpinionM )
    15. }
    16. HAVING ( COUNT(DISTINCT ?follower) > 3 )

Compared with the C-SPARQL query, this version adds (a) two BTPs (lines 4 and 8) that match the creation date of the interaction, and (b) four FILTER conditions (lines 10-13) that select the same time interval as the C-SPARQL query. Notably, the C-SPARQL syntax is handier and terser.

We registered the C-SPARQL query in our engine, we fed RDF triples into our engine at a rate of 200 triples per second (t/s) and we measured the time required to compute the answer. Using ARQ, we executed the equivalent SPARQL query six times against repositories containing a growing number of triples and we again measured the time required to compute each answer.

The results are shown in Figure 3. By comparing the linear regressions of the two experiments, named Linear (SPARQL) and Linear (C-SPARQL 200 t/s), we see that the C-SPARQL window-based selection always performs significantly better than the FILTER-based selection of SPARQL in Jena.
5.2 Evaluation on a Real Case Scenario

To prove the effectiveness of stream reasoning for social media analytics, we evaluated the accuracy of top-N movie recommendations. First, we compared diverse inductive reasoning approaches with common recommendation methods, some of which were realized by deductive stream reasoning. Secondly, we examined the performance of the combination of both inductive and deductive stream reasoning.

To gather a dataset for the evaluation, we used a predefined C-SPARQL query and then transcoded and stored the output RDF streams into a data matrix. The matrix was continuously updated from February 19 to April 22, 2010. Finally, we selected 245860 interactions made by 2457 users. In particular, we examined the interactions of the "Liked movies" relationship. The transformed data matrix (see Section 4.3) is extremely sparse, with only 0.002% nonzero elements. To make statistically significant evaluations we removed all users with almost no interactions and items that were evaluated fewer than 5 times. After pruning, the resulting subset consists of 1455 users and 7724 features with sparsity of 0.02%. The item most specified by users is the "Liked movies" relation with around 2500 movies. "Liked music" and "Liked recording_artists" were specified about 1300 times, and "Liked moviestars", "Liked tv_shows" and "Liked video_games" approximately 600 times. The remaining 18 features were mentioned fewer than 250 times.

Because they are most easily adaptable to the dynamic setting, we applied SVD and regularized SVD for movie recommendations. As baseline methods we utilized, first, a global liked-movie list computed by a simple registered C-SPARQL query, shown below.

    1. REGISTER STREAM MostLiked COMPUTED EVERY 1d AS
    2. SELECT ?movie (COUNT(?user) AS ?noOfUser)
    3. FROM STREAM <...> [RANGE XX STEP XX]
    4. WHERE { ?movie a yago:movie .
    5.         ?user sd:likes ?movie . }
    6. GROUP BY ?movie
    7. ORDER BY DESC(?noOfUser)

Second, we extracted the most liked movies of a person's friends, also calculated through a corresponding registered C-SPARQL query (not shown). Third, we applied k-nearest neighbour (kNN) regression, using the same user-based and movie-based similarity measures defined in [IEEEIC2003]. By means of cross-validation we carefully tuned the parameters of each method.

Figure 4. Accuracy of top-N movie recommendations. Left: inductive stream reasoning and deductive stream reasoning separately. Right: combination of inductive and deductive stream reasoning.
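The MostLiked baseline and the accuracy measure reported in Figure 4 (the share of truly liked movies among the top N recommendations) can be sketched in a few lines. This is an illustration only, not the evaluation code used for the experiments; the function names `most_liked` and `top_n_accuracy`, and the encoding of interactions as (user, movie) pairs, are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def most_liked(likes: Iterable[Tuple[str, str]]) -> List[str]:
    """Rank movies by the number of distinct users who liked them, descending.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter(movie for _user, movie in set(likes))
    return [movie for movie, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def top_n_accuracy(recommended: List[str], truly_liked: Set[str], n: int) -> float:
    """Fraction of the top-n recommended movies that the user truly liked."""
    top = recommended[:n]
    if not top:
        return 0.0
    return sum(1 for movie in top if movie in truly_liked) / len(top)
```

For instance, `top_n_accuracy(most_liked(window_likes), held_out_likes, 10)` would score the global baseline for one user, where `window_likes` and `held_out_likes` stand for hypothetical training and held-out interactions.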
Figure 4 (left) shows the evaluation results, i.e. the percentage of truly liked movies in the top N recommendations, where N = 10, 20, 30, 40 and 50. First, SVD and regularized SVD outperformed all baseline methods. In particular, the regularized SVD performed much better than any other method and was robust and insensitive to its parameters, as expected. Secondly, both kNN lines are above the baselines. This means that the users and the items collected share some common regularity. For example, users who like the same actors are likely to watch movies in which those actors play. Taking such additional features into account significantly improves the accuracy of movie recommendations. The data regularity is of course exploited by SVD and regularized SVD as well. Thirdly, the two baseline methods almost completely overlapped. The reason might be that most users would like to watch movies from the same global list of the most popular movies rather than considering the preferences of their friends.

As the second part of the empirical study, we experimented with combining the output of the deductive and the inductive reasoning modules. In this scenario the inductive module models the long-term preferences of users and, in this experiment, is trained only on data that is older than 30 days. It is quite reasonable to assume that the long-term preference model is updated only at larger intervals, since the required computations can be quite costly if one wants to avoid subsampling. The deductive reasoning module contributes predictions in the form of MostLiked, which simply aggregates recent recommendations to capture the short-term trends or hypes. Strictly speaking, the latter is also an inductive process, but aggregation is inherently supported by the deductive component as well.

Note that the short-term trend could have been predicted by a multivariate analysis similar to the long-term module and combined in a comparable way with the output of the deductive module. However, Figure 4 (right) already shows clearly that the combination of the long-term inductive model and the MostLiked deductive model outperforms both methods taken separately.
Experiments with more sophisticated hype modules and the exploration of different combination schemes are part of future work.

6 Conclusion and Outlook

We have shown that RDF streams can serve as an interconnecting mechanism to support change management within deductive and inductive reasoners. Suitable type extensions provide both timestamps and probabilities to RDF triples, thereby injecting the required ingredients in the basic RDF model. Extensions to the SPARQL language make RDF streams first-class citizens, allowing static and streaming RDF data to be used within queries and allowing several window definition options. C-SPARQL embedded within the entailment of a deductive reasoner can be used to generate an output stream, which can then feed an inductive reasoner. Conversely, results of inductive reasoners, described by RDF triples tagged with probabilities, can be inspected by SPARQL engines.

This paper has illustrated a sequential integration between two existing stream reasoning environments within the LarKC platform. However, the pluggable architecture of LarKC also allows for other forms of integration, with reasoners sharing the same RDF resources, freely reacting to RDF streams, and mutually interacting.

Acknowledgements

This work was partially supported by the European project LarKC (FP ).
This article has been accepted for publication in IEEE Intelligent Systems but has not yet been fully edited.

References

[babu01continuous] S. Babu and J. Widom, "Continuous Queries over Data Streams," SIGMOD Rec., 30(3), 2001.
[BojarsBPTD08] U. Bojārs, J.G. Breslin, V. Peristeras, G. Tummarello, S. Decker, "Interlinking the Social Web with Semantics," IEEE Intelligent Systems, 23(3), 2008.
[CDE98] Y. Zhuge and H. Garcia-Molina, "Graph structured views and their incremental maintenance," Proc. of the 14th Int. Conf. on Data Engineering, 1998.
[dsmsbook] M. Garofalakis, et al., "Data Stream Management: Processing High-Speed Data Streams (Data-Centric Systems and Applications)," Springer-Verlag New York, Inc., 2007.
[EDBT2010] D. Barbieri, et al., "An Execution Environment for C-SPARQL Queries," Proc. Intl. Conf. on Extending Database Technology (EDBT 2010), 2010.
[ESWC2010] D. Barbieri, et al., "Incremental reasoning on streams and rich background knowledge," Proc. of the Extended Semantic Web Conf. (ESWC 2010), 2010.
[ICSC08] D. Fensel et al., "Towards LarKC: A Platform for Web-Scale Reasoning," Proc. IEEE Int'l Conf. Semantic Computing (ICSC 08), IEEE Press, 2008.
[IEEEIC2003] G. Linden, et al., "Amazon.com Recommendations: Item-to-Item Collaborative Filtering," IEEE Internet Computing, vol. 7, 2003.
[IEEEISSR] E. Della Valle, et al., "It's a Streaming World! Reasoning upon Rapidly Changing Information," IEEE Intelligent Systems, 24(6), 2009.
[ILP2010] Y. Huang, et al., "Multivariate Prediction for Learning on the Semantic Web," Proc. of the 20th Int. Conf. on Inductive Logic Programming (ILP 2010), 2010.
[IRMLeS2009] V. Tresp, et al., "Materializing and querying learned knowledge," Proc. of the 1st ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web (IRMLeS 2009), 2009.
[ISWC2007] B. Cuenca Grau, et al., "History matters: Incremental ontology reasoning using modules," Proc. of the 6th Int. Semantic Web Conf. (ISWC 2007), Springer, 2007.
[LDA] D.M. Blei, et al., "Latent Dirichlet allocation," J. Mach. Learn. Res., 3, 2003.
[NNMF] D.D. Lee and H.S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, 1999.
[SDOW2009] D.F. Barbieri, et al., "Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL," Proc. of Social Data on the Web Workshop at ISWC 2009, 2009.
[StreamingSPARQL] A. Bolles, et al., "Streaming SPARQL: Extending SPARQL to Process Data Streams," Proc. of European Semantic Web Conf. (ESWC 2008), 2008.
[SuYYT10] J.H. Su, H.H. Yeh, P.S. Yu, V.S. Tseng, "Music Recommendation Using Content and Context Information Mining," IEEE Intelligent Systems, 25(1), 2010, pp. 16-26.
13 T h i s a r t i c l e h a s b e e n a c c e p t e d S o m e c o n t e n t m a y c h a n g e [TASPARQL] A. Rodriguez, et al., Semantic Management of Streaming Data, Proc. Intl. Workshop on SemanticSensorNetworks(SSN),2009. [VLDB93] F. Fabret, et al., An adaptive algorithm for incremental evaluation of production rules in databases, Proc.ofthe19thInt.Conf.onVeryLargeDataBases(VLDB1993),1993,pp [VolzSM05] R. Volz, et al. Incrementally maintaining materializations of ontologies stored in logic databases, J.DataSemantics,2,2005,pp.134. [WWW2009]D.F.Barbierietal., CSPARQL:SPARQLforContinuousQuerying, Proc.18thInt lworldwide WebConf.(WWW09),ACMPress,2009,pp D i g i t a l O b j e c t I n d e n t i f i e r 1
Davide Francesco Barbieri has been a PhD student at DEI, Politecnico di Milano, since 2008. His research interests address several issues related to stream processing. C-SPARQL will be the main topic of his PhD dissertation.
Dip. di Elettronica e Informazione, Politecnico di Milano, P.za L. Da Vinci, 32, 20133 Milano, Italy
Tel.: Fax:

Daniele Braga is an Assistant Professor at DEI, Politecnico di Milano, since He also got his PhD in Computer Science at Politecnico di Milano. His research interests address several issues related to the manipulation of semistructured data (visual languages and advanced processing for XML data), service integration (with specific reference to complex queries over heterogeneous Web data sources), and stream reasoning (C-SPARQL and reasoning over streaming data).
Dip. di Elettronica e Informazione, Politecnico di Milano, P.za L. Da Vinci, 32, 20133 Milano, Italy
Tel.: Fax:

Stefano Ceri is professor of Database Systems at the Dipartimento di Elettronica e Informazione (DEI), Politecnico di Milano. He was visiting professor at the Computer Science Department of Stanford University ( ). His research work spans over three decades ( ) and has been generally concerned with extending database technologies to incorporate new features: distribution, object orientation, rules, streaming data; with the advent of the Web, his research has been targeted towards the engineering of Web-based applications and complex search systems. He is editor in chief of the book series "Data-Centric Systems and Applications" (Springer-Verlag), and author of about 250 articles in international journals and conference proceedings and of nine international books. He has been awarded an IDEAS Advanced Grant, funded by the European Research Council (ERC), on Search Computing ( ).
Dip. di Elettronica e Informazione, Politecnico di Milano, P.za L. Da Vinci, 32, 20133 Milano, Italy
Tel.: Fax: [email protected]
Emanuele Della Valle has been an Assistant Professor of "Software Project Management" at the Department of Electronics and Information of the Politecnico di Milano since July 2008. He started CEFRIEL's Semantic Web activities in 2001 and coordinated the Semantic Web group until June 2008. His current interest is in the scalable processing of information at the semantic level. Currently he is focusing on stream reasoning.
Dip. di Elettronica e Informazione, Politecnico di Milano, P.za L. Da Vinci, 32, 20133 Milano, Italy
Tel.: Fax:

Yi Huang has been a staff scientist at Siemens Corporate Technology since 2008 and is finishing his Ph.D. at the Ludwig Maximilian University of Munich, Germany, where he received his Diploma degree in Computer Science in 2005. His research interests focus on statistical machine learning, text mining, information retrieval, and the Semantic Web.
Siemens AG, Corporate Technology, Corporate Research and Technologies, CTTDETC3, Otto-Hahn-Ring, Munich, Germany
Tel.: +49 (89) Fax: +49 (89)

Volker Tresp received a Diploma degree from the University of Göttingen, Germany, in 1984, and a Ph.D. from Yale University in 1989. Since 1989 he has headed a research team in machine learning at Siemens, Corporate Research and Technology. In 1994 he was a visiting scientist at the Massachusetts Institute of Technology's Center for Biological and Computational Learning. Each summer (since 2003) he gives a lecture on machine learning and data mining at the University of Munich. He has been involved in all leading program committees in machine learning and is on the organizing committee of the annual Learning Workshop. He has served on numerous industrial and academic advisory boards.
Siemens AG, Corporate Technology, Corporate Research and Technologies, CTTDETC3, Otto-Hahn-Ring, Munich, Germany
Tel.: +49 (89) Fax: +49 (89) [email protected]
Achim Rettinger is a project manager and assistant professor at the Institute of Applied Informatics and Formal Description Methods (AIFB) at the Karlsruhe Institute of Technology (KIT, Germany). His research interests and publications are in combining machine learning, knowledge discovery and human-computer systems with semantic technologies. He did his PhD studies at the Technische Universität München (TUM, Germany) and at Siemens AG in Munich, Germany. He studied and worked on research projects at the University of Koblenz-Landau (Germany), Osaka University (Japan), University of Bath (UK), University of Alberta (Canada) and University of Georgia (USA).
Institute AIFB, Building 11.40, KIT-Campus Süd, D-76128 Karlsruhe, Germany
Tel.: Fax:

Hendrik Wermser did his Bachelor's thesis at Technische Universität München (TUM) and at Siemens AG, Munich, Germany, and is currently pursuing his Master's degree at TUM. His research interests are in information retrieval, recommender systems and machine learning.
Technical University of Munich, Theresienstr. 106, D-80333 München, Germany
Phone: Fax:
