

Schemaless Representation of Semistructured Data and Schema Construction

Dong-Yal Seo^1, Dong-Ha Lee^1, Kang-Sik Moon^1, Jisook Chang^1, Jeon-Young Lee^1, and Chang-Yong Han^2

^1 Dept. of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Kyungbuk, KOREA
^2 Data Warehouse Advanced Technology, Oracle Systems Korea, Ltd., Youngdeungpo-Gu, Seoul, KOREA

Abstract. In the networked information world we must deal with semistructured data, which carry only weak schema information. To manage such data efficiently, this paper introduces a data model for semistructured data and operations for schema construction. In contrast to former studies, which depend fully on schemaless manipulation, we transform semistructured data into structured data through a schema construction methodology. For schema construction, we defined operations for building is-a/is-part-of relationships, collecting data objects to build a primitive class, and merging two data instances or classes.

1 Introduction

1.1 Motivation

In the early stages of data processing, such as inventory/account management systems, a large centralized database system was used as an information server. Through the database design (or real-world modeling) phase, the DBA (Database Administrator) defines a well-structured schema. Data acquisition and creation are performed after schema definition. The role of end-users is to manipulate data via the predefined schema. The schema is firm and end-users are not responsible for schema management.

As data sources and computing environments become distributed, an abundance of information is provided by individual users and updated very quickly. Each end-user creates and updates his/her own information, acting like a DBA. The WWW (World-Wide Web) is a typical example. Every user creates his/her own documents and submits them to the WWW. How can we manage this wealth of HTML documents and other web resources? We must always navigate through hyper-links or search by keywords, because there is no absolute schema in the stored information. If we could define a schema on the set of web resources, we would not only have a better structure for the gathered information but could also use a structured query language. A schema provides the well-structured view of stored data. It expresses data location, relationships among data objects, data categories, summarized concepts, and so on.

Data sets for which no absolute schema is fixed in advance, and whose structure may be irregular or incomplete, are known as semistructured data. Even though data are created as a well-structured, i.e., schema-based, set of data, they become semistructured when they are taken out of their original structure. For example, a single record from a relational table is semistructured if we have no idea about the overall structure of the table and its relationships with other tables.

1.2 Problems and Approaches

Figure 1 shows the approaches to information processing based on the structural nature of the data sources. The right side of the figure presents conventional data processing. The information itself has a rigid structure and is represented with a well-structured model, such as the relational or object-oriented model. Users manipulate data with a schema, which mainly provides a structural abstraction of the stored data.

[Fig. 1. Information Structures and Processings. Labels: Semistructured Information Sources, Semistructured Representation Models, Semistructured Data Processing (Lightweight Information Systems); Structured Information Sources, Structured Representation Models, Structured Data Processing (Conventional DBMSs); Schema Construction.]

The left side shows the processing of information with poor schema, i.e., semistructured information [12] (or even unstructured information [3]). Semistructured information is represented with a lightweight model, which permits schemaless creation of data instances, and is stored in lightweight storage. Stored information is manipulated with a lightweight query language, which can be used with incomplete schema information.

The studies on lightweight approaches have largely overlooked the importance of the database schema. Although schemaless manipulation is convenient for users who want to retrieve data without deep knowledge of the underlying structures, a schema is indispensable for embedded SQL, API calls, or stored procedures. A schema gives firmness and a conceptualized view.

In semistructured data processing, end-users represent data instances without complete knowledge of the predefined schema (or even without any schema information). So the data creation phase can be performed before the schema definition phase. For semistructured data processing, we established the strategy "schemaless creation, schema-based manipulation", which involves the following goals:

1. Provide a representation model for schemaless data instances. The model should be expressive enough to describe semistructured data instances from heterogeneous data sources.
2. Devise a mechanism for schema construction which can be applied to a schemaless pool of data instances. After applying schema construction procedures, we will have a rigid schema and can manipulate the stored data with SQL.

The remainder of this paper is organized as follows. Section 2 presents related work and the corresponding contributions. Sections 3 and 4 address, respectively, a data model for schemaless creation of data objects and operations for schema construction, the main contribution of this work. Finally, the conclusion and directions for future work are discussed in Section 5.

2 Related Work

General problems of networked information processing are discussed in [4]. Three conceptual layers of networked information systems are introduced: information interface, information dispersion, and information gathering. Our work deals with the problems in the information gathering layer. More specifically, we are interested in the data mapping problem and we introduce schema construction operators for that problem. The importance of and the motivation for schemaless data representations and manipulations are discussed in [1][3][11][12].

Schemaless data instances are usually described by their attributes and corresponding values. Attribute-value pairs were used for data representation in OEM (Object Exchange Model) [11], Labeled-Tree [3], and the Data Forest Model [1]. Although not developed as a semistructured data model, O2's complex value model [8] shows a good way of semistructured representation with attribute-value pairs. All the earlier models for semistructured data are similar to each other and their expressive powers are almost the same.

The TSIMMIS^3 project [7] introduces OEM and other related work [12] on the integration of heterogeneous information sources. OEM provides sets and nested substructures as well as atomic values like integers and strings by using attribute-value pairs^4. The labeled-tree model, which has the same expressive power as OEM, represents semistructured data as trees, i.e., trees with a labeling of edges. The Data Forest Model supports a list type, which cannot be described in OEM or the labeled-tree model.

^3 The Stanford-IBM Manager of Multiple Information Sources.
^4 In [11], the term "label-value pair" was used instead of "attribute-value pair".

The former studies focus mainly on lightweight approaches, and the importance of the database schema is overlooked. The schema construction approach distinguishes our study from conventional methodologies in semistructured data processing. The problems of structuring [13] and typing [9] semistructured data have been introduced recently.

The studies on subject-oriented programming [10] introduce a methodology for class composition which deals with behaviors, whereas our approach deals with data.

3 Schemaless Creation of Data Objects

Objects are usually distinguished by their types, where a type describes the applicable operations and properties. Properties define either attributes of the object itself or relationships among objects. We mainly considered data objects from the viewpoint of properties.

3.1 Model Definition

Our data model describes schemaless data objects with a series of attribute-value pairs, called an AVPL (Attribute-Value Pairs List). An attribute-value pair is composed of two components, an attribute and a value. When we denote the set of AVPLs, the set of attributes, and the set of values as $D$, $A$, and $V$, respectively, an AVPL is defined as follows:

1. A single attribute-value pair is an AVPL:
   $(a \in A) \wedge (v \in V) \longrightarrow \{(a, v)\} \in D$
2. The union of two AVPLs is an AVPL:
   $D_1, D_2 \in D \longrightarrow D_1 \cup D_2 \in D$

An attribute is an ordered collection of one or more variables, where each variable is itself an attribute. When $S$ denotes the set of strings, an attribute is defined as follows:

1. Singleton attribute:
   $s \in S \longrightarrow s \in A$
2. Composite attribute (attribute with multiple variables):
   $a_1, a_2, \ldots, a_n \in A \longrightarrow (a_1, a_2, \ldots, a_n) \in A$
   where $(a_1, a_2, \ldots, a_n)$ is an ordered sequence of attribute variables.

A value is an instance, or a set of instances, that can be assigned to the corresponding attribute and its variables. The assignment of values to attributes is defined as follows:

1. Singleton attribute and value:
   $a \leftarrow v$
   where $a \in A$ and $v \in V$.
2. Composite attribute and value:
   $(a_1, a_2, \ldots, a_n) \leftarrow (v_1, v_2, \ldots, v_n)$
   where $(a_1, a_2, \ldots, a_n) \in A$, $(v_1, v_2, \ldots, v_n) \in V$, and each $v_i$ is assigned to $a_i$ ($1 \le i \le n$).

The domain of an attribute includes primitive strings, references to values, sets (or lists) of values, and AVPL objects. Therefore, an AVPL object itself (or a set of AVPL objects) can be a component of other AVPL objects, which allows nested structure.

When we denote the set (or type system) of values as $V$, the types of values are defined as follows:

1. Primitive character strings:
   $s \in S \longrightarrow s \in V$
   where $S$ denotes the set of strings.
2. References to any type of value:
   $v \in V \longrightarrow \&v \in V$
   where $\&v$ is the reference, i.e., the identifier, of $v$.
3. Set of values of any type (unordered):
   $v_1, v_2, \ldots, v_n \in V \longrightarrow \{v_1, v_2, \ldots, v_n\} \in V$
4. List of values of any type (ordered):
   $v_1, v_2, \ldots, v_n \in V \longrightarrow \langle v_1, v_2, \ldots, v_n \rangle \in V$
5. AVPL objects:
   $d \in D \longrightarrow d \in V$
   where $D$ denotes the set of AVPL objects.
6. Null (empty value):
   Any attribute can be null valued, i.e., no value is assigned.
7. Identifier:
   A self-contained label, a string which begins with '#'. The identifier is optional and is used by other AVPL objects as a reference.

3.2 Expressive Power

All atomic values are strings in AVPL. Other atomic types such as integer, float, and boolean are not provided. Those types can easily be derived from a data source, so users are freed from dealing with atomic types.

We can represent table-structured values as well as references (identifiers), sets, lists, and nested structures. The model provides table structures with a composite attribute (as the table header) and a corresponding list of composite values (as the record tuples). Advanced semantics of the object-oriented model, such as the is-a relationship, cannot be represented. Figure 2 is an example of an AVPL object in tabular representation.
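The definitions of Section 3.1 can be sketched in Python as follows. This is only an illustrative reading of the model, not the authors' implementation: the names `AVPL`, `Ref`, `is_attribute`, and `is_value` are hypothetical, Python tuples stand in for composite attributes and composite values, `None` stands in for the null value, and the identifier `#seo` is made up. The example object mirrors the tabular example referred to above, with the year that could not be recovered from the figure left as null.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference &v, held as the identifier of the referenced value."""
    target: str  # hypothetical convention: an identifier such as "#lee"


class AVPL(dict):
    """An AVPL object: attribute -> value pairs plus an optional identifier."""

    def __init__(self, pairs=(), ident=None):
        super().__init__(pairs)
        if ident is not None and not ident.startswith("#"):
            raise ValueError("identifiers must begin with '#'")
        self.ident = ident


def is_attribute(a):
    """Singleton attribute (a string) or composite attribute (tuple of attributes)."""
    if isinstance(a, str):
        return True
    return isinstance(a, tuple) and len(a) > 0 and all(is_attribute(x) for x in a)


def is_value(v):
    """Check v against the value types listed in Section 3.1."""
    if v is None or isinstance(v, (str, Ref)):      # null, string, reference
        return True
    if isinstance(v, (frozenset, list, tuple)):     # set, list, composite value
        return all(is_value(x) for x in v)
    if isinstance(v, AVPL):                         # nested AVPL object
        return all(is_attribute(a) and is_value(x) for a, x in v.items())
    return False


# A composite attribute acts as the table header; the list holds record tuples.
education = AVPL({
    ("Degree", "School", "Year"): [
        ("BS", "POSTECH", None),    # year not recoverable from the figure
        ("MS", "POSTECH", "1994"),  # atomic values are always strings
    ],
})
seo = AVPL(
    {"Name": "Dong-Yal Seo", "Research": "Database", "Education": education},
    ident="#seo",  # hypothetical identifier
)
```

Because every atomic value is a string, `is_value(1994)` is false while `is_value("1994")` is true, which matches the statement that integers and other atomic types are not part of the model.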

[Fig. 2. Tabular Representation of an Example AVPL Object. Attributes: Name (Dong-Yal Seo); Research (Database); Education, a table with header (Degree, School, Year) and rows (BS, POSTECH) and (MS, POSTECH, 1994); Contact, with sub-attributes Telephone and Fax. The remaining cell values are not recoverable.]

4 Schema Construction

4.1 Schema and Objects

A schema, in an OODB, defines classes and their relationships, and the relationships among classes imply the relationships among objects. A schema defines both the structural and the behavioral part of a class. In this work, we mainly focused on the structural part.

To construct a schema from a pool of schemaless objects, we should build 1) a class from a set of instances, and 2) various relationships among those classes. First, we recall the possible relationships among classes in order to define operations for constructing classes and their relationships. There are two relationships between objects, is-a and is-part-of. The former is the basis of the inheritance hierarchy and the latter is the basis of the composition hierarchy.

A type is a collection of objects with the same structural and behavioral information in an object-oriented model. A type is implemented as a class and the class defines a collection relationship. Not only are objects created as instances of a class; a class can also be created as a collection of instances.

One object can be described in more than one way. In this case, the descriptions must be merged into a single description, because an object is unique in the object-oriented world. So two descriptions can have an equivalence relationship.

4.2 Schema Construction Operations

1. Creation of a class by Instance Collection
   For the AVPL objects $S_1, \ldots, S_m$ and their attribute-sets $A_1, \ldots, A_m$, a class of AVPL objects $U$ is constructed with ObjectCollect($S_1, \ldots, S_m$) if
   $U = \{S_1, S_2, \ldots, S_m\}$
   where the attribute-set $U_A$ of $U$ is $U_A = A_1 \cup A_2 \cup \cdots \cup A_m$. For every attribute $a \in U_A$, if there is an AVPL object $S_i$ ($1 \le i \le m$) where $a$ is not in $A_i$, the value of $a$ in $S_i$ is null.

2. Merging
   (a) Object merging
       For two AVPL objects $S$ and $T$, a new AVPL object $U$ is constructed with ObjectMerge($S$, $T$) if
       $U = \{ w \mid (w \in S \cup T) \wedge \exists \hat{a}\, (\hat{a} \in S \wedge \hat{a} \in T) \}$
       where the attribute-set of $U$ is the same as that of $S \cup T$, $\hat{a}$ is a common (shared) attribute-value pair of $S$ and $T$, and $w$ is an attribute-value pair.
   (b) Class merging
       For two classes of AVPL objects $\mathcal{S}$ and $\mathcal{T}$, a new class of AVPL objects $\mathcal{U}$ is constructed with ClassMerge($\mathcal{S}$, $\mathcal{T}$) if each $u \in \mathcal{U}$ is constructed with ObjectMerge($s$, $t$) and $\mathcal{U}$ is constructed with ObjectCollect($u_1, u_2, \ldots, u_m$), where $s \in \mathcal{S}$, $t \in \mathcal{T}$, and $u_i \in \mathcal{U}$ for $1 \le i \le m$.

3. Composition (IS-PART-OF Relationship)
   (a) Object composition
       For two AVPL objects $S$ and $T$, a new AVPL object $U$ is constructed with ObjectCompose($S$, $T$) if
       $U = (S - t) \cup \hat{T}$
       where $t \in S$ and $t \in T$. $\hat{T}$ is $T$ itself or a reference to $T$. The attribute-set of $U$ is the same as that of $S$.
   (b) Class composition
       For two classes of AVPL objects $\mathcal{S}$ and $\mathcal{T}$, a new class of AVPL objects $\mathcal{U}$ is constructed with ClassCompose($\mathcal{S}$, $\mathcal{T}$) if each $u \in \mathcal{U}$ is constructed with ObjectCompose($s$, $t$) and $\mathcal{U}$ is constructed with ObjectCollect($u_1, u_2, \ldots, u_m$), where $s \in \mathcal{S}$, $t \in \mathcal{T}$, and $u_i \in \mathcal{U}$ for $1 \le i \le m$.

4. Inclusion (IS-A Relationship)
   For two sets of AVPL objects $U$ and $V$, a new relationship, $U$ is a subset of $V$, can be constructed with ClassInclude($U$, $V$) if
   $U \subseteq V$
   where the attribute-sets $U_A$ and $V_A$ of $U$ and $V$, respectively, satisfy $V_A \subseteq U_A$.

5. Trivially, we can define additional operations, such as destruction, splitting, and exclusion, as the inverses of the operations defined above.

Figure 3 illustrates the operations ObjectMerge(), ObjectCompose(), and ObjectCollect(). Other operations such as ClassMerge() or ClassCompose() can be implemented by using ObjectMerge() or ObjectCompose(), respectively, together with ObjectCollect(). ObjectCompose() in Figure 3 denotes composition by reference value.
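The object-level operations above can be sketched over plain Python dictionaries. This is a minimal illustration under stated assumptions rather than the paper's implementation: the function names are hypothetical snake-case renderings of ObjectCollect, ObjectMerge, ObjectCompose, and ClassInclude; `None` plays the role of null; a conflicting value for a shared attribute is rejected in `object_merge` (the definition does not say how to resolve it); `object_compose` reads the shared element $t$ as a value of $S$ that also occurs among the values of $T$; and `class_include` checks only the attribute-set condition $V_A \subseteq U_A$. The sample objects follow the example of Figure 3, with `"<elided>"`, the attribute name `"Lab"`, and the identifier `"#o4"` as placeholders for text that is truncated or missing in the figure.

```python
def object_collect(*objects):
    """Build a class: every object gets the union of all attributes, padded with None."""
    attrs = []
    for obj in objects:
        for a in obj:
            if a not in attrs:
                attrs.append(a)
    return [{a: obj.get(a) for a in attrs} for obj in objects]


def object_merge(s, t):
    """Merge two descriptions that share at least one attribute-value pair."""
    common = set(s) & set(t)
    if not any(s[a] == t[a] for a in common):
        raise ValueError("no shared attribute-value pair")
    if any(s[a] != t[a] for a in common):
        # Assumption: conflicting values are rejected rather than resolved.
        raise ValueError("conflicting values for a shared attribute")
    return {**s, **t}


def object_compose(s, t, ref=None):
    """Replace the values of s that also occur in t by t itself or by a reference."""
    t_values = list(t.values())
    shared = [a for a, v in s.items() if v is not None and v in t_values]
    if not shared:
        raise ValueError("s and t share no value")
    part = ref if ref is not None else dict(t)
    return {a: (part if a in shared else v) for a, v in s.items()}


def class_include(u_class, v_class):
    """Attribute-set condition for IS-A: attributes of V are a subset of those of U."""
    u_attrs = set().union(*(o.keys() for o in u_class))
    v_attrs = set().union(*(o.keys() for o in v_class))
    return v_attrs <= u_attrs


o1 = {"Name": "Dong-Yal Seo", "Advisor": "J.Y. Lee", "Research": "Database"}
o2 = {"Name": "Dong-Yal Seo", "Telephone": "<elided>"}   # value elided in the figure
o4 = {"Name": "J.Y. Lee", "Position": "Associate Prof.", "Lab": "IIS"}
o6 = {"Name": "Chang-Yong Han", "Age": "28", "Address": "Pohang", "Home City": "Sungnam"}

o3 = object_merge(o1, o2)                 # a) object merging
o5 = object_compose(o3, o4, ref="#o4")    # b) composition by reference value
o7 = object_collect(o1, o6)               # c) collection to build a class
```

Under these assumptions, `o5["Advisor"]` holds the reference `"#o4"` instead of the string "J.Y. Lee", and each row of `o7` carries all six attributes, with `None` where the original object had no value. ClassMerge and ClassCompose would then apply `object_merge` or `object_compose` pairwise and pass the results to `object_collect`.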

[Fig. 3. Example of Schema Construction Operations.
a) Object Merging: o1 (Name: Dong-Yal Seo; Advisor: J.Y. Lee; Research: Database) and o2 (Name: Dong-Yal Seo; Telephone; a further value is truncated as "dyseo@white....") give o3 = Object_Merge(o1, o2) with attributes Name, Advisor, Research, Telephone.
b) Object Composition: o4 (Name: J.Y. Lee; Position: Associate Prof.; Lab....: IIS); o5 = Object_Compose(o3, o4) has attributes Name, Advisor, Research, Telephone.
c) Object Collection to Build a Class: o6 (Name: Chang-Yong Han; Age: 28; Address: Pohang; Home City: Sungnam); o7 = Object_Collect(o1, o6) has attributes Name, Advisor, Research, Age, Address, Home City, with one row for Dong-Yal Seo and one for Chang-Yong Han.]

4.3 Schema Consistency

When the user runs a schema construction procedure using the operations above, schema evolution takes place in the pre-existing schema. In the database world, it is very important to keep the schema consistent. We describe several effects of schema construction operations on the existing schema hierarchy and map the consequences of those effects onto the taxonomy of schema-modification operations listed in [2]. In fact, the schema construction operations might heavily affect the static structure of classes.

For example, a merge operation can be considered an attribute-adding operation in the schema evolution taxonomy if the class to be merged has relationships with other classes. Thus, if the invariant properties of the inheritance hierarchy cannot be preserved, the operation that breaks the schema consistency rules should be rejected^5.

^5 Refer to [6] for more about schema invariants.

We chose the schema evolution taxonomy of the ORION data model based on the comparisons in [6]. Table 1 shows the taxonomy of schema modifications in an object-oriented database and their corresponding schema construction operations. It means that we can map the consistency problems raised by our schema construction operations onto schema modification problems.

Because we did not consider the behavioral part of objects, the reader will not find any taxonomy for methods. Nor do we address the categories of default-value attributes or shared attributes defined in [2], since these features are meaningful only in the classical object-oriented model, where class definition always precedes object instantiation.

Table 1. Schema Construction Operations and Evolutions

Schema Construction   Corresponding Schema Evolution
Merge                 Add attributes
Split                 Delete attributes and build a new class
Compose               Modify the domain of attributes
Decompose             Modify composite attributes into noncomposite attributes
Include               Make a class S the superclass of class C
Exclude               Remove a class S from the list of superclasses of class C
Collect               Create a new class

5 Conclusion and Future Work

We introduced a new model of database processing in which objects are created before schema definition. We defined a type system for semistructured data instances, and operations for constructing a structural schema from a set of schemaless data instances.

In our data model, a schemaless data instance is created as a description which contains a list of user-defined attributes and their corresponding values. For schema construction, we defined operations for building IS-A and IS-PART-OF relationships, collecting objects to build a class, and merging two objects or classes to make a larger one. The operations can be applied at both the object level and the class level.

Our approach is suitable for applications that collect and manage semistructured data instances, which are not created as instances of a predefined schema. A database system for collected HTML documents is a good application of our work. HTML documents have significantly less structure than the examples in this paper, and it is more difficult to extract from them the attribute-value pairs needed to construct the schema.

References

1. Abiteboul, S., Cluet, S., Milo, T.: Correspondence and Translation for Heterogeneous Data. Proceedings of the '97 ICDT, Delphi, Greece (1997) 352-363
2. Banerjee, J., Kim, W., Kim, H., Korth, H.: Semantics and Implementation of Schema Evolution in Object-Oriented Databases. Proceedings of the '87 ACM SIGMOD, San Francisco, CA (1987) 311-322
3. Buneman, P., Davidson, S., Hillerbrand, G., Suciu, D.: A Query Language and Optimization Techniques for Unstructured Data. Proceedings of the '96 ACM SIGMOD, Montreal, Canada (1996) 505-516
4. Bowman, C., Danzig, P., Manber, U., Schwartz, M.: Scalable Internet Resource Discovery: Research Problems and Approaches. Communications of the ACM. 37(8) (1994) 98-107
5. Bowman, C., Danzig, P., Hardy, D., Manber, U., Schwartz, M.: The Harvest Information Discovery and Access System. Proceedings of the Second International World Wide Web Conference, Chicago, Illinois (1994) 763-771
6. Tsichritzis, D., ed.: Object Management. Centre Universitaire d'Informatique, University of Geneva (1990)
7. Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K.: The TSIMMIS Project: Integration of Heterogeneous Information Sources. Proceedings of IPSJ Conference, Tokyo, Japan (1994)
8. Bancilhon, F., Delobel, C., Kanellakis, P., eds.: Building an Object-Oriented Database System: The Story of O2. Morgan Kaufmann, San Mateo, CA (1992)
9. Nestorov, S., Abiteboul, S., Motwani, R.: Inferring Structure in Semistructured Data. Proceedings of the Workshop for the Management of Semistructured Data (in conjunction with '97 ACM PODS/SIGMOD), Tucson, Arizona (1997) 42-48
10. Ossher, H., Kaplan, M., Harrison, W., Katz, A., Kruskal, V.: Subject-Oriented Composition Rules. Proceedings of the OOPSLA '95, Austin, Texas (1995) 235-250
11. Papakonstantinou, Y., Garcia-Molina, H., Widom, J.: Object Exchange Across Heterogeneous Information Sources. Proceedings of the 11th IEEE International Conference on Data Engineering, Taipei, Taiwan (1995) 251-260
12. Quass, D., Rajaraman, A., Ullman, J., Widom, J.: Querying Semistructured Heterogeneous Information. Proceedings of 4th International Conference on Deductive and Object-Oriented Databases, Singapore (1995) 319-344
13. Seo, D., Lee, D., Lee, K., Lee, J.: Discovery of Schema Information from a Forest of Selectively Labeled Ordered Trees. Proceedings of the Workshop for the Management of Semistructured Data (in conjunction with '97 ACM PODS/SIGMOD), Tucson, Arizona (1997) 54-59

This article was processed using the LaTeX macro package with LLNCS style.


Undergraduate Degree Map for Completion in Four Years Page 1 of 5 Undergraduate Degree Map for Completion in Four Years College: College of Arts and Humanities Department: English Name of Program: TECHNICAL COMMUNICATION BS Option Degree Designation: BS Emphasis/Concentration:

More information

Condusiv s V-locity Server Boosts Performance of SQL Server 2012 by 55%

Condusiv s V-locity Server Boosts Performance of SQL Server 2012 by 55% openbench Labs Executive Briefing: April 19, 2013 Condusiv s Server Boosts Performance of SQL Server 2012 by 55% Optimizing I/O for Increased Throughput and Reduced Latency on Physical Servers 01 Executive

More information

Functional Dependencies and Normalization

Functional Dependencies and Normalization Functional Dependencies and Normalization 5DV119 Introduction to Database Management Umeå University Department of Computing Science Stephen J. Hegner [email protected] http://www.cs.umu.se/~hegner Functional

More information

Administering a SQL Database Infrastructure

Administering a SQL Database Infrastructure Administering a SQL Database Infrastructure 20764A 5 Days Instructor-led, Hands on Course Information This five-day instructor-led course provides students who administer and maintain SQL Server databases

More information

Administering a SQL Database Infrastructure 20764; 5 Days; Instructor-led

Administering a SQL Database Infrastructure 20764; 5 Days; Instructor-led Administering a SQL Database Infrastructure 20764; 5 Days; Instructor-led Course Description This five-day instructor-led course provides students who administer and maintain SQL Server databases with

More information

Technical Data Sheet: imc SEARCH 3.1. Topology

Technical Data Sheet: imc SEARCH 3.1. Topology : imc SEARCH 3.1 Database application for structured storage and administration of measurement data: Measurement data (measurement values, measurement series, combined data from multiple measurement channels)

More information

Chapter 10 Practical Database Design Methodology and Use of UML Diagrams

Chapter 10 Practical Database Design Methodology and Use of UML Diagrams Chapter 10 Practical Database Design Methodology and Use of UML Diagrams Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 10 Outline The Role of Information Systems in

More information

Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)

Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463) Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463) Course Description Data warehousing is a solution organizations use to centralize business data for reporting and analysis. This five-day

More information

Database Design and Programming

Database Design and Programming Database Design and Programming Peter Schneider-Kamp DM 505, Spring 2012, 3 rd Quarter 1 Course Organisation Literature Database Systems: The Complete Book Evaluation Project and 1-day take-home exam,

More information

Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server Page 1 of 7 Overview This course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL 2014, implement ETL

More information

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning Course Outline: Course: Implementing a Data with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning Duration: 5.00 Day(s)/ 40 hrs Overview: This 5-day instructor-led course describes

More information

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER Page 1 of 8 ABOUT THIS COURSE This 5 day course describes how to implement a data warehouse platform to support a BI solution. Students will learn how to create a data warehouse with Microsoft SQL Server

More information

MCC AAS: Information Technology Programming to WSU BA in Computer Science

MCC AAS: Information Technology Programming to WSU BA in Computer Science Articulation Agreement Between Wayne State University and Macomb Community college Linking MCC s AAS in Information Technology - Programming With Wayne State s Bachelor of Science or Bachelor of Arts in

More information

Chapter 10. Practical Database Design Methodology. The Role of Information Systems in Organizations. Practical Database Design Methodology

Chapter 10. Practical Database Design Methodology. The Role of Information Systems in Organizations. Practical Database Design Methodology Chapter 10 Practical Database Design Methodology Practical Database Design Methodology Design methodology Target database managed by some type of database management system Various design methodologies

More information

Seeking Data Quality. Using Agile Methods to Test a Data Warehouse

Seeking Data Quality. Using Agile Methods to Test a Data Warehouse Seeking Data Quality Using Agile Methods to Test a Data Warehouse Copyright Ideaca 2008 Agenda Seeking Data Quality Data Warehouse Overview The Value of a Data Warehouse Agile as Business Value Driver

More information

Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D.

Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D. Data Integration with Talend Open Studio Robert A. Nisbet, Ph.D. Most college courses in statistical analysis and data mining are focus on the mathematical techniques for analyzing data structures, rather

More information

FINANCIAL REPORTING WITH BUSINESS ANALYTICS

FINANCIAL REPORTING WITH BUSINESS ANALYTICS www.ifsworld.com FINANCIAL REPORTING WITH BUSINESS ANALYTICS LEIF JOHANSSON BUSINESS SOLUTIONS CONSULTANT BILL NOBLE IMPLEMENTATION MANAGER 2009 IFS AGENDA FINANCIAL REPORTING WITH BA Architecture Business

More information

Announcements. SQL is hot! Facebook. Goal. Database Design Process. IT420: Database Management and Organization. Normalization (Chapter 3)

Announcements. SQL is hot! Facebook. Goal. Database Design Process. IT420: Database Management and Organization. Normalization (Chapter 3) Announcements IT0: Database Management and Organization Normalization (Chapter 3) Department coin design contest deadline - February -week exam Monday, February 1 Lab SQL SQL Server: ALTER TABLE tname

More information

CATALOG ADDENDUM: 2013 CATALOG WITH EFFECTIVE DATE OF JANUARY 1, 2013- DECEMBER 31, 2013

CATALOG ADDENDUM: 2013 CATALOG WITH EFFECTIVE DATE OF JANUARY 1, 2013- DECEMBER 31, 2013 CATALOG ADDENDUM: 2013 CATALOG WITH EFFECTIVE DATE OF JANUARY 1, 2013- DECEMBER 31, 2013 The 2013 General Catalog contains The Los Angeles Film School official degree and program requirements, as well

More information

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777 Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777 Course Outline Module 1: Introduction to Data Warehousing This module provides an introduction to the key components of a data warehousing

More information

Microsoft SQL Server 2014: MS-20462 SQL Server Administering Databases

Microsoft SQL Server 2014: MS-20462 SQL Server Administering Databases coursemonster.com/uk Microsoft SQL Server 2014: MS-20462 SQL Server Administering Databases View training dates» Overview This five-day instructor-led course provides students with the knowledge and skills

More information

Education and Research of Science and Engineering in Korea

Education and Research of Science and Engineering in Korea Education and Research of Science and Engineering in Korea Prof. Dr. Sooyoung Chang Pohang University of Science and Technology (Postech) 1. Introduction History of science and engineering education in

More information

DATABASE NORMALIZATION

DATABASE NORMALIZATION DATABASE NORMALIZATION Normalization: process of efficiently organizing data in the DB. RELATIONS (attributes grouped together) Accurate representation of data, relationships and constraints. Goal: - Eliminate

More information

Implementing a Microsoft SQL Server 2008 Database

Implementing a Microsoft SQL Server 2008 Database Implementing a Microsoft SQL Server 2008 Database MOC6232 About this Course Elements of this syllabus are subject to change. This five-day instructor-led course provides students with the knowledge and

More information

Server-side Development using Python and SQL

Server-side Development using Python and SQL Lab 2 Server-side Development using Python and SQL Authors: Sahand Sadjadee Alexander Kazen Gustav Bylund Per Jonsson Tobias Jansson Spring 2015 TDDD97 Web Programming http://www.ida.liu.se/~tddd97/ Department

More information

SQM. Maintaining Microsoft SQL for Broadcast Engineers. Training Course Outline

SQM. Maintaining Microsoft SQL for Broadcast Engineers. Training Course Outline SQM Maintaining Microsoft SQL for Broadcast Engineers Training Course Outline 2015 Marcangelo Limited Copyright 2015 www.marcangelo.co.uk SQM : Duration: 2 Days Maintaining Microsoft SQL for Broadcast

More information

Concepts of Database Management Seventh Edition. Chapter 6 Database Design 2: Design Method

Concepts of Database Management Seventh Edition. Chapter 6 Database Design 2: Design Method Concepts of Database Management Seventh Edition Chapter 6 Database Design 2: Design Method Objectives Discuss the general process and goals of database design Define user views and explain their function

More information

Conceptual Schema Approach to Natural Language Database Access

Conceptual Schema Approach to Natural Language Database Access Conceptual Schema Approach to Natural Language Database Access In-Su Kang, Seung-Hoon Na, Jong-Hyeok Lee Div. of Electrical and Computer Engineering Pohang University of Science and Technology (POSTECH)

More information

Information Systems Analysis and Design CSC340. 2004 John Mylopoulos Database Design -- 2. Information Systems Analysis and Design CSC340

Information Systems Analysis and Design CSC340. 2004 John Mylopoulos Database Design -- 2. Information Systems Analysis and Design CSC340 XX. Database Design Databases Databases and DBMS Data Models, Hierarchical, Network, Relational Database Design Restructuring an ER schema Performance analysis Analysis of Redundancies, Removing generalizations

More information

DATABASE SYSTEMS. Chapter 7 Normalisation

DATABASE SYSTEMS. Chapter 7 Normalisation DATABASE SYSTEMS DESIGN IMPLEMENTATION AND MANAGEMENT INTERNATIONAL EDITION ROB CORONEL CROCKETT Chapter 7 Normalisation 1 (Rob, Coronel & Crockett 978184480731) In this chapter, you will learn: What normalization

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

Implementing a Data Warehouse with Microsoft SQL Server

Implementing a Data Warehouse with Microsoft SQL Server Course Code: M20463 Vendor: Microsoft Course Overview Duration: 5 RRP: 2,025 Implementing a Data Warehouse with Microsoft SQL Server Overview This course describes how to implement a data warehouse platform

More information

Master of Urban and Regional Planning And Juris Doctorate Law Degree (4.10.06)

Master of Urban and Regional Planning And Juris Doctorate Law Degree (4.10.06) DUAL DEGREE Master of Urban and Regional Planning And Juris Doctorate Law Degree (4.10.06) The Degrees: The School of Law (Law School) and the College of Architecture and Planning (Architecture and Planning)

More information

Administrating Microsoft SQL Server 2012 Databases

Administrating Microsoft SQL Server 2012 Databases MS10775 Längd: 5 dagar Administrating Microsoft SQL Server 2012 Databases OBS!! Från hösten 2014 ersätts denna kurs av motsvarande nya kurs MS20462 Administering Microsoft SQL Server Databases This five-day

More information

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange

Data Integration and Exchange. L. Libkin 1 Data Integration and Exchange Data Integration and Exchange L. Libkin 1 Data Integration and Exchange Traditional approach to databases A single large repository of data. Database administrator in charge of access to data. Users interact

More information

Improving database development. Recommendations for solving development problems using Red Gate tools

Improving database development. Recommendations for solving development problems using Red Gate tools Improving database development Recommendations for solving development problems using Red Gate tools Introduction At Red Gate, we believe in creating simple, usable tools that address the problems of software

More information

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days Course

More information