Distributed Multi-Level Recovery in Main-Memory Databases

Rajeev Rastogi, Philip Bohannon, James Parker, S. Seshadri†, Avi Silberschatz, S. Sudarshan†

Bell Laboratories, Murray Hill, NJ
† Indian Institute of Technology, Bombay, India

Abstract

In this paper we present recovery techniques for distributed main-memory databases, specifically for client-server and shared-disk architectures. We present a recovery scheme for client-server architectures, based on shipping log records to the server, and two recovery schemes for shared-disk architectures: one based on page shipping, and the other based on broadcasting of the log of updates. The schemes offer different tradeoffs, based on factors such as update rates.

Our techniques are extensions to a distributed-memory setting of a centralized recovery scheme for main-memory databases, which has been implemented in the Dalí main-memory database system. Our centralized as well as distributed-memory recovery schemes have several attractive features: they support an explicit multi-level recovery abstraction for high concurrency, reduce disk I/O by writing only redo log records to disk during normal processing, and use per-transaction redo and undo logs to reduce contention on the system log. Further, the techniques use a fuzzy checkpointing scheme that writes only dirty pages to disk, yet minimally interferes with normal processing; all but one of our recovery schemes do not require updaters to even acquire a latch before updating a page. Our log shipping/broadcasting schemes also support concurrent updates to the same page at different sites.

† The work of these authors was performed in part while they were at Bell Labs.
1 Introduction

A large number of applications (e.g., call routing and switching in telecommunications, financial applications, automation control) require high performance access to data with response time requirements of the order of a few milliseconds to tens of milliseconds. Traditional disk-based database systems are incapable of meeting the high performance needs of such applications due to the latency of accessing data that is disk-resident. An attractive approach to providing applications with low (and predictable) response times is to load the entire database into main-memory. Databases for such applications are often of the order of tens or hundreds of megabytes, which can easily be supported in main-memory. Further, machines with main memories of 8 gigabytes or more are already available, and with the falling price of RAM, machines with such large main memories will become cheaper and more common.

One approach for implementing such high performance databases is to provide a large buffer-cache to a traditional disk-based system. In contrast, in a main memory database system (MMDB) (see, e.g., [GMS92, LSC92, JLR+94, DKO+84]), the entire database can be directly mapped into the virtual address space of the process and locked in memory. Data can be accessed either directly by virtual memory pointers, or indirectly via location independent database offsets that can be quickly translated to memory addresses. During data access, there is no need to interact with a buffer manager, either for locating data, or for fetching/pinning buffer pages. Also, objects larger than the system's page size can be stored contiguously, thereby simplifying retrieval or in-place use. Thus, data access using a main-memory database is very fast compared to using disk-based storage managers, even when the disk-based manager has sufficient memory to cache all data pages.

Distributed architectures in which several machines are connected by a fast network, and perform database accesses and updates in parallel, provide significant further performance improvements for a number of applications. For example, consider applications in which transactions are predominantly read-only and update rates are low (e.g., number translation and call routing in telecommunications). Each machine can locally access data cached in memory, thus avoiding network communication which could be fairly expensive. Another example is Computer Aided Design applications, where locality of reference is very high, update transactions are long, and interactive response time is very important.

Distribution also enhances fault tolerance, which is required in many mission-critical applications, even if data fits easily in a single machine's main-memory. In this case, especially with low update rates, a distributed database is preferable to a hot-spare since the load can be distributed in the non-failure case leading to improved performance.

The recovery scheme used in the Dalí main-memory database system [JLR+94] is based on the main-memory recovery scheme presented in [JSS93]. The recovery scheme of [JSS93] provides important features such as transient undo logging in which undo log records are kept in memory and only written to disk if required for
checkpointing, per-transaction logs in memory to reduce contention on the system log tail, and recovery using only a single pass over the system log. The recovery scheme used in Dalí provides several further extensions, such as multi-level recovery ([WHBM90, MHL+92, Lom92]), and fuzzy checkpointing [SGM90a, Hag86].

The goal of the work described here was to extend the Dalí recovery scheme to the distributed memory case, simultaneously maintaining the advantages of the single-site scheme, and efficiently supporting the applications described above. For example, we can make use of transient undo logging to reduce the size of the log written to disk, as well as the size of the log sent across network links in distributed protocols.

We present three distinct but related distributed recovery schemes: the first for client-server architectures, and the second and third for shared disk architectures. These are all "data-shipping" schemes (see, e.g., [FZT+92]) in which a transaction executes at a single site, fetching data (pages) as required from other sites. Distributed commit protocols are not needed as in "function-shipping" environments. While shared disk architectures have traditionally been closely tied to hardware platforms (e.g., VAXCluster), UNIX-based shared disk platforms and network of workstation architectures with similar performance characteristics are becoming more common.

Our distributed recovery algorithms provide the advanced features of our centralized recovery algorithms, such as transient undo logging, explicit multi-level recovery, and fuzzy checkpointing. Site or global recovery requires only a single pass over the system log, starting from the end of the system log recorded in the most recent checkpoint.

A key property of the client-server scheme and one of the shared disk schemes is that concurrent updates are possible at granularities smaller than a page-size, thereby minimizing "false-sharing" (that is, apparent conflicts due to coarse-granularity locking) and, consequently, needless network accesses to resolve false sharing.

The remainder of the paper is organized as follows. We present background on multi-level recovery and the single-site algorithm on which the present work is based in Section 2. Related work is presented in Section 3. We present our client-server recovery algorithm in Section 4. Section 5 describes our shared disk model, while Sections 6 and 7 present our shared disk recovery algorithms. Section 8 concludes the paper.

2 Overview of Main-Memory Recovery

In this section we present a review of multi-level recovery concepts and an overview of the single-site main-memory recovery scheme used in the Dalí system. Low-level details of our scheme are described in [BPR+96].

In our scheme, data is logically organized into regions. A region can be a tuple, an object, or an arbitrary data structure like a list or a tree. Each region has a single associated lock, referred to as the region lock, with exclusive (X) and shared (S) modes that guard updates and accesses to the region, respectively.
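As a rough illustration of a region lock with shared and exclusive modes, the sketch below models one lock per region and a conventional S/X compatibility rule. The paper only says that X guards updates and S guards accesses. The compatibility table, the `RegionLock` class, and the `try_acquire`/`release` names are assumptions made for this example and are not taken from the Dalí implementation.

```python
from dataclasses import dataclass, field

# Assumed (conventional) compatibility: only shared requests coexist.
# Key: (requested mode, mode already held by another transaction).
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


@dataclass
class RegionLock:
    """Hypothetical single lock guarding one region (tuple, object, list, tree...)."""
    name: str
    holders: dict = field(default_factory=dict)  # transaction id -> held mode

    def try_acquire(self, txn, mode):
        """Grant the lock only if no other transaction holds a conflicting mode."""
        for other, held in self.holders.items():
            if other != txn and not COMPATIBLE[(mode, held)]:
                return False
        # Never downgrade an X lock the transaction already holds.
        if self.holders.get(txn) != "X":
            self.holders[txn] = mode
        return True

    def release(self, txn):
        """Drop whatever mode the transaction holds on this region."""
        self.holders.pop(txn, None)
```

In this sketch two readers can hold the region in S mode at once, while an updater asking for X mode is refused until both have released it.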
[Figure 1: Overview of Recovery Structures. In main memory: the database, the dirty page table, the active transaction table (ATT) with per-transaction local redo and undo logs, the system log tail, and the end of stable log pointer. On disk: the stable system log, the checkpoint images Ckpt A and Ckpt B (each with a database image, ckpt_dpt, and the ATT with undo logs), and cur_ckpt.]

2.1 Multi-Level Recovery

Multi-level recovery [WHBM90, MHL+92, Lom92] provides recovery support for enhanced concurrency based on the semantics of operations. Specifically, it permits the use of weaker operation locks in place of stronger shared/exclusive region locks.

A common example is index management, where holding physical region locks until transaction commit leads to unacceptably low levels of concurrency. If undo logging has been done physically (e.g., recording exactly which bytes were modified to insert a key into the index) then the transaction management system must ensure that these physical undo descriptions are valid until transaction commit. Since the descriptions refer to byte changes at specific positions, this typically implies that the region locks on the updated index nodes must be held till transaction commit to ensure correct recovery, in addition to considerations for concurrent access to the index.

The multi-level recovery approach is to replace these low-level physical undo log records with higher level logical undo log records containing undo descriptions at the operation level. Thus, for an insert operation, physical undo records would be replaced by a logical undo record indicating that the inserted key must be deleted. Once this replacement is made, the region locks may be released and only (less restrictive) operation locks are retained. For example, region locks on the particular nodes involved in an insert can be released, while an operation lock on the newly inserted key that prevents the key from being accessed or deleted is held.

2.2 System Overview

Figure 1 gives an overview of the structures used for recovery. The database (a sequence of fixed size pages) is mapped into the address space of each process and is in main memory, with (two) checkpoint images Ckpt A and Ckpt B on disk. Also stored on disk are 1) cur_ckpt, an "anchor" pointing to the most recent valid checkpoint image for the database, and 2) a single system log containing redo information, with its tail in memory. The variable end_of_stable_log stores a pointer into the system log such that all records prior to the pointer are known to have been flushed to the stable system log.

There is a single active transaction table (ATT) in main-memory which stores separate redo and undo logs for active transactions, in addition to information about transaction status. A dirty page table, dpt, is maintained in memory to record pages that have been updated since the last checkpoint. For simplicity of presentation, we assume that the dirty page table is maintained as a bitmap with one bit per page. The ATT (with undo logs, but without redo logs) and the dirty page table are also stored with each checkpoint image. The dirty page table in a checkpoint image is referred to as ckpt_dpt.

2.3 Transactions and Operations

Transactions, in our model, consist of a sequence of multi-level operations, following [Lom92]. We briefly describe the model below. Each operation has a level L_i associated with it. An operation at level L_i can consist of a sequence of operations at level L_{i-1}. Transactions, assumed to be at level L_n, call operations at level L_{n-1}. Physical updates to regions are level L_0 operations. For transactions, we distinguish between pre-commit, when the commit record enters the system log in memory, establishing a point in the serialization order, and commit, when the commit record hits the stable log. For operations, we use the terms commit and pre-commit interchangeably since both refer to the time when the commit record enters the system log in memory.

Each transaction obtains an operation lock before it executes an operation; the operation lock is granted if the operation commutes with other operation locks held by other active transactions. Level L_0 operations obtain region locks instead of operation locks. The locks on the region are released once the L_1 operation pre-commits; similarly, an operation lock at level L_i is held until the transaction or the containing operation (at level L_{i+1}) commits. All the locks acquired by a transaction are released once it commits (see footnote 1).

Footnote 1: It is possible to release locks for a transaction on pre-commit; as a result read-only transactions may read uncommitted data, and their commit must be delayed until the dirty data they have read has been committed.

2.4 Logging Model

The recovery algorithm maintains separate local undo and redo logs in memory for each transaction. These are stored as a linked list off an entry for the transaction in the ATT. Each physical update (to a part of a region) generates physical undo and redo log records that are appended to the respective local log. When a transaction/operation pre-commits, the current contents of the transaction's local redo log are appended to the system log tail in memory, and the logical undo description
for the operation is included in an operation commit log record appended to the system log. Thus, with the exception of logical undo descriptors, only redo records are written to the system log during normal processing.

Also, when an operation pre-commits, the undo log records for its suboperations/updates are replaced in the transaction's (local) undo log with a logical undo log record containing the undo description for the operation. In-memory undo logs of transactions that have committed are deleted since they are not required again (see footnote 2).

The system log is flushed to disk when a transaction commits. For each redo log record written to disk, pages touched by the update on the log record are marked dirty in the dirty page table, dpt, by the flushing procedure. In our single-site recovery scheme, update actions do not obtain latches on pages; instead region locks are obtained to ensure that updates do not interfere with each other (see footnote 3). Eliminating latching significantly decreases access costs in main-memory, and reduces programming complexity. Recovery related actions that are normally taken on page latching, such as setting of dirty bits for the page, are now performed based on log records written to the redo log. (Our distributed-memory schemes, with the exception of one of the shared-disk schemes, do not obtain page latches either; the sole exception uses page latching to ensure cache coherency, which is not a problem in the single-site case.) The redo log is used as a single unifying resource to coordinate the application's interaction with the recovery system, and this approach has proven very useful.

Footnote 2: The logs can be deleted on pre-commit, since, short of a system crash, nothing can result in the transaction aborting.

Footnote 3: In cases when region sizes change, certain additional region locks on storage allocation structures may need to be obtained. For example, in a page based system, if an update causes the size of a tuple to change, then in addition to a region lock on the tuple, an X mode region lock on the storage allocation structures on the page must be obtained.

2.5 Ping-pong Checkpointing

Consistent with the terminology in main-memory databases, we use the term checkpoint to mean a copy of the main-memory database which is stored on disk, and the term checkpointing to refer to the action of creating a checkpoint. This terminology differs slightly from the terminology used, for example, in ARIES [MHL+92].

Traditional recovery schemes implement write-ahead logging (WAL), whereby all undo logs for updates on a page are flushed to disk before the page is flushed to disk. In such systems, to guarantee the WAL property, typically a latch on a page is obtained, all log records pertaining to the page are flushed to stable storage, the page is copied to disk, and the latch released. Updaters also obtain the same page latch, thereby preventing concurrent updates while a page is being flushed to disk. As a result of not obtaining latches on pages during updates, it is not possible in our scheme to enforce the write-ahead logging policy, since pages may be updated even as they are being written out.

Instead, our recovery algorithm makes use of a strategy called ping-pong checkpointing (see, e.g., [SGM90b]). In ping-pong checkpointing two copies of the database image are stored on disk, and alternate checkpoints write dirty pages to alternate copies. Writing alternate checkpoints to alternate copies permits a checkpoint that is being created to be temporarily inconsistent; i.e., updates may have been written out without corresponding undo records having been written. However, after writing out dirty pages, sufficient redo and undo log information is written out to bring the checkpoint to a consistent state. Even if a failure occurs while creating one checkpoint, the other checkpoint is still consistent and can be used for recovery.

Keeping two copies of a main-memory database on disk for ping-pong checkpointing does not have a very high space penalty, since disk space is much cheaper than main-memory. Further, ping-pong checkpointing has several other benefits. For instance, although many recovery schemes assume page writes are atomic, in reality they are not, and complex schemes are needed to detect and recover from incomplete page writes resulting from, for example, power failures. Incomplete page writes cause no problems with ping-pong checkpointing, since the previous checkpoint image is still available. Ping-pong checkpointing also permits some physical and logical consistency checks to be performed on the checkpoint before declaring it successfully completed.

Before writing any dirty data to disk, the checkpoint notes the current end of the stable log in the variable end_of_stable_log, which will be stored with the checkpoint. This is the start point for scanning the system log when recovering from a crash using this checkpoint. Next, the contents of the (in-memory) ckpt_dpt are set to those of the dpt and the dpt is zeroed (noting of end_of_stable_log and zeroing of dpt are done atomically with respect to flushing). The pages written out are the pages that were either dirty in the ckpt_dpt of the last completed checkpoint, or dirty in the current (in-memory) ckpt_dpt, or in both. In other words, all pages are written out that were modified since the current checkpoint image was previously written, namely, pages that were dirtied since the last-but-one checkpoint. This is necessary to ensure that updates described by log records preceding the current checkpoint's end_of_stable_log have made it into the database image in the current checkpoint.

Checkpoints write out dirty pages without obtaining any latches and thereby avoid interfering with normal operations. The checkpoint image is thus fuzzy. Fuzzy checkpointing however could result in two problems for recovery:

- the checkpoint page image may contain partial updates of an operation
- the undo log record for an update may not be in the stable system log (which could result in a problem if the system were to crash immediately after the checkpoint).

The first problem is solved by our policy of always writing physical redo log records. By applying physical redo log records (whose effects are idempotent) to a checkpoint page image we can ensure that we obtain a page image that does not contain any partial updates.

The second problem is solved by ensuring that for any update whose effects have made it to the checkpoint image, one of the following holds: 1) corresponding physical undo log records are written out to disk after the database image has been
written or 2) all physical redo log records for the operation (corresponding to the partial update) as well as the logical undo descriptor in the operation commit log record are on stable storage. This is performed by checkpointing the ATT and flushing the log after checkpointing the data. The checkpoint of the ATT writes out undo log records, as well as some other status information. In case the operation containing the partial update completes and consequently the undo log records are removed from the ATT before the checkpoint of the ATT, the log flush ensures that all log records corresponding to the operation (containing the partial update) as well as the operation commit log record are on stable storage. The checkpoint is declared completed (and consistent) by toggling cur_ckpt to point to the new checkpoint.

2.6 Abort Processing

When a transaction aborts, that is, does not successfully complete execution, updates/operations described by log records in the transaction's undo log are undone by traversing the undo log backwards from the end. Transaction abort is carried out by executing, in reverse order, every undo record just as if the execution were part of the transaction.

Following the philosophy of repeating history [MHL+92], new physical redo log records are created for each physical undo record encountered during the abort. Similarly, for each logical undo record encountered, a new "compensation" or "proxy" operation is executed based on the undo description. Log records for updates performed by the operation are generated as during normal processing. Furthermore, when the proxy operation commits, all its undo log records are deleted along with the logical undo record for the operation that was undone. The commit record for the proxy operation serves a purpose similar to that served by compensation log records (CLRs) in ARIES: during restart recovery, when it is encountered, the logical undo log record for the operation that was undone is deleted from the transaction's undo log, thus preventing it from being undone again.

2.7 Recovery

Restart recovery begins by initializing the ATT and transaction undo logs to the ATT and undo logs stored in the most recent checkpoint, loads the database image and sets dpt to zero. Next, recovery processes redo log records. Recall that as part of the checkpoint operation, the end of the system log on disk, end_of_stable_log, is noted before the database image is checkpointed. This value of end_of_stable_log becomes the "begin recovery point" for the checkpoint once the checkpoint has completed. All updates described by log records preceding this point are guaranteed to be reflected in the checkpointed database image. Thus, during restart recovery only redo log records following the end_of_stable_log for the last completed checkpoint of the database are applied. Restart recovery ignores redo log records for updates performed by an operation if the commit log record for the operation is not found in the system log. Such log records represent
uncommitted updates, and may not have corresponding undo records in the checkpointed ATT. However, if the undo records are absent, the effects of the log records will not be reflected in the checkpointed database image. Such records would be present only due to a crash while the log records for an operation were being flushed.

During the application of redo log records, appropriate pages in dpt are set to dirty for each log record and necessary actions are taken to keep the checkpointed image of the ATT consistent with the log as it is applied. These actions on the ATT mirror the actions taken during normal processing. For example, when an operation commit log record is encountered, lower level log records in the transaction's undo log for the operation are replaced by a higher level undo description.

Once all the redo log records have been applied, the active transactions are rolled back. To do this, all completed operations that have been invoked directly by the transaction, or have been directly invoked by an incomplete operation, have to be rolled back. However, the order in which operations of different transactions are rolled back is very important, so that an undo at level L_i sees data structures that are consistent [Lom92]. First, all operations (across all transactions) at L_0 that must be rolled back are rolled back, followed by all operations at level L_1, then L_2 and so on.

3 Connection to Related Work

Multi-level recovery and variants thereof, primarily for disk-based systems, have been proposed in the literature [WHBM90, Lom92, MHL+92]. Like these schemes, our schemes repeat history, generate log records during undo processing and log operation commits when undo operations complete (similar to CLRs described in [MHL+92]). Also, as in [Lom92], transaction rollback at crash recovery is performed level by level. Some of the features of our main-memory recovery technique which impact the distributed schemes are

1. Due to transient undo logging, no physical undo logs are written out to the global log except during checkpoints.

2. Separate undo logs are maintained in memory for active transactions. A result is that transaction rollback does not need to access the global log, part of which could be on disk.

3. Our single-site scheme does not require latching of pages during updates; such latching is inconvenient and expensive in either a main-memory DB or an OODB setting. Actions that are normally taken on page latching, such as setting of dirty bits for the page, are efficiently performed based on physical redo log records written to the global log. (One of our shared-disk schemes uses page latching for ensuring cache consistency, while the other shared-disk scheme does not.)
4. The correctness requirements of the write-ahead logging policy are accomplished with a single flush for the entire database during a checkpoint, rather than (potentially) one flush per page.

5. Our scheme does not perform in-place update of the disk image during page flush, instead using ping-pong checkpointing.

In the ARIES-SD [MN91] family of schemes for recovery in the shared disk environment, each site maintains a separate log, and pages are shipped between sites. Our shared-disk log-shipping scheme does not ship pages, but instead broadcasts log records, taking advantage of cheap application of these log records in main-memory, and permitting concurrent updates at a smaller-than-page granularity. In our shared disk schemes, log flushes are driven by the release of a lock from a site, in order to support repeating of history and correct rollback of multi-level actions during crash recovery. The "super-fast" method of ARIES-SD [MN91] does not describe flushes to protect the early release of locks, making it unclear how that scheme supports logical undo and high-concurrency index operations.

In [Rah91], the authors propose recovery schemes for the shared disk environment which assume page-level concurrency control and the NO-STEAL page write policy, neither of which are assumptions made in our schemes.

In [MN94], the authors show how the ARIES recovery algorithm described in [MHL+92] can be extended to a client-server environment. In contrast to our client-server scheme, their scheme involves the clients as well as the server in the checkpointing process. We also support concurrent updates to a page by different clients, which is not supported in [MN94].

In [CFZ94], object-level as well as adaptive locking and replica management are discussed, but recovery considerations are not extensively addressed. In [FZT+92], the client-server recovery scheme for the EXODUS storage manager (ESM-CS) is described. This recovery scheme, based on ARIES [MHL+92], requires page-level locking until end of transaction (for example, the commit dirty page list).

4 Client-Server Recovery Scheme

In this section, we describe the client-server recovery scheme. Our system model is as follows.

- There is a single server with stable storage, which is responsible for coordinating all the logging, and for performing checkpoints and recovery (see Figure 2). The server maintains a copy of the entire database in memory.
- Multiple clients may be connected to the server; each client has a copy of the entire database in its memory.
- A transaction executes at a single client and updates/accesses the copy of the database at the client.
- The network is FIFO and reliable.

[Figure 2: Client-Server Architecture. Client nodes, each holding in main memory a database copy, an ATT, and a system log tail, are connected over a network to the server. The server holds in main memory the database, the ATT, the DPT, and the system log tail; on disk it holds the stable system log, the checkpoints Ckpt A and Ckpt B, and cur_ckpt.]

Transactions follow the callback locking scheme [LLOW91, CFZ94] when obtaining and releasing locks. Each client site has a local lock manager (LLM) which caches locks, and a global lock manager (GLM) at the server keeps track of locks cached at the various clients. Transaction requests for locks cached locally are handled at the client itself. However, requests for locks not cached locally are forwarded to the GLM, which calls back the lock from other clients that may have cached the lock in a conflicting mode (before granting the lock request). A client relinquishes a lock in response to a callback as soon as transactions currently holding the lock (if any) release the lock.

As a result of updating the local copy of the database, database pages updated by a client may not be current at some other client. Therefore, a page at a client is in one of two states: valid or invalid. Invalid pages contain stale versions of certain data due to updates by other clients and are refreshed by obtaining the latest copy of the page from the server.

The server maintains the dpt and the ATT (for all transactions in the client-server system) while the clients maintain the ATT for the transactions belonging to that client. The log records for updates generated by a transaction at a client site are stored in that site's ATT. Client sites do not maintain a system log on disk, but keep a system log tail in memory and append log records from the local redo logs to this tail when operations commit/abort. Checkpointing is performed solely at the server, and follows the same procedure as the centralized case.

When a lock is relinquished from a site or a transaction commits, log records in the system log are shipped by the client to the server. In the case of transaction commit, the client waits for the server to flush the newly received log records to disk before
reporting the commit to the user. The shipped redo log records are used to update the server's copy of the affected pages, ensuring that pages shipped to clients from the server are current (note that pages are shipped only from the server to clients and never vice versa). This enables our scheme to support concurrent updates to a single page at multiple clients since re-applying the updates at the server causes them to be merged (this approach is also adopted in [CDF+94]). Shipping the log records will usually be cheaper than shipping pages, and the cost of applying the log records themselves is small since, in our main-memory database context, the server will not have to read the affected pages from disk.

We will now describe our scheme in detail and also outline several possible optimizations to the basic ideas discussed above.

4.1 Basic Operations

We now describe the features which distinguish the client-server scheme from the centralized case, in terms of actions performed at the client and the server at specific points in processing.

- Page Access: In case a client accesses a page that is valid, it simply goes ahead without communicating with the server. Else, if the page is invalid (certain data on the page may be stale), then the client refreshes the page by 1) obtaining the most recent version of the page from the server, and 2) applying to the newly received page any local updates which have not been sent to the server (this step merges local updates with updates from other sites). The client then marks the page as valid. The server keeps track of clients that have the page in a valid state.

  To prevent race conditions, the client does not send log records to the server after asking for a page and before receiving it.

  An optimization of the above is to check for validity of pages at the time of acquisition of region locks from the server rather than on every access; for this optimization to be used, the set of pages covered by the region lock must be known.

- Operation/Transaction Commit: At the client, redo log records are moved to the system log, a commit record is appended, and appropriate actions are performed on the transaction's undo log in the ATT as described for the centralized case. In case of a transaction commit the log records in the system log are shipped to the server, and commit processing waits until the server has acknowledged that the log records have been flushed to disk.

  Finally, all the locks acquired by the operation/transaction are released locally. The local lock manager at the site may however continue to cache the locks locally.

- Lock Release: When a lock is relinquished by a client, all redo log records that were generated under this lock need to be shipped to the server. The
server then applies these log records to its database image to ensure that another client that obtains the same lock gets a copy of the pages which contains the updates described by these log records. A simple way to ensure that all log records generated under the lock are shipped to the server is to flush the system log from the client to the server.

  An optimization to avoid flushing the system log each time is to store the end of the client system log with the lock (at the client) when an X mode region lock or an operation lock is released by a transaction. Thus, for any region lock, all redo log records in the system log affecting that region precede the point in the log stored with the lock. Similarly, for an operation lock, all log records relating to the operation (including operation commit) precede the point in the system log stored with the lock. This location in the log is client-site-specific.

  Before a client site relinquishes an X mode region lock or operation lock to the server due to call-back, it ships to the server at least the portion of the system log which precedes the log pointer stored with the lock. This ensures that the next lock will not be acquired on the region until the server's copy is up to date, and the history of the update is in place in the server's logs.

  For X mode region locks, this flush ensures repeating of history on regions, while for operation locks this flush ensures that the server receives the logical undo descriptors in the operation commit log records for the operation which released the locks. Thus, if the server aborts a transaction after a site failure, the abort of this operation will take place at the logical level of the locks still held for it at the server.

- Log Record Processing: At the server, for each physical redo log record (received from a client), the undo log record is generated by reading the current contents of the page at the server. The new log record is then appended to the undo log for this transaction in the server's ATT. Next the update described by the redo log record is applied, following which the log record is appended to the redo log for the transaction in the server's ATT. Operation/transaction commit log records received from the client are processed by performing the same actions as in the centralized case when the log records were generated. In addition, for operation commit, the logical undo descriptor is extracted from the commit log record and appended to the undo log for the transaction in the server's ATT. For transaction commit, the client whose transaction committed is notified after the log flush to disk succeeds.

  By applying all the physical updates described in the physical log records to its pages, the server ensures that it always contains the latest updates on regions for locks which have been released to it from the clients. The effect of the logging scheme, as far as data updates are concerned, is just as if the client transaction actually ran at the server site.

- Transaction Abort/Site Failures: If a client site decides to abort a transaction, it processes the abort (as in the centralized case) using the undo logs
for the transaction in the client's ATT. If the client site itself fails, the server will abort transactions that were active at the client using undo logs for the transaction in its ATT (since the client cannot commit without communicating with the server, in case of partition, a decision to abort is enforceable by the server). If the server fails, then the complete system is brought down, and restart recovery is performed at the server as described in Section 2.7.

Page Invalidation

We complete our client-server scheme by presenting two methods, invalidate-on-update and invalidate-on-lock, for ensuring that data accessed by a client is up-to-date.

All actions described so far are used in common by both methods. In particular, both methods follow the rule that all log records pertaining to updates made under a lock are flushed to the server before the lock is relinquished from the site. Since the server would have applied the log records to its copy of the data, this ensures that when the server grants a lock, it has the current version of all pages containing data covered by that lock. However, when a client acquires a lock, it is still possible that the copies of one or more pages involved in the region for which the lock was obtained are not up-to-date at the client.

Both methods mark pages at the clients as invalid, to denote that some of the data on the page is out of date. Even if a page is marked invalid, some of the data in the page may still be up-to-date, for instance, if the client has a region lock on the data. The first method, invalidate-on-update, is an eager method that marks pages as invalid at clients as soon as an update occurs at the server, while the second, invalidate-on-lock, is a lazier method, marking pages as invalid at clients when the client gets a lock. The second scheme reduces invalidation messages by keeping extra per-lock information at the server. Details of the two methods are presented in Sections 4.2 and 4.3 respectively.

4.2 Invalidate-On-Update

The invalidate-on-update scheme works as follows. When the server receives log records from a client, it does the following. For each page that it updates, it sends invalidate messages to clients (other than the client that updated the page) that may have the page marked as valid. For all clients other than the client that updated the page, the server notes that the client does not have the page marked valid. Clients, on receiving the invalidate message, mark their page as invalid. Thus invalidation messages are received by clients before they can acquire a region lock on the updated data and begin accessing the data.

Although the method is very simple and easy to implement, it has some drawbacks. For example, consider two sites s1 and s2 updating the same page concurrently under two different region locks. Let s1 be the site that flushes its updates to the server
15 rst;theupdatewillcausetheservertosendaninvalidatemessagetos2,whichwill underthelockthatitalreadyhas,thentheinvalidatewasnotnecessary,sincethe 4.3Invalidate-On-Lock thenre-readthepagefromtheserver.however,ifsites2accessesthepageagain Theinvalidate-on-lockschemedecreasesunnecessaryinvalidationsandtheoverhead ofsendinginvalidationmessagesbymarkingpagesasinvalidonlywhenalockon thenextsectiontakesadvantageofthisobservationtoreduceoverheads. aregioncoveringthepageisobtainedbyaclient.asaresult,iftwoclientsare updatingdierentregionsonthesamepage,asintheearlierexample,noinvalidationmessagesaresenttoeitherclient.bypiggy-backinginvalidationmessages separateinvalidationmessagesinthepreviousschemeiseliminated. dataintheregionithaslockedhasnotchanged.theinvalidate-on-lockschemein forupdatedpagesonlockgrantmessagesfromtheserver,theoverheadofsending obtainingthisinformationistorequirethatanupdatecallmustspecifynotonly associatedwiththelockfortheupdatedregion.thus,theschemerequiresthatit needtocheckforvalidityofapageoneveryaccessorupdatetothepage itsuces bepossibletodeterminetheregionlockfromtheredorecord.asimplewayof formationaboutupdatestothatregion.specically,whenupdatesdescribedby tocheckforvalidityatlockacquisitiontime. aphysicalredorecordareappliedtopagesattheserver,theupdatedpagesare Toachievetheabove,theschememustassociatewiththelockforaregionin- Thebiggestbenetoftheinvalidate-on-lockscheme,however,isthatthereisno aprogrammertoprovidethisinformation,sinceallupdatesmustbemadeholding aregionlock.thelocknamecanthenbesentwiththeredologrecord. thedatatobeupdated,butalsotheregionlockthatprotectsthedata.itiseasyfor witheachlogrecord,whichreectsboththeorderinwhichtherecordwasapplied totheserver'scopyofthepageandtheorderinwhichitwasaddedtothesystem theclient(valid/invalid),alongwiththelsnforthepagewhenitwaslastshipped eachclient,theservermaintainsinaclientpagetable(cpt),thestateofthepageat log.foreachpage,theserverstoresthelsnofthemostrecentlogrecordthat totheclient. 
updatedthepage,andtheidentityoftheclientwhichissuedit.inaddition,for ThisschemealsorequiresthattheserverassociateaLogSequenceNumber(LSN), toupdatestotheregion.foreachpageinthelist,theserverstoresthelsnofthe mostrecentlogrecordreceivedbytheserverthatrecordedanupdatetothepart oftheregiononthispage,andtheclientwhichperformedtheupdate.thus,when aclientisgrantedaregionlock,if,forapageinthelocklist,thelsnisgreater thanthelsnforthepagewhenitwaslastshippedtotheclient,thentheclient pagecontainsstaledatafortheregionandmustbeinvalidated. Theserveralsomaintainsforeachregionlockalistofpagesthataredirtydue 14
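The server-side bookkeeping for invalidate-on-lock can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`Server`, `apply_log`, `refresh_page`, `grant_lock`) and the use of dictionaries are assumptions made for illustration. The lock-grant check combines the LSN comparison with the two further conditions the scheme lists for lock grant (the page is valid at the client, and the client was not the last to update the page under this lock).

```python
from dataclasses import dataclass, field


@dataclass
class CptEntry:
    """Server's view of one page at one client (client page table entry)."""
    valid: bool = False
    shipped_lsn: int = 0  # LSN of the page when it was last shipped to the client


@dataclass
class LockListEntry:
    """Per-lock record for one page dirtied under that region lock."""
    lsn: int      # most recent log record that updated the region on this page
    client: str   # client that performed that update


@dataclass
class Server:
    # All containers below are hypothetical stand-ins for the server's tables.
    next_lsn: int = 1
    page_lsn: dict = field(default_factory=dict)    # page -> LSN of last update
    cpt: dict = field(default_factory=dict)         # (client, page) -> CptEntry
    lock_lists: dict = field(default_factory=dict)  # lock -> {page: LockListEntry}

    def apply_log(self, client, lock, page):
        """Log apply: assign an LSN and record the update in the lock's list."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.page_lsn[page] = lsn
        self.lock_lists.setdefault(lock, {})[page] = LockListEntry(lsn, client)
        return lsn

    def refresh_page(self, client, page):
        """Page refresh: mark the page valid and note its current LSN."""
        self.cpt[(client, page)] = CptEntry(True, self.page_lsn.get(page, 0))

    def grant_lock(self, client, lock):
        """Lock grant: return the pages the client must invalidate."""
        stale = []
        for page, entry in self.lock_lists.get(lock, {}).items():
            cached = self.cpt.get((client, page))
            if (cached is not None and cached.valid
                    and cached.shipped_lsn < entry.lsn
                    and entry.client != client):
                cached.valid = False  # mirror the invalidation in the CPT
                stale.append(page)
        return sorted(stale)
```

In this sketch, two clients that update different regions of the same page under different locks receive no invalidation when they re-acquire their own lock; a page is reported stale only when a client acquires a lock under which another client made a newer update.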
The LSN information serves to minimize the shipping of pages to clients, marking a page as invalid only if there is an update performed under the region lock requested by the client, and the update has not yet been propagated to the client.

The additional actions for this scheme are as follows:

Log apply: When the server applies to a page P a redo log record, LR, generated at client C under region lock L, it takes the following actions (after P has been updated). First, the LSN for P is set to the LSN for LR. Second, the entry for P in the list of dirty pages for L is updated (or created), setting the client to C, and the LSN to the LSN for LR.

Lock grant: A set of invalidate messages is passed back to the client with the lock acquisition. The invalidate messages are for pages in the list associated with the lock being acquired that meet three criteria: 1) the page is cached at the client in the valid state, 2) the LSN of the page in the CPT for the client is smaller than the LSN of the page in the lock list, and 3) the client acquiring the lock was not the last to update the page under this lock. The invalidated pages are marked invalid in the CPT for the client and at the client site.

Page refresh: When the server sends a page to a client (page refresh), at the server, the page is marked valid in the CPT for the client and the LSN for the page in the CPT is updated to be the LSN for the page at the server.

Lock list cleanup: We are interested in keeping the list of pages with every lock as small as possible. This can be achieved by periodically deleting pages P from the list of lock L such that the following condition holds, where C is the client noted in the list of pages for L as the last client to update P:

Every client other than C has the page cached either in an invalid state or with LSN greater than or equal to the LSN for the page in the list for lock L.

The rationale for this rule is that the purpose of region lock lists is to determine pages that must be invalidated. However, if for a page in a client's CPT, the LSN is greater than the LSN for the page in the lock list, then the client has the most recent update to the region on the page, and thus the page will not need to be part of any invalidation list sent to the client.

5 Shared Disk Recovery: Model and Common Structures

In the shared disk approach, a number of machines are interconnected and also have direct access to disks over a fast network. The shared disk environment is used in many systems, such as the DEC VAXclusters, and provides benefits over a shared nothing architecture, such as faster access to non-local disks and fault-tolerance.
Also, the basic advantage of shared disk schemes over the client-server schemes is that the algorithms are symmetric with respect to which site executes them, preventing one system from becoming a bottleneck in the system. As in our client-server scheme, in addition to careful consideration of the interaction with multi-level recovery, our main concern is minimizing false sharing through fine-grained concurrency control. This allows, for example, read-only transactions with a fully cached working set to proceed at main-memory speeds, an important property for our intended applications.

We now describe our shared disk recovery model.

Each site maintains its own copy of the entire database in memory and its own system log on disk. Thus there are multiple logs in the system.

Sites obtain locks from a Global Lock Manager (GLM); the function of the lock manager could be distributed for speed and reliability, but this is orthogonal to our discussion.

Sites cache locks, and relinquish locks based on the callback locking mechanism described in Section 4. We assume the network is FIFO and reliable.

Each site has its own system log on disk and therefore the logs are distributed. To repeat history during restart recovery, we need some mechanism to temporally order log records that affect the same region. To enable this, each site maintains a global timestamp counter TSctr, and a timestamp obtained from this counter is stored in each physical redo log record for an update. We will see the details of how this TSctr is maintained and used later.

Each site maintains its own version of the dirty page table dpt, system log (in memory and on disk), and an ATT (with separate undo and redo log records for each transaction) which stores information relating to transactions that execute at that site.

A single pair of checkpointed images is maintained on disk for the database. A checkpoint image consists of an image of the database, the dirty page table ckpt_dpt, and for every site:

1. end of stable log: the point in the site's system log from which the system log must be scanned during recovery.
2. a copy of the ATT at the site (containing undo logs).

In the next two sections, we present two schemes for shared disk concurrency control and recovery. The first is a page-shipping approach which is similar in spirit to the invalidate-on-update client-server scheme. The second is a log shipping scheme which allows concurrent use of non-overlapping regions on a page across sites.
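The rules by which TSctr is maintained are given in Section 6.2 for the page-shipping scheme: a redo record carries the site's current TSctr, an X lock returned to the GLM is stamped with the site's TSctr, and a site receiving an X lock sets its TSctr to the maximum of its current value and the lock's timestamp plus one. The Python sketch below illustrates only these rules; the `Site` class, its method names, and the dictionary used for a redo record are hypothetical.

```python
class Site:
    """Per-site timestamp counter (TSctr) under the page-shipping rules."""

    def __init__(self, name):
        self.name = name
        self.tsctr = 0
        self.ptt = {}  # page -> TSctr value when the page was last updated here

    def update_page(self, page):
        """Stamp a redo record with the current TSctr and note it in the ptt."""
        record = {"site": self.name, "page": page, "ts": self.tsctr}
        self.ptt[page] = self.tsctr
        return record

    def release_x_lock(self):
        """Return the timestamp shipped with an X lock given back to the GLM."""
        return self.tsctr

    def receive_x_lock(self, lock_ts):
        """On receiving an X lock: TSctr = max(current, lock timestamp + 1)."""
        self.tsctr = max(self.tsctr, lock_ts + 1)
```

With these rules, an update made at a second site after the lock has moved there carries a strictly larger timestamp than the update made at the first site under the same lock.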
6 Page-Shipping Shared Disk Recovery Scheme

Our page-shipping scheme is similar in spirit to the Invalidate-on-Update client-server scheme in that a transaction at a site updating a region on a page is guaranteed to have the latest copy of the page. Therefore, concurrent updates to different regions of a page are not possible in this scheme.

6.1 Data Structures

We now describe data structures specific to the page-shipping scheme. Common data structures were described in Section 5. An overview of the data structures for this scheme is given in Figure 3.

[Figure 3: Page-Shipping Shared Disk Architecture. Sites 1 to N each hold in memory a database copy (DB), DPT, ATT, PTT, and system log tail; the shared disk holds each site's stable system log and the checkpoints Ckpt A and Ckpt B (selected by cur_ckpt), each containing the database, ckpt_ptt, ckpt_dpt, and, per site, the end of stable log and the ATT (undo logs).]

In addition to the TSctr for the site, a timestamp for each page is maintained at each site in the page timestamp table, ptt, which keeps track of the TSctr value when the page was last updated. Each page has an associated page lock which helps in ensuring that a transaction always has the latest copy of the page while accessing or updating the page. Sites cache locks, and relinquish locks based on the callback locking mechanism described earlier. Along with each of the two checkpoint images of the database is stored a checkpoint page timestamp table, referred to as ckpt_ptt.

6.2 Normal Processing

We describe below the actions taken during normal processing, in addition to those performed in the centralized case, to support distributed concurrency control and recovery. Checkpointing and recovery from system and site failure are described in subsequent subsections.
Update: Like in the centralized case, before accessing a region, each transaction obtains a region lock from the LLM. Additional page locks are acquired in S (X) mode while accessing (updating) data on a page. If this lock is not cached at the site, actions are performed as described below under lock acquisition. Page locks for an access are released by a transaction once the access is completed; page locks for an update are released by a transaction only after the update on the page is completed. The value of TSctr at the site when the redo log record was generated is stored in the redo log record corresponding to the update. Also, the timestamp for the updated page (in the ptt) at the site is set to the TSctr stored in the log record.

An important point to note is that log records in the system log may not be ordered on their TSctr values. This is because the value of TSctr is stored in the redo log record when the update is performed, but the log record is appended to the transaction local log, which is not flushed to the system redo log until operation or transaction commit.

Lock Release: When a transaction releases an X mode region lock or operation lock, it stores the end of log in memory with the lock (this is stored to optimize the amount of flushing that needs to be done when a lock is relinquished as in the client-server scheme). Note that all updates for the operation which held the region lock will be moved to the global log by the normal operation commit semantics prior to the release of this lock. Thus, for a region lock, all redo log records for updates to the region covered by the lock precede the end of log point stored with the lock (similar for operations). When a site relinquishes an X region lock or operation lock, it flushes the global log at its site until the end of log point stored with the lock. The flush on release of X region or operation locks is done to ensure that it is possible to repeat history during restart recovery, and appropriate locks for undoing operations are held in case of site crashes. Note that no flushes are performed when page locks are released.

Additionally, when a site releases an X page or X region lock back to the GLM, it stamps it with the site's TSctr; the TSctr value of the lock is used by other sites that later acquire the lock, as we will see shortly. The GLM also stores with each page lock the site that last held the page lock in X mode; the information is updated each time a site relinquishes an X mode page lock.

Lock Acquisition: A transaction acquiring a lock cached by the LLM need take no special action. If it is a page lock, then the page is already current at this site.

When an X-mode page or region lock arrives from the GLM, it includes the timestamp from the last site that held the lock in X mode, as described above. Upon receiving an X region lock or page lock at a site, the site's TSctr is set to the maximum of 1) its current value, and 2) the TSctr value associated with the incoming lock plus one.

When a site acquires a page lock on behalf of a transaction from the GLM (that is, the lock is not already cached at the site), the site requests the page from the last site that held the page lock in X mode (using the site identifier sent with the lock). In order to handle single-site recovery, failure of the acquiring site to obtain a copy of the page, due to a failure of the site from which it is being requested, causes the lock acquisition to fail and the lock to be returned to the GLM unchanged.

Shipping timestamps with page locks ensures that log records for successive updates to a page at different sites are assigned increasing timestamp values. Shipping timestamps with region locks ensures that log records generated under conflicting locks are applied in the correct order during recovery even though redo log records in the individual site may not be ordered by timestamp (as mentioned earlier). However, the algorithm still works correctly, as shown in the discussion of recovery and correctness below.

6.3 Checkpointing

Unlike the centralized and client-server scheme, checkpointing in the shared disk environment requires coordination among the various sites. As mentioned above, a single pair of checkpointed images is maintained for all the sites.

The site initiating the checkpoint coordinates the operation, which consists of the following three steps at each site: 1) writing the database file image, 2) writing the ATT, and 3) flushing the global log. Below, we describe each step:

1. The coordinator announces the beginning of the checkpoint, at which time all sites (including the coordinator) note their current end of stable log values, then make a copy of their dpts and zero their dpts. Note that zeroing the dpt and recording end of stable log is done atomically with respect to flushes. Each site then makes a copy of its current ptt and sends it to the coordinator along with the end of stable log (noted above), and a copy of the dpt. The coordinator constructs ckpt_dpt by OR'ing together the copy of its dpt and all the dpts received from other sites (recall that we are assuming the dpt is a bitmap). The database pages to be written out during the checkpoint are the pages that are dirty in ckpt_dpt or in the ckpt_dpt in the previous checkpoint.

For each page to be written out, the coordinator uses the ptts sent to it by the other sites and its own ptt to determine the site whose ptt contains the highest timestamp for the page. This site is responsible for writing the page to the checkpoint image. Once the coordinator has partitioned the set of pages to be written out among the various sites, each site is sent the set of page identifiers assigned to it. A site, upon receiving its assigned set of pages to write, proceeds to write those pages to the checkpoint image. Since no two sites will be assigned the same page, sites can write pages concurrently.
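The coordinator's partitioning of pages among sites can be sketched as follows. This is a simplified illustration, not the authors' code: the function name `plan_checkpoint`, the use of Python sets in place of dpt bitmaps, and the tie-breaking by site name are all assumptions.

```python
def plan_checkpoint(dpts, ptts, prev_ckpt_dpt):
    """Return (ckpt_dpt, assignment) for one checkpoint.

    dpts: site -> set of dirty page ids (a set stands in for the bitmap)
    ptts: site -> {page id: timestamp}
    prev_ckpt_dpt: set of pages dirty in the previous checkpoint
    """
    # OR together the dirty page tables of all sites.
    ckpt_dpt = set().union(*dpts.values())
    # Pages dirty now or in the previous checkpoint must be written out.
    to_write = ckpt_dpt | prev_ckpt_dpt
    assignment = {site: [] for site in ptts}
    for page in sorted(to_write):
        # The site whose ptt holds the highest timestamp writes the page.
        # Ties go to the smallest site name, which is our assumption.
        writer = max(sorted(ptts), key=lambda s: ptts[s].get(page, -1))
        assignment[writer].append(page)
    return ckpt_dpt, assignment
```

Because each page is assigned to exactly one site, the per-site lists are disjoint and could be written to the checkpoint image concurrently, as the text notes.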
The coordinator then constructs ckpt_ptt by first reading the ckpt_ptt in the previous checkpoint into memory. For every page that was determined to be written out (by some site i), the timestamp for the page in ckpt_ptt is set to its timestamp in the copy of the ptt for site i. Finally, the ckpt_dpt constructed earlier, ckpt_ptt and the end of stable logs for all the sites are written to the checkpoint.

Note that since the site with the highest timestamp for a page writes the page to the checkpoint image, updates to the page by log records preceding the end of stable log recorded for a site are contained in the checkpoint. Furthermore, as will be discussed in the correctness section below, updates for a page recorded in log records with timestamps less than the timestamp for the page in ckpt_ptt are also contained in the checkpoint.

2. Once every site has written out the database image and reported this to the coordinator, the coordinator instructs each site to write out its ATT. Note that multiple sites can be concurrently writing out the ATT.

3. After writing out the ATT, each site flushes the global log at that site as in the centralized case. Finally, the database checkpoint is committed after all sites have completed their flushing.

6.4 Recovery

In case the entire system fails, restart recovery is performed by any one site, say j. The site j, which we will call the acting coordinator site, reads the following from the most recent checkpoint image: the database image, the ckpt_ptt, and for each site, the ATT and the end of stable log. A separate page table ptt is initialized to ckpt_ptt and for each site i a separate dpt, dpt_i, is initialized to contain zero bits for all pages.

Starting from the end of stable log point stored for a site in the checkpoint, the log records in all the system logs are merged as described below, and applied to the database. To merge the system logs, they are scanned in parallel; at each point, if the next log record in any of the system logs is not a redo log record, then any one such record is processed and the ATT for its site is modified as described for the centralized case in Section 2.7. On the other hand, if the next records in all the system logs are redo log records, then the log record output next is the one amongst them with the lowest timestamp value. If, for a page updated by the log record, the timestamp in the log record is greater than or equal to the timestamp for the page in ckpt_ptt, then 1) the update is applied to the page, 2) the page is marked dirty in the dpt for the site whose system log contains the record, and 3) the timestamp for the page in ptt is set to the maximum of its current value and the timestamp in the log record.

Note that redo records in the system log for a site may not be in timestamp order as mentioned earlier. However, this does not cause a problem and conflicting log records are applied in the order in which they were generated. The reason for this is that for two conflicting log records in separate system logs, the earlier log record and log records preceding it in its system log have lower timestamps than the log record generated later. This fact is revisited below in our overview of correctness.

Once the last log record has been processed, TSctr at the acting coordinator site j is set to the largest timestamp contained in the ptt at site j. Site j then rolls back in-progress operations in the ATTs for the various sites beginning with level L0 and then considering successive levels L1, L2 and so on (as described in Section 2.7). When an operation in an ATT entry for a site i is being processed, actions are performed on the undo and redo logs for the entry. Furthermore, each redo log record generated when processing an operation for site i is assigned a timestamp equal to TSctr at site j, and when an operation pre-commits/aborts, log records from the redo log are appended to the system log for site i.

Next, site j flushes every site's system logs, causing appropriate pages in the dpt for the site (maintained at site j) to be marked dirty. After this point, the other sites are involved in recovery. The TSctr at every site is set to the TSctr at site j after incrementing it by one. The dpt at each site is then set to the dpt maintained for the site during recovery at site j, and the database image and ptt at each site are set equal to the database image and ptt at site j. Finally, ckpt_ptt and the dpts for other sites are deleted from site j, bringing recovery to completion.

6.5 Overview of Correctness

In this section, we present additional arguments about the correctness of our page-shipping recovery scheme by discussing below several properties on which the correctness is based.

1. A page, i, in a checkpoint image reflects all updates with timestamp less than ckpt_ptt[i].
2. Any log record affecting page i prior to end of stable log at any site has timestamp less than or equal to ckpt_ptt[i] and is reflected in the checkpoint image of page i.
3. If L1 and L2 are conflicting log records and L1 is generated before L2, then if L2 is flushed to the stable log, then so is L1.
4. If L1 and L2 are conflicting log records in different system logs and L1 is generated before L2, then L1 and all log records preceding it in its system log have lower timestamps than L2.

(1) follows from the fact that timestamps for pages in the ptt are set only after they are updated, and passing timestamps with page locks guarantees that successive updates to a page have non-decreasing timestamps (and in turn, assign non-decreasing timestamps to the ptt entry).

(2) For a log record that updates page i prior to end of stable log at a site, (a) ptt[i] at the site is greater than or equal to the timestamp in the log record, (b) the page is in the dpt of the site and (c) the page at the site contains the update
Oracle 12c Multitenant and Encryption in Real Life Christian Pfundtner Christian Pfundtner, DB Masters GmbH Over 20 years of Oracle Database OCA, OCP, OCE, OCM, ACE Our Credo: Databases are our world 4
More informationChapter 16: Recovery System
Chapter 16: Recovery System Failure Classification Failure Classification Transaction failure : Logical errors: transaction cannot complete due to some internal error condition System errors: the database
More informationOpenSAF A Standardized HA Solution
OpenSAF A Standardized HA Solution LinuxCON Edinburgh, UK 2013-10-21 Anders Widell Ericsson AB Outline What are OpenSAF and SA Forum? What is Service Availability? Simple Use Case: Web server The OpenSAF
More informationDatabases and Information Systems 1 Part 3: Storage Structures and Indices
bases and Information Systems 1 Part 3: Storage Structures and Indices Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: - database buffer -
More informationOracle Database 11g: Administration Workshop I
Oracle Database 11g: Administration Workshop I Volume 1 - Student Guide D50102GC10 Edition 1.0 September 2007 D52683 L$ Authors Priya Vennapusa James Spiller Maria Billings Technical Contributors and Reviewers
More informationRestore and Recovery Tasks. Copyright 2009, Oracle. All rights reserved.
Restore and Recovery Tasks Objectives After completing this lesson, you should be able to: Describe the causes of file loss and determine the appropriate action Describe major recovery operations Back
More informationOracle. Brief Course Content This course can be done in modular form as per the detail below. ORA-1 Oracle Database 10g: SQL 4 Weeks 4000/-
Oracle Objective: Oracle has many advantages and features that makes it popular and thereby makes it as the world's largest enterprise software company. Oracle is used for almost all large application
More informationTransactional Information Systems:
Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery Gerhard Weikum and Gottfried Vossen 2002 Morgan Kaufmann ISBN 1-55860-508-8 Teamwork is essential.
More informationConfiguring Security for FTP Traffic
2 Configuring Security for FTP Traffic Securing FTP traffic Creating a security profile for FTP traffic Configuring a local traffic FTP profile Assigning an FTP security profile to a local traffic FTP
More informationKEYWORDS InteractX, database, SQL Server, SQL Server Express, backup, maintenance.
Document Number: File Name: Date: 10/16/2008 Product: InteractX, SQL Server, SQL Server Application Note Associated Project: Related Documents: BackupScript.sql KEYWORDS InteractX, database, SQL Server,
More informationThis article Includes:
Log shipping has been a mechanism for maintaining a warm standby server for years. Though SQL Server supported log shipping with SQL Server 2000 as a part of DB Maintenance Plan, it has become a built-in
More informationWhy and How You Should Be Using Policy-Managed RAC Databases
Why and How You Should Be Using Policy-Managed RAC Databases Mark V. Scardina Director of Product Management Oracle Quality of Service Management 1 Copyright 2012, Oracle and/or its affiliates. All rights
More informationHardware/Software Guidelines
There are many things to consider when preparing for a TRAVERSE v11 installation. The number of users, application modules and transactional volume are only a few. Reliable performance of the system is
More informationRestore Scenarios What to keep in mind. Pedro A. Lopes PFE
Restore Scenarios What to keep in mind Pedro A. Lopes PFE Backup types Full Backup Differential Backup (Database or FG) Transaction Log Backup (Tail of the Log) Partial Backup (Piecemeal - Filegroup) Mirrored
More informationDelivery Method: Instructor-led, group-paced, classroom-delivery learning model with structured, hands-on activities.
Course Code: Title: Format: Duration: SSD024 Oracle 11g DBA I Instructor led 5 days Course Description Through hands-on experience administering an Oracle 11g database, you will gain an understanding of
More informationORACLE CORE DBA ONLINE TRAINING
ORACLE CORE DBA ONLINE TRAINING ORACLE CORE DBA THIS ORACLE DBA TRAINING COURSE IS DESIGNED TO PROVIDE ORACLE PROFESSIONALS WITH AN IN-DEPTH UNDERSTANDING OF THE DBA FEATURES OF ORACLE, SPECIFIC ORACLE
More informationRemus: : High Availability via Asynchronous Virtual Machine Replication
Remus: : High Availability via Asynchronous Virtual Machine Replication Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley,, Norm Hutchinson, and Andrew Warfield Department of Computer Science
More informationLesson 12: Recovery System DBMS Architectures
Lesson 12: Recovery System DBMS Architectures Contents Recovery after transactions failure Data access and physical disk operations Log-Based Recovery Checkpoints Recovery With Concurrent Transactions
More informationDerby: Replication and Availability
Derby: Replication and Availability Egil Sørensen Master of Science in Computer Science Submission date: June 2007 Supervisor: Svein Erik Bratsberg, IDI Norwegian University of Science and Technology Department
More informationDatabase System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap.
Database System Architecture & System Catalog Instructor: Mourad Benchikh Text Books: Elmasri & Navathe Chap. 17 Silberschatz & Korth Chap. 1 Oracle9i Documentation First-Semester 1427-1428 Definitions
More informationVirtual Machine Synchronization for High Availability Clusters
Virtual Machine Synchronization for High Availability Clusters Yoshiaki Tamura, Koji Sato, Seiji Kihara, Satoshi Moriai NTT Cyber Space Labs. 2007/4/17 Consolidating servers using VM Internet services
More informationAvailability Digest. MySQL Clusters Go Active/Active. December 2006
the Availability Digest MySQL Clusters Go Active/Active December 2006 Introduction MySQL (www.mysql.com) is without a doubt the most popular open source database in use today. Developed by MySQL AB of
More informationCloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise
Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service
More informationSQL Server Training Course Content
SQL Server Training Course Content SQL Server Training Objectives Installing Microsoft SQL Server Upgrading to SQL Server Management Studio Monitoring the Database Server Database and Index Maintenance
More informationCOURCE TITLE DURATION. Oracle Database 11g: Administration Workshop I
COURCE TITLE DURATION DBA 11g Oracle Database 11g: Administration Workshop I 40 H. What you will learn: This course is designed to give students a firm foundation in basic administration of Oracle Database
More informationOptimizing SQL Server 2012 for SharePoint 2013. SharePoint Saturday/Friday, Honolulu March 27, 2015
Optimizing SQL Server 2012 for SharePoint 2013 SharePoint Saturday/Friday, Honolulu March 27, 2015 With Mahalo to our sponsors: Mahalo! About the Speaker Brian Alderman (MCT / Author / Speaker / Consultant)
More informationAvoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas
3. Replication Replication Goal: Avoid a single point of failure by replicating the server Increase scalability by sharing the load among replicas Problems: Partial failures of replicas and messages No
More informationOracle server: An Oracle server includes an Oracle Instance and an Oracle database.
Objectives These notes introduce the Oracle server architecture. The architecture includes physical components, memory components, processes, and logical structures. Primary Architecture Components The
More informationUnderstand Troubleshooting Methodology
Understand Troubleshooting Methodology Lesson Overview In this lesson, you will learn about: Troubleshooting procedures Event Viewer Logging Resource Monitor Anticipatory Set If the workstation service
More informationOracle Database 11g: Administration Workshop I Release 2
Oracle University Contact Us: 1.800.529.0165 Oracle Database 11g: Administration Workshop I Release 2 Duration: 5 Days What you will learn This Oracle Database 11g: Administration Workshop I Release 2
More informationImplementing and Managing Windows Server 2008 Hyper-V
Course 6422A: Implementing and Managing Windows Server 2008 Hyper-V Length: 3 Days Language(s): English Audience(s): IT Professionals Level: 300 Technology: Windows Server 2008 Type: Course Delivery Method:
More informationHow To Recover From Failure In A Relational Database System
Chapter 17: Recovery System Database System Concepts See www.db-book.com for conditions on re-use Chapter 17: Recovery System Failure Classification Storage Structure Recovery and Atomicity Log-Based Recovery
More informationOpenClovis Product Presentation
OpenClovis Product Presentation 2014 Corporate Background! Founded in 2002! Open Source business model! Profitable since 2008! $40M invested on products! Product Release 6.0 is mature and shipping! SAF
More informationOracle Database 11g: Administration Workshop I Release 2
Oracle University Contact Us: (+202) 35 35 02 54 Oracle Database 11g: Administration Workshop I Release 2 Duration: 5 Days What you will learn This course is designed to give you a firm foundation in basic
More informationA SURVEY OF POPULAR CLUSTERING TECHNOLOGIES
A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES By: Edward Whalen Performance Tuning Corporation INTRODUCTION There are a number of clustering products available on the market today, and clustering has become
More informationNEXTGEN v5.8 HARDWARE VERIFICATION GUIDE CLIENT HOSTED OR THIRD PARTY SERVERS
This portion of the survey is for clients who are NOT on TSI Healthcare s ASP and are hosting NG software on their own server. This information must be collected by an IT staff member at your practice.
More informationOracle Database Links Part 2 - Distributed Transactions Written and presented by Joel Goodman October 15th 2009
Oracle Database Links Part 2 - Distributed Transactions Written and presented by Joel Goodman October 15th 2009 About Me Email: Joel.Goodman@oracle.com Blog: dbatrain.wordpress.com Application Development
More informationTuning Microsoft SQL Server for SharePoint. Daniel Glenn
Tuning Microsoft SQL Server for SharePoint Daniel Glenn Daniel Glenn @DanielGlenn http://knowsp.com SharePoint and Collaboration Practice Leader @ InfoWorks, Inc. www.infoworks-tn.com PASS Nashville Business
More informationHigh Availability Databases based on Oracle 10g RAC on Linux
High Availability Databases based on Oracle 10g RAC on Linux WLCG Tier2 Tutorials, CERN, June 2006 Luca Canali, CERN IT Outline Goals Architecture of an HA DB Service Deployment at the CERN Physics Database
More informationWould-be system and database administrators. PREREQUISITES: At least 6 months experience with a Windows operating system.
DBA Fundamentals COURSE CODE: COURSE TITLE: AUDIENCE: SQSDBA SQL Server 2008/2008 R2 DBA Fundamentals Would-be system and database administrators. PREREQUISITES: At least 6 months experience with a Windows
More informationOracle 11g DBA Training Course Content
Oracle 11g DBA Training Course Content ORACLE 10g/11g DATABASE ADMINISTRATION CHAPTER1 Important Linux commands Installing of Redhat Linux as per oracle database requirement Installing of oracle database
More informationModule 07. Log Shipping
Module 07 Log Shipping Agenda Log Shipping Overview SQL Server Log Shipping Log Shipping Failover 2 Agenda Log Shipping Overview SQL Server Log Shipping Log Shipping Failover 3 Log Shipping Overview Definition
More informationUse RMAN to relocate a 10TB RAC database with minimum downtime. Tao Zuo tao_zuo@npd.com NPD Inc. 9/2011
Use RMAN to relocate a 10TB RAC database with minimum downtime Tao Zuo tao_zuo@npd.com NPD Inc. 9/2011 Contents Methods of relocate a database with minimum down time RMAN oracle suggested backup strategy
More informationBackup, Restore and Options for SQL Server
Backup, Restore and Options for SQL Server Housekeeping Please be sure to answer survey (above video window) Ask questions at any time Viewing Tip Enlarge Slides Now You can enlarge the window with the
More informationImproving Transaction-Time DBMS Performance and Functionality
Improving Transaction-Time DBMS Performance and Functionality David B. Lomet #, Feifei Li * # Microsoft Research Redmond, WA 98052, USA lomet@microsoft.com * Department of Computer Science Florida State
More information