
Distributed Multi-Level Recovery in Main-Memory Databases

Rajeev Rastogi, Philip Bohannon, James Parker, S. Seshadri†, Avi Silberschatz, S. Sudarshan†

Bell Laboratories, Murray Hill, NJ
† Indian Institute of Technology, Bombay, India

Abstract

In this paper we present recovery techniques for distributed main-memory databases, specifically for client-server and shared-disk architectures. We present a recovery scheme for client-server architectures, based on shipping log records to the server, and two recovery schemes for shared-disk architectures, one based on page shipping, and the other based on broadcasting of the log of updates. The schemes offer different tradeoffs, based on factors such as update rates.

Our techniques are extensions to a distributed-memory setting of a centralized recovery scheme for main-memory databases, which has been implemented in the Dalí main-memory database system. Our centralized as well as distributed-memory recovery schemes have several attractive features: they support an explicit multi-level recovery abstraction for high concurrency, reduce disk I/O by writing only redo log records to disk during normal processing, and use per-transaction redo and undo logs to reduce contention on the system log. Further, the techniques use a fuzzy checkpointing scheme that writes only dirty pages to disk, yet minimally interferes with normal processing: all but one of our recovery schemes do not require updaters to even acquire a latch before updating a page. Our log shipping/broadcasting schemes also support concurrent updates to the same page at different sites.

† The work of these authors was performed in part while they were at Bell Labs.

1 Introduction

A large number of applications (e.g., call routing and switching in telecommunications, financial applications, automation control) require high performance access to data with response time requirements of the order of a few milliseconds to tens of milliseconds. Traditional disk-based database systems are incapable of meeting the high performance needs of such applications due to the latency of accessing data that is disk-resident. An attractive approach to providing applications with low (and predictable) response times is to load the entire database into main-memory. Databases for such applications are often of the order of tens or hundreds of megabytes, which can easily be supported in main-memory. Further, machines with main memories of 8 gigabytes or more are already available, and with the falling price of RAM, machines with such large main memories will become cheaper and more common.

One approach for implementing such high performance databases is to provide a large buffer-cache to a traditional disk-based system. In contrast, in a main-memory database system (MMDB) (see, e.g., [GMS92, LSC92, JLR+94, DKO+84]), the entire database can be directly mapped into the virtual address space of the process and locked in memory. Data can be accessed either directly by virtual memory pointers, or indirectly via location independent database offsets that can be quickly translated to memory addresses. During data access, there is no need to interact with a buffer manager, either for locating data, or for fetching/pinning buffer pages. Also, objects larger than the system's page size can be stored contiguously, thereby simplifying retrieval or in-place use. Thus, data access using a main-memory database is very fast compared to using disk-based storage managers, even when the disk-based manager has sufficient memory to cache all data pages.

Distributed architectures in which several machines are connected by a fast network, and perform database accesses and updates in parallel, provide significant further performance improvements for a number of applications. For example, consider applications in which transactions are predominantly read-only and update rates are low (e.g., number translation and call routing in telecommunications). Each machine can locally access data cached in memory, thus avoiding network communication which could be fairly expensive. Another example is Computer Aided Design applications, where locality of reference is very high, update transactions are long, and interactive response time is very important.

Distribution also enhances fault tolerance, which is required in many mission-critical applications, even if data fits easily in a single machine's main-memory. In this case, especially with low update rates, a distributed database is preferable to a hot-spare since the load can be distributed in the non-failure case leading to improved performance.

The recovery scheme used in the Dalí main-memory database system [JLR+94] is based on the main-memory recovery scheme presented in [JSS93]. The recovery scheme of [JSS93] provides important features such as transient undo logging in which undo log records are kept in memory and only written to disk if required for checkpointing, per-transaction logs in memory to reduce contention on the system log tail, and recovery using only a single pass over the system log. The recovery scheme used in Dalí provides several further extensions, such as multi-level recovery ([WHBM90, MHL+92, Lom92]), and fuzzy checkpointing [SGM90a, Hag86].

The goal of the work described here was to extend the Dalí recovery scheme to the distributed memory case, simultaneously maintaining the advantages of the single-site scheme, and efficiently supporting the applications described above. For example, we can make use of transient undo logging to reduce the size of the log written to disk, as well as the size of the log sent across network links in distributed protocols.

We present three distinct but related distributed recovery schemes: the first for client-server architectures, and the second and third for shared disk architectures. These are all "data-shipping" schemes (see, e.g., [FZT+92]) in which a transaction executes at a single site, fetching data (pages) as required from other sites. Distributed commit protocols are not needed as in "function-shipping" environments. While shared disk architectures have traditionally been closely tied to hardware platforms (e.g., VAXCluster), UNIX-based shared disk platforms and network of workstation architectures with similar performance characteristics are becoming more common.

Our distributed recovery algorithms provide the advanced features of our centralized recovery algorithms, such as transient undo logging, explicit multi-level recovery, and fuzzy checkpointing. Site or global recovery requires only a single pass over the system log, starting from the end of the system log recorded in the most recent checkpoint.

A key property of the client-server scheme and one of the shared disk schemes is that concurrent updates are possible at granularities smaller than a page-size, thereby minimizing "false-sharing" (that is, apparent conflicts due to coarse-granularity locking) and consequently, needless network accesses to resolve false sharing.

The remainder of the paper is organized as follows. We present background on multi-level recovery and the single-site algorithm on which the present work is based in Section 2. Related work is presented in Section 3. We present our client-server recovery algorithm in Section 4. Section 5 describes our shared disk model, while Sections 6 and 7 present our shared disk recovery algorithms. Section 8 concludes the paper.

2 Overview of Main-Memory Recovery

In this section we present a review of multi-level recovery concepts and an overview of the single-site main-memory recovery scheme used in the Dalí system. Low-level details of our scheme are described in [BPR+96].

In our scheme, data is logically organized into regions. A region can be a tuple, an object, or an arbitrary data structure like a list or a tree. Each region has a single associated lock, referred to as the region lock, with exclusive (X) and shared (S) modes that guard updates and accesses to the region, respectively.
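As a small illustration of the region lock just described, the following Python sketch models one lock per region with shared (S) and exclusive (X) modes. The class and method names are hypothetical and chosen only for this example; the paper does not prescribe an interface, and queuing of blocked requests is left out.

```python
from dataclasses import dataclass, field


@dataclass
class RegionLock:
    """One lock per region (hypothetical interface, not taken from the paper)."""
    holders: dict = field(default_factory=dict)  # transaction id -> "S" or "X"

    def try_acquire(self, txn, mode):
        """Grant `mode` unless another holder conflicts (only S with S is compatible)."""
        for other, held in self.holders.items():
            if other != txn and (held == "X" or mode == "X"):
                return False
        # Keep the stronger mode if the transaction already holds X.
        if self.holders.get(txn) != "X":
            self.holders[txn] = mode
        return True

    def release(self, txn):
        self.holders.pop(txn, None)


lock = RegionLock()
readers_ok = lock.try_acquire("T1", "S") and lock.try_acquire("T2", "S")
writer_blocked = not lock.try_acquire("T3", "X")
lock.release("T1")
lock.release("T2")
writer_ok = lock.try_acquire("T3", "X")
```

Readers share the region, while an updater must wait until all other holders have released the lock.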

[Figure 1: Overview of Recovery Structures. In main memory: the database, the dirty page table, the active transaction table (ATT) with per-transaction local redo and undo logs, the system log tail, and end_of_stable_log. On disk: the stable system log, the checkpoint images Ckpt A and Ckpt B (each holding a database image, ckpt_dpt, and the ATT with undo logs), and cur_ckpt.]

2.1 Multi-Level Recovery

Multi-level recovery [WHBM90, MHL+92, Lom92] provides recovery support for enhanced concurrency based on the semantics of operations. Specifically, it permits the use of weaker operation locks in place of stronger shared/exclusive region locks. A common example is index management, where holding physical region locks until transaction commit leads to unacceptably low levels of concurrency. If undo logging has been done physically (e.g., recording exactly which bytes were modified to insert a key into the index) then the transaction management system must ensure that these physical undo descriptions are valid until transaction commit. Since the descriptions refer to byte changes at specific positions, this typically implies that the region locks on the updated index nodes must be held till transaction commit to ensure correct recovery, in addition to considerations for concurrent access to the index.

The multi-level recovery approach is to replace these low-level physical undo log records with higher level logical undo log records containing undo descriptions at the operation level. Thus, for an insert operation, physical undo records would be replaced by a logical undo record indicating that the inserted key must be deleted. Once this replacement is made, the region locks may be released and only (less restrictive) operation locks are retained. For example, region locks on the particular nodes involved in an insert can be released, while an operation lock on the newly inserted key that prevents the key from being accessed or deleted is held.

2.2 System Overview

Figure 1 gives an overview of the structures used for recovery. The database (a sequence of fixed size pages) is mapped into the address space of each process and is in main memory, with (two) checkpoint images Ckpt A and Ckpt B on disk. Also stored on disk are 1) cur_ckpt, an "anchor" pointing to the most recent valid checkpoint image for the database, and 2) a single system log containing redo information, with its tail in memory. The variable end_of_stable_log stores a pointer into the system log such that all records prior to the pointer are known to have been flushed to the stable system log.

There is a single active transaction table (ATT) in main-memory which stores separate redo and undo logs for active transactions, in addition to information about transaction status. A dirty page table, dpt, is maintained in memory to record pages that have been updated since the last checkpoint. For simplicity of presentation, we assume that the dirty page table is maintained as a bitmap with one bit per page. The ATT (with undo logs, but without redo logs) and the dirty page table are also stored with each checkpoint image. The dirty page table in a checkpoint image is referred to as ckpt_dpt.

2.3 Transactions and Operations

Transactions, in our model, consist of a sequence of multi-level operations, following [Lom92]. We briefly describe the model below. Each operation has a level L_i associated with it. An operation at level L_i can consist of a sequence of operations at level L_{i-1}. Transactions, assumed to be at level L_n, call operations at level L_{n-1}. Physical updates to regions are level L_0 operations. For transactions, we distinguish between pre-commit, when the commit record enters the system log in memory, establishing a point in the serialization order, and commit, when the commit record hits the stable log. For operations, we use the terms commit and pre-commit interchangeably since both refer to the time when the commit record enters the system log in memory.

Each transaction obtains an operation lock before it executes an operation; the operation lock is granted if the operation commutes with other operation locks held by other active transactions. Level L_0 operations obtain region locks instead of operation locks. The locks on the region are released once the L_1 operation pre-commits; similarly, an operation lock at level L_i is held until the transaction or the containing operation (at level L_{i+1}) commits. All the locks acquired by a transaction are released once it commits.¹

¹ It is possible to release locks for a transaction on pre-commit; as a result read-only transactions may read uncommitted data, and their commit must be delayed until the dirty data they have read has been committed.

2.4 Logging Model

The recovery algorithm maintains separate local undo and redo logs in memory for each transaction. These are stored as a linked list off an entry for the transaction in the ATT. Each physical update (to a part of a region) generates physical undo and redo log records that are appended to the respective local log. When a transaction/operation pre-commits, the current contents of the transaction's local redo log are appended to the system log tail in memory, and the logical undo description for the operation is included in an operation commit log record appended to the system log.
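The logging model above can be made concrete with a short Python sketch. It is only an illustration under simplifying assumptions: the record layouts, the `Logger` and `Transaction` names, and the use of a dictionary as the database are all invented for this example, and the ATT is reduced to the two local logs held on each transaction object.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tid: int
    redo_log: list = field(default_factory=list)  # local redo log
    undo_log: list = field(default_factory=list)  # local undo log


class Logger:
    """Hypothetical helper; record formats are assumptions for illustration."""

    def __init__(self):
        self.system_log_tail = []  # in-memory tail of the system log

    def physical_update(self, txn, db, region, new_value):
        """Level L_0 update: log physical undo and redo locally, then apply."""
        txn.undo_log.append(("phys_undo", region, db.get(region)))
        txn.redo_log.append(("phys_redo", txn.tid, region, new_value))
        db[region] = new_value

    def operation_precommit(self, txn, op_name, undo_start, logical_undo):
        """Pre-commit an operation whose undo records begin at `undo_start`."""
        # Local redo records move to the system log tail.
        self.system_log_tail.extend(txn.redo_log)
        txn.redo_log.clear()
        # The operation commit record carries the logical undo description.
        self.system_log_tail.append(("op_commit", txn.tid, op_name, logical_undo))
        # Physical undo records of the operation give way to one logical undo.
        del txn.undo_log[undo_start:]
        txn.undo_log.append(("logical_undo", op_name, logical_undo))


db = {}
log = Logger()
t = Transaction(tid=1)
mark = len(t.undo_log)
log.physical_update(t, db, "node7", "k=42")
log.physical_update(t, db, "node9", "ptr->node7")
log.operation_precommit(t, "insert(42)", mark, "delete(42)")
```

After the pre-commit, the transaction's undo log holds a single logical record ("delete the inserted key"), which is what allows the region locks on the index nodes to be released early.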

Thus, with the exception of the logical undo descriptors carried in operation commit log records, only redo records are written to the system log during normal processing.

Also, when an operation pre-commits, the undo log records for its suboperations/updates are replaced in the transaction's (local) undo log with a logical undo log record containing the undo description for the operation. In-memory undo logs of transactions that have committed are deleted since they are not required again.²

² The logs can be deleted on pre-commit, since, short of a system crash, nothing can result in the transaction aborting.

The system log is flushed to disk when a transaction commits. For each redo log record written to disk, pages touched by the update on the log record are marked dirty in the dirty page table, dpt, by the flushing procedure. In our single-site recovery scheme, update actions do not obtain latches on pages; instead region locks are obtained to ensure that updates do not interfere with each other.³ Eliminating latching significantly decreases access costs in main-memory, and reduces programming complexity. Recovery related actions that are normally taken on page latching, such as setting of dirty bits for the page, are now performed based on log records written to the redo log. (Our distributed-memory schemes, with the exception of one of the shared-disk schemes, do not obtain page latches either; the sole exception uses page latching to ensure cache coherency, which is not a problem in the single-site case.) The redo log is used as a single unifying resource to coordinate the application's interaction with the recovery system, and this approach has proven very useful.

³ In cases when region sizes change, certain additional region locks on storage allocation structures may need to be obtained. For example, in a page based system, if an update causes the size of a tuple to change, then in addition to a region lock on the tuple, an X mode region lock on the storage allocation structures on the page must be obtained.

2.5 Ping-pong Checkpointing

Consistent with the terminology in main-memory databases, we use the term checkpoint to mean a copy of the main-memory database which is stored on disk, and the term checkpointing to refer to the action of creating a checkpoint. This terminology differs slightly from the terminology used, for example, in ARIES [MHL+92].

Traditional recovery schemes implement write-ahead logging (WAL), whereby all undo logs for updates on a page are flushed to disk before the page is flushed to disk. In such systems, to guarantee the WAL property, typically a latch on a page is obtained, all log records pertaining to the page are flushed to stable storage, the page is copied to disk, and the latch released. Updaters also obtain the same page latch, thereby preventing concurrent updates while a page is being flushed to disk. As a result of not obtaining latches on pages during updates, it is not possible in our scheme to enforce the write-ahead logging policy, since pages may be updated even as they are being written out.

Instead, our recovery algorithm makes use of a strategy called ping-pong checkpointing (see, e.g., [SGM90b]). In ping-pong checkpointing two copies of the database image are stored on disk, and alternate checkpoints write dirty pages to alternate copies. Writing alternate checkpoints to alternate copies permits a checkpoint that is being created to be temporarily inconsistent; i.e., updates may have been written out without corresponding undo records having been written. However, after writing out dirty pages, sufficient redo and undo log information is written out to bring the checkpoint to a consistent state. Even if a failure occurs while creating one checkpoint, the other checkpoint is still consistent and can be used for recovery.

Keeping two copies of a main-memory database on disk for ping-pong checkpointing does not have a very high space penalty, since disk space is much cheaper than main-memory. Further, ping-pong checkpointing has several other benefits. For instance, although many recovery schemes assume page writes are atomic, in reality they are not, and complex schemes are needed to detect and recover from incomplete page writes resulting from, for example, power failures. Incomplete page writes cause no problems with ping-pong checkpointing, since the previous checkpoint image is still available. Ping-pong checkpointing also permits some physical and logical consistency checks to be performed on the checkpoint before declaring it successfully completed.

Before writing any dirty data to disk, the checkpoint notes the current end of the stable log in the variable end_of_stable_log, which will be stored with the checkpoint. This is the start point for scanning the system log when recovering from a crash using this checkpoint. Next, the contents of the (in-memory) ckpt_dpt are set to those of the dpt and the dpt is zeroed (noting of end_of_stable_log and zeroing of dpt are done atomically with respect to flushing). The pages written out are the pages that were either dirty in the ckpt_dpt of the last completed checkpoint, or dirty in the current (in-memory) ckpt_dpt, or in both. In other words, all pages are written out that were modified since the current checkpoint image was previously written, namely, pages that were dirtied since the last-but-one checkpoint. This is necessary to ensure that updates described by log records preceding the current checkpoint's end_of_stable_log have made it in the database image in the current checkpoint.

Checkpoints write out dirty pages without obtaining any latches and thereby avoid interfering with normal operations. The checkpoint image is thus fuzzy. Fuzzy checkpointing however could result in two problems for recovery:

• the checkpoint page image may contain partial updates of an operation;

• the undo log record for an update may not be in the stable system log (which could result in a problem if the system were to crash immediately after the checkpoint).

The first problem is solved by our policy of always writing physical redo log records. By applying physical redo log records (whose effects are idempotent) to a checkpoint page image we can ensure that we can obtain a page image that does not contain any partial updates.

The second problem is solved by ensuring that for any update whose effects have made it to the checkpoint image, one of the following holds: 1) corresponding physical undo log records are written out to disk after the database image has been written, or 2) all physical redo log records for the operation (corresponding to the partial update) as well as the logical undo descriptor in the operation commit log record are on stable storage. This is performed by checkpointing the ATT and flushing the log after checkpointing the data. The checkpoint of the ATT writes out undo log records, as well as some other status information. In case the operation containing the partial update completes and consequently the undo log records are removed from the ATT before the checkpoint of the ATT, the log flush ensures that all log records corresponding to the operation (containing the partial update) as well as the operation commit log record are on stable storage. The checkpoint is declared completed (and consistent) by toggling cur_ckpt to point to the new checkpoint.

2.6 Abort Processing

When a transaction aborts, that is, does not successfully complete execution, updates/operations described by log records in the transaction's undo log are undone by traversing the undo log backwards from the end. Transaction abort is carried out by executing, in reverse order, every undo record just as if the execution were part of the transaction.

Following the philosophy of repeating history [MHL+92], new physical redo log records are created for each physical undo record encountered during the abort. Similarly, for each logical undo record encountered, a new "compensation" or "proxy" operation is executed based on the undo description. Log records for updates performed by the operation are generated as during normal processing. Furthermore, when the proxy operation commits, all its undo log records are deleted along with the logical undo record for the operation that was undone. The commit record for the proxy operation serves a purpose similar to that served by compensation log records (CLRs) in ARIES: during restart recovery, when it is encountered, the logical undo log record for the operation that was undone is deleted from the transaction's undo log, thus preventing it from being undone again.

2.7 Recovery

Restart recovery begins by initializing the ATT and transaction undo logs to the ATT and undo logs stored in the most recent checkpoint, loads the database image and sets dpt to zero. Next, recovery processes redo log records. Recall that as part of the checkpoint operation, the end of the system log on disk, end_of_stable_log, is noted before the database image is checkpointed. This value of end_of_stable_log becomes the "begin recovery point" for the checkpoint once the checkpoint has completed. All updates described by log records preceding this point are guaranteed to be reflected in the checkpointed database image.

Thus, during restart recovery only redo log records following the end_of_stable_log for the last completed checkpoint of the database are applied. Restart recovery ignores redo log records for updates performed by an operation if the commit log record for the operation is not found in the system log. Such log records represent uncommitted updates, and may not have corresponding undo records in the checkpointed ATT.
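The checkpointing procedure of Section 2.5 and the redo pass described above can be sketched together in a few lines of Python. This is a deliberately reduced model, not the Dalí implementation: pages hold single values, the stable log is a Python list, and the checkpointing of the ATT, the undo logs, and the rollback of incomplete transactions are all omitted. Every name below (`PingPongDB`, `flush_update`, and so on) is hypothetical.

```python
class PingPongDB:
    """Toy model of ping-pong checkpointing plus the redo pass (assumed names)."""

    def __init__(self, num_pages):
        self.pages = [0] * num_pages   # in-memory database
        self.dpt = set()               # pages dirtied since the last checkpoint
        self.stable_log = []           # system log records already on disk
        # Two on-disk images, each with its ckpt_dpt and begin-recovery point.
        self.images = [
            {"pages": [0] * num_pages, "ckpt_dpt": set(), "end_of_stable_log": 0}
            for _ in range(2)
        ]
        self.cur_ckpt = 0              # anchor: most recent valid image

    def flush_update(self, op_id, page, value):
        """Apply an update and flush its physical redo record; mark the page dirty."""
        self.pages[page] = value
        self.stable_log.append(("redo", op_id, page, value))
        self.dpt.add(page)

    def flush_op_commit(self, op_id):
        self.stable_log.append(("op_commit", op_id))

    def checkpoint(self):
        prev = self.images[self.cur_ckpt]
        target_idx = 1 - self.cur_ckpt
        target = self.images[target_idx]
        end = len(self.stable_log)            # noted before any page is written
        ckpt_dpt, self.dpt = self.dpt, set()  # ckpt_dpt := dpt, dpt zeroed
        # Pages dirtied since the last-but-one checkpoint go to the alternate image.
        to_write = prev["ckpt_dpt"] | ckpt_dpt
        for p in to_write:
            target["pages"][p] = self.pages[p]
        target["ckpt_dpt"] = ckpt_dpt
        target["end_of_stable_log"] = end
        self.cur_ckpt = target_idx            # toggling the anchor completes it
        return to_write

    def restart(self):
        """Redo pass only: start from the anchored image and its recovery point."""
        img = self.images[self.cur_ckpt]
        pages = list(img["pages"])
        tail = self.stable_log[img["end_of_stable_log"]:]
        committed = {rec[1] for rec in tail if rec[0] == "op_commit"}
        for rec in tail:
            # Redo records of operations without a commit record are ignored.
            if rec[0] == "redo" and rec[1] in committed:
                pages[rec[2]] = rec[3]
        return pages


db = PingPongDB(4)
db.flush_update("op1", 0, 10)
db.flush_op_commit("op1")
written_first = db.checkpoint()
db.flush_update("op2", 1, 20)
db.flush_op_commit("op2")
written_second = db.checkpoint()
db.flush_update("op3", 2, 30)
db.flush_op_commit("op3")
db.flush_update("op4", 3, 40)   # crash before op4's commit record is logged
recovered = db.restart()
```

The second checkpoint writes page 0 again even though it was not dirtied since the first one, because the image it targets was last written before page 0 changed. In the recovered state the update of the uncommitted operation `op4` is absent.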

If the undo records for such uncommitted updates are absent, however, the effects of the log records will not be reflected in the checkpointed database image. Such records would be present only due to a crash while the log records for an operation were being flushed.

During the application of redo log records, appropriate pages in dpt are set to dirty for each log record and necessary actions are taken to keep the checkpointed image of the ATT consistent with the log as it is applied. These actions on the ATT mirror the actions taken during normal processing. For example, when an operation commit log record is encountered, lower level log records in the transaction's undo log for the operation are replaced by a higher level undo description.

Once all the redo log records have been applied, the active transactions are rolled back. To do this, all completed operations that have been invoked directly by the transaction, or have been directly invoked by an incomplete operation, have to be rolled back. However, the order in which operations of different transactions are rolled back is very important, so that an undo at level L_i sees data structures that are consistent [Lom92]. First, all operations (across all transactions) at L_0 that must be rolled back are rolled back, followed by all operations at level L_1, then L_2 and so on.

3 Connection to Related Work

Multi-level recovery and variants thereof, primarily for disk-based systems, have been proposed in the literature [WHBM90, Lom92, MHL+92]. Like these schemes, our schemes repeat history, generate log records during undo processing and log operation commits when undo operations complete (similar to CLRs described in [MHL+92]). Also, as in [Lom92], transaction rollback at crash recovery is performed level by level. Some of the features of our main-memory recovery technique which impact the distributed schemes are:

1. Due to transient undo logging, no physical undo logs are written out to the global log except during checkpoints.

2. Separate undo logs are maintained in memory for active transactions. A result is that transaction rollback does not need to access the global log, part of which could be on disk.

3. Our single-site scheme does not require latching of pages during updates, which is inconvenient and expensive in either a main-memory DB or an OODB setting. Actions that are normally taken on page latching, such as setting of dirty bits for the page, are efficiently performed based on physical redo log records written to the global log. (One of our shared-disk schemes uses page latching for ensuring cache consistency, while the other shared-disk scheme does not.)

4. The correctness requirements of the write-ahead logging policy are accomplished with a single flush for the entire database during a checkpoint, rather than (potentially) one flush per page.

5. Our scheme does not perform in-place update of the disk image during page flush, instead using ping-pong checkpointing.

In the ARIES-SD [MN91] family of schemes for recovery in the shared disk environment, each site maintains a separate log, and pages are shipped between sites. Our shared-disk log-shipping scheme does not ship pages, but instead broadcasts log records, taking advantage of cheap application of these log records in main-memory, and permitting concurrent updates at a smaller-than-page granularity. In our shared disk schemes, log flushes are driven by the release of a lock from a site, in order to support repeating of history and correct rollback of multi-level actions during crash recovery. The "superfast" method of ARIES-SD [MN91] does not describe flushes to protect the early release of locks, making it unclear how that scheme supports logical undo and high-concurrency index operations.

In [Rah91], the authors propose recovery schemes for the shared disk environment which assume page-level concurrency control and the no-steal page write policy, neither of which are assumptions made in our schemes.

In [MN94], the authors show how the ARIES recovery algorithm described in [MHL+92] can be extended to a client-server environment. In contrast to our client-server scheme, their scheme involves the clients as well as the server in the checkpointing process. We also support concurrent updates to a page by different clients, which is not supported in [MN94].

In [CFZ94], object-level as well as adaptive locking and replica management are discussed, but recovery considerations are not extensively addressed. In [FZT+92], the client-server recovery scheme for the Exodus storage manager (ESM-CS) is described. This recovery scheme, based on ARIES [MHL+92], requires page-level locking until end of transaction (for example, the commit dirty page list).

4 Client-Server Recovery Scheme

In this section, we describe the client-server recovery scheme. Our system model is as follows.

• There is a single server with stable storage, which is responsible for coordinating all the logging, and for performing checkpoints and recovery (see Figure 2). The server maintains a copy of the entire database in memory.

• Multiple clients may be connected to the server; each client has a copy of the entire database in its memory.

• A transaction executes at a single client and updates/accesses the copy of the database at the client.

• The network is FIFO and reliable.

[Figure 2: Client-Server Architecture. Client nodes, each holding in main memory a copy of the database, an ATT, and a system log tail, are connected by a network to the server. The server holds in main memory the database, the ATT, the DPT, and the system log tail, and on disk the stable system log, cur_ckpt, and the checkpoints Ckpt A and Ckpt B.]

As a result of updating the local copy of the database, database pages updated by a client may not be current at some other client. Therefore, a page at a client is in one of two states: valid or invalid. Invalid pages contain stale versions of certain data due to updates by other clients and are refreshed by obtaining the latest copy of the page from the server.

Transactions follow the callback locking scheme [LLOW91, CFZ94] when obtaining and releasing locks. Each client site has a local lock manager (LLM) which caches locks, and a global lock manager (GLM) at the server keeps track of locks cached at the various clients. Transaction requests for locks cached locally are handled at the client itself. However, requests for locks not cached locally are forwarded to the GLM, which calls back the lock from other clients that may have cached the lock in a conflicting mode (before granting the lock request). A client relinquishes a lock in response to a callback as soon as transactions currently holding the lock (if any) release the lock.

The server maintains the dpt and the ATT (for all transactions in the client-server system) while the clients maintain the ATT for the transactions belonging to that client. The log records for updates generated by a transaction at a client site are stored in that site's ATT. Client sites do not maintain a system log on disk, but keep a system log tail in memory and append log records from the local redo logs to this tail when operations commit/abort. Checkpointing is performed solely at the server, and follows the same procedure as the centralized case.

When a lock is relinquished from a site or a transaction commits, log records in the system log are shipped by the client to the server. In the case of transaction commit, the client waits for the server to flush the newly received log records to disk before reporting the commit to the user. The shipped redo log records are used to update the server's copy of the affected pages, ensuring that pages shipped to clients from the server are current (note that pages are shipped only from the server to clients and never vice versa). This enables our scheme to support concurrent updates to a single page at multiple clients since re-applying the updates at the server causes them to be merged (this approach is also adopted in [CDF+94]). Shipping the log records will usually be cheaper than shipping pages, and the cost of applying the log records themselves is small since, in our main-memory database context, the server will not have to read the affected pages from disk.

We will now describe our scheme in detail and also outline several possible optimizations to the basic ideas discussed above.

4.1 Basic Operations

We now describe the features which distinguish the client-server scheme from the centralized case, in terms of actions performed at the client and the server at specific points in processing.

• Page Access: In case a client accesses a page that is valid, it simply goes ahead without communicating with the server. Else, if the page is invalid (certain data on the page may be stale), then the client refreshes the page by 1) obtaining the most recent version of the page from the server, and 2) applying to the newly received page any local updates which have not been sent to the server (this step merges local updates with updates from other sites). The client then marks the page as valid. The server keeps track of clients that have the page in a valid state.

  To prevent race conditions, the client does not send log records to the server after asking for a page and before receiving it.

  An optimization of the above is to check for validity of pages at the time of acquisition of region locks from the server rather than on every access; for this optimization to be used, the set of pages covered by the region lock must be known.

• Operation/Transaction Commit: At the client, redo log records are moved to the system log, a commit record is appended, and appropriate actions are performed on the transaction's undo log in the ATT as described for the centralized case. In case of a transaction commit the log records in the system log are shipped to the server, and commit processing waits until the server has acknowledged that the log records have been flushed to disk.

  Finally, all the locks acquired by the operation/transaction are released locally. The local lock manager at the site may however continue to cache the locks locally.

• Lock Release: When a lock is relinquished by a client, all redo log records that were generated under this lock need to be shipped to the server. The server then applies these log records to its database image to ensure that another client that obtains the same lock gets a copy of the pages which contains the updates described by these log records. A simple way to ensure that all log records generated under the lock are shipped to the server is to flush the system log from the client to the server.

  An optimization to avoid flushing the system log each time is to store the end of the client system log with the lock (at the client) when an X mode region lock or an operation lock is released by a transaction. Thus, for any region lock, all redo log records in the system log affecting that region precede the point in the log stored with the lock. Similarly, for an operation lock, all log records relating to the operation (including operation commit) precede the point in the system log stored with the lock. This location in the log is client-site-specific.

  Before a client site relinquishes an X mode region lock or operation lock to the server due to call-back, it ships to the server at least the portion of the system log which precedes the log pointer stored with the lock. This ensures that the next lock will not be acquired on the region until the server's copy is up to date, and the history of the update is in place in the server's logs.

  For X mode region locks, this flush ensures repeating of history on regions, while for operation locks this flush ensures that the server receives the logical undo descriptors in the operation commit log records for the operation which released the locks. Thus, if the server aborts a transaction after a site failure, the abort of this operation will take place at the logical level of the locks still held for it at the server.

• Log Record Processing: At the server, for each physical redo log record (received from a client), the undo log record is generated by reading the current contents of the page at the server. The new log record is then appended to the undo log for this transaction in the server's ATT. Next the update described by the redo log record is applied, following which the log record is appended to the redo log for the transaction in the server's ATT. Operation/transaction commit log records received from the client are processed by performing the same actions as in the centralized case when the log records were generated. In addition, for operation commit, the logical undo descriptor is extracted from the commit log record and appended to the undo log for the transaction in the server's ATT. For transaction commit, the client whose transaction committed is notified after the log flush to disk succeeds.

  By applying all the physical updates described in the physical log records to its pages, the server ensures that it always contains the latest updates on regions for locks which have been released to it from the clients. The effect of the logging scheme, as far as data updates are concerned, is just as if the client transaction actually ran at the server site.

• Transaction Abort/Site Failures: If a client site decides to abort a transaction, it processes the abort (as in the centralized case) using the undo logs for the transaction in the client's ATT. If the client site itself fails, the server will abort transactions that were active at the client using undo logs for the transaction in its ATT (since the client cannot commit without communicating with the server, in case of partition, a decision to abort is enforceable by the server). If the server fails, then the complete system is brought down, and restart recovery is performed at the server as described in Section 2.7.

Page Invalidation

We complete our client-server scheme by presenting two methods, invalidate-on-update and invalidate-on-lock, for ensuring that data accessed by a client is up-to-date.

All actions described so far are used in common by both methods. In particular, both methods follow the rule that all log records pertaining to updates made under a lock are flushed to the server before the lock is relinquished from the site. Since the server would have applied the log records to its copy of the data, this ensures that when the server grants a lock, it has the current version of all pages containing data covered by that lock. However, when a client acquires a lock, it is still possible that the copy of one or more pages involved in the region for which the lock was obtained are not up-to-date at the client.

Both methods mark pages at the clients as invalid, to denote that some of the data on the page is out of date. Even if a page is marked invalid, some of the data in the page may still be up-to-date, for instance, if the client has a region lock on the data. The first method, invalidate-on-update, is an eager method that marks pages as invalid at clients as soon as an update occurs at the server, while the second, invalidate-on-lock, is a more lazy method, marking pages as invalid at clients when the client gets a lock. The second scheme reduces invalidation messages by keeping extra per-lock information at the server. Details of the two methods are presented in Sections 4.2 and 4.3 respectively.

4.2 Invalidate-On-Update

The invalidate-on-update scheme works as follows. When the server receives log records from a client, it does the following. For each page that it updates, it sends invalidate messages to clients (other than the client that updated the page) that may have the page marked as valid. For all clients other than the client that updated the page, the server notes that the client does not have the page marked valid. Clients, on receiving the invalidate message, mark their page as invalid. Thus invalidation messages are received by clients before they can acquire a region lock on the updated data, and begin accessing the data.

Although the method is very simple and easy to implement, it has some drawbacks. For example, consider two sites S1 and S2 updating the same page concurrently under two different region locks. Let S1 be the site that flushes its updates to the server first; the update will cause the server to send an invalidate message to S2, which will then re-read the page from the server. However, if site S2 accesses the page again under the lock that it already has, then the invalidate was not necessary, since the data in the region it has locked has not changed. The invalidate-on-lock scheme in the next section takes advantage of this observation to reduce overheads.

4.3 Invalidate-On-Lock

The invalidate-on-lock scheme decreases unnecessary invalidations and the overhead of sending invalidation messages by marking pages as invalid only when a lock on a region covering the page is obtained by a client. As a result, if two clients are updating different regions on the same page, as in the earlier example, no invalidation messages are sent to either client. By piggy-backing invalidation messages for updated pages on lock grant messages from the server, the overhead of sending separate invalidation messages in the previous scheme is eliminated.

The biggest benefit of the invalidate-on-lock scheme, however, is that there is no need to check for validity of a page on every access or update to the page; it suffices to check for validity at lock acquisition time.

To achieve the above, the scheme must associate with the lock for a region information about updates to that region. Specifically, when updates described by a physical redo record are applied to pages at the server, the updated pages are associated with the lock for the updated region. Thus, the scheme requires that it be possible to determine the region lock from the redo record. A simple way of obtaining this information is to require that an update call must specify not only the data to be updated, but also the region lock that protects the data. It is easy for a programmer to provide this information, since all updates must be made holding a region lock. The lock name can then be sent with the redo log record.

The server associates a log sequence number (LSN) with each log record, which reflects both the order in which the record was applied to the server's copy of the page and the order in which it was added to the system log. For each page, the server stores the LSN of the most recent log record that updated the page. For each client, the server maintains in a client page table (cpt) the state of the page at the client (valid/invalid), along with the LSN for the page when it was last shipped to the client.
updatedthepage,andtheidentityoftheclientwhichissuedit.inaddition,for ThisschemealsorequiresthattheserverassociateaLogSequenceNumber(LSN), toupdatestotheregion.foreachpageinthelist,theserverstoresthelsnofthe mostrecentlogrecordreceivedbytheserverthatrecordedanupdatetothepart oftheregiononthispage,andtheclientwhichperformedtheupdate.thus,when aclientisgrantedaregionlock,if,forapageinthelocklist,thelsnisgreater thanthelsnforthepagewhenitwaslastshippedtotheclient,thentheclient pagecontainsstaledatafortheregionandmustbeinvalidated. Theserveralsomaintainsforeachregionlockalistofpagesthataredirtydue 14
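The server-side bookkeeping for invalidate-on-lock can be pictured with a small sketch. This is only an illustration under simplifying assumptions, not the paper's implementation: the class and method names (`Server`, `apply_redo`, `ship_page`, `grant_lock`) are hypothetical, LSNs are plain integers handed out by the server, and pages, locks and clients are identified by arbitrary hashable keys. The lock-grant test combines the LSN comparison described above with the paper's additional condition that the acquiring client was not itself the last updater under that lock.

```python
from dataclasses import dataclass, field


@dataclass
class CptEntry:
    """Server's view of one page at one client (client page table entry)."""
    valid: bool = False
    shipped_lsn: int = 0  # LSN of the page when it was last shipped to the client


@dataclass
class Server:
    """Hypothetical server state for the invalidate-on-lock scheme."""
    next_lsn: int = 1
    page_lsn: dict = field(default_factory=dict)    # page -> LSN of latest update
    lock_pages: dict = field(default_factory=dict)  # lock -> {page: (lsn, client)}
    cpt: dict = field(default_factory=dict)         # (client, page) -> CptEntry

    def apply_redo(self, client, lock, page):
        """Apply a redo record from `client`, generated under region lock `lock`."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.page_lsn[page] = lsn
        # Record the page in the lock's dirty-page list with its LSN and updater.
        self.lock_pages.setdefault(lock, {})[page] = (lsn, client)
        return lsn

    def ship_page(self, client, page):
        """Page refresh: the client now holds the server's current copy."""
        self.cpt[(client, page)] = CptEntry(True, self.page_lsn.get(page, 0))

    def grant_lock(self, client, lock):
        """Return the pages to invalidate, piggy-backed on the lock grant."""
        stale = []
        for page, (lsn, updater) in self.lock_pages.get(lock, {}).items():
            entry = self.cpt.get((client, page))
            if entry and entry.valid and entry.shipped_lsn < lsn and updater != client:
                entry.valid = False  # mark invalid in the CPT as well
                stale.append(page)
        return stale
```

With two clients caching the same page, an update made by one client under lock `L1` leads to an invalidation for the other client only when that client later acquires `L1`; acquiring a lock on a different region of the same page returns an empty invalidation list.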

The LSN information serves to minimize the shipping of pages to clients, marking a page as invalid only if there is an update performed under the region lock requested by the client, and the update has not yet been propagated to the client.

The additional actions for this scheme are as follows:

Log apply: When the server applies to a page P a redo log record, LR, generated at client C under region lock L, it takes the following actions (after P has been updated). First, the LSN for P is set to the LSN for LR. Second, the entry for P in the list of dirty pages for L is updated (or created), setting the client to C, and the LSN to the LSN for LR.

Lock grant: A set of invalidate messages is passed back to the client with the lock acquisition. The invalidate messages are for pages in the list associated with the lock being acquired that meet three criteria: 1) the page is cached at the client in the valid state, 2) the LSN of the page in the CPT for the client is smaller than the LSN of the page in the lock list, and 3) the client acquiring the lock was not the last to update the page under this lock. The invalidated pages are marked invalid in the CPT for the client and at the client site.

Page refresh: When the server sends a page to a client (page refresh), at the server, the page is marked valid in the CPT for the client and the LSN for the page in the CPT is updated to be the LSN for the page at the server.

Lock list cleanup: We are interested in keeping the list of pages with every lock as small as possible. This can be achieved by periodically deleting pages P from the list of lock L such that the following condition holds, where C is the client noted in the list of pages for L as the last client to update P:

Every client other than C has the page cached either in an invalid state or with LSN greater than or equal to the LSN for the page in the list for lock L.

The rationale for this rule is that the purpose of region lock lists is to determine pages that must be invalidated. However, if for a page in a client's CPT, the LSN is greater than the LSN for the page in the lock list, then the client has the most recent update to the region on the page, and thus the page will not need to be part of any invalidation list sent to the client.

5 Shared Disk Recovery: Model and Common Structures

In the shared disk approach, a number of machines are interconnected and also have direct access to disks over a fast network. The shared disk environment is used in many systems, such as the DEC VAXclusters, and provides benefits over a shared nothing architecture, such as faster access to non-local disks and fault-tolerance.

Also, the basic advantage of shared disk schemes over the client-server schemes is that the algorithms are symmetric with respect to which site executes them, preventing one system from becoming a bottleneck in the system. As in our client-server scheme, in addition to careful consideration of the interaction with multi-level recovery, our main concern is minimizing false sharing through fine-grained concurrency control. This allows, for example, read-only transactions with a fully cached working set to proceed at main-memory speeds, an important property for our intended applications.

We now describe our shared disk recovery model.

Each site maintains its own copy of the entire database in memory and its own system log on disk. Thus there are multiple logs in the system.

Sites obtain locks from a Global Lock Manager (GLM); the function of the lock manager could be distributed for speed and reliability, but this is orthogonal to our discussion.

Sites cache locks, and relinquish locks based on the callback locking mechanism described in Section 4. We assume the network is FIFO and reliable.

Each site has its own system log on disk and therefore the logs are distributed. To repeat history during restart recovery, we need some mechanism to temporally order log records that affect the same region. To enable this, each site maintains a global timestamp counter TSctr, and a timestamp obtained from this counter is stored in each physical redo log record for an update. We will see the details of how this TSctr is maintained and used later.

Each site maintains its own version of the dirty page table DPT, system log (in memory and on disk), and an ATT (with separate undo and redo log records for each transaction) which stores information relating to transactions that execute at that site.

A single pair of checkpointed images is maintained on disk for the database. A checkpoint image consists of an image of the database, the dirty page table ckpt_dpt, and for every site:

1. end of stable log: the point in the site's system log from which the system log must be scanned during recovery.
2. a copy of the ATT at the site (containing undo logs).

In the next two sections, we present two schemes for shared disk concurrency control and recovery. The first is a page-shipping approach which is similar in spirit to the invalidate-on-update client-server mode. The second is a log shipping scheme which allows concurrent use of non-overlapping regions on a page across sites.
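As a reading aid, the per-site state and the checkpoint image listed in the model above can be written down as plain data structures. This is a minimal sketch, not the layout used by the authors: the class and field names (`RedoRecord`, `Site`, `CheckpointImage`, `new_redo_record`) are hypothetical, the DPT is modelled as a set of page ids rather than a bitmap, and the ATT is reduced to a dictionary from transaction id to its log records.

```python
from dataclasses import dataclass, field


@dataclass
class RedoRecord:
    """Physical redo log record for one update."""
    page: int
    ts: int               # value of the site's TSctr when the record was generated
    payload: bytes = b""


@dataclass
class Site:
    """State that each site keeps for itself in the shared disk model."""
    site_id: int
    ts_ctr: int = 0                                   # global timestamp counter TSctr
    dpt: set = field(default_factory=set)             # dirty page table (page ids)
    att: dict = field(default_factory=dict)           # txn id -> undo/redo records
    system_log: list = field(default_factory=list)    # this site's own system log

    def new_redo_record(self, page, payload=b""):
        """Stamp a redo record with the current TSctr (illustrative helper)."""
        return RedoRecord(page, self.ts_ctr, payload)


@dataclass
class CheckpointImage:
    """One of the two checkpointed images kept on the shared disks."""
    database: dict            # page id -> page contents
    ckpt_dpt: set             # dirty pages recorded with the checkpoint
    end_of_stable_log: dict   # site id -> position from which to scan that site's log
    att_copies: dict          # site id -> copy of that site's ATT (undo logs)
```

The sketch deliberately says nothing about how TSctr advances; the paper defers that to the description of lock acquisition in the page-shipping scheme.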

6 Page-Shipping Shared Disk Recovery Scheme

Our page-shipping scheme is similar in spirit to the Invalidate-on-Update client-server scheme in that a transaction at a site updating a region on a page is guaranteed to have the latest copy of the page. Therefore, concurrent updates to different regions of a page are not possible in this scheme.

6.1 Data Structures

We now describe data structures specific to the page-shipping scheme. Common data structures were described in Section 5. An overview of the data structures for this scheme is given in Figure 3.

Figure 3: Page-Shipping Shared Disk Architecture. (Diagram not reproduced; its labels show, for each of sites 1 to N, the in-memory DB copy, DPT, ATT, PTT and system log tail, and on the shared disks the per-site stable system logs and the checkpoints Ckpt A and Ckpt B with cur_ckpt, the database image, ckpt_ptt, ckpt_dpt, and the per-site end of stable log and ATT (undo logs).)

In addition to the TSctr for the site, a timestamp for each page is maintained at each site in the page timestamp table, PTT, which keeps track of the TSctr value when the page was last updated. Each page has an associated page lock which helps in ensuring that a transaction always has the latest copy of the page while accessing or updating the page. Sites cache locks, and relinquish locks based on the callback locking mechanism described earlier. Along with each of the two checkpoint images of the database is stored a checkpoint page timestamp table, referred to as ckpt_ptt.

6.2 Normal Processing

We describe below the actions taken during normal processing, in addition to those performed in the centralized case, to support distributed concurrency control and recovery. Checkpointing and recovery from system and site failure are described in subsequent subsections.

Update: As in the centralized case, before accessing a region, each transaction obtains a region lock from the LLM. Additional page locks are acquired in S (X) mode while accessing (updating) data on a page. If this lock is not cached at the site, actions are performed as described below under lock acquisition. Page locks for an access are released by a transaction once the access is completed; page locks for an update are released by a transaction only after the update on the page is completed. The value of TSctr at the site when the redo log record was generated is stored in the redo log record corresponding to the update. Also, the timestamp for the updated page (in the PTT) at the site is set to the TSctr stored in the log record.

An important point to note is that log records in the system log may not be ordered on their TSctr values. This is because the value of TSctr is stored in the redo log record when the update is performed, but the log record is appended to the transaction local log, which is not flushed to the system redo log until operation or transaction commit.

Lock Release: When a transaction releases an X mode region lock or operation lock, it stores the end of log in memory with the lock (this is stored to optimize the amount of flushing that needs to be done when a lock is relinquished, as in the client-server scheme). Note that all updates for the operation which held the region lock will be moved to the global log by the normal operation commit semantics prior to the release of this lock. Thus, for a region lock, all redo log records for updates to the region covered by the lock precede the end of log point stored with the lock (similarly for operations). When a site relinquishes an X region lock or operation lock, it flushes the global log at its site until the end of log point stored with the lock. The flush on release of X region or operation locks is done to ensure that it is possible to repeat history during restart recovery, and that appropriate locks for undoing operations are held in case of site crashes. Note that no flushes are performed when page locks are released.

Additionally, when a site releases an X page or X region lock back to the GLM, it stamps it with the site's TSctr; the TSctr value of the lock is used by other sites that later acquire the lock, as we will see shortly. The GLM also stores with each page lock the site that last held the page lock in X mode; the information is updated each time a site relinquishes an X mode page lock.

Lock Acquisition: A transaction acquiring a lock cached by the LLM need take no special action. If it is a page lock, then the page is already current at this site.

When an X-mode page or region lock arrives from the GLM, it includes the timestamp from the last site that held the lock in X mode, as described above. Upon receiving an X region lock or page lock at a site, the site's TSctr is set to the maximum of 1) its current value, and 2) the TSctr value associated with the incoming lock plus one.

When a site acquires a page lock on behalf of a transaction from the GLM (that is, the lock is not already cached at the site), the site requests the page from the last site that held the page lock in X mode (using the site identifier sent with the lock). In order to handle single-site recovery, failure of the acquiring site to obtain a copy of the page, due to a failure of the site from which it is being requested, causes the lock acquisition to fail and the lock to be returned to the GLM unchanged.

Shipping timestamps with page locks ensures that log records for successive updates to a page at different sites are assigned increasing timestamp values. Shipping timestamps with region locks ensures that log records generated under conflicting locks are applied in the correct order during recovery even though redo log records in the individual site may not be ordered by timestamp (as mentioned earlier). However, the algorithm still works correctly, as shown in the discussion of recovery and correctness below.

6.3 Checkpointing

Unlike the centralized and client-server schemes, checkpointing in the shared disk environment requires coordination among the various sites. As mentioned above, a single pair of checkpointed images is maintained for all the sites.

The site initiating the checkpoint coordinates the operation, which consists of the following three steps at each site: 1) writing the database file image, 2) writing the ATT, and 3) flushing the global log. Below, we describe each step:

1. The coordinator announces the beginning of the checkpoint, at which time all sites (including the coordinator) note their current end of stable log values, then make a copy of their DPTs and zero their DPTs. Note that zeroing the DPT and recording end of stable log is done atomically with respect to flushes. Each site then makes a copy of its current PTT and sends it to the coordinator along with the end of stable log (noted above), and a copy of the DPT. The coordinator constructs ckpt_dpt by or'ing together the copy of its DPT and all the DPTs received from other sites (recall that we are assuming the DPT is a bitmap). The database pages to be written out during the checkpoint are the pages that are dirty in ckpt_dpt or in the ckpt_dpt in the previous checkpoint.

For each page to be written out, the coordinator uses the PTTs sent to it by the other sites and its own PTT to determine the site whose PTT contains the highest timestamp for the page. This site is responsible for writing the page to the checkpoint image. Once the coordinator has partitioned the set of pages to be written out among the various sites, each site is sent the set of page identifiers assigned to it. A site, upon receiving its assigned set of pages to write, proceeds to write those pages to the checkpoint image. Since no two sites will be assigned the same page, sites can write pages concurrently.

The coordinator then constructs ckpt_ptt by first reading the ckpt_ptt in the previous checkpoint into memory. For every page that was determined to be written out (by some site i), the timestamp for the page in ckpt_ptt is set to its timestamp in the copy of the PTT for site i. Finally, ckpt_dpt constructed earlier, ckpt_ptt and the end of stable logs for all the sites are written to the checkpoint.

Note that since the site with the highest timestamp for a page writes the page to the checkpoint image, updates to the page by log records preceding end of stable log recorded for a site are contained in the checkpoint. Furthermore, as will be discussed in the correctness section below, updates for a page recorded in log records with timestamps less than the timestamp for the page in ckpt_ptt are also contained in the checkpoint.

2. Once every site has written out the database image and reported this to the coordinator, the coordinator instructs each site to write out its ATT. Note that multiple sites can be concurrently writing out the ATT.

3. After writing out the ATT, each site flushes the global log at that site as in the centralized case. Finally, the database checkpoint is committed after all sites have completed their flushing.

6.4 Recovery

In case the entire system fails, restart recovery is performed by any one site, say j. The site j, which we will call the acting coordinator site, reads the following from the most recent checkpoint image: the database image, the ckpt_ptt, and for each site, the ATT and the end of stable log. A separate page table PTT is initialized to ckpt_ptt and for each site i a separate DPT, dpt_i, is initialized to contain zero bits for all pages.

Starting from the end of stable log point stored for a site in the checkpoint, the log records in all the system logs are merged as described below, and applied to the database. To merge the system logs, they are scanned in parallel; at each point, if the next log record in any of the system logs is not a redo log record, then any one such record is processed and the ATT for its site is modified as described for the centralized case in Section 2.7. On the other hand, if the next records in all the system logs are redo log records, then the log record output next is the one amongst them with the lowest timestamp value. If, for a page updated by the log record, the timestamp in the log record is greater than or equal to the timestamp for the page in ckpt_ptt, then 1) the update is applied to the page, 2) the page is marked dirty in the DPT for the site whose system log contains the record, and 3) the timestamp for the page in PTT is set to the maximum of its current value and the timestamp in the log record.

Note that redo records in the system log for a site may not be in timestamp order, as mentioned earlier. However, this does not cause a problem and conflicting log records are applied in the order in which they were generated. The reason for this is that for two conflicting log records in separate system logs, the earlier log record and log records preceding it in its system log have lower timestamps than the log record generated later. This fact is revisited below in our overview of correctness.

Once the last log record has been processed, TSctr at the acting coordinator site j is set to the largest timestamp contained in the PTT at site j. Site j then rolls back in-progress operations in the ATTs for the various sites beginning with level L0 and then considering successive levels L1, L2 and so on (as described in Section 2.7). When an operation in an ATT entry for a site i is being processed, actions are performed on the undo and redo logs for the entry. Furthermore, each redo log record generated when processing an operation for site i is assigned a timestamp equal to TSctr at site j, and when an operation pre-commits/aborts, log records from the redo log are appended to the system log for site i.

Next, site j flushes every site's system logs, causing appropriate pages in the DPT for the site (maintained at site j) to be marked dirty. After this point, the other sites are involved in recovery. The TSctr at every site is set to the TSctr at site j after incrementing it by one. The DPT at each site is then set to the DPT maintained for the site during recovery at site j, and the database image and PTT at each site are set equal to the database image and PTT at site j. Finally, ckpt_ptt and the DPTs for other sites are deleted from site j, bringing recovery to completion.

6.5 Overview of Correctness

In this section, we present additional arguments about the correctness of our page-shipping recovery scheme by discussing below several properties on which the correctness is based.

1. A page, i, in a checkpoint image reflects all updates with timestamp less than ckpt_ptt[i].
2. Any log record affecting page i prior to end of stable log at any site has timestamp less than or equal to ckpt_ptt[i] and is reflected in the checkpoint image of page i.
3. If L1 and L2 are conflicting log records and L1 is generated before L2, then if L2 is flushed to the stable log, then so is L1.
4. If L1 and L2 are conflicting log records in different system logs and L1 is generated before L2, then L1 and all log records preceding it in its system log have lower timestamps than L2.

(1) follows from the fact that timestamps for pages in the PTT are set only after they are updated, and passing timestamps with page locks guarantees that successive updates to a page have non-decreasing timestamps (and in turn, assign non-decreasing timestamps to the PTT entry).

(2) For a log record that updates page i prior to end of stable log at a site, (a) PTT[i] at the site is greater than or equal to the timestamp in the log record, (b) the page is in the DPT of the site and (c) the page at the site contains the update
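To make the log merge of Section 6.4 concrete, the following sketch replays redo records from several per-site system logs. It is an illustration under stated assumptions rather than the authors' code: the function name `merge_and_redo` and the tuple layout `(timestamp, page, value)` are hypothetical, only redo records are modelled (the ATT processing for other record types is omitted), an update simply overwrites the page value, and ties between equal timestamps are broken by site id, which the paper does not specify.

```python
def merge_and_redo(logs, ckpt_ptt):
    """Replay per-site redo logs in the order described for restart recovery.

    logs     -- {site_id: [(ts, page, value), ...]} in each site's log order
    ckpt_ptt -- {page: timestamp of the page in the checkpoint}
    Returns the recovered pages, the rebuilt PTT, the per-site DPTs and the
    list of applied records.
    """
    ptt = dict(ckpt_ptt)                   # working PTT starts as ckpt_ptt
    dpt = {site: set() for site in logs}   # one DPT per site, initially empty
    pos = {site: 0 for site in logs}       # scan position in each system log
    db = {}
    applied = []

    while True:
        # Look only at the next unprocessed record of every log.
        heads = [(logs[s][pos[s]][0], s) for s in logs if pos[s] < len(logs[s])]
        if not heads:
            break
        _, site = min(heads)               # lowest timestamp among the heads
        ts, page, value = logs[site][pos[site]]
        pos[site] += 1

        # Apply only updates not already reflected in the checkpoint image.
        if ts >= ckpt_ptt.get(page, 0):
            db[page] = value
            dpt[site].add(page)
            ptt[page] = max(ptt.get(page, 0), ts)
            applied.append((site, ts, page))

    return db, ptt, dpt, applied
```

Note that the merge compares only the head record of each log, so a site's log does not have to be sorted by timestamp; this matches the observation above that a site's redo records may appear out of timestamp order while conflicting records are still applied in generation order.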

DataBlitz Main Memory DataBase System

DataBlitz Main Memory DataBase System DataBlitz Main Memory DataBase System What is DataBlitz? DataBlitz is a general purpose Main Memory DataBase System that enables: Ð high-speed access to data Ð concurrent access to shared data Ð data integrity

More information

Oracle Architecture. Overview

Oracle Architecture. Overview Oracle Architecture Overview The Oracle Server Oracle ser ver Instance Architecture Instance SGA Shared pool Database Cache Redo Log Library Cache Data Dictionary Cache DBWR LGWR SMON PMON ARCn RECO CKPT

More information

Recovery: Write-Ahead Logging

Recovery: Write-Ahead Logging Recovery: Write-Ahead Logging EN 600.316/416 Instructor: Randal Burns 4 March 2009 Department of Computer Science, Johns Hopkins University Overview Log-based recovery Undo logging Redo logging Restart

More information

Oracle Database Security and Audit

Oracle Database Security and Audit Copyright 2014, Oracle Database Security and Beyond Checklists Learning objectives Understand data flow through an Oracle database instance Copyright 2014, Why is data flow important? Data is not static

More information

Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications

Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Last Class Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#23: Crash Recovery Part 2 (R&G ch. 18) Write-Ahead Log Checkpoints Logging Schemes

More information

Recovery: An Intro to ARIES Based on SKS 17. Instructor: Randal Burns Lecture for April 1, 2002 Computer Science 600.416 Johns Hopkins University

Recovery: An Intro to ARIES Based on SKS 17. Instructor: Randal Burns Lecture for April 1, 2002 Computer Science 600.416 Johns Hopkins University Recovery: An Intro to ARIES Based on SKS 17 Instructor: Randal Burns Lecture for April 1, 2002 Computer Science 600.416 Johns Hopkins University Log-based recovery Undo logging Redo logging Restart recovery

More information

Recovery and the ACID properties CMPUT 391: Implementing Durability Recovery Manager Atomicity Durability

Recovery and the ACID properties CMPUT 391: Implementing Durability Recovery Manager Atomicity Durability Database Management Systems Winter 2004 CMPUT 391: Implementing Durability Dr. Osmar R. Zaïane University of Alberta Lecture 9 Chapter 25 of Textbook Based on slides by Lewis, Bernstein and Kifer. University

More information

UVA. Failure and Recovery. Failure and inconsistency. - transaction failures - system failures - media failures. Principle of recovery

UVA. Failure and Recovery. Failure and inconsistency. - transaction failures - system failures - media failures. Principle of recovery Failure and Recovery Failure and inconsistency - transaction failures - system failures - media failures Principle of recovery - redundancy - DB can be protected by ensuring that its correct state can

More information

Transaction Management Overview

Transaction Management Overview Transaction Management Overview Chapter 16 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Transactions Concurrent execution of user programs is essential for good DBMS performance. Because

More information

ORACLE INSTANCE ARCHITECTURE

ORACLE INSTANCE ARCHITECTURE ORACLE INSTANCE ARCHITECTURE ORACLE ARCHITECTURE Oracle Database Instance Memory Architecture Process Architecture Application and Networking Architecture 2 INTRODUCTION TO THE ORACLE DATABASE INSTANCE

More information

Crashes and Recovery. Write-ahead logging

Crashes and Recovery. Write-ahead logging Crashes and Recovery Write-ahead logging Announcements Exams back at the end of class Project 2, part 1 grades tags/part1/grades.txt Last time Transactions and distributed transactions The ACID properties

More information

Introduction to Database Management Systems

Introduction to Database Management Systems Database Administration Transaction Processing Why Concurrency Control? Locking Database Recovery Query Optimization DB Administration 1 Transactions Transaction -- A sequence of operations that is regarded

More information

Recovery. P.J. M c.brien. Imperial College London. P.J. M c.brien (Imperial College London) Recovery 1 / 1

Recovery. P.J. M c.brien. Imperial College London. P.J. M c.brien (Imperial College London) Recovery 1 / 1 Recovery P.J. M c.brien Imperial College London P.J. M c.brien (Imperial College London) Recovery 1 / 1 DBMS Architecture REDO and UNDO transaction manager result reject delay scheduler execute begin read

More information

Chapter 10: Distributed DBMS Reliability

Chapter 10: Distributed DBMS Reliability Chapter 10: Distributed DBMS Reliability Definitions and Basic Concepts Local Recovery Management In-place update, out-of-place update Distributed Reliability Protocols Two phase commit protocol Three

More information

Crash Recovery. Chapter 18. Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke

Crash Recovery. Chapter 18. Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke Crash Recovery Chapter 18 Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke Review: The ACID properties A tomicity: All actions in the Xact happen, or none happen. C onsistency: If each Xact

More information

Recovery Principles in MySQL Cluster 5.1

Recovery Principles in MySQL Cluster 5.1 Recovery Principles in MySQL Cluster 5.1 Mikael Ronström Senior Software Architect MySQL AB 1 Outline of Talk Introduction of MySQL Cluster in version 4.1 and 5.0 Discussion of requirements for MySQL Cluster

More information

Failure Recovery Himanshu Gupta CSE 532-Recovery-1

Failure Recovery Himanshu Gupta CSE 532-Recovery-1 Failure Recovery CSE 532-Recovery-1 Data Integrity Protect data from system failures Key Idea: Logs recording change history. Today. Chapter 17. Maintain data integrity, when several queries/modifications

More information

SQL Server Transaction Log from A to Z

SQL Server Transaction Log from A to Z Media Partners SQL Server Transaction Log from A to Z Paweł Potasiński Product Manager Data Insights pawelpo@microsoft.com http://blogs.technet.com/b/sqlblog_pl/ Why About Transaction Log (Again)? http://zine.net.pl/blogs/sqlgeek/archive/2008/07/25/pl-m-j-log-jest-za-du-y.aspx

More information

Exam : 70-458. Transition Your MCTS on SQL Server 2008 to MCSA: SQL Server 2012, Part 2. Title : The safer, easier way to help you pass any IT exams.

Exam : 70-458. Transition Your MCTS on SQL Server 2008 to MCSA: SQL Server 2012, Part 2. Title : The safer, easier way to help you pass any IT exams. Exam : 70-458 Title : Transition Your MCTS on SQL Server 2008 to MCSA: SQL Server 2012, Part 2 Version : DEMO 1 / 7 1.Note: This question is part of a series of questions that use the same set of answer

More information

Recovery System C H A P T E R16. Practice Exercises

Recovery System C H A P T E R16. Practice Exercises C H A P T E R16 Recovery System Practice Exercises 16.1 Explain why log records for transactions on the undo-list must be processed in reverse order, whereas redo is performed in a forward direction. Answer:

More information

Chapter 14: Recovery System

Chapter 14: Recovery System Chapter 14: Recovery System Chapter 14: Recovery System Failure Classification Storage Structure Recovery and Atomicity Log-Based Recovery Remote Backup Systems Failure Classification Transaction failure

More information

The Oracle Universal Server Buffer Manager

The Oracle Universal Server Buffer Manager The Oracle Universal Server Buffer Manager W. Bridge, A. Joshi, M. Keihl, T. Lahiri, J. Loaiza, N. Macnaughton Oracle Corporation, 500 Oracle Parkway, Box 4OP13, Redwood Shores, CA 94065 { wbridge, ajoshi,

More information

Redo Recovery after System Crashes

Redo Recovery after System Crashes Redo Recovery after System Crashes David Lomet Microsoft Corporation One Microsoft Way Redmond, WA 98052 lomet@microsoft.com Mark R. Tuttle Digital Equipment Corporation One Kendall Square Cambridge, MA

More information

Outline. Failure Types

Outline. Failure Types Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 11 1 2 Conclusion Acknowledgements: The slides are provided by Nikolaus Augsten

More information

Datenbanksysteme II: Implementation of Database Systems Recovery Undo / Redo

Datenbanksysteme II: Implementation of Database Systems Recovery Undo / Redo Datenbanksysteme II: Implementation of Database Systems Recovery Undo / Redo Material von Prof. Johann Christoph Freytag Prof. Kai-Uwe Sattler Prof. Alfons Kemper, Dr. Eickler Prof. Hector Garcia-Molina

More information

- An Oracle9i RAC Solution

- An Oracle9i RAC Solution High Availability and Scalability Technologies - An Oracle9i RAC Solution Presented by: Arquimedes Smith Oracle9i RAC Architecture Real Application Cluster (RAC) is a powerful new feature in Oracle9i Database

More information

Recovery Theory. Storage Types. Failure Types. Theory of Recovery. Volatile storage main memory, which does not survive crashes.

Recovery Theory. Storage Types. Failure Types. Theory of Recovery. Volatile storage main memory, which does not survive crashes. Storage Types Recovery Theory Volatile storage main memory, which does not survive crashes. Non-volatile storage tape, disk, which survive crashes. Stable storage information in stable storage is "never"

More information

CSE 444 Midterm Test

CSE 444 Midterm Test CSE 444 Midterm Test Autum 2008 Name: Total time: 50 Question 1 /40 Question 2 /30 Question 3 /30 Total /100 1 1. SQL [40 points] We have a database of documents. Each document consists of several sections,

More information

Transactions and Recovery. Database Systems Lecture 15 Natasha Alechina

Transactions and Recovery. Database Systems Lecture 15 Natasha Alechina Database Systems Lecture 15 Natasha Alechina In This Lecture Transactions Recovery System and Media Failures Concurrency Concurrency problems For more information Connolly and Begg chapter 20 Ullmanand

More information

Review: The ACID properties

Review: The ACID properties Recovery Review: The ACID properties A tomicity: All actions in the Xaction happen, or none happen. C onsistency: If each Xaction is consistent, and the DB starts consistent, it ends up consistent. I solation:

More information

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies

Microkernels & Database OSs. Recovery Management in QuickSilver. DB folks: Stonebraker81. Very different philosophies Microkernels & Database OSs Recovery Management in QuickSilver. Haskin88: Roger Haskin, Yoni Malachi, Wayne Sawdon, Gregory Chan, ACM Trans. On Computer Systems, vol 6, no 1, Feb 1988. Stonebraker81 OS/FS

More information

Best Practices for Using MySQL in the Cloud

Best Practices for Using MySQL in the Cloud Best Practices for Using MySQL in the Cloud Luis Soares, Sr. Software Engineer, MySQL Replication, Oracle Lars Thalmann, Director Replication, Backup, Utilities and Connectors THE FOLLOWING IS INTENDED

More information

Design of Internet Protocols:
