A set of tools for building PostgreSQL distributed databases in biomedical environment

M. Cavalleri, R. Prudentino, U. Pozzoli, G. Reni
IRCCS E. Medea, Bosisio Parini (LC), Italy. E-mail: greni@bp.lnf.it

Abstract

PostgreSQL is an advanced Object-Relational DBMS supporting SQL constructs and accessible with standard protocols. Its object-oriented features qualify it for managing biomedical data and it is freely available to the community via the open-source philosophy. Unfortunately the currently available version does not support any database distribution feature. The aim of the present work was to develop specific procedures in order to extend the original capabilities of the PostgreSQL ORDBMS and make it able to manage distributed databases by means of asynchronous replication. Particular attention was devoted to conflict resolution rules, including several procedures with different degrees of restrictiveness and even giving the end user the possibility to write user-defined conflict resolution procedures. The replication system was tested at the IRCCS E. Medea, an Italian scientific and clinical research institute spread over 5 sites in different geographical locations. The testing system was developed with the purpose of making clinical data collected by the Epileptology Units at each site available to all branches, without having to care about their physical distribution. The proposed replica procedures manage only traditional (non-binary) data types because of the very different storage model of large objects. Since large objects are very common for treatment and storage of biomedical data, we are currently working on an improved version of the replica engine that allows large objects to be replicated.

Key words: distributed database, ORDBMS, biomedical data

I. INTRODUCTION

In recent years the evolution of data communication has rapidly and substantially changed the way information is managed. The growth of low-price connectivity and the improvement of data storage technology have led people to ask for ever more information to be accessible at all times and from everywhere. New technologies were developed to share data scattered on the net (e.g. MIDAS, CORBA), but sometimes data aggregation is also needed. For example, if we want to get a unique set of data from databases scattered on different sites, these new technologies require a data distribution mechanism.

PostgreSQL [1,2] is an advanced Object-Relational DBMS supporting SQL constructs and accessible with standard protocols. Its object-oriented features qualify it for managing biomedical data [3] more than other RDBMS and it is freely available to the community via the open-source philosophy. Unfortunately, the current version does not support any database distribution feature.

Data replication is a process that makes it possible to build a distributed database through the management of multiple copies of data, caching one copy on each site [4] (Fig. 1). In particular, synchronous replication (also called real-time data replication) conveys information in real time to all of the involved sites. In contrast, asynchronous replication (store and forward replication) stores operations performed on a database in a local queue for later distribution by a database synchronization process. Synchronous replication technology ensures the highest level of data integrity but requires permanent availability of servers and transmission bandwidth. Asynchronous replication, on the other hand, provides more flexibility: the database synchronization time interval can be defined and can vary from minutes to months and, moreover, a single site can keep working even if a remote server is unreachable or down. In addition, data operations are performed more quickly and network traffic is more compact. However, asynchronous replication requires complex replication planning in order to detect and correct data conflicts due to concurrent modifications occurring at different sites between two database synchronization events.

Figure 1: Working scheme of data replication. In this example three sites are considered. (Legend: local operations; forwarded operations.)

The aim of the present work was to develop specific procedures in order to extend the original capabilities of the PostgreSQL ORDBMS and make it able to manage distributed databases by means of asynchronous replication. Because of the nature of biomedical data, particular attention was devoted to conflict resolution rules, including several procedures with different degrees of restrictiveness and even giving the end user the possibility to write user-defined conflict resolution procedures.

In the following sections the main issues regarding the development and abilities of the new replication procedures are addressed. First, we describe the mechanism built to uniquely identify a record within the distributed database. Second, we analyze the logger that allows transactions to be applied forward and rolled back. Third, we examine the module that joins the transactions coming from remote sites, preserving data integrity in line with the specified conflict resolution rules. Finally, we provide some implementation details for each of the previous steps.

II. IDENTIFICATION OF DISTRIBUTED DATA

Any table in a distributed database can be represented in the following form:

$$T_{a,i} = P_{a,i} \cup \bigcup_{j = 1 \ldots N,\; j \neq i} R_{a,j,i} \qquad (1)$$

where $a$ identifies a generic table of the database, $i$ identifies the site (site-id), $T_{a,i}$ is the entire content of table $a$ on site $i$, $P_{a,i}$ is the information entered in table $a$ on site $i$, $N$ is the number of sites, and $R_{a,j,i}$ is the replicated partition inside table $a$ coming from site $j$ and cached at site $i$. Each site is identified with a site-id. $P_{a,i}$ and $R_{a,j,i}$ are partitions of table $T_{a,i}$.

When an insertion occurs in table $a$ on local site $i$, the information is stored in the local partition $P_{a,i}$ and later forwarded to the replicated partition $R_{a,i,j}$ of each remote site $j$. A mechanism for the unique identification of a record in either the local ($P_{a,i}$) or a remote ($R_{a,j,i}$) partition is required. The record-id number is site-dependent: it is unique for each record stored in a local database but is not unique in a distributed database. For each record we therefore keep track of the insertion site-id (master site) together with the record-id number assigned at the master site. Both local and replicated copies of the record contain this information, which is combined with a map of the distributed tables to ensure unique identification of each record. In this way we can always know the origin (master site) of a record and we can represent each table as logically split into partitions, as in (1).

Taking advantage of this method we can freely adopt any of the most common replication data ownership models and, if desired, apply the selected model to a reduced set of tables. In particular, the Workload Partitioning data ownership model [5] is implemented by read-only access to $R_{a,j,i}$ and read-write access to $P_{a,i}$ (Fig. 2). The Master/Slave data ownership model [6] is obtained by read-write access to $R_{a,j,i}$ and $P_{a,i}$ when the site is Master for table $a$, and by read-only access to $R_{a,j,i}$ and $P_{a,i}$ when the site is Slave. The Update Anywhere data ownership model [7] is obtained by read-write access to $P_{a,i}$ and $R_{a,j,i}$.

Figure 2: Workload partition data ownership model obtained by allowing users to modify local partitions only. (The figure shows Site 1, Site 2 and Site 3, each holding its local partition $P_{a,i}$ and the replicated partitions $R_{a,j,i}$ of the other two sites. Legend: local users operations; forwarded data streams.)

III. LOGGER FOR FORWARD AND ROLL BACK TRANSACTIONS

Let us consider $T_{a,i}(t_0)$, a generic table of the database at a given initial time $t_0$. At any following time $t$, the same table $T_{a,i}(t)$ will be a function of the initial status $T_{a,i}(t_0)$ and of the sequence of operations applied to it in the interval $[t_0, t]$. We store each operation applied to table $T_{a,i}$ as a record in table $C_{a,i}$, thus obtaining

$$T_{a,i}(t) = f\left[T_{a,i}(t_0),\; C_{a,i}(t)\right] \qquad (2)$$

where $f()$ is the function that applies the sequence of operations $C_{a,i}(t)$ to table $T_{a,i}(t_0)$. It is also possible to define a roll back function $f^{-1}()$, which applies the sequence of operations $C_{a,i}(t)$ to table $T_{a,i}(t)$ in reverse, obtaining

$$T_{a,i}(t_0) = f^{-1}\left[T_{a,i}(t),\; C_{a,i}(t)\right] \qquad (3)$$

As table $a$ is replicated at several sites, we have a different $C_{a,i}$ at each site as well as a different $T_{a,i}(t)$. The actual and aligned replicated table $T_a(t)$, containing every operation performed at both local and remote sites, is obtained by considering the changes $C_{a,j}(t)$ that occurred at each site $j$ and applying them to the table $T_{a,i}(t_0)$ at a generic site $i$:

$$T_a(t) = f\left[T_{a,i}(t_0),\; \bigcup_{j = 1 \ldots N} C_{a,j}(t)\right] \qquad (4)$$

IV. REPLICATION CONFLICTS MANAGEMENT

An asynchronous replication system allowing data updates at each site (Update Anywhere data ownership model) may cause replication conflicts between the various $C_{a,j}(t)$. A uniqueness conflict occurs when different sites try to insert different records with the same unique key. An update conflict occurs when different transactions on different sites refer to the same record and the requested operations are inconsistent. The number of conflicts increases with the number of sites $N$ and with the time interval between two database synchronization events.

We implemented a conflict resolution module that acts as a filter on $\bigcup_{j = 1 \ldots N} C_{a,j}(t)$, forwarding only operations that can be completed according to the resolution algorithms defined for each table. This conflict management scheme allows resolution algorithms to be applied serially, making it possible, for example, to check some sort of data consistency based on the priority of medical group privileges after preliminary conflict resolution.

A set of record-based conflict resolution algorithms was also developed in order to reduce the number of operations rejected because of update conflicts. These algorithms are based on the following two observations: (a) it is always possible to identify an intermediate period $[t_0, t_c]$ without any update conflict, with $t_0 \le t_c \le t$; (b) it is usually possible to modify the sequence of operations on different records while preserving data integrity and shifting $t_c$ as close as possible to $t$.

V. IMPLEMENTATION DETAILS

In our system replica actions are controlled by a specific database administrator username (replicator). An automatic mechanism checks this username in order to prevent unwanted operations from being recorded in $C_{a,i}(t)$ during the replica process.
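The logger of Section III and the replicator check described above can be sketched in a few lines of Python. This is only an illustrative model, not the PL/Tcl trigger code of the actual system: the names `Operation`, `log_operation`, `apply_forward` and `roll_back` are hypothetical, tables are modeled as in-memory dictionaries keyed by (master site-id, record-id), and we assume that each logged operation stores both the previous and the new version of the record so that it can be undone.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Assumption: a record is identified by (master site-id, record-id at master site).
RecordKey = Tuple[int, int]
Row = Dict[str, Any]
Table = Dict[RecordKey, Row]

REPLICATOR = "replicator"  # username reserved for replica actions


@dataclass(frozen=True)
class Operation:
    """One entry of the change log C (hypothetical layout)."""
    kind: str                 # "insert", "update" or "delete"
    key: RecordKey
    before: Optional[Row]     # record before the operation (None for insert)
    after: Optional[Row]      # record after the operation (None for delete)
    user: str
    timestamp: float


def log_operation(log: List[Operation], op: Operation) -> bool:
    """Append op to the log unless it was issued by the replicator user."""
    if op.user == REPLICATOR:
        return False          # replica actions must not be logged again
    log.append(op)
    return True


def apply_forward(table: Table, log: List[Operation]) -> Table:
    """f: replay the logged operations, in order, on a copy of the table."""
    result = {key: dict(row) for key, row in table.items()}
    for op in log:
        if op.kind == "delete":
            result.pop(op.key, None)
        else:                 # insert or update: store the new version
            result[op.key] = dict(op.after)
    return result


def roll_back(table: Table, log: List[Operation]) -> Table:
    """f^-1: undo the logged operations, in reverse order, on a copy."""
    result = {key: dict(row) for key, row in table.items()}
    for op in reversed(log):
        if op.kind == "insert":
            result.pop(op.key, None)
        else:                 # update or delete: restore the old version
            result[op.key] = dict(op.before)
    return result
```

Replaying a log on the initial table and then rolling the same log back returns the initial table, which is the property expressed by equations (2) and (3).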
A complete map of all the tables making up the database and the rules for their replication is stored in a set of tables, called Replica Scheme Tables (RST), specifically designed to provide a flexible definition of the distribution of the tables over the sites. For example, tables can be independently replicated in different subsets of the distributed database. Modifications to data stored in the RST fire triggers that enable or withdraw a generic table from replication, dynamically creating or destroying the triggers and auxiliary tables that log applied operations, which greatly reduces the DBA tasks. In particular, each replicated table $T_{a,i}$ has its own auxiliary table $C_{a,i}$ in which sequences of applied operations are stored. The object-oriented features of PostgreSQL make it possible to inherit each auxiliary table from its original one, thus keeping the same record structure. Additional information such as timestamp, type of operation (insert, update, delete) and user name is recorded too, to allow the construction of advanced security algorithms for data replication, visibility and treatment.

Synchronizing a replicated database requires extensive planning and coordination of the various steps, because several distributed and parallel processes run at different sites. The DBA plans the scheduling of the replication process, which can be started from any site at any time. The site where the replication process starts is defined as the replica-master site, which coordinates each step during that particular replication. A replica daemon running on each site answers requests coming from the replica-master site. During database synchronization (Fig. 3), the generic site $i$ creates a set of data $C_i(t)$ that contains the operations applied inside the site to any replicated table since the last synchronization event. This set of data is transmitted to every destination site and filtered by the conflict resolution module.
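The filtering step can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the algorithms shipped with the system: the names `Change`, `filter_conflicts`, `reject_all` and `earliest_wins` are hypothetical, each table is assumed to have at most one unique column, and any two updates or deletes of the same record coming from different sites are conservatively treated as an update conflict.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

RecordKey = Tuple[int, int]   # (master site-id, record-id at master site)


@dataclass(frozen=True)
class Change:
    """One logged operation received from a site (hypothetical layout)."""
    site: int                     # site where the operation was applied
    kind: str                     # "insert", "update" or "delete"
    key: RecordKey
    unique_value: Optional[str]   # value of the unique column, if any
    timestamp: float


# A resolver receives the conflicting changes and returns those to keep.
Resolver = Callable[[List[Change]], List[Change]]


def reject_all(conflicting: List[Change]) -> List[Change]:
    """Most restrictive rule: drop every operation involved in a conflict."""
    return []


def earliest_wins(conflicting: List[Change]) -> List[Change]:
    """Less restrictive rule: keep only the oldest conflicting operation."""
    return [min(conflicting, key=lambda c: c.timestamp)]


def filter_conflicts(
    streams: Dict[int, List[Change]], resolve: Resolver
) -> Tuple[List[Change], List[Change]]:
    """Split the union of the per-site streams into accepted and rejected."""
    changes = [c for stream in streams.values() for c in stream]

    by_unique = defaultdict(list)   # unique value -> inserts using it
    by_record = defaultdict(list)   # record key -> updates and deletes
    for c in changes:
        if c.kind == "insert" and c.unique_value is not None:
            by_unique[c.unique_value].append(c)
        elif c.kind in ("update", "delete"):
            by_record[c.key].append(c)

    rejected = set()
    for group in list(by_unique.values()) + list(by_record.values()):
        # A conflict exists only if more than one site is involved.
        if len({c.site for c in group}) > 1:
            kept = resolve(group)
            rejected.update(c for c in group if c not in kept)

    accepted = sorted(
        (c for c in changes if c not in rejected), key=lambda c: c.timestamp
    )
    return accepted, sorted(rejected, key=lambda c: c.timestamp)
```

Passing the resolver as a function mirrors the idea that a different resolution rule, predefined or user-written, can be selected for each table; the rejected list corresponds to the operations that would have to be rolled back.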
The conflict resolution module contains several predefined algorithms that manage common conflict situations, whose probability is related to the number of sites, the synchronization interval, the type of application running on the database, and the replication data ownership model chosen. Custom procedures that improve the flexibility of the replica process can also be written and added to the module. The DBA chooses the conflict resolution algorithm to be used with each table and stores this information in the RST. The DBA can also specify whether operations rolled back by conflict resolution algorithms have to be signaled to the end user by e-mail.

Special attention is devoted to data propagation planning, and data are compressed during transmission in order to minimize bandwidth requirements and speed up the process. Data encryption is also available to ensure data privacy.

Trigger procedures were written in the procedural language PL/Tcl. The daemon was written in Tcl [8], linked with the PostgreSQL connectivity API and using the Tcl/dp libraries for the communication layer over TCP/IP sockets. The replication system was developed on a PC with an Intel Pentium III CPU running Linux Red Hat 6.0 with kernel 2.2.5. The tests were performed on SUN SparcStation 20 and SUN Ultra 1 machines running Unix Solaris 2.x, and on PCs with Intel Pentium II and III CPUs running Linux.

Figure 3: Replica process schema. User operations on a shared table are captured and stored into auxiliary tables. During the database synchronization phase local streams are propagated to the involved sites and locally filtered by Conflict Resolution; the Transaction Engine applies accepted operations and rolls back rejected ones. (Components shown: tables, Capture Trigger, auxiliary change tables, Conflict Resolution, Transaction Engine, Data Propagator. Legend: user originated operations; replicator originated operations; captured user operations; auxiliary change table reading; auxiliary change table updating; data propagated towards other sites; data coming from other sites.)

VI. CASE STUDY

The replication system was developed and tested at IRCCS E. Medea. The IRCCS E. Medea is an Italian scientific and clinical research institute spread over 5 sites in different geographical locations, connected through a private network. The testing system was developed with the purpose of making clinical data collected by the Epileptology Units at each site available to all branches, without having to care about their physical distribution. A Workload Partitioning replication data ownership model was chosen because the Health and Management Direction asked that remote departments be restricted to read-only access. The application was tested successfully.

VII. CONCLUSION

In the present study we developed a set of tools to extend the capabilities of PostgreSQL in order to make it able to manage database replication across different sites. The new set of tools implements asynchronous data replication over several sites connected by a wide area network. Having more than one copy of the same database increases the availability of the system, ensuring data access or data backup even if some servers are down or not reachable. Better performance is also obtained by working on local data. One of the drawbacks is that local operations performed at different sites between two database synchronization events lead to temporary data inconsistency over the whole system. The system was tested in a clinical research institute and gave encouraging results.
The proposed replica procedures manage only traditional (non-binary) data types because of the very different storage model of large objects. Since large objects are very common for treatment and storage of biomedical data, we are currently working on an improved version of the replica engine that allows large objects to be replicated.

VIII. ACKNOWLEDGMENTS

Replication procedures were built using many free software tools. The authors wish to thank the PostgreSQL development team for making available the source code which was the starting point of this work. We would also like to thank the developers of the Tcl language and the developers of the extensions reported in the previous sections.

IX. REFERENCES

[1] Stonebraker M., The design of the POSTGRES storage system, Proceedings of the Thirteenth International Conference on Very Large Data Bases, Sept. 1987; 289-300.
[2] Stonebraker M., Kemnitz G., The POSTGRES next-generation database management system, Communications of the ACM, Oct. 1991, vol. 34, (no. 10): 78-92.
[3] Diallo B., Travere J.-M., Mazoyer B., A Review of Database Management Systems Suitable for Neuroimaging, Methods of Information in Medicine, F. K. Schattauer, Feb. 1999.
[4] Chen S. W., Pu C., A structural classification of integrated replica control mechanism, Technical Report CUCS-006-92, Columbia Univ., New York, NY, 1992.
[5] Enterprise Replication: A high-performance solution for distributing and sharing information, INFORMIX Software Inc.
[6] Comparing Replication Technologies, PEERDIRECT Inc.
[7] Oracle8 Server Concepts: Database Replication, Oracle Corporation.
[8] Ousterhout J., Tcl and the Tk Toolkit, Addison-Wesley, 1994.