Normalizing Incomplete Databases




Leonid Libkin
AT&T Bell Laboratories
600 Mountain Avenue, Murray Hill, NJ 07974 USA
E-mail: libkin@research.att.com

Abstract

Databases are often incomplete because of the presence of disjunctive information, due to conflicts, partial knowledge and other reasons. Queries against such databases often ask questions about various possibilities encoded by the stored data, rather than the stored data itself. Normalization, which is a mechanism for asking such queries, was presented in [LW93a]; however, it had exponential space complexity.

The main goal of this paper is to develop a general theory of answering queries against incomplete databases with disjunctive information, and use it to design practical algorithms for query evaluation. We define the semantics of such databases and prove normalization theorems for set- and bag-based complex objects. These theorems provide us with programming primitives that one needs in order to obtain the list of all possibilities encoded by a complex object with disjunctions.

We study two ways of making query evaluation faster and more space efficient. Partial normalization allows us to disregard some of the disjunctions if they do not affect a given query. We also design a new normalization algorithm that produces objects represented by an incomplete database one-by-one, rather than all at once. It has linear space complexity and allows us to speed up many classes of queries.

Algorithms presented in this paper have been implemented in existing DBPL. We present experimental results that demonstrate substantial improvement over standard algorithms, both in space and time.

1 Introduction

Information stored in databases is usually incomplete. One of the typical sources of partiality, along with null values [AKG91, IL84], is disjunctive information that occurs primarily in the areas of design and planning, as was noticed in [INV91a, INV91b]. It may also arise due to conflicts that occur when different databases are merged.

A number of approaches to querying databases with disjunctions are known in the literature. The idea of using and-or trees to develop a new object-oriented data model with an ad hoc query facility was exploited in [INV91a, INV91b]. The query complexity in this model was analyzed in [IMV89]. Recently, a functional query language for databases with disjunctions was designed [LW93a] and implemented [GL94]. In these papers two kinds of queries have been distinguished: structural queries ask questions about the data stored in a database, whereas conceptual queries ask questions about the data encoded by the information in a database. To illustrate the difference between the structural and conceptual queries, consider the following example of an incomplete design borrowed from [GL94], see figure 1.

[Figure 1: Incomplete design. And-or tree with root DESIGN and nodes A, A1, A2, A1.1, A1.2, A2.1, A2.2, A2.3, B, B1, B2.]

In this figure vertical and horizontal lines represent subparts that must be included in the design, while the sloping lines represent possible choices. For example, the whole design consists of two parts: A and B. An A is either an A1 or an A2, and a B consists of a B1 and a B2, where a B1 is either a w or a k. Structural queries ask about the structure of a given object. For example, "what is the least expensive choice for B2" and "how many subparts does A2 have" are examples of structural queries. Conceptual queries ask questions about possible completed designs. For example, "how many completed designs are there" and "is there a completed design that costs under \$100 and has reliability at least 95%" are examples of conceptual queries.

To distinguish ordinary sets from collections of disjunctive possibilities, we call the latter or-sets, see [INV91a, LW93a, Rou91]. We use $\langle\ \rangle$ to denote or-sets. In the example in figure 1, the whole design can be represented as a set $\{A, B\}$, while $A$ is an or-set $\langle A1, A2 \rangle$ and $B2$ is an or-set $\langle w, k \rangle$. Note that or-sets have two distinct representations. With respect to structural queries, or-sets behave like sets, but with respect to conceptual queries, an or-set denotes one of its elements. For example, $\langle 1, 2 \rangle$ is structurally a two-element set, but conceptually it is an integer that equals either 1 or 2.

A mechanism for answering conceptual queries against complex objects with or-sets, called normalization, was presented in [LW93a]. Roughly speaking, it provides us with a small number of programming primitives that, when repeatedly applied to an object $o$, create an or-set that lists all possibilities encoded by $o$ (like completed designs). This or-set is called the normal form of $o$. Then conceptual queries are simply structural queries on normal forms.

Normalization, as presented in [LW93a], provides the solid theoretical foundation for developing languages in which conceptual queries can be formulated. It also has led to development of a prototype [GL94]. However, there are several theoretical problems that must be addressed in order to develop practical methods for answering conceptual queries.
- Only sets have been considered in [INV91a, INV91b, LW93a, Rou91], but many practical languages are based on bags (multisets). In the past few years several approaches to design of bag languages have been proposed. Moreover, most approaches agree on what constitutes the basic set of bag operations [Alb91, GM93, LW93b, LW94]. Thus, we believe the normalization mechanism must be extended to bags.

- Normalization may cause exponential blowup in the size of objects. For objects of size $n$, the size of their normal forms is bounded (roughly) by $n \cdot 1.45^{n}$ [LW93a]. Therefore, we need better normalization tools. One possibility is to normalize partially. If some of the disjunctions do not affect the conceptual query that is asked, there is no need to unfold those disjunctions. The problem of partial normalization has not been addressed in the literature.

- Normalization, as presented in [LW93a], requires that the whole normal form be created before any conceptual queries could be asked. Therefore, it has exponential space complexity. Alternatively, one may want to produce normal form elements (e.g. completed designs) one-by-one, rather than all at once, thus making the space usage linear.

The main goal of the paper is to address these shortcomings of the normalization process. As the outcome, we shall have much better tools for querying databases with disjunctive information and much better understanding of their structure. The main contributions of this paper are listed below.

1. We rigorously define normal forms (or conceptual semantics) of objects with or-sets and prove normalization theorems giving us a small number of operations that construct normal forms. We do this for both set and bag semantics.

2. We prove a partial normalization result that tells us when the normalization process need not be completed in order to answer a conceptual query. We give a restriction on types of objects for which this can be done.

3. We design a linear space algorithm that produces all elements in the normal form, and suggest a new programming primitive based on it. This primitive allows us to express a number of important queries (including a class of existential conceptual queries) in a uniform fashion.
4. We consider interaction of disjunctive information with traditional forms of partial information, represented via orders on objects, and prove both normalization and partial normalization theorems in this setting.

5. We implement the new space-efficient algorithm in the system for querying databases with disjunctions [GL94]. We compare it with the standard algorithm and demonstrate substantial improvement. We show how the new programming primitive can be used together with some heuristics to answer conceptual queries approximately, when normalization process is very expensive.

Organization. We define structural semantics and normal forms in section 2. Normalization theorems for sets and bags and partial normalization theorem are proved in section 3. The space efficient normalization algorithm and a programming primitive based on it are presented in section 4. Normalization in the presence of partial information is studied in section 5. Experimental results are presented in section 6.

Remark. Our approach to disjunctive information as a form of partial information should not be confused with the work on disjunctive deductive databases [LMR92]. For differences between these approaches, see [INV91a, INV91b].

2 Semantics and normal forms

As we mentioned before, objects with or-sets can be treated at the structural and conceptual levels. Consequently, there are two different semantics for or-objects. One of them treats or-sets as collections, while the other takes into account that an or-set denotes one of its elements.

To state this precisely, we first define types of objects. There are two type systems of interest: one dealing with sets and the other with multisets (bags):

(ST)  $t := b \mid t \times t \mid \{t\} \mid \langle t \rangle$
(BT)  $s := b \mid s \times s \mid \{|s|\} \mid \langle s \rangle$

Here $b$ ranges over a collection of base types such as integers, booleans etc. $t \times t'$ is the product type; its elements are pairs $(x, y)$ where $x$ has type $t$ and $y$ has type $t'$. Values of the set type $\{t\}$ are finite sets of elements of type $t$. Values of $\{|t|\}$ and $\langle t \rangle$ are finite bags and or-sets of values of type $t$ respectively. If $P_{fin}(X)$ stands for the finite powerset of $X$ and $P_{b}(X)$ for the family of finite bags over $X$, then, assuming that a domain $D_b$ of each base type is given, we define the structural semantics of types as follows:

$[[b]]_s = D_b$
$[[t \times t']]_s = [[t]]_s \times [[t']]_s$
$[[\{t\}]]_s = [[\langle t \rangle]]_s = P_{fin}([[t]]_s)$
$[[\{|t|\}]]_s = P_{b}([[t]]_s)$

An object whose type is in the type system (ST) is called a set-based complex object. An object whose type is in (BT) is called a bag-based complex object. Any object containing or-sets is also called an or-object.
We need two translations between (ST) and (BT) and between set-based and bag-based objects. First, for any type $t$ in (ST), we define $t^{Bag}$ in (BT) by replacing all set brackets by bag brackets. Type $s^{Set}$ is defined as $s$ in which all bag brackets are replaced by set brackets. For any object $x$ of an (ST) type $t$, define $x^{Bag}$ of type $t^{Bag}$ by replacing each set in $x$ by a bag with the same elements and all multiplicities equal 1. For example, $(\{1,2\}, \{3,4\})^{Bag} = (\{|1,2|\}, \{|3,4|\})$. Conversely, for $Y$ of a (BT) type $s$, $Y^{Set}$ of type $s^{Set}$ is defined by replacing each bag in $Y$ with the set containing all elements of that bag (i.e. duplicates are eliminated). For example, $\{|\{|1,1,2|\}, \{|1,2,2|\}|\}^{Set} = \{\{1,2\}\}$.

It should be noted that $(t^{Bag})^{Set} = t$ for any (ST) type $t$, and $(t^{Set})^{Bag} = t$ for any (BT) type $t$. However, while $(x^{Bag})^{Set} = x$ for any set-based object $x$, it is not necessarily the case that $(Y^{Set})^{Bag} = Y$ for a bag-based object $Y$.

Before we define the conceptual semantics, which will be called normal form, we need the notion of the skeleton of a type. The skeleton $sk(t)$ of a type $t$ is defined to be the type formed by removing all or-set brackets from $t$. That is, $sk(b) = b$, $sk(t \times t') = sk(t) \times sk(t')$, $sk(\{t\}) = \{sk(t)\}$, $sk(\{|t|\}) = \{|sk(t)|\}$ and $sk(\langle t \rangle) = sk(t)$.

Next, we define a binary relation $x \lessdot y$ among objects whose meaning intuitively is "$x$ is in the conceptual representation of $y$". (For example, $d \lessdot DESIGN$ iff $d$ is a completed design.)

- For any $x, x'$ of a base type, $x' \lessdot x$ iff $x = x'$.
- $(x', y') \lessdot (x, y)$ iff $x' \lessdot x$ and $y' \lessdot y$.
- $\{|x'_1, \ldots, x'_n|\} \lessdot \{|x_1, \ldots, x_n|\}$ iff there exists a permutation $\pi$ on $\{1, \ldots, n\}$ such that $x'_i \lessdot x_{\pi(i)}$ for all $i = 1, \ldots, n$.
- $\{x'_1, \ldots, x'_n\} \lessdot \{x_1, \ldots, x_k\}$ iff there exists a partition $X_1, \ldots, X_n$ of $\{x_1, \ldots, x_k\}$ such that for any $i = 1, \ldots, n$ and for any $x \in X_i$: $x'_i \lessdot x$.
- $x \lessdot \langle x_1, \ldots, x_k \rangle$ iff $x \lessdot x_i$ for some $x_i$. (Recall that an or-set denotes one of its elements.)

Note that in the set clause it is not enough to ask for a permutation $\pi$ of elements $\{x_1, \ldots, x_n\}$ that would satisfy $x'_i \lessdot x_{\pi(i)}$ because some of those $x'_i$ may then be the same and $\{x'_1, \ldots, x'_n\}$ would not be a set. Hence, we need partitions.

Definition. For any object $X$, its normal form $nf(X)$ is defined as the or-set $\langle x_1, \ldots, x_n \rangle$ of all objects $x_i$ such that $x_i \lessdot X$. Note that the normal form is always finite.
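The translations between set-based and bag-based objects, and the skeleton of a type, can be illustrated with a minimal Python sketch. The encoding is an assumption made for this illustration only and is not part of the paper: sets are `frozenset` values, bags are instances of a hypothetical `Bag` class that keeps its elements in a canonical order, pairs are Python tuples, and types are nested tuples such as `("or", "int")`. Or-sets inside objects are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bag:
    """Hypothetical bag encoding: elements in canonical order, duplicates kept."""
    items: tuple


def bag(*xs):
    # Sort by repr so that two bags with the same elements compare equal.
    return Bag(tuple(sorted(xs, key=repr)))


def to_set(y):
    """Y^Set: replace every bag by the set of its elements (duplicates vanish)."""
    if isinstance(y, Bag):
        return frozenset(to_set(e) for e in y.items)
    if isinstance(y, tuple):
        return tuple(to_set(e) for e in y)
    return y


def to_bag(x):
    """x^Bag: replace every set by a bag in which all multiplicities are 1."""
    if isinstance(x, frozenset):
        return bag(*(to_bag(e) for e in x))
    if isinstance(x, tuple):
        return tuple(to_bag(e) for e in x)
    return x


def sk(t):
    """Skeleton of a type: drop every or-set constructor.

    Types are base-type names or tuples ("prod", s, t), ("set", t),
    ("bag", t), ("or", t).
    """
    if isinstance(t, str):
        return t
    if t[0] == "or":
        return sk(t[1])
    if t[0] == "prod":
        return ("prod", sk(t[1]), sk(t[2]))
    return (t[0], sk(t[1]))


# The bag-of-bags example from the text collapses to {{1,2}} ...
Y = bag(bag(1, 1, 2), bag(1, 2, 2))
Y_set = to_set(Y)
# ... and translating back does not recover Y.
Y_round_trip = to_bag(Y_set)
```

Under this encoding the round trip through sets loses the multiplicities of `Y`, which is the asymmetry noted in the text, while the round trip through bags of a set-based object is lossless.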

Lemma 1 If $X$ is of type $t$, then any $x \lessdot X$ is of type $sk(t)$. In particular, for any or-object $x$ of type $t$, its normal form $nf(x)$ is of type $\langle sk(t) \rangle$. □

In other words, the normal form of an object lists all possibilities that are encoded by the disjunctions present in that object. Each normal form entry is a regular complex object, i.e. does not have any or-sets.

3 Normalization theorems

The general idea of the normalization theorems is to give a list of operations that can be repeatedly applied to an object until the normal form is produced. Such a list was first presented in [LW93a]; here we go further in several aspects. First, we clearly distinguish between set and bag semantics. Second, we prove a partial normalization result that can be viewed as normalization at intermediate types. That is, while the standard normalization theorems find a unique representation of an object of type $t$ at type $\langle sk(t) \rangle$, the partial normalization result finds such a representation at type $s$ where $s$ is "between" $t$ and $\langle sk(t) \rangle$. To guarantee uniqueness, some restrictions on types must be imposed.

We need a language to express the operations used for normalizing objects. We adopt the framework of [LW93a] which in turn is based on [BBW92] and finds its origins in [AB88, BBN91]. The operators together with their most general types are given in figure 2.

Recall briefly the semantics of the general and set operators. $f \circ g$ is composition of functions; $(f, g)$ is pair formation. $\pi_1$ and $\pi_2$ are the first and the second projections. $!$ always returns the unique element of a special base type $unit$. $eq$ is equality test; $id$ is the identity and $cond$ is conditional. For set operations: $K\{\}$ is the function that represents the constant $\{\}$; $\eta$ forms singletons: $\eta(x) = \{x\}$; $\cup$ takes union of two sets; $\mu$ flattens sets of sets: $\mu(\{\{1,2\}, \{2,3\}\}) = \{1,2,3\}$; $map(f)$ applies $f$ to all elements of a set; and $\rho_2$ is pair-with: $\rho_2(1, \{2,3\}) = \{(1,2), (1,3)\}$.

Operators on or-sets are exactly the same as operators on sets except that the prefix $or$ is added. Operators on bags are similar to those on sets, but additive union $\uplus$ that adds up multiplicities is used. Also, flattening for bags is additive: $b\_\mu(\{|B_1, \ldots, B_n|\}) = B_1 \uplus \ldots \uplus B_n$.
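General operators
  if $g : u \rightarrow s$ and $f : s \rightarrow t$ then $f \circ g : u \rightarrow t$
  if $f : u \rightarrow s$ and $g : u \rightarrow t$ then $(f, g) : u \rightarrow s \times t$
  $\pi_1 : s \times t \rightarrow s$    $\pi_2 : s \times t \rightarrow t$    $! : t \rightarrow unit$    $eq : t \times t \rightarrow bool$    $id : t \rightarrow t$
  if $c : bool$, $f : s \rightarrow t$ and $g : s \rightarrow t$ then $cond(c, f, g) : s \rightarrow t$

Operators on sets
  $K\{\} : unit \rightarrow \{t\}$    $\rho_2 : s \times \{t\} \rightarrow \{s \times t\}$
  $\cup : \{t\} \times \{t\} \rightarrow \{t\}$    $\eta : t \rightarrow \{t\}$
  if $f : s \rightarrow t$ then $map\,f : \{s\} \rightarrow \{t\}$    $\mu : \{\{t\}\} \rightarrow \{t\}$

Operators on bags
  $K\{||\} : unit \rightarrow \{|t|\}$    $b\_\rho_2 : s \times \{|t|\} \rightarrow \{|s \times t|\}$
  $\uplus : \{|t|\} \times \{|t|\} \rightarrow \{|t|\}$    $b\_\eta : t \rightarrow \{|t|\}$
  if $f : s \rightarrow t$ then $bmap\,f : \{|s|\} \rightarrow \{|t|\}$    $b\_\mu : \{|\{|t|\}|\} \rightarrow \{|t|\}$

Operators on or-sets
  $K\langle\rangle : unit \rightarrow \langle t \rangle$    $or\_\rho_2 : s \times \langle t \rangle \rightarrow \langle s \times t \rangle$
  $or\_\cup : \langle t \rangle \times \langle t \rangle \rightarrow \langle t \rangle$    $or\_\eta : t \rightarrow \langle t \rangle$
  if $f : s \rightarrow t$ then $ormap\,f : \langle s \rangle \rightarrow \langle t \rangle$    $or\_\mu : \langle\langle t \rangle\rangle \rightarrow \langle t \rangle$

Interaction
  $\alpha : \{\langle t \rangle\} \rightarrow \langle \{t\} \rangle$    $\alpha_b : \{|\langle t \rangle|\} \rightarrow \langle \{|t|\} \rangle$

Figure 2: Operators of or-NRL and bor-NRL

Finally, $\alpha$ and $\alpha_b$ provide interaction between sets and or-sets and between bags and or-sets. Assume that $X = \{X_1, \ldots, X_n\}$ and $Y = \{|Y_1, \ldots, Y_n|\}$ where $X_i = \langle x^i_1, \ldots, x^i_{n_i} \rangle$ and $Y_i = \langle y^i_1, \ldots, y^i_{n_i} \rangle$. Let $F$ be the family of "choice" functions from $\{1, \ldots, n\}$ to $\mathbb{N}$ such that $1 \leq f(i) \leq n_i$ for all $i$. Then

$\alpha(X) = \langle \{x^i_{f(i)} \mid i = 1, \ldots, n\} \mid f \in F \rangle$
$\alpha_b(Y) = \langle \{|y^i_{f(i)} \mid i = 1, \ldots, n|\} \mid f \in F \rangle$

The main difference between these two definitions is that duplicates are removed from sets but not from bags. For example, $\alpha(\{\langle 1,3 \rangle, \langle 2,3 \rangle\})$ evaluates to $\langle \{1,2\}, \{1,3\}, \{2,3\}, \{3\} \rangle$, but $\alpha_b(\{|\langle 1,3 \rangle, \langle 2,3 \rangle|\})$ is equal to $\langle \{|1,2|\}, \{|1,3|\}, \{|2,3|\}, \{|3,3|\} \rangle$.

Definition (see also [LW93a]). The language or-NRL over type system (ST) includes all general operators, set operators, or-set operators and $\alpha$. The language bor-NRL over type system (BT) includes all general operators, bag operators, or-set operators and $\alpha_b$.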
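The two interaction operators can be sketched in Python by enumerating the choice functions with `itertools.product`. The classes `Or` and `Bag` and the helpers `orset` and `bag` are hypothetical names introduced only for this illustration; or-sets are stored as `frozenset` values because they behave structurally like sets.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Or:
    """Hypothetical or-set encoding; structurally a finite set."""
    items: frozenset


@dataclass(frozen=True)
class Bag:
    """Hypothetical bag encoding: canonical order, duplicates kept."""
    items: tuple


def orset(*xs):
    return Or(frozenset(xs))


def bag(*xs):
    return Bag(tuple(sorted(xs, key=repr)))


def alpha(X):
    """alpha : {<t>} -> <{t}>; X is a frozenset of or-sets.

    Each tuple produced by `product` is one choice function f; collecting
    the chosen elements in a frozenset removes duplicates.
    """
    choices = product(*(o.items for o in X))
    return Or(frozenset(frozenset(c) for c in choices))


def alpha_b(Y):
    """alpha_b : {|<t>|} -> <{|t|}>; Y is a Bag of or-sets.

    Same enumeration, but the chosen elements are collected in a bag,
    so duplicates such as {|3,3|} survive.
    """
    choices = product(*(o.items for o in Y.items))
    return Or(frozenset(bag(*c) for c in choices))


set_result = alpha(frozenset({orset(1, 3), orset(2, 3)}))
bag_result = alpha_b(bag(orset(1, 3), orset(2, 3)))
```

With these definitions `set_result` contains the set `{3}` where `bag_result` contains the bag `{|3,3|}`, which is the difference between the two operators pointed out in the text.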

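3.1 Normalizing types

Define the following rewrite rules on types:

$s \times \langle t \rangle \rightarrow \langle s \times t \rangle$    $\langle s \rangle \times t \rightarrow \langle s \times t \rangle$    $\langle\langle t \rangle\rangle \rightarrow \langle t \rangle$
$\{\langle t \rangle\} \rightarrow \langle \{t\} \rangle$
$\{|\langle s \rangle|\} \rightarrow \langle \{|s|\} \rangle$

Define the rewrite system (STR) on (ST) types as the three rules in the first line and $\{\langle t \rangle\} \rightarrow \langle \{t\} \rangle$. The rewrite system (BTR) on (BT) types is defined as the top three rules and $\{|\langle s \rangle|\} \rightarrow \langle \{|s|\} \rangle$. We use the notation $s \twoheadrightarrow t$ if $s$ rewrites to $t$ in zero or more steps. Recall [DJ90] that a normal form of a rewrite system is a term that cannot be further rewritten.

Proposition 2 (see [LW93a]) Both (STR) and (BTR) are terminating Church-Rosser rewrite systems. Consequently, each type has a unique normal form that can be calculated as $\langle sk(t) \rangle$ for any type $t$ that involves or-sets. □

3.2 Normalizing complex objects

It was suggested in [LW93a] to assign functions in the language to the rewrite rules so that for every rewriting from $s$ to $t$ there would be an associated definable function of type $s \rightarrow t$. The goal of this assignment is to obtain a function of type $s \rightarrow \langle sk(s) \rangle$ that produces the normal forms for objects of type $s$.

In subsection 3.3 we explain how to do this for bags. Subsection 3.4 deals with sets. We recall the result of [LW93a] and explain how normalization process for sets interacts with duplicate elimination. In subsection 3.5 we consider the case when the target type is not $\langle sk(s) \rangle$ but an intermediate type $t$ such that $s \twoheadrightarrow t \twoheadrightarrow \langle sk(t) \rangle$. We find types $t$ for which any object of type $s$ would have a unique representation at type $t$; the process of finding such a representation is called partial normalization.

3.3 Normalizing bag-based complex objects

We associate the following functions with the rewrite rules:

$or\_\rho_2 : s \times \langle t \rangle \rightarrow \langle s \times t \rangle$
$or\_\rho_1 : \langle s \rangle \times t \rightarrow \langle s \times t \rangle$
$or\_\mu : \langle\langle t \rangle\rangle \rightarrow \langle t \rangle$
$\alpha_b : \{|\langle s \rangle|\} \rightarrow \langle \{|s|\} \rangle$

Here $or\_\rho_1 = ormap((\pi_2, \pi_1)) \circ or\_\rho_2 \circ (\pi_2, \pi_1)$ is pair-with over the first argument.

Now, following [LW93a], we define the function $app_b(r) : s \rightarrow t$ where $r$ is a rewrite strategy that rewrites $s$ to $t$. First assume that $t$ is a type and $p$ a position in the derivation tree for $t$ such that applying a rewrite rule with associated function $f$ to $t$ at $p$ yields type $s$. We define a function $app_b(t, p, f) : t \rightarrow s$ showing the action of rewrite rules on objects by induction on the structure of $t$:

- if $p$ is the root of the derivation of $t$, then $app_b(t, p, f) = f$;
- if $t = t_1 \times t_2$ and $p$ is in $t_1$, then $app_b(t, p, f) = (app_b(t_1, p, f) \circ \pi_1, \pi_2)$;
- if $t = t_1 \times t_2$ and $p$ is in $t_2$, then $app_b(t, p, f) = (\pi_1, app_b(t_2, p, f) \circ \pi_2)$;
- if $p$ is in $t'$, then $app_b(\{|t'|\}, p, f) = bmap(app_b(t', p, f))$;
- if $p$ is in $t'$, then $app_b(\langle t' \rangle, p, f) = ormap(app_b(t', p, f))$.

For a rewrite strategy $r := t \stackrel{f_1}{\longrightarrow} t_1 \stackrel{f_2}{\longrightarrow} \ldots \stackrel{f_n}{\longrightarrow} t_n = t'$ such that the rewrite rule with associated function $f_i$ is applied at position $p_i$, we extend $app_b$ to $app_b(t, t', r) : t \rightarrow t'$ by

$app_b(t, t', r) = app_b(t_{n-1}, p_n, f_n) \circ \ldots \circ app_b(t_1, p_2, f_2) \circ app_b(t, p_1, f_1)$.

Theorem 3 (Normalization for bags) For any bag-based or-object $x$ of type $t$ and any rewrite strategy $r : t \twoheadrightarrow \langle sk(t) \rangle$, the following holds:

$app_b(t, \langle sk(t) \rangle, r)(x) = nf(x)$

3.4 Normalizing set-based complex objects

The normalization theorem for set-based objects was proved in [LW93a], though details were not explained there. Here we give its statement that follows immediately from theorem 3.

Let $r$ be a rewriting $t_1 \rightarrow \ldots \rightarrow t_n$ where all $t_i$'s are types from (ST). By $r^{Bag}$ we mean the rewriting $t_1^{Bag} \rightarrow \ldots \rightarrow t_n^{Bag}$ of (BT) types. Note that if $t_1 \twoheadrightarrow t_n$ is in (STR), then $t_1^{Bag} \twoheadrightarrow t_n^{Bag}$ is in (BTR).

Theorem 4 (Normalization for sets) For any set-based or-object $x$ and any rewrite strategy $r : t \twoheadrightarrow \langle sk(t) \rangle$, the following holds:

$(app_b(t^{Bag}, \langle sk(t^{Bag}) \rangle, r^{Bag})(x^{Bag}))^{Set} = nf(x)$

In other words, turn $x$ into a bag-object, and apply $r^{Bag}$ by using $app_b$ to obtain some object $y$. Then $nf(x) = y^{Set}$.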
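The type rewrite systems of subsection 3.1 are small enough to explore by brute force. The following Python sketch assumes a hypothetical encoding of types as nested tuples (`("prod", s, t)`, `("set", t)`, `("bag", t)`, `("or", t)`, with base types as strings). It computes all one-step rewrites of a type and then every irreducible type reachable from it. It only illustrates Proposition 2 on examples and proves nothing.

```python
def is_or(t):
    return isinstance(t, tuple) and t[0] == "or"


def steps(t):
    """All types obtained from t by one rule application at any position."""
    out = []
    if isinstance(t, str):           # base types cannot be rewritten
        return out
    if t[0] == "prod":
        _, a, b = t
        if is_or(b):                 # s x <t>  ->  <s x t>
            out.append(("or", ("prod", a, b[1])))
        if is_or(a):                 # <s> x t  ->  <s x t>
            out.append(("or", ("prod", a[1], b)))
        out += [("prod", a2, b) for a2 in steps(a)]
        out += [("prod", a, b2) for b2 in steps(b)]
    else:
        tag, a = t
        if is_or(a):
            if tag == "or":          # <<t>>  ->  <t>
                out.append(a)
            else:                    # {<t>} -> <{t}>  and  {|<s>|} -> <{|s|}>
                out.append(("or", (tag, a[1])))
        out += [(tag, a2) for a2 in steps(a)]
    return out


def normal_forms(t):
    """Set of irreducible types reachable from t (exhaustive search)."""
    successors = steps(t)
    if not successors:
        return {t}
    return set().union(*(normal_forms(u) for u in successors))


# {|<int>|} x <<bool>> has skeleton {|int|} x bool.
t = ("prod", ("bag", ("or", "int")), ("or", ("or", "bool")))
reachable = normal_forms(t)
```

For this example every rewriting path ends in the same type, the or-set of the skeleton, as the Church-Rosser property predicts. The search is exponential and is meant for small types only.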

Note that the statement of theorem 4 is different from (and in fact stronger than) the normalization theorem in [LW93a], which stated that $(app_b(t^{Bag}, \langle sk(t^{Bag}) \rangle, r^{Bag})(x^{Bag}))^{Set}$ does not depend on the choice of $r$, and defined normal forms as the result of application of any such rewriting $r$.

The question arises if it is possible to construct the normal form without using the bag semantics. The answer to this question is negative. To see this, define $app(t, t', r)$ for set-based objects in the same way we defined $app_b$, but using $map$ instead of $bmap$ to map over sets, and using $\alpha$ instead of $\alpha_b$.

Proposition 5 There exist set-based objects $x$ of type $t$ such that for no rewriting $r : t \twoheadrightarrow \langle sk(t) \rangle$ is $app(t, \langle sk(t) \rangle, r)(x)$ the normal form of $x$. □

The main reason that it is impossible to express normalization by means of $app$ in or-NRL is that duplicate elimination does not commute with normalization. That is, $nf(x^{Set})$ is generally different from $nf(x)^{Set}$, while $nf(y^{Bag})^{Set} = nf(y)$. We must admit here that proposition 5 contradicts a claim made in [LW93a] that normalization does not add expressiveness to or-NRL. It does not enhance bor-NRL, but does add expressive power to or-NRL.

3.5 Partial normalization

Suppose that a conceptual query asks a question about possibilities that are encoded only by some of the disjunctions, and that it does not take into account other disjunctions present in a given object. Do we have to complete the normalization process to answer such a query? If a query $q$ can be answered by having an object of type $s$, and we have an object $x$ of type $t$ such that $t \twoheadrightarrow s$, can we find a representation of $x$ at type $s$ to answer $q$?

In this section we explain when such a partial normalization can be performed. First notice that it is not always possible. Take $x = \langle\langle\langle 1,2 \rangle, \langle 2,3 \rangle\rangle\rangle$ of type $\langle\langle\langle int \rangle\rangle\rangle$. Then $or\_\mu(x) = \langle\langle 1,2 \rangle, \langle 2,3 \rangle\rangle$ and $ormap(or\_\mu)(x) = \langle\langle 1,2,3 \rangle\rangle$; these are two different objects of the same type $\langle\langle int \rangle\rangle$.

Theorem 9 below says that essentially we only have to exclude situations like this. We consider bags here; the result for sets can be readily obtained, just as theorem 4 was obtained from theorem 3.
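The counterexample with the triply nested or-set earlier in this subsection can be replayed directly. In the Python sketch below, `Or`, `orset`, `or_mu` and `ormap` are hypothetical stand-ins for or-sets and for the or-NRL operators of the same names. The point is only that flattening at the root and flattening one level down give different results of the same type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Or:
    """Hypothetical or-set encoding; structurally a finite set."""
    items: frozenset


def orset(*xs):
    return Or(frozenset(xs))


def or_mu(x):
    """or_mu : <<t>> -> <t>, flattening of an or-set of or-sets."""
    return Or(frozenset(e for inner in x.items for e in inner.items))


def ormap(f):
    """ormap f : <s> -> <t>, apply f to every alternative."""
    return lambda x: Or(frozenset(f(e) for e in x.items))


x = orset(orset(orset(1, 2), orset(2, 3)))   # type <<<int>>>

at_root = or_mu(x)            # rule <<t>> -> <t> applied at the root
one_level_down = ormap(or_mu)(x)   # same rule applied inside the outer or-set
```

Both results have type `<<int>>`, yet `at_root` keeps the two inner alternatives apart while `one_level_down` merges them, so the representation at the intermediate type depends on the rewriting chosen.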
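First, we need a criterion that would check if a type $s$ can be rewritten to $t$. (We did not have this problem before, as it was easy to check if $t = \langle sk(s) \rangle$.) Let $t \succ s$ mean that $s$ is obtained from $t$ by removing some of the or-set brackets, i.e. $s$ has fewer disjunctions. Now we define a new relation $\vartriangleleft$ on types using the rules below.

- $t \vartriangleleft t$.
- If $t \vartriangleleft s$ and $t' \vartriangleleft s'$, then $t \times t' \vartriangleleft s \times s'$.
- If $t \vartriangleleft s$, then $\{|t|\} \vartriangleleft \{|s|\}$.
- If $t \succ t'$ and $t' \vartriangleleft s$, then $t \vartriangleleft \langle s \rangle$.

Proposition 6 The above rules are sound and complete for $\twoheadrightarrow$. That is, $s \twoheadrightarrow t$ iff $s \vartriangleleft t$. □

The last rule for $\vartriangleleft$ introduces a new variable $t'$ instead of suggesting a proof search strategy. One might think that this leads to (at least) exponential time algorithms for verifying $s \vartriangleleft t$. (This somewhat resembles the situation with the cut rule in sequent calculus. Although it can be eliminated, the cost is a hyperexponential blow-up in the proof length, cf. [Gir87].) Fortunately, this phenomenon is not observed for our rewrite system.

Proposition 7 There exists a linear time complexity algorithm that, given two types $s$ and $t$, returns true if $s \twoheadrightarrow t$ and false otherwise. □

Now we say that a type $t$ is a $\ast$-type if it does not have a subtype of the form $\langle\langle v \rangle\rangle$. We next define the concept of a $\ast$-rewriting between $\ast$-types. Intuitively, $\ast$-rewritings resolve all ambiguities arising from subtypes of form $\langle\langle v \rangle\rangle$. Formally, let $s$ and $t$ be two distinct $\ast$-types such that $s \twoheadrightarrow t$. Let $r$ be a rewriting between $s$ and $t$: $s = s_0 \longrightarrow s_1 \longrightarrow \ldots \longrightarrow s_n = t$. For each $i = 0, \ldots, n-1$, let $s_i^1, \ldots, s_i^{m_i}$ be all the types such that $s_i \longrightarrow s_i^j$ (in one step) and $s_i^j \twoheadrightarrow t$. Let $p_i^j$ be the position in $s_i$ at which rewrite rule is applied to obtain $s_i^j$ from $s_i$, $j = 1, \ldots, m_i$.

Then the rewriting $r : s \twoheadrightarrow t$ is a $\ast$-rewriting (written as $r : s \twoheadrightarrow_{\ast} t$) if either $n = 1$ (one step rewriting) or $n > 1$ and it satisfies the following two properties for every $i = 0, \ldots, n-2$:

1. If one of $s_i^j$'s is a $\ast$-type, then $s_{i+1}$ is a $\ast$-type.

2. If all $s_i^j$ have subtypes of form $\langle\langle v \rangle\rangle$, then (a) $s_{i+1} = s_i^j$ such that there is no $p_i^l$ closer to the root than $p_i^j$, and (b) $s_{i+2}$ is obtained from $s_{i+1}$ by applying the rule $\langle\langle v \rangle\rangle \longrightarrow \langle v \rangle$ on the newly created subtype $\langle\langle v \rangle\rangle$.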

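This definition resolves ambiguities arising from subtypes of form $\langle\langle v \rangle\rangle$. The first property says that they need not be introduced unless absolutely necessary, and the second property dictates that once we cannot avoid introducing a subtype $\langle\langle v \rangle\rangle$, it must be done as close to the root as possible, and then gotten rid of at the next step of the rewriting. To give an example, $\langle \{\langle t \rangle\} \rangle \times s \rightarrow \langle \{\langle t \rangle\} \times s \rangle \rightarrow \langle \langle \{t\} \rangle \times s \rangle \rightarrow \langle\langle \{t\} \times s \rangle\rangle \rightarrow \langle \{t\} \times s \rangle$ is a $\ast$-rewriting, but the one that achieves the same result by doing $\langle \{\langle t \rangle\} \rangle \times s \rightarrow \langle\langle \{t\} \rangle\rangle \times s$ first is not because introduction of the double or-set subtype can be avoided.

Proposition 8 Let $s$ and $t$ be $\ast$-types and $s \twoheadrightarrow t$. Then there exists a $\ast$-rewriting $r : s \twoheadrightarrow_{\ast} t$. □

Using this proposition, we can formulate the partial normalization theorem.

Theorem 9 (Partial Normalization) Let $s$ and $t$ be $\ast$-types such that $s \twoheadrightarrow t$. Then for any two $\ast$-rewritings $r_1, r_2 : s \twoheadrightarrow_{\ast} t$ and for any object $x$ of type $s$, the following holds:

$app_b(s, t, r_1)(x) = app_b(s, t, r_2)(x)$

This theorem tells us that any object of a $\ast$-type $s$ has an unambiguous representation of a $\ast$-type $t$ if $s \vartriangleleft t$. This representation is obtained by applying any $\ast$-rewrite strategy that rewrites $s$ to $t$.

One may wonder if restricting rewritings to $\ast$-rewritings only is really necessary, and if so, are both the conditions on $\ast$-rewritings necessary. The following proposition shows that it is.

Proposition 10 It is possible to find $\ast$-types $s$ and $t$, an object $x$ of type $s$ and two rewritings $r_1$ and $r_2$ from $s$ to $t$ which violate either the first or the second property of $\ast$-rewritings such that $app_b(s, t, r_1)(x) \neq app_b(s, t, r_2)(x)$. □

4 Normalization algorithms and primitives

There is, of course, a trivial normalization algorithm based on the general normalization theorems. We present it below for bag-based complex objects.

- If $X$ is not an or-object, then $nf(X) = \langle X \rangle$.
- If $X$ is $(x, y)$ of type $s \times t$, then $nf(X) = or\_cartprod(nf(x), nf(y))$ if both $s$ and $t$ involve or-sets, $nf(X) = or\_\rho_1(nf(x), y)$ if only $s$ involves or-sets and $nf(X) = or\_\rho_2(x, nf(y))$ if only $t$ involves or-sets.
- If $X = \{|x_1, \ldots, x_n|\}$, then $nf(X) = \alpha_b(\{|nf(x_1), \ldots, nf(x_n)|\})$.

This algorithm does calculate the normal form, as follows from theorem 3. It can be readily adapted to the set-based complex objects.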
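The trivial algorithm can be sketched in a few lines of Python for bag-based objects. The encoding (`Or`, `Bag`, `orset`, `bag`, pairs as Python tuples) is hypothetical and chosen for this illustration only. Two points are assumptions rather than part of the listed cases: the pair case always takes the or-cartesian product, which coincides with the three listed sub-cases because the normal form of an or-free object is a singleton or-set, and the or-set case (flatten the normal forms of the alternatives) is filled in here since the list above does not state it.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Or:
    """Hypothetical or-set encoding; structurally a finite set."""
    items: frozenset


@dataclass(frozen=True)
class Bag:
    """Hypothetical bag encoding: canonical order, duplicates kept."""
    items: tuple


def orset(*xs):
    return Or(frozenset(xs))


def bag(*xs):
    return Bag(tuple(sorted(xs, key=repr)))


def nf(x):
    """Eager normal form: builds the whole or-set in memory."""
    if isinstance(x, Or):
        # Assumed case: flatten the normal forms of the alternatives.
        return Or(frozenset(e for alt in x.items for e in nf(alt).items))
    if isinstance(x, Bag):
        # alpha_b applied to the bag of normal forms of the elements.
        parts = [nf(e).items for e in x.items]
        return Or(frozenset(bag(*choice) for choice in product(*parts)))
    if isinstance(x, tuple):
        a, b = x
        return Or(frozenset((u, v) for u in nf(a).items for v in nf(b).items))
    return Or(frozenset([x]))          # base value: <x>


# A toy "design": part A is A1 or A2; part B is B1 together with w or k.
design = (orset("A1", "A2"), ("B1", orset("w", "k")))
completed = nf(design)
```

The sketch materializes every entry before returning, so its memory use grows with the size of the normal form, which is the drawback discussed next in the text.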
The problem with this algorithm is its exponential space complexity, as shown in [LW93a]. It creates the whole normal form before any conceptual queries can be asked. We believe it would be more reasonable to design a new evaluation strategy, that produces the elements in the normal form one-by-one. Then the space usage would be linear and, in addition, some conceptual queries can be evaluated much faster. For example, for an existential query over a normal form, satisfiability can now be verified for each newly produced entry. If the condition is satisfied, the evaluation stops without producing all elements in the normal form. That is, if $x$ is of type $t$ and $p$ is of type $sk(t) \rightarrow bool$, and we want to find out if there is an element of $nf(x)$ that satisfies $p$ (e.g. is there a cheap reliable design?), then we should be able to stop when such an element is found. The query $\exists p$ which will be shown later in this section does precisely that. Note that using the straightforward normalization algorithm, even evaluation of $\exists(\lambda x.true)$ requires exponential space as the normal form must be produced first!

The evaluation strategy that we are going to present is essentially the depth first search on the and-or tree underlying a complex object. This strategy will work for both set- and bag-based complex objects, as sets and bags will be translated into lists to give an order of evaluation. Using this evaluation strategy, we shall also suggest new, more flexible, normalization primitives.

We create a special data structure, called annotated complex objects, to represent and-or trees. Basically, an annotation gives a choice of an element for each or-set and also contains local conditions telling whether all possibilities encoded by an object are exhausted. For each object type $t$, we have a new annotated type $A(t)$ and the initial translation $t \rightarrow A(t)$. From each annotated object, we can get an entry in the normal form. At the heart of the algorithm lies a procedure that takes an annotated object and produces the "next" one. This enables us to list all normal form entries sequentially.

We translate sets and bags into lists, assuming some ordering. No matter which ordering is chosen, the algorithm will produce all normal form entries. However, the order in which they are produced does depend on the translation, and can be used for additional optimizations.

In what follows, we present the algorithm for set-based complex objects. The algorithm for bag-based complex objects can be obtained by repeating it verbatim and replacing "set" by "bag". We denote the type of lists of type $t$ by $[t]$.

Definition (Annotated complex objects). Type $K$ (kind) has four possible values: $B$ (base), $P$ (product), $S$ (set), and $O$ (or-set). For each type $t$, we produce an annotated type $A(t)$ as follows:

$A(b) = K \times b$ if $b$ is a base type.
$A(s \times t) = K \times bool \times (A(s) \times A(t))$.
$A(\{t\}) = K \times bool \times [A(t)]$.
$A(\langle t \rangle) = K \times bool \times [(A(t) \times bool)]$.

The boolean value in these translations is set to true if there are still entries encoded by the object that have not been looked at. For or-sets, the boolean component inside lists is used for indicating the element that is currently used as the choice given by that or-set. In all algorithms only one entry in such a list will have the true boolean component.

Now we define three functions: $initial : t \rightarrow A(t)$ produces the initial annotation of an object; $pick : A(t) \rightarrow sk(t)$ produces an element of the normal form given by an annotation; $end : A(t) \rightarrow bool$ returns true iff all possibilities encoded by its argument have been exhausted.

The definitions of $initial$ and $pick$ are given in figure 3. By $void$ we mean a special object used to indicate the end of the process of going over the normal form. P1-P5 give a simplified version of $pick$ in which $void$ is not propagated to the top level. Such propagation is done to detect inconsistencies encoded by empty or-sets.

The function $end$ always returns true on $(B, x)$. On any other annotated object $x = (k, c, v)$, $end\ x = \neg c$. We also define a function $reset : A(t) \rightarrow A(t)$ that disregards the annotation of an object and restores the initial one. The definition almost verbatim repeats $initial$ and is omitted here.

A recursive algorithm for $next$ is given in figure 4. We use the $[\ ]$ brackets for lists. For any list $X = [x_1, \ldots, x_n]$, $X_i^0$ stands for $[x_1, \ldots, x_{i-1}]$ and $X_i^1$ denotes $[x_{i+1}, \ldots, x_n]$ (they may be empty). We use the notation $::$ and $@$ for consing and appending. That is, $a :: x$ puts $a$ as the new head before the list $x$, and $x @ y$ appends $y$ to the end of $x$.

Now we can produce the following algorithm that lists elements of the normal form of an or-object $o$.
    ao := initial o;
    repeat
        print(pick ao);
        ao := next ao
    until end(ao)

Theorem 11 For any or-object $o$, the algorithm above prints all elements of $nf(o)$ and nothing else. Moreover, it has linear space complexity. □

Although no duplicate elimination is done in this algorithm, it does not produce unnecessary copies.

Corollary 12 Let $o$ be an or-object such that all or-sets in it are pairwise disjoint. Then the above algorithm prints each entry in $nf(o)$ exactly once. □

The correctness result suggests adding new, more flexible normalization primitives to or-NRL. We propose the following one called $norm$.

    if  $cond : sk(t) \rightarrow bool$,  $update : sk(t) \times u \rightarrow u$,  $out : (sk(t) \times bool) \times u \rightarrow s$  and  $init : u$
    then  $norm(cond, init, update, out) : t \rightarrow s$

Its "semantics" is given by the algorithm in figure 5. Intuitively, the output value is accumulated in $acc$, $cond$ is used to break the loop if the condition is satisfied, $last$ indicates if all possibilities have been looked at, and $out$ forms the output.

    Calculating norm(cond, init, update, out)(o)
    acc := init;
    ao := initial o;
    last := end ao;
    while ¬(cond(pick ao) ∨ last) do
        acc := update(pick ao, acc);
        ao := next ao;
        last := end ao
    end;
    return out((pick ao, last), acc)

Figure 5: Algorithm for norm

Now, a number of functions can be defined using $norm$. Here we consider just two. In the first definition, $p$ is of type $sk(t) \rightarrow bool$.

$\exists p \equiv norm(p, false, \lambda x.\lambda y.false, \pi_1)$
$normalize \equiv norm(\lambda x.false, \langle\rangle, \lambda x.\lambda y.\,or\_\eta(x)\ or\_\cup\ y, \pi_2)$
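The control flow of $norm$ can be imitated in Python. The sketch below does not implement the annotated objects or the $next$ procedure of figures 3 and 4; as a swapped-in technique it uses Python generators, whose suspended frames play the role of the annotation during a depth-first walk over the and-or tree. All names (`Or`, `orset`, `VOID`, `entries`, `norm`, `exists`, `normalize`) are hypothetical. Sets are `frozenset` values and pairs are tuples. Unlike figure 5, the loop tests `last` before calling `cond`, so the predicate is never applied to the void marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Or:
    """Hypothetical or-set encoding; alternatives in the order they are tried."""
    items: tuple


def orset(*xs):
    return Or(tuple(xs))


VOID = object()   # stands in for the paper's "void"


def entries(x):
    """Yield normal form entries one at a time (duplicates are not removed)."""
    if isinstance(x, Or):
        for alt in x.items:
            yield from entries(alt)
    elif isinstance(x, tuple):                 # pair
        a, b = x
        for u in entries(a):
            for v in entries(b):
                yield (u, v)
    elif isinstance(x, frozenset):             # set, traversed as a list
        elems = list(x)

        def go(i):
            if i == len(elems):
                yield frozenset()
                return
            for u in entries(elems[i]):
                for rest in go(i + 1):
                    yield rest | {u}

        yield from go(0)
    else:
        yield x                                # base value


def norm(cond, init, update, out):
    def run(o):
        acc = init
        it = entries(o)
        cur = next(it, VOID)
        last = cur is VOID
        while not last and not cond(cur):
            acc = update(cur, acc)
            cur = next(it, VOID)
            last = cur is VOID
        return out((cur, last), acc)
    return run


def exists(p):
    # Returns (entry, False) for the first entry satisfying p, else (VOID, True).
    return norm(p, False, lambda x, acc: False, lambda result, acc: result)


# Accumulates every entry; the output is the accumulator.
normalize = norm(lambda x: False, frozenset(),
                 lambda x, acc: acc | {x}, lambda result, acc: acc)

o = frozenset({orset(1, 3), orset(2, 3)})
found = exists(lambda s: 3 in s)(o)
```

Only the entry currently under inspection and the generator frames are alive at any time, which mirrors the space behaviour claimed for the algorithm, and `exists` stops at the first entry that satisfies the predicate.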

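I1  $initial\ x = (B, x)$ if $x$ is of base type.
I2  $initial\ (x, y) = (P, true, (initial\ x, initial\ y))$.
I3  $initial\ \{x_1, \ldots, x_n\} = (S, true, [initial\ x_1, \ldots, initial\ x_n])$.
I4  $initial\ \langle x_1, \ldots, x_n \rangle = (O, true, [(initial\ x_1, true), (initial\ x_2, false), \ldots, (initial\ x_n, false)])$.
I5  $initial\ \langle\rangle = (O, false, [\,])$.

P1  $pick\ (B, x) = x$.
P2  $pick\ (P, c, (x, y)) =$ if $c$ then $(pick\ x, pick\ y)$ else $void$.
P3  $pick\ (S, c, [x_1, \ldots, x_n]) =$ if $c$ then $\{pick\ x_1, \ldots, pick\ x_n\}$ else $void$.
P4  $pick\ (O, c, [x_1, \ldots, x_n]) =$ if $c$ then $pick\ \pi_1(x_i)$ else $void$, where $\pi_2(x_i) = true$.
P5  $pick\ (O, c, [\,]) = void$.

Figure 3: Definitions of initial (I1-I5) and pick (P1-P5)

Base
  $next(B, x) = (B, x)$

Pair
  if $\neg end(next\ y)$ then $next(P, c, (x, y)) = (P, true, (x, next\ y))$
  if $end(next\ y)$ and $\neg end(next\ x)$ then $next(P, c, (x, y)) = (P, true, (next\ x, reset\ y))$
  if $end(next\ y)$ and $end(next\ x)$ then $next(P, c, (x, y)) = (P, false, (x, y))$

Set (with $X = [x_1, \ldots, x_n]$)
  $next(S, c, [\,]) = (S, false, [\,])$
  if $\neg end(next\ x_1)$ then $next(S, c, X) = (S, true, next\ x_1 :: [x_2, \ldots, x_n])$
  if $end(next\ x_1)$ and $next(S, true, [x_2, \ldots, x_n]) = (S, c', X')$ then $next(S, c, X) = (S, c', reset\ x_1 :: X')$

Or-set (with $X = [x_1, \ldots, x_n]$ and $x_i$ the entry such that $\pi_2(x_i)$ holds)
  $next(O, c, [\,]) = (O, false, [\,])$
  if $\pi_2(x_i)$ and $\neg end(next\ \pi_1(x_i))$ then $next(O, c, X) = (O, true, X_i^0\ @\ [(next\ \pi_1(x_i), true)]\ @\ X_i^1)$
  if $\pi_2(x_i)$, $X_i^1 \neq [\,]$ and $end(next\ \pi_1(x_i))$ then $next(O, c, X) = (O, true, X_i^0\ @\ [(\pi_1(x_i), false), (\pi_1(x_{i+1}), true)]\ @\ [x_{i+2}, \ldots, x_n])$
  if $\pi_2(x_i)$, $X_i^1 = [\,]$ and $end(next\ \pi_1(x_i))$ then $next(O, c, X) = (O, false, X)$

Figure 4: Algorithm for next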

Corollary 13 For any or-object $o$, 1) $\exists p(o) = (x, c)$, where $x$ is a normal form entry satisfying $p$ if $c = false$, and there are no normal form entries satisfying $p$ if $c = true$, and 2) $normalize(o)$ is its normal form. □

Note that $\exists p$ is very useful in evaluation of existential queries. If an entry that satisfies $p$ is found, $\exists p$ stops and returns that entry without producing all other normal form entries. In contrast to the standard algorithm, which requires exponential space to evaluate such queries even if $p$ is $\lambda x.\,true$, $\exists(\lambda x.\,true)$ needs linear time and space to be evaluated.

As another application of the new evaluation strategy, it is possible to run normalization for a given time, and get the best entry in the normal form obtained in that time. This is often helpful if an approximate solution is satisfactory.

Space-efficient evaluation of recursive queries using normalization. Now we show a somewhat surprising application of our normalization algorithm: it deals with algorithmic expressive power of query languages. Recall that the Abiteboul-Beeri algebra A&B [AB88] is the nested relational algebra (general and set operators in Figure 2) plus the powerset operator. While the nested relational algebra cannot express recursive queries such as transitive closure (tc) [LW94], A&B can express tc by first producing all possible relations on a given set of nodes and then selecting those that contain a given one and are transitive. Of course this way of computing tc uses exponential space. A remarkable result of [SP94] says that no matter how we write an A&B-expression to compute tc, it will use exponential space. However, it is based on a contrived restriction that a "natural" evaluation strategy is used. If this restriction is dropped, then it is possible to devise an evaluation strategy that computes tc in polynomial space, as shown in [AH95].

It was proved in [LW93a] that $\alpha$ has essentially the expressive power of the powerset operator. Hence, we can view or-NRL as an extension of A&B with or-sets. Now we explain how to use norm to compute tc space-efficiently in this language. We use some meta notation, but everything can be expressed in or-NRL.

Let $R : \{b \times b\}$ be a nonempty binary relation.
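Define $N_R = map(\pi_1)(R) \cup map(\pi_2)(R)$ (the set of nodes of $R$) and $N_R^2$ as $cartprod(N_R, N_R)$. Now let

$$P_R = map(\lambda z.\ or_\cup(or_\eta(\{\}),\ or_\eta(\{z\})))(N_R^2)$$

That is, for each pair of nodes $(x, y)$, the set $P_R$ contains an element $\langle \{\}, \{(x, y)\} \rangle$. Let $rc : \{t \times t\} \times \{t \times t\} \to \{t \times t\}$ compute the relational composition (it can be done in any language that contains relational algebra as a sublanguage). Let $e$ be of type $b \times b$ (i.e. an edge). Define

$$c_e = \lambda s.\ (rc(s, s) = s)\ \&\ (R \subseteq s)\ \&\ (e \notin s)$$

Finally, let $tc_e = norm(c_e,\ (),\ \lambda x.(),\ \pi_2 \circ \pi_1)(P_R)$.

Proposition 14 $tc_e$ evaluates to true if $e$ is in $tc(R)$ and it evaluates to false otherwise. Consequently, $tc(R)$ can be computed in polynomial space using norm. □

This proposition can be regarded as a counterpart of the result of [AH95] saying that tc can be evaluated in A&B using polynomial space under a special evaluation strategy. Here we used our space-efficient strategy for normalization to achieve the same result.

5 Objects with partial information and antichain semantics

The antichain semantics, defined in [Lib95, LW93a] and based on the ideas from [BJO91, Lib91], is used for objects with partial information. The key idea is that the notion of partiality can be conveyed by orderings, with $x \leq y$ meaning that $y$ is more informative than $x$.

This ordering is usually given for base types. For example, a null value ni (no information) is less informative than any integer or boolean. For pairs, $(x, y) \leq (x', y')$ iff $x \leq x'$ and $y \leq y'$. It was explained in [LW93a] that the following two orderings, well known in semantics of concurrency [Gun92], must be used for sets and or-sets respectively:

$$X \sqsubseteq^\flat Y \iff \forall x \in X\ \exists y \in Y : x \leq y$$
$$X \sqsubseteq^\sharp Y \iff \forall y \in Y\ \exists x \in X : x \leq y$$

Using these orderings suggests a new semantics in which an object can denote any other object that is more informative. This allows elimination of redundancies given by comparable elements, because $X \sqsubseteq^\flat Y$ iff $\max X \sqsubseteq^\flat Y$ and $X \sqsubseteq^\sharp Y$ iff $\min X \sqsubseteq^\sharp Y$, where $\max X$ and $\min X$ are the sets of maximal and minimal elements of $X$.

In $\max X$ and $\min X$ elements are pairwise incomparable. Such sets are called antichains. Using $A_{fin}(A)$ for the family of antichains over a poset $A$, we define the following (structural) antichain-based semantics. Here we consider only set-based objects.

$$[\![b]\!]_a = (D_b, \leq_b) \qquad [\![t \times s]\!]_a = [\![t]\!]_a \times [\![s]\!]_a$$
$$[\![\{t\}]\!]_a = (A_{fin}([\![t]\!]_a), \sqsubseteq^\flat) \qquad [\![\langle t \rangle]\!]_a = (A_{fin}([\![t]\!]_a), \sqsubseteq^\sharp)$$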
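The two orderings and the passage to antichains can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: `None` stands in for the null value ni and is treated as less informative than every other value, tuples model pairs, and the function names (`leq`, `hoare_leq`, `smyth_leq`, `maximal`, `minimal`) are hypothetical.

```python
def leq(x, y):
    """x <= y: y is at least as informative as x (None models the null ni)."""
    if x is None:
        return True
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        return all(leq(a, b) for a, b in zip(x, y))   # componentwise on pairs
    return x == y


def hoare_leq(X, Y):
    """Ordering used for sets: every x in X is below some y in Y."""
    return all(any(leq(x, y) for y in Y) for x in X)


def smyth_leq(X, Y):
    """Ordering used for or-sets: every y in Y is above some x in X."""
    return all(any(leq(x, y) for x in X) for y in Y)


def maximal(X):
    """Antichain of maximal elements of X."""
    return {x for x in X if not any(x != y and leq(x, y) for y in X)}


def minimal(X):
    """Antichain of minimal elements of X."""
    return {x for x in X if not any(x != y and leq(y, x) for y in X)}


if __name__ == "__main__":
    X = {(1, None), (1, 2), (None, None), (3, None)}
    Y = {(1, 2), (3, 4)}
    print(maximal(X), minimal(X))
    # Dropping comparable elements does not change the comparison with Y.
    print(hoare_leq(X, Y), hoare_leq(maximal(X), Y))
```

For finite sets, replacing `X` by `maximal(X)` (for sets) or by `minimal(X)` (for or-sets) leaves these comparisons unchanged, which is what justifies eliminating comparable elements.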

As follows from the claims above, for each object $x$ of type $t$ there exists a semantically equivalent object $x^\circ$ in $[\![t]\!]_a$, defined by the following rules:

$x^\circ = x$ for $x$ of a base type.
$(x, y)^\circ = (x^\circ, y^\circ)$.
$\{x_1, \ldots, x_n\}^\circ = \max \{x_1^\circ, \ldots, x_n^\circ\}$.
$\langle x_1, \ldots, x_n \rangle^\circ = \min \langle x_1^\circ, \ldots, x_n^\circ \rangle$.

Consequently, for each operation $f : s \to t$ in or-NRL, we define a new operation $f_a$ that takes $x \in [\![s]\!]_a$ and returns $f(x)^\circ \in [\![t]\!]_a$. It is known (see [Lib92, LW93a]) that $\alpha_a$ is an isomorphism between $[\![\{\langle t \rangle\}]\!]_a$ and $[\![\langle \{t\} \rangle]\!]_a$. Using these operations $f_a$, it is possible to define $app_a(t, t', r) : t \to t'$ that applies a rewrite strategy $r : t \twoheadrightarrow t'$, exactly in the same way as we defined app, but using the index $a$ everywhere.

The following two results state the normalization theorem for the antichain semantics, and the partial normalization theorem.

Theorem 15 Let $x \in [\![t]\!]_a$ be an object of type $t$ such that $t$ involves or-sets. Then, for any rewriting $r : t \twoheadrightarrow \langle sk(t) \rangle$, the following holds:

$$app_a(t, \langle sk(t) \rangle, r)(x) = nf(x)$$

Theorem 16 Let $s$ and $t$ be two types such that $s \twoheadrightarrow t$. Then for any two rewritings $r_1, r_2 : s \twoheadrightarrow t$ and any $x \in [\![s]\!]_a$,

$$app_a(s, t, r_1)(x) = app_a(s, t, r_2)(x)$$

6 Experimental results

The basic normalization algorithm and the new space-efficient normalization algorithm have been implemented in the system OR-SML¹ [GL94], which is a database programming language built on top of Standard ML of New Jersey [HMT90].

¹ [GL94] describes the version of OR-SML in which the primitive norm is not available.

We ran a number of experiments to compare the speed of the basic algorithm with the new algorithm described in this paper. As our test objects, we chose objects that are known to cause exponential blow-up in the size of the normal form [LW93a]. In addition, these objects are not well suited for the OR-SML duplicate elimination algorithm [GL94], so we could compare the speed of the standard algorithms for sets and bags.

In the table below, the first column shows (approximately) the number of entries in the normal form. Entries themselves are relatively small. The second
column shows running time² for the standard algorithm for sets; that is, at the end duplicates are eliminated. The third column is running time for the standard algorithm for bags. The last column is running time for the new algorithm. Note that we compare time rather than space. Despite its space efficiency, the new algorithm still has to compute exponentially many entries. There are several reasons why figures in the last column are better; among them is the time saved by not running garbage collections.

    # entries       time (1)     time (2)         time (3)
    > 19,000        > 11 min     0.9 sec          1.8 sec
    > 59,000        > 90 min     8.9 sec          5.8 sec
    > 175,000       > 16 hr      31.1 sec         19.1 sec
    > 525,000       > 2 days     1 min 35 sec     59 sec
    > 1.5 · 10^6    not done     out of memory    3 min 9 sec
    > 4.5 · 10^6    not done     same             9 min 56 sec
    > 14 · 10^6     not done     same             31 min 51 sec

² On SGI Challenge XL: 8 R4400 150 MHz processors with 1 Gigabyte RAM.

We have also considered an application of the normalization algorithm where one has to select a normal form entry which is best according to some criterion $f$. If the normal form is large, it is possible to run the algorithm for a given time, returning the best entry that was found so far. In one of our examples, with almost 3.5 billion entries in the normal form (going over them takes about 5 days), we obtained the value of $f$ within 7% of the optimal by running the algorithm for only 15 seconds, and the value within 4% of the optimal in 30 minutes.

7 Conclusion

In this paper we have studied various techniques for normalizing databases with disjunctive information represented by or-sets. This problem is particularly important in application areas such as design and planning, as well as merging databases. Queries against such databases often ask questions about possibilities encoded by the database, rather than the information that is stored there. We rigorously defined the concept of normalization for both set and bag semantics. We explained how normal forms that list all possibilities encoded by an incomplete object can be calculated. Only a limited number of operations are needed for calculation of normal forms, and the sequence in which they are applied is irrelevant for both set and bag semantics.

Since normal forms can be of size exponential in the size of the objects, we need better tools for answering conceptual queries. We demonstrated two. Partial

normalization allows us to answer queries without normalizing completely. We have also designed a new space-efficient normalization algorithm.

There are immediate practical benefits of the results presented in this paper. The new space-efficient algorithm has been implemented in OR-SML, a system for querying databases with disjunctions. In addition to being space efficient and faster than the standard algorithm, it allows more control over the process of normalization. This makes the normalization techniques applicable in practical problems, such as computer automated design.

Acknowledgements: Thanks to Rick Hull, Tomasz Imielinski and Kumar Vadaparty for rightfully disputing the claim in [LW93a] that the produce-all normalization is the way to answer conceptual queries. I am very grateful to Peter Buneman, Elsa Gunter, Jon Riecke, Val Tannen and Limsoon Wong for their comments, help and criticism, and to Anthony Kosky for a careful reading of the manuscript.

References

[AB88] S. Abiteboul, C. Beeri. On the power of languages for the manipulation of complex objects. In Proc. of Int. Workshop on Theory and Applications of Nested Relations and Complex Objects, Darmstadt, 1988.
[AH95] S. Abiteboul and G. Hillebrand. Space usage in functional query languages. In LNCS 893: Proc. ICDT-95, pages 437–454.
[AKG91] S. Abiteboul, P. Kanellakis and G. Grahne. On the representation and querying of sets of possible worlds. TCS 78 (1991), 159–187.
[Alb91] J. Albert. Algebraic properties of bag data types. In VLDB-91, pages 211–219.
[BBN91] V. Breazu-Tannen, P. Buneman, and S. Naqvi. Structural recursion as a query language. In Proc. of DBPL-91, pages 9–19.
[BBW92] V. Breazu-Tannen, P. Buneman, and L. Wong. Naturally embedded query languages. In LNCS 646: Proc. ICDT-92, pages 140–154.
[BDW91] P. Buneman, S. Davidson, A. Watters. A semantics for complex objects and approximate answers. JCSS 43 (1991), 170–218.
[BJO91] P. Buneman, A. Jung, A. Ohori. Using powerdomains to generalize relational databases. TCS 91 (1991), 23–55.
[DJ90] N. Dershowitz and J.-P. Jouannaud. Rewrite systems. In: Handbook of Theoretical Computer Science, North Holland, 1990, pages 243–320.
[Gir87] J.-Y. Girard. "Proofs and Types", Cambridge, 1987.
[GM93] S. Grumbach and T. Milo. Towards tractable algebras for bags. In PODS-93, pages 49–58.
[Gun92] C. Gunter. "Semantics of Programming Languages". The MIT Press, 1992.
[GL94] E. Gunter and L. Libkin. OR-SML: A functional database programming language for disjunctive information and its applications. LNCS 856: Proc. DEXA-94, pages 641–650.
[HMT90] R. Harper, R. Milner, and M. Tofte. "The Definition of Standard ML", The MIT Press, 1990.
[IL84] T. Imielinski, W. Lipski. Incomplete information in relational databases. J. of ACM 31 (1984), 761–791.
[INV91a] T. Imielinski, S. Naqvi, and K. Vadaparty. Incomplete objects - a data model for design and planning applications. In SIGMOD-91, pages 288–297.
[INV91b] T. Imielinski, S. Naqvi, and K. Vadaparty. Querying design and planning databases. In LNCS 566: DOOD-91, pages 524–545. Springer-Verlag.
[IMV89] T. Imielinski, R. van der Meyden and K. Vadaparty. Complexity tailored design: A new methodology for database design. To appear in JCSS. Extended abstract in PODS-89.
[Lib91] L. Libkin. A relational algebra for complex objects based on partial information. In LNCS 495: MFDBS-91, pages 36–41.
[Lib92] L. Libkin. An elementary proof that upper and lower powerdomain constructions commute. Bulletin of the EATCS, 48 (1992), 175–177.
[Lib95] L. Libkin. Approximation in databases. In LNCS 893: Proc. ICDT-95, pages 411–424.
[LW93a] L. Libkin and L. Wong. Semantic representations and query languages for or-sets. In PODS-93, pages 37–48.
[LW93b] L. Libkin and L. Wong. Some properties of query languages for bags. In DBPL-93, Springer Verlag, 1994, pages 97–114.
[LW94] L. Libkin and L. Wong. New techniques for studying set languages, bag languages and aggregate functions. In PODS-94, pages 155–166.
[LMR92] L. Lobo, J. Minker and A. Rajasekar. "Foundations of Disjunctive Logic Programming". The MIT Press, 1992.
[Rou91] B. Rounds. Situation-theoretic aspects of databases. In Proc. Conf. on Situation Theory and Applications, CSLI vol. 26, 1991, pages 229–256.
[SP94] D. Suciu and J. Paredaens. Any algorithm in the complex object algebra with powerset needs exponential space to compute transitive closure. In PODS-94, pages 201–209.