Automatic Modeling of Musical Style

O. Lartillot 1, S. Dubnov 2, G. Assayag 1, G. Bejerano 3
1 Ircam (Institut de Recherche et Coordination Acoustique/Musique), Paris, France
2 Ben Gurion University, Israel
3 Institute of Computer Science, Hebrew University, Jerusalem, Israel
email: lartillo@ircam.fr, dubnov@bgumail.bgu.ac.il, assayag@ircam.fr, jill@cs.huji.ac.il

Abstract

In this paper, we describe and compare two methods for unsupervised learning of musical style. Both analyse musical sequences and then compute a model from which new interpretations or improvisations close to the original's style can be generated. In both cases, an important part of the musical structure is captured, including rhythm, melodic contour, and polyphonic relationships. The first method is a drastic improvement of the Incremental Parsing (IP) method, a method derived from compression theory and proven useful in the musical domain. The second one is an application to music of Prediction Suffix Trees (PST), a learning technique initially developed for statistical modeling of complex sequences with applications in linguistics and biology.

1 Style Modeling

By Style Modeling, we mean building a computational representation of the musical surface that captures important stylistic features hidden in the way patterns of rhythm, melody, harmony and polyphonic relationships are interleaved and recombined in a redundant fashion. Such a model makes it possible to generate new instances of musical sequences that respect this now explicit style (Assayag & al 1999a). It is therefore an analysis-by-synthesis scheme, where the closeness of the synthetic result to the original may be evaluated and may validate the analysis. Our approach is unsupervised, that is, we want an automatic learning process that may be run on huge quantities of musical data. Interesting applications include style characterization tools for the musicologist (Dubnov & al 1998), generation of stylistic meta-data for intelligent retrieval in musical data bases, convincing music generation for web and game applications, machine improvisation with human performers, and computer assisted composition.
The interesting aspect of our method can be seen if the whole process is considered as a sort of statistical learning algorithm. By statistical learning we mean a method of acquiring certain statistical properties of a data source so that new sequences can be created, having the same properties as the source. One of the main purposes of learning is creating a capability to sensibly generalize. Composers are interested in finding out the possibilities of certain musical material, without necessarily "explaining" it. So, let us consider here the possibilities that a piece (or a set of pieces in a given style) offers. Statistical analysis of a corpus reveals some of the recombination possibilities that comply with constraints or redundancies typical of the particular style. The concept of redundancy is closely related to information or entropy.

2 The IP Method

The Incremental Parsing algorithm is inspired by the analysis part of compression techniques of the Lempel-Ziv family. To understand how an idea derived from the compression field might be useful for our purpose, it is important to see that, as has been stated by several authors, compressing is equivalent to understanding, because in order to encode incoming information efficiently one has to perform a fine analysis of the way redundancy is organized. A compression algorithm can be split into two phases: first, it reads the input sequence and constructs a model that captures redundancy, and then it generates the compressed code of this sequence with respect to the model. In our case, the second phase is replaced by a stochastic navigation through the model in order to generate new sequences.
2.1 Description

During the generation process, a context-inference scheme is applied. The sequence formed by recently generated objects (a particular suffix of the generated sequence) is the context, from which a prediction of the next object is made according to a contextual probability distribution. So, the analysis part must provide a dictionary of such possible contexts along with their possible continuations.

First, the dictionary is reduced to the empty pattern and IP incrementally reads the sequence. At each cycle, it selects a pattern, from the current position to a further position, such that this pattern is the shortest one which is not already in the dictionary. Every left prefix of this pattern may become a context, and every object that follows this prefix may become a continuation.

It is easy to see that an optimal representation (in space) for such a dictionary is a prefix tree where a branch descending from the root to any node is a context, the children of any node are the continuation objects of the context going from the root to this node, and where the cardinalities of the subtrees at a certain node encode the probability distribution for the objects at the root of each subtree. An optimal representation in generation time is a suffix tree where the contexts are reversed (i.e. read from a leaf to the root) and continuations are stored as pointers associated with every node.

The probability of each continuation is derived from the size of the subtree at every node. We associate with each node in the dictionary tree a weight which is the number of nodes¹ that belong to the subtree of which this node is the root. Thus every leaf in the tree gets weight 1, the root gets the total number of nodes², and the weight of any other node is one plus the sum of the weights of its children. The probability of continuing to one of the children is the ratio between the weight of that child node and the summed weight of all children of the current node, which is the normalisation used in the example of Section 2.2.

Asymptotically it has been shown that the IP predictor outperforms a Markov predictor of any fixed finite order. This surprising property of the IP scheme derives from the counting interpretation of the IP procedure.
In this interpretation, the IP predictor is viewed as a set of sequential predictors operating on separate bins, where each phrase derived in the process of IP parsing is referred to as the bin label. Since IP is unbounded in its context length, it can be shown that for any finite k-th order Markov predictor, the long contexts of IP serve as refinements of the k-th order Markov predictors for sufficiently long sequences. Thus, the total number of errors due to a long context prediction is smaller for IP than the error in the case of a limited memory Markov predictor. When calculating the overall error regime, it turns out that asymptotically the longer terms dominate (since the length of the string grows) and eventually the IP scheme outperforms any finite order Markov predictor.

¹ A variant of the algorithm considers the number of leaves.
² The variant considers the total number of leaves.

2.2 Example

Here is how IP analyzes the very simple sequence abracadabra. First, as we said, the dictionary contains only the empty pattern. Therefore, the shortest pattern starting from the beginning of the sequence that is not already in the dictionary is simply a. This pattern is added to the dictionary and IP continues its analysis on the remainder of the sequence, that is bracadabra. Once again, the shortest pattern consists only of the first letter of the sequence, that is b. The same holds for the next pattern, r. At this point, the remaining sequence is acadabra and the dictionary contains the empty pattern, a, b and r. Now the shortest pattern is no longer a, because it already belongs to the dictionary, but ac. The next patterns are ad, ab and ra. The corresponding prefix tree is therefore:

[Figure: prefix tree. The root has children a, b and r; node a has children c, d and b; node r has child a.]

Now for each pattern in the dictionary, the last letter is considered as the continuation of its prefix. In this way, for the patterns a, b and r, we say that a, b and r are three possible continuations of the empty context; for ab, ac and ad, we say that b, c and d are three possible continuations of the context a. The probabilities associated with these continuations are given below. We have in this very simple example three contexts: the empty context, a and r. The corresponding suffix tree is:

[Figure: suffix tree of the three contexts.]
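The parse of this example, and the continuation probabilities listed for it, can be reproduced with a short Python sketch. It is only an illustration under stated assumptions, not the OpenMusic implementation: the names `ip_parse`, `continuations` and `generate` are hypothetical, symbols are single characters, an incomplete pattern left at the end of the input is discarded, and each probability is the weight of a child (the number of dictionary patterns in its subtree) divided by the summed weight of its siblings.

```python
import random
from fractions import Fraction


def ip_parse(seq):
    """Incremental parsing: cut off the shortest pattern not yet in the dictionary."""
    dictionary = set()
    phrases = []
    start = 0
    while start < len(seq):
        end = start + 1
        while end < len(seq) and seq[start:end] in dictionary:
            end += 1
        phrase = seq[start:end]
        if phrase in dictionary:
            break  # assumption: an incomplete trailing pattern is discarded
        dictionary.add(phrase)
        phrases.append(phrase)
        start = end
    return phrases


def continuations(phrases, context):
    """Continuation probabilities of a context, weighted by subtree sizes."""
    weights = {}
    for phrase in phrases:
        if len(phrase) > len(context) and phrase.startswith(context):
            symbol = phrase[len(context)]  # child of the context leading to this pattern
            weights[symbol] = weights.get(symbol, 0) + 1
    total = sum(weights.values())
    return {symbol: Fraction(w, total) for symbol, w in weights.items()}


def generate(phrases, length, rng):
    """Draw each symbol from the longest context that is a suffix of the output so far."""
    contexts = {phrase[:-1] for phrase in phrases}  # always contains the empty context
    out = ""
    while len(out) < length:
        context = max((c for c in contexts if out.endswith(c)), key=len)
        probs = continuations(phrases, context)
        symbols = sorted(probs)
        out += rng.choices(symbols, weights=[float(probs[s]) for s in symbols])[0]
    return out


phrases = ip_parse("abracadabra")
print(phrases)                      # ['a', 'b', 'r', 'ac', 'ad', 'ab', 'ra']
print(continuations(phrases, ""))   # a: 4/7, b: 1/7, r: 2/7
print(continuations(phrases, "a"))  # c, d and b: 1/3 each
print(generate(phrases, 12, random.Random(0)))
```

Storing the dictionary as a flat list of patterns keeps the sketch short; a real implementation would presumably use the prefix or suffix tree described in Section 2.1 to avoid rescanning the list at every step.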
context (empty): continuations a (4/7), b (1/7), r (2/7).
context a: continuations b (1/3), c (1/3), d (1/3).
context r: continuation a (1/1).

The generation process is incremental too. The first letter is one continuation of the empty context. Each following letter is one continuation of the longest context that is a suffix of what has already been generated. In our simple example, if the process has just generated an a, then the next letter is one of the continuations associated with the context a; the same holds for r; otherwise the context is the empty one.

2.3 Music as a Sequence

We have loosely used the term sequence for an ordered list of objects. In order to capture a significant amount of musical substance, we first cut the musical data (generally in MIDI format), in a pre-analytic phase, into slices whose beginning and end are determined by the appearance of new events and the extinction of past events. Every slice carries duration information and contains a series of channels, each of which contains pitch and velocity information and whatever other musical parameters are available. These slices are serialized into sequences that are submitted to analysis (the slices will be referred to as objects or symbols, and the set of possible symbols as the alphabet). Here is for example the piano roll representation of the beginning of Bach's prelude in C, where lower case letters represent the third octave, and upper case letters the fourth one.

[Figure: piano roll of the beginning of the prelude, pitch against time.]

And here is how this sequence is sliced into symbols. The bold letters represent beginnings of notes.

[Figure: the same passage sliced into symbols.]

2.4 Improvements

We have described here the basic IP algorithm. This algorithm had already been tested with interesting results, but had certain drawbacks that made it quite impractical in specific situations. The improvements presented here are divided into four sections: pre-analytic simplification, generative constraints, loop escape, and analysis-synthesis parameter distribution.

Pre-analytic Simplification. Real musical sequences (for example MIDI files of a piece interpreted by a musician) feature fluctuations of note onsets, durations and velocities, inducing a complexity which fools the analysis: the alphabet size tends to grow in an intractable way, leading to unexpected failures and poor generalisation power.
We have thus developed a toolkit containing five simplification filters:
- the arpeggio filter vertically aligns notes which are attacked nearly at the same time,
- the legato filter removes overlap between successive notes,
- the staccato filter ignores silence between successive notes,
- the release filter vertically aligns note releases,
- the duration filter statistically quantizes the durations in order to reduce the duration alphabet.

These filters can be tweaked by the user, using thresholds (e.g. a threshold of 50 ms separates a struck chord on the piano from an intended arpeggiated chord). Using the simplification toolkit, MIDI files containing real performances that were intractable with the basic IP now become manageable, opening new perspectives, because this particular kind of musical data, full of idiosyncrasy, is of great value as a model for synthetic improvisation.
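To make the role of such a threshold concrete, here is a minimal Python sketch of what an arpeggio filter could look like. It is not the toolkit's code: the name `arpeggio_filter`, the representation of a note as an `(onset_ms, pitch)` pair, and the choice of aligning every note to the first onset of its group are all assumptions made for illustration.

```python
def arpeggio_filter(notes, threshold_ms=50):
    """Align the onsets of notes attacked within threshold_ms of the first note of a group.

    notes: iterable of (onset_ms, pitch) pairs (hypothetical representation).
    """
    aligned = []
    group_onset = None
    for onset, pitch in sorted(notes):
        if group_onset is None or onset - group_onset >= threshold_ms:
            group_onset = onset  # far enough from the current group: start a new one
        aligned.append((group_onset, pitch))
    return aligned


# A slightly spread chord followed by a second one half a second later.
performance = [(0, 60), (12, 64), (30, 67), (500, 72), (520, 76)]
print(arpeggio_filter(performance))
# [(0, 60), (0, 64), (0, 67), (500, 72), (500, 76)]
```

After filtering, the three notes of the first chord share one onset and therefore fall into a single slice, which is what keeps the alphabet from growing with every small timing fluctuation.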
Generative Constraints. It is now possible to specify constraints during the synthetic phase. At each synthetic cycle, if the constraint is not respected by the newly generated symbol, this generation is cancelled and a new symbol is tried. If no symbol is satisfactory, the algorithm backtracks to the previous cycle, and further if necessary. One interesting constraint is called the continuity constraint: at any cycle of the synthetic phase, it is possible that no context sequence is a suffix of the already generated sequence. The maximal context is then the empty context. In these cases, the algorithm generates a continuation symbol of the empty context, that is to say, any symbol, with a stochastic model corresponding to its occurrence in the original sequence. The result is a musical discontinuity. To avoid this, it is possible to specify a continuity constraint which imposes a minimum context size at any time during the synthetic phase.

Loop Escape. The synthetic phase may easily enter an infinite loop. Here is one example: at one cycle, the maximal context A proposes only one continuation symbol; once this symbol is generated, the new maximal context B features only one continuation symbol, and the new maximal context is A again. The next contexts will be B, A, B, A, and so on. This tends to happen when the input data contains contiguous repetitions, which is often the case in music. We describe a mechanism to detect and escape these loops. The previous example is very simple, because this loop was totally deterministic and had only two states. At one state of a loop, it may be possible to consider several continuation symbols, but this choice leads, sooner or later, back to the present context or a previous one. We introduce the concept of context-generated subtree, which consists of the exhaustive set of all the possible contexts that may be met after the present one. In practice, we just need the size of this subtree, which is computed only once before the generation phase. Its computation consists of a marking, directly in the original tree, of all the possible future contexts, and a counting of these markings. When the size of the context-generated subtree is below a user-specified threshold N, an N-order loop is detected.
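The detection step can be sketched in Python as a reachability count over contexts. This is a simplified illustration rather than the marking procedure of the actual system: the function names are hypothetical, the dictionary is a small hand-written example of the kind a repetitive input might produce, and the next maximal context is assumed to depend only on the current context and the symbol just generated.

```python
from collections import deque


def continuation_symbols(phrases, context):
    """Symbols that may follow a context in a prefix-closed dictionary."""
    return {p[-1] for p in phrases
            if len(p) == len(context) + 1 and p.startswith(context)}


def reachable_contexts(phrases, start):
    """All contexts that may be met after `start` (the context-generated set)."""
    contexts = {p[:-1] for p in phrases}
    seen, queue = set(), deque([start])
    while queue:
        context = queue.popleft()
        for symbol in continuation_symbols(phrases, context):
            history = context + symbol
            # assumption: the next maximal context is the longest context ending `history`
            nxt = max((c for c in contexts if history.endswith(c)), key=len)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_loop(phrases, context, n):
    """Flag a loop when fewer than n contexts remain reachable."""
    return len(reachable_contexts(phrases, context)) < n


# Hypothetical dictionary of a highly repetitive input (a, b, ab, aba).
phrases = ["a", "b", "ab", "aba"]
for context in ["", "a", "ab"]:
    print(repr(context), sorted(reachable_contexts(phrases, context)),
          is_loop(phrases, context, 3))
```

With this dictionary the contexts a and ab can only lead to each other, which is the two-state A, B, A, B situation described above, whereas the empty context can still reach all three contexts.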
The loop phenomenon is principally due to the fact that the synthetic phase searches for the maximal context, which is unique and proposes few alternative continuation symbols. In the case of a loop, we loosen this constraint and examine not only the maximal context, but also smaller ones, which escape the loop in most cases.

Analysis-Synthesis Parameter Distribution. The analytic phase consists of finding the redundancy inside an original sequence of symbols. The trouble is that there may be so much complexity and diversity in musical sequences that the size of the alphabet may be of the same order as the length of the sequence. Little redundancy would then be observable. Moreover, each symbol consists, as said before, of several channels, each channel consists of notes, and each note consists of different parameters. So a symbol is in fact a Cartesian product of several musical parameters. In order to increase abstraction and power in the analysis, we allow the system to discard some parameters, for example the velocity. The retained parameters will be called analysis information. In this way, it is obviously possible to increase redundancy, because this implicitly organizes the alphabet into equivalence classes: a chord struck twice with different velocities is nevertheless the same chord with the same harmonic function, and it will be detected as such.

The problem is that, in the synthetic phase, the discarded information cannot be retrieved. For example, if we choose to discard note durations and dynamics, we finally obtain isorhythmic and dynamically flat musical sequences, which sound like a music box. A better solution is to store the excluded information in the model. This information is then called synthetic information. The analytic phase is now performed on classes inside the initial symbol set, and during the synthetic phase, synthetic information, e.g. expressivity, may be reconstructed. This solution has significant advantages. First, the generated music regains much of the diversity, spirit and human appearance of the original. Moreover, since it is possible to restrict the analytic information and therefore find more redundancy in the original sequence, the synthetic phase becomes less constrained.
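The split between analysis information and synthetic information can be illustrated with a small Python sketch. Everything in it is an assumption made for the sake of the example: symbols are `(pitches, duration, velocity)` tuples, the names `split_symbols` and `reconstruct` are hypothetical, and synthetic information is restored by drawing at random among the original symbols stored for each class, which is only one of many possible reconstruction rules.

```python
import random
from collections import defaultdict


def split_symbols(symbols, analysis_key):
    """Map each symbol to its analysis class and remember the full symbols per class."""
    classes = []
    realisations = defaultdict(list)
    for symbol in symbols:
        key = analysis_key(symbol)        # analysis information only
        classes.append(key)
        realisations[key].append(symbol)  # full symbol, including synthetic information
    return classes, realisations


def reconstruct(class_sequence, realisations, rng):
    """Turn a generated sequence of classes back into full symbols (assumed rule)."""
    return [rng.choice(realisations[key]) for key in class_sequence]


# Hypothetical symbols: (pitches, duration in ticks, velocity).
symbols = [
    (("C4", "E4"), 480, 64),
    (("C4", "E4"), 240, 90),  # same chord, different duration and velocity
    (("G3",), 480, 70),
]
classes, realisations = split_symbols(symbols, analysis_key=lambda s: s[0])
print(len(set(symbols)), "symbols reduced to", len(set(classes)), "classes")
print(reconstruct(classes, realisations, random.Random(0)))
```

The analysis would then run on `classes`, whose alphabet is smaller than that of `symbols`, while `realisations` keeps the durations and velocities needed to avoid a flat result.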
At each synthetic cycle, every context features many more possible continuations than before.

2.5 Implementation

The software is implemented as a user library in an Open Source visual programming language developed at Ircam, called OpenMusic (Assayag & al 1999b). Each step of the algorithm (pre-analysis, analysis, synthesis, and post-synthesis) is a function which, in the musical representation software, is represented by a box featuring inlets and outlets. All parameters may be tuned by the user. Indeed, synthetic constraints, the distribution of analytic and synthetic information, and the reconstruction of the synthetic information may be explicitly formulated through visual expressions.

2.6 Musical Experiments

Many musical experiments have been carried out in order to test the new IP algorithm. MIDI files gathered from several sources, including polyphonic music and piano music, and whose style ranges from early music to hard-bop jazz, have been submitted to the learning process. Experiments show that the combination of the simplification tool box and the new analysis-synthesis distribution scheme dramatically improves the results in the case of live music, and in cases where the overall complexity leads to a huge alphabet. We show convincing examples, including a set of piano improvisations in the style of Chick Corea, another one in the style of Chopin Etudes, polyphonic Bach counterpoint, 19th century symphonic music, and modern jazz style improvisations derived from a training set fed by several performers asked to play for the learning system.

3 PST

We have seen that the IP predictor asymptotically outperforms a Markov predictor of any fixed finite order. In practice the strings (music sequences) are of finite length, and moreover, the size of the IP tree is bounded by a small finite size. Due to these limitations it is desirable to consider modeling schemes that might be better suited to shorter-term situations. Another important feature of IP that seems redundant for our needs is the sequential nature of its operation. In our application, the goal is the generation of new sequences that maintain statistical properties similar to those of the reference source. We use the prediction probabilities as the statistics generator, but we are not bound by a requirement to rely on the past only. Allowing one to use the whole sequence for estimation of the statistics might help improve the performance. Ron & al. (1996) developed a variable length Markov model termed Prediction Suffix Tree (PST). It has been shown that PSTs correspond to a subclass of Probabilistic Finite Automata called PSAs.
A PSA is a variable order Markov chain in which the memory is variable and in principle unbounded. Given a finite size PSA, there exists an equivalent PST of a slightly larger size that produces the same probability output. Moreover, an efficient learning algorithm exists that allows one to construct a PST from samples generated by a PSA. Now, if we consider the PSA as a common ground relating it to IP, we can consider the similarities and differences between the two approaches. One can see that both methods are similar in terms of their use of a variable length context for determining the probability of the next symbol. The basic estimation procedure of the PST, though, is significantly different from that of IP.

3.1 Description (Bejerano & al 1999)

First, we define L to be the memory length of the PST, i.e. the maximal length of a possible string in the tree. We work gradually through the space of all possible subsequences of length 1 through L, starting at single letter subsequences, and abstaining from further extending a subsequence whenever its empirical probability has gone below a certain threshold (Pmin), or on having reached the maximal length L. The Pmin cutoff avoids an exponentially large (in L) search space.

At the beginning of the search we hold a PST consisting of a single root node. Then, for each subsequence we decide to examine, we check whether there is some symbol in the alphabet for which the empirical probability of observing that symbol right after the given subsequence is non-negligible, and is also significantly different (i.e. the quotient exceeds a certain threshold r) from the empirical probability of observing that same symbol right after the string obtained by deleting the leftmost letter from our subsequence. This string corresponds to the label of the direct father of the node we are currently examining (note that the father node has not necessarily been added to the PST at this time). Whenever these two conditions hold, the subsequence, and all necessary nodes on its path, are added to our PST.

The reason for the two step pruning (first defining all nodes to be examined, then going over each and every one of them) stems from the nature of PSTs. A leaf in a PST is deemed useless if its prediction function is identical (or almost identical) to that of its parent node. However, this in itself is no reason not to examine its sons further while searching for significant patterns. Therefore, it may happen, and does, that consecutive inner PST nodes are almost identical.

Finally, the node prediction functions are added to the resulting PST skeleton, using the appropriate conditional empirical probability, and then these probabilities are smoothed using a standard technique so that no single symbol is absolutely impossible right after any given subsequence (even though the empirical counts may attest differently).

3.2 Example

Here is the PST analysis of abracadabra, with Pmin = 0.1, r = 2, L = 10 and a minimum smoothed probability of 0.01. Each node is associated with the list of probabilities that the continuation is, respectively, a, b, c, d and r.

(root) (0.44, 0.18, 0.10, 0.10, 0.18)
b      (0.96, 0.01, 0.01, 0.01, 0.01)
ca     (0.01, 0.01, 0.01, 0.96, 0.01)
a      (0.01, 0.48, 0.25, 0.25, 0.01)
da     (0.01, 0.96, 0.01, 0.01, 0.01)
r      (0.96, 0.01, 0.01, 0.01, 0.01)
ra     (0.01, 0.01, 0.96, 0.01, 0.01)

3.3 Implementation

This algorithm has just been adapted to OpenMusic, and integrated in exactly the same framework as IP. Therefore, all the features presented in Sections 2.3 to 2.6 are also available for PST.

4 IP/PST comparison

4.1 Batch vs. On-line

PST parsing is not on-line in nature, as IP is: the training text is viewed as a whole unit and symbol frequencies are observed over the whole text.
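This whole-text counting can be illustrated with a short Python sketch that estimates, for a given context, the smoothed next-symbol distribution and the ratio test against the parent node. It is not the PST learning algorithm itself: the function names are hypothetical, the smoothing formula (1 - |alphabet| * gamma_min) * P + gamma_min is an assumption (it is one standard choice, and it reproduces rows such as the root, a and ca of the example in Section 3.2), and the non-negligibility test and the Pmin and L cutoffs are left out.

```python
from collections import Counter


def next_symbol_counts(text, context):
    """Count, over the whole text, which symbols follow each occurrence of context."""
    n = len(context)
    return Counter(text[i + n] for i in range(len(text) - n)
                   if text[i:i + n] == context)


def smoothed_probs(text, context, alphabet, gamma_min=0.01):
    """Empirical next-symbol probabilities, smoothed so that none is below gamma_min."""
    counts = next_symbol_counts(text, context)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("context is never followed by a symbol in the text")
    scale = 1.0 - len(alphabet) * gamma_min  # assumed smoothing rule
    return {s: scale * counts[s] / total + gamma_min for s in alphabet}


def differs_from_parent(text, context, r=2.0):
    """True if some symbol is at least r times more likely after context than after its parent."""
    child = next_symbol_counts(text, context)
    parent = next_symbol_counts(text, context[1:])  # drop the leftmost letter
    child_total, parent_total = sum(child.values()), sum(parent.values())
    for symbol, count in child.items():
        ratio = (count / child_total) / (parent[symbol] / parent_total)
        if ratio >= r:
            return True
    return False


text, alphabet = "abracadabra", "abcdr"
for context in ["", "a", "ca"]:
    probs = smoothed_probs(text, context, alphabet)
    print(repr(context), [round(probs[s], 2) for s in alphabet])
print(differs_from_parent(text, "ca"), differs_from_parent(text, "ab"))
```

Because every count is taken over the complete training text, changing a single symbol only shifts a few counts instead of re-cutting the remainder of the parse.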
Thus there is no arbitrary parsing as in LZ, where a single symbol change in the sequence may have a deep influence on the dictionary structure. So PST may prove more powerful for short sequences but is not practicable for real time improvisation situations, to which the on-line nature of IP is better suited.

4.2 Selectivity

While IP goes over the training text and parses all of it, the PST learning algorithm looks at the text as a whole and picks for its dictionary only patterns considered relevant. This selective parsing can lead to more discontinuities and a smaller repertoire to improvise on, but the context/inference rules stored in the trees may be considered as more motivated than in the case of IP. From these two remarks, one could say that IP is better suited to generative applications, especially improvisation and real-time use, while PST is more precise and complete, and is a good start for musicology applications where one wants to achieve the finest description.

4.3 Example

Thanks to the integration of both algorithms inside a unified framework, it is now possible to compare their respective analysis and generation for a specific musical example. Below is a score generated by OpenMusic of an interpretation of the beginning of a Gnossienne by Erik Satie followed by an improvisation by each algorithm. As the original example does not feature a lot of complexity, one can easily see that the generative process of both algorithms consists of a kind of sampling of parts of the original piece, so that the result is as similar as possible to the original sequence. Nevertheless, these algorithms may sometimes decide on transitions between the very small samples (which may consist of even one note) that were not expected, but that follow their modelling of the style of the original piece. As a result, the original style can be recognized, but at the same time, these generated sequences feature a certain amount of creativity. We invite you to listen to more complex improvisations, included in the CDROM, in order to appreciate the creative skills of these algorithms, which are limited here by the simplicity of this didactic example.

5 Conclusion and New Directions

In the present state of research, we have to consider music as a sequence of symbols. It could be more interesting to analyze the bi-dimensional score structure directly. This would enable an intelligent analysis of a fugue, for example. Although the context-inference approach may be compared with the implication/realisation view of Meyer, we have to acknowledge that his view is more powerful since his expectation is long term, and not a systemic first order expectation like ours. But we suspect hard computability problems behind the long term approach.

In our work, learning is indeed unsupervised since we do not feed our system with musical knowledge; rather, it analyzes music with constrained algorithms and does not proceed to any inductive inference. Another approach would be trying to induce automatically some conclusions about the presence of musical structures, with the help of minimal cognitive mechanisms.

Other statistical techniques have to be tried out on music and compared to the two presented here. PPM models, for instance, can be thought of as going half-way between IP and PSTs. They are on-line (like IP) but they model k-terms of growing k with respect to data wealth (like PSTs).

Finally, instead of modeling a sequence x1 x2 ... xn based on its own redundancies, one may try to model x1 x2 ... xn based on the redundancies of a second, correlated (e.g. 2nd voice) sequence y1 y2 ... yn (transducer). Then, the generation of an X-type sequence could be constrained by an incoming Y-type sequence. This could lead to a synchronous improvisation scheme, where the computer, instead of providing answers to a human player, as usual, would play synchronously with him, keeping an overall polyphonic consistency.

References

Assayag, G., S. Dubnov and O. Delerue. 1999a. Guessing the Composer's Mind: Applying Universal Prediction to Musical Style. Proceedings of the International Computer Music Conference. International Computer Music Association, pp. 496-499.
Assayag, G., C. Rueda, M. Laurson, C. Agon and O. Delerue. 1999b. Computer Assisted Composition at Ircam: PatchWork & OpenMusic. Computer Music Journal, 23:3.
Bejerano, G. and G. Yona. 1999. Modeling protein families using probabilistic suffix trees. Proceedings of RECOMB.
Dubnov, S., G. Assayag and R. El-Yaniv. 1998. Universal Classification Applied to Musical Sequences. Proceedings of the International Computer Music Conference. International Computer Music Association, pp. 332-340.
Ron, D., Y. Singer and N. Tishby. 1996. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning 25(2-3):117-149.

Gnossienne, by Erik Satie (beginning).
LZ improvisation. PST improvisation.