CSE 231 Fall 2015
Computer Project #4

Assignment Overview

This assignment focuses on the design, implementation and testing of a Python program that uses character strings for data decompression.

It is worth 45 points (4.5% of your course grade) and must be completed no later than 11:59 PM on Monday, October 12.

Assignment Deliverable

The deliverable for this assignment is the following file:

    proj04.py - the source code for your Python program

Be sure to use the specified file name and to submit it for grading via the handin system before the project deadline.

Assignment Background

We use data compression all the time: music is compressed by the MP3 algorithm, videos are compressed by the MP4 algorithm, and photos are compressed by the JPEG algorithm. A fundamental part of compression is to remove repeated patterns and replace them with a code. For example, "compress" was used repeatedly in this paragraph and could be replaced with a code (a pair of numbers). If you do that often enough, you can save a lot of space.

Consider the following traditional ditty:

    Pease porridge hot,
    Pease porridge cold,
    Pease porridge in the pot,
    Nine days old.

    Some like it hot,
    Some like it cold,
    Some like it in the pot,
    Nine days old.

Using a relatively simple algorithm, that compresses to the string:

    "Pease porridge hot,\(20,15)cold,\(21,15)in the p(48,4)Nine days (42,3).\\Some like it(82,6)(18,13)(80,6)(19,13)(78,26)"

Here, we are using the backslash character ("\") to represent a carriage return (or new line character) and a pair of numbers to represent text that is repeated from earlier in the string: the first number tells you how far to go back and the second tells you how many characters are repeated from that starting point. You decompress the string by replacing each pair, in its turn, with a substring from earlier in the decompressed part of the string. Finally, you replace each backslash with a newline ("\n").

For example, the pair (20,15) is to be replaced by the substring obtained by going back 20 characters and copying 15 characters starting at that point. Going back 20 characters takes you to the beginning, and then you copy 15 characters to get the substring "Pease porridge ". Replacing the pair with this substring results in the (partially decompressed) string:

    "Pease porridge hot,\Pease porridge cold,\(21,15)..."

(To save space, we have used ellipses (...) to represent the rest of the compressed string.)

For the next pair, you go back 21 characters to the beginning of the second line, where you again copy 15 characters, i.e. "Pease porridge ". Replacing the pair with this substring yields:

    "Pease porridge hot,\Pease porridge cold,\Pease porridge in the p(48,4)..."

Next, the pair (48,4) takes you back 48 characters to the "o" in "hot". The substring obtained by copying four characters is "ot,\". Replacing the pair with this substring gives you:

    "Pease porridge hot,\Pease porridge cold,\Pease porridge in the pot,\Nine days (42,3)..."

The pair (42,3) takes you back 42 characters to pick up the 3 characters "old" (from "cold,"). The decompressed string so far is:

    "Pease porridge hot,\Pease porridge cold,\Pease porridge in the pot,\Nine days old.\\Some like it(82,6)..."

And so on.

Last week Google announced a new compression algorithm named Brotli. So what? For one, it is 26% better than existing algorithms. Six years ago, Google's disk space was estimated at an exabyte (10^18 bytes = 1,000,000 terabytes). The improved compression will save Google a quarter of that space, or about 250,000 terabytes. Since terabyte disk drives cost about $100 each, the better compression will save Google approximately $25 million. The savings might be much larger now since Google's disk space has certainly grown in the past six years.
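
Returning to the walk-through of the pairs (20,15) and (21,15) above, the "go back, then copy" step can be checked with ordinary slicing. The following is only a minimal sketch of that one step, not a solution to the project; the helper name copy_back is hypothetical, and it assumes the copy never needs more characters than exist after the starting point (true for every pair in the ditty).

```python
def copy_back(decompressed, back, count):
    """Return the `count` characters that start `back` characters
    before the end of the text decompressed so far.

    Hypothetical helper for illustration; assumes count <= back.
    """
    start = len(decompressed) - back
    return decompressed[start:start + count]


# Text decompressed so far when the pair (20,15) is reached.
# "\\" in Python source is one backslash character.
so_far = "Pease porridge hot,\\"
first = copy_back(so_far, 20, 15)
print(repr(first))

# Append the copy and the literal text "cold,\" that follows the pair,
# then handle the next pair, (21,15).
so_far = so_far + first + "cold,\\"
second = copy_back(so_far, 21, 15)
print(repr(second))
```

Both calls yield the substring "Pease porridge ", which agrees with the hand calculation in the text.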

Assignment Specifications

You will develop a Python program that allows the user to decompress any number of strings.

The program will prompt for a string to decompress. As long as the user inputs a non-empty string, the program will output the decompressed string. The decompressed string will have proper carriage returns that replace the backslashes in the input string. When the user presses the enter/return key without first inputting a string, the program will display an appropriate termination message and halt.

The decompression algorithm looks for a left parenthesis "(". Note that the string method find() is useful to locate a specific character, such as a left parenthesis. Then, use find() to locate the next comma and grab the slice in between to extract the substring representing the first number (which, of course, needs to be converted to an integer). Use find() to extract the second number in a similar way. Now that you have the two numbers, work backwards in the string and use slicing to obtain the substring you need to copy. Repeat.

We will make two simplifying assumptions: the only parentheses and backslashes in the compressed string are for encoding (i.e. the original string does not contain any parentheses or backslashes).

Assignment Notes

1. To clarify the project specifications, sample output is provided at the end of this document.

2. The coding standard for CSE 231 is posted on the course website: http://www.cse.msu.edu/~cse231/general/coding.standard.html

3. Items 1-7 of the Coding Standard will be enforced for this project.

4. A useful pattern when working with strings is to start with an empty string and add characters to it, something that works nicely here to build the decompressed string.

5. Slicing is useful in multiple places in this assignment. For example, it is useful for extracting the string to be copied once you have extracted the pair of encoding numbers.

6. To simplify your program we will assume that parentheses are only used for encoding. That is, if you find a parenthesis, you know that you have an encoding string of the format (first_number,second_number). In addition, there will be no spaces within the parentheses of the encoding string.

7. A useful first test case (from the famous soliloquy in Shakespeare's Hamlet) has only one substitution and is short enough to not have any carriage returns:

    to be, or not (14,6)

8. There are two different approaches to solving this problem:

   a. One uses a while statement for the main loop, using find() to look for indices of the pair of numbers within the encoding ordered pair. One thing to consider is when to stop the loop: how do you know when you don't have any left parentheses left?

   b. Using a for statement as the main loop is possible, but the logic is messier because you need Booleans to keep track of when you are gathering characters for the two numbers in the encoding ordered pair, as well as a Boolean that keeps track of when you are working on extracting numbers so you recognize that comma as separating the numbers.

9. Use the following symbolic constant for the backslash character. It is a double backslash because a backslash has a special meaning (for example, the \n used for carriage return). Therefore, Python will interpret the double backslash as a single backslash.

    BACKSLASH = "\\"

Suggested Procedure

- Solve the problem using pencil and paper first. You cannot write a program until you have figured out how to solve the problem. This first step can be done collaboratively with another student. However, once the discussion turns to Python specifics and the subsequent writing of Python statements, you must work on your own.

- Cycle through the following steps to incrementally develop your program:

  - Edit your program to add new capabilities one at a time. Using the phrase from Hamlet's soliloquy is a good way to begin. In fact, you might start with it as a string defined in your program so you don't have to keep inputting it.

  - Use a divide-and-conquer approach to solving the problem. For example, start by simply finding and printing the first number of the first encoding pair. Then find the second number of the first encoding pair and print it. Then use those two numbers to extract the string to be copied and print it. Then modify your program to handle the phrase from Hamlet's soliloquy. Get that working before trying to handle more complex cases.

  - Use the handin system to submit the current version of your program.
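
The divide-and-conquer steps above can be tried on the Hamlet test case from note 7. What follows is a hedged sketch of those steps for a single encoding pair only, with the phrase defined in the program as the procedure suggests. It is one possible way to lay out the steps, not the required design, and all variable names are illustrative. Note that the search for the comma starts at the left parenthesis, because the phrase itself contains an earlier comma.

```python
compressed = "to be, or not (14,6)"

# Step 1: locate the pair and print its first number.
left = compressed.find("(")
comma = compressed.find(",", left)    # skip the comma in "to be,"
first_number = int(compressed[left + 1:comma])
print(first_number)

# Step 2: print the second number.
right = compressed.find(")", comma)
second_number = int(compressed[comma + 1:right])
print(second_number)

# Step 3: go back first_number characters from the pair and
# copy second_number characters.
prefix = compressed[:left]
start = len(prefix) - first_number
copied = prefix[start:start + second_number]
print(copied)

# Step 4: replace the pair with the copied substring.
result = prefix + copied + compressed[right + 1:]
print(result)
```

Handling strings with several pairs and with backslashes still requires the main loop described in note 8.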

- Be sure to use the handin system to submit the final version of your program.

- Be sure to log out when you leave the room, if you're working in a public lab.

The last version of your solution is the program which will be graded by your TA.

You should use the handin system to back up your partial solutions, especially if you are working close to the project deadline. That is the easiest way to ensure that you won't lose significant portions of your work if your machine fails or there are other last-minute problems.

You would also be wise to save a copy of your completed program in your CSE file space (the H: drive on the lab machines) before the project deadline. If you write your program at home and turn it in from home, you should copy it to your CSE file space before the deadline. In case of problems with electronic submission, an archived copy in the handin system or the CSE file space is the only acceptable evidence of completion.

Sample Output