INTERNET DATABASES IN QUALITY INFORMATION IMPROVEMENT

Cosmin TOMOZEI
University Assistant, Mathematics and Computer Science Department, Faculty of Science, University of Bacau, Bacau, Romania
E-mail: tomcosmn@yahoo.co.uk

Abstract: Even though many important companies are reluctant to deploy their databases on the Internet because of security concerns, we would like to demonstrate that they should not worry too much about it, but should instead try to provide information in real time to management, boards or people who travel in the company's interest. Nevertheless, security is one of the most important properties that must be provided for websites and databases on the Internet. If we consider one of the information quality metrics, the time between the sending of a message and its reception, it can decrease considerably thanks to a secure, normalized and non-redundant database. Another direction of our study is the interdisciplinary approach, including the cooperation between management science, information technology and quantitative analysis, in order to provide a perspective for improving information quality.

Key words: data protection; redundancy; data integrity; flexibility; economic efficiency; reliability

Internet databases, concepts and technologies

In the last decade, databases have been mostly Internet oriented, with Oracle or SQL Server providing opportunities for web developers to use them in their applications. Due to globalization, companies have also oriented their information systems towards a distributed approach, available to users and target groups from all over the world through Internet or remote connections. We developed such an application for the Chamber of Commerce and Industry Bacau, with the main purpose of helping the management store and update its activities and of offering support in the decision-making process.
Formerly, when a department needed information from the database, it had to ask the IT department to retrieve it from the information system, and the answer would come in a few days because of the amount of activities in which the IT department was already involved. Now, everyone can carry out their own duties via Internet applications, with the software giving them specific answers and information in real time. As a result, the gap between the request for data and the response has been reduced to almost zero.
Economic efficiency is another important aspect, because the main objective of any economic entity is to be efficient, profitable and successful on the market. Efficiency and success are strongly connected with a solid, non-redundant, fast and secure information system. However, some of the activities cannot be published on the outside network without being protected from any kind of intrusion attempt or unauthorized access. For this reason, we did our best to maintain protection, security and integrity as main priorities.

In Information Theory, redundancy is defined as the number of bits used to transmit a message minus the number of bits of actual information. Data compression is the most important way to eliminate unwanted redundancy. Database Theory, on the other hand, defines redundancy as the degree of data multiplication, which can be accepted as long as it does not exceed a certain level and become uncontrolled. In distributed systems, however, redundancy appears because of data replication, if the analyst designs the project in that way. Replication is the process in which two or more databases communicate, transferring data and information. Even though it may look like information cloning, the difference between cloning and replication is that a replication server supervises the data transfer (for example, IBM DataPropagator for iSeries, V8.1). Every database is updated when updates or modifications are made on one replica.

Data integrity has also been very important in our approach, and it has been taken into consideration from both sides:
- entity integrity, meaning that the primary key columns of each table must contain a unique value;
- referential integrity, meaning that each value in a foreign key column must exist as a primary key of the table it references.

Furthermore, security is also assured by granting access to the database, to read, write, insert, update or delete data, depending on the role that the user has.
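As a minimal sketch of the entity and referential integrity rules listed earlier in this section, the following Python example uses the standard sqlite3 module with an in-memory database. SQLite is used only for illustration (the text mentions Oracle and SQL Server), and the table and column names (department, activity) are hypothetical, not taken from the application's actual schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one parent and one child table.

    Table and column names are hypothetical, chosen only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE activity ("
        " id INTEGER PRIMARY KEY,"
        " department_id INTEGER NOT NULL REFERENCES department(id),"
        " title TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO department (id, name) VALUES (1, 'Business Information')"
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run an INSERT and report whether the integrity constraints accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()

# Entity integrity: a second row with primary key 1 is rejected.
duplicate_key_ok = try_insert(
    conn, "INSERT INTO department (id, name) VALUES (?, ?)", (1, "Human Resources")
)

# Referential integrity: department 99 does not exist, so the row is rejected.
orphan_ok = try_insert(
    conn,
    "INSERT INTO activity (id, department_id, title) VALUES (?, ?, ?)",
    (1, 99, "Trade fair"),
)

# A row that references an existing department is accepted.
valid_ok = try_insert(
    conn,
    "INSERT INTO activity (id, department_id, title) VALUES (?, ?, ?)",
    (2, 1, "Business meeting"),
)

print(duplicate_key_ok, orphan_ok, valid_ok)
```

Letting the database engine reject invalid rows, rather than checking them only in application code, keeps the rules in force for every client that connects to the database.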
The database administrator, in our case the IT analyst from the Information Technology Department, is the one entitled to create user accounts and to grant them specific rights. Consequently, a unique username and password are provided to every user, who can work on the database after authentication. For example, the Department of Business Information, as administrator of the business information system, offers advice to companies about business opportunities, creates and supports contacts and partnerships between companies, offers support for the development of the economic region, and is enabled to make daily updates to the databases. The employees of the Department of Business Information are also responsible for the correctness and accuracy of the data.

Internet database quality metrics

The first step in database design is data modeling, which creates the link between the end users and the software solution that becomes useful for them. Concordance (agreement) is a metric which compares the results of a specific prototype with the initial objectives. Even though it represents a small fraction of the entire development process, it is very important to spend enough time and resources on data modeling.
It is also very important that this stage of the process should not remain just a theoretical approach, but should actually be implemented and followed in the other steps of application development. Software engineers, application developers and analysts focus mainly on quality metrics related to code elaboration, while process modeling and data analysis are not taken into account as much. In [PIGECA06]¹ there are some proposals for data models. The model should also be validated and audited before continuing with the next steps.

Project complexity is defined as

$$E = \sum_{i=1}^{n} (E_i)^{C},$$

where n represents the number of entities of the project and C > 1. The complexity of an entity is given by

$$E_i = D \cdot F,$$

where D is the data architecture complexity and F is the functional complexity, with

$$D = R\,(a \cdot FDA + b \cdot NFDA),$$

where R is the number of relationships (associations), FDA is the number of functional dependencies and NFDA is the number of non-functional dependencies.

Data modeling, relationships and dependencies are all shown in the database diagram (Figure 1), which describes the entity-relationship model of our application.

Accuracy measures whether the database entities are correct and all the errors have been removed from the model. Accuracy is defined by the triplet (Entity, Feature, Value), Et = (en, fe, va), where:
- en is a database entity (which has its origins in the conceived model, such as a data table);
- fe is a property of the entity;
- va is a quantitative or qualitative measurement of the entity.

The higher the accuracy, the smaller the difference between the actual value (val) and the correct value (val'); this difference should be minimal, if not zero. This leads us to the idea that

$$\min(ERR) = \min \sum_{i=1}^{n} \left| val_i - val_i' \right| \qquad (*)$$

for each entity.

It is also important to mention that data conversions are very useful, but they should be followed by verifications. One of the instruments that makes data transformation easier is the data adapter.
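The complexity and accuracy measures defined in this section can be sketched in Python as follows. The sketch assumes the reading E = Σ (E_i)^C with E_i = D · F and D = R (a · FDA + b · NFDA), and takes the error as the sum of absolute differences between actual and correct values. The weights a and b, the exponent C and all numeric inputs are hypothetical, since the text gives no values for them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Entity:
    """Inputs for one entity; every value used below is a made-up example."""
    relationships: int   # R: number of relationships (associations)
    fda: int             # FDA: number of functional dependencies
    nfda: int            # NFDA: number of non-functional dependencies
    functional: float    # F: functional complexity


def data_complexity(e: Entity, a: float, b: float) -> float:
    """D = R * (a * FDA + b * NFDA); a and b are assumed weights."""
    return e.relationships * (a * e.fda + b * e.nfda)


def entity_complexity(e: Entity, a: float, b: float) -> float:
    """E_i = D * F."""
    return data_complexity(e, a, b) * e.functional


def project_complexity(
    entities: Sequence[Entity], a: float, b: float, c: float
) -> float:
    """E = sum of E_i ** C over all entities, with C > 1."""
    if c <= 1:
        raise ValueError("C must be greater than 1")
    return sum(entity_complexity(e, a, b) ** c for e in entities)


def total_error(actual: Sequence[float], correct: Sequence[float]) -> float:
    """ERR = sum of |val_i - val'_i|; zero means every value is accurate."""
    if len(actual) != len(correct):
        raise ValueError("both sequences must have the same length")
    return sum(abs(v - w) for v, w in zip(actual, correct))


# Two hypothetical entities and hypothetical weights a=1.0, b=0.5, C=2.0.
entities = [Entity(2, 3, 1, 1.5), Entity(1, 2, 0, 2.0)]
complexity = project_complexity(entities, a=1.0, b=0.5, c=2.0)
error = total_error([10, 20, 30], [10, 22, 29])

print(complexity)
print(error)
```

Because C > 1, a single complex entity contributes more to E than several simple ones with the same combined D · F, which is one reason to keep individual entities small during data modeling.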
If the data provided by the web control's value field is of string type, and the field in the table related to that control is of int type, data conversion is compulsory to prevent errors.
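A minimal sketch of such a conversion followed by a verification step is given below, written in Python rather than in the application's own language. The function name parse_int_field and the default range limits (those of a signed 32-bit integer column) are assumptions made for this illustration.

```python
from typing import Optional


def parse_int_field(
    raw: str, minimum: int = -2**31, maximum: int = 2**31 - 1
) -> Optional[int]:
    """Convert a string from a form control to an int, or return None if invalid.

    The default limits mirror a signed 32-bit integer column; both the
    function name and the limits are assumptions made for this sketch.
    """
    try:
        value = int(raw.strip())
    except (ValueError, AttributeError):
        # Not a whole number (or not a string at all): reject the input.
        return None
    # Verification step: the converted value must fit the target column.
    if not minimum <= value <= maximum:
        return None
    return value


print(parse_int_field("42"))           # a valid selection value
print(parse_int_field(" 7 "))          # surrounding whitespace is tolerated
print(parse_int_field("abc"))          # not a number
print(parse_int_field("99999999999"))  # outside the 32-bit range
```

Returning None instead of raising lets the calling code decide how to report the problem to the user before anything is written to the table.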
If not detected in due time, such errors may alter the intermediate results and, consequently, the general results. If they are detected only at the end of the elaboration process, the moment at which the error was made will be very difficult to find, and therefore the correction will be much more difficult. If they are not detected at all during testing, the final results will be wrong and very hard to correct, requiring meticulous and long-lasting work. In order to avoid incorrect decisions by the managerial staff, it is advisable to check from time to time, using test data sets, whether errors have occurred and, if so, whether they have exceeded a certain level of significance.

Figure 1. Database entity-relationship diagram

For example, Convert.ToInt32(DropDownList1.SelectedItem.Value); converts a string to a 32-bit integer, which prevents string variables from being put into calculation formulae, where they could alter the results.

The following concepts should be taken into account in the process of developing Internet database applications: understandability, completeness, flexibility, reliability and data protection. The last concept has been dealt with in the first chapter of the article.

Understandability means that developers should think from the user's perspective and try to empathize with the user's problems, fear of technology and rejection of the new. When changes are made, the target group is trained to work with the new application.

Completeness is another important web database quality metric. An application is complete when all the items in the model (such as the tables and columns in the database diagram) correspond to the user requirements. At every step (prototype) it is recommended to check whether the stage is complete and, if it is not, to make corrections. In our application, some user requirements had been omitted, such as the storage of official letters, which are very important for business meetings. A table was added and related to the other tables in the relational model. Problems also appear if the end user requirements are not well defined, which leads to confusion in the development team. The target group must be interviewed during the development process to confirm that the partial outputs are accurate.

The application is also flexible. Flexibility makes an application economically efficient. The number of future changes is not expected to be large, and changes must be easy to accomplish. The more flexible the application is, the lower the cost of changing and upgrading it.

It is also very important for the application to be reliable during its life cycle. We hope that our application will maintain all its functions and procedures for a long period of time without any irremediable errors. The index of reliability is

$$I_r = \frac{ndat_c}{ndat},$$

where ndat_c represents the number of datasets that generated correct results and ndat the total number of datasets. An application is considered good when the indicator has a value greater than 0.78.
The target group consists of the employees of each department of the Chamber; they have been interviewed from the beginning and throughout the entire development process. The predictive test results are included in Table 1.

Table 1. The target group's predictive test results

Department              No of users  No of queries  Good responses  Updates  Errors
Business Information              5             35              30        2       2
Fairs and Exhibitions            12             50              46        8       3
Human Resources                   3             35              33        4       0
Public relations                  5             70              69        3       0

There are 25 users who made 190 queries in one week. The average is 190/25 = 7.6 queries per user. The number of good responses is 178. The index of reliability is 178/190 ≈ 0.937 (0.93 when truncated to two decimals), meaning that the software application is very good.
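The figures quoted from Table 1 can be reproduced with a short Python sketch. It treats each query as one dataset and each good response as a correct result, as the calculation in the text does. The per-department breakdown is an addition for illustration and is not reported in the paper.

```python
# Rows of Table 1: (department, users, queries, good responses, updates, errors)
TABLE_1 = [
    ("Business Information", 5, 35, 30, 2, 2),
    ("Fairs and Exhibitions", 12, 50, 46, 8, 3),
    ("Human Resources", 3, 35, 33, 4, 0),
    ("Public relations", 5, 70, 69, 3, 0),
]

GOOD_THRESHOLD = 0.78  # value quoted in the text for a good application


def reliability_index(good: int, total: int) -> float:
    """I_r = ndat_c / ndat, the share of datasets that gave good results."""
    if total <= 0:
        raise ValueError("total number of datasets must be positive")
    return good / total


users = sum(row[1] for row in TABLE_1)
queries = sum(row[2] for row in TABLE_1)
good = sum(row[3] for row in TABLE_1)

overall = reliability_index(good, queries)
# Illustrative extension: the same index computed for each department.
per_department = {row[0]: reliability_index(row[3], row[2]) for row in TABLE_1}

print(users, queries, good)   # totals used in the text
print(queries / users)        # average number of queries per user
print(round(overall, 3), overall > GOOD_THRESHOLD)
for name, index in per_department.items():
    print(f"{name}: {index:.3f}")
```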
Conclusions

This paper related some theoretical themes, studied by many authors in computer science, to the description of the quality aspects of web relational databases. Statistical calculations are important for each software metric at all stages of project realization, because they show the exact level of precision and report the percentage to which the initial objectives have been achieved. It is also useful to relate the statistical results of each step to the next step of the project, since every step, including analysis and data modeling, has the same significance for the achievement of the objectives. Economic efficiency must also be achieved by building a reliable application that helps the managerial board take appropriate decisions and does not put it in difficulty. The study is also interdisciplinary, because of its different approaches: economic, technical and social. In the future, the employees will have to answer questions about the usability of the programs, and the development team will take their answers into consideration.

References

1. Arsanjani, A. Empowering the Business Analyst for on Demand Computing, IBM Systems Journal, Vol. 44, no. 1, 2005
2. Block, E., Hiemstra, D., Choenni, S. Predicting the cost quality trade off for information retrieval queries: Facilitating Databases and query optimization, www.utwente.nl, 21.11.2006
3. IBM Info center, www.publib.boulder.ibm.com, 28.11.2006
4. Ivan, I., Popa, M., Popescu, Al. The Aggregation of the Text Entities, Economic Computation and Economic Cybernetics Studies and Research 38, no. 1-4, 2004, pp. 37-50
5. Ivan, I., Saha, P. Quality Characteristics of the Internet Applications, Software 2003 - ASE Printing House, 2004
6. Ivan, I., Visoiu, A. Economic Models Basis, ASE Publishing House, Bucharest, 2005
7. Piattini, M., Genero, M., Calero, C. Data Model metrics, www.uclm.es, 20.11.2006

¹ Codifications of references:
[ARSA05] Arsanjani, A. Empowering the Business Analyst for on Demand Computing, IBM Systems Journal, Vol. 44, no. 1, 2005
[BLOHI06] Block, E., Hiemstra, D., Choenni, S. Predicting the cost quality trade off for information retrieval queries: Facilitating Databases and query optimization, www.utwente.nl, 21.11.2006
[IVSA03] Ivan, I., Saha, P. Quality Characteristics of the Internet Applications, Software 2003 - ASE Printing House, 2004
[IVPO04] Ivan, I., Popa, M., Popescu, Al. The Aggregation of the Text Entities, Economic Computation and Economic Cybernetics Studies and Research 38, no. 1-4, 2004, pp. 37-50
[IVVI05] Ivan, I., Visoiu, A. Economic Models Basis, ASE Publishing House, Bucharest, 2005
[PIGECA06] Piattini, M., Genero, M., Calero, C. Data Model metrics, www.uclm.es, 20.11.2006
[www1] IBM Info center, www.publib.boulder.ibm.com, 28.11.2006