ICS 4B: Transaction Processing and Distributed Data Management
Lecture 7: Providing Database as a Service
Professor Chen Li
Based on slides developed by Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra (ICDE, San Jose, CA, USA)

Talk Outline
- Software as a Service
- Database as a Service
- NetDB System
- Challenges for Database as a Service
  - User Interface Issues
  - Performance Issues
  - Data Privacy Issues
- Data Encryption in DBMSs for Data Privacy
- Conclusion

ICS4B Notes 7

Software as a Service
Driving forces behind the paradigm shift:
- Faster, cheaper, more accessible networks
- Rise of distributed architectures
- Virtualization in server and storage technologies
- Established e-business infrastructures

Software as a Service
- Get what you need, when you need it
- Pay for what you use
- Don't worry about how to deploy, implement, maintain, or upgrade
- Hardware/software is not the largest share of total cost of ownership (Source: Gartner Group):
  - User operations: 46%
  - Technical support: 4%
  - Capital cost (HW/SW)
- Hardware, software, and network costs have been decreasing more sharply than personnel costs
Software as a Service
- Already in the market as storage services, disaster recovery services, e-mail services, rent-a-spreadsheet services, etc.
- Examples: Sun ONE, Oracle Online Services, Microsoft .NET My Services, etc.
- Why not Database as a Service?

Database as a Service - Why?
- Organizations need data management
- DBMSs are complex systems to deploy, set up, and maintain
- They require highly skilled people (DBAs, etc.) at high cost
- Most significant DB execution problems, % of respondents (Source: InfoWeek Research):
  - Ease of administration: 58%
  - Qualified administrators: 57%
  - Compatibility: 5%
  - Qualified programmers: 5%
  - Platform independence: 4%

Database as a Service - Offerings
- Inherits all the advantages of software as a service, plus:
  - The service provider offers mechanisms to create, store, and access databases
  - DB management is transferred to the service provider: backup, administration, restoration, space management, upgrades
  - Clients use the service provider's hardware, software, and personnel instead of their own

NetDB - Database Service Provision
- Developed in collaboration between the University of California, Irvine and IBM
- Deployed on the Internet over a year ago
- Used by 5 universities and more than 5 students to help teach database classes
- Currently offered through the IBM Scholars Program
NetDB System Architecture
Internet user (web browser) -> HTTP server -> servlet engine -> database (user data), backed by a warm standby system and backup/recovery.
- Three-tier architecture
- Client is as thin as possible: just a browser
- Java-based implementation
- Backed by fail-over solutions
- Allows expansion and user-driven integration for application development

Database as a Service - Issues
Issues to address:
- User interface
- Performance
- Data privacy

User Interface
- Simple yet powerful: supports SQL queries, scripts, UDFs, stored procedures, metadata, and data upload
- Consistent: region-based composition
- Expansion/integration: user-defined interfaces

Performance
- Interaction takes place in a different medium: the network
- Performance should at least match what we already have
- Experimented with the TPC-H database and queries
[Figure: Performance, DB vs. NetDB: performance ratio vs. scale factor]
Data Privacy
- Users give control of their data to the service provider, so they need data security in place
- Security of data over the network is well studied: SSL, TLS
- Attacks on stored data are a well-known problem
- Establish security for stored data: even if it is stolen, it should not make sense
- Encryption!
[Figure: an employee table (ID, NAME, DEPTID, SALARY) with its stored values shown as ciphertext]

Encryption Alternatives
- Implementation level: software vs. hardware encryption?
- Granularity of data:
  - Field (attribute) level
  - Row (record) level
  - (Disk) page level

Encryption Alternatives (2)
Field-level encryption
Pros:
- Easier to implement and integrate
- Flexible
- Allows selective encryption, which reduces the number of bytes to encrypt/decrypt
Cons:
- Increases encryption overhead significantly due to invocation cost
- Data size expansion (for block cipher algorithms)
- Current optimization technologies do not handle foreign functions well

Encryption Alternatives (3)
Row-level encryption
Pros:
- Reduces the data size expansion problem
- Reduces invocation cost
- Better security because of total encryption
Cons:
- Does not allow selective encryption, which increases the number of bytes to encrypt/decrypt
- Implementation and integration can be hard when row functions are not supported
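The trade-off between invocation cost and data size expansion can be made concrete with a little arithmetic. The sketch below is a hypothetical cost count, not NetDB code. It assumes an 8-byte block cipher (Blowfish's block size) with PKCS#7-style padding, made-up field widths, and a 4 KB page, and it counts cipher invocations and ciphertext bytes at field, row, and page granularity.

```python
# Hypothetical cost count for encryption granularity (not NetDB code).
# Assumptions: 8-byte cipher blocks (Blowfish's block size), PKCS#7-style
# padding for field/row values, and fixed-size 4 KB pages.
BLOCK = 8
PAGE_SIZE = 4096


def padded_len(n, block=BLOCK):
    """Ciphertext length of an n-byte value under PKCS#7-style padding
    (padding always adds between 1 and `block` bytes)."""
    return (n // block + 1) * block


def encryption_cost(field_widths, n_rows, granularity):
    """Return (cipher invocations, ciphertext bytes) for a table of n_rows rows."""
    row_bytes = sum(field_widths)
    if granularity == "field":
        # One call per field value; every field is padded on its own.
        return n_rows * len(field_widths), n_rows * sum(padded_len(w) for w in field_widths)
    if granularity == "row":
        # One call per row; padding is paid once per row.
        return n_rows, n_rows * padded_len(row_bytes)
    if granularity == "page":
        # One call per page; PAGE_SIZE is a multiple of BLOCK, so a full page
        # needs no padding and the ciphertext fits exactly in the page.
        rows_per_page = PAGE_SIZE // row_bytes
        pages = -(-n_rows // rows_per_page)  # ceiling division
        return pages, pages * PAGE_SIZE
    raise ValueError(f"unknown granularity: {granularity}")


# Hypothetical employee-like row: ID, NAME, DEPTID, SALARY = 4 + 25 + 4 + 8 bytes.
WIDTHS = [4, 25, 4, 8]
for g in ("field", "row", "page"):
    print(g, encryption_cost(WIDTHS, 1000, g))
```

Under these assumptions, 1,000 rows of 41 bytes need 4,000 invocations and 64,000 ciphertext bytes at field level, and 1,000 invocations and 48,000 bytes at row level. At page level they need 11 invocations, and the 45,056 bytes are exactly the 11 pages the table already occupies unencrypted.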
Encryption Alternatives (4)
Page-level encryption
Pros:
- Significantly reduces encryption/decryption overhead due to reduced invocation cost
- Eliminates the data size expansion problem (for block ciphers)
- Better security because of total encryption
Cons:
- Implementation and integration are not straightforward
- Increases the number of bytes to encrypt/decrypt each time
- Higher update/delete cost: requires re-encryption of all affected pages

Encryption Alternatives - Experiments
- Experimented with the TPC-H database and queries
- Encryption scheme alternatives (V: evaluated, -: not evaluated):

                         Field Level   Row Level   Page Level
  Software Encryption        V             -            -
  Hardware Encryption        -             V            V

Software - Field Level Encryption
- Block cipher algorithm: Blowfish
- Implemented as a foreign function (UDF)
- Sample insert:
    insert into lineitem (discount) values (encrypt(<value>, key));
- Sample select:
    select decrypt(discount, key) from lineitem where custid = <custid>;

Software - Field Level Encryption (2)
- The creator supplies the key
- An unauthorized person cannot get hold of the key, which gives protection even from the service provider, at some level
- Users can easily implement a different encryption algorithm and check it into the system
- Different encryption algorithms/keys can be used for different fields
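The foreign-function approach can be tried out with Python's sqlite3 module, which lets an application register Python functions as SQL functions through Connection.create_function. This is a minimal sketch under stated assumptions. A toy XOR keystream derived from SHA-256 stands in for Blowfish: it is not Blowfish and is not secure. The table layout, the custid column, and the key string are hypothetical.

```python
import hashlib
import sqlite3


def _keystream(key: str, n: int) -> bytes:
    """Toy keystream: concatenated SHA-256 digests of "key:counter" (NOT secure)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return out[:n]


def encrypt(value: float, key: str) -> bytes:
    """Stand-in for the Blowfish UDF: XOR the value's text form with the keystream."""
    plain = str(value).encode()
    return bytes(p ^ k for p, k in zip(plain, _keystream(key, len(plain))))


def decrypt(blob: bytes, key: str) -> float:
    """Inverse of encrypt(): XOR again with the same keystream and parse the number."""
    plain = bytes(c ^ k for c, k in zip(blob, _keystream(key, len(blob))))
    return float(plain.decode())


conn = sqlite3.connect(":memory:")
# Register the functions so SQL statements can call them like built-ins.
conn.create_function("encrypt", 2, encrypt)
conn.create_function("decrypt", 2, decrypt)
conn.execute("create table lineitem (custid integer, discount blob)")

KEY = "owner-supplied-key"  # hypothetical; the data creator supplies the key
conn.execute(
    "insert into lineitem (custid, discount) values (?, encrypt(?, ?))", (42, 0.05, KEY)
)

stored = conn.execute("select discount from lineitem").fetchone()[0]
clear = conn.execute(
    "select decrypt(discount, ?) from lineitem where custid = ?", (KEY, 42)
).fetchone()[0]
print(stored)  # ciphertext bytes, not b"0.05"
print(clear)   # 0.05
```

The database only ever stores the ciphertext blob. Swapping in a different algorithm or a per-field key means registering a different function, which mirrors the flexibility claimed for field-level encryption.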
Software - Field Level Encryption (3)
[Figure: NetDB vs. NetDB* with encryption: performance ratio vs. scale factor]
- TPC-H queries, except Q#
- * Only one field (l_discount of the lineitem table) is encrypted
- Encryption introduced very large overhead

TPC-H Query #1
Problem: multiple decryptions on the same field

  select l_returnflag, l_linestatus,
         sum(l_quantity) as sum_qty,
         sum(l_extendedprice) as sum_base_price,
         sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
         sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
         avg(l_quantity) as avg_qty,
         avg(l_extendedprice) as avg_price,
         avg(l_discount) as avg_disc,
         count(*) as count_order
  from tpcd.lineitem
  where l_shipdate <= date ('1998-12-01') - 90 day
  group by l_returnflag, l_linestatus
  order by l_returnflag, l_linestatus;

Query Rewrite to Improve Performance
- Problem: multiple decryptions on the same field (e.g., TPC-H Q1)
- A CSE (common subexpression elimination) based algorithm eliminates redundant decryptions
- Uses a temporary view
[Figure: Improvement due to rewrite: response time improvement ratio vs. scale factor]

Hardware - Row Level Encryption
- Specialized hardware: IBM S/390 Cryptographic Coprocessor under IBM OS/390
- editproc facility: invoked for the whole row; upon a read/write request, encrypt/decrypt is invoked from hardware for the row
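Counting decryptions shows the effect of the CSE rewrite on TPC-H Q1. In Q1, l_discount appears in sum_disc_price, sum_charge, and avg_disc. A naive plan that calls the decrypt UDF at every occurrence therefore decrypts each row three times. A plan that first decrypts the column once into a temporary view decrypts each row only once. The sketch below evaluates just those three aggregates, without grouping, in plain Python over hypothetical rows. CountingDecryptor is a hypothetical stand-in for the real UDF.

```python
class CountingDecryptor:
    """Stand-in for the decrypt() UDF that records how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, ciphertext: str) -> float:
        self.calls += 1
        return float(ciphertext)  # toy "ciphertext": the value's text form


# Hypothetical lineitem rows; l_discount is stored "encrypted".
ROWS = [
    {"l_extendedprice": 100.0, "l_discount": "0.05", "l_tax": 0.02},
    {"l_extendedprice": 250.0, "l_discount": "0.10", "l_tax": 0.08},
    {"l_extendedprice": 40.0, "l_discount": "0.00", "l_tax": 0.04},
]


def q1_naive(rows, decrypt):
    """Call decrypt() at every occurrence of l_discount, as a plan that does not
    recognize the repeated UDF call would."""
    sum_disc_price = sum(r["l_extendedprice"] * (1 - decrypt(r["l_discount"])) for r in rows)
    sum_charge = sum(
        r["l_extendedprice"] * (1 - decrypt(r["l_discount"])) * (1 + r["l_tax"]) for r in rows
    )
    avg_disc = sum(decrypt(r["l_discount"]) for r in rows) / len(rows)
    return sum_disc_price, sum_charge, avg_disc


def q1_rewritten(rows, decrypt):
    """CSE-style rewrite: decrypt l_discount once per row into a temporary "view",
    then compute every aggregate from the decrypted column."""
    view = [dict(r, l_discount=decrypt(r["l_discount"])) for r in rows]
    sum_disc_price = sum(r["l_extendedprice"] * (1 - r["l_discount"]) for r in view)
    sum_charge = sum(r["l_extendedprice"] * (1 - r["l_discount"]) * (1 + r["l_tax"]) for r in view)
    avg_disc = sum(r["l_discount"] for r in view) / len(view)
    return sum_disc_price, sum_charge, avg_disc


naive, rewritten = CountingDecryptor(), CountingDecryptor()
print(q1_naive(ROWS, naive), naive.calls)              # 3 decryptions per row: 9
print(q1_rewritten(ROWS, rewritten), rewritten.calls)  # 1 decryption per row: 3
```

Both plans return the same aggregates. Only the number of expensive UDF invocations changes, which is why the rewrite pays off more as the table grows.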
SW Field Level vs. HW Row Level
[Figure: Software vs. hardware encryption: query response time vs. number of rows, SW and HW]
- Experimented on TPC-H Q#
- Software field level: only one field is encrypted
- Hardware row level: all fields are encrypted

Hardware - Page Level Encryption
[Figure: Encryption alternatives: relative CPU time for no encryption, row level, and page level]
- Page-level encryption is simulated
- It gives a significant improvement due to the reduction in start-up cost

Conclusion
- Database as a Service is a new model that alleviates the need to:
  - hire professionals
  - purchase expensive hardware/software
  - deal with administrative and maintenance tasks
- It is a viable model and can emerge as a successful offering
- Encryption is a solution for privacy, the most important issue
- Hardware encryption has a clear superiority over software encryption
- Hardware makes encryption practical for databases
- There are trade-offs in the granularity of encrypted data
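A simple cost model explains why page-level encryption wins on CPU time and also why it costs more on updates. In this model, every cipher invocation pays a fixed start-up cost plus a per-byte cost. The sketch below is a hypothetical model, not the lecture's measurement. The start-up and per-byte costs, row size, page size, and row count are made-up values chosen only to show the shape of the trade-off.

```python
# Hypothetical cost model: each cipher invocation pays a fixed start-up cost
# (e.g., handing a request to a crypto coprocessor) plus a cost per byte.
STARTUP = 5000   # made-up cost units per invocation
PER_BYTE = 1     # made-up cost units per byte
ROW_BYTES = 100  # hypothetical row size
PAGE_BYTES = 4096
N_ROWS = 10_000


def crypto_cost(invocations: int, nbytes: int) -> int:
    """Total cost of `invocations` cipher calls over `nbytes` bytes in total."""
    return invocations * STARTUP + nbytes * PER_BYTE


rows_per_page = PAGE_BYTES // ROW_BYTES  # 40 rows fit in a page
n_pages = -(-N_ROWS // rows_per_page)    # ceiling division: 250 pages

# Scanning the whole table: row level makes one call per row,
# page level one call per page (and touches slightly more bytes).
scan_row = crypto_cost(N_ROWS, N_ROWS * ROW_BYTES)
scan_page = crypto_cost(n_pages, n_pages * PAGE_BYTES)

# Updating a single row: row level re-encrypts that row,
# page level must re-encrypt the whole page that holds it.
update_row = crypto_cost(1, ROW_BYTES)
update_page = crypto_cost(1, PAGE_BYTES)

print(scan_row, scan_page)      # 51000000 2274000
print(update_row, update_page)  # 5100 9096
```

Under these assumptions, a page-level scan touches more bytes (1,024,000 vs. 1,000,000). It still costs far less, because it makes 250 invocations instead of 10,000. The same model also reproduces the higher update cost listed among the page-level cons.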