Sustainability Issues in Digital Preservation Krishna Kant George Mason University National Science Foundation Digital Preservation 2013 Alexandria, VA, July 24, 2013
Can Preservation be Sustainable? Significant environmental footprint. Storage media life-cycle (mfg, distribution, ) Housing & access management of media Processing & IO infrastructure (Data Centers)
What do we want to preserve? Not data, but information Ideally, knowledge K. Kant, Sustainability of Digital Preservation
Sustainability Issues Make knowledge derivation more sustainable Retain only essential data Minimize environmental impact of data centers Remove duplicate, inessential data Storage vs. reprocessing Has sustainability tradeoffs K. Kant, Sustainability of Digital Preservation
Data Center Impact Power consumption rising. Most of it wasted: But impact is more than power Power distribution, Cooling Idle Machines Materials, water, manufacturing, Sustainability perspective Energy doesn t matter, its carbon footprint does
IT LOAD 6% loss 94% efficient ~1% loss in switch gear and conductors 480V 13.2kv 13.2kv 0.3% loss 99.7% efficient 208V 13.2kv 115kv UPS: 2.5MW Generator ~180 Gallons/hour 0.5% loss 99.5% efficient 1.0% loss 99.0% efficient 9-10% distribution loss at power source Lots of earth s resources used (metals, rare earths, ) K. Kant, Sustainability of Digital Preservation
Renewable Energy Powered IT? Limit energy draw from grid Less infrastructure & losses, but variable supply Impact on performance, QoS, SLA, Challenges Variability at multiple time scales Reliability issues Need better power adaptability
Cooling Infrastructure
High Temperature Operation Chiller-less data centers Less energy/materials, but space inefficient High temperature operation of comm. /computing equipment Smaller Toutlet Tinlet Deal with occasionally hitting temp. limits. Need smarter thermal adaptability
Overdesign Huge UPS, Generators, dist. frames, power supplies, fans, Engineered for worst case Huge waste of power, materials, Power Supply & VRs Low utilizations Low efficiency Need Better Power Infra. adaptability
Energy Adaptive Computing Overdesign Rightsizing + smart adaptation Adaptation to energy/power/thermal/cooling limitations. Dynamic adaptation of infrastructure & workloads Need coordination across compute, network & storage.
Data Growth Exponential growth in both generation & retention Data vs. useful data More data More junk (Less information) Duplication, Keep just in case, too lazy to purge, But, can we define useful?
Sustainability Impact of Data Increased power consumption 10-40% of total power Insatiable drive demand Life cycle impact Cumulative impact because of little deletion Not sustainable!
Data Reduction Opportuninites n ai t, e m en o D m tiv e,d ra ter t is cen in a at m d d A me, Ho ct e t, bj sho t ar p ct t, e bj sho O ap B, Sn /D le ct e ot, j b sh A om dm O e, in Sn b da is Fi a je ta tr le ps c /D h t ce a B, ot, nt ti er ve,d O Sn b ep Do Fi a je ar m le ps c tm a /D h t o en in B, t, t, O Sn b Fi a je le ps c /D h t B, ot, H Fi O ap B, Sn /D le Fi O ap B, Sn /D le Fi Administrative Domain Home, data center, department, Object Object Object Snapshot, File/DB, Snapshot, File/DB, Snapshot, File/DB, K. Kant, Sustainability of Digital Preservation
Data Reduction Within an object Within and across administrative domain Compression, compressive sampling, delta encoding, remove bundled VM Deduplication across objects & storage nodes Cloud based storage/access/deduplication Only a few copies (collectively) across nodes Tradeoffs Storage vs. data movement vs. processing Fidelity vs. cost (reduced representations) Similar, filtered, derived, Cross domain access/privacy/security issues
Role of Content Creators Best Practices Send link instead of content Don t create local copy Purge obsolete, defective, unneeded data Data is valuable, meta-data is precious! Designed, not an afterthought! Strong association with data Must reflect data quality Preservation more crucial than for data K. Kant, Sustainability of Digital Preservation
Thank you! K. Kant, Sustainability of Digital Preservation
Other Sustainability Aspects Physical degradation of media Will require keeping media healthy for exponentially increasing data Unsustainable Media obsolescence HW & SW obsolescence Increasing amount of materials to manufacture storage media Unable to power the media Loss of meta-data or inadequate meta-data Have the data but don t know how to use it/ K. Kant, Sustainability of Digital Preservation