Cacheability Analysis of HTTP traffic in an Operational LTE Network

Buvaneswari A. Ramanan, Lawrence M. Drabeck, Mark Haner, Nachi Nithi, Thierry E. Klein, Chitra Sawkar
Bell Labs Research, Alcatel-Lucent
600 Mountain Ave, Murray Hill, NJ 07974, USA
(buvana.buvaneswari, l.drabeck, mark.haner, karun.nithi, thierry.klein, chitra.sawkar)@alcatel-lucent.com

Abstract

With the rapid increase of traffic on the web, content caching reduces user-perceived latency as well as the transmission of redundant traffic on the network. In this study, we analyze the gains of HTTP content caching at the location of the SGW in an LTE wireless network. A high cache hit ratio can be achieved if the proxy server caches only those contents that are assured of significant revisits. In this paper, we identify such contents for optimum proxy server performance. We compare the cacheability gains for different content types such as image, video and text, and also for popular websites. Our analysis shows that amongst all the contents, the image type has the highest revisit rate, which means caching images is beneficial. Amongst the popular websites compared, cacheable contents from Facebook have the highest probability of revisits. We extend the analysis by varying the interval of caching and studying its effect on the cacheability. Based on these results, we provide guidelines for configuring the proxy server for high cacheability benefits.

Keywords: cacheability, Revisit, Cache Hit, LTE Network, Content caching, HTTP, Content type, Hosts, eNodeB, SGW, Data Analysis, User Plane, Traffic Statistics

I. INTRODUCTION

Web caching is a well-known strategy for improving the user experience by keeping Web objects that are likely to be used in the near future at a location closer to the user. Proxy servers play a key role between users and web sites in minimizing the response time of user requests and saving network bandwidth [1], [2]. With the advent of 4G LTE (Long Term Evolution) technology and its nationwide popularity and usage, data traffic in mobile networks is growing rapidly.
Additionally, 4G networks have enhanced capability to support more features for social networking than previous generations. VoIP, peer-to-peer, and flexible file sharing for multimedia communications will be used extensively in 4G networks. Extending web caching to mobile networks, it is natural to target the SGW/PGW (Serving Gateway/Packet Data Network Gateway) location for proxy caching. Taking this one step further, some popular content (that is expected to be downloaded by a large number of users and is fairly predictable) can be pre-loaded and cached at the eNB (evolved Node B, i.e. Base Station) to reduce the congestion on the backhaul link and the overall download time, and hence improve the user-perceived quality of experience.

In this paper, we aim to quantify the caching gain in terms of the ratio of requests for which the user perceived latency can be improved and the required bandwidth reduced. Most importantly, we provide guidelines in terms of configuration parameters for the proxy server on a per-content type / per-host basis for maximum returns. Our analysis and results are based on LTE network data gathered in one of the busiest markets in North America.

To the best of our knowledge, this is the first study on LTE/3GPP networks aimed at providing guidelines based on content attributes for proxy caching. Studies exist in the literature that are based on users of residential Internet, 802.11 and 3G cellular networks. In [3], a match-making content discovery solution for wireless networks is proposed, which also implements configurable caching schemes. In [4], a location-aware caching scheme is proposed for dynamic environments to increase cache hits and reduce cache query response times. Both [3] and [4] base their results on simulation models and not on real network data. The impacts of caching-related HTTP (Hypertext Transfer Protocol) headers on the cacheability decision were investigated in several trace-based studies. C. H. Chi, et al. in [5] provide references on these studies. B. Ager, et al. [6] investigated the potential of caching for a set of application protocols, peer-to-peer and client-server including HTTP, using data from 20,000 DSL subscribers in 2007. A. Arvidsson, et al. [7] propose a distributed caching scheme and claim that their scheme is particularly suitable for cellular mobile networks in minimizing backhaul transmission. Though their study is targeted at cellular networks, it is based on two popular content models rather than on real cellular network traffic. In [8], redundant HTTP transfer analysis was done based on smartphone user data in a commercial 3G network; the focus was on handset caching and not proxy caching. J. Erman, et al. [9] have explored forward HTTP caching in 3G (UMTS - Universal Mobile Telecommunications System) cellular networks by using traffic traces generated by millions of users over a period of 36 hours from one of the world's largest 3G networks. Their study focuses on a cost model for building hierarchical caching nodes at national and regional levels based on the observation that the cache hit ratio can reach as high as 33% at the national level and about 27% at the regional level.

Our study is distinct from the 3G network studies of [7-9] in that we are the first to propose selective caching based on content type and host. Though this study is based on LTE data, the results on cacheability gains per category are applicable to UMTS and EVDO (Evolution Data Only) networks as well, since they depend more on the usage nature of the end user devices than on the air interface technology.

The rest of this document is organized as follows. Section II gives an overview of data collection and the fields used for analysis. Section III defines two cacheability benefit metrics that are used throughout our study. Section IV lists the constraints that were used in our analysis for classifying an HTTP response as cacheable or non-cacheable. Section V shows the results of our analysis and we conclude in Section VI.

978-1-4673-513-6/13/$31.00 ©2013 IEEE

II. OVERVIEW OF DATA COLLECTION

Our data was collected in a live LTE network in a major North East dense urban metropolitan market. We use a Bell Labs developed network monitoring tool called LTE Xplorer (see Figure 1), which listens to the control (S11) and data (S1U) links and computes statistics such as per-eNB and per-bearer data volume and number of packets per minute, aggregate data volume and packet counts. It also decodes HTTP packets and retains the correlation of each HTTP request and its corresponding response with the identity of the UE (User Equipment) and the eNB(s) serving the UE. For the market under study there are ~13 eNB.cells (i.e. sectors) and the eNBs are assigned to one of two SGWs by Tracking Area Code. This means a specific SGW will handle most of the connections of eNBs belonging to a certain geographical area. The remaining connections (whose proportion is small) from these eNBs will be handled by the other SGW. The records from real-time data capture are uploaded to a MySQL database hosted on a Red Hat Enterprise Linux server.
The fields are: UE Id, HTTP get time & response times, eNB Id, UE & Server IP & Port number, Response Code, Content type & length, URL, Host, Referrer, Cookie, Cache-control, max-age, Range start, end and length, Set-Cookie, Vary, Last Modified Time and Expires. The analysis was done on 24 hours' worth of data from 3/1/2012 00:00 through 3/2/2012 00:00 and consists of > 42 Million HTTP requests.

Figure 1. LTE Xplorer configuration (diagram elements: eNBs, router/switch, SGW, MME, S5 link; taps on the S1U and S11 links feed LTE Xplorer).

III. QUANTIFYING CACHEABILITY

Caching refers to saving a copy of a reply to a request on a server (the cache) with the intention of satisfying subsequent requests for the same content from the cache instead of the origin server. All subsequent requests for this cached data are served by the proxy server, which is strategically located near the users, resulting in quicker data retrieval and lower network bandwidth usage.

To quantify the cacheability benefits we adopt a metric similar to that of B. Ager, et al. [6], who investigated the potential of caching for a set of application protocols, peer-to-peer and client-server. They introduced the metric, cacheability, to quantify the gains of caching. This metric is based on the number of cacheable requests and the cacheable data volume.

The metric on cacheability benefits based on data volume is essentially the revisited data volume % and is given by:

\[ \text{Revisited\_DV\_\%} = 100 \times \frac{\sum_{i=1}^{n} (k_i - 1)\, s_i}{\sum_{i=1}^{n} k_i\, s_i} \tag{1} \]

Here k_i denotes the total number of downloads for item i and s_i is the size of item i.

The metric on cacheability benefits based on the number of revisits is essentially the % of revisited contents and is given by:

\[ \text{Revisit\_\%} = 100 \times \frac{\sum_{i=1}^{n} (k_i - 1)}{\sum_{i=1}^{n} k_i} \tag{2} \]

Here k_i denotes the total number of downloads for item i.

IV. CRITERIA FOR CACHEABLE CONTENTS

There exist several studies regarding what type of data is deemed cacheable at a proxy server [5], [10-12]. The impacts of caching-related HTTP headers on the availability decision were investigated in several trace-based studies. C. H. Chi, et al. in [5] provide a survey of these studies.
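As a minimal sketch of how equations (1) and (2) of Section III could be evaluated on a request log, the Python function below groups requests by an item key and applies both formulas. The function name, the (key, size) tuple layout and the use of the first observed size as s_i are illustrative assumptions, not part of the measurement tool used in this study; the key would typically combine URL, Range_start and Range_end.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def cacheability_metrics(
    requests: Iterable[Tuple[Hashable, int]]
) -> Tuple[float, float]:
    """Return (Revisit_%, Revisited_DV_%) for (item_key, size_bytes) pairs.

    k_i is the number of downloads of item i, s_i its size in bytes.
    Assumption: the first size seen for a key is used as s_i.
    """
    downloads: Counter = Counter()        # k_i per item
    sizes: Dict[Hashable, int] = {}       # s_i per item
    for key, size in requests:
        downloads[key] += 1
        sizes.setdefault(key, size)

    total_requests = sum(downloads.values())                      # sum k_i
    total_volume = sum(k * sizes[i] for i, k in downloads.items())  # sum k_i s_i
    revisit_requests = sum(k - 1 for k in downloads.values())     # sum (k_i - 1)
    revisit_volume = sum((k - 1) * sizes[i] for i, k in downloads.items())

    # Guard against empty logs or logs with zero total volume.
    revisit_pct = 100.0 * revisit_requests / total_requests if total_requests else 0.0
    revisited_dv_pct = 100.0 * revisit_volume / total_volume if total_volume else 0.0
    return revisit_pct, revisited_dv_pct
```

A log in which one small object is fetched three times and one large object once gives a high Revisit_% but a low Revisited_DV_%, which is the pattern discussed in the results for small, frequently revisited objects.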
We adopt the recommendations of [5] on the availability, freshness and revalidation decisions with a few minor modifications. The following cases are deemed not cacheable:

- Set-Cookie non-null
- Vary non-null (For simplicity, we do not consider them for caching. For details, refer to Section 7.3.3 in [2])
- Content-Length = 0
- Last-Modified = 0
- Cache-Control = private / no-store / no-cache
- All HTTP response codes NOT in the set {200, 203, 206, 300, 301, 410}

Responses were grouped with respect to URL, Range_start and Range_end. We assume the entire content volume as denoted by Content-Length is downloaded. It is possible that while the contents of an original visit to a given URL were being fetched, the user navigated away, resulting in only partial content being available at the proxy server. This tends to lead to an overestimation of the data volume for large files (like video).

V. RESULTS

A. Cacheable Content Volume

For data collected over 24 hours, Table 1 shows the requests, data volume, proportion of data that is cacheable and the revisit rate (hit rate) for data stored in the cache. Over the entire day of 3/1/12, 42.6 Million content requests were recorded, corresponding to a data volume of 12.2 TB. There were 13 million cacheable requests, which correspond to 9 TB of data. The response for 54.3% of the cacheable requests was a revisit, but this corresponds to only 810 GB of data volume.

Table 1. Cacheable Data Statistics for a Given Measurement Period

Category                                  Value
Data Collection Time                      3/1/12 (24 hrs)
Number of Requests                        42.6 Million
Data Volume (all Requests)                12.2 TB
Cacheable % (Requests)                    30.6%
Cacheable % (Data Volume)                 73.9%
% Revisits (Requests)                     54.3%
% Revisits (Data Volume)                  9.0%
% Revalidation of Revisits (Requests)     26.9%

A network proxy installed at the SGW that adheres to all the constraints of Section IV would improve the user perceived latency of 7 million requests (16%) and reduce the backhaul by 810 GB (6.6%). The proxy would have to save 13 Million records that occupy about 9 TB in order to achieve this efficiency.

A high revisit rate but low revisited data volume makes it clear that there is a set of small objects that are frequently revisited and there are some bulky contents that are not revisited. Is it possible to identify the bulky, rarely revisited contents based on attributes such as content type and host and exclude them from caching? This would result in a smaller cache that can be maintained for longer. State of the art proxy servers continuously perform updates and eviction; the revisit rate will certainly increase as the interval of caching increases. A limiting factor for a proxy server is the space available for storage.
It would be typical for a network operator to store contents with a high predicted revisit rate and exclude bulky contents with a low predicted revisit rate. By performing a per-content type / per-host analysis, we can predict the revisit rate of contents and configure the proxy server with a decision policy for storing contents based on the content type, length and host fields.

Please note that although the results presented here are for a 24 hour period on 3/1/2012, we found that there was no considerable variation for other weekdays between 2/13/2012 and 3/2/2012.

Figure 2. Percentage of Requests and Data Volume that are cacheable for Different Content Types

B. Cacheability Benefits of different content types

The contents are grouped into 6 major categories and we study the metrics on cacheability benefits, the Revisited_DV_% and Revisit_%, for each content type.

Figure 3. Distribution of data volume (top) and number of cacheable requests (bottom) among the different content types (3/1/2012). Top panel, proportion of cacheable data volume: application 48%, video 43%, text 6%, image 2%, audio 1% (other: value not shown). Bottom panel, proportion of cacheable requests: image 85%, application 9%, text 5%, video 1% (audio, other: values not shown).
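The non-cacheable cases listed in Section IV can be expressed as a single predicate. The sketch below is illustrative only: the HttpResponse record and its field names are hypothetical, a Content-Length of 0 and a missing Last-Modified value are both treated as non-cacheable, and the status-code set is assumed to be the default-cacheable codes of HTTP/1.1 (200, 203, 206, 300, 301, 410).

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of status codes that may be cached by default.
CACHEABLE_STATUS = frozenset({200, 203, 206, 300, 301, 410})
# Cache-Control directives that exclude a response from the shared cache.
NON_CACHEABLE_DIRECTIVES = frozenset({"private", "no-store", "no-cache"})


@dataclass
class HttpResponse:
    """Hypothetical record holding only the fields the criteria need."""
    status: int
    content_length: int = 0
    last_modified: Optional[str] = None
    cache_control: str = ""
    set_cookie: Optional[str] = None
    vary: Optional[str] = None


def is_cacheable(resp: HttpResponse) -> bool:
    """Apply the exclusion rules in order; any match means not cacheable."""
    if resp.set_cookie or resp.vary:
        return False
    if resp.content_length <= 0 or not resp.last_modified:
        return False
    # Directive names only, so 'no-cache="field"' still counts as no-cache.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in resp.cache_control.split(",")
        if part.strip()
    }
    if directives & NON_CACHEABLE_DIRECTIVES:
        return False
    return resp.status in CACHEABLE_STATUS
```

Keeping the rules as early returns makes it easy to count how many responses each rule excludes, which can help when tuning which constraints to relax.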

Figure 4. Revisited_DV_% and Revisit_% (top) and Revalidation% (bottom) for different content types (application, audio, image, other, text, video), 3/1/2012.

Figure 2 shows the % of cacheable requests and data volume for the different content types. Audio, images and video have >50% of the requests being cacheable but application, text and others have <10% of the requests being cacheable. The data volume tells a different story, with >50% of the data volume for all categories being cacheable.

Now looking at the total cacheable requests and data volume amongst all categories, Figure 3 (top) shows that application and video have the largest data volume for cacheable content. Their size exceeds the other content types by almost an order of magnitude. If their Revisited_DV_% value is also good, then caching them will result in considerable bandwidth (S5) saving.

Figure 3 (bottom) shows, by content type, the number of requests for cacheable contents. The image content type has the most cacheable requests, almost an order of magnitude more than the other content types. If its Revisit_% value is also relatively high, then caching images will significantly improve the user perceived latency. Audio and video content types have the smallest number of cacheable requests.

For the proxy server to be effective, we not only require the requests to be cacheable (Figure 3), but also the contents to be revisited at least once (cache hit). Figure 4 shows the Revisited_DV_%, Revisit_% and Revalidation% of the different content types. Recall that Revisit_% and Revisited_DV_% are the percentages of requests and data volume that are replicates and could be served by the cache.
Here are some results and observations:

- Image contents show both a large Revisited_DV_% and Revisit_%. Since image requests are the most numerous among all the cacheable requests, excellent user perceived latency can be achieved by caching them. Considerable bandwidth reduction is also possible. The cost incurred for these gains is small (only ~20% revalidations).
- Text contents have high Revisit_% but very low Revisited_DV_%, showing that small text objects are revisited more often than large text objects. Caching such text objects will improve user perceived latency but will not really reduce backhaul data volume, and will incur fairly high revalidations (~40%).
- Application has the largest cacheable data volume and second largest number of cacheable requests but, like text, shows large Revisit_% and fairly small Revisited_DV_% (many small objects revisited but no large ones). Caching will again improve latency but not help with backhaul reduction. Also, the revalidations are high at ~50%.
- Audio shows fair Revisited_DV_% and Revisit_%, but its revalidation is extremely high and its data volume and number of requests are low, making it a poor candidate for caching.
- Video content has low revisits and revisited data volume and a very small number of requests (1%), but it has a very large data volume. Caching video could result in a reduction in backhaul traffic but a large cache space would be needed. There would be very little if any impact on latency with video caching.

Thus, if the aim of the proxy server is to reduce user perceived latency, then configure it to cache image, text and application. If the aim is to save on bandwidth, then configure it to cache just video.

Figure 5. Revisit_% (top) and Revisited_DV_% (bottom) as a function of caching interval (12, 24 and 72 hours) for different content types (application, audio, image, text, video).
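The recommendation above (cache image, text and application for latency; cache only video for bandwidth) could be encoded as a small admission rule. The following is a sketch under stated assumptions: the goal names "latency" and "bandwidth", the function names and the way the major type is extracted from a Content-Type header are all hypothetical and do not correspond to any particular proxy server's configuration syntax.

```python
# Content types recommended per optimization goal, as stated in the text.
LATENCY_TYPES = frozenset({"image", "text", "application"})
BANDWIDTH_TYPES = frozenset({"video"})


def major_type(content_type: str) -> str:
    """Extract the major type, e.g. 'image' from 'image/png; charset=x'."""
    return content_type.split(";", 1)[0].split("/", 1)[0].strip().lower()


def should_cache(content_type: str, goal: str) -> bool:
    """Admission rule keyed on the operator's goal (hypothetical names)."""
    if goal == "latency":
        return major_type(content_type) in LATENCY_TYPES
    if goal == "bandwidth":
        return major_type(content_type) in BANDWIDTH_TYPES
    raise ValueError(f"unknown goal: {goal!r}")
```

An operator pursuing both goals could admit a response when either rule accepts it, at the cost of the larger storage that video requires.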

Next, we study the behavior of Revisit_% and Revisited_DV_% of different content types as a function of caching interval. The aim is to see if increasing the caching interval will result in higher gains for one content type over the others. This study has direct implications for the eviction policy of the cache.

From Figure 5, we find that except for video, all other content types show increasing Revisit_% for longer caching intervals. Caching audio contents for a longer duration has the largest benefit in terms of Revisit_%. Except for video and application, the Revisited_DV_% of other content types increases for longer caching intervals. The largest benefit comes for the image content type. Thus, we recommend configuring the proxy server with a longer eviction time (days) for image and a shorter time for everything else.

These increases in Revisit_% and Revisited_DV_% with longer caching intervals normally come at a cost of (a) larger storage space and (b) additional revalidations. However, we found that the Revalidation% remains almost the same for the different caching intervals. The reason for this behavior can be explained from the max-age cumulative distribution function (CDF) (see Figure 6). There is very little change in the max-age CDF between 12 and 72 hours (except for video) and this leads to little change in the Revalidation% over the different caching intervals that we considered. It is interesting to note that for the different content types, most max-age values are <3 hrs or >24 hrs, with almost none between these two values.

The cost in terms of revalidation for a longer duration of caching is insignificant when compared to the benefits. However, there is still the cost associated with large storage and retrieval. Since video and application contents tend to be bulky, storing them for a longer duration may not provide good returns. Their eviction interval should be kept shorter. Even though audio contents show an increasing trend in Revisit_%, owing to their high Revalidation% (~90% for all the intervals considered), their eviction interval should be kept shorter.
Image contents will benefit significantly from longer duration caching at a very minimal cost. Since their size is also small, caching them for a longer duration would be very beneficial.

Figure 7. Percentage of Requests and Data Volume that are cacheable for Different Hosts

C. Cacheability of different hosts

Here, we identified a few popular hosts and repeated the same type of analysis as done for the content types. The hosts considered are Amazon, Facebook, Google, Netflix, news, weather and Youtube. "news" is a general term that we use for a group of news channels: ABC, FOX, CNN, MSN and ESPN. "weather" is a general term for any host with weather in its name (e.g. weather, weatherbug, accuweather).

Figure 7 shows the availability of the different hosts for caching. Amazon, Facebook, Netflix and news show ~50% of the requests to be cacheable while Google and Youtube show a very low share (~10%). The data volume cacheability is >70% for Amazon, Facebook, Netflix, Youtube and others but quite low for news and weather.

Figure 8 shows the proportion of cacheable requests and data volume for these different hosts. Netflix and Youtube have large data volume (predominantly videos) but very few requests (a few requests for big files). If they exhibit good Revisited_DV_%, then caching them will certainly result in great bandwidth savings (assuming that the proxy server can deal with the excessive storage required). Facebook, on the other hand, has many requests but small data volume. Still, excellent user perceived latency can be achieved by caching these contents provided their Revisit_% is high. Google and news requests are the next most numerous but also show low data volume (when compared to Netflix and Youtube). Weather data is very small in data volume and requests, meaning the likely gains from caching will not be high in the absolute sense.

Figure 6. Cumulative distribution function (CDF) of the max-age of different content types (application, audio, image, text, video); x-axis: Max-Age (Time), y-axis: CDF.
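A max-age distribution like the one in Figure 6 can be built from the Cache-Control headers of the cacheable responses. The sketch below is a minimal illustration, not the tooling used in this study: the helper names are hypothetical, and responses without a max-age directive are simply skipped, which is an assumption about how such records would be handled.

```python
import re
from bisect import bisect_right
from typing import Callable, Iterable, List, Optional

# Matches the max-age directive; "s-maxage" does not match this pattern.
_MAX_AGE = re.compile(r'max-age\s*=\s*"?(\d+)', re.IGNORECASE)


def parse_max_age(cache_control: Optional[str]) -> Optional[int]:
    """Return max-age in seconds, or None if the directive is absent."""
    match = _MAX_AGE.search(cache_control or "")
    return int(match.group(1)) if match else None


def empirical_cdf(values: Iterable[int]) -> Callable[[float], float]:
    """Return F(x) = fraction of values that are <= x."""
    ordered: List[int] = sorted(values)
    n = len(ordered)

    def cdf(x: float) -> float:
        return bisect_right(ordered, x) / n if n else 0.0

    return cdf


def max_age_cdf(headers: Iterable[Optional[str]]) -> Callable[[float], float]:
    """Build the CDF over all headers that carry a max-age directive."""
    ages = [a for a in map(parse_max_age, headers) if a is not None]
    return empirical_cdf(ages)
```

With such a function, the share of objects whose max-age falls between two caching intervals is the difference of the CDF at the two points, for example `cdf(72 * 3600) - cdf(12 * 3600)`; a small difference is consistent with a Revalidation% that barely changes between 12 and 72 hours.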

Figure 8. Distribution of cacheable data volume (top) and cacheable requests (bottom) amongst popular hosts (3/1/2012). Top panel, proportion of cacheable data volume: netflix 44%, others 37%, youtube 19% (weather, facebook, amazon, google: values not shown). Bottom panel, proportion of cacheable requests: others 65%, facebook 24%, google 5%, news 4%, amazon 1%, netflix 1% (weather, youtube: values not shown).

Figure 9 shows the Revisited_DV_% and Revisit_% of the hosts considered.

Netflix has the smallest Revisited_DV_% and moderately high Revisit_%. Small objects from Netflix (such as icons and button figures) are likely to be revisited often. There may be a little benefit in caching only the small objects of Netflix.

The contents of Youtube are not revisited often as compared with other hosts and the Revisited_DV_% for Youtube is also very small. There will still be bandwidth savings in the absolute sense owing to the bulky nature of these contents.

Weather contents have excellent Revisited_DV_% and Revisit_% values, but their small size, low number of requests and high Revalidation% make them unattractive for caching. The icons on the weather pages are mostly what is cacheable.

News contents have excellent Revisited_DV_% and the highest Revisit_% values. Earlier, we saw that their size and number of requests are considerable. They seem to be attractive candidates for caching, with the only caveat that 36% of revisits need revalidation.

Google and Facebook qualify as the next best candidates for caching because their size, number of requests, Revisit_%, Revisited_DV_% and Revalidation% are all favorable.

Figure 9. Revisited_DV_% and Revisit_% (top) and Revalidation% (bottom) for different hosts (amazon, facebook, google, netflix, news, others, weather, youtube), 3/1/2012.
Thus, if the aim of the proxy server is to reduce user perceived latency, then configure it to cache news, Facebook and Google contents. If the aim is to save on bandwidth, then configure it to cache Youtube.

Figure 10 shows the Revisit_% and Revisited_DV_% as a function of caching interval for different hosts. There is definitely a gain in terms of additional revisits as the interval of caching increases. We also found that Revalidation% (not shown here) remains almost the same over the caching intervals considered for all hosts except Netflix.

The increase in Revisited_DV_% for a longer caching interval is more than the increase in Revisit_% for Google, news and weather, which implies that the probability of larger contents being revisited long after they are first stored in the cache is higher than that of smaller contents. Caching them for a longer duration is beneficial, more so for Google, whose Revalidation% is negligible.

Amazon and Facebook show an increase in both Revisit_% and Revisited_DV_%. Longer caching intervals for Facebook in particular should result in great benefits, as its Revalidation% is almost zero for all 3 intervals considered.
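The dependence of Revisit_% on the caching interval can be sketched by restricting the request log to a window of the chosen length and applying equation (2) to it. This is an illustration under an explicit assumption: "caching interval" is taken here to mean the length of the observation window during which an object, once stored, remains in the cache. The function name and the (timestamp, key) log layout are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def revisit_pct_for_interval(
    log: Iterable[Tuple[float, Hashable]],
    start: float,
    interval_s: float,
) -> float:
    """Revisit_% over requests with start <= timestamp < start + interval_s.

    log holds (timestamp_seconds, item_key) pairs. Since every item's first
    download is a miss, sum(k_i - 1) equals total requests minus the number
    of distinct items in the window.
    """
    counts = Counter(
        key for ts, key in log if start <= ts < start + interval_s
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total
```

Evaluating the function for 12, 24 and 72 hour windows per host gives the kind of per-interval comparison shown in Figure 10; the same windowing applies to Revisited_DV_% if object sizes are carried in the log.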

Figure 10. Revisit_% (top) and Revisited_DV_% (bottom) as a function of caching interval (12, 24 and 72 hours) for different hosts (amazon, facebook, google, netflix, news, weather, youtube).

Youtube's Revisit_% increases slightly for longer caching intervals and its Revisited_DV_% remains almost the same. The returns of caching for a longer duration are very small and since these contents are very bulky, it is better to cache them for a shorter duration.

Netflix shows an increase in Revisit_% and Revalidation% for longer duration caching. As stated earlier, small objects from Netflix may be cached and should be configured with a short eviction time.

Thus we recommend that the proxy server be configured with a longer (several days) eviction time for Facebook and the larger contents of Google, news and weather, and a shorter eviction time for everything else.

Now let us focus briefly on the max-age behavior to understand the behavior of Revalidation% for longer intervals. Figure 11 shows the cumulative distribution function (CDF) of the max-age for different hosts. Facebook, Google and Youtube have a huge proportion of contents with very long max-age, meaning a longer caching interval will not increase Revalidation%. The behavior is almost the inverse for weather and news contents. Note that the proportion of contents with max-age between 12 hrs and 72 hrs is small, which explains why we saw no change in Revalidation%. Netflix is interesting as it is the only host that shows a range of max-ages between 0 and 24 hrs. This explains why its Revalidation% increases for longer caching intervals.

Figure 11. CDF of the max-age for cacheable content broken down by content type (top) and by hosts (bottom); hosts panel series: all_cacheable, amazon, facebook, google, netflix, news, weather, youtube; x-axis: Max-Age (Time), y-axis: CDF.

VI. CONCLUSION

In this paper, we aimed to identify those contents which, when cached at the location of the SGW in an LTE network, would result in high cacheability benefits. We presented results on availability (to cache) and revisit rate of HTTP data based on real measurements in LTE networks. We found that 73% of the data volume and around 30% of the responses are cacheable. Within the cacheable category, around 9% of the data volume and > 54% of the requests / responses are revisited. We identified the contents for selective caching based on the content type and host attributes.

Amongst the different content types, application and video constitute the bulk of the cacheable data volume whereas images constitute the bulk of the cacheable URL requests. Since image contents have very high Revisited_DV_% and Revisit_% values, caching them will result in extremely good returns in terms of improving user perceived latency. Their revalidation overhead is also very small owing to their long max-age duration. Because of this, their caching interval can be chosen much longer than others. Text contents exhibit excellent Revisited_DV_% and Revisit_% values, but the absolute gain will be small since their size and number are smaller compared to other types. Video contents exhibit around 10% Revisited_DV_% and 20% Revisit_% values. Caching them will result in significant bandwidth savings at a high storage overhead. In summary, configuring the proxy server to cache image contents with a longer eviction time will yield excellent benefits.

Amongst the popular hosts considered, Netflix and Youtube constitute the bulk of the cacheable data volume whereas Facebook and Google constitute the bulk of the cacheable URL requests. Caching Netflix will not result in significant savings in bandwidth whereas caching Youtube will result in some bandwidth savings at a high storage overhead. Caching news and weather contents will result in excellent user perceived latency with some compromise in terms of revalidation overhead. Caching Facebook and Google would improve the user perceived latency.
Choosing a larger caching interval for Facebook and a smaller caching interval for Google will provide the best returns. In summary, configuring the proxy server to cache Facebook and Google contents with a longer eviction time will yield excellent benefits.

In our further study, a cost function involving revisit rate, revalidation rate, content length distribution (which has a direct impact on the storage requirement) and number of requests that need to be cached (which has a direct impact on the lookup time) on a per-attribute basis will be utilized to precisely arrive at a decision policy for caching and eviction. Additionally, benefits due to pre-fetching at the SGW location will be evaluated. Caching and pre-fetching at the eNB, which is located much closer to the user than the SGW, will also be investigated.

ACKNOWLEDGMENT

The authors would like to thank their chief customer contact Kerry I. for providing all the facilities needed for the data capture and also for the valuable discussions.

REFERENCES

[1] W. Ali, S. M. Shamsuddin, and A. S. Ismail, "A Survey of Web Caching and Prefetching," Int. J. Advance. Soft Comput. Appl., Vol. 3, No. 1, March 2011.
[2] B. Krishnamurthy and J. Rexford, Web Protocols and Practice: HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement, May 14, 2001, ISBN-10: 0201710889.
[3] F. Malandrino, C. Casetti, and C. Chiasserini, "Content Discovery and Caching in Mobile Networks with Infrastructure," IEEE Trans. on Computers, 61, 2012, pp. 1507-1520.
[4] Bo Yang and M. Mareboyana, "Caching for Location-Aware Image Queries in Mobile Networks," IEEE International Symposium on Multimedia, 5, 2011, pp. 41-415.
[5] Chi-Hung Chi, Lin Liu, and LuWei Zhang, "Quantitative Analysis on the Cacheability Factors of Web Objects," COMPSAC 2006.
[6] B. Ager, F. Schneider, J. Kim, and A. Feldmann, "Revisiting Cacheability in Times of User Generated Content," IEEE Infocom 2010.
[7] A. Arvidsson, A. Mihaly, and L. Westberg, "Optimised Local Caching in Cellular Mobile Networks," Computer Networks 55, 2011, pp. 4101-4111.
[8] F. Qian, K. S. Quah, J. Huang, J. Erman, A. Gerber, Z. M. Mao, S. Sen, and O. Spatscheck, "Web Caching on Smartphones: Ideal vs. Reality," MobiSys 2012, Low Wood Bay, Lake District, UK.
[9] J. Erman, A. Gerber, M. Hajiaghayi, D. Pei, S. Sen, and O. Spatscheck, "To Cache or Not To Cache: The 3G Case," IEEE Internet Computing, Mar./Apr. 2011, pp. 27-34.
[10] http://wiki.squid-cache.org/SquidFaq/InnerWorkings
[11] A. Wolman, G. M. Voelker, N. Sharma, N. Cardwell, A. Karlin, and H. M. Levy, "On the Scale and Performance of Cooperative Web Proxy Caching," Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, Kiawah Island, SC, December 1999.
[12] A. Feldmann, R. Cáceres, F. Douglis, G. Glass, and M. Rabinovich, "Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments," Proceedings of IEEE Infocom'99, March 1999, pp. 107-116.