Mobile network big data for urban and transportation planning in Colombo, Sri Lanka



1 Mobile network big data for urban and transportation planning in Colombo, Sri Lanka. Rohan Samarajiva & Sriganesh Lokanathan. Data for Policy conference, 17 June 2015. This work was carried out with the aid of a grant from the International Development Research Centre, Canada, and the Department for International Development, UK.

2 Smart cities: the new buzz phrase. IBM has been promoting smart cities and big data since the 2000s. But there are two visions: new smart cities created on green fields, like South Korea's Songdo, OR improving the functioning of existing cities.

3 Can we have smart cities on the cheap? Transaction-generated data ("data exhaust"). Ubiquitous mobiles make every citizen a mobile sensor: citizens are who we aim to serve. If prepaid travel cards such as Octopus & Oyster are in place and public transport is popular, they too can serve as sensors. But in their absence, mobile network big data (MNBD) are the only option. Open-source analytics; cheap hardware.

4 Data used in the research. Multiple mobile operators in Sri Lanka have provided Call Detail Records (CDRs), that is, records of calls, SMS and Internet access, as well as airtime recharge records. No Visitor Location Register (VLR) data. Data sets do not include any Personally Identifiable Information: all phone numbers are pseudonymized, and LIRNEasia does not maintain any mappings of identifiers to original phone numbers. The data cover 50-60% of users, with very high coverage in the Western (where Colombo, the capital city, is located) and Northern (most affected by civil conflict) Provinces, based on correlation with census data.


6 Population density changes in Colombo region: weekday/weekend. Pictures depict the change in population density at a particular time relative to midnight. [Figure panels: Sunday, Weekday]

7 Our findings closely match results from expensive & infrequent transportation surveys.


9 Methodology. Based on the extracted average diurnal mobility pattern for the population, choose time frames for home and work. Home time: 2100 to 0500. Work time: 1000 to 1500. Calculate a home and a work location for each SIM: match cell towers to Divisional Secretariat Divisions (DSDs); count each DSD at most once per day; pick the DSD with the largest number of hits. For work, consider only weekdays that are not public holidays.
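The home/work assignment described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record layout `(sim_id, tower_id, timestamp)`, the `tower_to_dsd` lookup, the `holidays` set and the function name `assign_locations` are all assumptions. It also simplifies the night window by attributing each record to its calendar date, so 2100-2359 and 0000-0459 of the same date count as one "day".

```python
from collections import Counter, defaultdict
from datetime import date, datetime

HOME_HOURS = set(range(21, 24)) | set(range(0, 5))  # 2100 to 0500
WORK_HOURS = set(range(10, 15))                     # 1000 to 1500


def assign_locations(records, tower_to_dsd, holidays=frozenset()):
    """Return {sim: {"home": dsd_or_None, "work": dsd_or_None}}.

    records: iterable of (sim_id, tower_id, datetime) tuples (hypothetical layout).
    tower_to_dsd: dict mapping a tower id to its DSD name (hypothetical lookup).
    holidays: set of datetime.date values treated as public holidays.
    """
    # (sim, kind) -> set of (date, dsd); the set makes each DSD count
    # at most once per day, however many records fall in the window.
    seen = defaultdict(set)
    for sim, tower, ts in records:
        dsd = tower_to_dsd.get(tower)
        if dsd is None:
            continue  # tower could not be matched to a DSD
        day = ts.date()
        if ts.hour in HOME_HOURS:
            seen[(sim, "home")].add((day, dsd))
        elif ts.hour in WORK_HOURS and ts.weekday() < 5 and day not in holidays:
            seen[(sim, "work")].add((day, dsd))

    result = defaultdict(lambda: {"home": None, "work": None})
    for (sim, kind), day_dsds in seen.items():
        hits = Counter(dsd for _, dsd in day_dsds)
        # DSD with the largest number of daily hits; ties are broken arbitrarily.
        result[sim][kind] = hits.most_common(1)[0][0]
    return dict(result)
```

A SIM with no records in a window keeps `None` for that location. How ties and the midnight boundary were handled in the actual study is not stated on the slide.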

10 47% of Colombo city's daytime population comes from the surrounding regions. Colombo City is made up of the Colombo and Thimbirigasyaya DSDs. [Table: home DSD vs. percentage of Colombo's daytime population; the percentage values did not survive extraction. Home DSDs listed: Colombo city, Maharagama, Kolonnawa, Kaduwela, Sri Jayawardanapura Kotte, Dehiwala, Kesbewa, Wattala, Kelaniya, Ratmalana, Moratuwa]
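A share of this kind could be computed from per-SIM home and work DSDs along the following lines. This is only a sketch under an assumption the slide does not spell out: that the city's daytime population is proxied by SIMs whose work DSD falls inside the city. The function names and the toy input are hypothetical, and no figures from the study are reproduced.

```python
from collections import Counter

# Per the slide, Colombo City is made up of these two DSDs.
CITY_DSDS = frozenset({"Colombo", "Thimbirigasyaya"})


def daytime_share_by_home(assignments, city_dsds=CITY_DSDS):
    """Percent of the city's daytime population contributed by each home DSD.

    assignments: iterable of (home_dsd, work_dsd) pairs, one per SIM
    (hypothetical layout). SIMs without a known home DSD are skipped.
    """
    homes = Counter(
        home for home, work in assignments
        if work in city_dsds and home is not None
    )
    total = sum(homes.values())
    if total == 0:
        return {}
    return {dsd: 100.0 * n / total for dsd, n in homes.items()}


def outside_share(shares, city_dsds=CITY_DSDS):
    """Total percent coming from home DSDs outside the city."""
    return sum(pct for dsd, pct in shares.items() if dsd not in city_dsds)
```

Called on the output of a home/work assignment step, `outside_share(daytime_share_by_home(pairs))` gives the kind of headline percentage quoted on the slide.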

11 Implications for public policy. Urban planning: current municipal boundaries are obsolete; those from outside city limits cause costs but do not contribute adequate revenues; our data suggest options for logical boundaries of metro regions. Transportation policy: high-volume transport corridors suitable for provision of mass transit: Kaduwela DSD (now served by AB 10 & A0) (3) already identified High Level Road (now served by A4 & rail line) to Maharagama DSD (1).


13 Hourly loading of base stations reveals distinct patterns. Type X: ? Type Y: ? We can use this insight to group base stations into different groups, using unsupervised machine learning techniques.

14 Understanding land use characteristics: methodology. The time series of users connected at a base station contains variations that can be grouped by similar characteristics. A month of data is collapsed into an indicative week (Sunday to Saturday), with the time series normalized by the z-score. Principal Component Analysis (PCA) is used to identify the discriminant patterns from noisy time series data. Each base station's pattern is filtered into 15 principal components (covering 95% of the data for that base station). Using the 15 principal components, we cluster all the base stations into 3 clusters in an unsupervised manner using the k-means algorithm.
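The preprocessing and clustering steps can be illustrated with a small standard-library sketch. It covers collapsing observations into an indicative week, z-score normalization, and a plain k-means (Lloyd's algorithm). The PCA step is deliberately omitted here, so the clustering runs on the full 168-hour profiles rather than on 15 principal components. The input layout `(day_of_week, hour, users)`, the function names, and the choice of the first `k` points as initial centroids are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def indicative_week(observations):
    """Average a month of hourly counts into one 168-slot week.

    observations: iterable of (day_of_week, hour, users), with
    day_of_week 0 = Sunday .. 6 = Saturday (hypothetical layout).
    Slots with no observations are filled with 0.0.
    """
    slots = defaultdict(list)
    for dow, hour, users in observations:
        slots[dow * 24 + hour].append(users)
    return [mean(slots[i]) if i in slots else 0.0 for i in range(168)]


def z_score(series):
    """Normalize a series to zero mean and unit (population) std deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def k_means(points, k, iterations=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([mean(col) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels
```

With one normalized profile per base station, `k_means(profiles, 3)` would return a cluster label for each station. A production analysis would more likely use a numerical library for PCA and k-means with several random restarts.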

15 Three spatial clusters in Colombo District. Cluster-1 exhibits patterns consistent with a commercial area. Cluster-3 exhibits patterns consistent with a residential area. Cluster-2 exhibits patterns more consistent with mixed use.

16 Our results show that the Central Business District (CBD) in Colombo city has expanded.

17 Plans and reality. [Figure panels: 1985 Zoning Plan, 2020 Zoning Plan, 2013 MNBD analysis]

18 Implications for urban policy. Almost real-time monitoring of urban land use. We are currently working on understanding temporal variations in zone characteristics (especially the mixed-use areas). Can dispense with surveys & align master plan to reality. LIRNEasia is working to unpack the identified categories further, e.g., entertainment zones that show evening activity.


20 Purpose. Reduce transaction costs of releasing mobile network big data (MNBD) to third parties for public and commercial purposes. First step in a process that will hopefully lead to the adoption of a voluntary code of conduct by the region's mobile network operators (MNOs) that will be the most effective in minimizing possible harms.

21 Method. Potential harms have been identified through the literature (Annex 1) and engagement with ongoing analysis of MNBD at LIRNEasia. Anchored on my work on utility transaction-generated data since

22 Privacy and other harms, from the ground up. Guidelines address harms that have emerged in society and been recognized as worthy of remedy in the Common Law, rather than being based on abstract principles. Solove (2008: 174) argues that privacy as an abstract concept is difficult to pin down, since it involves a cluster of protections against a group of different but related problems. He identifies 16 privacy problems, grouped into four general types: information collection; information processing; dissemination; and invasion.

23 Considered harms. Privacy (9 out of 16 recognized in the Common Law in multiple countries): surveillance; aggregation; identification, individual and group; insecurity; secondary use; exclusion; breach of confidentiality; disclosure; increased accessibility. Anti-competitive effects. Marginalization.