Mobile App Recommendations with Security and Privacy Awareness




Hengshu Zhu (1), Hui Xiong (2), Yong Ge (3), Enhong Chen (1)
(1) University of Science and Technology of China, (2) Rutgers University, (3) UNC Charlotte
zhs@mail.ustc.edu.cn, hxiong@rutgers.edu, yong.ge@uncc.edu, cheneh@ustc.edu.cn

ABSTRACT

With the rapid prevalence of smart mobile devices, the number of mobile Apps available has exploded over the past few years. To facilitate the choice of mobile Apps, existing mobile App recommender systems typically recommend popular mobile Apps to mobile users. However, mobile Apps are highly varied and often poorly understood, particularly for their activities and functions related to privacy and security. Therefore, more and more mobile users are reluctant to adopt mobile Apps due to the risk of privacy invasion and other security concerns. To fill this crucial void, in this paper, we propose to develop a mobile App recommender system with privacy and security awareness. The design goal is to equip the recommender system with functionality that allows it to automatically detect and evaluate the security risk of mobile Apps. Then, the recommender system can provide App recommendations by considering both the Apps' popularity and the users' security preferences. Specifically, a mobile App can lead to security risk because insecure data access permissions have been implemented in this App. Therefore, we first develop techniques to automatically detect the potential security risk of each mobile App by exploiting the requested permissions. Then, we propose a flexible approach based on modern portfolio theory for recommending Apps by striking a balance between the Apps' popularity and the users' security concerns, and build an App hash tree to efficiently recommend Apps. Finally, we evaluate our approach with extensive experiments on a large-scale data set collected from Google Play. The experimental results clearly validate the effectiveness of our approach.
Categories and Subject Descriptors
H.2.8.d [Information Technology and Systems]: Database Applications - Data Mining

Keywords
Mobile Apps, Recommender Systems, Security and Privacy

Corresponding Author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. KDD'14, August 24-27, 2014, New York, NY, USA. Copyright 2014 ACM 978-1-4503-2956-9/14/08...$15.00. http://dx.doi.org/10.1145/2623330.2623705.

1. INTRODUCTION

Recent years have witnessed the rapid and increasing prevalence of smart mobile devices, such as smart phones, and a huge number of mobile Apps have been developed for mobile users. For example, as of the end of July 2013, Google Play had over 1 million Apps and over 50 billion cumulative downloads, and these numbers are still growing dramatically. Due to the prospering mobile App industry, the functionalities of smart devices have been greatly extended to meet diversified user needs. However, mobile Apps are highly varied and often poorly understood, particularly for their activities and functions related to privacy and security.

Indeed, to improve user experiences, more and more advanced mobile Apps are committed to providing intelligent and personalized services for users, such as location based services and social sharing services. These services usually involve access permissions to users' personal data, such as real-time locations and contact lists. However, such intelligent mobile Apps may result in potential security and privacy risks for users.
For instance, users may not expect their locations (e.g., home locations, workplaces) and other private information (e.g., contact lists, SMS records) to be spied on by third-party Apps. In fact, as reported by NBC News (http://www.nbcnews.com/), consumers have grown very concerned about privacy on their mobile phones. Many consumers have avoided downloading some mobile Apps, and many others have removed Apps which may have access to their personal data. Also, a recent survey from IDG News (http://www.idg.com/) reveals that 54% of U.S. mobile App users surveyed have decided not to install an App when they discovered how much personal information it would collect, and 30% of App users have uninstalled an App after learning about the personal information it collected. Therefore, the development of a mobile App recommender system with security and privacy awareness becomes critical for the healthy development of the mobile App industry.

In the literature, there are recent studies about security and privacy issues of mobile Apps, and about mobile App recommendations. For example, some works focus on malware code detection [6, 13], security middleware development [7, 20], and App access permission model development [5, 8]. However, these works either need to analyze the source code of each mobile App, or need to detect the system API calls while the App is running. Indeed, these approaches are very hard to implement in practice, since it is not a trivial task to efficiently and accurately detect the malware code in each mobile App, and users often do not want security software to frequently scan their devices. Meanwhile, in the area of mobile App recommendation, some works studied personalized App recommendation methods [17], intelligent mobile App recommendations that exploit enriched contextual information [10, 21], and the problem of App ranking fraud detection [22]. However, all these works only consider user preferences about the Apps' popularity (e.g., ratings, downloads), but not the security and privacy risks inherent in the mobile Apps.

To this end, in this paper, we propose to develop a mobile App recommender system with security and privacy awareness. The design goal is to equip the recommender system with the ability to automatically detect and evaluate the security and privacy risks of mobile Apps. Also, when applying this recommender system for App recommendations, it should be able to strike a balance between the Apps' popularity and the users' security preferences. Figure 1 shows the interface of our demo system for mobile App recommendations with security and privacy awareness. In this system, users can select different evaluation metrics, such as Popularity, Security, and Hybrid, to obtain App recommendations with respect to their preferred security levels.

[Figure 1: A demo system of mobile App recommendations with security and privacy awareness.]

We do not aim at developing personalized App recommender systems, because individual download statistics and App usage data are often not publicly available. Nevertheless, our non-personalized App recommendations, which consider both popularity and security, are very important for mobile App services. For instance, both Apple and Google provide non-personalized top paid/free App recommendations based on popularity information (e.g., overall downloads and ratings) every day. However, they do not explore and consider security preferences in their recommended top charts. Indeed, the developed system will be beneficial for the healthy development of the mobile App industry.
However, there are two critical challenges in developing an App recommender system with security and privacy awareness. The first challenge is how to effectively identify the security risks of mobile Apps from large-scale mobile App data. The second challenge is how to strike a balance between the Apps' popularity and the users' concerns about security and privacy.

Indeed, our careful observation reveals that the potential security risks of mobile Apps are essentially caused by the data access permissions of each App, such as permissions requested for accessing real-time locations. Therefore, in this paper, we first propose to exploit the requested permissions for detecting the potential security risk of each mobile App. The proposed approach is based on random walk regularization with an App-permission bipartite graph, which can learn the security risk of mobile Apps automatically without relying on any predefined risk function. Furthermore, based on modern portfolio theory [16], we develop a flexible optimization approach for recommending Apps by considering both the Apps' popularity and the users' concerns about security and privacy. In particular, mobile users often have many different security preferences, and a huge number of Apps are candidates for recommendation. To enhance the performance of online App recommendations, we build an App hash tree to efficiently look up Apps. Finally, we evaluate our mobile App recommendation approach with extensive experiments on a large-scale real-world data set collected from Google Play, which contains 170,753 mobile Apps. The experimental results clearly validate the effectiveness and efficiency of our approach in terms of different evaluation metrics.

2. PROBLEM FORMULATION

In this section, we first introduce some preliminaries about the security/privacy problems of mobile Apps, and then introduce the framework of the proposed mobile App recommender system with security and privacy awareness.

Table 1: Examples of data access permissions.
| Type   | Permission ID        | Description                                                 |
|--------|----------------------|-------------------------------------------------------------|
| String | ACCESS_FINE_LOCATION | Allows an application to access fine (e.g., GPS) location.  |
| String | READ_CONTACTS        | Allows an application to read the user's contacts data.     |
| String | READ_SMS             | Allows an application to read the user's SMS messages.      |
| String | READ_CALENDAR        | Allows an application to read the user's calendar data.     |
| String | READ_CALL_LOG        | Allows an application to read the user's call log.          |

2.1 Preliminaries

The most advanced mobile operating systems, such as Apple iOS, Google Android, and Microsoft Windows Phone, implement a sandbox which provides the security and privacy policy for third-party mobile Apps. To be specific, these operating systems isolate Apps from each other and from system resources, and thus feature a permission system [7]. To access the personal data on users' mobile devices, the permission system asks users to grant the corresponding data access permissions explicitly (e.g., iOS) or implicitly (e.g., Android) for each mobile App. These data access permissions may expose sensitive resources in mobile users' personal data, such as their locations or contact lists. For instance, Table 1 illustrates some examples of data access permissions in the Android system [1]. We can see that all these listed permissions carry potential security risks. For example, an App which requests the READ_CALENDAR and READ_SMS permissions may access users' personal calendars and short messages. This may not be comfortable for a businessman due to the risk of leaking confidential information.

Indeed, all these data access permissions can be categorized into different levels with respect to their potential security risks. For example, as defined by Android Developers [1], there are three different threat levels for managing data access permissions:

- Normal permissions give an App access to isolated App-level features, with minimal risk to other applications, the system, or the user (e.g., the permission to set the screen wallpaper).
- Dangerous permissions give an App access to private user data or control over the device, with a potential risk that can negatively impact the user (e.g., the permission to obtain the user's current location).
- Signature/System permissions give an App access to dangerous privileges that need system signature certification, such as the ability to control system processes (e.g., the permission to delete Apps).

[Figure 2: A motivating example. The screenshot shows the popularity and security information of an App, which requests the permissions READ_CALENDAR, READ_PHONE_STATE, and WRITE_EXTERNAL_STORAGE.]

To provide better services to users and gain more App downloads, mobile App developers try to request more and more data access permissions, which can help to implement intelligent applications, such as social sharing services. However, these services may result in potential security and privacy risks.
For example, Figure 2 shows an example of a mobile App in the Android market, which contains both popularity and security information. In this figure, we can observe that this App may request the permissions for reading the user's calendar (i.e., READ_CALENDAR), reading phone states (i.e., READ_PHONE_STATE), and writing to external USB/SD card storage (i.e., WRITE_EXTERNAL_STORAGE). Although this is quite a popular App according to user ratings and download information, it may still carry the potential risk of leaking user information. For instance, if this App is controlled by a Trojan, it could gather users' calendar information and phone numbers, and then upload the information to an external USB disk or SD card (when connected) via the above permissions. However, to the best of our knowledge, this kind of security risk is not taken into account in most existing mobile App recommender systems. Indeed, they only focus on the Apps' popularity information (e.g., user ratings). Thus, we aim at developing a mobile App recommender system with security and privacy awareness.

2.2 The Recommendation Framework

Here, we first formally define the problem of mobile App recommendations with security and privacy awareness, and then show the recommendation framework.

Definition 1 (Problem Statement). Given a category label $c$ and a set of Apps $A = \{a_i\}$, each of which contains a set of data access permissions $\{p_i\}$ and profile information (e.g., category, popularity), the goal of mobile App recommendation with security and privacy awareness is to build an optimal ranked list of Apps in category $c$ based on both the Apps' popularity and users' security preferences.

Indeed, the above problem statement raises two issues:

- How to mine the security risks of Apps and produce a ranked list $\Lambda^{(Risk)} = \{a_i \mid a_i \in c\}$ according to their risk scores $Risk(a_i)$, where $a$ is ranked higher than $a'$ if and only if $Risk(a) > Risk(a')$.
- How to combine the risk based ranked list $\Lambda^{(Risk)}$ with the popularity based ranked list $\Lambda^{(Pop)}$ to produce the final ranking so as to meet the various expectations of users, who have different security and privacy concerns.
While it is appealing to provide mobile App recommendations with security and privacy awareness, it is a non-trivial task to effectively discover and evaluate the security risks of Apps, and to produce a desirable ranking of Apps by considering both the Apps' popularity and users' security preferences. In addition, mobile users often have many different security preferences, and a huge number of Apps are candidates for recommendation. Thus, how to efficiently manage Apps for recommendation is also an open question. To that end, in this paper, we propose a novel recommendation framework to solve these problems.

[Figure 3: The recommendation framework. Offline learning stage: App database, App-permission bipartite graph, random walk regularization, estimating App risk scores, building the App hash tree. Online recommendation stage: the mobile user's online input (App category, security preference), searching the App hash tree, portfolio optimization, App recommendation.]

Figure 3 shows the proposed recommendation framework, which consists of two stages. The offline learning stage automatically learns the risk scores of Apps by leveraging random walk regularization with an App-permission bipartite graph, and forms an App hash tree from the App data set for efficiently managing Apps. The online recommendation stage matches the given mobile user's security preference and App category against the App hash tree, and ranks the candidate Apps with respect to both the Apps' popularity and the user's security preference by leveraging modern portfolio theory.

3. ESTIMATING RISK SCORES FOR MOBILE APPS

Generally speaking, the risk score reflects the security level of an App. The smaller the score is, the safer the App is. According to the above discussion, security risks are essentially caused by the data access permissions of Apps. Thus, an intuitive approach for measuring the risks of Apps is to directly check each of the dangerous permissions they request. However, there are several critical challenges along this line, which leave the problem still under-addressed.

First, it is hard to explicitly define a risk function with respect to different permissions for evaluating the potential risks of mobile Apps, since the permissions are often very ambiguous and poorly understood [5, 8]. For example, we observe that although some permissions are dangerous (e.g., location related permissions), they are commonly used in the Apps of some categories (e.g., navigation Apps). Second, the latent relationships between Apps and permissions should be taken into consideration, since similar Apps (permissions) should have similar risk scores. Finally, we should develop a scalable approach to refine risk scores, since rich external knowledge can be leveraged for evaluating the potential risks of Apps. For example, external risk reports, state-of-the-art security models in relevant domains, and prior knowledge from domain experts can all be leveraged for improving the performance of ranking App risks.

To deal with the above challenges, in this paper, we propose a regularization approach based on a bipartite graph, which can learn the security risk of mobile Apps automatically without relying on any predefined risk function. In particular, we develop an App-permission bipartite graph to build the connections between Apps and permissions, which is defined as follows.

Definition 2 (App-permission Bipartite Graph). The graph can be denoted as $G = \{V, E, W\}$. $V = \{V^a, V^p\}$ is the node set, where $V^a = \{a_1, \ldots, a_M\}$ denotes the set of Apps and $V^p = \{p_1, \ldots, p_N\}$ denotes the set of permissions. $E$ is the edge set, where $e_{ij} \in E$ exists if and only if $a_i$ requests the permission $p_j$. $W$ is the edge weight set, where each $w_{ij} \in W$ represents the weight of $e_{ij}$ and denotes the probability that $a_i$ will request $p_j$.

[Figure 4: An example of the bipartite graph, with mobile Apps $a_1$ to $a_4$ connected to permissions $p_1$ to $p_3$ by weighted edges.]

Figure 4 shows an example of an App-permission bipartite graph. Intuitively, the weight $w_{ij}$ can be estimated from the permission records of all Apps in the category of $a_i$.
Specifically, we can compute the weight by

$$w_{ij} = \frac{f_{ij}}{\sum_{e_{ik} \in E} f_{ik}}, \tag{1}$$

where $f_{ij}$ is the number of Apps in category $c$ ($a_i \in c$) requesting permission $p_j$. Furthermore, we can denote each App $a_i$ and permission $p_j$ as vectors $\vec{a}_i = \{w_{i1}, \ldots, w_{iN}\}$ and $\vec{p}_j = \{w_{1j}, \ldots, w_{Mj}\}$, respectively. Accordingly, we define the latent similarity between Apps $a_i$ and $a_j$ by the Cosine similarity,

$$s^a_{ij} = Cos(\vec{a}_i, \vec{a}_j) = \frac{\vec{a}_i \cdot \vec{a}_j}{\|\vec{a}_i\| \, \|\vec{a}_j\|}. \tag{2}$$

Similarly, we define the latent similarity between permissions $p_i$ and $p_j$ as $s^p_{ij} = Cos(\vec{p}_i, \vec{p}_j)$.

To estimate App risk scores with the App-permission bipartite graph, we first define two scores $Risk(a_i)$ and $Risk(p_j)$ for nodes $a_i \in V^a$ and $p_j \in V^p$, respectively. Intuitively, $Risk(a_i)$ is the objective App risk score and $Risk(p_j)$ is the global permission risk score. Second, we develop a regularization framework by regularizing the smoothness of the above two scores over the bipartite graph. Specifically, if we denote $Risk(a_i)$ as $R^a_i$ and $Risk(p_j)$ as $R^p_j$, we define a cost function as follows,

$$Q(a, p) = \frac{\lambda}{2}\Big\{\sum_{i}\|R^a_i - \tilde{R}^a_i\|^2 + \sum_{j}\|R^p_j - \tilde{R}^p_j\|^2\Big\} + \frac{\mu}{2}\Big\{\sum_{i,j} s^a_{ij}\|R^a_i - R^a_j\|^2 + \sum_{i,j} s^p_{ij}\|R^p_i - R^p_j\|^2\Big\} + \frac{1}{2}\sum_{i,j} w_{ij}\|R^a_i - R^p_j\|^2, \tag{3}$$

where $\lambda$ and $\mu$ are the regularization parameters, and $\tilde{R}^a_i$ and $\tilde{R}^p_j$ are the prior risk scores derived from external knowledge. Intuitively, this cost function consists of three parts. The first part, controlled by $\lambda$, defines the constraint that the two risk scores should fit prior knowledge. The second part, controlled by $\mu$, defines the global consistency of the refined risk scores over the graph: if two Apps (permissions) have high latent similarity, their risk scores should be similar. The third part is the smoothness constraint between Apps and permissions, which guarantees that, if an App has a high probability of requesting a specific permission, their risk scores should be similar.

Therefore, the problem of estimating risk scores is converted to the optimization problem of finding the optimal $R^a_i$ and $R^p_j$ that minimize the cost function $Q$. In this paper, we exploit the classic gradient descent method to solve this problem.
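As an illustration of Eqs. (1) to (3), the following minimal Python sketch builds the edge weights and cosine similarities for a toy graph and then repeatedly applies the coordinate-wise fixed-point update that results from setting the partial derivatives of $Q$ to zero (with each similarity pair counted once). It is a sketch under stated assumptions rather than the authors' implementation: the function names, the three toy Apps, the shortened permission labels (`LOC`, `SMS`, `CONTACTS`), the uniform placeholder priors, and the values of `lam` and `mu` are all hypothetical.

```python
import math


def edge_weights(app_perms, app_category):
    """Eq. (1): w_ij = f_ij / sum of f_ik over the permissions a_i requests."""
    freq = {}  # freq[c][p]: number of Apps in category c requesting p
    for app, perms in app_perms.items():
        per_cat = freq.setdefault(app_category[app], {})
        for p in perms:
            per_cat[p] = per_cat.get(p, 0) + 1
    weights = {}
    for app, perms in app_perms.items():
        per_cat = freq[app_category[app]]
        total = sum(per_cat[p] for p in perms)
        weights[app] = {p: per_cat[p] / total for p in perms}
    return weights


def cosine(u, v):
    """Eq. (2) on sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def similarities(vectors):
    """Pairwise cosine similarity, excluding self-pairs."""
    return {i: {j: cosine(vectors[i], vectors[j]) for j in vectors if j != i}
            for i in vectors}


def normalise(scores):
    """Rescale scores so that they sum to 1 (L1 normalisation)."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def sweep(R_a, R_p, prior_a, prior_p, W, S_a, S_p, lam, mu):
    """One pass of the fixed-point updates, followed by L1 normalisation."""
    new_a = {}
    for i in R_a:
        num, den = lam * prior_a[i], lam
        for j, s in S_a[i].items():   # similar Apps
            num, den = num + mu * s * R_a[j], den + mu * s
        for p, w in W[i].items():     # permissions requested by App i
            num, den = num + w * R_p[p], den + w
        new_a[i] = num / den
    new_p = {}
    for p in R_p:
        num, den = lam * prior_p[p], lam
        for q, s in S_p[p].items():   # similar permissions
            num, den = num + mu * s * R_p[q], den + mu * s
        for i in R_a:                 # Apps requesting permission p
            w = W[i].get(p, 0.0)
            num, den = num + w * R_a[i], den + w
        new_p[p] = num / den
    return normalise(new_a), normalise(new_p)


# Toy data (hypothetical): three Apps in one category.
app_perms = {"a1": {"LOC", "SMS"}, "a2": {"LOC"}, "a3": {"SMS", "CONTACTS"}}
app_category = {app: "tools" for app in app_perms}
W = edge_weights(app_perms, app_category)
perm_ids = sorted({p for perms in app_perms.values() for p in perms})
P = {p: {a: W[a][p] for a in W if p in W[a]} for p in perm_ids}
S_a, S_p = similarities(W), similarities(P)
R_a = {a: 1.0 / len(W) for a in W}        # uniform starting scores
R_p = {p: 1.0 / len(P) for p in P}
prior_a, prior_p = dict(R_a), dict(R_p)   # placeholder priors (assumption)
for _ in range(20):
    R_a, R_p = sweep(R_a, R_p, prior_a, prior_p, W, S_a, S_p, lam=0.5, mu=0.5)
```

The sparse dict-of-dicts layout is chosen because each App requests only a few of the available permissions; a dense matrix formulation would be an equally valid choice for small graphs.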
Specifically, we first assign the initial values $R^a_i = 1/M$ and $R^p_j = 1/N$, and then iteratively update them by setting the following partial derivatives to zero:

$$\frac{\partial Q}{\partial R^a_i} = \lambda(R^a_i - \tilde{R}^a_i) + \mu\sum_{j} s^a_{ij}(R^a_i - R^a_j) + \sum_{j} w_{ij}(R^a_i - R^p_j),$$

$$R^a_i = \frac{\lambda\tilde{R}^a_i + \mu\sum_{j} s^a_{ij}R^a_j + \sum_{j} w_{ij}R^p_j}{\lambda + \mu\sum_{j} s^a_{ij} + \sum_{j} w_{ij}}. \tag{4}$$

$$\frac{\partial Q}{\partial R^p_j} = \lambda(R^p_j - \tilde{R}^p_j) + \mu\sum_{i} s^p_{ij}(R^p_j - R^p_i) + \sum_{i} w_{ij}(R^p_j - R^a_i),$$

$$R^p_j = \frac{\lambda\tilde{R}^p_j + \mu\sum_{i} s^p_{ij}R^p_i + \sum_{i} w_{ij}R^a_i}{\lambda + \mu\sum_{i} s^p_{ij} + \sum_{i} w_{ij}}. \tag{5}$$

After each iteration, all the values of $R^a_i$ and $R^p_j$ are normalized again, i.e., $\|R^a\|_1 = 1$ and $\|R^p\|_1 = 1$. Finally, we obtain the optimal risk scores after the results converge.

How to assign the prior risk scores $\tilde{R}^a_i$ and $\tilde{R}^p_j$ from external knowledge is an open question. In practice, some intuitive solutions include inviting domain experts to assign risk scores, building a security classifier from external risk reports, or exploiting state-of-the-art security models in relevant domains. In this paper, as an attempt, we leverage the probabilistic approach PNB (Naive Bayes with informative Priors) proposed in [14] for this task, which is based on a scoring scheme and thus can be directly adopted by our regularization framework. Specifically, PNB aims to learn a Naive Bayes model with parameter $\theta$ that can best explain the generative process of permissions, i.e., $P(p_j \mid \theta)$. In this model, the parameter $\theta$ is assumed to follow the Beta prior $Beta(\theta; \alpha_0, \beta_0)$, and the probability can be estimated by

$$P(p_j \mid \theta) = \frac{\sum_{i=1}^{M} x_{i,j} + \alpha_0}{M + \alpha_0 + \beta_0}. \tag{6}$$

Here, $M$ is the total number of Apps and $x_{i,j}$ is a binary indicator which is equal to 1 (i.e., $a_i$ requests the permission $p_j$) or 0 (i.e., $a_i$ does not request the permission $p_j$). In particular, PNB also defines three categories of permissions with respect to their threat levels (i.e., similar to the preliminaries in Section 2), and each category has a specific $Beta(\theta; \alpha_0, \beta_0)$ as its informative prior. Therefore, the prior risk scores of permission $p_j$ and App $a_i$ can be estimated by $\tilde{R}^p_j = -\ln P(p_j \mid \theta)$ and $\tilde{R}^a_i = -\ln P(p_1, \ldots, p_k \mid \theta)$, where each $p_k \in a_i$. Note that both $\tilde{R}^a_i$ and $\tilde{R}^p_j$ are normalized before learning our regularization framework. Although PNB is a straightforward approach that cannot solve all the challenges mentioned before, its effectiveness in ranking the risks of Apps has been well demonstrated. Therefore, using PNB as prior knowledge in our regularization framework is appropriate.

4. RANKING FOR MOBILE APP RECOMMENDATION

After computing the risk score for each mobile App, we can rank Apps in ascending order with respect to their risk scores for recommendations. Moreover, if some Apps have the same risk scores, they are further ranked according to popularity scores (e.g., overall rating). However, in real-world App recommendation services, users may have difficulty getting a clear perception of the risks of ranked Apps. A promising way to help users understand the different risks of Apps is to categorize the risks into discrete levels (e.g., Low, Medium, High). In fact, people often describe their perception of risk or security with such discrete levels.

Algorithm 1: Automatic Detection of Security Levels
Input: the set of Apps $A = \{a_i\}$; parameter $\delta$
Output: the set of security levels $\Psi$
    Rank $A$ in descending order according to $Risk(a)$;
    $L = \emptyset$;
    for each $i \in [1, |A|]$ do
        $A' = L \cup \{A[i]\}$;
        calculate $CV(A')$ in terms of $Risk(a)$ ($a \in A'$);
        if $CV(A') > \delta$ then
            $\Psi = \Psi \cup \{L\}$; $L = \emptyset$ is a new level;
        else
            $L = L \cup \{A[i]\}$;
        end if
    end for
    return $\Psi$
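Algorithm 1 can be sketched in Python as follows. The coefficient of variation (CV) of a group of scores is its standard deviation divided by its mean. This is a minimal sketch under stated assumptions, not the authors' code: the function names and toy risk scores are hypothetical, the population standard deviation is used, the App that triggers a split is assumed to open the next level, and the last open level is assumed to be added to the output once the loop ends.

```python
from statistics import mean, pstdev


def coefficient_of_variation(scores):
    """CV = standard deviation / mean (population form, an assumption)."""
    m = mean(scores)
    return pstdev(scores) / m if m else 0.0


def segment_levels(risk, delta):
    """Split Apps, ranked by descending risk, into consecutive levels.

    The first returned level holds the riskiest Apps, i.e. it is the
    lowest security level.
    """
    ranked = sorted(risk, key=risk.get, reverse=True)
    levels, current = [], []
    for app in ranked:
        trial = [risk[a] for a in current] + [risk[app]]
        if current and coefficient_of_variation(trial) > delta:
            levels.append(current)   # close the current level
            current = [app]          # assumption: this App opens the next level
        else:
            current.append(app)
    if current:
        levels.append(current)       # assumption: keep the last open level
    return levels


# Toy risk scores (hypothetical).
risk = {"a": 0.9, "b": 0.85, "c": 0.3, "d": 0.28, "e": 0.05}
levels = segment_levels(risk, delta=0.2)
```

With these toy scores, a split occurs wherever adding the next App makes the group's CV exceed `delta`, which yields three levels: `a` and `b`, then `c` and `d`, then `e`.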
Therefore, in this paper, we further group Apps into different clusters, each of which has the same security level (e.g., Low or High). However, it is not easy to get an accurate and appropriate segmentation of Apps with respect to their risk scores due to the lack of appropriate benchmarks. To solve this problem, we develop a Coefficient of Variation (CV) based approach to automatically segment mobile Apps. The main idea of this approach is that two adjacent Apps in the globally ranked list are assigned different security levels if their risk scores differ dramatically, which can be captured by the CV, i.e., $\sqrt{variance}/mean$ of their risk scores. The detailed segmentation algorithm is shown in Algorithm 1. The parameter $\delta$ is a threshold used for determining a dramatic difference in CV. After segmentation, the Apps at lower security levels have higher security risk.

Now, we are able to recommend Apps to users. Specifically, given a security level $L$ and a category $c$, we can treat all the Apps in category $c$ whose security level is at least $L$ as candidates. Intuitively, there are two types of ranking principles for recommending Apps.

- Security Principle: We first rank App candidates in ascending order by their risk scores, and Apps that have the same risk scores are further ranked by popularity scores (e.g., overall rating).
- Popularity Principle: We first rank App candidates in descending order by their popularity scores (e.g., overall rating), and Apps that have the same popularity scores are further ranked by risk scores.

Furthermore, we need to strike a balance between users' security preferences and the Apps' popularity for recommendations. To achieve such a balance, we also propose a hybrid principle for App recommendations, which is based on modern portfolio theory [16]. Portfolio theory was originally proposed in the field of finance, where it addresses the investment problem in financial markets. For example, an investor often wants to select a portfolio of $n$ stocks with a fixed investment budget, which will provide the maximum future return and the minimum risk.
In our problem, the stocks can be regarded as Apps, and the future return and risk can be regarded as the popularity and the security risk of Apps. Specifically, an App portfolio $\Upsilon$ can be represented by a collection of $n$ Apps with a corresponding weight $w_i$ assigned to each App $a_i$, i.e.,

$$\Upsilon = \{(a_i, w_i)\}, \quad \text{s.t.} \quad \sum_{i=1}^{n} w_i = 1. \tag{7}$$

Indeed, the weight $w_i$ in finance is the percentage of the budget invested in the $i$-th stock. According to the discussion in [19], the weight $w_i$ in our problem indicates how much attention the recommender system wants the target user to pay to App $a_i$. Therefore, the weights can be used to determine the ranks of Apps; that is, Apps should be ranked in descending order of their weights. Before obtaining the weights, we first define the future return of the App portfolio as $E[\Upsilon]$, which can be computed by

$$E[\Upsilon] = \sum_{i=1}^{n} w_i \frac{1}{\rho_i^{(Pop)}}, \tag{8}$$

where $\rho_i^{(Pop)}$ is the rank of App $a_i$ in the popularity based ranked list $\Lambda^{(Pop)}$. Also, we define the future risk of the App portfolio as $R[\Upsilon]$, which can be computed by the following function [12],

$$R[\Upsilon] = \sum_{i=1}^{n} \Big( \frac{w_i^2}{\big(\rho_i^{(Risk)}\big)^2} + 2 \sum_{j=i+1}^{n} w_i w_j \frac{1}{\rho_i^{(Risk)}} \frac{1}{\rho_j^{(Risk)}} J_{ij} \Big), \tag{9}$$

where $\rho_i^{(Risk)}$ is the rank of App $a_i$ in the risk based ranked list $\Lambda^{(Risk)}$, and $J_{ij}$ is the risk correlation between Apps $a_i$ and $a_j$. Here, we estimate $J_{ij}$ according to the similarity of the requested permissions: for any two Apps, the more permissions they request in common, the higher their risk similarity. To this end, we compute $J_{ij}$ as the Jaccard coefficient between Apps $a_i$ and $a_j$,

$$J_{ij} = \frac{N_{ij}}{N_i + N_j - N_{ij}}, \tag{10}$$

where $N_i$ is the number of permissions requested by App $a_i$, and $N_{ij}$ is the number of common permissions requested by both Apps $a_i$ and $a_j$.

In our problem, the objective is to learn a set of App weights $w_i$ that maximize the future return and minimize the risk of the App portfolio $\Upsilon$ that consists of the recommendation candidates (App candidates), i.e.,

$$\arg\max_{w} \; E[\Upsilon] - b\,R[\Upsilon], \tag{11}$$

where $b$ is a specified risk preference parameter, which is set to the given security level $L$ in our experiments. The above optimization problem can be solved by the efficient frontier based approach introduced in [19]. Specifically, we can obtain the optimal weight $w_{E^*}$ by

$$w_{E^*} = \frac{\begin{vmatrix} 1 & \mathbf{1}^T \Sigma^{-1} E \\ E^* & E^T \Sigma^{-1} E \end{vmatrix} \Sigma^{-1} \mathbf{1} + \begin{vmatrix} \mathbf{1}^T \Sigma^{-1} \mathbf{1} & 1 \\ E^T \Sigma^{-1} \mathbf{1} & E^* \end{vmatrix} \Sigma^{-1} E}{\begin{vmatrix} \mathbf{1}^T \Sigma^{-1} \mathbf{1} & \mathbf{1}^T \Sigma^{-1} E \\ E^T \Sigma^{-1} \mathbf{1} & E^T \Sigma^{-1} E \end{vmatrix}}, \tag{12}$$

where $\mathbf{1}$ is the all-ones vector, the entries of $\Sigma$ can be computed by $\Sigma_{ij} = \frac{1}{\rho_i^{(Risk)}} \frac{1}{\rho_j^{(Risk)}} J_{ij}$, $E = \big(\frac{1}{\rho_1^{(Pop)}}, \ldots, \frac{1}{\rho_n^{(Pop)}}\big)^T$, and

$$E^* = \frac{(xz - y^2)^2 - 2b\,(xE - y\mathbf{1})^T \Sigma^{-1} (z\mathbf{1} - yE)}{2b\,(xE - y\mathbf{1})^T \Sigma^{-1} (xE - y\mathbf{1})}, \tag{13}$$

where $x = \mathbf{1}^T \Sigma^{-1} \mathbf{1}$, $y = \mathbf{1}^T \Sigma^{-1} E$, and $z = E^T \Sigma^{-1} E$.

[Figure 5: An example of the App hash tree. The root r branches into a category level (c_1, c_2, c_3) and then a security level (l_1, l_2, l_3); each node holds a hash table of Apps that indexes the App database a_1, ..., a_n.]

[Figure 6: The percent of Apps and the average number of requested permissions by each App in different categories.]

After ranking Apps with respect to the three different principles, the final challenge is how to organize and index such a large number of Apps with respect to their security levels and categories. Indeed, in an online App recommender system, it is necessary to respond quickly to users' requests and to manage Apps efficiently on the back-end servers. To this end, we propose a data structure for App retrieval, namely the App hash tree. Figure 5 illustrates an example of an App hash tree, which contains two hierarchies, namely a category level and a security level. Each node in the tree holds a hash table that stores the index of the corresponding Apps. For example, the node Root → c_1 → l_3 may store the index of all Apps that belong to category c_1 and security level l_3. Note that the App hash tree can be easily built with basic tree search algorithms (e.g., Breadth-First Search in our experiments).
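A minimal sketch of such a two-level index, using nested Python dictionaries as the hash tables, is given below. It is only one possible layout: the class and method names, the stored record, and the choice to keep hash tables at the (category, security level) nodes only are assumptions made for illustration, and the App identifiers are hypothetical.

```python
from collections import defaultdict


class AppHashTree:
    """Root -> category -> security level, with a hash table at each leaf node."""

    def __init__(self):
        # children[category][level] is the hash table of node Root -> category -> level.
        self.children = defaultdict(lambda: defaultdict(dict))

    def add(self, app_id, category, level, record):
        """Index one App under its category and security level."""
        self.children[category][level][app_id] = record

    def node(self, category, level):
        """Return the hash table stored at Root -> category -> level (empty if absent)."""
        return self.children.get(category, {}).get(level, {})

    def candidates(self, category, min_level):
        """Collect all Apps of a category whose security level is at least min_level."""
        found = {}
        for level, table in self.children.get(category, {}).items():
            if level >= min_level:
                found.update(table)
        return found


tree = AppHashTree()
tree.add("app_a", "Lifestyle", 1, {"rating": 4.5})
tree.add("app_b", "Lifestyle", 6, {"rating": 3.9})
tree.add("app_c", "Tools", 3, {"rating": 4.1})
lifestyle_from_level_3 = tree.candidates("Lifestyle", 3)
```

A lookup touches only the hash tables under one category, so the cost of a request does not depend on how many Apps are stored under other categories.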
Actually, the ranking results of Apps in each node can be computed offline and pre-stored in the corresponding nodes of the App hash tree. Therefore, during online recommendation, the system can quickly look up the ranked list for recommendations to users based on their specified security levels and App categories.

5. EXPERIMENTAL RESULTS

In this section, we empirically evaluate the Security and Privacy aware mobile App Recommendation (SPAR) approach with a large-scale real-world data set.

5.1 Experimental Data

The experimental data were collected from Google Play (Android Market) [4] in 2012. This real-world data set includes 170,753 Apps in 30 App categories, and the Apps have 173 unique data access permissions. In particular, the data set covers more than 25% of the Apps available at the Android Market, which included 675,000 Apps in total as of the end of September 2012 [3]. Figure 6 and Figure 7 illustrate some statistics of the data set. Specifically, Figure 6 shows the percent of Apps and the average number of permissions requested by each App in different categories. In this figure, we can observe that Apps in the categories Communication, Business and Social request more permissions. Figure 7 shows the top 25 most requested permissions and the percent of Apps that request those permissions. In this figure, we can find that most of the Apps request network and location related permissions. To further study the relationship between permissions and Apps, we show the distribution of the number of Apps with respect to the number of requested permissions in Figure 8 (a). We can see that most of the Apps request only a few permissions, which may indicate that not many Apps have security risks. Figure 8 (b) shows the distribution of the number of Apps with respect to the number of their ratings. We can find that the distribution roughly follows a power law. This indicates that using App popularity alone for recommendation is not enough.

[Figure 7: The top 25 most used permissions in our data set and the percent of Apps that request those permissions.]

[Figure 8: The distribution of the number of Apps w.r.t. (a) the number of requested access permissions, and (b) the number of their ratings.]

[Figure 9: The percent of (a) Apps and (b)-(d) App categories at different security levels. Panels: (a) All Levels, (b) Level 1, (c) Level 3, (d) Level 6.]

5.2 Evaluation of App Risk Scores

In this subsection, we evaluate the performance of estimating App risk scores and segmenting security levels.

5.2.1 App Security Levels

Specifically, we set the regularization parameters in Equation 3 as λ = 0.5 and µ = 1, and the settings of PNB are similar to those in [14]. To segment Apps with respect to their risk scores, we empirically set δ = 0.01 × CV(A) in Algorithm 1, where CV(A) is the CV of all App risk scores. Figure 9 (a) shows the percent of Apps with respect to the 6 segmented security levels. We can see that level 6 (i.e., the most secure) contains most Apps and the App numbers from level 1 to level 4 are relatively even, which indicates that most Apps are secure while only a few Apps have security risks. Figure 9 (b)-(d) show the percent of App categories at security levels 1, 3, and 6, respectively. In these figures, we can find that Apps with more permissions (e.g., Apps in the categories Tools, Travel & Local, and Communication) are more likely to have potential risks, and vice versa (e.g., Apps in the categories Personalization and Books & Reference). Note that, since the categories Entertainment and Personalization contain the largest portion of Apps in our data set, they always have a high percentage at all security levels.

[Figure 10: The performance of each approach w.r.t. different metrics based on user judgment. Panels: (a) NDCG@K, (b) Precision@K, (c) Recall@K, (d) F@K.]

5.2.2 Evaluation of Ranking App Risk

Evaluation Baselines. We adopt two state-of-the-art baselines to evaluate the performance of our SPAR approach in terms of ranking App risks. To the best of our knowledge, there is only one relevant recent study [14] that can be directly leveraged for ranking App risks. Therefore, we leverage the approach recommended in that work as the first baseline.

Naive Bayes with informative Priors (PNB) [14] aims to learn a Naive Bayes model with parameter θ that can best explain the generative process of permissions, i.e., $P(p \mid \theta)$. Therefore, the risk score of App $a_i$ can be estimated by $R_{a_i} = -\ln P(p_1, \ldots, p_k \mid \theta)$, where each $p_k \in a_i$. In particular, this baseline is also used for estimating the prior risk scores in our regularization framework.

Moreover, we also use a popular learning-to-rank approach as the second baseline for ranking App risks. RankSVM [9] aims to rank App risk with the RankSVM model. Specifically, we manually labeled 200 secure Apps and 200 insecure Apps according to some previous studies [6, 14, 20] as training data. For each App, we used its category, developer, and permissions as features to learn the ranking model.

Evaluation Metrics. Specifically, we set up the evaluation as follows. First, we ran our SPAR approach and the baselines on all the Apps in the data set. For each approach, we selected the 100 top ranked mobile Apps (i.e., the most insecure) and the 100 bottom ranked mobile Apps (i.e., the most secure) in the result.
Then, we merged all the selected Apps into a pool, which includes 496 unique mobile Apps in our data set. We invited three users who are familiar with Android Apps to manually label each of these Apps with a score of 2 (Insecure), 1 (Not Sure), or 0 (Secure). Each user gave a label by comprehensively considering their own experience (i.e., they could download and try all these Apps), the App profile and the comments from other users. After user evaluation, each App $a$ is assigned a judgement score $f(a) \in [0, 6]$, the sum of the three labels. Moreover, we computed Cohen's kappa coefficient [2] between each pair of evaluators to estimate the inter-evaluator agreement. The values of Cohen's kappa coefficient are between 0.67 and 0.72 in the user evaluation, which indicates substantial agreement [11]. Finally, we further ranked the 496 Apps by each approach, and obtained three ranked lists of Apps. Thus, we can exploit the popular metric Normalized Discounted Cumulative Gain (NDCG) for determining the ranking performance of each approach. Specifically, the discounted cumulative gain given a cut-off rank $K$ can be calculated by

$$DCG@K = \sum_{i=1}^{K} \frac{2^{Rel(a_i)} - 1}{\log_2(1 + i)},$$

where $Rel(a_i) = f(a_i)$ is the relevance score. The NDCG@K is the DCG@K normalized by the IDCG@K, which is the DCG@K value of the ideal ranking list of the returned results. In other words, we have

$$NDCG@K = \frac{DCG@K}{IDCG@K}.$$

NDCG@K indicates how good the ranked order of the Apps returned by an approach is, given a cut-off rank $K$. A larger NDCG@K value indicates better ranking performance. In particular, if we treat the 83 commonly agreed insecure Apps (i.e., $f(a) = 6$) as the ground truth, we can also evaluate each approach with the widely used metrics Precision@K, Recall@K, and F@K.

Overall Performances. Figure 10 shows the results of each approach with respect to the four evaluation metrics. In this figure, we can see that SPAR consistently outperforms the baselines and the improvement is more significant for smaller K. These results clearly validate the effectiveness of our regularization based approach. In particular, the performance of PNB can be refined during the regularization on the bipartite graph. Also, SPAR and PNB outperform RankSVM, which indicates that a straightforward learning-to-rank approach is not enough for estimating App risks. Indeed, the performance of learning-to-rank approaches mainly relies on the effectiveness of feature extraction. Based on the above observations, we can argue that SPAR is an appropriate approach for estimating App risks.

Case Study. This evaluation benchmark is based on prior knowledge from previous studies. As reported by Zhou et al. [20], there are 13 Apps which may leak private information according to the TaintDroid system [6].
Here, we select 6 of them (i.e., Horoscope, Layar, Trapster, Wertago, Astrid Task and DasTelefonbuch), which are included in our data set, to evaluate SPAR and the baselines. Indeed, we study whether each approach can find these insecure Apps at high risk ranks, since a good approach should be able to capture these suspicious Apps. Table 2 shows the top percentage position of each App in the ranked list returned by each approach. We can see that SPAR ranks those insecure Apps at higher positions than the baselines. In particular, all of these six Apps are categorized into low security levels (i.e., L_1 and L_2) by the segmentation approach, which also validates the effectiveness of our approach.

Table 2: The reported insecure mobile Apps.

App               SPAR     PNB      RankSVM
Horoscope         2.64%    5.41%    7.13%
Layar             5.34%    7.21%    11.81%
Trapster          6.21%    9.34%    12.33%
Wertago           2.72%    4.89%    8.37%
Astrid Task       8.09%    11.29%   13.32%
DasTelefonbuch    6.18%    11.71%   14.38%

[Figure 11: The recommendation performances of different ranking principles. Panels: (a) Level 1, (b) Level 3, (c) Level 5, (d) Level 6.]

5.3 Evaluation of App Recommendation

Here, we evaluate the recommendation performance of our approach SPAR. In particular, we use the average rating as the popularity score of each App, and the parameter $b$ in Equation 11 equals the given security level in the experiments.

5.3.1 Recommendation Performances

Since our App recommender system is non-personalized, there is no personal data that could be used for evaluation. Also, there is no ground truth for us to evaluate which recommendation results really meet users' information needs. Thus, in this paper, we focus on evaluating our recommendation approach SPAR by checking whether it can strike a balance between App popularity and users' security preferences. Specifically, there are three different ranking principles in our recommendation approach, i.e., the popularity, security and hybrid principles. Given an App category and a security level, each principle can generate a ranked App list as the recommendation result.
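Ranked lists of this kind are scored with NDCG in this paper (see Section 5.2.2). As a reference point, the following sketch computes NDCG@K from the DCG definition given there, with the relevance scores passed in as a dictionary so that any per-App score can be plugged in. The function names and the toy relevance values are assumptions for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum((2 ** rel - 1) / math.log2(1 + i)
               for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_apps, relevance, k):
    """DCG@K of the given order divided by the DCG@K of the ideal order."""
    rels = [relevance[app] for app in ranked_apps]
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance scores for three Apps.
toy_relevance = {"a": 6, "b": 3, "c": 0}
best_order = ndcg_at_k(["a", "b", "c"], toy_relevance, k=3)
worst_order = ndcg_at_k(["c", "b", "a"], toy_relevance, k=3)
```

Because the ideal order is derived from the same returned list, a perfectly ordered list always scores 1, and any other order scores strictly less whenever the relevance values differ.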
Here, we propose to use two metrics, $NDCG_{Pop}$ and $NDCG_{Sec}$, to evaluate the performance of each recommendation result. Compared with traditional NDCG, the relevance scores of $NDCG_{Pop}$ and $NDCG_{Sec}$ are set to the popularity score and the reciprocal of the risk score, respectively. Intuitively, if a recommendation result has a higher $NDCG_{Pop}$ ($NDCG_{Sec}$), it puts more emphasis on App popularity (App security). Figure 11 shows the average recommendation performance across all App categories with respect to different ranking principles and security levels. From the results, we can observe that the hybrid principle ranks Apps with a trade-off between popularity and security, which means that the recommended Apps are both popular and secure. Also, as the security level increases, the recommendation results put more emphasis on App security than on popularity.

5.3.2 A Case Study

To further evaluate the recommendation performance of the different ranking methods, we study five Apps in the category App/Lifestyle, which are Wertago, BeNaughty, Moment Diary, SimplyNoise and Bedside. In particular, Wertago is one of the reported insecure Apps, and SimplyNoise is an App that requests no data access permissions. Table 3 shows the recommendation results, where SEC recommends the most secure Apps based on risk scores with security level 1; POP is based on popularity scores (i.e., average ratings) with security level 1; and H-1, H-3, H-5 and H-6 denote the hybrid principle based recommendation with security levels 1, 3, 5 and 6, respectively.

Table 3: The case study of App recommendation.

Method   Recommendation
SEC      SimplyNoise, Moment Diary, Bedside, BeNaughty, Wertago
POP      Wertago, Bedside, Moment Diary, BeNaughty, SimplyNoise
H-1      Bedside, Moment Diary, Wertago, BeNaughty, SimplyNoise
H-3      Moment Diary, Bedside, BeNaughty, SimplyNoise
H-5      Moment Diary, SimplyNoise, Bedside
H-6      SimplyNoise, Moment Diary

From these results, we can observe that the popularity-based method recommends the insecure App Wertago in the first position, although it has the highest risk score. In contrast, if only risk scores are used to recommend Apps, some unpopular Apps (e.g., SimplyNoise) are ranked higher. Furthermore, we can observe that H-3, H-5 and H-6 do not recommend all five Apps to users. The reason is that these methods only take Apps with security levels no lower than the given level as candidates. Finally, we can see that the App Moment Diary is ranked the highest by H-3 and H-5, since the hybrid principle can reach a balance between popularity and security for App recommendation.

5.4 Efficiency and Scalability

Our approach consists of an offline stage and an online stage. In the offline stage, the computational cost mainly comes from two parts: the computation of regularization for estimating risk scores, and the computation for security level segmentation and building the App hash tree. To evaluate the efficiency and scalability of our approach, we test the running time of each part on different portions of the entire data set (i.e., 10%, ..., 100%). All the tests were conducted on a PC with a 3.4 GHz 8-core CPU and 8 GB of main memory.
Figure 12 shows the running time of each part with respect to different input data sizes. We can see that the computation times are almost linear in the size of the input data. Thus, our approach is scalable in the offline stage. In the online stage, given a security level and an App category, the recommender system returns the ranked list of Apps to the user according to the different recommendation principles. Indeed, since the popularity scores (e.g., overall rating) and risk scores can be obtained in the offline stage, and the portfolio optimization for the hybrid principle has a closed-form solution (e.g., Equation 12), the computational cost in the online stage is relatively low. In particular, as discussed in Section 4, the main ranking process can be conducted in advance and pre-stored in the App hash tree. In this case, the online recommendation process will be very fast.

[Figure 12: The running time of (a) each iteration of regularization, and (b) security level segmentation and building the App hash tree.]

6. RELATED WORK

Generally speaking, the related work of this study can be grouped into two categories. The first category is about mobile App security. Indeed, many previous studies about the security and privacy issues of mobile Apps have been reported. For example, Enck et al. [6] proposed a malware detection system named TaintDroid, which can provide efficient real-time analysis of third party mobile Apps by monitoring their data access behavior. Luo et al. [13] discussed the problem of attacks on WebView in the Android system, analyzed the fundamental causes and proposed some potential solutions. To tame information-stealing mobile Apps, Zhou et al. [20] proposed a new privacy model for the Android system. Also, they developed a system named TISSA as security middleware to implement this model. Enck et al. [7] developed a rule-based certification model and system named Kirin, which can perform lightweight certification of mobile Apps at install time.
Indeed, more and more advanced mobile Apps are committed to providing intelligent services for users by requesting various access permissions to users' personal data. To understand these data access permissions, Au et al. [5] surveyed the permission systems of several popular smartphone operating systems, such as Apple iOS and Android. They also discussed the problem of permission over-declaration and proposed some insightful directions for relevant research. Similarly, Felt et al. [8] studied the permission requests of over 900 mobile Apps in the Android system, and developed a tool named Stowaway to detect overprivilege in compiled Android Apps. However, these approaches are very hard to implement in practice, since it is not a trivial task to efficiently and accurately detect malicious code in each mobile App, and users often do not want security software to scan their devices frequently. Recently, Peng et al. [14] proposed a novel approach with various probabilistic models for ranking Apps with respect to their risk scores. Although this approach is straightforward and not scalable for external knowledge, it is effective for estimating App risk. Therefore, we propose to leverage this approach for assigning prior risk scores in our regularization framework.

Another category is about mobile App recommendation, which aims to facilitate the choices of mobile users. For example, Yan et al. [17] developed a collaborative filtering based mobile App recommender system, namely AppJoy. Different from other mobile App recommender systems, AppJoy builds its preference matrix from users' App usage records rather than explicit user ratings. However, the App usage records are sometimes very sparse. To solve this problem, Shi et al. [15] studied several recommendation models and proposed a content based collaborative filtering model named Eigenapp for recommending Apps on their Web site GetJar. Also, some researchers studied the problem of exploiting enriched contextual information for mobile App recommendation. For example, Yu et al. [18] proposed a novel personalized context-aware recommender system by analyzing mobile users' context logs. The proposed approach is based on the Latent Dirichlet Allocation topic model and is scalable for multiple contextual features. Furthermore, Zhu et al. [21] proposed a uniform framework for personalized context-aware recommendation, which can integrate both context independency and dependency assumptions. The framework can mine users' personal context-aware preferences for mobile App recommendation from the context logs of many mobile users. However, none of the above recommendation approaches takes into consideration the potential security and privacy risks of mobile Apps, which motivates our novel mobile App recommender system with security and privacy awareness.

7. CONCLUDING REMARKS

In this paper, we developed a mobile App recommender system with security and privacy awareness. Specifically, without relying on any predefined risk functions, we designed a scalable and automatic approach for estimating the security risks of mobile Apps. A unique perspective of this approach is the creative use of external knowledge as prior scores and of regularization techniques in an App-permission bipartite graph. Moreover, to consider both Apps' popularity and users' security preferences for recommendations, we introduced a flexible App recommendation method based on modern portfolio theory. In particular, we also developed an App hash tree to efficiently look up Apps in recommendation. Finally, the experiments on a large-scale real-world data set clearly validated the effectiveness and efficiency of the proposed recommendation framework.

8. ACKNOWLEDGEMENTS

This work was supported in part by grants from the National Science Foundation for Distinguished Young Scholars of China (Grant No. 61325010), the Natural Science Foundation of China (NSFC, Grant No. 71329201), and the National High Technology Research and Development Program of China (Grant No. SS2014AA012303). This work was also partially supported by grants from the National Science Foundation (NSF, Grant No. CCF-1018151 and IIS-1256016), and UNC Charlotte Faculty Research Grants 2014-2015.

9. REFERENCES

[1] http://developer.android.com/.
[2] http://en.wikipedia.org/wiki/Cohen's_kappa.
[3] http://en.wikipedia.org/wiki/Google_Play.
[4] https://play.google.com/apps.
[5] K. W. Y. Au, Y. F. Zhou, Z. Huang, P. Gill, and D. Lie.
Short paper: a look at smartphone permission models. In Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, SPSM '11, pages 63-68, New York, NY, USA, 2011. ACM.
[6] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI '10, pages 1-6, Berkeley, CA, USA, 2010. USENIX Association.
[7] W. Enck, M. Ongtang, and P. McDaniel. On lightweight mobile phone application certification. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS '09, pages 235-245, New York, NY, USA, 2009. ACM.
[8] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner. Android permissions demystified. In Proceedings of the 18th ACM Conference on Computer and Communications Security, CCS '11, pages 627-638, New York, NY, USA, 2011. ACM.
[9] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '02, pages 133-142, New York, NY, USA, 2002. ACM.
[10] A. Karatzoglou, L. Baltrunas, K. Church, and M. Böhmer. Climbing the app wall: enabling mobile app discovery through context-aware recommendations. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, pages 2527-2530, New York, NY, USA, 2012. ACM.
[11] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw. Detecting product review spammers using rating behaviors. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM '10, pages 939-948, New York, NY, USA, 2010. ACM.
[12] C. Luo, H. Xiong, W. Zhou, Y. Guo, and G. Deng. Enhancing investment decisions in P2P lending: an investor composition perspective. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 292-300, New York, NY, USA, 2011. ACM.
[13] T. Luo, H. Hao, W. Du, Y. Wang, and H. Yin. Attacks on WebView in the Android system. In Proceedings of the 27th Annual Computer Security Applications Conference, ACSAC '11, pages 343-352, New York, NY, USA, 2011. ACM.
[14] H. Peng, C. Gates, B. Sarma, N. Li, Y. Qi, R. Potharaju, C. Nita-Rotaru, and I. Molloy. Using probabilistic generative models for ranking risks of Android apps. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS '12, pages 241-252, New York, NY, USA, 2012. ACM.
[15] K. Shi and K. Ali. GetJar mobile application recommendations with very sparse datasets. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, pages 204-212, New York, NY, USA, 2012. ACM.
[16] J. Wang and J. Zhu. Portfolio theory of information retrieval. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '09, pages 115-122, New York, NY, USA, 2009. ACM.
[17] B. Yan and G. Chen. AppJoy: personalized mobile application discovery. In Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys '11, pages 113-126, New York, NY, USA, 2011. ACM.
[18] K. Yu, B. Zhang, H. Zhu, H. Cao, and J. Tian. Towards personalized context-aware recommendation by mining context logs through topic models. In Proceedings of the 16th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining - Volume Part I, PAKDD '12, pages 431-443, Berlin, Heidelberg, 2012. Springer-Verlag.
[19] W. Zhang, J. Wang, B. Chen, and X. Zhao. To personalize or not: a risk management perspective. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys '13, pages 229-236, New York, NY, USA, 2013. ACM.
[20] Y. Zhou, X. Zhang, X. Jiang, and V. W. Freeh. Taming information-stealing smartphone applications (on Android). In Proceedings of the 4th International Conference on Trust and Trustworthy Computing, TRUST '11, pages 93-107, Berlin, Heidelberg, 2011. Springer-Verlag.
[21] H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian. Mining personal context-aware preferences for mobile users. In Proceedings of the IEEE 12th International Conference on Data Mining, ICDM '12, pages 1212-1217, 2012.
[22] H. Zhu, H. Xiong, Y. Ge, and E. Chen. Ranking fraud detection for mobile apps: a holistic view. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, CIKM '13, pages 619-628, New York, NY, USA, 2013. ACM.