Internet Activity Analysis Through Proxy Log




Kartik Bommepally, Glisa T. K., Jeena J. Prakash, Sanasam Ranbir Singh and Hema A Murthy
Department of Computer Science, Indian Institute of Technology Madras, Chennai, India
Email: kartikaditya@gmail.com, {glisa, jeena, ranbir}@lantana.tenet.res.in, {hema}@cse.iitm.ac.in

Abstract

The availability of the Internet at the click of a mouse brings with it a host of new problems. Although the World Wide Web was first started by physicists at CERN to enable collation and exchange of data, today it is used for a wide range of applications. The bandwidth requirements of these applications also vary. An Internet Service Provider must ensure satisfaction across the entire spectrum of users. To ensure this, analysis of Internet usage becomes essential. Further, an administrator can keep a record of users' Internet activity and prevent unethical activities, since the Internet is also an excellent resource for providing anonymity. This analysis can also help in resource provisioning and monitoring. In this work, a web-based tool is first proposed to analyse Internet activity. Next, data is collected from a proxy server at a campus-wide network. Traffic patterns of different types of users are studied. Finally, the paper concludes with strategies for monitoring and control of traffic.

I. INTRODUCTION

With the increase in awareness and availability of the Internet, information of any kind has become available at the click of a mouse. An administrator has to ensure that all users get a fair share of the bandwidth. Today, addiction to the Internet is a serious issue amongst users, especially in campus-wide networks where the Internet is freely available. An administrator on a campus may wish to impose controls on Internet usage. In general, most campuses end up restricting usage over specific periods of the day. This has the disadvantage that a genuine user who needs access to information is also denied.
Internet traffic data can be collected from various sources such as routers, gateways or proxy servers. In this paper, we analyse a large proxy log to study user access patterns. Such a study can assist a network management system in traffic shaping and monitoring, and thus remove the necessity of regimentation. This is the main motivation of this paper. There are few studies on the analysis of proxy logs [1], [2]. In this study, however, our analyser focuses only on investigating average access time, and it provides the flexibility to analyse access patterns in several forms. We first discuss a tool that we have implemented to analyse the behaviour of Internet users behind a proxy. Second, we analyse users' Internet usage patterns using the proposed tool. The contributions of this paper can be stated as follows:

- Design and implementation of a proxy analyser.
- Analysis of the amount of time a user spends on the Internet.
- Analysis of the traffic pattern generated by different users.
- Study of a few access control mechanisms.

Fig. 1. Log analyser's implementation framework

The organisation of the paper is as follows. Section II describes a proxy-controlled Internet access system. Section III discusses the design and implementation of a proxy analyser. Section IV presents the analysis. The paper concludes in Section V.

II. PROXY-BASED INTERNET ACCESS

A proxy server acts as a go-between for requests from clients seeking resources from other servers. It evaluates every request according to its filtering rules and provides the resource by connecting to the relevant server and requesting the service on behalf of the client. A proxy server has several functionalities [3]. However, we focus on the following features:

- Keeping the machines behind it anonymous, mainly for security.
- Speeding up access to a resource (via caching).
- Content filtering through predefined rules.
- Logging Internet traffic.

III. DESIGN AND IMPLEMENTATION

In this section we briefly discuss the design and implementation of a proxy analyser. It has three major components, namely the log parser, the database loader and the data analyser. The basic framework is shown in Figure 1.

A. Log Parser

The purpose of this module is to extract useful information from the logs. The logs usually include the IP address and/or host name, the time of the request, the user's id, the URL requested and the status of the request. This module parses the logs and extracts the above information.
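The log parser step can be illustrated with a minimal sketch. The paper does not give the log format, so this example assumes Squid's native access.log layout (timestamp, elapsed time, client address, result/status code, size, method, URL, user identity, hierarchy, content type). The names `LogRecord` and `parse_squid_line` are hypothetical and are not part of the authors' tool. Only the domain name is kept from each URL, in line with the statement that the tool records domains rather than whole URLs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class LogRecord:
    """Fields the log parser is described as extracting (hypothetical type)."""
    timestamp: datetime
    client_ip: str
    status: str
    url: str
    domain: str
    user_id: str


def parse_squid_line(line: str) -> Optional[LogRecord]:
    """Parse one line, assuming Squid's native access.log field order."""
    fields = line.split()
    if len(fields) < 8:
        return None  # too short to be a native-format entry
    ts, _elapsed, client, status, _size, _method, url, user = fields[:8]
    try:
        when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    except ValueError:
        return None  # first field is not a UNIX timestamp
    # CONNECT requests log "host:port" with no scheme; prefix "//" so that
    # urlsplit treats the value as a network location.
    target = url if "://" in url else "//" + url
    domain = urlsplit(target).hostname or ""
    return LogRecord(when, client, status, url, domain, user)
```

A record whose `domain` is empty could either be dropped or be stored under a placeholder domain; which is preferable depends on how the database loader treats unknown domains.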

B. Database Loader

This module indexes the information obtained from the log parser into the database. It has four main components. The first component keeps track of user information such as the user's id and IP address. The second component keeps track of the domain information. In this study, we record only the domain name instead of the entire URL. The third component keeps track of the access time. The information is stored by the four quarters of a day (footnote 1). Mainly, this component keeps track of the time that a user spends on the Internet. However, a proxy server logs only the time at which a request was made, not the time spent on the Internet. In this study we use the following formula to compute the access time:

\[
t_{total} =
\begin{cases}
t_{total} + \Theta_{cost} & \text{if } t_{cur} - t_{last} > \Theta_{limit} \\
t_{total} + (t_{cur} - t_{last}) & \text{otherwise}
\end{cases}
\]

where \Theta_{limit} is the maximum allowed time difference between two consecutive accesses, \Theta_{cost} is a system-defined fixed value, t_{cur} is the current access time and t_{last} is the time of the user's previous access. If the time difference is above the threshold \Theta_{limit} (footnote 2), only \Theta_{cost} (footnote 3) is counted against the given user. After every estimation of the access time, t_{last} is updated with t_{cur}, and the estimation is repeated for all four quarters. The initial value of t_{total} is set to zero. The last component stores the relationship between the above three components. All these components are implemented so that they satisfy the integrity constraints and conform to Boyce-Codd normal form (BCNF) [4].

C. Data Analyser

The data analyser is the module that interacts with the user (i.e., the network administrator). It provides the facility to view the statistics in the form of a graph. The analysis is done in two forms, offline and online. In offline analysis, the log data is collected and loaded into the system offline. In online analysis, log information is collected at run time: whenever a new request arrives at the proxy server, it is automatically inserted into the database.
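The access-time rule used by the database loader (Section III-B) can be sketched as follows. This is an illustrative reading of the formula and not the authors' code. The function names are hypothetical, the threshold values in the example are placeholders chosen for illustration, and the sketch assumes that a user's first request in a quarter contributes no time because it has no predecessor.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def access_time(requests, theta_limit, theta_cost):
    """Apply the update rule to one user's request timestamps."""
    total = timedelta(0)            # t_total starts at zero
    t_last = None
    for t_cur in sorted(requests):
        if t_last is not None:      # assumption: the first request adds nothing
            gap = t_cur - t_last
            # A long gap is charged the fixed cost, a short gap its real length.
            total += theta_cost if gap > theta_limit else gap
        t_last = t_cur              # t_last is updated after every estimate
    return total


def access_time_by_quarter(requests, theta_limit, theta_cost):
    """Group requests by (date, quarter) and apply the rule per group.

    Quarters are numbered 0 to 3 and each covers six hours of the day.
    """
    buckets = defaultdict(list)
    for ts in requests:
        buckets[(ts.date(), ts.hour // 6)].append(ts)
    return {key: access_time(times, theta_limit, theta_cost)
            for key, times in buckets.items()}


# Illustrative call with placeholder thresholds (not the paper's settings).
example = [datetime(2008, 11, 1, 18, 0), datetime(2008, 11, 1, 18, 2)]
print(access_time_by_quarter(example, timedelta(minutes=6), timedelta(minutes=1)))
```

Resetting the estimate at each quarter boundary is one way to read "estimation is repeated for all four quarters"; an implementation could instead carry t_last across the boundary.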
Since online analysis may slow down the proxy server, it is advisable to deploy it only in networks with low traffic.

IV. ACTIVITY ANALYSIS

In this section we analyse Internet usage patterns using the above tool. The analysis is made completely user-centric so that the administrator can keep track of users' activities. Using the tool, one can obtain an estimate of the activity over the various websites accessed by a user. However, we focus more on social-community websites, which are the main focus of this paper. The analysis presented in this section is broadly divided into three parts. First, we analyse the access pattern in terms of the amount of time that a user spends on the Internet, using the above tool. Second, we investigate the traffic pattern in terms of the number of URLs requested by the users. Third, we propose a few access control mechanisms that restrict users' access without compromising the quality of service, and investigate their effects on traffic sharing.

Footnote 1: 00:00 to 5:59, 6:00 to 11:59, 12:00 to 17:59 and 18:00 to 23:59.
Footnote 2: If the difference in time of access is very large, it is likely that the user has not been sitting at the computer reading the articles.
Footnote 3: In this study, Θ_limit is set to 6min and Θ_cost is set to 8min.

TABLE I. CHARACTERISTICS OF PROXY LOG
Log duration      5 months (Oct. 2008 to Feb. 2009)
No. of users      4128
No. of requests   about 73 million

Fig. 2. Average access time viewing the URL matching orkut for the set of students matching the regexp CS4B[0-9][0-9] for the month of November 2008. (Plot title: Stats for CS4b[0-9][0-9]; x-axis: date of access; series: Q0, Q1, Q2, Q3.)

A. Dataset

We first describe the characteristics of the proxy log used.
For this study, we have used a large proxy log collected from the IITM (footnote 4) proxy server (running Squid [5]) over 5 months. The IITM network has two proxy servers, Acad-Proxy and Hostel-Proxy. The characteristics of the log files are shown in Table I.

Footnote 4: http://www.iitm.ac.in

B. Average Access Analysis

Various forms of analysis can be carried out using the tool discussed in Section III. From an academic perspective, however, the reported analysis focuses on the following issues: (i) individual statistics: analysis over an individual website accessed by the user, (ii) relative statistics: analysis over collective users for collective sites, and (iii) general statistics: analysis of the access pattern by category.

1) Individual Statistics: The proposed log analyser provides a facility to explore the statistics of a user's access time for a particular Web site. In Figure 2, we plot the average access time of a batch of users viewing the URLs matching the keyword orkut. Each batch is identified by a regular expression such as [A-Z][A-Z][0-9][0-9][A-Z][0-9][0-9][0-9]. The plot compares the average access time in different quarters (footnote 5) of a day over a period of time. This experiment has been conducted over different batches of users such as CS[0-9]B[0-9][0-9], CS[0-9]M[0-9][0-9], CS[0-9]S[0-9][0-9] and CS[0-9]D[0-9][0-9]. In Table II, we summarise the average access time for each batch for the URLs matching orkut. It also shows that for the btech and dual degree students (footnote 6), the average access value is close to the maximum access value. The average access values across the four degrees of students can be ordered as btech+dual > ms > phd > mtech.

Footnote 5: Q0: first quarter, Q1: second quarter, Q2: third quarter, Q3: fourth quarter.

TABLE II. DEGREE-WISE ACTIVITY FOR orkut DURING OCTOBER 2008

BTECH: CS[0-9]B[0-9][0-9][0-9] and MTECH: CS[0-9]M[0-9][0-9][0-9]
Quarter | BTECH MaxAccess(date) | BTECH AvgAccess | MTECH MaxAccess(date) | MTECH AvgAccess
Q0 | :52:(28-1-2) | :13:48 | :3:39(28-1-4) | ::14
Q1 | :38:55(28-1-26) | :16:18 | :58:19(28-1-9) | :4:18
Q2 | :58:57(28-1-11) | :29:28 | 1:24:26(28-1-26) | :6:44
Q3 | 1:6:22(28-1-1) | :45:38 | :34:11(28-1-26) | :4:28

MS: CS[0-9]S[0-9][0-9][0-9] and Ph.D.: CS[0-9]D[0-9][0-9][0-9]
Quarter | MS MaxAccess(date) | MS AvgAccess | Ph.D. MaxAccess(date) | Ph.D. AvgAccess
Q0 | :47:39(28-1-2) | :15:2 | :44:41(28-1-21) | :4:11
Q1 | :46:35(28-1-27) | :23:54 | 1:25:15(28-1-19) | :15:16
Q2 | 1:5:3(28-1-8) | :37:44 | :53:7(28-1-18) | :26:4
Q3 | 1:1:9(28-1-12) | :3:45 | :53:52(28-1-2) | :19:35

Fig. 3. Statistics for the set of students matching the regular expression CS[0-9]S[0-9][0-9][0-9]. To hide the identity of the users we have omitted the last three characters. The data is collected from Hostel-Proxy. (x-axis: user names matching CS[0-9]S[0-9][0-9][0-9]; series: Q0, Q1, Q2, Q3.)

Fig. 4. Statistics of a user over four domains for the month of December. The data is collected from Hostel-Proxy. (Plot title: Stats for CS4bXX; x-axis: date of access; series: chat, social-community, browse, academics.)

2) Relative Statistics: In the relative analysis, we can compare the average access time across different users.
In Figure 3, we explore the access time for the users matching the regular expression CS[0-9]S[0-9][0-9][0-9] for the URLs matching orkut, facebook and iitm. This information can be used to analyse access patterns within and across batches of users. We further perform various experiments across different batches of students. One interesting observation is that senior students are likely to have higher Internet activity than their juniors.

Footnote 6: User IDs matching the regular expression CS[0-9]B[0-9][0-9].

3) General Statistics: In the general statistical analysis, we explore the average access time of a user across different sites. To aggregate the results, we group the statistics by the category of the URLs that the user explored. The tool explores only four categories, namely chat, social-community, academics and browse. To simplify the classification task, the tool considers only a certain number of URLs for each category. The sites in each category are manually selected from a list of popular sites. Figure 4 shows the distribution of the access pattern of a user over the four classes for quarter Q3. Similar experiments are further conducted for all the quarters. It is observed that in every quarter the activity follows the order browse > social-community > chat > academics.

TABLE III. NUMBER OF CONNECTIONS TO DIFFERENT WEB SITES DURING FOUR QUARTERS
Column order: Qi, Access, then #con for IITM, GOOGLE, FACEBOOK and ORKUT (rows as extracted).
Q0 5 3 224 6
Q0 4 21 7 4
Q0 3 415 15 2
Q0 2 1 33 4 152
Q0 1 147 776 194 1438
Q1 5 5 34 19 5
Q1 4 2 462 42 4
Q1 3 22 687 65 97
Q1 2 164 2432 18 628
Q1 1 1472 6853 255 6183
Q2 5 4 941 8 28
Q2 4 11 287 3 94
Q2 3 73 314 96 438
Q2 2 457 9936 192 241
Q2 1 2846 13421 477 16856
Q3 5 9 1418 12 65
Q3 4 6 2846 53 176
Q3 3 65 5285 198 74
Q3 2 452 13357 197 4139
Q3 1 3345 56329 542 249
The periods 5-11-2008 to 8-11-2008 and 19-11-2008 to 27-11-2008 have zero activity in all the quarters, which suggests that the user is on vacation, and from 28-11-2008 onwards the activity again rises to high values in all the quarters. The access values are as high as 5:59:44 for browse, 5:47:58 for social-community, 5:48:49 for chat and finally 3:3:51 for academics in the Q3 quarter of the day (all the extreme values are recorded between 28-12-2008 and 31-12-2008). Table III shows the number of connections to the server and the time spent on four different websites. Of all the websites, google shows an alarming result of about 56329 connections with access time of more than 1Hr in Q3, followed by orkut with 249 connections in Q3 with access time of more than 1Hr.

C. Traffic Analysis through URL Count

From the above analysis, it is clear that the distribution of average access time across the users is not uniform. There are students who spend most of their working hours browsing the Internet. It is important for an administrator to monitor traffic at different instants in time and to identify the users who are causing the most traffic. In this section we further investigate the number of URLs requested by the users over a period of time. The URL count does not give the actual network bandwidth consumed by the users. To estimate the actual bandwidth, we would also need information such as the data transfer rate and the size of the documents downloaded [6], [7]. Such information is not available in our dataset. However, the number of URLs reflects the activity of a user on the Internet. Therefore, we can use the URL count as a measure to approximate the traffic. The simple hypothesis is that the larger the number of URLs requested, the higher the Internet traffic caused (footnote 7). Such analysis helps the administrator to perform Internet traffic shaping based on (i) the type of URLs that the user visited, (ii) the usage of restricted URLs, and (iii) restricting certain users from accessing certain URLs at times when the connection is slow or traffic is heavy. The analysis reported in this section is done independently of the tool discussed in Section III.

Fig. 5. Average number of URLs generated per day from a user. (x-axis: users; y-axis: no. of URLs accessed.)

1) Users Classification: Figure 5 shows the distribution of the average number of URLs requested by the users per day. It clearly shows that the majority of the users have a small average number of URLs accessed per day and a few users have an extremely large number of requests. Based on the average number of requests generated by the users, we further classify the users into three groups: lower, middle and higher band users. The bounds of each band are defined using the following expressions:

\[ l = \mu - c\,\sigma \qquad (1) \]

and

\[ u = \mu + c\,\sigma \qquad (2) \]

where \mu is the average number of URLs generated by a user in a day, \sigma is the standard deviation and c is a constant in [0,1]. Users with an average count less than l are placed in the lower band, those between l and u in the middle band, and those greater than u in the higher band. Table IV shows the size of each band. It clearly shows that the number of users in the higher band is much smaller than in the other classes.

TABLE IV. SIZE OF EACH USER'S CLASS FOR THE MONTH OF FEB. 2009
lower band    middle band    higher band
862           2684           32

Fig. 6. Percentage of traffic generated by each band. (Plot title: Traffic Distribution; x-axis: user's type, lower, middle, upper; y-axis: % of traffic.)

In Figure 6, we investigate the percentage of the traffic (i.e., URL counts) generated by the users in each band. It clearly shows that although the number of users in the higher band is small, this band generates a considerable amount of traffic, almost 45%. In Figure 7, we show the percentage of the traffic generated by the users at different time intervals of a day. It clearly shows that the distribution is not uniform. The traffic during the fourth quarter is higher than in the other quarters. It is also important to investigate the traffic caused by certain Web sites. In Figure 8, we plot the percentage of URLs containing popular keywords. The keywords are selected using term frequency (footnote 8). It is clearly observed that a significant portion of the traffic is caused by URLs containing keywords such as orkut, facebook, talk and youtube, whose use is discouraged in many organisations.

Entropy is a popular measure used to analyse the distribution of the symbols of a random variable. If the entropy is high, the distribution is close to uniform, and if the entropy is small, the distribution is skewed. We also apply entropy to study the distribution of the URLs visited by the users. A small entropy indicates that the user confines his/her activities mostly to a few domains.
Such a study can also support user-specific traffic shaping, to decide whether a particular user should be granted more access to certain sites or restricted from accessing them. We use a normalised entropy to study this distribution, defined as follows:

\[ H = -\frac{\sum_{u \in U} P(u)\,\log P(u)}{\log |U|} \qquad (3) \]

where U is the list of URLs accessed by a user. In Figure 9, we show the distribution of the entropy using the distribution of the most popular 5 URLs. It clearly shows that there are users who mostly confine their activities to only very few URLs.

Footnote 7: However, this hypothesis does not reflect the actual Internet bandwidth.
Footnote 8: Number of domains containing a term.
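The band bounds of Equations (1) and (2) and the normalised entropy of Equation (3) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the population standard deviation is used because the paper does not say which estimator it applies, |U| is taken to be the number of distinct URLs a user requested, and a user with a single distinct URL is assigned entropy zero to avoid dividing by log 1.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def band_bounds(avg_counts, c):
    """Return (l, u) = (mu - c*sigma, mu + c*sigma) over per-user daily averages."""
    mu = mean(avg_counts)
    sigma = pstdev(avg_counts)   # assumption: population standard deviation
    return mu - c * sigma, mu + c * sigma


def classify(avg_count, lower, upper):
    """Place one user in the lower, middle or higher band."""
    if avg_count < lower:
        return "lower"
    if avg_count > upper:
        return "higher"
    return "middle"


def normalised_entropy(urls):
    """Entropy of a user's URL distribution divided by log of the distinct count."""
    counts = Counter(urls)
    distinct = len(counts)
    if distinct <= 1:
        return 0.0               # assumption: one URL (or none) means no spread
    total = sum(counts.values())
    h = -sum((k / total) * math.log(k / total) for k in counts.values())
    return h / math.log(distinct)
```

With this normalisation the value lies between 0 and 1, so users with very different numbers of requests can be compared on the same scale, as in the entropy ranges of Figure 9.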

Fig. 7. Traffic generated in an hour of the day. (Plot title: Distribution of the URLs accessed in a day; x-axis: hours in a day; y-axis: % of URLs.)

Fig. 8. Traffic consumed by the URLs containing a keyword. (y-axis: % of URLs; keywords shown: stoptazmo, sendspace, ibnlive, google-analytics, yieldmanager, googleads, iitm, clients, neroscout, facebook, indiatimes, doubleclick, voice, cricinfo, talkgadget, yahoo, orkut, megaupload, mail, google.)

Fig. 9. Entropy distribution of the users using the log files for the month of Feb. 2009. (x-axis: normalised entropy range, in steps of 0.1 up to 1.0; y-axis: number of users.)

Fig. 10. Comparing the traffic generated with or without access control mechanisms at different intervals in time. (Plot title: Distribution of URLs accessed in a day; x-axis: hours in a day; y-axis: number of URLs; series: without access control, restrict on limit, restrict on domain, dynamic control.)

D. Traffic Control

In the above discussion, we explored users' Internet access patterns. In this section, a few access control mechanisms are discussed to investigate the feasibility of controlling users' Internet activity through proxy servers. To analyse the effects of these mechanisms, we have run simulation programs over the original dataset. Figure 10 shows the distribution of the URLs at different time intervals of a day after applying the access control mechanisms. There are several ways to control users' Internet access. However, we have focused on the following.

1) First, we apply a restriction on the number of URLs requested by a user in a day (limited to the average number of URLs per day). Figure 10 clearly shows that the traffic reduces significantly (by almost 56%) after applying this control mechanism. However, this mechanism is not flexible. It may result in heavy traffic at one time of the day and no traffic at another.
2) Second, we apply a restriction on the domain names. We have ignored all the URLs containing the keywords megaupload, orkut, talkgadget, voice and facebook. The traffic reduces significantly (by almost 34%) after ignoring these URLs. The figure clearly indicates that users are likely to access these social networking sites during the evening and night. The reduction in the different quarters can be ordered as Q3 > Q2 > Q0 > Q1.
3) The above restrictions may sometimes deny access to important information. A better control mechanism is to restrict dynamically based on the traffic load. Lastly, therefore, we apply the above domain restriction only when the number of connections at a time is more than a limit (the average number of connections). This is shown in the figure by the transition from normal traffic to controlled traffic.

The control mechanisms are not limited to the above. However, we leave further analysis of traffic shaping using proxy logs as future work.

V. CONCLUSION

In this paper, we discuss a proxy log analyser. Using this tool, we analyse the amount of time that a user spends on the Internet. We further investigate the traffic generated by users by exploring the URL count. Lastly, we apply a few access control mechanisms and investigate their effects on Internet traffic.

REFERENCES

[1] Squid log analyzers. [Online]. Available: http://www.squid-cache.org/scripts/
[2] W. Lou and H. Lu, "Efficient prediction of web accesses on a proxy server," Conference on Information and Knowledge Management, 2002.
[3] Proxy server. [Online]. Available: http://en.wikipedia.org/wiki/proxy_server
[4] C. J. Date, An Introduction to Database Systems. Addison-Wesley, 1995.
[5] Squid proxy server. [Online]. Available: http://www.squid-cache.org/
[6] B. Zhou, D. He, and Z. Sun, "Traffic predictability based on ARIMA/GARCH model," 2006, pp. 207-215. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1678242
[7] D. M. Divakaran, H. A. Murthy, and T. A. Gonsalves, "Detection of SYN flooding attacks using linear prediction analysis," 14th IEEE International Conference on Networks, pp. 1-6, September 2006.