Use of Web Mining in Network Security

Rimmy Chuchra 1, Bharti Mehta 2, Sumandeep Kaur 3
1 Asst. Prof. (CSE), Sri Sai Institute of Engg. and Technology, Mannawala Campus (Amritsar)
2,3 M.Tech (CSE), Yadwindra College of Engg., Talwandi Sabo

Abstract--Web mining is knowledge discovery from the World Wide Web (WWW). This practical application of data mining integrates the data gathered by traditional data mining techniques with the data gathered from the WWW. The term web mining is used in three distinct ways: web content mining, web usage mining, and web structure mining. In this research paper we use web structure mining, the process of using graph theory to analyze the node and connection structure of a web site. It extracts patterns from the hyperlinks of the web, where the function of a hyperlink is to connect a web page with another location in the same or a different web page. Each hyperlink therefore behaves as a structural component. We merge web mining with network security so that online attacks on the network can be detected by web agents (i.e., web robots) rather than by manual effort. The major objective is to reduce the cost and the time needed to identify an online attack. We use the rule induction data mining technique to maximize the accuracy of the results. The special focus is on detecting online active attacks with web agents, which then provide security by using various mechanisms and techniques. In this way the web agents help to protect users from attackers during online data transfer, in line with the concept of network security.
The first task of the web agents is to identify the type of active attack, and then to provide several ways of ensuring security. In this way we use a hybrid approach (i.e., web mining with network security). The major benefit of this hybrid approach is the saving of time and cost, which are major objectives of data mining.

Keywords--Rule Induction, Web mining, Electronic reconnaissance attack, Web agents (web robots), active attacks, denial of service.

I. INTRODUCTION

Web mining helps to extract useful information from web pages. Various web mining techniques are used to extract knowledge from web data, web documents, and the hyperlinks between documents. The web is a universal information space that can be accessed by companies, universities, business people, and others. It holds numerous sources of information, both internal and external. Internal sources include the private information of an organization; external sources include information about clients, vendors, suppliers, the intranet, the extranet, and so on. The major significance of web mining is that it improves the efficiency and effectiveness of decision making. In this research paper we divide web mining into three categories:

a) Web structure mining.
b) Web usage mining.
c) Web content mining.

Web Structure Mining: It models the web as a graph with web pages as nodes and hyperlinks as edges connecting related pages. It describes the structural layout of the web and uses the connectivity among web sites, that is, the hyperlinks. Hyperlinks are divided into two categories:

Internal hyperlinks, which lead to pages within the same web site.
External hyperlinks, which lead to pages on other web sites.

Document structure is described by a schema language for XML, which specifies what a valid XML document looks like.
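The distinction between internal and external hyperlinks can be illustrated with a minimal Python sketch. It is only an illustration, not the method used in this paper: the page URL, the HTML fragment, and the names `LinkCollector` and `classify_links` are hypothetical, and a link is treated as internal when it resolves to the same host as the page that contains it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Split the hyperlinks of one page into internal and external edges."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    internal, external = [], []
    for href in collector.hrefs:
        target = urljoin(page_url, href)  # resolve relative links
        if urlparse(target).netloc == site:
            internal.append(target)
        else:
            external.append(target)
    return internal, external


# Hypothetical page, used only for illustration.
page_url = "http://example.org/index.html"
html = """
<a href="/about.html">About</a>
<a href="#top">Top</a>
<a href="http://example.com/partner">Partner</a>
"""
internal, external = classify_links(page_url, html)

# Adjacency list of the web graph: page (node) -> linked pages (edges).
graph = {page_url: internal + external}
print(internal)
print(external)
```

Repeating this step for every crawled page would yield the node and edge structure that web structure mining analyzes.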
Web Usage Mining: It discovers knowledge from the behaviour of users navigating through web sites. It relies on a repository of the records of their requests, which is kept in log files. The data is divided into two categories:

Application server data: it holds the business transactions, which are recorded in the application server log.
Web server data: these logs are made by the web server. They include fields such as the IP address, the web pages accessed, and the access times.

Web Content Mining: It discovers knowledge from the contents of web pages, such as images and videos. Intelligent agents help to solve the indexing problem in search engines, which would otherwise deliver imprecise results because of information overload. They also help to select more relevant documents.
The major effort of web content mining is to organize semi-structured web data into a structured collection of resources that yields effective results. It uses various approaches, such as the agent-based approach and the database approach.

Figure 1: Classification of Web Mining

Network security measures are needed to protect data during transmission. Organizations interconnect their data processing equipment through a collection of interconnected networks. Such a collection is often referred to as an internet, hence the term Internet security. Our major objective is to protect data from attackers during online data transfer. Attacks are of two types: active and passive. Active attacks are further categorized as replay, masquerade, modification of messages, and denial of service. The categories of passive attack are traffic analysis and release of message contents. In this research paper we discuss active attacks that are detected by web agents (i.e., web robots). An attacker can easily gain entry when a user clicks on an attractive hyperlink.

Here we discuss electronic reconnaissance attacks. To identify which systems and resources are on a network, an attacker must perform an electronic reconnaissance attack (ERA). In some cases the attacker already holds complete information about the target network and can then easily find the location of the resources of the organization. Once an IP (Internet Protocol) address is known, the attacker can start scanning and probing the network. Scanning is performed with a ping sweep utility, which pings a range of IP addresses. Its purpose is to find out which hosts are currently live on the network. The function of probing is to gather additional information, such as the operating system or the applications running on those hosts.
Probing is also used to discover information about the hosts on the network. It is accomplished by looking for open ports on the available host computers. When a port is open, an attacker can find out which services are running on the computer, and can then use that information to discover the operating system and the application service running on the port.

Web agents (i.e., web robots) can identify an attack by looking for symptoms such as the unavailability of a particular web site, the inability to access any web site, unusually slow network performance, or a dramatic increase in the amount of spam received in an account.

In this research paper we merge two broad areas, network security and web mining. Using web mining, web agents discover knowledge about the attacker from the World Wide Web (WWW) during online data transfer. The major benefit of this combined approach is the saving of time and cost: when web agents detect the attacker, no human effort is needed. The web agents first identify the type of attack and then provide security by using various mechanisms and techniques.

II. OUR CONTRIBUTION

In this research paper we propose a hybrid approach, web mining with network security. Using web mining, we can discover information from the World Wide Web for identifying active attacks such as masquerade and replay. Once an attack is identified, the web agents are called, and they handle such active attacks in online mode.
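The idea that a web agent is called once attack symptoms are observed can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than an implementation of the proposed system: the `Observation` fields, the two threshold constants, and the `call_web_agent` placeholder are all hypothetical, and a real deployment would have to calibrate the thresholds against its own traffic.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One monitoring sample; every field here is an assumed measurement."""
    site_reachable: bool
    response_time_ms: float
    spam_per_hour: int


# Hypothetical thresholds, chosen only for this example.
SLOW_RESPONSE_MS = 2000.0
SPAM_SPIKE_PER_HOUR = 50


def symptoms(obs):
    """Return the symptoms named in the text that this sample shows."""
    found = []
    if not obs.site_reachable:
        found.append("website unavailable")
    if obs.site_reachable and obs.response_time_ms > SLOW_RESPONSE_MS:
        found.append("unusually slow network performance")
    if obs.spam_per_hour > SPAM_SPIKE_PER_HOUR:
        found.append("dramatic increase in spam")
    return found


def call_web_agent(found):
    """Placeholder for the web agent; here it only reports what was seen."""
    return "web agent called: " + ", ".join(found)


def monitor(obs):
    """Once an attack is identified (any symptom present), call the web agent."""
    found = symptoms(obs)
    if found:
        return call_web_agent(found)
    return "no action"


print(monitor(Observation(True, 120.0, 3)))
print(monitor(Observation(False, 0.0, 0)))
print(monitor(Observation(True, 5000.0, 80)))
```

The sketch treats any single symptom as sufficient to call the agent. A stricter policy, for example requiring two symptoms at once, would only change the condition in `monitor`.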
The proposed concept can also be used in e-commerce applications. In the banking sector, for example, web agents can find an attacker during an online money transfer by using suitable methods and techniques. Here we use the rule induction technique of data mining, whose syntax is:

IF Condition THEN Class

i.e., IF Attack Status = Enable THEN Call = Web Agents.

The major purpose of using the rule induction technique is to achieve maximum accuracy and thereby better results. The technique can be implemented as follows.

Table 1: Naming conventions used in the rule induction method.

Symbol   Meaning
WA       Web Agent
AA       Active Attack
OnM      Online Mode
S        Status
E        Enable (value is 1)
D        Disable (value is 0)
R        Rule

For each class WA
    Initialize AA to the set of all examples.
    While AA contains examples in class WA
        Create a rule R with an empty L.H.S. that predicts class WA.
        Until R is 100% accurate (or there is no more status to use) do:
            For each status S not in R and each mode M (online mode, OnM),
                consider adding the condition (Status_Mode pair) S = M to the L.H.S. of R.
            Select the S and M (in which the status of the attack is Disable) that maximize the accuracy of the rule and the coverage of the Status_Mode pair.
            Add Status = Mode to R.
        Remove the examples covered by R from AA.

There is only one possible case, consisting of the following Status_Mode pairs:

Case 1: Status = Enable, Mode = Online.
        Status = Disable, Mode = Online.

Description: When the status is Enable and the mode is Online, data is to be transferred from the source to the destination. When the status is Disable and the mode is again Online, no data is transferred between the source and the destination.

[Figure: Research Design]

III. CONCLUSIONS

In this research paper we have discussed a hybrid approach that merges two separate broad areas, data mining and network security. It also shows how web mining can be used to provide security on the network, in online mode only.
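The rule induction procedure of Section II can be made concrete with a short Python sketch. It is one possible reading of that pseudocode as a sequential covering learner, not a definitive implementation: the function names, the attribute names `status` and `mode`, the class labels, and the three toy examples are assumptions introduced here for illustration.

```python
def covers(rule, example):
    """A rule is a dict of attribute -> required value (its L.H.S.)."""
    return all(example[a] == v for a, v in rule.items())


def accuracy(rule, examples, target, cls):
    """Return (fraction of covered examples in cls, number of covered examples in cls)."""
    covered = [e for e in examples if covers(rule, e)]
    if not covered:
        return 0.0, 0
    correct = sum(1 for e in covered if e[target] == cls)
    return correct / len(covered), correct


def induce_rules(examples, attributes, target, cls):
    """Learn rules predicting cls until every example of cls is covered."""
    remaining = list(examples)
    rules = []
    while any(e[target] == cls for e in remaining):
        rule = {}  # empty L.H.S.
        while True:
            acc, _ = accuracy(rule, remaining, target, cls)
            unused = [a for a in attributes if a not in rule]
            if acc == 1.0 or not unused:
                break
            # Candidate conditions come from the examples the rule still covers.
            candidates = {(a, e[a]) for a in unused
                          for e in remaining if covers(rule, e)}
            # Prefer the highest accuracy; break ties by larger coverage of cls.
            best = max(sorted(candidates),
                       key=lambda av: accuracy({**rule, av[0]: av[1]},
                                               remaining, target, cls))
            rule[best[0]] = best[1]
        rules.append(rule)
        # Remove the examples covered by the finished rule.
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules


# Toy examples, invented for illustration only.
examples = [
    {"status": "Enable", "mode": "Online", "call": "WebAgent"},
    {"status": "Enable", "mode": "Online", "call": "WebAgent"},
    {"status": "Disable", "mode": "Online", "call": "NoAction"},
]
rules = induce_rules(examples, ["status", "mode"], "call", "WebAgent")
print(rules)
```

On this toy data the learner returns a single rule whose only condition is `status = Enable`, which corresponds to the rule form IF Attack Status = Enable THEN Call = Web Agents. The mode attribute is never selected because it takes the same value in every example and so cannot improve the accuracy of a rule.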
The concept has many practical applications in the real world, such as securing the private data of an organization that can only be transferred in online mode, in any domain such as finance, marketing, HR, or economics. End-user data is then transferred from the source to the destination in secure mode, with less time and cost. The cost saving arises because the web agents (web robots that find out the type of active attack entering the network or the system) handle the attacker themselves, so no manual effort is needed to identify the type of attack.
In this manner the manpower requirement is reduced and nobody has to be paid for this task, which saves cost. Ultimately, the objective of data mining is also achieved.

IV. FUTURE SCOPE

In future, this work will be extended by implementing the concept with the help of an OLAP (on-line analytical processing) tool. We can also look for mechanisms or techniques to identify the passive attacks that occur on the web. For example, when a user wants to visit a web page, he or she must first sign up for that page by submitting a username and a password, and an attacker may later try to break that password. We therefore have to design mechanisms to handle such passive attacks. In particular, two cases of passive attack, traffic analysis and release of message contents, will be discussed, and their handling will also be done by web agents.

Acknowledgement

Special thanks to Mylord, and to the many people who deserve thanks for this paper, including Mr. Lovish Chuchra. This paper would not exist but for their faith in me, and I offer them my heartfelt thanks.
Author Biography

Rimmy Chuchra received the Bachelor of Technology in Computer Science & Engineering from Malout Institute of Management and Information Technology, Malout, India, in 2010, and the Master of Technology in Computer Science & Engineering from Lovely Professional University, Jalandhar, India, in 2012. She is currently an Assistant Professor at the Department of Computer Science in Sri Sai University, Palampur (HP), India. Her main research interests are data mining, information security, cloud computing, and network security.