Intelligence Techniques for e-government applications

HANAA M. SAID 1, MOHAMED HAMDY 2, RANIA EL GOHARY 3 and ABDEL-BADEEH M. SALEM 4

1 Faculty of Computing & Information Science, Ain Shams University, Abbassia, Cairo, Egypt
2 Faculty of Computing & Information Science, Ain Shams University, Abbassia, Cairo, Egypt
3 Faculty of Computing & Information Science, Ain Shams University, Abbassia, Cairo, Egypt
4 Faculty of Computing & Information Science, Ain Shams University, Abbassia, Cairo, Egypt

Abstract

This paper introduces intelligent security strategy approaches for e-government. The successful implementation of e-government depends on viable security, which is considered one of the crucial factors for reaching an advanced stage of e-government. In this research we focus on several techniques, algorithms, approaches and application areas of data mining models in cyber security, viewed from different perspectives, in order to establish a classification and comparison of various types of intrusion detection and countermeasures in e-government. The resulting categorization of intelligent technique approaches reflects the important criteria of the data mining models. The paper summarizes various intelligent data analyses and presents an intelligent data analysis of the Cairo Cleaning and Beautification Agency; establishing such a classification helps guide data mining applications towards better operation and performance. We also describe how data mining can help in the detection and prevention of attacks, and discuss information security violations, such as access control violations, along with various threats. Finally, we present a comparative analysis of selected models to improve security.

Keywords: E-government, Cyber security models, Intrusion detection (ID), Penetration testing, Neural networks, Fuzzy logic, Genetic algorithm

1. INTRODUCTION

The field of Artificial Intelligence has found many applications in the operation of power systems. These applications range from expert systems that assist with network fault diagnosis and rectification to artificial neural networks and fuzzy logic that provide models for complex non-linear control problems. Intrusion detection (ID) has become a critical component of network administration because of the vast number of attacks that persistently threaten our computers. Traditional intrusion detection systems are limited and do not provide a complete solution to the problem. Security is an important issue for the future of cyberspace: malicious data on the internet, and security systems that must handle real-time data, lead to high-dimensional problems, so data pre-processing is necessary. Attacks against computer infrastructures are becoming an increasingly serious problem. Hacking is the act of breaking into another system with or without the owner's knowledge. Intruders have improved their skills and invented innovative tools that support various types of network attacks. Hence, effective methods for intrusion detection have become a pressing need to protect our computers from intruders. In general, there are two types of intrusion detection systems (IDS): misuse detection systems and anomaly detection systems [1, 2, 3]. Over the past few years there has been a tremendous increase in cyber threats, driven by the penetration of new technologies into the global economy and the heavy dependence of the personal, business and governmental sectors on the Internet.
E-government can be defined as the use of information and communication technologies, and particularly the internet, as a tool to achieve better government (OECD, 2003). Electronic government constitutes the public administration that uses information technology in order to transform its internal and external relations (United Nations, 2008). Applying data mining (DM) techniques to network traffic data is a promising solution that helps in developing better intrusion detection systems. Data mining is defined as the identification of interesting structure in data, where structure designates patterns, statistical or predictive models of the data, and relationships among parts of the data (Fayyad & Uthurusamy, 2002) [4, 5]. We used different algorithms to extract the valuable data. Data mining is an important tool for extracting information from large quantities of data through pattern matching. It has many applications in security, including national security, the detection of terrorist activities and cyber security. However, the usefulness of such data is negligible if meaningful information or knowledge cannot be extracted from it. Data mining, otherwise known as knowledge discovery, attempts to answer this need. In contrast to standard statistical methods, data mining techniques search for interesting information without demanding a priori hypotheses. Typical tasks include finding links between data fields, using regression to predict future values of data, and modelling sequential patterns in the data that may indicate revealing trends (Tam and Kiang, 1992; Chu & Widjaja, 1994) [6].

Volume 4, Issue 2, March-April 2015, Page 6

Cyber security involves protecting information by preventing, detecting, and responding to attacks. Cyber security, also referred to as information technology security, focuses on protecting computers, networks, programs and data from unauthorized access, change or destruction. It is very difficult to assess the quality, and hence the acceptable level of security, of the real cyberspace available on the internet as a whole. This real cyberspace can instead be regarded as a series of minor cyberspaces. Deriving a reference measure in the form of a procedural assessment is important for improving knowledge and supporting decision making for e-government services. A series of standards is built on the application of data mining methods, specifically "Frequencies", "decision trees model", "Logistic regression", "association rules model", "Neural Networks Model", "Hierarchical Clustering" and "Bayesian network", for making reference measurements that quantify how well the data and the provided services are secured. A penetration test is an in-depth information risk analysis carried out to assess the security of systems from a hacker's perspective. Penetration testing and web application testing services simulate an attacker-like environment so that the exercise matches the hacker's thought process. Penetration testing can be carried out over both the Internet and the local area network, depending on the placement and operational usage of the system, for example as a web application penetration test (application discovery, data mining, cryptography, database listener and business logic testing) [13].
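Two of the data mining methods named above, "Frequencies" and the "association rules model", rest on simple counts over audit records. The following Python sketch illustrates the standard support and confidence measures used by association rules. The session data, the event names and the helper names are hypothetical and are not taken from the study.

```python
from itertools import combinations

# Hypothetical audit records: the set of event flags observed in each session.
SESSIONS = [
    {"failed_login", "port_scan", "admin_page"},
    {"failed_login", "admin_page"},
    {"port_scan"},
    {"failed_login", "port_scan", "admin_page"},
    {"normal_browse"},
]


def support(itemset, sessions):
    """Fraction of sessions that contain every item in itemset."""
    hits = sum(1 for s in sessions if itemset <= s)
    return hits / len(sessions)


def confidence(antecedent, consequent, sessions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, sessions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, sessions) / base


def frequent_pairs(sessions, min_support):
    """All pairs of events whose support reaches min_support."""
    items = sorted(set().union(*sessions))
    result = {}
    for pair in combinations(items, 2):
        s = support(set(pair), sessions)
        if s >= min_support:
            result[pair] = s
    return result
```

With this sample data, the rule "failed_login implies admin_page" has confidence 1.0, because every session with a failed login also touched the admin page. A real detector would mine such rules from far larger audit logs and would treat them as indications to investigate, not as proof of an intrusion.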
For the reasons mentioned above, we formed an intelligent approach for securing the data. It consists of a penetration test that includes a data mining intrusion detector (DM-ID); the results of the intelligent approach and of penetration testing are used to find security defects and patch them before it is too late. This leads testers to adopt automatic tools widely, as demonstrated by the continuous release of platforms designed to automate the process: discovering gaps in compliance, finding defects before somebody else does, verifying secure configurations, testing new technology and reporting problems to management. Collaborative processes oriented to large data sets are presented in [14]. We also compare the effectiveness of the various techniques and algorithms reported in different studies, which helps decision makers choose between alternatives. This paper presents a number of applications of data mining methodologies in cyber security that have been developed and deployed to protect computer systems against network attacks. We discuss a variety of techniques, approaches and application areas of data mining models in cyber security from the e-government perspective, and describe how data mining helps in the detection and prevention of these attacks. Finally, results are reported for the site of the "Cairo Cleaning and Beautification Agency" of Cairo governorate in Egypt (www.ccba.gov.eg). It is one of the important cyberspaces within the mechanism of e-government services, it affects citizens, investors and the government, and it is linked to several other electronic sites. Combinations of different intelligent system approaches to form hybrid intelligent systems continue to find new applications.
Security must be addressed in the planning and design phases of an e-government system. A management process is needed to assess security controls; it allows departments and agencies to maintain and measure the extent of data security, based on a mechanism for revealing security weak points. The weak points are revealed by using a series of standards built on the application of machine learning methods, specifically the neural networks model and intelligent data analysis. All these techniques are useful in monitoring and measuring how well the data and the provided services are secured. Fuzzy set theory was introduced by Zadeh [25]. Fuzzy logic is a multi-valued logic that permits intermediate values to be defined between conventional ones such as true/false, low/high and good/bad. In classical set theory, an element either belongs to a set or does not. In fuzzy set theory, an element has a degree of membership, given by a membership function that takes values in the interval [0, 1]. This paper introduces intelligent approaches for securing data. These approaches are based on intrusion detection, analysis and monitoring, and form a penetration test that helps decision makers take the right decisions for facing threats and controlling system operations. The strategies of "Frequencies", "decision trees model", "Logistic regression", "association rules model", "Neural Networks Model", "Hierarchical Clustering" and "Bayesian network" will be used to form the data mining intrusion detector (DM-ID), which in turn will be used to form a penetration test that monitors, measures and tests the audit data and events. Each module works independently to detect intrusions in the network traffic data.
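The contrast between classical and fuzzy membership described above can be made concrete with a small Python sketch. The triangular shape, the variable (failed logins per hour) and all numeric bounds are assumptions chosen for illustration only; the paper does not specify its membership functions.

```python
def crisp_member(x, threshold):
    """Classical set: the element either belongs (1) or does not (0)."""
    return 1 if x >= threshold else 0


def triangular(x, low, peak, high):
    """Fuzzy set: degree of membership in [0, 1], assuming low < peak < high.

    The degree rises linearly from 0 at low to 1 at peak,
    then falls linearly back to 0 at high.
    """
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


# Hypothetical linguistic term: a "medium" rate of failed logins per hour.
def medium_failed_logins(x):
    return triangular(x, 0, 50, 100)
```

For 30 failed logins per hour, the crisp test against a threshold of 50 answers 0, while the fuzzy term "medium" returns a degree of 0.6. The fuzzy value keeps the information that the reading is fairly close to the middle of the range.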
This paper may be a useful tool that enables the governorate to identify the major points for managing effective government services: the type of data to be used, the type of data that has been transferred properly, the terms or requirements used in organizing the data, the arrangement of knowledge by priority and importance, and the compilation of processes according to the standards followed. The paper consists of four sections. The first section is the introduction, which also draws on the literature survey. The second section presents comparative intelligent technique approaches for e-government security, that is, for assessing the security of the cyberspace and securing the data when strategic information is provided for the different services rendered through the minor cyber service. Section 3 concentrates on the research and measurement methods that are used and suggested, explains how to use them, and discusses the different results. Finally, Section 4 summarizes the paper and outlines future work.

2. RELATED WORK

Data mining techniques have been successfully applied to various private sector industries in marketing, financial services, and health care. Governments are using data mining for improving service delivery, analyzing scientific information, managing human resources, detecting fraud, and detecting criminal and terrorist activities. However, literature is scarce regarding the application of data mining to a project-oriented environment. Generally, the purpose of this paper is to show how data mining concepts may be applied in a project-oriented environment. It will examine the so-called project success framework and show how data mining may be utilized at particular stages to increase the chances of delivering successful projects that have the intended impact on the corporate business strategies of private and public sector organizations. Data mining has evolved in a wide variety of directions, ranging from complexity control of algorithms to the development of applications for many domains, such as counter-terrorism, medical diagnosis and marketing (Antonie, Zaïane & Coman, 2001; Bach, 2003; Bank, Min Tjoa & Stolba, 2006; Bhattacharyya, 1999; Choenni, 2000; Wang & Han, 2000). The extraction of econometric models, however, has received relatively little attention in the field of data mining. An econometric model is a model that specifies the statistical relationship that is believed to hold between its variables. These models play a central role in many fields of research and are becoming increasingly important in forecasting tools.
For example, in finance, stock prices may be expressed in terms of other stock prices and macro-economic variables, such as industrial production and interest rates (Cheung & Ng, 1998; Nasseh & Strauss, 2000; Pesaran & Timmermann, 2000). Another example, within government forecasting, is the modelling of recorded crime, which may be expressed in terms of demographic and macro-economic variables, such as the number of young males and unemployment (Deadman, 2003; Greenberg, 2001; Hale & Sabbagh, 1991). Two common econometric models are the linear regression model and the cointegrated model.

Cyber security is not a single problem in e-government; rather, it is a group of very different problems involving different sets of threats. A fuzzy rule-based system for cyber security consists of a rule depository and a mechanism for accessing and running the rules. The depository is usually constructed from a collection of related rule sets. The aim of this study is to develop a fuzzy rule-based technical indicator for cyber security with the use of an expert system named FRBCES (Fuzzy Rule Based Cyber Expert System). Rule-based systems employ fuzzy rules to automate complex processes. Common cyber threats assumed by cyber experts are used as linguistic variables in this paper. Persistent computer security vulnerabilities may expose the government's critical infrastructure and network systems to cyber attack by terrorists, possibly affecting the economy or other areas of national security at large [12]. Furnell and Warren [13] discussed the problems posed by cyber terrorists and considered the nature of the responses necessary to protect the future security of society. With the rising threat of cyber attacks, some researchers have tried to describe cyber threats and to find solutions in their studies [14]-[17], as shown in Figure 1.
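The description above of a fuzzy rule-based system as a rule depository plus a mechanism for running the rules can be sketched in Python as follows. This is only an illustration under stated assumptions and not the FRBCES system itself: the linguistic variables, the ramp-shaped membership functions, the three rules and the min/max operators are all hypothetical choices.

```python
def ramp_up(x, start, end):
    """Membership rising linearly from 0 at start to 1 at end."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


# Hypothetical linguistic variables and terms, each mapped to a membership function.
MEMBERSHIP = {
    ("failed_logins", "high"): lambda v: ramp_up(v, 5, 20),
    ("failed_logins", "low"): lambda v: 1.0 - ramp_up(v, 5, 20),
    ("traffic_mbps", "high"): lambda v: ramp_up(v, 100, 500),
}

# Rule depository: each rule is (conditions joined by AND, threat label).
RULES = [
    ([("failed_logins", "high"), ("traffic_mbps", "high")], "severe"),
    ([("failed_logins", "high")], "elevated"),
    ([("failed_logins", "low")], "normal"),
]


def run_rules(reading):
    """Fire every rule: AND is taken as min, rules sharing a label are merged by max."""
    strengths = {}
    for conditions, label in RULES:
        degree = min(MEMBERSHIP[c](reading[c[0]]) for c in conditions)
        strengths[label] = max(strengths.get(label, 0.0), degree)
    return strengths
```

For a reading of 20 failed logins and 300 Mbps of traffic, `run_rules` returns a strength of 0.5 for "severe", 1.0 for "elevated" and 0.0 for "normal". Keeping the rules as data separates the depository from the inference mechanism, so an administrator could add or adjust rules without touching the code that runs them.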
So far, many studies have been carried out on cyber security, but they mostly focus on the prevention of cyber intrusion [18]-[21], on the effects of cyber attacks, or on different machine learning applications [5], [6], [8]-[10]. Although some studies use fuzzy rules [22]-[24], the effectiveness of fuzzy expert systems is an entirely different analysis. In this paper, apart from the existing literature, a new approach has been developed to prevent cyber attacks using a fuzzy expert system. The proposed fuzzy expert system gives system administrators valuable information for improving cyber security. This work contributes to the system in a general manner and can be adapted to different cyber security scenarios.

Figure 1: E-government application

Table 1: Distribution of articles according to data mining and its applications in e-government


Hong Yu et al. [17] performed a comparative study of data mining for individual credit risk evaluation. Credit risk is the risk of loss when a debtor does not fulfil his debt contract, and it is of natural interest to practitioners in banks as well as to organizers. Ji Dan et al. [18] proposed a synthesized data mining algorithm based on clustering and decision trees. Abundant agricultural information has been accumulated for the vast territory and diversity of crop resources, but only a small quantity of this data can be accessed for lack of useful tools. Mohamed El far et al. [19] compared the data mining algorithms Close+, Apriori and CHARM with the K-means classification algorithm and applied them to 3D object indexing. Three-dimensional models are increasingly used in applications that need to visualize realistic objects (CAD, medical simulations, games, virtual reality, etc.). Wangjie Sun et al. [20] implemented an advanced design of data mining algorithms. In order to save computer data effectively, it is necessary not only to check the integrity of the data but also to check the storage system, so that data can be recovered in a timely manner, losses are kept to a minimum, and recovery does not fail when a fault occurs. S. P. Latha [20] presents an algorithm for efficient data mining. Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. Data mining algorithms are used extensively to analyze business, commerce, scientific, engineering, and security data, and they dramatically improve the effectiveness of applications in areas such as marketing, predictive modeling, life sciences, information retrieval, and engineering. In April 2007, Estonia suffered a major cyber-attack, after which it contributed to securing cyberspace worldwide.
According to Jaak Aaviksoo, Minister of Education and Research of Estonia, they analyzed the weak points in their infrastructure [58]. They concluded that law enforcement and national borders do not hold in cyberspace [58], that most of the infrastructure is not under a single body, and that 80% of the web infrastructure is in private hands [58]. In 2008, Estonia formulated a National Cyber Security Strategy. Its objective is to ensure cyber security and to help the private sector develop highly secure standards [21]. In Malaysian primary schools, cyber bullying and hacking are the most frequently occurring crimes [66]. An Adaptive Information Security Model was developed to lessen the gap between what we can do with ICT and what we can control [36]. Five critical systems ensure a highly secure and prosperous network [36]. Forty-one internet crimes have been analyzed [36]. The analysis shows that the victims were missing these five security tests [36]. A penetration test on an internet service provider was conducted in Sweden [37]. In Burma, the internet was shut down just before the country's first national elections in twenty years [31]. Offenders usually commit crimes from public places, which hide their identity, and from places where there is no effective legislation. The internet has given rise to online terrorist propaganda, and radicalization can be carried out over the internet. Misconfiguration of websites allows search engines to penetrate a website and causes illegal access to data [66]. Search engines need to obey rules that disallow certain folders, files and images [66]. Halfond et al. [23], [24] presented a technique for penetration testing that uses static and dynamic analysis to increase the efficiency of the information gathering and response analysis phases.
Static analysis is used to discover the input vectors, and dynamic analysis is used to automate response analysis. The main objective of dynamic analysis is to find errors while the program is running. To test the effectiveness of these techniques, an experiment on penetration testing based on static and dynamic analysis was conducted on nine web applications [23]. Halfond et al. [24] developed AMNESIA (Analysis and Monitoring for NEutralizing SQL-Injection Attacks). The authors proposed a model-based technique that combines static and dynamic analysis. The tool first identifies hotspots, where SQL queries are issued to database engines. Non-deterministic finite automata are then used at each hotspot to build a query model (2009). Xiong et al. [9], [10] presented a model-driven framework that integrates the phases of the software development life cycle with the penetration testing process, so that vulnerabilities can be detected easily and testing can be repeated by expert personnel. The aim is cost-effective, systematic testing that is fully integrated into a security-oriented software development life cycle, although security experts are still required to maintain the knowledge. The test cases are derived from models. Stepien et al. [6] presented an approach to penetration testing of web applications that relies on the inherent features of the TTCN-3 language. The approach derives functional test cases and takes a malicious bank website as its example. The paper gives a message sequence diagram of the malicious bank website to show XSS attacks and generates the corresponding functional test cases.
Pietraszek et al. [26], [27] presented a taint-based technique in which the authors modified the PHP interpreter to track taint information at the character level. Context-sensitive analysis is used to reject SQL queries if an untrusted input has been used to create certain types of SQL tokens. A disadvantage of this approach is that it requires modifications to the runtime environment, which reduces portability. Arkin, Stender and McGraw (Arkin, B. et al., 2005) [28] investigated the importance of the subject from the software pen-tester's perspective, concentrating on where the role of the tester lies when flaws are assessed during software development. Within the software development life cycle, Arkin et al. suggest that "without proper and timely assessment, organizations ... often find that their software suffers from systemic faults both at the design and implementation levels" (Arkin, B. et al., 2005). The same can be said for the network security of an organization: without proper and rigorous assessment, the network design of an organization will lead to unknown flaws inherent in the network implementation. Pierce, Jones and Warren (Pierce, J. et al., 2007) [29] provided a conceptual model and taxonomy for penetration testing and professional ethics. They described how the integrity of the professional pen tester may be achieved by avoiding conflicts of interest, by avoiding the provision of false positives and false negatives, and finally by legally binding testers to their ethical obligations in their contract. This is certainly noteworthy and should be expected of an individual working with potentially sensitive information; however, it appears to be more a personal ethical code of conduct than something that can be enforced and assessed. Pierce et al. (Pierce, J. et al., 2007) also discussed the move by universities toward offering security testing courses. McRue (McRue, A., 2006) [30] commented on the "first U.K. University to offer a dedicated degree course in hacking". This certainly shows an emerging trend in the educational sector towards penetration testing courses; however, these tend to be degree classifications and not necessarily an industry-recognized certification standard. The literature review shows that data mining is a key ingredient in the solution to information security problems. The author in [31] discusses the development of data mining and its application areas. A soft computing framework for data mining is presented in [32], where soft computing approaches such as fuzzy logic and neural networks are discussed.
Data mining provides a number of algorithms that can help detect and avoid security attacks [33]. The author in [34] presents a survey of various data mining techniques for intrusion detection, in which the types of intrusion attacks, such as network-based and host-based attacks, are also summarized. One of the intrusion detection techniques, known as anomaly detection, is discussed in detail in [35]. Paper [36] specifies the measurement criteria for intrusion detection. Fraud detection is another area of focus, as the number of online transactions is rising exponentially. Various types of fraud, such as computer fraud, are described in [37] together with the respective techniques for overcoming them. A number of methods for privacy preservation through data mining are proposed in [38], for example k-anonymity. In [39], the author discusses the sensitivity of data that may put an individual's privacy at risk. This data can be general data, user-specific data or authentication data. Peter [40] describes aspects of cloud computing and the top cloud computing companies with their respective key features. Cloud security issues are addressed via a trusted third party in [41]. Data mining techniques can also be used for the analysis of firewall policy rules [42]. A security framework for mobile cloud computing is proposed in [43]. In [44], the authors identify the following types of attack as major threats to cloud implementations: denial of service attacks, cross virtual machine side-channel attacks, malicious insider attacks, attacks targeting shared memory, and phishing attacks. Table 1 summarizes the variety of work done in the area of cloud computing security with the help of data mining techniques. Paper [15] details the need for mobile cloud computing. As mobile devices become cheaper and internet access more widely available, a mobile device can also be considered an entity in a cloud.
Paper [15] likewise details the need for cloud computing in e-government: with the availability of internet facilities, an e-government can also be considered an entity in a cloud. Currently, many data mining and knowledge discovery frameworks and data classification tools exist for different users and uses, such as the real-time (online) environment for knowledge analysis RTDMM [1] (Xiong Deng et al.), AKDT [9] (Olivier Thonnard et al.), DMCS [10] (Bhavani M. Thuraisingham), APSO [11] (Sandeep Rana et al.), SCDI [12] (Chandola et al.), "itics" [13] (Kutoma Wakunuma et al.) and GPLCA [14] (Jian Zhang et al.) [55]. These frameworks provide a set of methods and algorithms that help users make better use of available data and information, including methods and algorithms for data analysis, cluster analysis, genetic algorithms, nearest neighbor, data visualization, regression analysis, decision trees, predictive analysis, text mining, cyber security, the World Wide Web, semantic web data mining agents, and the amplification approach. Intrusion detection (ID) is the process of monitoring and analyzing the data and events occurring in a computer and/or network system in order to detect attacks, vulnerabilities and other security problems. Figure 2 shows a traditional framework in government decision making for improving the efficiency of service delivery [15].

Figure 2: Traditional framework for ID

3. PROPOSALS

From the studies mentioned above, and given the several advantages of DM approaches and penetration testing for e-government intrusion detection, we suggest that a combination of both approaches can help in developing a new generation of high-performance IDS. In comparison to traditional IDS (Fig. 2), IDS based on DM and penetration testing (Fig. 3) is generally more precise and requires far less manual processing and input from human experts. In this paper we use as our application the minor cyberspace of the Cairo Cleaning and Beautification Agency (www.ccba.gov.eg) in Egypt. The following describes how we applied different techniques to this minor cyberspace in order to analyze how adequate the suggested reasoning is for measuring the extent to which the data of the cyberspace is secured. We formed an "intelligent approach" for securing the data. It consists of a penetration test that includes "Mining Audit Data for Automated Models for Intrusion Detection" (MADAM ID), for evaluating the security state of a system or network by simulating an attack from a malicious source. This process involves the identification and exploitation of vulnerabilities in a real-world scenario; such vulnerabilities may exist in the systems because of improper configuration, known or unknown weaknesses in hardware or software, operational weaknesses, or loopholes in deployed safeguards.
We use a strategy of inferring and analyzing data, searched for in the cyberspace with a technology tool (data mining). This supports the fight against terrorism by limiting harm in advance, through relief arrangements made from the viewpoint of comprehensive security and through analysis of the data survey results. The analysis relies on test models that assess the correctness and safety of the data and identify test standards that can go beyond the limitations of the available data, such as the proposed model in Figure 3. To test the correctness of the cyberspace data and the infrastructure of the proposed cyberspace model for the Cairo Cleaning and Beautification Agency, a model will be built in two stages. The first stage uses "Frequencies", "Association rules", "decision trees" and a "hybrid of auto regression" [20], [72], [73]. The second stage uses the "Neural Networks Model", "Hierarchical Clustering" and "Bayesian network" to enable the decision maker to understand and interact with the features of the value traits. The data extraction tools will be adapted to data mining [74], [75], [76].

Penetration testing was among the first activities performed when security concerns were raised many years ago [3]. The basic process used in penetration testing is simple: attempt to compromise the security of the mechanism undergoing the test. In earlier years, networked operating systems, with their access control mechanisms, were the most suitable components for penetration testing, because the operating system is the core component of the machine and is therefore more exposed to security threats [3]. The earliest penetration testing processes were highly manual and labor-intensive, while later, automatic processes came into clear use for cost reduction [3]. We need to determine how the attacker is most likely to go about attacking a network or an application.
This involves locating areas of weakness in network or application defenses, determining how an attacker could exploit those weaknesses, locating resources that could be accessed, altered, or destroyed, determining whether the attack was detected, determining what the attack footprint looks like, and making recommendations. Other benefits of feature selection are improved prediction by ID models, faster and more cost-effective ID models, and better understanding and visualization of the generated intrusions.

Figure 3: The proposed IDS model based on DM and penetration testing

Figure 3 shows the proposed IDS model based on DM and penetration testing. The system is composed of the following units:

Computer network sensors: collect audit data and network traffic events and transmit these data to the ID units.
DM-ID unit: contains different modules that employ various DM algorithms and techniques (e.g., Frequencies, decision tree model, logistic regression algorithms, neural networks model, Bayesian network model). Each module works independently to detect intrusions in the network traffic data.
Penetration test unit: deploys a penetration test to detect intrusions in the network audit data.
Collect detected intrusions unit: collects detected intrusions from the DM and penetration testing units.
Visualization unit: helps monitor and visualize the results of the penetration test units.
Managerial decision maker: analyzes intrusion results, evaluates system performance, takes decisions on detected intrusions, checks for negative and positive results, controls system operation, generates a performance report and decides whether any changes or updates are needed.
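The flow between the units listed above can be sketched in Python as a set of independent detectors whose findings are gathered by a collector and summarized for the decision maker. All class names, thresholds and fields below are hypothetical, the two DM modules are simple threshold stand-ins for real data mining models, and the penetration test unit is a placeholder that reports nothing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """One audit record as a sensor might report it (fields are assumed)."""
    source: str
    failed_logins: int
    requests_per_min: int


@dataclass
class Alert:
    module: str
    source: str
    reason: str


Detector = Callable[[List[Event]], List[Alert]]


def frequency_module(events: List[Event]) -> List[Alert]:
    """Stand-in for a DM module: flags unusually chatty sources."""
    return [Alert("frequency", e.source, "request rate above threshold")
            for e in events if e.requests_per_min > 300]


def login_module(events: List[Event]) -> List[Alert]:
    """Stand-in for a second DM module: flags repeated failed logins."""
    return [Alert("login", e.source, "repeated failed logins")
            for e in events if e.failed_logins >= 5]


def penetration_test_unit(events: List[Event]) -> List[Alert]:
    """Placeholder only: a real unit would actively probe the system."""
    return []


def collect(events: List[Event], units: List[Detector]) -> List[Alert]:
    """Collector unit: every unit sees the same data and runs independently."""
    alerts: List[Alert] = []
    for unit in units:
        alerts.extend(unit(events))
    return alerts


def report(alerts: List[Alert]) -> Dict[str, List[str]]:
    """Summary for the decision maker: which modules flagged each source."""
    summary: Dict[str, List[str]] = {}
    for alert in alerts:
        summary.setdefault(alert.source, []).append(alert.module)
    return summary
```

Because every unit shares the same `Detector` signature, a new DM module can be added to the list passed to `collect` without changing the collector or the report. A source flagged by several modules at once is a natural candidate for the decision maker to examine first.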