A Monograph on Data Mining Techniques for Forensics
Venkata Krishna Kota
Central Research Laboratory, Bharat Electronics Limited, Bangalore, India

Abstract
Email forensics occupies a key portion of today's digital forensics. There is a pressing need for tools that can analyze large email collections forensically. Data mining techniques play a vital role in analyzing large collections of data. This paper proposes various data mining techniques and an architecture to help forensic examiners.

Keywords: Email Forensics; Data Mining; Information Retrieval; Ontology

I. INTRODUCTION

With the rapid development of technology, the use of emails for fraudulent activities is also accelerating. Forensic analysis of these emails can help prevent, investigate or prove a crime. Email forensics can be defined as the use of specialized techniques for the collection, preservation and analysis of emails with a view to presenting evidence in a court of law. Analyzing very large volumes of email is a challenge for forensic examiners, and there is a pressing need for tools that can analyze these emails automatically.

Digital evidence search is the heart of digital forensics. Information Retrieval (IR) is the process of retrieving relevant information, or documents containing it, for a specified information need. IR systems play a vital role in such scenarios by retrieving the most relevant information for given user queries. In this paper we propose a system that retrieves the relevant emails from large collections and presents them to forensic examiners in an easily understandable form.

Forensic examiners do not want to miss any piece of relevant information. So, unlike traditional IR systems, forensic IR systems should focus on high recall. The proposed system achieves high recall through ontology-driven query expansion. High recall yields more matching emails, some or many of which may not be relevant to the investigator's information need.
The information need also varies from case to case. For example, assume that a tender is leaked in an organization. In this case the investigator may give higher priority to emails transmitted between the tender creation date and the date the tender results were announced than to emails sent on other dates. Such requirements are best known to the investigator alone. The proposed system offers a customizable ranking facility: investigators express their interests to the system, which ranks the retrieved emails accordingly and serves the most relevant ones first.

Nowadays people often use short-form acronyms instead of writing full words. For example, an email user may write "tc" when the intention is to say "take care". Analyzing such acronyms is a challenge. The proposed system analyzes them with the help of an ontology.

Data mining techniques are applied for in-depth analysis. Link analysis, clustering, summarization and other techniques are applied to identify interesting patterns. Visualization techniques present the retrieved knowledge to the user in an easily understandable form with graphical support.

The system constructs an Email Forensics Ontology by capturing semantic relationships among the retrieved email conversations that are essential for forensics. The ontology can directly answer some of the domain-specific questions of forensic investigators through semantic analysis and inference.

Section II reviews the research done in email forensics so far. Section III presents the design of the system. Section IV explains the implementation issues. Section V concludes the paper.

II. RELATED WORK

With the rapid development of technology, the use of emails for fraudulent activities is also accelerating. Email forensics can help prevent, investigate or prove a crime. Many researchers have done valuable work and presented solutions for email forensics.
This section summarizes their contributions. In [1] a tool is presented for indexing and analyzing textual content and for providing information retrieval functions that retrieve all emails containing interesting information usable for digital forensics. Traditional forensic search tools just present the results without any grouping or appropriate filtering, so the crime investigator has to spend a lot of time finding the documents related to the investigation among the search results. [2] proposes a new ranking method that ranks the results according to their relevance to the investigator's information need. When users provide narrow queries, information retrieval may fail to return some relevant documents. [1] addresses this problem through query expansion with the WordNet ontology, expanding the query with words that are semantically related to the words in the original query.

ASE 2014 ISBN:

The Enron data set is a large collection of emails. It contains around 517,431 emails from 151 employees. It has been described as a good test bed for evaluating techniques for counter-terrorism and fraud detection, and it has been used by many researchers [3]. We have chosen it as the data set for our experiment.

In [5], the authors present an email data mining tool that visualizes a wide range of detailed analyses of email and email flows derived from large email collections in a variety of formats. In [6], tools are described for identifying the originating IP address and originating location of an email through header analysis. [7] proposes an algorithm for email forensics that analyzes information from network packets on the SMTP and HTTP protocols. In [8], the author proposes techniques for discovering the emails in one conversation, capturing the conversation structure and summarizing the conversation. The system described in [9] focuses on phishing scams; it analyzes emails to gather additional forensic information using UNIX tools and also generates forensic reports. The system in [10] applies classification methods to instant messages to determine their author based on user behavior. The current paper provides an architecture and method for email forensics using data mining techniques.

III. DESIGN

The block diagram of the system is given in Figure 1. Emails from the corpus are parsed, tokenized, analyzed and finally indexed one by one. Once the index is ready, users can query it and retrieve relevant emails. The user's query is analyzed semantically, then expanded and refined using the WordNet and Chat Acronym ontologies. The refined query is matched against the index to retrieve the matching emails.
On the basis of the user-provided Ranking Profile, which represents the investigator's interest, the matching emails are ranked according to their relevance to the case under investigation. The ranked emails are presented to the user and also forwarded to further modules for in-depth analysis. Data mining techniques such as link analysis, clustering and user behavior modeling are performed on these relevant emails, and the retrieved knowledge is presented in an easily understandable form by the visualization module with graphical support. The Ontology Construction module takes the retrieved relevant emails, extracts the interesting relationships from them and constructs the Email Forensics Ontology. The ontology directly answers some domain-specific questions of the investigators through semantic analysis and inference.

FIG. 1: BLOCK DIAGRAM OF THE SYSTEM

IV. IMPLEMENTATION METHODOLOGY

The Enron email corpus [14] is a good choice of data set for this experiment. It has been described as a suitable test bed for digital forensics and has been widely used by researchers. To provide rapid searching, an index must first be constructed from the email collection. Indexing refers to processing the original data into a highly efficient cross-reference lookup in order to facilitate rapid searching [4, 12]. Each email in the corpus has to be parsed, analyzed, tokenized and indexed. We suggest Lucene [13], an open source Java search API, for this task. In this way an inverted index can be built. The index can be inspected using the Luke tool [20].
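To make the idea of an inverted index concrete, the following is a minimal sketch in plain Python. It is not Lucene and does not reproduce its API; the tokenizer, the function names and the three-message corpus are assumptions made only for illustration.

```python
import re
from collections import defaultdict

# Very simple tokenizer pattern (an assumption; real analyzers do much more).
TOKEN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def build_inverted_index(emails):
    """Map each term to the set of email ids whose text contains it."""
    index = defaultdict(set)
    for email_id, text in emails.items():
        for term in tokenize(text):
            index[term].add(email_id)
    return index


def search_all(index, query):
    """Return ids of emails containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical three-message corpus, invented for illustration only.
emails = {
    "m1": "Tender draft attached, please review before Friday",
    "m2": "Lunch on Friday? tc",
    "m3": "Final tender results will be announced next week",
}
index = build_inverted_index(emails)
```

Looking up a term is then a dictionary access rather than a scan of every message, which is the property that makes searching a large corpus fast.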
The use of short-form acronyms in place of the actual words (for example, "tc" for "take care") makes email understanding a challenging task. To deal with this problem, we propose ontology-based acronym expansion during indexing. An ontology constructed for acronym expansion can be used to expand the acronyms within an email, so the index can be refined by indexing the full words instead of their acronyms.

Users query the system with keywords to get the relevant emails. Digital forensics generally aims at high recall, because investigators do not want to lose any information even if it is only slightly relevant. We used query expansion to achieve high recall: the user's query is expanded with words that are semantically similar to the words in the query. We used the WordNet ontology for query expansion, and Lucene provides a convenient method for it. The refined query is then matched against the inverted index, and the emails containing the query terms are retrieved. Lucene offers a handy way to search for documents that contain the query keywords. Various search types are possible, such as Boolean search, field-based search, weighted search, wildcard search and fuzzy search. Suggestions for misspelled queries are also feasible through Lucene.

Existing forensic search tools just present the results without any grouping or appropriate filtering, so a crime investigator has to spend a lot of time finding the documents related to an investigation among the search results. To solve this we propose a ranking methodology that the user can customize based on his or her interest, which varies widely from case to case. Because of the high recall, the number of resulting emails will be large.
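The acronym expansion and query expansion steps described above can be sketched in a few lines of Python. This is only an illustration: the two small dictionaries stand in for the Chat Acronym ontology and WordNet, their entries are invented for this example, and the function names are hypothetical rather than part of Lucene or of the proposed system.

```python
# Hypothetical stand-ins for the acronym ontology and WordNet; entries are invented.
ACRONYMS = {"tc": ["take", "care"], "asap": ["as", "soon", "as", "possible"]}
SYNONYMS = {"tender": ["bid", "offer"], "leak": ["disclose", "reveal"]}


def expand_acronyms(tokens):
    """Replace each known acronym with the words it stands for (index-time step)."""
    expanded = []
    for token in tokens:
        expanded.extend(ACRONYMS.get(token, [token]))
    return expanded


def expand_query(tokens):
    """Add related words after each query term, keeping order and dropping duplicates."""
    expanded = []
    for token in tokens:
        for candidate in [token] + SYNONYMS.get(token, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


indexed_tokens = expand_acronyms("see you soon tc".split())
query_terms = expand_query(["tender", "leak"])
```

An email matching any of the expanded terms would then be returned, which raises recall at the cost of precision.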
With so many matches, a ranking system is needed that orders the matching emails by relevance. In traditional IR systems the ranking scheme is normally static. For forensic analysis, apart from the textual body content, the metadata associated with an email (sender and receiver information, the date and time it was sent or received, its origin location and so on) is much more important. Traditional IR systems rank documents based on the occurrence of query terms in the text and do not consider such metadata. Moreover, the metadata that matters (sent or received time, sender information, etc.) changes from case to case and is best known to the person investigating the case. A forensic ranking method should therefore consider email metadata and be flexible to change. The system gives the forensic examiner the flexibility to express his or her interest in the form of a ranking profile. It calculates a relevance score for each matching email and presents the emails most interesting to the investigator at the top of the results list.

Metadata plays a key role in forensic analysis and can be extracted from the retrieved emails. Sample metadata of an Enron email is given below.

    Message-ID: < JavaMail.evans@thyme>
    Date: Tue, 5 Feb :18: (PST)
    From: [email protected]
    To: [email protected], [email protected]
    Subject: east basis and index points
    Mime-Version: 1.0
    Content-Type: text/plain; charset=us-ascii
    Content-Transfer-Encoding: 7bit
    X-From: Rabon, Chance </O=ENRON/OU=NA/CN=RECIPIENTS/CN=CRABON>
    X-To: Mckay, Jonathan </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Jmckay1>, Brawner, Sandra F. </O=ENRON/OU=NA/CN=RECIPIENTS/CN=Sbrawne>
    X-cc:
    X-bcc:
    X-Folder: \ExMerge - Mckay, Jonathan\Enron
    X-Origin: MCKAY-J
    X-FileName: jon mckay PST

From this metadata more interesting details can be found. For example, the originating location or originating IP address of the email can be extracted using tools like TrackerPro [18] and SmartWhoIs [19].
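One way to picture a metadata-aware, customizable ranking is the sketch below, which parses headers with Python's standard `email` package and adds profile-driven boosts to a simple term count. The profile layout, the weights, the scoring formula and the two messages are all assumptions made for illustration; the paper does not specify how the relevance score is computed.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime


def extract_metadata(raw):
    """Parse sender, recipients, sent time and body from a raw message string."""
    msg = message_from_string(raw)
    return {
        "sender": parseaddr(msg["From"])[1].lower(),
        "recipients": [a.lower() for _, a in getaddresses(msg.get_all("To", []))],
        "sent": parsedate_to_datetime(msg["Date"]),
        "body": msg.get_payload(),
    }


def score(meta, query_terms, profile):
    """Term-count score plus boosts taken from the (hypothetical) ranking profile."""
    words = meta["body"].lower().split()
    text_score = sum(words.count(term) for term in query_terms)
    boost = 0.0
    start, end = profile["date_window"]
    if start <= meta["sent"] <= end:
        boost += profile["date_weight"]
    if meta["sender"] in profile["priority_senders"]:
        boost += profile["sender_weight"]
    return text_score + boost


# Invented messages and profile, for illustration only.
RAW = {
    "m1": ("From: alice@example.com\nTo: bob@example.com\n"
           "Date: Mon, 04 Mar 2002 09:00:00 +0000\nSubject: tender\n\n"
           "the tender draft is ready\n"),
    "m2": ("From: carol@example.com\nTo: bob@example.com\n"
           "Date: Wed, 10 Apr 2002 09:00:00 +0000\nSubject: news\n\n"
           "tender tender tender results are public\n"),
}
profile = {
    "date_window": (datetime(2002, 3, 1, tzinfo=timezone.utc),
                    datetime(2002, 3, 15, tzinfo=timezone.utc)),
    "date_weight": 5.0,
    "priority_senders": {"alice@example.com"},
    "sender_weight": 2.0,
}
metadata = {mid: extract_metadata(raw) for mid, raw in RAW.items()}
ranked = sorted(metadata, key=lambda mid: score(metadata[mid], ["tender"], profile),
                reverse=True)
```

A purely text-based ranking would place `m2` first because it repeats the query term more often; with this profile, `m1` moves to the top because it falls inside the date window and comes from a prioritized sender.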
Analyzing the email body content is useful in many contexts. Author identification can be done using text content analysis, and it plays a key role in spam detection. Text mining is very useful for this purpose. Email classification also plays an important role in email forensics. When a crime happens, some emails may be genuine evidence: they contain crime-related information and can serve as a witness. Many other emails merely discuss the crime or pass on news about it, and the investigator may not be interested in them. Classifying emails into crime-related emails and news emails therefore helps the examiner considerably. Text mining tools like NLTK [27] and GATE can be used for these purposes.

Social interactions are very important for forensic investigations. Link analysis can be performed on the retrieved emails to understand the interactions [11]. A user interaction graph contains the email ids of the retrieved emails as nodes and each retrieved email as an edge between the corresponding email id nodes. A single user interaction graph can thus be built from the resulting emails, depicting the whole set of user interactions in a single view. Using this graph, communication links between any parties can be easily analyzed, and mediators between two parties can be identified. The graph also reveals groups of people who usually communicate among themselves. These details are important for forensic investigation.

Efficient visualization of the extracted knowledge is equally important. The visualization module presents the knowledge discovered by the above techniques with graphs and charts. The user interaction graph can be visualized using the JUNG (Java Universal Network/Graph Framework) API [22]. Details such as email sending and receiving counts and the sent/received frequency over time and date ranges can be plotted as bar charts and pie charts using the JFreeChart Java API [21].

An ontology is a specification of a conceptualization of a particular domain. Ontologies have been widely used in many domains to formally represent the semantics of the domain, to provide automated reasoning and to answer semantically rich domain-specific queries. We propose an Email Ontology built from concepts and relationships related to email forensics. Ontologies capture the semantic relations among the entities of the domain, so new knowledge can be inferred and domain-specific questions can be answered with their help. The ontology can be developed using the Protégé tool [15, 17] and represented in OWL (Web Ontology Language) [16]. Once the ontology is designed, it can be instantiated with the metadata extracted from the resulting emails using Protégé's DataMaster plug-in [26]. The consistency of the ontology can be checked using the Pellet reasoner. Rules specific to the forensic domain can be written in SWRL (Semantic Web Rule Language) [23]. With the help of the Jess inference engine [25], the domain-specific rules can be fired to infer new knowledge, and the ontology is expanded with this new knowledge.
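As a rough illustration of the user interaction graph and of the kind of knowledge that rules can derive from it, the sketch below uses plain Python rather than JUNG, SWRL or Jess. It assumes each retrieved email has been reduced to a (sender, recipient) pair; the addresses are invented, and `build_graph`, `connected` and `mediators` are hypothetical helper names.

```python
from collections import defaultdict, deque


def build_graph(messages):
    """Undirected interaction graph: an edge wherever at least one email was exchanged."""
    graph = defaultdict(set)
    for sender, recipient in messages:
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return graph


def connected(graph, a, b):
    """Breadth-first search: are a and b linked, directly or through other people?"""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def mediators(graph, a, b):
    """People who are in direct contact with both a and b."""
    return (graph.get(a, set()) & graph.get(b, set())) - {a, b}


# Invented (sender, recipient) pairs, for illustration only.
messages = [
    ("ann@example.com", "bob@example.com"),
    ("bob@example.com", "cat@example.com"),
    ("dan@example.com", "eve@example.com"),
]
graph = build_graph(messages)
```

Here `connected` plays the role of a transitive "is connected to" rule and `mediators` identifies intermediaries between two parties; an ontology reasoner would derive comparable facts from declared rules instead of hand-written traversal code.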
The ontology can be queried with domain-specific queries, for which the SPARQL language [24] can be used, and the answers are obtained from the ontology. For example, the proposed system constructs the Email Forensics Ontology by capturing relationships such as "A is sending email to B". Using these details, the ontology can infer who is in contact with whom, who is directly connected to whom, whether two people are connected, and other details. Whenever the investigator wants answers to such queries, the ontology provides them.

V. CONCLUSION

The need for email forensics, its challenges and some of the available solutions are summarized. A system is proposed to assist forensic examiners in retrieving the relevant emails from a large corpus in less time and in presenting the interesting hidden patterns in an easily understandable manner with graphical support. A method using an ontology is proposed to answer some of the interesting domain-specific questions of the forensic examiner.

ACKNOWLEDGEMENT

I am most grateful to Dr. Ajit T. Kalghatgi, Director (R&D), Bharat Electronics Limited, for his most valuable suggestions.

REFERENCES

[1] Phan Thien Son, Ontology-Driven Text Mining for Digital Forensics, COMP6703 Project Report, Australian National University,
[2] Jooyoung Lee, Proposal for Efficient Searching and Presentation in Digital Forensics, The Third International Conference on Availability, Reliability and Security, 2008, pp , doi: /ares
[3] Jitesh Shetty and Jafar Adibi, The Enron Email Dataset Database Schema and Brief Statistical Report.
[4] Erik Hatcher, Otis Gospodnetic and Michael McCandless, Lucene in Action, Second Edition, Manning Publications,
[5] Salvatore J. Stolfo and Shlomo Hershkop, Email Mining Toolkit Supporting Law Enforcement Forensic Analyses.
[6] Natarajan Meghanathan, Sumanth Reddy Allam and Loretta A. Moore, Tools and Techniques for Network Forensics, International Journal of Network Security & Its Applications (IJNSA), Vol. 1, No. 1, April 2009
[7] Wang Wen Qi and Liu WeiGuang, The Research on Email Forensic Based Network, First International Conference on Information Science and Engineering (ICISE), 2009, pp: , ISBN:
[8] Xiaodong Zhou, Discovering and Summarizing Email Conversations, Thesis Report, The University of British Columbia, 2008
[9] Agarwal S, Bali J, Zhenhai Duan and Kermes L, The Design and Development of an Undercover Multipurpose Anti-Spoofing Kit (UnMASK), 23rd Annual Conference on Computer Security Applications, 2007, pp: , ISBN:
[10] Angela Orebaugh and Jeremy Allnutt, Classification of Instant Messaging Communications for Forensic Analysis, The International Journal of Forensic Computer Science, IJoFCS (2009) 1, pp:
[11] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, ISBN:
[12] Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, ISBN: ,
[13] Lucene search library, available at:
[14] Enron dataset, available at:
[15] Protégé Ontology Editing Tool, available at:
[16] OWL guide, available at:
[17] Protégé Wikipedia, available at:
[18] TrackerPro, available at:
[19] SmartWhoIs, available at:
[20] Luke, available at:
[21] JFreeChart, available at:
[22] JUNG, available at:
[23] SWRL, available at:
[24] SPARQL, available at:
[25] Jess, available at:
[26] DataMaster, available at:
[27] NLTK, available at:

AUTHOR

Venkata Krishna Kota received his B.Tech degree in Computer Science and Information Technology from Jawaharlal Nehru Technological University in 2005 and his M.E degree in Computer Science from Anna University. He is working as Member (Research Staff) at Central Research Laboratory (CRL), Bharat Electronics Limited (BEL), Bangalore. His research interests are Information Retrieval and Complex Event Processing.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, [email protected] Abstract Most text data from diverse document databases are unsuitable for analytical
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Analysis of Email Fraud Detection Using WEKA Tool
Analysis of Email Fraud Detection Using WEKA Tool Author:Tarushi Sharma, M-Tech(Information Technology), CGC Landran Mohali, Punjab,India, Co-Author:Mrs.Amanpreet Kaur (Assistant Professor), CGC Landran
Data Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
How to Analyze Company Using Social Network?
How to Analyze Company Using Social Network? Sebastian Palus 1, Piotr Bródka 1, Przemysław Kazienko 1 1 Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland {sebastian.palus,
Information Retrieval Elasticsearch
Information Retrieval Elasticsearch IR Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Hadoop Technology for Flow Analysis of the Internet Traffic
Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
Semantic Web based e-learning System for Sports Domain
Semantic Web based e-learning System for Sports Domain S.Muthu lakshmi Research Scholar Dept.of Information Science & Technology Anna University, Chennai G.V.Uma Professor & Research Supervisor Dept.of
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Semantic Search in Portals using Ontologies
Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint
Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint Christian Fillies 1 and Frauke Weichhardt 1 1 Semtation GmbH, Geschw.-Scholl-Str. 38, 14771 Potsdam, Germany {cfillies,
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
The Ontological Approach for SIEM Data Repository
The Ontological Approach for SIEM Data Repository Igor Kotenko, Olga Polubelova, and Igor Saenko Laboratory of Computer Science Problems, Saint-Petersburg Institute for Information and Automation of Russian
ANDROID APPLICATION TO EXTRACT THE STATISTICS OF AN HPC CLUSTER
ANDROID APPLICATION TO EXTRACT THE STATISTICS OF AN HPC CLUSTER ABSTRACT S.Chakraborty, Miraz Nabi Azad, Sourav Sen, Pritomrit Bora Aditya Singh, Bipal Das and Mohd.Tabeesh Noori Department of Computer
Prediction of Heart Disease Using Naïve Bayes Algorithm
Prediction of Heart Disease Using Naïve Bayes Algorithm R.Karthiyayini 1, S.Chithaara 2 Assistant Professor, Department of computer Applications, Anna University, BIT campus, Tiruchirapalli, Tamilnadu,
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
A QoS-Aware Web Service Selection Based on Clustering
International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,
A Knowledge Management Framework Using Business Intelligence Solutions
www.ijcsi.org 102 A Knowledge Management Framework Using Business Intelligence Solutions Marwa Gadu 1 and Prof. Dr. Nashaat El-Khameesy 2 1 Computer and Information Systems Department, Sadat Academy For
Facilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials
ehealth Beyond the Horizon Get IT There S.K. Andersen et al. (Eds.) IOS Press, 2008 2008 Organizing Committee of MIE 2008. All rights reserved. 3 An Ontology Based Method to Solve Query Identifier Heterogeneity
Framework for Live Digital Forensics using Data Mining
Framework for Live Digital Forensics using Data Mining Prof Sonal Honale #1, Jayshree Borkar *2 Computer Science and Engineering Department, Aabha Gaikwad College of Engineering, Nagpur, India Abstract
City Data Pipeline. A System for Making Open Data Useful for Cities. [email protected]
City Data Pipeline A System for Making Open Data Useful for Cities Stefan Bischof 1,2, Axel Polleres 1, and Simon Sperl 1 1 Siemens AG Österreich, Siemensstraße 90, 1211 Vienna, Austria {bischof.stefan,axel.polleres,simon.sperl}@siemens.com
Index Terms Domain name, Firewall, Packet, Phishing, URL.
BDD for Implementation of Packet Filter Firewall and Detecting Phishing Websites Naresh Shende Vidyalankar Institute of Technology Prof. S. K. Shinde Lokmanya Tilak College of Engineering Abstract Packet
A Comparative Approach to Search Engine Ranking Strategies
26 A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh Collage of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Client Overview. Engagement Situation. Key Requirements
Client Overview Our client is one of the leading providers of business intelligence systems for customers especially in BFSI space that needs intensive data analysis of huge amounts of data for their decision
Research of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
Strategies for Cleaning Organizational Emails with an Application to Enron Email Dataset
Strategies for Cleaning Organizational Emails with an Application to Enron Email Dataset Yingjie Zhou [email protected] Mark Goldberg [email protected] Malik Magdon-Ismail [email protected] William A. Wallace
Implementation of Botcatch for Identifying Bot Infected Hosts
Implementation of Botcatch for Identifying Bot Infected Hosts GRADUATE PROJECT REPORT Submitted to the Faculty of The School of Engineering & Computing Sciences Texas A&M University-Corpus Christi Corpus
TSRR: A Software Resource Repository for Trustworthiness Resource Management and Reuse
TSRR: A Software Resource Repository for Trustworthiness Resource Management and Reuse Junfeng Zhao 1, 2, Bing Xie 1,2, Yasha Wang 1,2, Yongjun XU 3 1 Key Laboratory of High Confidence Software Technologies,
TORNADO Solution for Telecom Vertical
BIG DATA ANALYTICS & REPORTING TORNADO Solution for Telecom Vertical Overview Last decade has see a rapid growth in wireless and mobile devices such as smart- phones, tablets and netbook is becoming very
Qi Liu Rutgers Business School ISACA New York 2013
Qi Liu Rutgers Business School ISACA New York 2013 1 What is Audit Analytics The use of data analysis technology in Auditing. Audit analytics is the process of identifying, gathering, validating, analyzing,
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,