Keywords cosine similarity, correlation, standard deviation, page count, Enron dataset
|
|
|
- Lucas Foster
- 10 years ago
- Views:
Transcription
1 Volume 4, Issue 1, January 2014 ISSN: X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: Cosine Similarity for Text Detection Sonal N. Deshmukh Ratnadeep R. Deshmukh Sachin N. Deshmukh Department of MCA Department of Computer Science, Department of Computer Science, J.N.E.C., Aurangabad Dr. BAM University, Aurangabad Dr. BAM University, Aurangabad Maharashtra, India Maharashtra, India Maharashtra, India Abstract Text substitution is generally used for communication of messages amongst a group of people who wants to share the secrete information. Unethical group of people are using text substitution for communicating the details of anti-social acts that they want to carry out. Text substitution helps by hiding the actual meaning of the text written. However, as their group members are well aware of the substitution, they can easily understand actual meaning of the text written. This information is generally shared with the help of s or also published on the websites. However, as the content of s or the web contents do not contain any sensitive words rather they look normal, existing algorithms cannot detect the sensitivity of it. This paper deals with detection of substituted text in a sentence using statistical techniques and cosine similarity. Keywords cosine similarity, correlation, standard deviation, page count, Enron dataset I. INTRODUCTION Communication technology and internet has been proved to be the blessing to human being when used with good intension. However, it may be a curse when the intension of user is destructive. A common example is the use of high end technologies by the terrorist groups for planning as well as execution of their tasks and attacks. They use internet to achieve their goals by various means [1]. In order to hide the information they wants to share with each other, initially they started using encryption and decryption techniques. But as the information encrypted can be detected with the help of available algorithms, a possibility of ting it decrypted by policing agencies is more. Hence they searched for other ways to achieve secrete exchange of information. One such example is use of text substitution for sharing information e.g. plan of attack at certain place or procedure of preparation of bomb, via or by publishing it on website. In text substitution, instead of using harmful words like bomb, attack etc., and these words are replaced by innocuous words so that it looks like normal. For e.g. the sentence there will be a blast at 11a.m. today, after substitution becomes there will be a show at 11a.m. today. Here blast, a sensitive word is replaced by show, a normal word, so that it hides meaning of information. There are various measures used by the researchers to detect such type of substitutions. In this paper we tried to find out the similarity between two words by using cosine similarity measure. II. RELATED WORK Illicit groups have adopted multiple ways for substitution. One of the ways is use of synonyms or hypernyms of the sensitive word for substitution. S. Mehta et al. [2] presented these hypernyms as an extended measure. But one can recognize such word substitutions since semantic analysis can detect such substitutions easily. Turney et al. [3] has presented an algorithm for mining the web for synonyms however this algorithm is not useful for detection of substitution as substitution do not follow any specific rule in general. Word frequency information is readily available on so it is possible that, in ordinary circumstances, a terrorist or criminal group might adopt a standard set of substitutions, in which the words they do not wish to use are replaced by other words with similar frequencies. datasets like Enron dataset and 20 News group dataset are the best examples of natural human writing and hence are commonly used in text mining application as training and testing data set. However, it is essential to clean the s before using it in the application [4]. If a list of sensitive words is maintained at Gateway, s containing harmful words can be traced easily at gateway. But use if substitution in text converts the sensitive sentence into normal sentence e.g. the sentence there will be bomb blast in a mall today stop it if you can can be written as there will be a gift distribution in a mall today stop it if you can. As this sentence does not have any sensitive word, it cannot be categorized as sensitive sentence by detection algorithm. Earlier terrorist group used to adopt some common techniques for substitution, example of which can be substitution by replacing the word with the word having near wordcount rank e.g. the word missile have word rank 6316 and garment have word rank 6319 [6][8]. Decision tree algorithm was used to categorize such type of suspicious s containing deceptive text [5][7] where author focused on detecting criminal identity of deception in law enforcement. Aim of this paper to present novel technique for detection of text substitution using cosine similarity. III. COSINE SIMILARITY Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them [9]. In Information Retrieval, each term is notionally assigned a different dimension and a document 2014, IJARCSSE All Rights Reserved Page 903
2 is characterized by a vector where the value of each dimension corresponds to the number of times that term appears in the document. Cosine similarity then gives a useful measure of how similar two documents are likely to be in terms of their subject matter. We can use cosine function to find the similarity between tar word and replaced words. The cosine of two vectors can be derived by using the Euclidean dot product formula: Given two vectors of attributes, A and B, the cosine similarity, cos(θ), is represented using a dot product and magnitude as The resulting similarity ranges from 1 meaning exactly opposite, to 1 meaning exactly the same, with 0 usually indicating independence, and in-between values indicating intermediate similarity or dissimilarity. For text matching, the attribute vectors A and B are usually the term frequency vectors of the documents. The cosine similarity can be seen as a method of normalizing document length during comparison. In the case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since the term frequencies cannot be negative. The angle between two term frequency vectors cannot be greater than 90. TABLE I. COSINE OF A SENTENCE FOR BAG OF WORDS : you will at Words(origi nal) (A ) after substitution: you will at Words (B) (B) Cosine bomb bomb IV. EXPERIMENT We performed experimentations on the training dataset for testing the use of Cosine Similarity for detecting text substitution. We selected s from Enron Dataset having content size less than 18 words. The count of 18 is considered to handful amount of s for testing. Then stop words were removed from the content of the s. In order to test the substitution, we substituted some specific words in the with another words. In the first experimentation, we used the s where only one word is substituted. For example consider the sentence you will bomb at. After removing the stop words, we the bag of words as. Various combinations of the words in the bag of words were generated Table 1. For every combination, a Google hit count is retrieved using Google API. Similarly, Google hit count for the substituted sentence was also retrieved. Based on the values for combinations of both the sentence, Cosine similarity was calculated. In this experiment, we also used the dataset where we searched for the n grams containing tar words only. N grams tar words are excluded from the calculation. Following table 2 shows the result of such combinations. 2014, IJARCSSE All Rights Reserved Page 904
3 TABLE II COSINE OF A SENTENCE FOR BAG OF WORDS WITH TARGET WORD ONLY Words with only Tar Words(original) (A) Words (B) (B) Cosine bomb Chocolate bomb Cosine values thus calculated are all around one. In practical, we need to match a normal word with a list of sensitive words. We tried that approach in second experiment. For example, for the statement we expect that marriage will happen tonight, the normal word marriage is checked with all the sensitive words and accordingly cosine value is calculated. Another variation we expect that flower will happen tonight. The word marriage can be a suitable word for substitution than flower. This can be easily verified from the cosine values obtained and shown below. Following Fig 1.shows the result for four different statements and the cosine values. Fig. 1 Cosine values for four different statements In a sentence We expect that marriage will happen tonight considered by [8], original word is attack which is replaced by marriage. When compared with all sensitive words, a sentence having word attack replaced by marriage had the highest cosine value From this we can infer that the original word can be attack. When we used another substituted word like flower instead of marriage, cosine values for set of sensitive words which were not near to 1 but had highest cosine value for attack i.e , which is again highest as compared with other tar words. We consider another sentence like Our next picture will be business centre, here word picture is used instead of word tar. Cosine value for this sentence is which is also near to one. If we replace word picture by a word bag in the substituted sentence, cosine value is more for picture than bag. Very few sentences didn t match the result in this experiment. Following table shows data used for the experiment and fig.2 shows graph of cosine values for a set of sensitive words. When tested for a set of 250 different sentences taken from Enron dataset, we got an accuracy of around , IJARCSSE All Rights Reserved Page 905
4 Fig. 2 Cosine for list of sensitive words TABLE III COMPARISON OF MEAN, STANDARD DEVIATION AND CORRELATION s You will at (Chocolate) We have to do murder in Mumbai(Felicitati on) Ramesh will come to collect explosive material(cotton) We expect that attack will happen tonight(marriage) Give training to attack on the city(rain) The bomb is in the position(flower) Burn the train tomorrow(colour) Spread violence as soon as possible(happines s) Our next tar will be business centre(picture) Keep the bomb in the car(bag) s stopwards do murder Mumbai come collect explosive material expect attack happen tonight give training attack city bomb position Burn train tomorrow Spread violence soon possible next tar business centre Keep bomb car Mean (original) Mean (substitutio n) Std Dev (original) Std Dev (substitution) Correlation E In third experiment, we tried to find out relationship between original and substituted word in the sentences using correlation and standard deviation. Standard deviation is used to dispersion of values. Values of standard deviation and correlation are shown in the Table 3. In this table, the sentence The bomb is in the position has correlation 1, it means that the relation between original word bomb and replaced word flower is very strong. However, in order to detect the substitution of the text, this measure is not helping much. This is because the randomness in the values of variations that we have tested. 2014, IJARCSSE All Rights Reserved Page 906
5 V. CONCLUSION Problem of text substitution has become a point of concern for Governments as well as researchers. Multi fold research is started recently and various ways are being used to detect it. In this paper, we discussed use of Cosine similarity as a measure for detection of text substitution. Accuracy that we achieved here is However, there is a need to enhance the accuracy of detection. Use of Ontology may enhance the accuracy, provided the domain of communication is well known. As a future scope, we are considering domain specific hit counts with pre-processing of inputs to the similarity measure. REFERENCES [1] The use of the internet for Terrorist purposes, Report of United Nations office on drugs and crime, Vienna in collaboration with the United Nation s Counter Terrorism implementation task force 2004 published by united Nation Newyork Sep [2] Mrs. Shilpa Mehta, Dr. U Eranna, Dr. K. Soundararajan, Surveillance Issues for Security over Computer Communications and Legal Implications, Proceedings of the World Congress on Engineering 2010 Vol I WCE 2010, June 30 - July 2, 2010, London, U.K. [3] Peter D. Turney, Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL, Proceeding of the 12 th Europium Conference on Machine Learning,pages ,Springer-Verlag, UK,2001 [4] J.Tang, H.Li, Y. Cao, Z.Tang, Data Cleaning, Proceedings of KDD, Chicago, USA, (2005). [5] Appavu alias Balamurugan and Ramasamy, Suspicious Detection via Decision Tree: A Data Mining Approach, Rajaram Thiagarajar College of Engineering, Madurai, India [6] [7] Gang Wang, Hsinchun Chen and Homa Atabakhsh, Criminal Identity Deception and Deception Detection in Law Enforcement, Group Decision and Negotiation, Mar 2004, Vol. 13 Issue 2, p p. Department of Management Information Systems, University of Arizona, 430 McClelland Hall, Tucson, AZ [8] Szewang Fong, Dmitri Roussinov, And David B. Skillicorn, Detecting Word Substitutions In Text, IEEE Transactions On Knowledge And Data Engineering, Vol. 20, No. 8, August 2008 [9] , IJARCSSE All Rights Reserved Page 907
Clustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
CLUSTERING FOR FORENSIC ANALYSIS
IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS
TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING
TECHNOLOGY ANALYSIS FOR INTERNET OF THINGS USING BIG DATA LEARNING Sunghae Jun 1 1 Professor, Department of Statistics, Cheongju University, Chungbuk, Korea Abstract The internet of things (IoT) is an
A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA
A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA U.Pandi Priya 1, R.Padma Priya 2 1 Research Scholar, Department of Computer Science and Information Technology,
Predictive time series analysis of stock prices using neural network classifier
Predictive time series analysis of stock prices using neural network classifier Abhinav Pathak, National Institute of Technology, Karnataka, Surathkal, India [email protected] Abstract The work pertains
Crime Hotspots Analysis in South Korea: A User-Oriented Approach
, pp.81-85 http://dx.doi.org/10.14257/astl.2014.52.14 Crime Hotspots Analysis in South Korea: A User-Oriented Approach Aziz Nasridinov 1 and Young-Ho Park 2 * 1 School of Computer Engineering, Dongguk
Secure semantic based search over cloud
Volume: 2, Issue: 5, 162-167 May 2015 www.allsubjectjournal.com e-issn: 2349-4182 p-issn: 2349-5979 Impact Factor: 3.762 Sarulatha.M PG Scholar, Dept of CSE Sri Krishna College of Technology Coimbatore,
An Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining
An Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining 1 B.Sahaya Emelda and 2 Mrs. P. Maria Jesi M.E.,Ph.D., 1 PG Student and 2 Associate Professor, Department of Computer
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
A Proposed Data Mining Model to Enhance Counter- Criminal Systems with Application on National Security Crimes
A Proposed Data Mining Model to Enhance Counter- Criminal Systems with Application on National Security Crimes Dr. Nevine Makram Labib Department of Computer and Information Systems Faculty of Management
Domain Name Abuse Detection. Liming Wang
Domain Name Abuse Detection Liming Wang Outline 1 Domain Name Abuse Work Overview 2 Anti-phishing Research Work 3 Chinese Domain Similarity Detection 4 Other Abuse detection ti 5 System Information 2 Why?
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2
Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Credit Card Fraud Detection Using Self Organised Map
International Journal of Information & Computation Technology. ISSN 0974-2239 Volume 4, Number 13 (2014), pp. 1343-1348 International Research Publications House http://www. irphouse.com Credit Card Fraud
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies
Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,
Understanding Web personalization with Web Usage Mining and its Application: Recommender System
Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,
Data Mining for Digital Forensics
Digital Forensics - CS489 Sep 15, 2006 Topical Paper Mayuri Shakamuri Data Mining for Digital Forensics Introduction "Data mining is the analysis of (often large) observational data sets to find unsuspected
Data Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
AUTOMATICALLY DECEPTIVE CRIMINAL IDENTITIES
by GANG WANG, HSINCHUN CHEN, and HOMA ATABAKHSH AUTOMATICALLY DETECTING DECEPTIVE CRIMINAL IDENTITIES Fear about identity verification reached new heights since the terrorist attacks on Sept. 11, 2001,
Search engine ranking
Proceedings of the 7 th International Conference on Applied Informatics Eger, Hungary, January 28 31, 2007. Vol. 2. pp. 417 422. Search engine ranking Mária Princz Faculty of Technical Engineering, University
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Secure Collaborative Privacy In Cloud Data With Advanced Symmetric Key Block Algorithm
Secure Collaborative Privacy In Cloud Data With Advanced Symmetric Key Block Algorithm Twinkle Graf.F 1, Mrs.Prema.P 2 1 (M.E- CSE, Dhanalakshmi College of Engineering, Chennai, India) 2 (Asst. Professor
Natural Language to Relational Query by Using Parsing Compiler
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,
AN INTELLIGENT ANALYSIS OF CRIME DATA USING DATA MINING & AUTO CORRELATION MODELS
AN INTELLIGENT ANALYSIS OF CRIME DATA USING DATA MINING & AUTO CORRELATION MODELS Uttam Mande Y.Srinivas J.V.R.Murthy Dept of CSE Dept of IT Dept of CSE GITAM University GITAM University J.N.T.University
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
American Journal of Engineering Research (AJER) 2013 American Journal of Engineering Research (AJER) e-issn: 2320-0847 p-issn : 2320-0936 Volume-2, Issue-4, pp-39-43 www.ajer.us Research Paper Open Access
NATIONAL SECURITY CRITICAL MISSION AREAS AND CASE STUDIES
43 Chapter 4 NATIONAL SECURITY CRITICAL MISSION AREAS AND CASE STUDIES Chapter Overview This chapter provides an overview for the next six chapters. Based on research conducted at the University of Arizona
Moscow subway cars to have CCTV
www.breaking News English.com Ready-to-use ESL / EFL Lessons Moscow subway cars to have CCTV URL: http://www.breakingnewsenglish.com/0507/050719-moscow.html Today s contents The Article 2 Warm-ups 3 Before
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
A survey on Data Mining based Intrusion Detection Systems
International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion
How To Detect Denial Of Service Attack On A Network With A Network Traffic Characterization Scheme
Efficient Detection for DOS Attacks by Multivariate Correlation Analysis and Trace Back Method for Prevention Thivya. T 1, Karthika.M 2 Student, Department of computer science and engineering, Dhanalakshmi
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Modeling Suspicious Email Detection Using Enhanced Feature Selection
Modeling Suspicious Email Detection Using Enhanced Feature Selection Sarwat Nizamani, Nasrullah Memon, Uffe Kock Wiil, and Panagiotis Karampelas Abstract The paper presents a suspicious email detection
Secure Data transfer in Cloud Storage Systems using Dynamic Tokens.
Secure Data transfer in Cloud Storage Systems using Dynamic Tokens. P.Srinivas *,K. Rajesh Kumar # M.Tech Student (CSE), Assoc. Professor *Department of Computer Science (CSE), Swarnandhra College of Engineering
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Search Engines. Stephen Shaw <[email protected]> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
A Comparative Approach to Search Engine Ranking Strategies
26 A Comparative Approach to Search Engine Ranking Strategies Dharminder Singh 1, Ashwani Sethi 2 Guru Gobind Singh Collage of Engineering & Technology Guru Kashi University Talwandi Sabo, Bathinda, Punjab
Data Mining Approach For Subscription-Fraud. Detection in Telecommunication Sector
Contemporary Engineering Sciences, Vol. 7, 2014, no. 11, 515-522 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ces.2014.4431 Data Mining Approach For Subscription-Fraud Detection in Telecommunication
FEATURE SPECIFIC CRIMINAL MAPPING USING DATA MINING TECHNIQUES AND GENERALIZED GAUSSIUN MIXTURE MODEL
FEATURE SPECIFIC CRIMINAL MAPPING USING DATA MINING TECHNIQUES AND GENERALIZED GAUSSIUN MIXTURE MODEL Uttam Mande Y.Srinivas J.V.R.Murthy Dept of CSE Dept of IT Dept of CSE GITAM University GITAM University
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
A Statistical Text Mining Method for Patent Analysis
A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, [email protected] Abstract Most text data from diverse document databases are unsuitable for analytical
Search Engine Based Intelligent Help Desk System: iassist
Search Engine Based Intelligent Help Desk System: iassist Sahil K. Shah, Prof. Sheetal A. Takale Information Technology Department VPCOE, Baramati, Maharashtra, India [email protected], [email protected]
International Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Performance of
Data Mining System, Functionalities and Applications: A Radical Review
Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially
Bisecting K-Means for Clustering Web Log data
Bisecting K-Means for Clustering Web Log data Ruchika R. Patil Department of Computer Technology YCCE Nagpur, India Amreen Khan Department of Computer Technology YCCE Nagpur, India ABSTRACT Web usage mining
Moscow subway cars to have CCTV
www.breaking News English.com Ready-to-use ESL / EFL Lessons Moscow subway cars to have CCTV URL: http://www.breakingnewsenglish.com/0507/050719-moscow-e.html Today s contents The Article 2 Warm-ups 3
SIGNATURE VERIFICATION
SIGNATURE VERIFICATION Dr. H.B.Kekre, Dr. Dhirendra Mishra, Ms. Shilpa Buddhadev, Ms. Bhagyashree Mall, Mr. Gaurav Jangid, Ms. Nikita Lakhotia Computer engineering Department, MPSTME, NMIMS University
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Big Data On Terrorist Attacks: An Analysis Using The Ensemble Classifier Approach
International Conference on Inter Disciplinary Research in Engineering and Technology [ICIDRET] 255 International Conference on Inter Disciplinary Research in Engineering and Technology [ICIDRET] ISBN
Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
Keywords - Intrusion Detection System, Intrusion Prevention System, Artificial Neural Network, Multi Layer Perceptron, SYN_FLOOD, PING_FLOOD, JPCap
Intelligent Monitoring System A network based IDS SONALI M. TIDKE, Dept. of Computer Science and Engineering, Shreeyash College of Engineering and Technology, Aurangabad (MS), India Abstract Network security
Sentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
A Survey on Product Aspect Ranking
A Survey on Product Aspect Ranking Charushila Patil 1, Prof. P. M. Chawan 2, Priyamvada Chauhan 3, Sonali Wankhede 4 M. Tech Student, Department of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra,
A Novel Cryptographic Key Generation Method Using Image Features
Research Journal of Information Technology 4(2): 88-92, 2012 ISSN: 2041-3114 Maxwell Scientific Organization, 2012 Submitted: April 18, 2012 Accepted: May 23, 2012 Published: June 30, 2012 A Novel Cryptographic
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Profile Based Personalized Web Search and Download Blocker
Profile Based Personalized Web Search and Download Blocker 1 K.Sheeba, 2 G.Kalaiarasi Dhanalakshmi Srinivasan College of Engineering and Technology, Mamallapuram, Chennai, Tamil nadu, India Email: 1 [email protected],
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS
PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, [email protected]; Third C.
Neural Networks in Data Mining
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V6 PP 01-06 www.iosrjen.org Neural Networks in Data Mining Ripundeep Singh Gill, Ashima Department
Fraud Detection in Online Reviews using Machine Learning Techniques
ISSN (e): 2250 3005 Volume, 05 Issue, 05 May 2015 International Journal of Computational Engineering Research (IJCER) Fraud Detection in Online Reviews using Machine Learning Techniques Kolli Shivagangadhar,
Secure Keyword Based Search in Cloud Computing: A Review
Secure Keyword Based Search in Cloud Computing: A Review Manjeet Gupta 1,Sonia Sindhu 2 1 Assistant Professor, Department of Computer Science & Engineering,Seth Jai Prakash Mukund Lal Institute of Engineering
Evaluation of Feature Selection Methods for Predictive Modeling Using Neural Networks in Credits Scoring
714 Evaluation of Feature election Methods for Predictive Modeling Using Neural Networks in Credits coring Raghavendra B. K. Dr. M.G.R. Educational and Research Institute, Chennai-95 Email: [email protected]
Self Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, [email protected]
Distributed Attribute Based Encryption for Patient Health Record Security under Clouds
Distributed Attribute Based Encryption for Patient Health Record Security under Clouds SHILPA ELSA ABRAHAM II ME (CSE) Nandha Engineering College Erode Abstract-Patient Health Records (PHR) is maintained
2.1. Data Mining for Biomedical and DNA data analysis
Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: [email protected]) Dr. G.N. Singh Department of Physics and
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework
Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 [email protected] 2 [email protected] Abstract A vast amount of assorted
Ranked Keyword Search Using RSE over Outsourced Cloud Data
Ranked Keyword Search Using RSE over Outsourced Cloud Data Payal Akriti 1, Ms. Preetha Mary Ann 2, D.Sarvanan 3 1 Final Year MCA, Sathyabama University, Tamilnadu, India 2&3 Assistant Professor, Sathyabama
TF-IDF. David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt
TF-IDF David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture6-tfidf.ppt Administrative Homework 3 available soon Assignment 2 available soon Popular media article
K-means Clustering Technique on Search Engine Dataset using Data Mining Tool
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 505-510 International Research Publications House http://www. irphouse.com /ijict.htm K-means
An intelligent Analysis of a City Crime Data Using Data Mining
2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore An intelligent Analysis of a City Crime Data Using Data Mining Malathi. A 1,
Keywords social media, internet, data, sentiment analysis, opinion mining, business
Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Real time Extraction
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS
ANALYSIS OF FEATURE SELECTION WITH CLASSFICATION: BREAST CANCER DATASETS Abstract D.Lavanya * Department of Computer Science, Sri Padmavathi Mahila University Tirupati, Andhra Pradesh, 517501, India [email protected]
An Efficient Data Security in Cloud Computing Using the RSA Encryption Process Algorithm
An Efficient Data Security in Cloud Computing Using the RSA Encryption Process Algorithm V.Masthanamma 1,G.Lakshmi Preya 2 UG Scholar, Department of Information Technology, Saveetha School of Engineering
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product
Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product Sagarika Prusty Web Data Mining (ECT 584),Spring 2013 DePaul University,Chicago [email protected] Keywords:
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
Ontology based ranking of documents using Graph Databases: a Big Data Approach
Ontology based ranking of documents using Graph Databases: a Big Data Approach A.M.Abirami Dept. of Information Technology Thiagarajar College of Engineering Madurai, Tamil Nadu, India Dr.A.Askarunisa
Improving data integrity on cloud storage services
International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 Volume 2 Issue 2 ǁ February. 2013 ǁ PP.49-55 Improving data integrity on cloud storage services
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
Data Mining Project Report. Document Clustering. Meryem Uzun-Per
Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification
Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification Tina R. Patil, Mrs. S. S. Sherekar Sant Gadgebaba Amravati University, Amravati [email protected], [email protected]
Insider Threat Detection Using Graph-Based Approaches
Cybersecurity Applications & Technology Conference For Homeland Security Insider Threat Detection Using Graph-Based Approaches William Eberle Tennessee Technological University [email protected] Lawrence
INFORMATION SECURITY RISK ASSESSMENT UNDER UNCERTAINTY USING DYNAMIC BAYESIAN NETWORKS
INFORMATION SECURITY RISK ASSESSMENT UNDER UNCERTAINTY USING DYNAMIC BAYESIAN NETWORKS R. Sarala 1, M.Kayalvizhi 2, G.Zayaraz 3 1 Associate Professor, Computer Science and Engineering, Pondicherry Engineering
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION
INVESTIGATIONS INTO EFFECTIVENESS OF AND CLASSIFIERS FOR SPAM DETECTION Upasna Attri C.S.E. Department, DAV Institute of Engineering and Technology, Jalandhar (India) [email protected] Harpreet Kaur
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100
Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three
Movie Classification Using k-means and Hierarchical Clustering
Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India [email protected] Saheb Motiani
Predicting the Stock Market with News Articles
Predicting the Stock Market with News Articles Kari Lee and Ryan Timmons CS224N Final Project Introduction Stock market prediction is an area of extreme importance to an entire industry. Stock price is
Search Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
Smart Security by Predicting Future Crime with GIS and LBS Technology on Mobile Device
Smart Security by Predicting Future Crime with GIS and LBS Technology on Mobile Device Gaurav Kumar 1, P. S. Game 2 1 Pune Institute of Computer Technology, Savitribai Phule Pune University, Pune, India
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
