Cloud Storage-based Intelligent Document Archiving for the Management of Big Data
Keedong Yoo
Dept. of Management Information Systems, Dankook University, Cheonan, Republic of Korea

Abstract: Cloud storage for the centralized management of organizational big data is attracting considerable interest because of its benefits in managing and securing information resources. However, a cloud storage-based centralized repository also has usability problems: it is difficult to determine the proper category in which to store a working document, and retrieving a document is complicated. This paper proposes a methodology that resolves these problems by automating two processes: identifying the topic of a working document, and storing the document under the matching topic-based category of the cloud storage-based central repository. The user does not have to define a title for a working document; the document is stored automatically under the identified topic-based category in the central repository. To demonstrate the validity of the proposed concepts, a prototype system that provides automatic topic identification, automatic category searching, and automatic archiving is implemented.

Keywords: Document centralization; Intelligent archiving; Automatic topic identification; Cloud storage

I. INTRODUCTION

Centralized management of documents, or document centralization, is emerging as an indispensable means of strategically securing and utilizing organizational intellectual assets. The current concept of Enterprise Content Management (ECM), which covers the strategies, methods, and tools used to capture, manage, store, preserve, and deliver content and documents related to organizational processes, is a typical example of centralizing resources and efforts to manage organizational documents and content. An organization's intellectual assets include not only business-related documents and content but also business processes and information technologies.
Therefore, through document centralization, organizations can expect to apply information resources efficiently by allocating them to the proper processes and tasks. Internal resources can also be managed with a high level of security; for these reasons many companies are now trying to establish robust and scalable systems for document centralization.

Organizational documents can be centralized over a network infrastructure, and the most commonly applied network technology is cloud computing. Among cloud computing technologies, cloud storage forms the repository that stores the transmitted documents. Cloud storage, one of the most widely known cloud computing technologies, provides Internet-based data storage as a service. One of its biggest merits is that users can access data in a cloud anytime and anywhere, using any device [5]. Typical examples of cloud storage services are Amazon S3, Mosso, Wuala, and ucloud. All of these services offer users clean and simple storage interfaces, hiding the details of the actual location and management of resources [8]. Once a document has been stored in cloud storage, users can access and download it anytime and anywhere, provided that access rights have been granted. Because of this advantage in utilizing organizational information resources, more companies and organizations are implementing online storage in a cloud computing environment.

[... and Applied Computing (ICIEACS 2013), Bangkok, Thailand, April 6-7, 2013]

While cloud storage delivers various benefits, it also has quite a few technical limitations in network security and privacy [9]. From the viewpoint of usability, many users also point out a serious problem: the difficulty of storing and retrieving documents. To store a working document under one of the categories provided by the cloud storage, a user has to determine the category that exactly matches the contents of the document. Since the categories are numerous and the overall category structure is complicated, determining the proper category is not easy. When retrieving a document, the user has to spend considerable time locating the file because too many categories exist.

The choice of category can be assisted by analyzing the contents of the document with respect to the categories defined in the cloud storage. Since the keywords or topics extracted from the document indicate the name of the category under which the document should be stored, users can easily achieve their goal.
In retrieving a document from the storage, searching becomes faster and more accurate because each document has been archived in a topic-based category.

This research aims to enhance the usability of a cloud storage-based central repository by automatically archiving working documents according to their automatically identified topics or keywords. To do so, it proposes a methodology that extracts the predefined category-specific keywords (or topics) of a working document by applying a text mining algorithm. Based on the extracted keywords, documents can be stored automatically in the categories of the cloud storage. To demonstrate the validity of the proposed concepts, a prototype system that provides automatic topic identification, automatic category searching, and automatic archiving is implemented.

II. PROPOSED METHODOLOGY

As Fig. 1 illustrates, automating the whole process of cloud storage-based archiving requires an additional process that automatically identifies the topics (keywords) of the working document. The tasks in the dotted ellipse perform automated topic identification, and they must be processed sequentially. Once the topic of the given working document is identified, the document can be stored automatically in cloud storage together with the topic. Automatic document archiving is completed by code that searches for the directory corresponding to the topic and saves the document with the topic. When the destination directory has been determined, the system may send a message asking the user to confirm that the directory is valid. The user may change the directory as he or she intends; in that case the system stores the document in the location the user designated, using simple agent programming. The specific role of each task is as follows.

Figure 1. Conceptual Framework of the Proposed Methodology

A. File Converter

A file converter changes the format of a working document (one of .doc, .xls, or .html) into an analyzable one (.txt) so that the following module can read the contents. It acts as a file format filter that brings input documents into a unified format (.txt in this research).

B. Word Stemming Module

To standardize the words in the document, unnecessary or redundant parts of each word must be eliminated. A stem, in linguistics, is the combination of the basic form of a word (called the root) plus any derivational morphemes, excluding inflectional elements. In other words, the stem is the form of the word to which inflectional morphemes can be added, if applicable. For example, the root of the English verb form destabilized is stabil- (an alternate form of stable); the stem is de-stabil-ize, which includes the derivational affixes de- and -ize but not the inflectional past tense suffix -(e)d.

C. Word Vector Tool

Based on the word stems from the word stemming module, the word vector tool transforms the word stems into a vector. The vector is computed with TF/IDF (Term Frequency/Inverse Document Frequency), a statistical technique used to evaluate how important a word is to a document.
The importance increases in proportion to the number of times a word appears in the document, but it is offset by how common the word is across all of the documents in the collection, or corpus. A high TF/IDF weight is reached by a high term frequency in the given document combined with a low document frequency of the term in the whole collection; the weight therefore tends to filter out common terms. The word with the highest TF/IDF can be regarded as a keyword. To determine the keyword of a given document, however, the TF/IDF value of every meaningful term (stem) usually has to be calculated.
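The weighting described above can be illustrated with a minimal sketch. The paper does not state which TF/IDF variant the word vector tool uses, so this example assumes relative term frequency multiplied by the natural logarithm of (number of documents / document frequency). The function names and the tiny corpus of stems are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (already stemmed, stop words removed).
    Assumed variant: (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def top_keyword(weights):
    """Pick the term with the highest TF/IDF weight in one document."""
    return max(weights, key=weights.get)

# Hypothetical corpus of stems, for illustration only.
corpus = [
    ["confer", "paper", "submit", "confer", "travel"],
    ["lectur", "paper", "grade", "student"],
    ["budget", "paper", "travel", "approv"],
]
scores = tf_idf(corpus)
print(top_keyword(scores[0]))
```

In this toy corpus the stem "paper" occurs in every document, so its weight is zero, which is the filtering effect on common terms that the text describes.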
D. Classifier

A classifier extracts the resultant keywords by projecting the word vectors of the target document onto the vector spaces obtained by training on a corpus. Here a corpus is a set of predefined directories, each of which holds many related example documents. To train the classifier on the constructed corpus, sample documents can be collected by browsing conventional Web pages. Because conventional Web pages are already labeled with corresponding keywords as titles, the title of each document can, in a sense, be regarded as already formalized [4]. This research deploys an SVM-based classifier because the SVM has been shown to outperform other similar text mining algorithms applicable to topic identification [1, 6]. The SVM determines the keyword of a document by placing the word vectors in the vector space R^n (n: number of dimensions) and comparing the kernel functions of each document. The accuracy of the SVM has been verified to be very satisfactory: if the prediction model has been trained sufficiently, the SVM yields highly accurate results. Compared with manual classification, the accuracy of SVM-based classification was reported to be over 90% [3].

III. PROTOTYPE IMPLEMENTATION

A. Overview

To check whether the proposed methodology yields correct results, a prototype system is implemented that intelligently stores working documents in the cloud storage-based repository under the automatically identified topic-specific category. The prototype analyzes a working document and extracts its topic in real time when a user finishes writing and tries to store the document. It indexes the document by tagging the identified topic with the user's ID and the time, and then transmits the document to the cloud storage. A dialogue between the user and the prototype confirms the correctness of the identified topic.
If the user accepts the recommended topic, the prototype transmits the file to the cloud storage with the tag information, which completes automatic archiving. Fig. 2 shows the sequence of functions provided by the prototype.

Figure 2. Sequence Diagram of the Prototype System
The prototype system has been implemented using JDK v1.5.0_06 under the Java 2 runtime environment. The Stemmer and WV Tool submodules have been implemented using, respectively, the word stemming tool and the vector creating tool of Yale, an open source environment for KDD (Knowledge Discovery and Data mining) and machine learning [7]. The SVM module, as the classifier, deploys LibSVM v2.81 [2]. The prototype processes documents related to activities in a university context; therefore, for convenience of implementation and testing, the categories of the cloud storage have been obtained by simplifying the University Ontology defined by the Department of Computer Science of the University of Maryland.

B. Example: A Document on Conference Participation

To explain in detail how the prototype system works, a document concerning conference participation is used as an example. The prototype begins by converting a document in .doc, .xls, or .html format into the analyzable .txt format. Fig. 3 shows the example document in .txt format.

Figure 3. Example Document based on .txt format

To extract the topic of the given document, the words in the document must be refined so that only the meaningful part of each word is used as input. Meaningless words, or stop words, must be eliminated in advance, and the stem of each meaningful word must be separated. Fig. 4 shows the stems produced by the stemming module.
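The refinement step described above (dropping stop words, then reducing each remaining word to a stem) can be sketched as follows. This is not the Yale stemming tool used by the prototype: the stop-word list and the suffix list are hypothetical and deliberately tiny, and the suffix stripper is a naive stand-in rather than a real stemming algorithm.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for",
              "is", "will", "be"}

# Hypothetical suffix list, checked in this order; NOT the Porter algorithm.
SUFFIXES = ("ation", "ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("The papers submitted to the conference will be presented in Bangkok"))
```

A naive stripper like this produces rough stems such as "submitt"; that is acceptable for illustration because only consistency between training and prediction matters, but a production system would use an established stemmer.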
Figure 4. Word Stems in Example Document

Based on the meaningful stems and the predefined categories, the vector of the example document can be calculated. Because only 9 categories are selected and considered in this research, the resultant vector is relatively simple compared with the results of previous research. This simplicity results from the simple structure of the predefined categories; however, it poses no problem in demonstrating the performance of topic identification, because a number of previous studies also reduced the number of predefined categories for ease of training and prediction. In this research, the categories of the University Ontology have been modified so that the overall set of categories is consistent and compatible. Fig. 5 shows the specific categories used in this research and the resultant vector.

Figure 5. Predefined Category & Resultant Vector of Example Document

The SVM module, the classifier, projects the vector of the document onto the 9-dimensional vector space and determines which category topic best represents the contents of the document. Before performing actual categorization, the SVM module must be trained using a sufficient number of sample documents already assigned to each category. In this research, 80 to 90 sample documents per category were used to train the SVM module, and the accuracy of category prediction, estimated automatically by LibSVM, is 92.5% (MSE=1.025, SCC= ), which means that 37 out of 40 example documents are correctly classified. Fig. 6 shows the resultant category number provided by the SVM module; the number 2.0 denotes the third category, Conference.
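The two numbers reported above can be checked with a short worked example: the accuracy figure follows from 37 correct predictions out of 40, and the label 2.0 maps to the third category when categories are numbered from zero. Only the name of category 2 is given in the text, so the other category names below are placeholders, and both helper functions are hypothetical.

```python
def accuracy(correct, total):
    """Share of test documents assigned to their true category, in percent."""
    return 100.0 * correct / total

def category_index(svm_label):
    """Convert a label printed as a float string, such as '2.0', to an index."""
    return int(float(svm_label))

# Placeholder names: the source only states that index 2 is "Conference".
categories = [f"category_{i}" for i in range(9)]
categories[2] = "Conference"

print(accuracy(37, 40))                    # 92.5
print(categories[category_index("2.0")])   # Conference
```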
Figure 6. Resultant Category (Category 2.0 Conference)

Since the identified topic denotes the target category under which the document is to be archived, it must be tagged onto the document. To keep different documents that are tagged with the same topic from overwriting one another, the user's ID and the time at which the document was completed need to be tagged together with the topic; this takes only simple code.

After indexing with the tag information is finished, the target category in which to store the document must be determined. Although only 9 categories were selected to determine the topic of the document, the University Ontology originally consists of 30 entities, so the cloud storage of the central repository is set to have 30 categories. The document is therefore stored under one of 30 categories. A hash function is suitable for this job.
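The tagging and hashing steps described above might look like the following sketch. The prototype's own code is not available here, so everything in this example is an assumption: the tag layout, the timestamp format, the user ID, and the choice of SHA-256 reduced modulo 30. A digest from hashlib is used rather than Python's built-in hash(), because the built-in is salted per process for strings and would not give a stable slot across runs.

```python
import hashlib
from datetime import datetime

NUM_CATEGORIES = 30  # size of the category set stated in the text

def make_tag(topic, user_id, finished_at):
    """Build a topic-id-date tag so two documents on one topic stay distinct."""
    return f"{topic}-{user_id}-{finished_at:%Y%m%d%H%M%S}"

def category_slot(topic, num_categories=NUM_CATEGORIES):
    """Map a topic name to a stable slot number in [0, num_categories)."""
    digest = hashlib.sha256(topic.lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % num_categories

# "user42" is a hypothetical user ID.
tag = make_tag("Conference", "user42", datetime(2013, 4, 6, 9, 30, 0))
print(tag)                          # Conference-user42-20130406093000
print(category_slot("Conference"))  # a slot in 0..29, identical on every run
```

Two distinct topics can hash to the same slot, so a real system would either keep an explicit topic-to-directory table or resolve collisions; the sketch only shows the lookup idea.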
Once the target category has been determined, the document must be saved under that category with a file name of the form topic-id-date.

During automatic archiving, the automatically identified topic must be confirmed by the user so that the document is not saved under a wrong category. After automatic archiving is completed, a message must inform the user of the category in which the document has been stored. These dialogues between the system and the user can also be handled with simple code.
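The saving and confirmation steps described above can be sketched as follows, with a local temporary directory standing in for the cloud repository. The function name, its parameters, and the callback-based confirmation are all hypothetical; the prototype's actual dialogue and transfer code is not available here.

```python
import tempfile
from pathlib import Path

def archive(text, topic, tag, root, confirm=lambda topic: topic):
    """Store text as <root>/<category>/<tag>.txt and return the path.

    confirm receives the proposed topic and returns the category to use,
    so a caller can accept the suggestion or substitute another directory.
    """
    category = confirm(topic)
    target_dir = Path(root) / category
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{tag}.txt"
    target.write_text(text, encoding="utf-8")
    return target

with tempfile.TemporaryDirectory() as root:  # stand-in for the cloud repository
    saved = archive("Call for papers ...", "Conference",
                    "Conference-user42-20130406", root)
    # Closing message telling the user where the document ended up.
    print(f"Stored under category '{saved.parent.name}' as {saved.name}")
```

Passing the confirmation as a callback keeps the storage logic independent of the user interface: an interactive client could supply a function that prompts the user, while a batch job could keep the default and accept every suggestion.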
IV. CONCLUDING REMARKS

The benefits of using cloud storage in companies and organizations are substantial, because it promotes effective and efficient sharing of organizational information and knowledge regardless of time and place. If usability issues around cloud computing are not resolved in practice, however, both the benefits and the interest may fade. This research addresses one such usability issue by suggesting practical guidance that relieves the user's burden of selecting directories in cloud storage. The proposed methodology, which identifies the topics of working documents and stores the documents according to the identified topics in an automated manner, can contribute to higher productivity and convenience at work. Companies can also expect more concentrated management of organizational information and knowledge through the proposed concepts, because more accurate and secure processing of the organizational document archive is ensured.

This research needs further study so that the proposed methodology can be applied to mobile devices such as smartphones and smartpads, which are essential items for current users. To meet this requirement, wireless-communication-oriented networking protocols must also be considered. In addition, a formal corpus needs to be developed to improve the performance of topic identification, because the accuracy of text mining depends mainly on the result of training on the corpus. Since the corpus may have the same structure as the directory of the cloud storage, this adjustment can also strengthen the practical application of automatic document archiving.

REFERENCES

[1] Basu, A., Watters, C., & Shepherd, M., Support Vector Machines for Text Categorization, Proceedings of the 36th Hawaii International Conference on System Sciences, Vol. 4.
[2] Chang, C. & Lin, C., LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 3, 1-27.
[3] Hsu, C.W., Chang, C.C., & Lin, C.J., A Practical Guide to Support Vector Classification: LibSVM Tutorial.
[4] Kim, S., Suh, E., & Yoo, K., A study of context inference for Web-based information systems, Electronic Commerce Research and Applications, Vol. 6.
[5] Liu, Q., Wang, G., & Wu, J., Secure and privacy preserving keyword searching for cloud storage services, Journal of Network and Computer Applications, Vol. 35, No. 3.
[6] Meyer, D., Leisch, F., & Hornik, K., The support vector machine under test, Neurocomputing, Vol. 55.
[7] Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., & Euler, T., YALE: Rapid Prototyping for Complex Data Mining Tasks, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06).
[8] Pamies-Juarez, L., García-López, P., Sánchez-Artigas, M., & Herrera, B., Towards the design of optimal data redundancy schemes for heterogeneous cloud storage infrastructures, Computer Networks, Vol. 55.
[9] Svantesson, D. & Clarke, R., Privacy and consumer risks in cloud computing, Computer Law & Security Review, Vol. 26, 2010.
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationKEITH LEHNERT AND ERIC FRIEDRICH
MACHINE LEARNING CLASSIFICATION OF MALICIOUS NETWORK TRAFFIC KEITH LEHNERT AND ERIC FRIEDRICH 1. Introduction 1.1. Intrusion Detection Systems. In our society, information systems are everywhere. They
More informationAutomated Test Approach for Web Based Software
Automated Test Approach for Web Based Software Indrajit Pan 1, Subhamita Mukherjee 2 1 Dept. of Information Technology, RCCIIT, Kolkata 700 015, W.B., India 2 Dept. of Information Technology, Techno India,
More informationWeb Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
More informationSEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL
SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India
More informationFahad H.Alshammari, Rami Alnaqeib, M.A.Zaidan, Ali K.Hmood, B.B.Zaidan, A.A.Zaidan
WWW.JOURNALOFCOMPUTING.ORG 85 New Quantitative Study for Dissertations Repository System Fahad H.Alshammari, Rami Alnaqeib, M.A.Zaidan, Ali K.Hmood, B.B.Zaidan, A.A.Zaidan Abstract In the age of technology,
More informationEnterprise Content Management. Image from http://webbuildinginfo.com/wp-content/uploads/ecm.jpg. José Borbinha
Enterprise Content Management Image from http://webbuildinginfo.com/wp-content/uploads/ecm.jpg José Borbinha ECM? Let us start with the help of a professional organization http://www.aiim.org http://www.aiim.org/about
More informationComponent visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu
More informationTowards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis
Towards SoMEST Combining Social Media Monitoring with Event Extraction and Timeline Analysis Yue Dai, Ernest Arendarenko, Tuomo Kakkonen, Ding Liao School of Computing University of Eastern Finland {yvedai,
More informationWhite Paper Case Study: How Collaboration Platforms Support the ITIL Best Practices Standard
White Paper Case Study: How Collaboration Platforms Support the ITIL Best Practices Standard Abstract: This white paper outlines the ITIL industry best practices methodology and discusses the methods in
More informationCyber Forensic for Hadoop based Cloud System
Cyber Forensic for Hadoop based Cloud System ChaeHo Cho 1, SungHo Chin 2 and * Kwang Sik Chung 3 1 Korea National Open University graduate school Dept. of Computer Science 2 LG Electronics CTO Division
More informationViewpoint ediscovery Services
Xerox Legal Services Viewpoint ediscovery Platform Technical Brief Viewpoint ediscovery Services Viewpoint by Xerox delivers a flexible approach to ediscovery designed to help you manage your litigation,
More informationDesign for Management Information System Based on Internet of Things
Design for Management Information System Based on Internet of Things * School of Computer Science, Sichuan University of Science & Engineering, Zigong Sichuan 643000, PR China, 413789256@qq.com Abstract
More informationHow To Write A Summary Of A Review
PRODUCT REVIEW RANKING SUMMARIZATION N.P.Vadivukkarasi, Research Scholar, Department of Computer Science, Kongu Arts and Science College, Erode. Dr. B. Jayanthi M.C.A., M.Phil., Ph.D., Associate Professor,
More informationManaging e-records without an EDRMS. Linda Daniels-Lewis Senior IM Consultant Systemscope
Managing e-records without an EDRMS Linda Daniels-Lewis Senior IM Consultant Systemscope Outline The e-record What s involved in managing e-records? Where do we start? How do we classify? How do we proceed?
More informationBuilding a Question Classifier for a TREC-Style Question Answering System
Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given
More informationResearch on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
More informationCLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES
CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,
More informationHexaware E-book on Predictive Analytics
Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,
More informationCA Deliver r11.7. Business value. Product overview. Delivery approach. agility made possible
PRODUCT SHEET CA Deliver agility made possible CA Deliver r11.7 CA Deliver is an online report management system that provides you with tools to manage and reduce the cost of report distribution. Able
More informationDistributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
More informationDATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION
DATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION Hasna.R 1, S.Sangeetha 2 1 PG Scholar, Dhanalakshmi Srinivasan College of Engineering, Coimbatore. 2 Assistant Professor, Dhanalakshmi Srinivasan
More informationState of Michigan Records Management Services. Guide to E mail Storage Options
State of Michigan Records Management Services Guide to E mail Storage Options E mail is a fast, efficient and cost effective means for communicating and sharing information. However, e mail software is
More informationWeb Database Integration
Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,
More informationSelective dependable storage services for providing security in cloud computing
Selective dependable storage services for providing security in cloud computing Gade Lakshmi Thirupatamma*1, M.Jayaram*2, R.Pitchaiah*3 M.Tech Scholar, Dept of CSE, UCET, Medikondur, Dist: Guntur, AP,
More informationInternational Journal of Engineering Research-Online A Peer Reviewed International Journal Articles available online http://www.ijoer.
REVIEW ARTICLE ISSN: 2321-7758 UPS EFFICIENT SEARCH ENGINE BASED ON WEB-SNIPPET HIERARCHICAL CLUSTERING MS.MANISHA DESHMUKH, PROF. UMESH KULKARNI Department of Computer Engineering, ARMIET, Department
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationSentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
More informationAn Automated Workflow System Geared Towards Consumer Goods and Services Companies
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 An Automated Workflow System Geared Towards Consumer Goods and Services
More informationDissecting the Learning Behaviors in Hacker Forums
Dissecting the Learning Behaviors in Hacker Forums Alex Tsang Xiong Zhang Wei Thoo Yue Department of Information Systems, City University of Hong Kong, Hong Kong inuki.zx@gmail.com, xionzhang3@student.cityu.edu.hk,
More informationCOURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
More informationSVM Ensemble Model for Investment Prediction
19 SVM Ensemble Model for Investment Prediction Chandra J, Assistant Professor, Department of Computer Science, Christ University, Bangalore Siji T. Mathew, Research Scholar, Christ University, Dept of
More informationForecasting stock markets with Twitter
Forecasting stock markets with Twitter Argimiro Arratia argimiro@lsi.upc.edu Joint work with Marta Arias and Ramón Xuriguera To appear in: ACM Transactions on Intelligent Systems and Technology, 2013,
More informationTechnical. Overview. ~ a ~ irods version 4.x
Technical Overview ~ a ~ irods version 4.x The integrated Ru e-oriented DATA System irods is open-source, data management software that lets users: access, manage, and share data across any type or number
More informationA Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationElegantJ BI. White Paper. The Enterprise Option Reporting Tools vs. Business Intelligence
ElegantJ BI White Paper The Enterprise Option Integrated Business Intelligence and Reporting for Performance Management, Operational Business Intelligence and Data Management www.elegantjbi.com ELEGANTJ
More informationSYNTHETIC DATA GENERATION CAPABILTIES FOR TESTING DATA MINING TOOLS. Rui Xiao University of California, Riverside djeske@ucr.edu
SYNTHETIC DATA GENERATION CAPABILTIES FOR TESTING DATA MINING TOOLS Daniel R. Jeske Pengyue J. Lin Carlos Rendón Rui Xiao University of California, Riverside djeske@ucr.edu Behrokh Samadi Lucent Technologies
More informationThe multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
More informationJournal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392. Research Article. E-commerce recommendation system on cloud computing
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 E-commerce recommendation system on cloud computing
More informationSearch Taxonomy. Web Search. Search Engine Optimization. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Retrieval models Older models» Boolean retrieval» Vector Space model Probabilistic Models» BM25» Language models Web search» Learning to Rank Search Taxonomy!
More informationSoftware Configuration Management Plan
For Database Applications Document ID: Version: 2.0c Planning Installation & Acceptance Integration & Test Requirements Definition Design Development 1 / 22 Copyright 2000-2005 Digital Publications LLC.
More informationCourse 103402 MIS. Foundations of Business Intelligence
Oman College of Management and Technology Course 103402 MIS Topic 5 Foundations of Business Intelligence CS/MIS Department Organizing Data in a Traditional File Environment File organization concepts Database:
More informationData Mining for Successful Healthcare Organizations
Data Mining for Successful Healthcare Organizations For successful healthcare organizations, it is important to empower the management and staff with data warehousing-based critical thinking and knowledge
More informationEfficiently Managing Firewall Conflicting Policies
Efficiently Managing Firewall Conflicting Policies 1 K.Raghavendra swamy, 2 B.Prashant 1 Final M Tech Student, 2 Associate professor, Dept of Computer Science and Engineering 12, Eluru College of Engineeering
More informationCustomer Classification And Prediction Based On Data Mining Technique
Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor
More informationIncreasing Marketing ROI with Optimized Prediction
Increasing Marketing ROI with Optimized Prediction Yottamine s Unique and Powerful Solution Smart marketers are using predictive analytics to make the best offer to the best customer for the least cost.
More informationVolume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
More informationA Framework for Data Migration between Various Types of Relational Database Management Systems
A Framework for Data Migration between Various Types of Relational Database Management Systems Ahlam Mohammad Al Balushi Sultanate of Oman, International Maritime College Oman ABSTRACT Data Migration is
More informationA Framework of User-Driven Data Analytics in the Cloud for Course Management
A Framework of User-Driven Data Analytics in the Cloud for Course Management Jie ZHANG 1, William Chandra TJHI 2, Bu Sung LEE 1, Kee Khoon LEE 2, Julita VASSILEVA 3 & Chee Kit LOOI 4 1 School of Computer
More informationCollaboration. Michael McCabe Information Architect mmccabe@gig-werks.com. black and white solutions for a grey world
Collaboration Michael McCabe Information Architect mmccabe@gig-werks.com black and white solutions for a grey world Slide Deck & Webcast Recording links Questions and Answers We will answer questions at
More information