# Data Mining With Big Data Image De-Duplication In Social Networking Websites

Save this PDF as:

Size: px
Start display at page:

Download "Data Mining With Big Data Image De-Duplication In Social Networking Websites"

## Transcription

2 Set similarity. The distance measure between two images is computed as the similarity of Sets and which is defined as the ratio of the number of elements in the intersection over the union: This similarity measure is used by text search engines to detect near-duplicate text documents. In NDID, the method was used.the efficient algorithm for retrieving near duplicate documents, called min-hash.. Weighted set similarity The set similarity measure assumes that all words are equally important.here we extend the definition of similarity to sets of words with differing importance. Let dw 0 be an importance of a visual word Xw. The similarity of two sets and is called sketches. Identical sketches are then efficiently found using a hash table. Documents with at least h identical sketches (sketch hits) are considered as possible near duplicate candidates and their similarity is then estimated using all available min-hashes 3.1 min-hash algorithm. A number of random hash functions is given assigning a real number to each visual word. Let Xa and Xb be different words from the vocabulary. The random hash functions have to satisfy two conditions: f j(xa) f j(xb) and The functions f j also have to be independent. For small vocabularies,the hash functions can be implemented as a look up table, where each element of the table is generated by a random sample from Un(0,1). Note that each function f j infers an ordering on the set of visual words Xa <j Xb iff f j(xa) < f j(xb). We define a min-hash as a smallest element of a set Ai under ordering induced by function f j Histogram intersection. Let ti be a vector of size where each coordinate is the number of visual words Xw present in the i-th document. The histogram intersection measure is defined as This measure can be also extended using word weightings to give: 3. Min Hash Background we describe how a method originally developed for text nearduplicate detection is adopted to near-duplicate detection of images. Two documents are near duplicate if the similarity sims is higher than a given threshold. The goal is to retrieve all documents in the database that are similar to a query document. This section reviews an efficient randomized hashing based procedure that retrieves near duplicate documents in time proportional to the number of near duplicate documents. The outline of the algorithm is as follows: First a list of min-hashes are extracted from each document. A min-hash is a single number having the property that two sets and have the same value of min-hash with probability equal to their similarity sims. For efficient retrieval the min-hashes are grouped into n-tuples For each documentai and each hash function f j the min- Hashes m(, f j) are recorded. The method is based on the fact, which we show later on, that the probability of m(, f j)= m(, f j) is To estimate sims (, ), N independent hash functions f j are used. Let l be the number of how many times m(, f j) = m(a2, f j). Then, l follows the binomial distribution Bi(N,sims(A1, )).The maximum likelihood estimate of sims (, ) is l/n. 3.2 How does it work? let Since f j is a random hash function, each element of has the same probability of being the least element. Therefore, we can think of X as being drawn at random from. If X is an element of both and, i.e. X ε \, then m(, f j) = m(, f j) = X.Otherwise either X ε \ and X = m(, f j) m(, f j); or X ε \ and m(, f j) m(, f j) = X. The equation (5) states that X is drawn from elements at random and the equality of min-hashes occurs in cases. 4.Experimental Results We demonstrate our method for NDID on two data sets: the TrecVid 2006 data set and the University of Kentucky data set. 4.1TrecVid 2006 TrecVid [21] database consists of 146,588 JPEG keyframes automatically pre-selected from 165 hours (17.8M frames, 127 GB) of MPEG-1 news footage, recorded from different TV stations from around the world. Each frame is at a 16

3 resolution of pixels and normally of quite low quality. Figure 1 displays the number of sketch hits plotted against the similarity measures of the colliding documents. a vocabulary of 64K visual words, N =192 min-hashes, sketch size n=3, and k =64 number of sketches were used. For document pairs with high value of the similarity measure, the number of hits is roughly equal for sims and simw and slightly higher for simh. This means that about the same number of near duplicate images will be recovered by the first two methods and the histogram intersection detects slightly higher number of near duplicates. The detected near duplicate results appear similar after visual inspection and no significant discrepancy can be observed between the results of the methods. However, for document pairs with low similarity (pairs that are of no interest) using simw and simh similarity significantly reduces the number of sketch hits. In the standard version of the algorithm, even uninformative visual words that are common to many images are equally likely to become a min-hash. When this happens, a large number of images is represented by the same frequent min-hash. In the proposed approach, common visual words are down-weighted by a low value of idf. As a result, a lower number of sketch collisions of documents with low similarity is observed. The average number of documents examined per query is 8.5, 7.1, and 7.7 for sims, simw, and simh respectively. Compare this to 43,997.3 of considered documents. using tf-idf inverted file retrieval using a vocabulary of the same size. 4.2 University of Kentucky database This database contains 10,200 images in sets of 4 images of one object / scene. Querying the database with each image should return three more examples. This is used to score the retrieval by the average number of correctly returned images in top four results (the query image is to be retrieved too). We are probing lower values of the similarity measures due to larger variations between images of the same scene in this data set. Therefore more min-hashes and more sketches have to be recorded. we varied several parameters of the method: the size of the vocabulary (30k and 100k), the number of independent random hash functions, and the number of hashed sketches. The number of min-hashes per sketch was set to n = 2. The average number of documents considered (the average number of sketch hits)3 and the average number of correctly retrieved images in the top 4 ranked images were recorded. The results consistently show that the number of sketch hits is significantly decreased while the retrieval score is improved when the idf-weighting is used. The results are further improved when the histogram intersection is used Some example queries and results are shown in figure 2. It can be seen on the results, that sims often retrieves images based on the object background. The background is repeated on many images and is down-weighted by both simw and simh idf weighting. For comparison, the number of considered documents using standard tf-idf retrieval with inverted files would be 10,089.9 and 9,659.4 for vocabulary sizes 30k and 100k respectively. We are not trying to compete with image or specific object retrieval. The method is designed to find images with high similarity by trying out only a few possibilities. This database is too small to highlight the advantages of rapid retrieval and reduced image representation. Despite this, the scores for the histogram intersection similarity measure simh exceed the score of 3.16 for flat tf-idf. 6 ACKNOWLEDGMENTS This research was supported by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning ( ) and MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program(nipa-2013-h ) supervised by the NIPA (National IT Industry Promotion Agency). 17

4 18

5 7. Conclusions We have proposed two novel similarity measures whose retrieval performance is approaching the well established tfidf weighting scheme for image / particular object re- trieval. We show that pairs of images with high values of similarity can be efficiently (in time proportional to the number of retrieved images) retrieved using the min-hash algorithm. We have shown experimental evidence that the idf word weighting improves both the search efficiency and the quality of the results. The weighted histogram intersection is the best similarity measure (out of the three examined) in both retrieval quality and search efficiency. Promising results on the retrieval database encourage the use of the hashing scheme beyond near duplicate detection, for example in clustering of large database of images 8.REFERENCE [1] R. Ahmed and G. Karypis, Algorithms for Mining the Evolution of Conserved Relational States in Dynamic Networks, Knowledge and Information Systems, vol. 33, no. 3, pp , Dec [2] M.H. Alam, J.W. Ha, and S.K. Lee, Novel Approaches to Crawling Important Pages Early, Knowledge andinformation Systems, vol. 33, no. 3, pp , Dec [3] S. Aral and D. Walker, Identifying Influential and Susceptible Members of Social Networks, Science, vol. 337, pp , [4] A. Machanavajjhala and J.P. Reiter, Big Privacy: Protecting Confidentiality in Big Data, ACM Crossroads, vol. 19, no. 1, pp , [5] S. Banerjee and N. Agarwal, Analyzing Collective Behavior from Blogs Using Swarm Intelligence, Knowledge and Information Systems, vol. 33, no. 3, pp , Dec [6] T. C. Hoad and J. Zobel. Fast video matching with signature alignment. In MIR, pages , [7] P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation.in IEEE Symposium on Foundations of CS, [8] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In Proc. CVPR, [9] H. Jegou, H. Harzallah, and C. Schmid. A contextual dissimilarity measure for accurate and efficient image search. In Proc. CVPR, [10] A. Joly, O. Buisson, and C. Fr elicot. Content-based copy detection using distortion-based probabilistic similarity search. IEEE Transactions on Multimedia, to appear, [11] A. Joly, C. Frelicot, and O. Buisson. Robust contentbased video copy identification in a large reference database. In Proc. CIVR, [12] Y. Ke, R. Sukthankar, and L. Huston. Efficient nearduplicate detection and sub-image retrieval. In ACM Multimedia,

### Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering

### Recognition. Sanja Fidler CSC420: Intro to Image Understanding 1 / 28

Recognition Topics that we will try to cover: Indexing for fast retrieval (we still owe this one) History of recognition techniques Object classification Bag-of-words Spatial pyramids Neural Networks Object

### Cees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.

Visual search: what's next? Cees Snoek University of Amsterdam The Netherlands Euvision Technologies The Netherlands Problem statement US flag Tree Aircraft Humans Dog Smoking Building Basketball Table

### Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering

### Rapid Image Retrieval for Mobile Location Recognition

Rapid Image Retrieval for Mobile Location Recognition G. Schroth, A. Al-Nuaimi, R. Huitl, F. Schweiger, E. Steinbach Availability of GPS is limited to outdoor scenarios with few obstacles Low signal reception

### Finding Groups of Duplicate Images in Very Large Datasets

VORAVUTHIKUNCHAI, ET AL.: FINDING GROUPS OF DUPLICATE IMAGES Finding Groups of Duplicate Images in Very Large Datasets Winn Voravuthikunchai winn.voravuthikunchai@unicaen.fr Bruno Crémilleux bruno.cremilleux@unicaen.fr

### Fast Matching of Binary Features

Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been

### Indexing Local Configurations of Features for Scalable Content-Based Video Copy Detection

Indexing Local Configurations of Features for Scalable Content-Based Video Copy Detection Sebastien Poullot, Xiaomeng Wu, and Shin ichi Satoh National Institute of Informatics (NII) Michel Crucianu, Conservatoire

### The Scientific Data Mining Process

Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

### Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

### TouchPaper - An Augmented Reality Application with Cloud-Based Image Recognition Service

TouchPaper - An Augmented Reality Application with Cloud-Based Image Recognition Service Feng Tang, Daniel R. Tretter, Qian Lin HP Laboratories HPL-2012-131R1 Keyword(s): image recognition; cloud service;

### Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

### Search and Information Retrieval

Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

### Large Scale Image Copy Detection Evaluation

Large Scale Image Copy Detection Evaluation Bart Thomee Mark J. Huiskes Erwin Bakker Michael S. Lew LIACS Media Lab, Leiden University Niels Bohrweg, 2333 CA Leiden, The Netherlands {bthomee, markh, erwin,

### EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

### Lecture 20: Clustering

Lecture 20: Clustering Wrap-up of neural nets (from last lecture Introduction to unsupervised learning K-means clustering COMP-424, Lecture 20 - April 3, 2013 1 Unsupervised learning In supervised learning,

### Human behavior analysis from videos using optical flow

L a b o r a t o i r e I n f o r m a t i q u e F o n d a m e n t a l e d e L i l l e Human behavior analysis from videos using optical flow Yassine Benabbas Directeur de thèse : Chabane Djeraba Multitel

### Cloud-Based Image Coding for Mobile Devices Toward Thousands to One Compression

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 15, NO. 4, JUNE 2013 845 Cloud-Based Image Coding for Mobile Devices Toward Thousands to One Compression Huanjing Yue, Xiaoyan Sun, Jingyu Yang, and Feng Wu, Senior

### FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

### Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

### ATTRIBUTE ENHANCED SPARSE CODING FOR FACE IMAGE RETRIEVAL

ISSN:2320-0790 ATTRIBUTE ENHANCED SPARSE CODING FOR FACE IMAGE RETRIEVAL MILU SAYED, LIYA NOUSHEER PG Research Scholar, ICET ABSTRACT: Content based face image retrieval is an emerging technology. It s

### SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

### Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

### A Study on Effective Business Logic Approach for Big Data Mining

A Study on Effective Business Logic Approach for Big Data Mining T. Sathis Kumar Assistant Professor, Dept of C.S.E, Saranathan College of Engineering, Trichy, Tamil Nadu, India. ABSTRACT: Big data is

### Simultaneous Gamma Correction and Registration in the Frequency Domain

Simultaneous Gamma Correction and Registration in the Frequency Domain Alexander Wong a28wong@uwaterloo.ca William Bishop wdbishop@uwaterloo.ca Department of Electrical and Computer Engineering University

### Machine Learning for Data Science (CS4786) Lecture 1

Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:

### Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

### AN APPROACH FOR SUPERVISED SEMANTIC ANNOTATION

AN APPROACH FOR SUPERVISED SEMANTIC ANNOTATION A. DORADO AND E. IZQUIERDO Queen Mary, University of London Electronic Engineering Department Mile End Road, London E1 4NS, U.K. E-mail: {andres.dorado, ebroul.izquierdo}@elec.qmul.ac.uk

### Search Engines. Stephen Shaw 18th of February, 2014. Netsoc

Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

### Similarity Search in a Very Large Scale Using Hadoop and HBase

http:/cedric.cnam.fr Similarity Search in a Very Large Scale Using Hadoop and HBase Rapport de recherche CEDRIC 2012 http://cedric.cnam.fr Stanislav Barton (CNAM), Vlastislav Dohnal (Masaryk University),

### Local features and matching. Image classification & object localization

Overview Instance level search Local features and matching Efficient visual recognition Image classification & object localization Category recognition Image classification: assigning a class label to

### Face Recognition using SIFT Features

Face Recognition using SIFT Features Mohamed Aly CNS186 Term Project Winter 2006 Abstract Face recognition has many important practical applications, like surveillance and access control.

### A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow

, pp.233-237 http://dx.doi.org/10.14257/astl.2014.51.53 A Study on SURF Algorithm and Real-Time Tracking Objects Using Optical Flow Giwoo Kim 1, Hye-Youn Lim 1 and Dae-Seong Kang 1, 1 Department of electronices

### Visualization of large data sets using MDS combined with LVQ.

Visualization of large data sets using MDS combined with LVQ. Antoine Naud and Włodzisław Duch Department of Informatics, Nicholas Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland. www.phys.uni.torun.pl/kmk

### Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

### Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University September 19, 2012

Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University September 19, 2012 E-Mart No. of items sold per day = 139x2000x20 = ~6 million

### Semantic Concept Based Retrieval of Software Bug Report with Feedback

Semantic Concept Based Retrieval of Software Bug Report with Feedback Tao Zhang, Byungjeong Lee, Hanjoon Kim, Jaeho Lee, Sooyong Kang, and Ilhoon Shin Abstract Mining software bugs provides a way to develop

### Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo

### The role of multimedia in archiving community memories

The role of multimedia in archiving community memories Jonathon S. Hare, David P. Dupplaw, Wendy Hall, Paul H. Lewis, and Kirk Martinez Electronics and Computer Science, University of Southampton, Southampton,

### Introduction to image coding

Introduction to image coding Image coding aims at reducing amount of data required for image representation, storage or transmission. This is achieved by removing redundant data from an image, i.e. by

### 10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

### Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

### Novelty Detection in image recognition using IRF Neural Networks properties

Novelty Detection in image recognition using IRF Neural Networks properties Philippe Smagghe, Jean-Luc Buessler, Jean-Philippe Urban Université de Haute-Alsace MIPS 4, rue des Frères Lumière, 68093 Mulhouse,

### Mining and Detection of Emerging Topics from Social Network Big Data

Mining and Detection of Emerging Topics from Social Network Big Data Divya Kalakuntla M.Tech Scholar, Christu Jyoti Institute of Technology And Science Colombonagar, Yeshwanthapur, Jangaon, Telangana ABSTRACT:

### Image Compression Effects on Face Recognition for Images with Reduction in Size

Image Compression Effects on Face Recognition for Images with Reduction in Size Padmaja.V.K Jawaharlal Nehru Technological University, Anantapur Giri Prasad, PhD. B. Chandrasekhar, PhD. ABSTRACT In this

### New Constructions and Practical Applications for Private Stream Searching (Extended Abstract)

New Constructions and Practical Applications for Private Stream Searching (Extended Abstract)???? John Bethencourt CMU Dawn Song CMU Brent Waters SRI 1 Searching for Information Too much on-line info to

### Text Localization & Segmentation in Images, Web Pages and Videos Media Mining I

Text Localization & Segmentation in Images, Web Pages and Videos Media Mining I Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} PSNR_Y

### 1 o Semestre 2007/2008

Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

### EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM

INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM Macha Arun 1, B.Ravi Kumar 2 1 M.Tech Student, Dept of CSE, Holy Mary

### Density Map Visualization for Overlapping Bicycle Trajectories

, pp.327-332 http://dx.doi.org/10.14257/ijca.2014.7.3.31 Density Map Visualization for Overlapping Bicycle Trajectories Dongwook Lee 1, Jinsul Kim 2 and Minsoo Hahn 1 1 Digital Media Lab., Korea Advanced

### Data Mining With Application of Big Data

Data Mining With Application of Big Data Aqeel Abbood Rahmah Master of Science (Information System), Nizam College (Autonomous),O.U, Basheer Bagh, Hyderabad. Abstract: Big Data concern large-volume, complex,

### MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

### How much can Behavioral Targeting Help Online Advertising? Jun Yan 1, Ning Liu 1, Gang Wang 1, Wen Zhang 2, Yun Jiang 3, Zheng Chen 1

WWW 29 MADRID! How much can Behavioral Targeting Help Online Advertising? Jun Yan, Ning Liu, Gang Wang, Wen Zhang 2, Yun Jiang 3, Zheng Chen Microsoft Research Asia Beijing, 8, China 2 Department of Automation

### Dissertation TOPIC MODELS FOR IMAGE RETRIEVAL ON LARGE-SCALE DATABASES. Eva Hörster

Dissertation TOPIC MODELS FOR IMAGE RETRIEVAL ON LARGE-SCALE DATABASES Eva Hörster Department of Computer Science University of Augsburg Adviser: Readers: Prof. Dr. Rainer Lienhart Prof. Dr. Rainer Lienhart

### Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

### Statistical Models in Data Mining

Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

### Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

### Content-Based Image Retrieval

Content-Based Image Retrieval Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Image retrieval Searching a large database for images that match a query: What kind

### Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

### Keywords: Mobility Prediction, Location Prediction, Data Mining etc

Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Data Mining Approach

### A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of

### Feasibility Study of Searchable Image Encryption System of Streaming Service based on Cloud Computing Environment

Feasibility Study of Searchable Image Encryption System of Streaming Service based on Cloud Computing Environment JongGeun Jeong, ByungRae Cha, and Jongwon Kim Abstract In this paper, we sketch the idea

### siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service

siftservice.com - Turning a Computer Vision algorithm into a World Wide Web Service Ahmad Pahlavan Tafti 1, Hamid Hassannia 2, and Zeyun Yu 1 1 Department of Computer Science, University of Wisconsin -Milwaukee,

### JAVA IEEE 2015. 6 Privacy Policy Inference of User-Uploaded Images on Content Sharing Sites Data Mining

S.NO TITLES Domains 1 Anonymity-based Privacy-preserving Data Reporting for Participatory Sensing 2 Anonymizing Collections of Tree-Structured Data 3 Making Digital Artifacts on the Web Verifiable and

### Character Image Patterns as Big Data

22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

### Scalable Mining of Large Video Databases Using Copy Detection

Scalable Mining of Large Video Databases Using Copy Detection Sébastien Poullot Institut National de l Audiovisuel and CEDRIC - CNAM, France spoullot@ina.fr Michel Crucianu CEDRIC - CNAM 292 rue St Martin

### Two-Level Metadata Management for Data Deduplication System

Two-Level Metadata Management for Data Deduplication System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3.,Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea { kongjs,

### International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS

PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, chittiprolumounika@gmail.com; Third C.

### Email Spam Detection Using Customized SimHash Function

International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email

### Search Result Optimization using Annotators

Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,

### Contact Recommendations from Aggegrated On-Line Activity

Contact Recommendations from Aggegrated On-Line Activity Abigail Gertner, Justin Richer, and Thomas Bartee The MITRE Corporation 202 Burlington Road, Bedford, MA 01730 {gertner,jricher,tbartee}@mitre.org

### The Delicate Art of Flower Classification

The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC pvicol@sfu.ca Note: The following is my contribution to a group project for a graduate machine learning

### Storage Management for Files of Dynamic Records

Storage Management for Files of Dynamic Records Justin Zobel Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia. jz@cs.rmit.edu.au Alistair Moffat Department of Computer Science

### An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

### PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software

### A Study on M2M-based AR Multiple Objects Loading Technology using PPHT

A Study on M2M-based AR Multiple Objects Loading Technology using PPHT Sungmo Jung, Seoksoo Kim * Department of Multimedia Hannam University 133, Ojeong-dong, Daedeok-gu, Daejeon-city Korea sungmoj@gmail.com,

### Multimedia Document Authentication using On-line Signatures as Watermarks

Multimedia Document Authentication using On-line Signatures as Watermarks Anoop M Namboodiri and Anil K Jain Department of Computer Science and Engineering Michigan State University East Lansing, MI 48824

### The assignment of chunk size according to the target data characteristics in deduplication backup system

The assignment of chunk size according to the target data characteristics in deduplication backup system Mikito Ogata Norihisa Komoda Hitachi Information and Telecommunication Engineering, Ltd. 781 Sakai,

### Automated Process for Generating Digitised Maps through GPS Data Compression

Automated Process for Generating Digitised Maps through GPS Data Compression Stewart Worrall and Eduardo Nebot University of Sydney, Australia {s.worrall, e.nebot}@acfr.usyd.edu.au Abstract This paper

### A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE

STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then

### Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

### TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS

TOOLS FOR 3D-OBJECT RETRIEVAL: KARHUNEN-LOEVE TRANSFORM AND SPHERICAL HARMONICS D.V. Vranić, D. Saupe, and J. Richter Department of Computer Science, University of Leipzig, Leipzig, Germany phone +49 (341)

### Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

### Big Data: Image & Video Analytics

Big Data: Image & Video Analytics How it could support Archiving & Indexing & Searching Dieter Haas, IBM Deutschland GmbH The Big Data Wave 60% of internet traffic is multimedia content (images and videos)

### Clustering Data Streams

Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting

### Tensor Factorization for Multi-Relational Learning

Tensor Factorization for Multi-Relational Learning Maximilian Nickel 1 and Volker Tresp 2 1 Ludwig Maximilian University, Oettingenstr. 67, Munich, Germany nickel@dbs.ifi.lmu.de 2 Siemens AG, Corporate

### Efficient Similarity Search over Encrypted Data

UT DALLAS Erik Jonsson School of Engineering & Computer Science Efficient Similarity Search over Encrypted Data Mehmet Kuzu, Saiful Islam, Murat Kantarcioglu Introduction Client Untrusted Server Similarity

### Linear programming approach for online advertising

Linear programming approach for online advertising Igor Trajkovski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Rugjer Boshkovikj 16, P.O. Box 393, 1000 Skopje,

### Multi-level Metadata Management Scheme for Cloud Storage System

, pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

### Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015

Clustering Big Data Efficient Data Mining Technologies J Singh and Teresa Brooks June 4, 2015 Hello Bulgaria (http://hello.bg/) A website with thousands of pages... Some pages identical to other pages

### Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

### What makes Big Visual Data hard?

What makes Big Visual Data hard? Quint Buchholz Alexei (Alyosha) Efros Carnegie Mellon University My Goals 1. To make you fall in love with Big Visual Data She is a fickle, coy mistress but holds the key

### Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven)

Algorithmic Aspects of Big Data Nikhil Bansal (TU Eindhoven) Algorithm design Algorithm: Set of steps to solve a problem (by a computer) Studied since 1950 s. Given a problem: Find (i) best solution (ii)

### Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

### Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

### MusicGuide: Album Reviews on the Go Serdar Sali

MusicGuide: Album Reviews on the Go Serdar Sali Abstract The cameras on mobile phones have untapped potential as input devices. In this paper, we present MusicGuide, an application that can be used to

### Web Search Engines: Solutions

Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

### Euler Vector: A Combinatorial Signature for Gray-Tone Images

Euler Vector: A Combinatorial Signature for Gray-Tone Images Arijit Bishnu, Bhargab B. Bhattacharya y, Malay K. Kundu, C. A. Murthy fbishnu t, bhargab, malay, murthyg@isical.ac.in Indian Statistical Institute,