Learning to Hash for Big Data
|
|
- Cordelia Atkinson
- 7 years ago
- Views:
Transcription
1 Learning to Hash for Big Data oé LAMDA Group H ŒÆOŽÅ Æ EâX ^ #EâI[ : liwujun@nju.edu.cn May 09, 2015 Li ( Learning to Hash LAMDA, CS, NJU 1 / 43
2 Outline 1 Introduction Problem Definition Existing Methods 2 Scalable Graph Hashing with Feature Transformation Motivation Model and Learning Experiment 3 Conclusion 4 Reference Li ( Learning to Hash LAMDA, CS, NJU 2 / 43
3 Introduction Outline 1 Introduction Problem Definition Existing Methods 2 Scalable Graph Hashing with Feature Transformation Motivation Model and Learning Experiment 3 Conclusion 4 Reference Li ( Learning to Hash LAMDA, CS, NJU 3 / 43
4 Introduction Problem Definition Nearest Neighbor Search (Retrieval) Given a query point q, return the points closest (similar) to q in the database (e.g., images). Underlying many machine learning, data mining, information retrieval problems Challenge in Big Data Applications: Curse of dimensionality Storage cost Query speed Li ( Learning to Hash LAMDA, CS, NJU 4 / 43
5 Introduction Problem Definition Similarity Preserving Hashing Li ( Learning to Hash LAMDA, CS, NJU 5 / 43
6 Introduction Problem Definition Reduce Dimensionality and Storage Cost Li ( Learning to Hash LAMDA, CS, NJU 6 / 43
7 Introduction Problem Definition Querying Hamming distance: , H = , H = 1 Li ( Learning to Hash LAMDA, CS, NJU 7 / 43
8 Introduction Problem Definition Fast Query Speed By using hashing-based index, we can achieve constant or sub-linear search time complexity. Exhaustive search is also acceptable because the distance calculation cost is cheap now. Li ( Learning to Hash LAMDA, CS, NJU 8 / 43
9 Introduction Problem Definition Hash Function Learning Easy or hard? Hard: discrete optimization problem Easy by approximation: two stages of hash function learning Projection stage (dimensionality reduction) Projected with real-valued projection function Given a point x, each projected dimension i will be associated with a real-valued projection function f i (x) (e.g. f i (x) = w T i x) Quantization stage Turn real into binary However, there exist essential differences between metric learning (dimensionality reduction) and learning to hash. Simply adapting traditional metric learning is not enough. Li ( Learning to Hash LAMDA, CS, NJU 9 / 43
10 Introduction Existing Methods Data-Independent Methods The hash function family is defined independently of the training dataset: Locality-sensitive hashing (LSH): (Gionis et al., 1999; Andoni and Indyk, 2008) and its extensions (Datar et al., 2004; Kulis and Grauman, 2009; Kulis et al., 2009). SIKH: Shift invariant kernel hashing (SIKH) (Raginsky and Lazebnik, 2009). Hash function: random projections. Li ( Learning to Hash LAMDA, CS, NJU 10 / 43
11 Introduction Existing Methods Data-Dependent Methods Hash functions are learned from a given training dataset (learning to hash). Relatively short codes Seminal papers: (Salakhutdinov and Hinton, 2007, 2009; Torralba et al., 2008; Weiss et al., 2008) Two categories: Unimodal Supervised methods given the labels y i or triplet (x i, x j, x k ) Unsupervised methods Multimodal Supervised methods Unsupervised methods Li ( Learning to Hash LAMDA, CS, NJU 11 / 43
12 Introduction Existing Methods (Unimodal) Unsupervised Methods No labels to denote the categories of the training points. PCAH: principal component analysis. SH: eigenfunctions computed from the data similarity graph (Weiss et al., 2008). ITQ: orthogonal rotation matrix to refine the initial projection matrix learned by PCA (Gong and Lazebnik, 2011). AGH: graph-based hashing (Liu et al., 2011). IsoHash: projected dimensions with isotropic variances (Kong and Li, 2012b). DGH: discrete graph hashing (Liu et al., 2014) etc. Li ( Learning to Hash LAMDA, CS, NJU 12 / 43
13 Introduction Existing Methods (Unimodal) Supervised (semi-supervised) Methods Class labels or pairwise constraints: SSH: semi-supervised hashing (SSH) exploits both labeled data and unlabeled data for hash function learning (Wang et al., 2010a,b). MLH: minimal loss hashing (MLH) based on the latent structural SVM framework (Norouzi and Fleet, 2011). KSH: kernel-based supervised hashing (Liu et al., 2012) LDAHash: linear discriminant analysis based hashing (Strecha et al., 2012) LFH: supervised hashing with latent factor models (Zhang et al., 2014) etc. Triplet-based methods: Hamming distance metric learning (HDML) (Norouzi et al., 2012) Column generation base hashing (CGHash) (Li et al., 2013) Li ( Learning to Hash LAMDA, CS, NJU 13 / 43
14 Introduction Existing Methods Multimodal Methods Multi-source hashing Cross-modal hashing Li ( Learning to Hash LAMDA, CS, NJU 14 / 43
15 Introduction Existing Methods Multi-Source Hashing Aims at learning better codes by leveraging auxiliary views than unimodal hashing. Assumes that all the views provided for a query, which are typically not feasible for many multimedia applications. Multiple feature hashing (Song et al., 2011) Composite hashing (Zhang et al., 2011) etc. Li ( Learning to Hash LAMDA, CS, NJU 15 / 43
16 Introduction Existing Methods Cross-Modal Hashing Given a query of either image or text, return images or texts similar to it. Cross View Hashing (CVH) (Kumar and Udupa, 2011) Multimodal Latent Binary Embedding (MLBE) (Zhen and Yeung, 2012a) Co-Regularized Hashing (CRH) (Zhen and Yeung, 2012b) Inter-Media Hashing (IMH) (Song et al., 2013) Relation-aware Heterogeneous Hashing (RaHH) (Ou et al., 2013) Semantic Correlation Maximization (SCM) (Zhang and Li, 2014) etc. Li ( Learning to Hash LAMDA, CS, NJU 16 / 43
17 Introduction Existing Methods Our Contribution Unsupervised Hashing : (1) Isotropic hashing [NIPS 2012] (2) Scalable graph hashing with feature transformation [IJCAI 2015] Supervised Hashing [SIGIR 2014]: Supervised hashing with latent factor models Multimodal Hashing [AAAI 2014]: Large-scale supervised multimodal hashing with semantic correlation maximization Multiple-Bit Quantization: (1) Double-bit quantization [AAAI 2012] (2) Manhattan quantization [SIGIR 2012] 5ŒêâMFÆSµyG ª³6[5 ÆÏ6,2015] Li ( Learning to Hash LAMDA, CS, NJU 17 / 43
18 Scalable Graph Hashing with Feature Transformation Outline 1 Introduction Problem Definition Existing Methods 2 Scalable Graph Hashing with Feature Transformation Motivation Model and Learning Experiment 3 Conclusion 4 Reference Li ( Learning to Hash LAMDA, CS, NJU 18 / 43
19 Scalable Graph Hashing with Feature Transformation Motivation (Unsupervised) Graph Hashing Guide the hashing code learning procedure by directly exploiting the pairwise similarity (neighborhood structure). min ij S ij b i b j 2 = tr(b T LB) subject to : b i { 1, 1} c b i = 0 i 1 n b i b T i i = I Should be expected to achieve better performance than non-graph based methods if the learning algorithms are effective. Li ( Learning to Hash LAMDA, CS, NJU 19 / 43
20 Scalable Graph Hashing with Feature Transformation Motivation Motivation However, it is difficult to design effective algorithms because both the memory and time complexity are at least O(n 2 ). SH (Weiss et al., 2008): Use an eigenfunction solution of 1-D Laplacian with uniform assumption BRE (Kulis and Darrell, 2009): Subsample a small subset for training AGH (Liu et al., 2011), DGH (Liu et al., 2014): Use anchor graph to approximate the similarity graph Li ( Learning to Hash LAMDA, CS, NJU 20 / 43
21 Scalable Graph Hashing with Feature Transformation Motivation Contribution How to utilize the whole graph and simultaneously avoid O(n 2 ) complexity? Scalable graph hashing (SGH): A feature transformation (Shrivastava and Li, 2014) method to effectively approximate the whole graph without explicitly computing it. A sequential method for bit-wise complementary learning. Linear complexity. Outperform state of the art in terms of both accuracy and scalability. Li ( Learning to Hash LAMDA, CS, NJU 21 / 43
22 Scalable Graph Hashing with Feature Transformation Model and Learning Notation X = {x 1,..., x n } T R n d : n data points. b i {+1, 1} c : binary code of point x i. b i = [h 1 (x i ),..., h c (x i )] T, where h k (x) denotes hash function. Pairwise similarity metric defined as: S ij = e x i x j 2 F /ρ (0, 1] Li ( Learning to Hash LAMDA, CS, NJU 22 / 43
23 Scalable Graph Hashing with Feature Transformation Model and Learning Objective Function min ij ( S ij 1 c bt i b j) 2 S ij = 2S ij 1 ( 1, 1]. Hash function: h k (x i ) = sgn( m j=1 W ijφ(x i, x j ) + bias) x i, map it into Hamming space as b i = [h 1 (x i ),, h c (x i )] T x, define: K(x) = [φ(x, x 1 ) n i=1 φ(x i, x 1 )/n,..., φ(x, x m ) n i=1 φ(x i, x m )/n] Objective function with the parameter W R c m : min W s.t. c S sgn(k(x)w T )sgn(k(x)w T ) T 2 F WK(X) T K(X)W T = I Li ( Learning to Hash LAMDA, CS, NJU 23 / 43
24 Scalable Graph Hashing with Feature Transformation Model and Learning Feature Transformation x, define P (x) and Q(x): 2(e P (x) = [ 2 1) eρ 2(e Q(x) = [ 2 1) eρ e x 2 F ρ e x; e e x 2 F ρ e x; e e x 2 F ρ ; 1] e x 2 F ρ ; 1] x i, x j X P (x i ) T Q(x j ) = 2[ e2 1 2e 2xT i x j ρ + e e 2e x i 2 F x j 2 F +2xT i x j ρ 1 = 2e x i x j ρ 2 F 1 = S ij ]e x i 2 F + x j 2 F ρ 1 Li ( Learning to Hash LAMDA, CS, NJU 24 / 43
25 Scalable Graph Hashing with Feature Transformation Model and Learning Feature Transformation Here, we use an approximation e2 1 2e x + e2 +1 2e e x We assume 1 2 ρ xt i x j 1. It is easy to prove that ρ = 2 max{ x i 2 F }n i=1 can make 1 2 ρ xt i x j 1. Then we have S P (X) T Q(X) Li ( Learning to Hash LAMDA, CS, NJU 25 / 43
26 Scalable Graph Hashing with Feature Transformation Model and Learning Sequential Learning Strategy Direct relaxation may lead to poor performance We adopt a sequential learning strategy in a bit-wise complementary manner Residual definition: R t = c S t 1 i=1 sgn(k(x)w i)sgn(k(x)w i ) T R 1 = c S Objective function: min w t By relaxation, we can get: R t sgn(k(x)w t )sgn(k(x)w t ) T 2 F s.t. w T t K(X) T K(X)w t = 1 min w t tr(w T t K(X) T R t K(X)w t ) s.t. w T t K(X) T K(X)w t = 1 Li ( Learning to Hash LAMDA, CS, NJU 26 / 43
27 Scalable Graph Hashing with Feature Transformation Model and Learning Sequential Learning Strategy Then we obtain a generalized eigenvalue problem: K(X) T R t K(X)w t = λk(x) T K(X)w t Define A t = K(X) T R t K(X) R m m, then: A t = A t 1 K(X) T sgn(k(x)w t 1 )sgn(k(x)w t 1 ) T K(X) Key component: A 1 = ck(x) T SK(X) = c[k(x) T P (X) T ][Q(X) K(X)] }{{} S Li ( Learning to Hash LAMDA, CS, NJU 27 / 43
28 Scalable Graph Hashing with Feature Transformation Model and Learning Sequential Learning Strategy Adopting the residual matrix: c R t = c S sgn(k(x)w i )sgn(k(x)w i ) T i=1,i t we can learn all the W = {w i } c i=1 for multiple rounds. This can further improve the accuracy. We continue it for one more round to get a good tradeoff between accuracy and speed. Li ( Learning to Hash LAMDA, CS, NJU 28 / 43
29 Scalable Graph Hashing with Feature Transformation Model and Learning Sequential Learning Algorithm Li ( Learning to Hash LAMDA, CS, NJU 29 / 43
30 Scalable Graph Hashing with Feature Transformation Model and Learning Complexity Analysis Initialization P (X) and Q(X): O(dn) Kernel initialization: O(dmn + mn) A 1 and Z: O((d + 2)mn) Main procedure O(c(mn + m 2 ) + m 3 ) Typically, m, d, c will be much less than n. Hence, the time complexity is O(n). Furthermore, the storage complexity is also O(n). Li ( Learning to Hash LAMDA, CS, NJU 30 / 43
31 Scalable Graph Hashing with Feature Transformation Experiment Dataset TINY-1M One million images from the set of 80M tiny images. Each tiny image is represented by a 384-dim GIST descriptors. MIRFLICKR-1M One million Flickr images. Extract 512-dim features from each image. Li ( Learning to Hash LAMDA, CS, NJU 31 / 43
32 Scalable Graph Hashing with Feature Transformation Experiment Evaluation Protocols and Baselines Evaluation Protocols: Ground truth: top 2% nearest neighbors in the training set in terms of Euclidean distance Hamming ranking: Top-K precision Training time for scalability Baselines: LSH (Andoni and Indyk, 2008) PCAH (Gong and Lazebnik, 2011) ITQ (Gong and Lazebnik, 2011) AGH (Liu et al., 2011) DGH (Liu et al., 2014): DGH-I and DGH-R Li ( Learning to Hash LAMDA, CS, NJU 32 / 43
33 Scalable Graph Hashing with Feature Transformation Experiment Top-1k Precision Method 32 bits 64 bits 96 bits 128 bits 256 bits SGH ITQ AGH DGH-I DGH-R PCAH LSH Method 32 bits 64 bits 96 bits 128 bits 256 bits SGH ITQ AGH DGH-I DGH-R PCAH LSH Li ( Learning to Hash LAMDA, CS, NJU 33 / 43
34 Scalable Graph Hashing with Feature Transformation Experiment Top-k Precision SGH ITQ AGH DGH I DGH R PCAH LSH #Returned Samples (a) 64 bit Precision SGH ITQ AGH DGH I DGH R PCAH LSH #Returned Samples (b) 128 bit Li ( Learning to Hash LAMDA, CS, NJU 34 / 43
35 Scalable Graph Hashing with Feature Transformation Experiment Top-k Precision SGH ITQ 0.4 AGH DGH I 0.3 DGH R PCAH LSH #Returned Samples (a) 64 bit Precision SGH ITQ AGH DGH I DGH R PCAH LSH #Returned Samples (b) 128 bit Li ( Learning to Hash LAMDA, CS, NJU 35 / 43
36 Scalable Graph Hashing with Feature Transformation Experiment Training Time (in second) Training Here, t 1 = Method 32 bits 64 bits 96 bits 128 bits 256 bits SGH ITQ AGH t t t t t 1 DGH-I t t t t t 1 DGH-R t t t t t 1 PCAH LSH Training Here, t 2 = Method 32 bits 64 bits 96 bits 128 bits 256 bits SGH ITQ AGH t t t t t 2 DGH-I t t t t t 2 DGH-R t t t t t 2 PCAH LSH Li ( Learning to Hash LAMDA, CS, NJU 36 / 43
37 Scalable Graph Hashing with Feature Transformation Experiment Sensitivity to Parameter ρ (a) TINY-1M (b) MIRFLICKR-1M Li ( Learning to Hash LAMDA, CS, NJU 37 / 43
38 Scalable Graph Hashing with Feature Transformation Experiment Sensitivity to Parameter m (a) TINY-1M (b) MIRFLICKR-1M Li ( Learning to Hash LAMDA, CS, NJU 38 / 43
39 Conclusion Outline 1 Introduction Problem Definition Existing Methods 2 Scalable Graph Hashing with Feature Transformation Motivation Model and Learning Experiment 3 Conclusion 4 Reference Li ( Learning to Hash LAMDA, CS, NJU 39 / 43
40 Conclusion Conclusion Hashing can significantly improve retrieval speed and reduce storage cost. It has become a hot research topic in big learning with wide applications. We have proposed a series of hashing methods, including unsupervised, supervised, and multimodal methods. Furthermore, some quantization strategies (Kong and Li, 2012a; Kong et al., 2012) are also designed. In particular, the details of SGH with feature transformation (Jiang and Li, 2015) are introduced in this talk. Li ( Learning to Hash LAMDA, CS, NJU 40 / 43
41 Conclusion Future Trends Discrete optimization and learning Scalable training for supervised hashing Quantization New applications 5ŒêâMFÆSµyG ª³6[5 ÆÏ6,2015] (Li and Zhou, 2015) Li ( Learning to Hash LAMDA, CS, NJU 41 / 43
42 Conclusion Q & A Thanks! Question? Code available at Li ( Learning to Hash LAMDA, CS, NJU 42 / 43
43 Reference Outline 1 Introduction Problem Definition Existing Methods 2 Scalable Graph Hashing with Feature Transformation Motivation Model and Learning Experiment 3 Conclusion 4 Reference Li ( Learning to Hash LAMDA, CS, NJU 43 / 43
44 References A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1): , M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of International Conference on Very Large Data Bases, Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In Proceedings of Computer Vision and Pattern Recognition, Q.-Y. Jiang and W.-J. Li. Scalable graph hashing with feature transformation. In IJCAI, W. Kong and W.-J. Li. Double-bit quantization for hashing. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012a. Li ( Learning to Hash LAMDA, CS, NJU 43 / 43
45 References W. Kong and W.-J. Li. Isotropic hashing. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012b. W. Kong, W.-J. Li, and M. Guo. Manhattan hashing for large-scale image retrieval. In The 35th International ACM SIGIR conference on research and development in Information Retrieval (SIGIR), B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In Proceedings of Neural Information Processing Systems, B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Proceedings of International Conference on Computer Vision, B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. IEEE Trans. Pattern Anal. Mach. Intell., 31(12): , S. Kumar and R. Udupa. Learning hash functions for cross-view similarity search. In IJCAI, pages , Li ( Learning to Hash LAMDA, CS, NJU 43 / 43
46 References W.-J. Li and Z.-H. Zhou. Learning to hash for big data: current status and future trends. Chinese Science Bulletin, 60: , X. Li, G. Lin, C. Shen, A. van den Hengel, and A. R. Dick. Learning hash functions using column generation. In ICML, W. Liu, J. Wang, S. Kumar, and S. Chang. Hashing with graphs. In Proceedings of International Conference on Machine Learning, W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In CVPR, pages , W. Liu, C. Mu, S. Kumar, and S. Chang. Discrete graph hashing. In Proceedings of Neural Information Processing Systems, M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In Proceedings of International Conference on Machine Learning, M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS, pages , M. Ou, P. Cui, F. Wang, J. Wang, W. Zhu, and S. Yang. Comparing apples to oranges: a scalable solution with heterogeneous hashing. In KDD, pages , Li ( Learning to Hash LAMDA, CS, NJU 43 / 43
47 References M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Proceedings of Neural Information Processing Systems, R. Salakhutdinov and G. Hinton. Semantic Hashing. In SIGIR workshop on Information Retrieval and applications of Graphical Models, R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7): , A. Shrivastava and P. Li. Asymmetric lsh (alsh) for sublinear time maximum inner product search (mips). In NIPS, J. Song, Y. Yang, Z. Huang, H. T. Shen, and R. Hong. Multiple feature hashing for real-time large scale near-duplicate video retrieval. In ACM Multimedia, pages , J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD Conference, pages , C. Strecha, A. A. Bronstein, M. M. Bronstein, and P. Fua. Ldahash: Improved matching with smaller descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 34(1):66 78, Li ( Learning to Hash LAMDA, CS, NJU 43 / 43
48 References A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of Computer Vision and Pattern Recognition, J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for hashing with compact codes. In Proceedings of International Conference on Machine Learning, 2010a. J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale image retrieval. In Proceedings of Computer Vision and Pattern Recognition, 2010b. Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proceedings of Neural Information Processing Systems, D. Zhang and W.-J. Li. Large-scale supervised multimodal hashing with semantic correlation maximization. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI), D. Zhang, F. Wang, and L. Si. Composite hashing with multiple information sources. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, Li ( Learning to Hash LAMDA, CS, NJU 43 / 43
49 P. Zhang, W. Zhang, W.-J. Li, and M. Guo. Supervised hashing with latent factor models. In SIGIR, Y. Zhen and D.-Y. Yeung. A probabilistic model for multimodal hash function learning. In KDD, pages , 2012a. Y. Zhen and D.-Y. Yeung. Co-regularized hashing for multimodal data. In NIPS, pages , 2012b. () May 12, / 43
The Power of Asymmetry in Binary Hashing
The Power of Asymmetry in Binary Hashing Behnam Neyshabur Payman Yadollahpour Yury Makarychev Toyota Technological Institute at Chicago [btavakoli,pyadolla,yury]@ttic.edu Ruslan Salakhutdinov Departments
More informationFast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
More informationData Structure and Network Searching
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence A Unified Approximate Nearest Neighbor Search Scheme by Combining Data Structure and Hashing Debing Zhang Genmao
More informationClustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012
Clustering Big Data Anil K. Jain (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012 Outline Big Data How to extract information? Data clustering
More informationarxiv:1604.07342v2 [cs.cv] 9 Jun 2016
OZDEMIR, NAJIBI, DAVIS: SUPERVISED INCREMENTAL HASHING arxiv:64.7342v2 [cs.cv] 9 Jun 26 Supervised Incremental Hashing Bahadir Ozdemir ozdemir@cs.umd.edu Mahyar Najibi najibi@cs.umd.edu Larry S. Davis
More informationPixels Description of scene contents. Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT) Banksy, 2006
Object Recognition Large Image Databases and Small Codes for Object Recognition Pixels Description of scene contents Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)
More informationHashing with Graphs. wliu@ee.columbia.edu Department of Electrical Engineering, Columbia University, New York, NY 10027, USA
Wei Liu wliu@ee.columbia.edu Department of Electrical Engineering, Columbia University, New York, NY 10027, USA Jun Wang wangjun@us.ibm.com IBM T. J. Watson Research Center, Yorktown Heights, NY 10598,
More informationAsymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)
Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) Anshumali Shrivastava Department of Computer Science Computing and Information Science Cornell University Ithaca, NY 4853, USA
More informationDistance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center
Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center 1 Outline Part I - Applications Motivation and Introduction Patient similarity application Part II
More informationTransform-based Domain Adaptation for Big Data
Transform-based Domain Adaptation for Big Data Erik Rodner University of Jena Judy Hoffman Jeff Donahue Trevor Darrell Kate Saenko UMass Lowell Abstract Images seen during test time are often not from
More informationSupervised Feature Selection & Unsupervised Dimensionality Reduction
Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or
More informationBRIEF: Binary Robust Independent Elementary Features
BRIEF: Binary Robust Independent Elementary Features Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua CVLab, EPFL, Lausanne, Switzerland e-mail: firstname.lastname@epfl.ch Abstract.
More informationCompact Hashing with Joint Optimization of Search Accuracy and Time
Compact Hashing with Joint Optimization of Search Accuracy and Time Junfeng He Columbia University ew York, ew York, 0027, USA jh2700@columbia.edu Shih-Fu Chang Columbia University ew York, ew York, 0027,
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationCollective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University
Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum
More informationLearning to Rank Revisited: Our Progresses in New Algorithms and Tasks
The 4 th China-Australia Database Workshop Melbourne, Australia Oct. 19, 2015 Learning to Rank Revisited: Our Progresses in New Algorithms and Tasks Jun Xu Institute of Computing Technology, Chinese Academy
More informationData, Measurements, Features
Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are
More informationVisualization by Linear Projections as Information Retrieval
Visualization by Linear Projections as Information Retrieval Jaakko Peltonen Helsinki University of Technology, Department of Information and Computer Science, P. O. Box 5400, FI-0015 TKK, Finland jaakko.peltonen@tkk.fi
More informationLearning Binary Hash Codes for Large-Scale Image Search
Learning Binary Hash Codes for Large-Scale Image Search Kristen Grauman and Rob Fergus Abstract Algorithms to rapidly search massive image or video collections are critical for many vision applications,
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationLarge Scale Medical Image Search via Unsupervised PCA Hashing
Large Scale Medical Image Search via Unsupervised PCA Hashing Xiang Yu, Shaoting Zhang, Bo Liu, Lin Zhong, Dimitris N. Metaxas Department of Computer Science, Rutgers University Piscataway, NJ, USA {xiangyu,
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationFUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel Yahoo! Research New York, NY 10018 goel@yahoo-inc.com John Langford Yahoo! Research New York, NY 10018 jl@yahoo-inc.com Alex Strehl Yahoo! Research New York,
More informationWhich Space Partitioning Tree to Use for Search?
Which Space Partitioning Tree to Use for Search? P. Ram Georgia Tech. / Skytree, Inc. Atlanta, GA 30308 p.ram@gatech.edu Abstract A. G. Gray Georgia Tech. Atlanta, GA 30308 agray@cc.gatech.edu We consider
More informationComparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
More informationTwo Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification
29 Ninth IEEE International Conference on Data Mining Two Heads Better Than One: Metric+Active Learning and Its Applications for IT Service Classification Fei Wang 1,JimengSun 2,TaoLi 1, Nikos Anerousis
More informationFAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION
FAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION Marius Muja, David G. Lowe Computer Science Department, University of British Columbia, Vancouver, B.C., Canada mariusm@cs.ubc.ca,
More informationManifold regularized kernel logistic regression for web image annotation
Manifold regularized kernel logistic regression for web image annotation W. Liu 1, H. Liu 1, D.Tao 2*, Y. Wang 1, K. Lu 3 1 China University of Petroleum (East China) 2 *** 3 University of Chinese Academy
More informationProduct quantization for nearest neighbor search
Product quantization for nearest neighbor search Hervé Jégou, Matthijs Douze, Cordelia Schmid Abstract This paper introduces a product quantization based approach for approximate nearest neighbor search.
More informationActive Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
More informationOnline Sketching Hashing
Online Sketching Hashing Cong Leng 1 Jiaxiang Wu 1 Jian Cheng 1 Xiao Bai 2 Hanqing Lu 1 1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 2 School of Computer
More informationTensor Methods for Machine Learning, Computer Vision, and Computer Graphics
Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University
More informationSubordinating to the Majority: Factoid Question Answering over CQA Sites
Journal of Computational Information Systems 9: 16 (2013) 6409 6416 Available at http://www.jofcis.com Subordinating to the Majority: Factoid Question Answering over CQA Sites Xin LIAN, Xiaojie YUAN, Haiwei
More informationClassifying Manipulation Primitives from Visual Data
Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if
More informationMethodology for Emulating Self Organizing Maps for Visualization of Large Datasets
Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph
More informationCees Snoek. Machine. Humans. Multimedia Archives. Euvision Technologies The Netherlands. University of Amsterdam The Netherlands. Tree.
Visual search: what's next? Cees Snoek University of Amsterdam The Netherlands Euvision Technologies The Netherlands Problem statement US flag Tree Aircraft Humans Dog Smoking Building Basketball Table
More informationBinary Embedding: Fundamental Limits and Fast Algorithm
Xinyang Yi YIXY@UTEXAS.EDU Constantine Caramanis CONSTANTINE@UTEXAS.EDU Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 787 Eric Price ECPRICE@CS.UTEXAS.EDU
More informationENHANCED WEB IMAGE RE-RANKING USING SEMANTIC SIGNATURES
International Journal of Computer Engineering & Technology (IJCET) Volume 7, Issue 2, March-April 2016, pp. 24 29, Article ID: IJCET_07_02_003 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=7&itype=2
More informationMicroblog Sentiment Analysis with Emoticon Space Model
Microblog Sentiment Analysis with Emoticon Space Model Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory
More informationMultimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.
Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Previous Lecture 13 Indexes for Multimedia Data 13.1
More informationBig Data. Lecture 6: Locality Sensitive Hashing (LSH)
Big Data Lecture 6: Locality Sensitive Hashing (LSH) Nearest Neighbor Given a set P of n oints in R d Nearest Neighbor Want to build a data structure to answer nearest neighbor queries Voronoi Diagram
More informationGRAPH-BASED ranking models have been deeply
EMR: A Scalable Graph-based Ranking Model for Content-based Image Retrieval Bin Xu, Student Member, IEEE, Jiajun Bu, Member, IEEE, Chun Chen, Member, IEEE, Can Wang, Member, IEEE, Deng Cai, Member, IEEE,
More informationA Similarity Search Over The Cloud Based On Image Descriptors Dimensions Value Cardinalities
A Similarity Search Over The Cloud Based On Image Descriptors Dimensions Value Cardinalities STEFANOS ANTARIS and DIMITRIOS RAFAILIDIS, Aristotle University of Thessaloniki In recognition that in modern
More informationSemi-Supervised Learning for Blog Classification
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008) Semi-Supervised Learning for Blog Classification Daisuke Ikeda Department of Computational Intelligence and Systems Science,
More informationJoint Feature Learning and Clustering Techniques for Clustering High Dimensional Data: A Review
International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-03 E-ISSN: 2347-2693 Joint Feature Learning and Clustering Techniques for Clustering High Dimensional
More informationHow To Identify A Churner
2012 45th Hawaii International Conference on System Sciences A New Ensemble Model for Efficient Churn Prediction in Mobile Telecommunication Namhyoung Kim, Jaewook Lee Department of Industrial and Management
More informationLarge Scale Learning to Rank
Large Scale Learning to Rank D. Sculley Google, Inc. dsculley@google.com Abstract Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing
More informationEfficient Approximate Similarity Search Using Random Projection Learning
Efficient Approximate Similarity Search Using Random Projection Learning Peisen Yuan, Chaofeng Sha, Xiaoling Wang 2,BinYang, and Aoying Zhou 2 School of Computer Science, Shanghai Key Laboratory of Intelligent
More informationUnsupervised and supervised dimension reduction: Algorithms and connections
Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona
More information4. GPCRs PREDICTION USING GREY INCIDENCE DEGREE MEASURE AND PRINCIPAL COMPONENT ANALYIS
4. GPCRs PREDICTION USING GREY INCIDENCE DEGREE MEASURE AND PRINCIPAL COMPONENT ANALYIS The GPCRs sequences are made up of amino acid polypeptide chains. We can also call them sub units. The number and
More informationSACOC: A spectral-based ACO clustering algorithm
SACOC: A spectral-based ACO clustering algorithm Héctor D. Menéndez, Fernando E. B. Otero, and David Camacho Abstract The application of ACO-based algorithms in data mining is growing over the last few
More informationADVANCED MACHINE LEARNING. Introduction
1 1 Introduction Lecturer: Prof. Aude Billard (aude.billard@epfl.ch) Teaching Assistants: Guillaume de Chambrier, Nadia Figueroa, Denys Lamotte, Nicola Sommer 2 2 Course Format Alternate between: Lectures
More informationLABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationLecture 4 Online and streaming algorithms for clustering
CSE 291: Geometric algorithms Spring 2013 Lecture 4 Online and streaming algorithms for clustering 4.1 On-line k-clustering To the extent that clustering takes place in the brain, it happens in an on-line
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationThe multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
More informationPredict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons
More informationMusic Mood Classification
Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may
More informationAn Initial Study on High-Dimensional Data Visualization Through Subspace Clustering
An Initial Study on High-Dimensional Data Visualization Through Subspace Clustering A. Barbosa, F. Sadlo and L. G. Nonato ICMC Universidade de São Paulo, São Carlos, Brazil IWR Heidelberg University, Heidelberg,
More informationMALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph
MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of
More informationDifferential Privacy Preserving Spectral Graph Analysis
Differential Privacy Preserving Spectral Graph Analysis Yue Wang, Xintao Wu, and Leting Wu University of North Carolina at Charlotte, {ywang91, xwu, lwu8}@uncc.edu Abstract. In this paper, we focus on
More informationFacebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
More informationSEARCH ENGINE OPTIMIZATION USING D-DICTIONARY
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute
More informationA Genetic Algorithm-Evolved 3D Point Cloud Descriptor
A Genetic Algorithm-Evolved 3D Point Cloud Descriptor Dominik Wȩgrzyn and Luís A. Alexandre IT - Instituto de Telecomunicações Dept. of Computer Science, Univ. Beira Interior, 6200-001 Covilhã, Portugal
More informationA Learning Based Method for Super-Resolution of Low Resolution Images
A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method
More informationStandardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
More informationFast Prototype Based Noise Reduction
Fast Prototype Based Noise Reduction Kajsa Tibell, Hagen Spies, and Magnus Borga Sapheneia Commercial Products AB, Teknikringen 8, 583 3 Linkoping, SWEDEN Department of Biomedical Engineering, Linkoping
More informationMean Shift Based Clustering in High Dimensions: A Texture Classification Example
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example Bogdan Georgescu µ Ilan Shimshoni µ Peter Meer ¾µ Computer Science µ Electrical and Computer Engineering ¾µ Rutgers University,
More informationHashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning Prateek Jain Algorithms Research Group Microsoft Research, Bangalore, India prajain@microsoft.com Sudheendra Vijayanarasimhan
More informationSupport Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France theodoros.evgeniou@insead.fr Massimiliano
More informationA Survey on Outlier Detection Techniques for Credit Card Fraud Detection
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud
More informationHow To Cluster
Data Clustering Dec 2nd, 2013 Kyrylo Bessonov Talk outline Introduction to clustering Types of clustering Supervised Unsupervised Similarity measures Main clustering algorithms k-means Hierarchical Main
More informationE-commerce Transaction Anomaly Classification
E-commerce Transaction Anomaly Classification Minyong Lee minyong@stanford.edu Seunghee Ham sham12@stanford.edu Qiyi Jiang qjiang@stanford.edu I. INTRODUCTION Due to the increasing popularity of e-commerce
More informationThe Delicate Art of Flower Classification
The Delicate Art of Flower Classification Paul Vicol Simon Fraser University University Burnaby, BC pvicol@sfu.ca Note: The following is my contribution to a group project for a graduate machine learning
More informationAn Effective Way to Ensemble the Clusters
An Effective Way to Ensemble the Clusters R.Saranya 1, Vincila.A 2, Anila Glory.H 3 P.G Student, Department of Computer Science Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu,
More informationMulti-Class Active Learning for Image Classification
Multi-Class Active Learning for Image Classification Ajay J. Joshi University of Minnesota Twin Cities ajay@cs.umn.edu Fatih Porikli Mitsubishi Electric Research Laboratories fatih@merl.com Nikolaos Papanikolopoulos
More informationHow To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationCharacter Image Patterns as Big Data
22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,
More informationLearning Compact Binary Codes for Visual Tracking
Learning Compact Binary Codes for Visual Tracking Xi Li, Chunhua Shen, Anthony Dick, Anton van den Hengel Australian Centre for Visual Technologies, The University of Adelaide, SA, Australia Abstract A
More informationScalable Object Detection by Filter Compression with Regularized Sparse Coding
Scalable Object Detection by Filter Compression with Regularized Sparse Coding Ting-Hsuan Chao, Yen-Liang Lin, Yin-Hsi Kuo, and Winston H Hsu National Taiwan University, Taipei, Taiwan Abstract For practical
More informationMimicking human fake review detection on Trustpilot
Mimicking human fake review detection on Trustpilot [DTU Compute, special course, 2015] Ulf Aslak Jensen Master student, DTU Copenhagen, Denmark Ole Winther Associate professor, DTU Copenhagen, Denmark
More informationEmoticon Smoothed Language Models for Twitter Sentiment Analysis
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of
More informationContent Based Data Retrieval on KNN- Classification and Cluster Analysis for Data Mining
Volume 12 Issue 5 Version 1.0 March 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: & Print ISSN: Abstract - Data mining is sorting
More informationAnomaly Detection using multidimensional reduction Principal Component Analysis
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. II (Jan. 2014), PP 86-90 Anomaly Detection using multidimensional reduction Principal Component
More informationGroup Sparse Coding. Fernando Pereira Google Mountain View, CA pereira@google.com. Dennis Strelow Google Mountain View, CA strelow@google.
Group Sparse Coding Samy Bengio Google Mountain View, CA bengio@google.com Fernando Pereira Google Mountain View, CA pereira@google.com Yoram Singer Google Mountain View, CA singer@google.com Dennis Strelow
More informationBiometric Authentication using Online Signatures
Biometric Authentication using Online Signatures Alisher Kholmatov and Berrin Yanikoglu alisher@su.sabanciuniv.edu, berrin@sabanciuniv.edu http://fens.sabanciuniv.edu Sabanci University, Tuzla, Istanbul,
More informationMachine Learning Department, School of Computer Science, Carnegie Mellon University, PA
Pengtao Xie Carnegie Mellon University Machine Learning Department School of Computer Science 5000 Forbes Ave Pittsburgh, PA 15213 Tel: (412) 916-9798 Email: pengtaox@cs.cmu.edu Web: http://www.cs.cmu.edu/
More informationNeural Networks Lesson 5 - Cluster Analysis
Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29
More informationDesign call center management system of e-commerce based on BP neural network and multifractal
Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2014, 6(6):951-956 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 Design call center management system of e-commerce
More informationDATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
More informationClustering Documents with Active Learning using Wikipedia
Clustering Documents with Active Learning using Wikipedia Anna Huang David Milne Eibe Frank Ian H. Witten Department of Computer Science, University of Waikato Private Bag 3105, Hamilton, New Zealand {lh92,
More informationQuery Specific Fusion for Image Retrieval
Query Specific Fusion for Image Retrieval Shaoting Zhang 2,MingYang 1, Timothee Cour 1,KaiYu 1, and Dimitris N. Metaxas 2 1 NEC Laboratories America, Inc. Cupertino, CA 95014 {myang,timothee,kyu}@sv.nec-labs.com
More informationCS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #18: Dimensionality Reduc7on Dimensionality Reduc=on Assump=on: Data lies on or near a low d- dimensional subspace Axes of this subspace
More informationBALANCE LEARNING TO RANK IN BIG DATA. Guanqun Cao, Iftikhar Ahmad, Honglei Zhang, Weiyi Xie, Moncef Gabbouj. Tampere University of Technology, Finland
BALANCE LEARNING TO RANK IN BIG DATA Guanqun Cao, Iftikhar Ahmad, Honglei Zhang, Weiyi Xie, Moncef Gabbouj Tampere University of Technology, Finland {name.surname}@tut.fi ABSTRACT We propose a distributed
More informationMulti-class Classification: A Coding Based Space Partitioning
Multi-class Classification: A Coding Based Space Partitioning Sohrab Ferdowsi, Svyatoslav Voloshynovskiy, Marcin Gabryel, and Marcin Korytkowski University of Geneva, Centre Universitaire d Informatique,
More informationFeature Selection vs. Extraction
Feature Selection In many applications, we often encounter a very large number of potential features that can be used Which subset of features should be used for the best classification? Need for a small
More informationMining Signatures in Healthcare Data Based on Event Sequences and its Applications
Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1
More informationLearning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu
Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of
More informationBayesian Active Distance Metric Learning
44 YANG ET AL. Bayesian Active Distance Metric Learning Liu Yang and Rong Jin Dept. of Computer Science and Engineering Michigan State University East Lansing, MI 4884 Rahul Sukthankar Robotics Institute
More information