Big Data in the Web Age. Zhang Bo (张钹), Department of Computer Science & Technology, Tsinghua University
The Big Data Age. Volume: 2.8 ZB (1 ZB = 10^21 bytes), Variety, Velocity. Searching for a needle in a haystack!
The Characteristics of Big Data. Data from crowds to crowds: 34% useful; illusive, useless, content safety. Raw data: 7% tagged, 1% analyzed.
Man-Machine Interface: Text, Speech, Image, ..., Behaviors; Programming, Encoding; User's Intention, Interests, Meaning, Semantics, Content; Interpretation, Decoding; Code, Data, Instruction; Computer, Net
Image Retrieval by Keywords - white horse (Google)
Baidu (a Chinese web search engine) - 马, 树 (Horse, Tree)
The New Demands of Information Processing in the Big Data Age: users' intention, users' interest, users' feelings, ...; understanding (comprehension) of information meaning
The fundamental difficulty faced by traditional information processing
Why? Basic assumption: meaning-form separation.
Meaning-independent assumption. (R. Hartley [1])
"These semantic aspects of communication are irrelevant to the engineering problem." (C. E. Shannon [2])
[1] R. V. L. Hartley, Transmission of information, Bell System Technical Journal, July 1928, pp. 535-563.
[2] C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol. 27, pp. 379-423, July, and pp. 623-656, October 1948.
Comprehension The Natural (Objective) Meaning
The Demand for a Meaning-Dependent Information Theory. Text, speech, image. Human sender X -> machine receiver X (traditional information processing). Meaning M: refers to, and correlates with, the physical or conceptual world.
Challenges! Can a machine deal with the meaning of information? How should a machine deal with meaning? Can traditional information theory deal with meaning, and if so, how?
Probability-based Theory Sender X M refer to, correlate physical or conceptual world F (W, D) Mapping Receiver X representation coding data Feature Space
Fundamental Problems Feature Representation Meaning Does the mapping exist? How to find the mapping?
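Does there exist such a mapping? Data (example text): "Digital video coding technology has a history of half a century and has made great progress: from differential predictive coding in the 1950s, to transform coding and block-based motion-predictive coding in the 1970s, to the recently emerging distributed coding, stereoscopic video coding, multi-view coding, and visual coding." Mapping? Meaning (rules, concepts).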
No, in general! Between data and meaning (semantics) the mapping runs into the semantic gap. Data: bag of words (text); colors, textures (image); frequency spectrum (speech).
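As a small illustration of the semantic gap for text, the sketch below builds a bag-of-words representation with Python's `collections.Counter`. The helper name `bag_of_words` and the two example sentences are assumptions made for illustration; they do not come from the talk.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to word counts, discarding word order (hypothetical helper)."""
    return Counter(text.lower().split())


# Two sentences with different meanings (illustrative examples).
a = "the dog bites the man"
b = "the man bites the dog"

# The data-level representations coincide although the meanings differ.
same_representation = bag_of_words(a) == bag_of_words(b)
print(same_representation)
```

Because word order is discarded, the representation cannot separate the two sentences, which is one concrete way in which data-level features fall short of meaning.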
Data-Driven Methods. Dataset, pattern, machine learning. Given a specific data set and a proper representation, there exists such a mapping.
How to Mine the Mapping. Ill-posed problems: existence (1), stability (2), uniqueness (3). Machine learning.
Classical Statistics Solution. Law of large numbers in function spaces. Parametric statistics. Assumption: a known function with a few unknown parameters, e.g. $ax^2 + bx + c$.
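The parametric assumption can be sketched as follows: the functional form $ax^2 + bx + c$ is taken as known, and only the three parameters are estimated from data by least squares. The function name `fit_quadratic` and the sample data are assumptions for illustration, and the plain normal-equation solver is only one of several ways to compute the estimate.

```python
def fit_quadratic(xs, ys):
    """Least-squares estimate of (a, b, c) in y = a*x**2 + b*x + c."""
    # Build the 3x3 normal equations (A^T A) p = A^T y for rows [x^2, x, 1].
    rows = [[x * x, x, 1.0] for x in xs]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]

    # Solve by Gaussian elimination with partial pivoting.
    aug = [m[i] + [v[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]

    # Back substitution.
    params = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(aug[i][j] * params[j] for j in range(i + 1, 3))
        params[i] = (aug[i][3] - s) / aug[i][i]
    return tuple(params)


# Illustrative, noise-free data generated from y = 2x^2 - 3x + 1 (assumed).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2 * x * x - 3 * x + 1 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(round(a, 6), round(b, 6), round(c, 6))
```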
Recent Results. $F(x, y) = F(y \mid x)\,F(x)$, $y = f(x)$. Data -> function, rules. If $F(x, y)$ or $f(x)$ exists, the rule can be found in a probabilistic sense, with error probability $P_e(\cdot)$ depending on the sample size $N$.
Data-Driven Machine Learning (rote, superficial): without comprehension! Can machines understand text, images, or speech?
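One way to read "found in a probabilistic sense" is through the law of large numbers: a quantity estimated from N samples approaches its true value as N grows. The sketch below simulates this for the error rate of a fixed rule. The value `true_error=0.3`, the seed, and the function name `empirical_error` are assumptions for illustration only, not results from the talk.

```python
import random


def empirical_error(n, true_error=0.3, seed=0):
    """Fraction of n simulated samples on which a fixed rule errs.

    true_error is an assumed error probability, used only to simulate data.
    """
    rng = random.Random(seed)
    mistakes = sum(1 for _ in range(n) if rng.random() < true_error)
    return mistakes / n


# The estimate tightens around the assumed true value as n grows.
for n in (10, 1000, 100000):
    print(n, empirical_error(n))
```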
Artificial Intelligence Methods Human Machine Text Speech Image Sender X refer to, correlate AI physical or conceptual world Meaning S Receiver X Information processing with understanding
Expert Systems. Example: human disease diagnosis system. Production rules: IF a (symptoms, fuzzy) THEN b (function disorder, fuzzy), with CF (certainty factors). Inference engine.
Scope of Application. Deliberative behaviors: problem solving, decision making, diagnosis, planning, common sense, natural language understanding, ... Perception: vision, speech, touch, etc.
Natural Language Understanding. Manual, rule-based knowledge representation: syntax, morphology, semantics, ... Symbolic inference.
Neither traditional information processing nor AI alone can solve the comprehension problem. What will we do next?
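A production rule with certainty factors can be sketched as below. The slide does not say which certainty calculus is used, so this sketch assumes the MYCIN-style convention for certainty factors in [0, 1]: a rule's conclusion gets the rule CF scaled by its weakest premise, and two supporting conclusions are combined as cf1 + cf2 * (1 - cf1). The rules, symptoms, and numbers are hypothetical.

```python
def apply_rule(premise_cfs, rule_cf):
    """CF of a conclusion: rule CF scaled by the weakest premise (AND)."""
    return rule_cf * max(0.0, min(premise_cfs))


def combine(cf1, cf2):
    """Combine two non-negative CFs that support the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)


# Hypothetical rules: IF fever AND cough THEN flu (CF 0.8);
#                     IF headache THEN flu (CF 0.5).
evidence = {"fever": 0.9, "cough": 0.7, "headache": 0.6}
cf_a = apply_rule([evidence["fever"], evidence["cough"]], 0.8)
cf_b = apply_rule([evidence["headache"]], 0.5)
cf_flu = combine(cf_a, cf_b)
print(round(cf_flu, 3))
```

An inference engine would repeat these two steps over a whole rule base; only the arithmetic for a single conclusion is shown here.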
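Comprehension. Text: contextual structure. Image: spatial structure. Speech: temporal structure (time axis t). Video: temporal-spatial structure. Structured analysis & representation. Example text: "Digital video coding technology has a history of half a century and has made great progress: from differential predictive coding in the 1950s, to transform coding and block-based motion-predictive coding in the 1970s, to the recently emerging distributed coding, stereoscopic video coding, multi-view coding, and visual coding."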
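Computer Comprehension of Text. Paragraph: "Digital video coding technology has a history of half a century and has made great progress: from differential predictive coding in the 1950s, to transform coding and block-based motion-predictive coding in the 1970s, to the recently emerging distributed coding, stereoscopic video coding, multi-view coding, and visual coding." Paragraph -> Sentence-1, Sentence-2, ..., Sentence-n -> Word-11, Word-12, ..., Word-1m; Word-21, Word-22, ...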
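The paragraph, sentence, word hierarchy can be sketched as a nested list. This minimal version assumes English text with punctuation-delimited sentences; Chinese text such as the original example would additionally need a word segmenter, which is not shown. The function name `parse_paragraph` and the sample text are assumptions for illustration.

```python
import re


def parse_paragraph(paragraph):
    """Split a paragraph into sentences, and each sentence into words."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return [re.findall(r"[A-Za-z0-9'-]+", s) for s in sentences]


text = "Video coding has a long history. It has made great progress."
structure = parse_paragraph(text)

# structure[i][j] corresponds to Word-(i+1)(j+1) of Sentence-(i+1).
for i, words in enumerate(structure, start=1):
    print(f"Sentence-{i}:", words)
```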
This figure is from Serre et al.'s A quantitative theory of immediate visual recognition. Prog Brain Res. 2007.
Unsupervised Deep Learning
9-layer sparse deep autoencoder
10 million 200x200 images
1 billion connections (1 billion trainable parameters)
1,000 machines (16,000 cores), 3 days
Q. V. Le et al., Building high-level features using large scale unsupervised learning, Proc. 29th ICML, 2012
Results (Generalization Capacity)

| Concept | Random guess | Same architecture with random weights | Best linear filter | Best first layer neuron | Best neuron | Best neuron without contrast normalization |
|---|---|---|---|---|---|---|
| Faces | 64.8% | 67.0% | 74.0% | 71.0% | 81.7% | 78.5% |
| Human bodies | 64.8% | 66.5% | 68.1% | 67.2% | 76.8% | 71.8% |
| Cats | 64.8% | 66.0% | 67.8% | 67.1% | 74.6% | 69.3% |

| Concept | Stanford network | Deep autoencoders, 3 layers | Deep autoencoders, 6 layers | K-means on 40x40 images |
|---|---|---|---|---|
| Faces | 81.7% | 72.3% | 70.9% | 72.5% |
| Human bodies | 76.7% | 71.2% | 69.8% | 69.3% |
| Cats | 74.8% | 67.5% | 68.3% | 68.5% |
Computer Comprehension of Visual Information. Figure labels: V1, V2, IT (high-level); local connection; top-down feedback; data-driven; knowledge-driven.
Data-driven + Knowledge-driven. Statistical inference over an abstract, structured, declarative knowledge representation [1]. The probabilistic approach to artificial intelligence [2].
[1] J. B. Tenenbaum (MIT) et al., How to Grow a Mind: Statistics, Structure, and Abstraction, Science, 11 March 2011, vol. 331, no. 6022, pp. 1279-1285.
[2] Judea Pearl, winner of the 2011 ACM Turing Award.
Quotient Space Based Problem Solving: A Theoretical Foundation of Granular Computing
Domestic edition (distributed within China)
Structural Prediction Learning

| Learning rule | Classification | Structural prediction |
|---|---|---|
| Maximal joint likelihood estimation | Naïve Bayesian network | Hidden Markov model (1966) |
| Maximal conditional likelihood estimation | Logistic regression | Conditional random field (2001) |
| Maximal margin learning | SVM | Maximal margin Markov net (2003) |
| Maximal entropy discrimination learning | Maximal entropy discrimination model | Maximal entropy discrimination Markov net (2008) (Zhu Jun) |
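To make "structural prediction" concrete for the hidden Markov model, the oldest structured model listed here, the sketch below decodes the most likely hidden state sequence with the Viterbi algorithm. The output is a whole sequence of labels rather than a single class. The states, observations, and probabilities form a textbook-style toy example and are assumptions, not material from the talk.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best sequence ending in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Choose the predecessor state that maximizes the path probability.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Toy model (assumed numbers).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The other structured models in the table (conditional random fields, max-margin Markov nets) differ in how the scores are learned, not in this need to search over joint label sequences.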
Bayes' theorem (T. Bayes, 1702-1761): prior distribution + likelihood function -> posterior distribution.
Optimization-based regularized Bayesian inference (Zhu Jun, Tsinghua University): prior distribution + likelihood function + posterior constraints (attributes, domain knowledge) -> optimization theory -> posterior distribution.
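The classical half of this slide, prior times likelihood giving the posterior, can be sketched for a finite set of hypotheses as follows. The function name `posterior` and the numbers are assumptions for illustration. The regularized variant, which adds constraints on the posterior and solves an optimization problem, is not implemented here.

```python
def posterior(prior, likelihood):
    """Bayes' theorem over a finite set of hypotheses.

    prior[h] = P(h); likelihood[h] = P(data | h).
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data)
    return {h: p / evidence for h, p in unnormalized.items()}


# Illustrative numbers (assumed): a rare condition and an imperfect test.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}  # P(positive test | h)
post = posterior(prior, likelihood)
print(round(post["disease"], 3))
```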
Neural Turing Machine Google DeepMind, London, UK External Input External Output Recurrent NN Feedforward NN Read Heads Write Heads Memory
Three Levels of Processing. (1) Natural meaning: recognition, ill-posed problems. (2) Sender's intention: context-aware, psychological model. (3) Receiver's reaction (impact): social knowledge, ...
Conclusions. Basic foundation: content-related information processing, multi-granular computing. Applied foundation: algorithms, architecture, parallelism, management, storage, ...
Publications: Journal Papers
J. Zhu, A. Ahmed, E.P. Xing. MedLDA: Maximum Margin Supervised Topic Models. Journal of Machine Learning Research (JMLR), 13(Aug):2237-2278, 2012.
N. Chen, J. Zhu, F. Sun, E.P. Xing. Large-margin Subspace Learning for Multi-view Data Analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 34, no. 12, pp. 2365-2378, Dec. 2012.
C. Liu, B. Zhang, J. Zhu, and D. Wang. Learning a Contextual Multithread Model for Movie/TV Scene Segmentation. IEEE Transactions on Multimedia (TMM), 2012.
X. Hu and J. Wang. Solving the assignment problem using continuous-time and discrete-time improved dual networks. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), vol. 23, no. 5, pp. 821-827, 2012.
X. Hu and B. Zhang. A Gaussian attractor network for memory and recognition with experience-dependent learning. Neural Computation, vol. 22, no. 5, pp. 1333-1357, 2010.
X. Hu, C. Sun, and B. Zhang. Design of recurrent neural networks for solving constrained least absolute deviation problems. IEEE Transactions on Neural Networks (TNN), vol. 21, no. 7, pp. 1073-1086, July 2010.
J. Zhu, E.P. Xing. Maximum Entropy Discrimination Markov Networks. Journal of Machine Learning Research (JMLR), 10(Nov):2531-2569, 2009.
X. Hu and B. Zhang. A new recurrent neural network for solving convex quadratic programming problems with an application to the k-winners-take-all problem. IEEE Transactions on Neural Networks (TNN), vol. 20, no. 4, pp. 654-664, April 2009.
D. Wang, Z. Wang, J. Li, B. Zhang, and X. Li. Query representation by structured concept threads with application to interactive video retrieval. Journal of Visual Communication and Image Representation, vol. 20, no. 2, pp. 104-116, 2009.
J. Zhu, Z. Nie, B. Zhang, and J. Wen. Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction. Journal of Machine Learning Research (JMLR), 9(Jul):1583-1614, 2008.
Conference Papers
J. Zhu, N. Chen, H. Perkins, B. Zhang. Gibbs Max-Margin Supervised Topic Models with Fast Sampling Algorithms. In Proc. of the 30th International Conference on Machine Learning (ICML), Atlanta, USA, 2013.
M. Xu, J. Zhu, B. Zhang. Fast Max-Margin Matrix Factorization with Data Augmentation. In Proc. of the 30th International Conference on Machine Learning (ICML), Atlanta, USA, 2013.
N. Chen, J. Zhu, F. Xia, and B. Zhang. Generalized Relational Topic Models with Data Augmentation. To appear in Proc. of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013.
M. Xu, J. Zhu, and B. Zhang. Bayesian Nonparametric Maximum Margin Matrix Factorization for Collaborative Prediction. Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, USA, 2012.
Q. Jiang, J. Zhu, M. Sun, and E.P. Xing. Monte Carlo Methods for Maximum Margin Supervised Topic Models. Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, USA, 2012.
J. Ji, J. Li, S. Yan, B. Zhang, and Q. Tian. Super-Bit Locality-Sensitive Hashing. Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, USA, 2012.
J. Zhu. Max-Margin Nonparametric Latent Feature Models for Link Prediction. In Proc. of the 29th International Conference on Machine Learning (ICML), Edinburgh, Scotland, 2012.
J. Zhu, N. Chen, E.P. Xing. Infinite Latent SVM for Classification and Multitask Learning. Advances in Neural Information Processing Systems (NIPS), Granada, Spain, 2011.
J. Zhu, E.P. Xing. Sparse Topical Coding. In Proc. of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), Barcelona, Spain, 2011.
J. Zhu, N. Chen, E.P. Xing. Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines. In Proc. of the 28th International Conference on Machine Learning (ICML), Bellevue, Washington, USA, 2011.
J. Zhu, L.-J. Li, L. Fei-Fei, E.P. Xing. Large Margin Training of Upstream Scene Understanding Models. Advances in Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, 2010.
S. Lee, J. Zhu, E.P. Xing. Detecting eQTLs using Adaptive Multi-task Lasso. Advances in Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, 2010.
N. Chen, J. Zhu, and E.P. Xing. Predictive Subspace Learning for Multi-view Data: a Large Margin Approach. Advances in Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, 2010.
J. Zhu, E.P. Xing. Conditional Topic Random Fields. In Proc. of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 2010.
J. Zhu and E.P. Xing. On Primal and Dual Sparsity of Markov Networks. In Proc. of the 26th International Conference on Machine Learning (ICML), Montreal, Canada, 2009.
J. Zhu, A. Ahmed, and E.P. Xing. MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification. In Proc. of the 26th International Conference on Machine Learning (ICML), Montreal, Canada, 2009.
J. Zhu, E.P. Xing, and B. Zhang. Partially Observed Maximum Entropy Discrimination Markov Networks. Advances in Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, 2008.
J. Zhu, E.P. Xing, and B. Zhang. Laplace Maximum Margin Markov Networks. In Proc. of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 2008.
J. Zhu, Z. Nie, et al. 2D Conditional Random Fields for Web Information Extraction. In Proc. of the 22nd International Conference on Machine Learning (ICML), Bonn, Germany, 2005.
J. Zhu, X. Zheng, L. Zhou, and B. Zhang. Scalable Inference in Max-margin Supervised Topic Models. To appear in Proc. of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, USA, 2013.
J. Zhu, X. Zheng, and B. Zhang. Bayesian Logistic Supervised Topic Models with Data Augmentation. To appear in Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), Sofia, Bulgaria, 2013.
A. Zhang, J. Zhu, and B. Zhang. Sparse Online Topic Models. In Proc. of the 22nd International World Wide Web Conference (WWW), Rio de Janeiro, Brazil, 2013.
Y. Tian and J. Zhu. Learning from Crowds in the Presence of Schools of Thought. In Proc. of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China, 2012.
L. Xie, Q. Tian, and B. Zhang. Spatial pooling of heterogeneous features for image applications. ACM Multimedia, 2012, pp. 539-548.
J. Zhu, N. Lao, and E.P. Xing. Grafting-Light: Fast, Incremental Feature Selection and Structure Learning of Markov Random Fields. In Proc. of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington DC, USA, 2010.
X. Shi, J. Zhu, R. Cai, and L. Zhang. User Grouping Behavior in Online Forums. In Proc. of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Paris, France, 2009.
Y. Liang, J. Li, and B. Zhang. Vocabulary-based hashing for image search. ACM Multimedia (MM), 2009, pp. 589-592.
J. Zhu, Z. Nie, X. Liu, B. Zhang, and J.-R. Wen. StatSnowball: a Statistical Approach to Extracting Entity Relationships. In Proc. of the 18th International World Wide Web Conference (WWW), Madrid, Spain, 2009.
J. Yuan, J. Li, and B. Zhang. Scene understanding with discriminative structured prediction. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
J. Zhu, Z. Nie, et al. Simultaneous Record Detection and Attribute Labeling in Web Data Extraction. In Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, PA, USA, 2006.
Thank you!