AI and Brain Science: Toward a Mathematical Theory of MLP


1 First Japan-Korea Machine Learning Symposium. AI and Brain Science: Toward a Mathematical Theory of MLP. Shun-ichi Amari, RIKEN Brain Science Institute. Principles of the brain: many neurons connected in a network; parallel dynamics; learning through synaptic plasticity.

2 The brain has found and implemented these principles through evolution (random search), under historical and material restrictions. The result is very complex (not smartly designed).

3 Mathematical neuroscience searches for the principles through mathematical studies of simple, idealized (not realistic) models. Computational neuroscience. AI: technological realization.

4 Brief History of AI and BT. First boom (1950s): AI (Dartmouth Conference; symbols, universal computation, logic) and BT (Perceptron; learning machines). Dark period (late 1960s to 1970s): stochastic descent learning for MLP (1967).

5 First stochastic descent learning of MLP (1967; 1968). Information Theory II: Geometrical Theory of Information, Shun-ichi Amari, University of Tokyo; Kyoritu Press, Tokyo, 1968.

7 Slide equations: max and min of $w \cdot x$ over weights $w_1, \ldots, w_4$, with output weights $v_1, v_2$.

8 Second Boom. AI (from 1970): expert systems (MYCIN), stochastic inference (Bayes), chess (1997). BT (from 1980, neural networks): MLP (backprop), associative memory.

9 Third Boom (from 2000): deep learning and stochastic inference (graphical models; Bayesian methods; WATSON). Deep learning for pattern recognition (vision, audition, sentence analysis) and games (shogi, i.e. Japanese chess; AlphaGo). Language processing; sequences and dynamics (word2vec, deep learning with recurrent nets). Integration of (symbol, logic) and (pattern, dynamics).

10 Human Brain: Consciousness. Symbol and logic; pattern and dynamics.

11 Libet experiment: Free Will EEG When!

12 Prediction and Postdiction: dual dynamics. Dynamics: decision and action. Conscious: justification, logical reasoning.

13 Deep learning: from pattern dynamics to symbol, sentence, and logic (prediction). Learning a conscious machine: postdiction.

14 Future AI and BT. Postdiction: symbol and logic; logic and pattern dynamics; associative memory. AI gives an existence proof of the principles. AI and BT search for the same principles with different implementations.

15 Mathematical Theory of Multilayer Perceptrons: Dynamics of Self-Organization and Singularities in Supervised Learning, Towards Understanding Deep Learning. Shun-ichi Amari, RIKEN Brain Science Institute; collaborator R. Karakida (U. Tokyo).

16 Deep Learning = Self-Organization + Supervised Learning. RBM (Restricted Boltzmann Machine), auto-encoder, recurrent net; dropout; contrastive divergence; convolution.

17 Simple Hebbian Self-Organization: p(v)

18 self organization of

19 Equilibrium

20 Equilibrium: special cases

21 Two and many clusters

22 Dynamics of self organization

23 Lyapunov Function

24 Further Problems: dimension reduction (PCA, ICA); distributed small clusters vs. large clusters; mutual interactions among hidden neurons (neural field); localized receptive fields; invariance (convolution).

25 RBM: Restricted Boltzmann Machine

26 Self Organization

27 Interaction of Hidden Neurons

29 Recurrent Net (Auto Encoder)

30 Gaussian Boltzmann Machine

31 Equilibrium Solution (R. Karakida). General solution: built from an orthogonal matrix and a diagonalization, with a free choice of m eigenvalues. Stable solution: the case m = k.

32 Bernoulli-Gaussian RBM: ICA (R. Karakida)

33 Equilibrium Analysis: Results. Assumption on the input: the sources s are independent and nonnegative, and B is an N × N orthogonal matrix (ICA: Independent Component Analysis). Solutions: if ..., ML and CD learning have the following stable solutions: W (in the s space), mean value, model variance σ. CD solutions: ICA; ML solutions.

34 Simulation: number of neurons N = M = 2, σ = 1/2. Panels: sources p(s) (uniform distribution), mixing, input, CD (ICA solution), output. Independent sources are extracted by the Gaussian-Bernoulli RBM (GB-RBM).

35 Structure of the environment and good models: uniform (no structure); aggregate of clusters (Hebbian self-organization); PCA (Gaussian RBM; submanifolds); ICA (Bernoulli-Gaussian RBM; sparse); hierarchy (deep learning; invariance, logical structure, hierarchies of hierarchies).

36 Supervised Learning: multilayer perceptron, backprop learning. Singularity! The natural gradient solves the difficulty.

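37 Mathematical Neurons: $y = \varphi\left(\sum_i w_i x_i - h\right)$, or $y = \varphi(w \cdot x)$ with the threshold h absorbed into w; $y = \varphi(u)$, $u = w \cdot x$.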

38 Multilayer Perceptrons: $x = (x_1, x_2, \ldots, x_n)$, $y = f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$, $\theta = (w_1, \ldots, w_m; v_1, \ldots, v_m)$.
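Below is a minimal Python sketch of the model $y = f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$. It assumes the error-function sigmoid $\varphi(u) = \sqrt{2/\pi}\int_0^u e^{-t^2/2}\,dt$, which equals `math.erf(u / sqrt(2))`. The helper names are illustrative only. The Simpson-rule function is there solely to cross-check that identity.

```python
import math

def phi(u):
    # phi(u) = sqrt(2/pi) * integral_0^u exp(-t^2/2) dt, expressed through math.erf
    return math.erf(u / math.sqrt(2))

def phi_by_quadrature(u, n=1000):
    # Composite Simpson rule for the same integral, used only as a cross-check (n even)
    h = u / n
    total = 1.0 + math.exp(-u * u / 2)          # integrand at t = 0 and t = u
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * math.exp(-t * t / 2)
    return math.sqrt(2 / math.pi) * total * h / 3

def mlp(x, W, v):
    # f(x, theta) = sum_i v_i * phi(w_i . x), with W = [w_1, ..., w_m], v = [v_1, ..., v_m]
    return sum(vi * phi(sum(a * b for a, b in zip(wi, x))) for wi, vi in zip(W, v))

print(phi(1.3), phi_by_quadrature(1.3))
print(mlp([1.0, -0.5], W=[[0.4, 0.1], [-0.3, 0.8]], v=[1.0, 0.5]))
```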

39 Multilayer Perceptron: the neuromanifold $M = \{f(x, \theta)\}$ in the space of functions $S$; $y = f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$, $\theta = (v_1, \ldots, v_m; w_1, \ldots, w_m)$.

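40 Backpropagation: stochastic gradient learning. Examples (training set): $(y_1, x_1), \ldots, (y_t, x_t)$. Loss: $l(y, x; \theta) = \frac{1}{2}\{y - f(x, \theta)\}^2 = -\log p(y, x; \theta) + \text{const}$. Update: $\theta_{t+1} = \theta_t - \eta\,\nabla l(y_t, x_t; \theta_t)$, with $f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$.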

41 singularities

42 Geometry of a singular model: $y = v\,\varphi(w \cdot x) + n$; singular set $v = 0$ or $w = 0$.

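43 Model with 2 hidden neurons: $f(x, \theta) = w_1\,\varphi(J_1 \cdot x) + w_2\,\varphi(J_2 \cdot x)$, $y = f(x, \theta) + \varepsilon$, $\varphi(u) = \sqrt{2/\pi}\int_0^u e^{-t^2/2}\,dt$.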

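44 Loss function: $l(x, y; \theta) = \frac{1}{2}\{y - f(x, \theta)\}^2$, where y is the teacher signal and $\eta > 0$ is the learning rate. Stochastic descent learning: $\Delta\theta_t = -\eta\,\nabla l(x_t, y_t; \theta_t)$. Backprop: the vanilla gradient.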
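The stochastic descent step for the squared loss can be sketched for one hidden layer, $f(x, \theta) = \sum_i v_i\,\varphi(w_i \cdot x)$. The chain rule gives $\partial l/\partial v_i = -e\,\varphi(w_i \cdot x)$ and $\partial l/\partial w_i = -e\,v_i\,\varphi'(w_i \cdot x)\,x$, with $e = y - f(x, \theta)$. The activation $\varphi(u) = \mathrm{erf}(u/\sqrt{2})$, the function names, and the step size are assumptions chosen for illustration.

```python
import math

def phi(u):
    # Error-function sigmoid phi(u) = erf(u / sqrt(2)), assumed for illustration
    return math.erf(u / math.sqrt(2))

def dphi(u):
    # phi'(u) = sqrt(2/pi) * exp(-u^2 / 2)
    return math.sqrt(2 / math.pi) * math.exp(-u * u / 2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def loss(W, v, x, y):
    # l(x, y; theta) = (1/2) (y - f(x, theta))^2 with f = sum_i v_i phi(w_i . x)
    f = sum(vi * phi(dot(wi, x)) for wi, vi in zip(W, v))
    return 0.5 * (y - f) ** 2

def sgd_step(W, v, x, y, eta):
    """One vanilla stochastic-gradient step theta <- theta - eta * grad l on a single example."""
    u = [dot(wi, x) for wi in W]                        # hidden inputs u_i = w_i . x
    e = y - sum(vi * phi(ui) for vi, ui in zip(v, u))   # error e = y - f(x, theta)
    # dl/dv_i = -e phi(u_i) and dl/dw_i = -e v_i phi'(u_i) x, both at the old parameters
    new_v = [vi + eta * e * phi(ui) for vi, ui in zip(v, u)]
    new_W = [[wij + eta * e * vi * dphi(ui) * xj for wij, xj in zip(wi, x)]
             for wi, vi, ui in zip(W, v, u)]
    return new_W, new_v

W, v = [[0.3, -0.2], [0.1, 0.5]], [0.7, -0.4]   # arbitrary starting parameters
x, y = [1.0, 2.0], 1.0                          # one training example (x_t, y_t)
W2, v2 = sgd_step(W, v, x, y, eta=0.05)
print(loss(W, v, x, y), loss(W2, v2, x, y))
```

Both gradients are evaluated at the old parameters before either is updated. A single vanilla (Euclidean) gradient step requires exactly that.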

45 Natural-Gradient Stochastic Descent: $\theta_{t+1} = \theta_t - \eta\,G^{-1}(\theta_t)\,\nabla l(x_t, y_t; \theta_t)$, where G is the Fisher information matrix. Invariant; steepest descent.

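46 Natural Gradient (Riemannian): maximize $dl = \nabla l \cdot d\theta$ under $|d\theta|^2 = \varepsilon^2$; the maximizing direction is $\tilde\nabla l = G^{-1}\nabla l$, giving $\theta_{t+1} = \theta_t - \eta\,\tilde\nabla l(x_t, y_t; \theta_t)$.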

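47 Steepest Direction: Natural Gradient. $\nabla l = (\partial l/\partial\theta_1, \ldots, \partial l/\partial\theta_n)$; $|d\theta|^2 = \sum_{i,j} G_{ij}\,d\theta_i\,d\theta_j = d\theta^\top G\,d\theta$; $\tilde\nabla l = G^{-1}\nabla l$; $\theta_{t+1} = \theta_t - \eta\,\tilde\nabla l(x_t, y_t; \theta_t)$.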
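The steepest-direction argument can be checked numerically in two dimensions. Among directions $d\theta$ with $d\theta^\top G\,d\theta = 1$, the one maximizing $\nabla l \cdot d\theta$ is proportional to $G^{-1}\nabla l$; descent moves along its negative. The sketch below compares that closed form with a brute-force scan over the constraint ellipse. G and the gradient are arbitrary values chosen for illustration.

```python
import math

def solve2(G, g):
    # Solve the 2x2 linear system G d = g (Cramer's rule), i.e. d = G^{-1} g
    (a, b), (c, d) = G
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]

def quad(G, v):
    # Squared Riemannian length v^T G v
    return sum(G[i][j] * v[i] * v[j] for i in range(2) for j in range(2))

def natural_direction(G, grad):
    # G^{-1} grad, rescaled to unit Riemannian length
    nat = solve2(G, grad)
    s = math.sqrt(quad(G, nat))
    return [c / s for c in nat]

def best_direction_by_search(G, grad, n=50000):
    # Brute force: scan directions on the ellipse v^T G v = 1 and keep the largest grad . v
    best, best_val = None, -math.inf
    for k in range(n):
        t = 2 * math.pi * k / n
        v = [math.cos(t), math.sin(t)]
        s = math.sqrt(quad(G, v))
        v = [c / s for c in v]
        val = grad[0] * v[0] + grad[1] * v[1]
        if val > best_val:
            best, best_val = v, val
    return best

G = [[2.0, 0.5], [0.5, 1.0]]     # arbitrary positive-definite metric for illustration
grad = [1.0, 1.0]                # arbitrary gradient
print(natural_direction(G, grad), best_direction_by_search(G, grad))
```

With this G, the Euclidean direction (the normalized gradient itself) gives a strictly smaller increase per unit of Riemannian length. In that sense the vanilla gradient is not the steepest direction.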

48 The natural gradient is superior: steepest descent; invariant (Yann Ollivier); Fisher-efficient; non-vanishing even in multiple layers; good at singular regions (avoids plateaus: Milnor attractor).

49 Adaptive Natural Gradient

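50 Singular Region in Parameter Space: for $f(x, \theta) = w_1\,\varphi(J_1 \cdot x) + w_2\,\varphi(J_2 \cdot x)$, $R = \{J_1 = J_2 = J,\ w_1 + w_2 = w\} \cup \{w_1 = 0,\ w_2 = w,\ J_2 = J\} \cup \{w_2 = 0,\ w_1 = w,\ J_1 = J\}$.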
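A quick numerical illustration of the singular region follows. Each of the three families ($J_1 = J_2 = J$ with $w_1 + w_2 = w$; $w_1 = 0$ with $w_2 = w$, $J_2 = J$; $w_2 = 0$ with $w_1 = w$, $J_1 = J$) realizes the same function $w\,\varphi(J \cdot x)$, so the parameters are not identifiable there. The values of w and J, the arbitrary free directions, and $\varphi(u) = \mathrm{erf}(u/\sqrt{2})$ are assumptions of this sketch.

```python
import math

def phi(u):
    # Error-function sigmoid phi(u) = erf(u / sqrt(2)), assumed for illustration
    return math.erf(u / math.sqrt(2))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def f(x, w1, w2, J1, J2):
    # Two-hidden-unit model f(x, theta) = w1 phi(J1 . x) + w2 phi(J2 . x)
    return w1 * phi(dot(J1, x)) + w2 * phi(dot(J2, x))

def reduced(x, w, J):
    # One-unit model w phi(J . x) that every point of R collapses to
    return w * phi(dot(J, x))

w, J = 1.5, [0.4, -0.7]
points_in_R = [
    (0.6, 0.9, J, J),               # J1 = J2 = J, w1 + w2 = w
    (0.0, w, [3.0, -2.0], J),       # w1 = 0, w2 = w, J2 = J (J1 arbitrary)
    (w, 0.0, J, [-1.0, 5.0]),       # w2 = 0, w1 = w, J1 = J (J2 arbitrary)
]
xs = [[1.0, 0.0], [0.3, -1.2], [-2.0, 0.5]]
for p in points_in_R:
    print([f(x, *p) - reduced(x, w, J) for x in xs])
```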

51 Coordinate transformation: $v = (w_1 J_1 + w_2 J_2)/w$, $w = w_1 + w_2$, $u = J_2 - J_1$, $z = (w_1 - w_2)/(w_1 + w_2)$; new coordinates $(v, w, u, z)$.

52 Singular region in the coordinates $(v, w, u, z)$: $R = \{u = 0\} \cup \{z = \pm 1\}$.

53 Singular lines in the parameter space

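54 Taylor expansion for small u: $f(x, \theta) = w\,\varphi(v \cdot x) + \frac{w}{8}(1 - z^2)\,\varphi''(v \cdot x)\,(u \cdot x)^2 + \frac{w}{24}\,z(1 - z^2)\,\varphi'''(v \cdot x)\,(u \cdot x)^3 + \cdots$. Fast dynamics: w, v (stability); slow dynamics: u, z.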
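The expansion coefficients can be checked directly. With $w_1 = w(1+z)/2$, $w_2 = w(1-z)/2$, $J_1 = v - \frac{1-z}{2}u$ and $J_2 = v + \frac{1+z}{2}u$ (the inverse of the coordinate transformation), the first-order term in $u \cdot x$ cancels. The next two coefficients are $\frac{w}{8}(1-z^2)\varphi''$ and $\frac{w}{24}z(1-z^2)\varphi'''$. The sketch below assumes $\varphi(u) = \mathrm{erf}(u/\sqrt{2})$ and works with the scalar projections $s = v \cdot x$ and $t = u \cdot x$. It verifies that the remainder shrinks like $t^4$. The test values are arbitrary.

```python
import math

C = math.sqrt(2 / math.pi)

def phi(u):
    # phi(u) = sqrt(2/pi) * integral_0^u exp(-t^2/2) dt = erf(u / sqrt(2)), assumed activation
    return math.erf(u / math.sqrt(2))

def phi2(u):
    # phi''(u) = -u * sqrt(2/pi) * exp(-u^2/2)
    return -u * C * math.exp(-u * u / 2)

def phi3(u):
    # phi'''(u) = (u^2 - 1) * sqrt(2/pi) * exp(-u^2/2)
    return (u * u - 1) * C * math.exp(-u * u / 2)

def to_new(w1, w2, J1, J2):
    # (w1, w2, J1, J2) -> (v, w, u, z); the J's may be scalars (projections onto x)
    w = w1 + w2
    return (w1 * J1 + w2 * J2) / w, w, J2 - J1, (w1 - w2) / w

def to_original(v, w, u, z):
    # Inverse map: w1 = w(1+z)/2, w2 = w(1-z)/2, J1 = v - (1-z)u/2, J2 = v + (1+z)u/2
    return w * (1 + z) / 2, w * (1 - z) / 2, v - (1 - z) * u / 2, v + (1 + z) * u / 2

def exact(s, t, w, z):
    # f = w1 phi(J1 . x) + w2 phi(J2 . x), with s = v . x and t = u . x
    w1, w2, a1, a2 = to_original(s, w, t, z)
    return w1 * phi(a1) + w2 * phi(a2)

def expansion(s, t, w, z):
    # w phi(s) + (w/8)(1 - z^2) phi''(s) t^2 + (w/24) z (1 - z^2) phi'''(s) t^3
    return (w * phi(s) + w / 8 * (1 - z * z) * phi2(s) * t ** 2
            + w / 24 * z * (1 - z * z) * phi3(s) * t ** 3)

for t in (0.1, 0.05):
    print(t, abs(exact(0.8, t, 1.3, 0.4) - expansion(0.8, t, 1.3, 0.4)))
```

Halving t divides the error by roughly 16, as expected for a fourth-order remainder.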

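55 Neighborhood of R: averaged learning dynamics of the slow variable, $\dot u = \frac{\eta w}{2}(1 - z^2)\,\langle e\,\varphi''(v \cdot x)\,x x^\top\rangle\,u$, together with the dynamics of z and the resulting solution trajectories (logarithmic in form).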

56 Stability 1: the true solution is in R ($R$: $u = 0$ or $z = \pm 1$): stable.

57 Dynamic vector fields: Redundant case

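58 Stability 2: the true solution is outside R. With $H = \langle e\,\varphi''(v \cdot x)\,x x^\top\rangle$: if $wH$ is positive definite, $|z| > 1$ is stable and $|z| < 1$ unstable; if $wH$ is negative definite, $|z| < 1$ is stable and $|z| > 1$ unstable.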
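The following is a sketch of the stability test for the two cases above. It assumes the linearized slow dynamics near $u = 0$ have the form $\dot u = c\,(1 - z^2)\,wH\,u$ with a constant $c > 0$. That form is an assumption consistent with the stated rule, not taken from the slides. Under it, $u = 0$ is stable exactly when every eigenvalue of $(1 - z^2)\,wH$ is negative. H is an arbitrary symmetric 2×2 example, and the Euler integration only confirms the eigenvalue test.

```python
import math

def sym_eigs(H):
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]]
    (a, b), (_, d) = H
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mid - rad, mid + rad

def u_zero_is_stable(w, H, z):
    """u = 0 is stable iff every eigenvalue of (1 - z^2) w H is negative,
    under the assumed linearized dynamics du/dt = c (1 - z^2) w H u with c > 0."""
    scale = (1 - z * z) * w
    return all(scale * lam < 0 for lam in sym_eigs(H))

def simulate_norm(w, H, z, u0, dt=0.01, steps=2000):
    # Forward-Euler integration of du/dt = (1 - z^2) w H u (taking c = 1); returns |u| at the end
    u = list(u0)
    scale = (1 - z * z) * w
    for _ in range(steps):
        Hu = [H[0][0] * u[0] + H[0][1] * u[1], H[1][0] * u[0] + H[1][1] * u[1]]
        u = [u[i] + dt * scale * Hu[i] for i in range(2)]
    return math.hypot(*u)

H = [[2.0, 0.3], [0.3, 1.0]]    # arbitrary positive-definite example
print(u_zero_is_stable(1.0, H, 2.0), u_zero_is_stable(1.0, H, 0.5))
```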

59 Learning Trajectory near the singularity

60 Milnor attractor

61 Dynamic vector fields: general case ($|z| > 1$ part stable)

62 Fig. 2: trajectories

63 Saddle and plateau

64 Retardation of learning (plateau): $E = \frac{1}{2}\langle e^2\rangle$; near R, E changes only at higher order in u.

65 Topology of the singular region R, blow-down coordinates: $\xi_1 = c_1(1 - z^2)\,|u|^2$, $\xi_2 = c_2\,z(1 - z^2)\,|u|^3$, $e = u/|u| \in S^{n-1}$.

66 Singular region in the coordinates $(v, w, u, z)$: $R = \{u = 0\} \cup \{z = \pm 1\}$.

68 Sphere $S^n$ and projective space $P^n$

69 Natural-gradient learning near the singularity: $d\theta/dt$ when the true model lies outside R, and $d\theta/dt = O(1)$ when the true model is in R (Milnor attractor).

70 How to realize the natural gradient: adaptive natural gradient, updating the estimate of $G_t^{-1}$ recursively from $\nabla l_t$. Unitwise diagonalization of G (Yann Ollivier): $G^{-1}\nabla l$ is non-singular; unitwise diagonalization of G is OK (Ollivier).

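71 Natural-Gradient Learning: simple and multilayer perceptrons. $y = f(x, \theta) + \varepsilon$, $\varepsilon \sim N(0, 1)$; $p(x, y; \theta) = q(x)\,\frac{1}{\sqrt{2\pi}}\exp\{-\frac{1}{2}(y - f(x, \theta))^2\}$; $l = -\log p(x, y; \theta) = \frac{1}{2}\{y - f(x, \theta)\}^2 - \log q(x) + \text{const}$; $\nabla l = -e\,\nabla f(x, \theta)$ with $e = y - f(x, \theta)$; $G = E[\nabla f\,\nabla f^\top]$.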

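72 Simple perceptron: $y = \varphi(w \cdot x) + \varepsilon$, $\varphi(u) = \sqrt{2/\pi}\int_0^u \exp(-v^2/2)\,dv$, so $\varphi'(u) = \sqrt{2/\pi}\exp(-u^2/2)$; $\nabla l = -e\,\varphi'(w \cdot x)\,x$; $G(w) = E_q[\varphi'(w \cdot x)^2\,x x^\top] = \frac{2}{\pi}E_q[\exp\{-(w \cdot x)^2\}\,x x^\top]$.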

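73 Fisher information matrix for $q(x) = N(0, I)$: $G(w) = \frac{2}{\pi\sqrt{1 + 2|w|^2}}\left(I - \frac{2ww^\top}{1 + 2|w|^2}\right)$, $G^{-1}(w) = \frac{\pi}{2}\sqrt{1 + 2|w|^2}\,(I + 2ww^\top)$, natural gradient $\tilde\nabla l = G^{-1}\nabla l = -\sqrt{\pi/2}\,\sqrt{1 + 2|w|^2}\;e\,\exp\{-(w \cdot x)^2/2\}\,\{x + 2(w \cdot x)\,w\}$.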
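For $x \sim N(0, I)$ and $\varphi'(u) = \sqrt{2/\pi}\,e^{-u^2/2}$, splitting x into components along and orthogonal to w gives closed forms for $G(w)$ and $G^{-1}(w)$; they are written out in the code comments. The sketch below checks them against a Monte Carlo estimate and checks the resulting natural gradient. The sample size, seed, and test values are arbitrary.

```python
import math
import random

def fisher_closed_form(w):
    # G(w) = 2 / (pi sqrt(1 + 2|w|^2)) * (I - 2 w w^T / (1 + 2|w|^2)), for x ~ N(0, I)
    n, s = len(w), 1 + 2 * sum(wi * wi for wi in w)
    c = 2 / (math.pi * math.sqrt(s))
    return [[c * ((1.0 if i == j else 0.0) - 2 * w[i] * w[j] / s) for j in range(n)] for i in range(n)]

def fisher_inverse_closed_form(w):
    # G^{-1}(w) = (pi / 2) sqrt(1 + 2|w|^2) * (I + 2 w w^T)
    n, s = len(w), 1 + 2 * sum(wi * wi for wi in w)
    c = math.pi / 2 * math.sqrt(s)
    return [[c * ((1.0 if i == j else 0.0) + 2 * w[i] * w[j]) for j in range(n)] for i in range(n)]

def fisher_monte_carlo(w, n_samples, rng):
    # Sample average of phi'(w . x)^2 x x^T with phi'(u) = sqrt(2/pi) exp(-u^2/2), x ~ N(0, I)
    n = len(w)
    G = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        u = sum(wi * xi for wi, xi in zip(w, x))
        weight = (2 / math.pi) * math.exp(-u * u)    # phi'(u)^2
        for i in range(n):
            for j in range(n):
                G[i][j] += weight * x[i] * x[j] / n_samples
    return G

def natural_gradient(w, x, y):
    # G^{-1}(w) grad l, with grad l = -e phi'(w . x) x and e = y - phi(w . x)
    u = sum(wi * xi for wi, xi in zip(w, x))
    e = y - math.erf(u / math.sqrt(2))
    grad = [-e * math.sqrt(2 / math.pi) * math.exp(-u * u / 2) * xi for xi in x]
    Ginv = fisher_inverse_closed_form(w)
    return [sum(Ginv[i][j] * grad[j] for j in range(len(w))) for i in range(len(w))]

w = [0.5, -0.3]
G_cf = fisher_closed_form(w)
G_mc = fisher_monte_carlo(w, 100000, random.Random(0))
print(G_cf, G_mc)
```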

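74 Singular input distribution q(x): if $u \cdot x = 0$ on the support of q for some $u \ne 0$, then $u^\top G(w)\,u = 0$ and G is singular. Example: $x_1 = x_2$ and $u = (1, -1, 0, \ldots, 0)$; then y depends on $w_1$ and $w_2$ only through $w_1 + w_2$.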
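The singular case can be seen directly when inputs are restricted to $x_1 = x_2$. Every term $\varphi'(w \cdot x)^2\,x x^\top$ then annihilates $u = (1, -1)$, so the empirical Fisher matrix is singular whatever w is. The weight vector, sample size, and seed below are arbitrary choices for the sketch.

```python
import math
import random

def empirical_fisher(w, xs):
    # Sample average of phi'(w . x)^2 x x^T with phi'(u) = sqrt(2/pi) exp(-u^2/2)
    G = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        u = w[0] * x[0] + w[1] * x[1]
        weight = (2 / math.pi) * math.exp(-u * u)
        for i in range(2):
            for j in range(2):
                G[i][j] += weight * x[i] * x[j] / len(xs)
    return G

rng = random.Random(0)
xs = [(s, s) for s in (rng.gauss(0.0, 1.0) for _ in range(1000))]   # degenerate inputs: x1 = x2
G = empirical_fisher([0.7, -0.2], xs)
u = [1.0, -1.0]
quad = sum(u[i] * G[i][j] * u[j] for i in range(2) for j in range(2))   # u^T G u
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G, quad, det)
```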

75 MLP: $z_0 = x$, $z_r = \varphi(W_r z_{r-1})$ for $r = 1, \ldots, L$, $y = W_{L+1} z_L$; $y = f(x, W)$, $W = (W_1, \ldots, W_{L+1})$.

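76 Error backpropagation: $\nabla_{W_r} l = -e_r\,z_{r-1}^\top$ for $r = 1, \ldots, L+1$, with $e_{L+1} = y - f(x, W)$ and $e_r = \varphi'(W_r z_{r-1}) \odot (W_{r+1}^\top e_{r+1})$. Fisher information matrix: $G_r = E[\nabla_{W_r} l\,\nabla_{W_r} l^\top] = E[(e_r e_r^\top) \otimes (z_{r-1} z_{r-1}^\top)]$.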
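The backpropagation recursion can be sketched for a small network with $z_0 = x$, $z_r = \varphi(W_r z_{r-1})$ and a linear output $y = W_{L+1} z_L$. Starting from $e_{L+1} = y - f(x, W)$, each layer uses $\partial l/\partial W_r = -e_r z_{r-1}^\top$ and passes back $e_r = \varphi'(W_r z_{r-1}) \odot (W_{r+1}^\top e_{r+1})$. The layer sizes, the activation $\varphi(u) = \mathrm{erf}(u/\sqrt{2})$, and the random weights are assumptions of this sketch. A central finite-difference check confirms the gradients.

```python
import math
import random

def phi(u):
    # Activation assumed for illustration: phi(u) = erf(u / sqrt(2))
    return math.erf(u / math.sqrt(2))

def dphi(u):
    # phi'(u) = sqrt(2/pi) * exp(-u^2 / 2)
    return math.sqrt(2 / math.pi) * math.exp(-u * u / 2)

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def forward(Ws, x):
    """Ws = [W_1, ..., W_{L+1}]; returns z_0..z_L, the pre-activations W_r z_{r-1}, and the output."""
    zs, pre = [x], []
    for W in Ws[:-1]:
        a = matvec(W, zs[-1])
        pre.append(a)
        zs.append([phi(ai) for ai in a])
    return zs, pre, matvec(Ws[-1], zs[-1])   # y = W_{L+1} z_L (linear output)

def loss(Ws, x, y):
    # l = (1/2) |y - f(x, W)|^2
    _, _, out = forward(Ws, x)
    return 0.5 * sum((yi - oi) ** 2 for yi, oi in zip(y, out))

def backprop(Ws, x, y):
    """Gradients dl/dW_r = -e_r z_{r-1}^T, with errors propagated from the output downward."""
    zs, pre, out = forward(Ws, x)
    e = [yi - oi for yi, oi in zip(y, out)]          # e_{L+1} = y - f(x, W)
    grads = [None] * len(Ws)
    for k in range(len(Ws) - 1, -1, -1):             # list index k holds W_{k+1}
        grads[k] = [[-ei * zj for zj in zs[k]] for ei in e]
        if k > 0:
            # e_k = phi'(W_k z_{k-1}) * (W_{k+1}^T e_{k+1}), elementwise product
            back = [sum(Ws[k][i][j] * e[i] for i in range(len(e))) for j in range(len(zs[k]))]
            e = [dphi(a) * b for a, b in zip(pre[k - 1], back)]
    return grads

def numeric_grad(Ws, x, y, k, i, j, h=1e-6):
    # Central finite difference of the loss with respect to entry (i, j) of Ws[k]
    old = Ws[k][i][j]
    Ws[k][i][j] = old + h
    lp = loss(Ws, x, y)
    Ws[k][i][j] = old - h
    lm = loss(Ws, x, y)
    Ws[k][i][j] = old
    return (lp - lm) / (2 * h)

rng = random.Random(0)
shapes = [(4, 3), (3, 4), (2, 3)]   # hypothetical sizes: W_1: 3->4, W_2: 4->3, W_3: 3->2
Ws = [[[rng.uniform(-1, 1) for _ in range(c)] for _ in range(r)] for r, c in shapes]
x, y = [0.5, -1.0, 0.3], [0.2, -0.4]
grads = backprop(Ws, x, y)
max_err = max(abs(grads[k][i][j] - numeric_grad(Ws, x, y, k, i, j))
              for k, (r, c) in enumerate(shapes) for i in range(r) for j in range(c))
print(max_err)
```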

77 Unitwise metric (Ollivier; Kurita): $G_{ri} = E[e_{ri}^2\,z_{r-1} z_{r-1}^\top]$ for unit i of layer r; $G_{\text{unit}} = \mathrm{diag}(G_1, \ldots, G_{L+1})$, $G_r = \mathrm{diag}(G_{r1}, \ldots, G_{rn})$.

78 Singular Region R: 1) two units of layer r have equal weight vectors, $w_{r1} = w_{r2}$: $G_{r+1}$ is singular; 2) a weight vector vanishes, $w_r = 0$: $G_{r+1}$ is singular.

79 $\nabla_W l = 0$ in R, but $G^{-1}\nabla l$ stays finite, for both G and $G_{\text{unit}}$.

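80 High Dimensions: random weight vectors are nearly orthogonal, with $w_i \cdot w_j$ of order $1/\sqrt{n}$; $\mathrm{Prob}\{|w_i \cdot w_j| > \varepsilon\} \le 2e^{-n\varepsilon^2/2}$.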
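The $1/\sqrt{n}$ scaling can be illustrated numerically. For independent, uniformly random unit vectors in $\mathbb{R}^n$, $w_i \cdot w_j$ is approximately $N(0, 1/n)$ for large n, so $E|w_i \cdot w_j| \approx \sqrt{2/(\pi n)}$. The sampling scheme (normalized Gaussian vectors), the dimensions, and the pair counts are assumptions of this sketch.

```python
import math
import random

def random_unit_vector(n, rng):
    # Uniformly distributed direction on S^{n-1}: normalize a standard Gaussian vector
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]

def mean_abs_overlap(n, pairs, rng):
    # Average of |w_i . w_j| over independent pairs of random unit vectors in R^n
    total = 0.0
    for _ in range(pairs):
        a, b = random_unit_vector(n, rng), random_unit_vector(n, rng)
        total += abs(sum(p * q for p, q in zip(a, b)))
    return total / pairs

rng = random.Random(0)
m20, m200 = mean_abs_overlap(20, 1000, rng), mean_abs_overlap(200, 1000, rng)
print(m20, m200, math.sqrt(2 / (math.pi * 200)))
```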
