Support Vector and Kernel Machines


1 Support Vector and Kernel Machines Nello Cristianini BIOwulf Technologies ICML 2001
2 A Little History SVMs introduced in COLT-92 by Boser, Guyon, Vapnik. Greatly developed ever since. Initially popularized in the NIPS community, now an important and active field of all Machine Learning research. Special issues of Machine Learning Journal, and Journal of Machine Learning Research. Kernel Machines: large class of learning algorithms, SVMs a particular instance.
3 A Little History Annual workshop at NIPS Centralized website: Textbook (2000): see Now: a large and diverse community: from machine learning, optimization, statistics, neural networks, functional analysis, etc. Successful applications in many fields (bioinformatics, text, handwriting recognition, etc.) Fast expanding field, EVERYBODY WELCOME!
4 Preliminaries Task of this class of algorithms: detect and exploit complex patterns in data (e.g.: by clustering, classifying, ranking, cleaning, etc. the data) Typical problems: how to represent complex patterns; and how to exclude spurious (unstable) patterns (= overfitting) The first is a computational problem; the second a statistical problem.
5 Very Informal Reasoning The class of kernel methods implicitly defines the class of possible patterns by introducing a notion of similarity between data Example: similarity between documents By length By topic By language Choice of similarity ⇒ Choice of relevant features
6 More formal reasoning Kernel methods exploit information about the inner products between data items Many standard algorithms can be rewritten so that they only require inner products between data (inputs) Kernel functions = inner products in some feature space (potentially very complex) If kernel given, no need to specify what features of the data are being used
7 Just in case Inner product between vectors: ⟨x, z⟩ = Σᵢ xᵢ zᵢ Hyperplane: ⟨w, x⟩ + b = 0
8 Overview of the Tutorial Introduce basic concepts with an extended example: the Kernel Perceptron Derive Support Vector Machines Other kernel based algorithms Properties and Limitations of Kernels On Kernel Alignment On Optimizing Kernel Alignment
9 Parts I and II: overview Linear Learning Machines (LLM) Kernel Induced Feature Spaces Generalization Theory Optimization Theory Support Vector Machines (SVM)
10 Modularity IMPORTANT CONCEPT Any kernel-based learning algorithm is composed of two modules: A general purpose learning machine A problem specific kernel function Any kernel-based algorithm can be fitted with any kernel Kernels themselves can be constructed in a modular way Great for software engineering (and for analysis)
11 1 - Linear Learning Machines Simplest case: classification. Decision function is a hyperplane in input space The Perceptron Algorithm (Rosenblatt, 57) Useful to analyze the Perceptron algorithm, before looking at SVMs and Kernel Methods in general
12 Basic Notation Input space: x ∈ X Output space: y ∈ Y = {−1, +1} Hypothesis: h ∈ H Real-valued: f : X → R Training Set: S = {(x₁, y₁), ..., (xᵢ, yᵢ), ...} Test error: ε Dot product: ⟨x, z⟩
13 Perceptron Linear separation of the input space: f(x) = ⟨w, x⟩ + b h(x) = sign(f(x))
14 Perceptron Algorithm Update rule (ignoring threshold): if yᵢ ⟨wₖ, xᵢ⟩ ≤ 0 then wₖ₊₁ ← wₖ + η yᵢ xᵢ, k ← k + 1
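The mistake-driven update rule above can be sketched in a few lines of NumPy (a minimal illustration, not part of the original tutorial; the toy dataset is invented, and the threshold b is updated here rather than ignored):

```python
import numpy as np

def perceptron(X, y, eta=1.0, epochs=100):
    """Rosenblatt's perceptron: cycle through the data and, whenever a
    point is misclassified (y_i * (<w, x_i> + b) <= 0), nudge the
    hyperplane towards that point."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # mistake-driven update
                w += eta * yi * xi
                b += eta * yi
                mistakes += 1
        if mistakes == 0:                # data separated: stop
            break
    return w, b

# Tiny linearly separable toy dataset (hypothetical)
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
preds = np.sign(X @ w + b)
```

Note how the final weight vector is built purely out of added-in training points, which is what makes the dual representation of the next slides possible.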
15 Observations Solution is a linear combination of training points: w = Σᵢ αᵢ yᵢ xᵢ, αᵢ ≥ 0 Only used informative points (mistake driven) The coefficient of a point in the combination reflects its difficulty
16 Observations - 2 Mistake bound: M ≤ (R/γ)² Coefficients are non-negative Possible to rewrite the algorithm using this alternative representation
17 Dual Representation IMPORTANT CONCEPT The decision function can be rewritten as follows: f(x) = ⟨w, x⟩ + b = Σᵢ αᵢ yᵢ ⟨xᵢ, x⟩ + b, with w = Σᵢ αᵢ yᵢ xᵢ
18 Dual Representation And also the update rule can be rewritten as follows: if yᵢ (Σⱼ αⱼ yⱼ ⟨xⱼ, xᵢ⟩ + b) ≤ 0 then αᵢ ← αᵢ + η Note: in dual representation, data appears only inside dot products
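The dual update rule can likewise be sketched in code (a minimal sketch, not from the original slides; the toy data is the same invented example as before). The algorithm never touches the inputs directly, only the Gram matrix of dot products:

```python
import numpy as np

def dual_perceptron(K, y, eta=1.0, epochs=100):
    """Perceptron in dual form: the data enter only through the Gram
    matrix K[i, j] = <x_i, x_j> (or any kernel), and the hypothesis is
    stored as one coefficient alpha_i per training point."""
    m = len(y)
    alpha = np.zeros(m)
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for i in range(m):
            f_i = np.sum(alpha * y * K[:, i]) + b
            if y[i] * f_i <= 0:
                alpha[i] += eta          # the whole update: one counter
                b += eta * y[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
K = X @ X.T                              # linear-kernel Gram matrix
alpha, b = dual_perceptron(K, y)
preds = np.sign((alpha * y) @ K + b)
```

With K = X Xᵀ this reproduces the primal perceptron exactly; swapping in any other Gram matrix changes the feature space without touching a line of the algorithm.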
19 Duality: First Property of SVMs DUALITY is the first feature of Support Vector Machines SVMs are Linear Learning Machines represented in a dual fashion: f(x) = ⟨w, x⟩ + b = Σᵢ αᵢ yᵢ ⟨xᵢ, x⟩ + b Data appear only within dot products (in decision function and in training algorithm)
20 Limitations of LLMs Linear classifiers cannot deal with Non-linearly separable data Noisy data + this formulation only deals with vectorial data
21 Non-Linear Classifiers One solution: creating a net of simple linear classifiers (neurons): a Neural Network (problems: local minima; many parameters; heuristics needed to train; etc.) Other solution: map data into a richer feature space including non-linear features, then use a linear classifier
22 Learning in the Feature Space Map data into a feature space where they are linearly separable: x → φ(x) [figure: points in input space X mapped by φ into feature space F]
23 Problems with Feature Space Working in high dimensional feature spaces solves the problem of expressing complex functions BUT: There is a computational problem (working with very large vectors) And a generalization theory problem (curse of dimensionality)
24 Implicit Mapping to Feature Space We will introduce Kernels: Solve the computational problem of working with many dimensions Can make it possible to use infinite dimensions efficiently in time / space Other advantages, both practical and conceptual
25 Kernel-Induced Feature Spaces In the dual representation, the data points only appear inside dot products: f(x) = Σᵢ αᵢ yᵢ ⟨φ(xᵢ), φ(x)⟩ + b The dimensionality of the space F is not necessarily important. We may not even know the map φ
26 Kernels IMPORTANT CONCEPT A function that returns the value of the dot product between the images of the two arguments: K(x₁, x₂) = ⟨φ(x₁), φ(x₂)⟩ Given a function K, it is possible to verify that it is a kernel
27 Kernels One can use LLMs in a feature space by simply rewriting them in dual representation and replacing dot products with kernels: ⟨x₁, x₂⟩ → K(x₁, x₂) = ⟨φ(x₁), φ(x₂)⟩
28 The Kernel Matrix IMPORTANT CONCEPT (aka the Gram matrix): K = [ K(1,1) K(1,2) K(1,3) ... K(1,m) ; K(2,1) K(2,2) K(2,3) ... K(2,m) ; ... ; K(m,1) K(m,2) K(m,3) ... K(m,m) ]
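Building the Gram matrix for a dataset and a kernel function is mechanical; a minimal sketch (not from the original tutorial; the three-point dataset is invented):

```python
import numpy as np

def gram_matrix(X, kernel):
    """Build the m x m kernel (Gram) matrix K[i, j] = kernel(x_i, x_j)."""
    m = X.shape[0]
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = kernel(X[i], X[j])
    return K

# Hypothetical 3-point dataset with the plain dot-product kernel
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = gram_matrix(X, lambda x, z: x @ z)
```

For the linear kernel this is just X Xᵀ, and the result is symmetric, as the Mercer slides below require.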
29 The Kernel Matrix The central structure in kernel machines Information bottleneck: contains all necessary information for the learning algorithm Fuses information about the data AND the kernel Many interesting properties:
30 Mercer's Theorem The kernel matrix is Symmetric Positive Definite Any symmetric positive definite matrix can be regarded as a kernel matrix, that is as an inner product matrix in some space
31 More Formally: Mercer's Theorem Every (semi-) positive definite, symmetric function is a kernel: i.e. there exists a mapping φ such that it is possible to write: K(x₁, x₂) = ⟨φ(x₁), φ(x₂)⟩ Pos. Def. ⇔ ∫∫ K(x, z) f(x) f(z) dx dz ≥ 0 for all f ∈ L₂
32 Mercer's Theorem Eigenvalue expansion of Mercer's Kernels: K(x₁, x₂) = Σᵢ λᵢ φᵢ(x₁) φᵢ(x₂) That is: the eigenfunctions act as features!
33 Examples of Kernels Simple examples of kernels are: K(x, z) = ⟨x, z⟩ᵈ K(x, z) = e^(−‖x−z‖² / 2σ²)
34 Example: Polynomial Kernels x = (x₁, x₂); z = (z₁, z₂); ⟨x, z⟩² = (x₁z₁ + x₂z₂)² = x₁²z₁² + x₂²z₂² + 2x₁x₂z₁z₂ = ⟨(x₁², x₂², √2 x₁x₂), (z₁², z₂², √2 z₁z₂)⟩ = ⟨φ(x), φ(z)⟩
35 Example: Polynomial Kernels [figure]
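The degree-2 identity on the previous slide is easy to check numerically (a small sketch added for illustration; the two test vectors are arbitrary):

```python
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel <x, z>^2 for 2-d inputs."""
    return (x @ z) ** 2

def phi(x):
    """The explicit feature map from the slide: (x1^2, x2^2, sqrt(2) x1 x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2.0) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
k_implicit = poly2_kernel(x, z)   # kernel trick: stay in R^2
k_explicit = phi(x) @ phi(z)      # map to R^3 first, then dot product
```

Both routes give the same number; the kernel evaluates the R³ inner product without ever constructing the feature vectors, which is the whole computational point.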
36 Example: the two spirals Separated by a hyperplane in feature space (Gaussian kernels)
37 Making Kernels IMPORTANT CONCEPT The set of kernels is closed under some operations. If K, K′ are kernels, then: K + K′ is a kernel cK is a kernel, if c > 0 aK + bK′ is a kernel, for a, b > 0 Etc.: can make complex kernels from simple ones: modularity!
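The closure properties can be spot-checked on Gram matrices: sums and positive scalings of positive semi-definite matrices stay positive semi-definite. A minimal sketch (not from the original slides; the random dataset is invented, and the check is numerical, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

K1 = X @ X.T                                                   # linear kernel
K2 = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2.0)   # Gaussian kernel

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh assumes symmetry)."""
    return np.linalg.eigvalsh(K).min()

# Closure: sum and positive scaling of kernels stay positive semi-definite
combos = [K1 + K2, 3.0 * K1, 2.0 * K1 + 0.5 * K2]
ok = all(min_eigenvalue(K) > -1e-9 for K in combos)
```

The small negative tolerance absorbs floating-point noise around eigenvalues that are exactly zero.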
38 Second Property of SVMs: SVMs are Linear Learning Machines, that Use a dual representation AND Operate in a kernel induced feature space (that is: f(x) = Σᵢ αᵢ yᵢ ⟨φ(xᵢ), φ(x)⟩ + b is a linear function in the feature space implicitly defined by K)
39 Kernels over General Structures Haussler, Watkins, etc.: kernels over sets, over sequences, over trees, etc. Applied in text categorization, bioinformatics, etc.
40 A bad kernel would be a kernel whose kernel matrix is mostly diagonal: all points orthogonal to each other, no clusters, no structure
41 No Free Kernel IMPORTANT CONCEPT If mapping into a space with too many irrelevant features, the kernel matrix becomes diagonal Need some prior knowledge of the target to choose a good kernel
42 Other Kernel-based algorithms Note: other algorithms can use kernels, not just LLMs (e.g. clustering; PCA; etc.). Dual representation often possible (in optimization problems, by the Representer theorem).
43 BREAK
44 The Generalization Problem NEW TOPIC The curse of dimensionality: easy to overfit in high dimensional spaces (= regularities could be found in the training set that are accidental, that is, would not be found again in a test set) The SVM problem is ill posed (finding one hyperplane that separates the data: many such hyperplanes exist) Need a principled way to choose the best possible hyperplane
45 The Generalization Problem Many methods exist to choose a good hyperplane (inductive principles): Bayes, statistical learning theory / PAC, MDL, ... Each can be used; we will focus on a simple case motivated by statistical learning theory (will give the basic SVM)
46 Statistical (Computational) Learning Theory Generalization bounds on the risk of overfitting (in a PAC setting: assumption of i.i.d. data; etc.) Standard bounds from VC theory give upper and lower bounds proportional to the VC dimension The VC dimension of LLMs is proportional to the dimension of the space (can be huge)
47 Assumptions and Definitions distribution D over input space X train and test points drawn randomly (i.i.d.) from D training error of h: fraction of points in S misclassified by h test error of h: probability under D to misclassify a point VC dimension: size of largest subset of X shattered by H (every dichotomy implemented)
48 VC Bounds ε = Õ(VC / m) VC = (number of dimensions of X) + 1 Typically VC ≫ m, so not useful Does not tell us which hyperplane to choose
49 Margin Based Bounds ε = Õ( (R/γ)² / m ), γ = minᵢ yᵢ f(xᵢ) / ‖f‖ Note: also compression bounds exist; and online bounds.
50 Margin Based Bounds IMPORTANT CONCEPT The worst case bound still holds, but if lucky (margin is large) the other bound can be applied and better generalization can be achieved: ε = Õ( (R/γ)² / m ) Best hyperplane: the maximal margin one Margin is large if kernel chosen well
51 Maximal Margin Classifier Minimize the risk of overfitting by choosing the maximal margin hyperplane in feature space Third feature of SVMs: maximize the margin SVMs control capacity by increasing the margin, not by reducing the number of degrees of freedom (dimension free capacity control).
52 Two kinds of margin Functional and geometric margin: functional: γ_f = minᵢ yᵢ f(xᵢ) geometric: γ_g = minᵢ yᵢ f(xᵢ) / ‖w‖
53 Two kinds of margin [figure]
54 Max Margin = Minimal Norm If we fix the functional margin to 1, the geometric margin equals 1/‖w‖ Hence, maximize the margin by minimizing the norm
55 Max Margin = Minimal Norm Distance between the two convex hulls: ⟨w, x⁺⟩ + b = +1 and ⟨w, x⁻⟩ + b = −1 imply ⟨w, (x⁺ − x⁻)⟩ = 2, so ⟨w/‖w‖, (x⁺ − x⁻)⟩ = 2/‖w‖
56 The primal problem IMPORTANT STEP Minimize: ⟨w, w⟩ subject to: yᵢ (⟨w, xᵢ⟩ + b) ≥ 1
57 Optimization Theory The problem of finding the maximal margin hyperplane: constrained optimization (quadratic programming) Use Lagrange theory (or Kuhn-Tucker Theory) Lagrangian: L = ½ ⟨w, w⟩ − Σᵢ αᵢ [yᵢ (⟨w, xᵢ⟩ + b) − 1], αᵢ ≥ 0
58 From Primal to Dual L(w, b) = ½ ⟨w, w⟩ − Σᵢ αᵢ [yᵢ (⟨w, xᵢ⟩ + b) − 1], αᵢ ≥ 0 Differentiate and substitute: ∂L/∂b = 0, ∂L/∂w = 0
59 The Dual Problem IMPORTANT STEP Maximize: W(α) = Σᵢ αᵢ − ½ Σᵢ,ⱼ αᵢ αⱼ yᵢ yⱼ ⟨xᵢ, xⱼ⟩ Subject to: αᵢ ≥ 0, Σᵢ αᵢ yᵢ = 0 The duality again! Can use kernels!
60 Convexity IMPORTANT CONCEPT This is a Quadratic Optimization problem: convex, no local minima (second effect of Mercer's conditions) Solvable in polynomial time (convexity is another fundamental property of SVMs)
61 Kuhn-Tucker Theorem Properties of the solution: Duality: can use kernels KKT conditions: αᵢ [yᵢ (⟨w, xᵢ⟩ + b) − 1] = 0 Sparseness: only the points nearest to the hyperplane (margin = 1) have positive weight: w = Σᵢ αᵢ yᵢ xᵢ They are called support vectors
62 KKT Conditions Imply Sparseness Sparseness: another fundamental property of SVMs
63 Properties of SVMs - Summary ✓ Duality ✓ Kernels ✓ Margin ✓ Convexity ✓ Sparseness
64 Dealing with noise In the case of non-separable data in feature space, the margin distribution can be optimized: ε = Õ( (R² + ‖ξ‖²) / (m γ²) ), with slacks ξᵢ = max(0, γ − yᵢ (⟨w, xᵢ⟩ + b))
65 The Soft-Margin Classifier Minimize: ⟨w, w⟩ + C Σᵢ ξᵢ Or: ⟨w, w⟩ + C Σᵢ ξᵢ² Subject to: yᵢ (⟨w, xᵢ⟩ + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0
66 Slack Variables ε = Õ( (R² + ‖ξ‖²) / (m γ²) ), ξᵢ = max(0, γ − yᵢ (⟨w, xᵢ⟩ + b)) [figure: slack variables]
67 Soft Margin - Dual Lagrangian Box constraints (1-norm slack): maximize W(α) = Σᵢ αᵢ − ½ Σᵢ,ⱼ αᵢ αⱼ yᵢ yⱼ ⟨xᵢ, xⱼ⟩ subject to 0 ≤ αᵢ ≤ C, Σᵢ αᵢ yᵢ = 0 Diagonal (2-norm slack): maximize W(α) = Σᵢ αᵢ − ½ Σᵢ,ⱼ αᵢ αⱼ yᵢ yⱼ ⟨xᵢ, xⱼ⟩ − (1/2C) Σᵢ αᵢ² subject to αᵢ ≥ 0, Σᵢ αᵢ yᵢ = 0
68 The regression case For regression, all the above properties are retained, introducing the ε-insensitive loss: L(xᵢ, yᵢ) = max(0, |yᵢ − ⟨w, xᵢ⟩ − b| − ε)
69 Regression: the ε-tube [figure]
70 Implementation Techniques Maximizing a quadratic function, subject to a linear equality constraint (and inequalities as well): W(α) = Σᵢ αᵢ − ½ Σᵢ,ⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ), αᵢ ≥ 0, Σᵢ αᵢ yᵢ = 0
71 Simple Approximation Initially complex QP packages were used. Stochastic Gradient Ascent (sequentially update 1 weight at a time) gives an excellent approximation in most cases: αᵢ ← αᵢ + (1/K(xᵢ, xᵢ)) (1 − yᵢ Σⱼ αⱼ yⱼ K(xᵢ, xⱼ))
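This one-coordinate-at-a-time ascent can be sketched directly (a minimal kernel-adatron-style sketch, not the original tutorial's code; the toy dataset is invented, and the bias b is dropped for simplicity, which also drops the equality constraint Σᵢ αᵢyᵢ = 0):

```python
import numpy as np

def kernel_adatron(K, y, C=10.0, epochs=500):
    """Coordinate-wise ascent on the dual objective W(alpha): each step
    takes an exact maximizing step in one alpha_i, then clips it back
    into the soft-margin box [0, C]."""
    m = len(y)
    alpha = np.zeros(m)
    for _ in range(epochs):
        for i in range(m):
            # dW/dalpha_i = 1 - y_i * sum_j alpha_j y_j K[i, j]
            grad = 1.0 - y[i] * np.sum(alpha * y * K[i])
            alpha[i] = np.clip(alpha[i] + grad / K[i, i], 0.0, C)
    return alpha

X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha = kernel_adatron(K, y)
margins = y * ((alpha * y) @ K)   # y_i * f(x_i), no bias term
```

Because the dual is concave, this converges; at the optimum every training point ends up with functional margin at least 1, and only the support vectors carry positive αᵢ.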
72 Full Solution: S.M.O. SMO: update two weights simultaneously Realizes gradient ascent without leaving the linear constraint (J. Platt). Online versions exist (Li-Long; Gentile)
73 Other kernelized Algorithms Adatron, nearest neighbour, Fisher discriminant, Bayes classifier, ridge regression, etc. Much work in past years went into designing kernel based algorithms Now: more work on designing good kernels (for any algorithm)
74 On Combining Kernels When is it advantageous to combine kernels? Too many features leads to overfitting also in kernel methods Kernel combination needs to be based on principles Alignment
75 Kernel Alignment IMPORTANT CONCEPT Notion of similarity between kernels: Alignment (= similarity between Gram matrices): A(K₁, K₂) = ⟨K₁, K₂⟩ / √(⟨K₁, K₁⟩ ⟨K₂, K₂⟩)
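The alignment formula uses the Frobenius inner product between Gram matrices and can be sketched as follows (illustration only, not from the original slides; the label vector and the block matrix are invented):

```python
import numpy as np

def alignment(K1, K2):
    """Empirical alignment between two Gram matrices, with
    <K1, K2> = sum_ij K1[i, j] * K2[i, j] (Frobenius inner product)."""
    num = np.sum(K1 * K2)
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den

y = np.array([1.0, 1.0, -1.0, -1.0])
Y = np.outer(y, y)             # the "ideal" kernel yy'
ok_align = alignment(Y, Y)     # any kernel is perfectly aligned with itself
block = np.block([[np.ones((2, 2)), np.zeros((2, 2))],
                  [np.zeros((2, 2)), np.ones((2, 2))]])
partial = alignment(block, Y)  # clustered but not ideal: alignment < 1
```

The block-diagonal kernel "sees" the two classes as clusters but assigns zero (rather than negative) similarity across them, so its alignment with yy′ is strictly between 0 and 1.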
76 Many interpretations As a measure of clustering in data As a correlation coefficient between oracles Basic idea: the ultimate kernel should be YY′, that is, should be given by the labels vector (after all: the target is the only relevant feature!)
77 The ideal kernel YY′ = [matrix with entries yᵢ yⱼ: +1 within a class, −1 across classes]
78 Combining Kernels Alignment is increased by combining kernels that are aligned to the target and not aligned to each other. A(K₁, YY′) = ⟨K₁, YY′⟩ / √(⟨K₁, K₁⟩ ⟨YY′, YY′⟩)
79 Spectral Machines Can (approximately) maximize the alignment of a set of labels to a given kernel By solving this problem: y = argmax_{y ∈ {−1,+1}ᵐ} y′Ky / (y′y) Approximated by the principal eigenvector (thresholded) (see Courant-Hilbert theorem)
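The thresholded-eigenvector approximation can be sketched in a few lines (a minimal illustration, not from the original slides; the two-cluster Gram matrix is invented, with negative cross-cluster entries as one would get after centering):

```python
import numpy as np

def spectral_labels(K):
    """Approximate argmax of y'Ky / (y'y) over y in {-1, +1}^m by
    thresholding the signs of the principal eigenvector of K
    (Courant-Hilbert: the unconstrained maximizer of v'Kv / v'v)."""
    eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
    v = eigvecs[:, -1]                     # principal eigenvector
    return np.sign(v)

# Hypothetical two-cluster Gram matrix: strong within-cluster similarity,
# negative cross-cluster similarity
K = np.array([[ 2.0,  1.8, -1.0, -1.0],
              [ 1.8,  2.0, -1.0, -1.0],
              [-1.0, -1.0,  2.0,  1.8],
              [-1.0, -1.0,  1.8,  2.0]])
labels = spectral_labels(K)
```

The recovered labels are only defined up to a global sign flip, so what matters is the cluster pattern: the first two points get one label and the last two the other.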
80 Courant-Hilbert theorem A: symmetric and positive definite; Principal Eigenvalue / Eigenvector characterized by: λ = max_v (v′Av) / (v′v)
81 Optimizing Kernel Alignment One can either adapt the kernel to the labels or vice versa In the first case: model selection method Second case: clustering / transduction method
82 Applications of SVMs Bioinformatics Machine Vision Text Categorization Handwritten Character Recognition Time series analysis
83 Text Kernels Joachims (bag of words) Latent semantic kernels (ICML 2001) String matching kernels See the KerMIT project
84 Bioinformatics Gene Expression Protein sequences Phylogenetic Information Promoters
85 Conclusions: Much more than just a replacement for neural networks. General and rich class of pattern recognition methods Book on SVMs: www.support-vector.net Kernel machines website
More information