Stat3980: Statistics in Banking and Finance
Statistics Meets Big Data
Dr. Aijun Zhang
Spring 2016@HKBU
Overview
Course Title: STAT3980/MATH4875 Selected Topics in Statistics: Statistics in Banking and Finance
Course Objective: This course aims to introduce senior students to statistical methods and their applications in banking and finance. Real case studies will be discussed. R/Spark/Python programming techniques will be introduced so that students can gain some hands-on experience with data analytics.
Class Schedule: Every Monday, 8:30-11:50am (Early Bird Gets The Worm!)
Assessment
No. | Assessment Method | Weighting | Remarks
1 | Continuous Assessment | 30% | In-class assignments (about 3 times) to help practice the basic concepts.
2 | Mini-project | 30% | Group project (groups of 2~3 students) during the 2nd half of the course. You are expected to work independently on real datasets. Each group will deliver a written report with an oral presentation.
3 | Final Examination | 40% | Final examination to see how far you have achieved the intended learning outcomes, especially in the knowledge domain. You are expected to have a thorough understanding of some important statistical methods and machine learning techniques in banking and finance.
Course Outline
Part I: Statistics Meets Big Data
A. Statistics as Data Science
B. Explorative Data Analysis
C. Basic Statistical Models
D. Machine Learning
E. Distributed Computing
Part II: Banking and Finance Applications
A. Quantitative Risk Management
B. Credit Scoring
C. Credit Risk Modeling
D. Rise of Model Risk Management
E. Other Miscellaneous Topics
Reference Texts
Download a free copy from Gareth's website
HKBU Library online access (3rd edition)
Part I: Statistics Meets Big Data
A. Statistics as Data Science
B. Explorative Data Analysis
C. Basic Statistical Models
D. Machine Learning
E. Distributed Computing
What is Statistics?
Statistics is the science of learning from data, and of the collection, organization, analysis, interpretation, and presentation of data. It also includes planning how data are collected, through the design of surveys and experiments. (See Wikipedia.)
A Brief History
Unlike mathematics, which has a long history, statistics is said to date from around 1749.
The term "statistics" originally designated the systematic collection of demographic and economic data by states. Later it broadened to cover the collection, summary, and analysis of data.
Today, statistics is widely employed in government, business, and all the sciences.
Statistics will show even more of its power as it meets big data.
Keywords in Statistics
[Figure: keywords in statistics]
What do statisticians do?
Job Types (what my stats friends are doing):
Financial Analyst/Quant/Programmer on Wall Street, in banks, hedge funds, etc.
Data Analyst/Statistician/Scientist at Google, Yahoo!, LinkedIn
Consultant/Data Specialist/Analyst at McKinsey, IBM
Academic roles in universities and research institutes
Job Market:
NYT 2009 article: For Today's Graduate, Just One Word: Statistics
"I keep saying that the sexy job in the next 10 years will be statisticians," said Hal Varian, chief economist at Google. "And I'm not kidding."
McKinsey 2011 report: Big data: The next frontier for innovation, competition, and productivity
The United States needs 140,000 to 190,000 more workers with deep analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.
Best Jobs by CareerCast.com
Rank | 2011 | 2012 | 2013 | 2014 | 2015
1 | Software Engineer | Software Engineer | Actuary | Mathematician | Actuary
2 | Mathematician | Actuary | Biomedical Engineer | University Professor | Audiologist
3 | Actuary | HR Manager | Software Engineer | Statistician | Mathematician
4 | Statistician | Dental Hygienist | Audiologist | Actuary | Statistician
5 | Comp. Systems Analyst | Financial Planner | Financial Planner | Audiologist | Biomedical Engineer
6 | Meteorologist | Audiologist | Dental Hygienist | Dental Hygienist | Data Scientist
7 | Biologist | Occupational Therapist | Occupational Therapist | Software Engineer | Dental Hygienist
8 | Historian | Online Ads Manager | Optometrist | Comp. Systems Analyst | Software Engineer
9 | Audiologist | Comp. Systems Analyst | Physical Therapist | Occupational Therapist | Occupational Therapist
10 | Dental Hygienist | Mathematician | Comp. Systems Analyst | Speech Pathologist | Comp. Systems Analyst
Additional entries noted on the slide: Statistician (18), Mathematician (18), Statistician (20)
Top 10 reasons to be a statistician
1. Statisticians are significant.
2. Estimating parameters is easier than dealing with real life.
3. I always wanted to learn the entire Greek alphabet.
4. The probability a statistics major will get a job is > 0.9999.
5. If I flunk out I can always transfer to Engineering.
6. We do it with confidence, frequency, and variability.
7. You never have to be right - only close.
8. We're normal and everyone else is skewed.
9. The regression line looks better than the unemployment line.
10. No one knows what we do so we are always right.
More Statistical Jokes
There are three kinds of lies: lies, damned lies, and statistics.
Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital.
I asked a statistician for her phone number... and she gave me an estimate.
Three statisticians went out hunting and came across a large deer. The first statistician fired, but missed, by a meter to the left. The second statistician fired, but also missed, by a meter to the right. The third statistician didn't fire, but shouted in triumph, "On the average we got it!"
See Dr. Ramseyer's extensive collection of statistical jokes for more.
Statistics vs. Probability
The two topics used to be studied together; however, statistics and probability are two separate disciplines:
Probability deals with predicting the likelihood of future events. Statistics deals with analyzing the frequency of past events.
Probability is primarily a theoretical branch of mathematics, which studies the consequences of mathematical definitions. Statistics has evolved into an independent science, which tries to make sense of observations in the real world.
See Wikipedia for lists of probability topics and of statistics topics.
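Statistics vs. Probability
In plain terms: probability starts from a known distribution of black and white stones in a bucket and asks what will happen when you grab some (for example, how likely you are to draw a white stone or a black stone). Statistics starts from what you actually drew over many grabs and infers the distribution of black and white stones in the bucket.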
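The bucket picture can be made concrete with a short simulation. The following is a minimal Python sketch, not part of the original slides: the function names, the 30/70 bucket composition, and the choice to draw with replacement are all illustrative assumptions. The first function answers the probability question (contents known, what do we expect to draw?), and the second answers the statistics question (draws observed, what are the contents?).

```python
import random


def draw_probability(n_white, n_black):
    """Probability view: bucket contents are known.

    Returns the chance that a single random draw is white.
    """
    return n_white / (n_white + n_black)


def simulate_draws(n_white, n_black, n_draws, seed=0):
    """Draw stones with replacement from a bucket (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    bucket = ["white"] * n_white + ["black"] * n_black
    return [rng.choice(bucket) for _ in range(n_draws)]


def estimate_white_fraction(draws):
    """Statistics view: bucket contents are unknown.

    Estimates the fraction of white stones from the observed draws.
    """
    n_white_seen = sum(1 for stone in draws if stone == "white")
    return n_white_seen / len(draws)


if __name__ == "__main__":
    # Probability: we know the bucket holds 30 white and 70 black stones.
    print("P(white) =", draw_probability(30, 70))

    # Statistics: we only see the draws and infer the composition.
    observed = simulate_draws(30, 70, n_draws=10_000)
    print("Estimated white fraction =", estimate_white_fraction(observed))
```

With more draws the estimate tends to settle closer to the true fraction, which is the sense in which statistics recovers the bucket's distribution from repeated observations.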
Statistics as Data Science
[Figure: Google Trends comparison of the search terms data mining, data science, machine learning, big data]
Statistics is the science of dealing with data, learning from data, and extracting meaning from data.
Data science is more demanding. It lies at the intersection of statistics/mathematics, hacking skills, and substantive expertise; see Drew Conway's Venn diagram for a detailed explanation.
Statistical applications in diverse fields
Statistics is used wherever data exist. Its fields of application are many and very diverse.
John Tukey (1915-2000): "The best thing about being a statistician is that you get to play in everyone's backyard."
Long list of fields of application of statistics: Actuarial science, Agriculture, Bioinformatics, Biostatistics, Business Intelligence, Chemometrics, Clinical Trial, Communication Study, Econometrics, Engineering, Environmetrics, Finance, Genetics, Geostatistics, Hedge Fund, Information Technology, Insurance, Management Science, Manufacturing, Marketing, Medical Statistics, Pharmaceutics, Physics, Politics, Process Control, Psychometrics, Public Health, Quality and Productivity, Reliability, Risk Management, Six Sigma, Social Science, Sports, WWW,...