North Carolina State University
STAT 370: Probability and Statistics for Engineers [Section 002]
Instructor: Hua Zhou
Harrelson Hall 210
10:15AM-11:30AM, Jan 9, 2012

Today
Introduction: What is statistics and what does this course cover?
Course logistics
Q&A

The New York Times (Aug 5, 2009)

What is Statistics?
Statistics, the science of data analysis, is the applied mathematics of the 21st century. This era is characterized by massive data sets, automated measurement, and raw computational power. The statistical analysis of these data sets is the driving force behind internet search, online merchants, computational finance, weather forecasting, bioinformatics, and dozens of other fields. It is the most important and portable subject you will learn in your quantitative curriculum. Surveys of practicing engineers consistently show that one of their foremost academic regrets was not learning enough statistics, and the reasons for this are clear.
Statistics in Engineering
Engineers apply physical and chemical laws and mathematics to design, develop, test, and supervise various products and services.
Engineers perform tests to learn how things behave under stress, and at what point they might fail.
As engineers perform experiments, they collect data that can be used to explain relationships better and to reveal information about the quality of the products and services they provide.

What is the importance of statistics in the field of engineering?
1. Design of Experiments (DOE) uses statistical techniques to test and construct models of engineering components and systems.
2. Quality control and process control use statistics as a tool to manage conformance to specifications of manufacturing processes and their products.
3. Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.
4. Reliability engineering uses statistics to measure the ability of a system to perform its intended function (for its intended time) and has tools for improving performance.
5. Probabilistic design uses probability in product and system design.
Source: http://en.wikipedia.org/wiki/engineering_statistics

Some Interesting Video Clips about Statistics
Joy of Statistics: http://www.open.ac.uk/openlearn/whats-on/the-joy-stats
TED (Ideas worth spreading), about statistics: http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html

Course Objectives
The main objective of this class is to equip you with basic tools for (1) making sense of real data, (2) designing experiments, and (3) preparing for higher-level classes in machine learning, stochastic processes, and computational statistics.
Master essential statistical terminology
Produce numerical and graphical summaries of data
Plan and analyze simple factorial designs
Perform basic calculations in simple linear regression
Calculate probabilities using basic probability distributions
Make inferences using basic statistics
Graphical Summary: Where are the Cancers?
What do you observe and why?
Many of the shaded counties are in the Great Plains and relatively few are near the coasts (older people in these counties?)
Most shaded counties are in rural areas (worse health centers? Less healthy diets? More exposure to harmful chemicals?)
What do you expect the map of counties with the lowest cancer death rates to look like?

Graphical Summary: Where are the Cancers?
What do you observe and why?
Counties with small population sizes are more likely to be highlighted in both maps!
Consider a county with just 100 residents. If there was only 1 death, the rate (0.01) is extremely high. If there were no deaths, the rate (0.00) is the lowest possible.
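The small-county effect can be checked with a short simulation. The sketch below is only an illustration under stated assumptions: every county is given the same hypothetical true death rate (TRUE_RATE = 0.005), and the county sizes (100 and 10,000 residents) and the number of counties (200 of each) are made up. None of these values come from the maps.

```python
import random

def simulate_rates(population, n_counties, true_rate, rng):
    """Observed death rate in each of n_counties counties of equal size."""
    rates = []
    for _ in range(n_counties):
        # Each resident dies independently with probability true_rate.
        deaths = sum(1 for _ in range(population) if rng.random() < true_rate)
        rates.append(deaths / population)
    return rates

rng = random.Random(370)   # fixed seed so the run is repeatable
TRUE_RATE = 0.005          # hypothetical rate, identical in every county

small = simulate_rates(100, 200, TRUE_RATE, rng)      # 100 residents each
large = simulate_rates(10_000, 200, TRUE_RATE, rng)   # 10,000 residents each

for label, rates in (("small counties", small), ("large counties", large)):
    print(f"{label}: lowest rate = {min(rates):.4f}, highest rate = {max(rates):.4f}")
```

Even though every simulated county has the same underlying rate, the small counties should supply both the lowest and the highest observed rates, which is the pattern the two maps show.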
Numerical Summary of Data: Average/Mean
Question: Which department at UNC, Chapel Hill produces the students who earn the most on average 10 years after they got their degrees?
Survey a certain number of graduates from UNC. A lot of departments are surveyed.
Answer: Geography!? (Michael Jordan)
The median is robust to outliers and is a better summary in this case.

Simple Linear Regression: Do taller people earn more?
Is there a trend in earnings with respect to height?
How do we find the best-fit line for this data?
Is this line model good?
How do we interpret the line model?
Is this model adequate?

Factorial Data Analysis: Diet Coke and Mentos
This is from a previous course project for ST370.
It is interesting to know how the volume loss depends on the number of Mentos applied and the initial volume.
Conclusion: Percentage of volume loss depends on initial volume but not on the number of Mentos applied.

Effect of number of Mentos and initial Diet Coke volume on % soda volume loss

                  % volume lost
Initial volume    4 Mentos    8 Mentos
591 ml            0.565       0.57
591 ml            0.526       0.577
591 ml            0.54        0.558
1000 ml           0.561       0.587
1000 ml           0.532       0.539
1000 ml           0.519       0.559
2000 ml           0.475       0.537
2000 ml           0.565       0.615
2000 ml           0.537       0.5
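The mean-versus-median point from the UNC earnings example can be illustrated with a few lines of Python. The earnings figures below are entirely hypothetical (they are not survey data); they only show how a single extreme earner moves the mean while leaving the median almost unchanged.

```python
from statistics import mean, median

# Hypothetical annual earnings (in $1000s) for nine graduates of one department.
typical = [38, 41, 43, 45, 46, 48, 50, 52, 55]
superstar = 30_000  # one extreme earner, also hypothetical

earnings = typical + [superstar]

print("mean without outlier:  ", round(mean(typical), 1))
print("mean with outlier:     ", round(mean(earnings), 1))
print("median without outlier:", median(typical))
print("median with outlier:   ", median(earnings))
```

With these made-up numbers the mean jumps from roughly 46 to roughly 3042 once the extreme earner is included, while the median moves only from 46 to 47.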
Calculating Probability: Roulette
Players bet $1 that the ball will land in a red (or black) slot and win $1 if it does; otherwise they lose the $1 bet.
Let $X_i$ be the winnings on the $i$-th day.
(a) What are the distribution, mean, and variance of $X_i$?
(b) Suppose you play once per day for 365 days. What does the CLT say about your average winnings?
(c) What is the probability that your average winnings over 365 days are positive?

Solution to the Roulette Problem
(a) $E(X_i) = -0.0526$, $\mathrm{Var}(X_i) = 0.9972$
(b) The average payoff $\bar{X}$ after 365 days is approximately normal with mean $-0.0526$ and standard deviation $0.0523$
(c) $P(\bar{X} > 0) = 0.1573$

Some more examples
[product image] claims (at least it used to claim) that it contains 1000 chips. Is this true? (Inference)
What is the chance that a poker gambler gets a Royal Flush (AKQJT of the same suit) at Atlantic City? (Probability)
Among a group of randomly chosen people, how likely is it for two of them to have the same birthday? (Probability)
What is the relationship between Income and Years of Education? (Linear Regression)
Design your own experiment, collect data, analyze data and draw conclusions. (Design of Experiment)

Course Requirement
Calculus
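The numbers in the roulette solution can be reproduced with a short calculation. This is a sketch under one assumption that the slide does not state explicitly: an American wheel with 38 slots, 18 of them red, so that a $1 bet on red wins with probability 18/38. That assumption matches the stated mean of -0.0526. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: American wheel with 38 slots, 18 of them red.
p_win = 18 / 38
outcomes = {+1: p_win, -1: 1 - p_win}   # winnings -> probability

# (a) mean and variance of one day's winnings
mean_x = sum(x * p for x, p in outcomes.items())
var_x = sum((x - mean_x) ** 2 * p for x, p in outcomes.items())

# (b) CLT: the 365-day average is approximately Normal(mean_x, sd_avg)
n_days = 365
sd_avg = sqrt(var_x / n_days)

# (c) probability that the average winnings are positive
p_positive = 1 - NormalDist(mean_x, sd_avg).cdf(0)

print(f"E(X) = {mean_x:.4f}, Var(X) = {var_x:.4f}")
print(f"SD of the average = {sd_avg:.4f}")
print(f"P(average > 0) = {p_positive:.4f}")
```

The unrounded calculation gives a probability of about 0.157, in line with the 0.1573 on the slide; the small difference in the last digit appears to come from rounding the mean and standard deviation before computing the normal probability.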
Course Materials
PowerPoint presentation + blackboard illustration
Keep focused in class
Class Web Page: http://www4.stat.ncsu.edu/~hzhou3/courses/st370-2012-spring/
WebAssign@NCSU (log in using unity ID): http://webassign.ncsu.edu
Make sure you have the correct email address on file
Software [StatCrunch@NCSU]: http://statcrunch.stat.ncsu.edu

Optional Textbook
ISBN: 9780470910610
NCSU Bookstore: $122.50
Amazon: $143.33
We will cover selected topics in Chapters 1-9, 11, 13, and 14

Course grade (500pts)
Homework     100pts   Best 10 out of 12 assignments
Midterm I    100pts   Wed, Feb 29 (in class)
Midterm II   100pts   Mon, Apr 2 (in class)
Quizzes      100pts   Best 5 out of >6 quizzes
Final Exam   100pts   Monday, May 7 (8:00-11:00AM)

Final Letter Grade
98-100% A+, 92-97% A, 90-91% A-
88-89% B+, 82-87% B, 80-81% B-
78-79% C+, 72-77% C, 70-71% C-
68-69% D+, 62-67% D, 60-61% D-
0-59% F
If you are taking the course sat/unsat, you need at least 60% to pass.
Homework and Quizzes
Twelve homework assignments
Assigned and due online @ WebAssign
No late homework will be accepted
Only your 10 best homework scores count; the lowest 2 homework scores will be dropped
The semester homework score will count as 20% of the course grade
At least five in-class quizzes will be given
Quiz scores will count as 20% of the course grade

Exam Policy
Midterms: Feb 29, Apr 2 (in class, closed book, closed notes)
The final exam (8-11am, May 7) is cumulative
A review session will be held prior to each exam
All exams are required, with no make-ups permitted. [If you have a conflict with another subject, please notify me ASAP, no later than 2 weeks before the test.]

Strategy to Succeed in this Class
Focus in class: preview and review lecture notes, stay active and involved in class, take notes, answer questions
Class attendance is expected and will influence your grade through quizzes (20% of the grade)
Ask questions during class (especially if you cannot see, read, hear, or understand anything) [Not the place to be shy!]
Start homework early
Make effective use of the office hours
Do not count too heavily on the final to pull up your grade; the two midterms each have the same weight as the final and may be easier because they are not cumulative

Classroom Policies
Please come on time and do not start packing up before the class is over
Please turn off cell phones
Please refrain from conversation with your neighbor during the class, except when you are instructed to do so during in-class activities
Academic Integrity
You may work together on homework assignments, but simply giving or receiving answers to or from another student is cheating
Copying from another student's paper is not allowed
Using unauthorized materials during an exam is not allowed
Falsifying data is not allowed