Math 351: Applied Statistics

Michael W. Trosset

The following information is for Fall 2005:

General Description
This course introduces the basic concepts of statistical inference through a careful study of several important procedures. Topics include 1- and 2-sample location problems, the one-way analysis of variance, and simple linear regression. Most assignments involve applying probability models and/or statistical methods to practical situations and/or actual data sets.

Who Should Take This Course?
As reflected by the large number of introductory statistics courses at William & Mary, there are a great many different ways to begin the study of statistics. The best way to have a positive experience with statistics is to take a course that provides the kind of experience that you want to have.

The Department of Mathematics offers three introductory statistics courses. Math 106 emphasizes quantitative reasoning skills and statistical literacy. It should make you a more critical consumer of the quantitative information that you encounter in newspapers, magazines, etc.; however, it is not the purpose of Math 106 to introduce you to a variety of methods for analyzing experimental data. Math 452 is devoted to the mathematical theory of statistical inference. It derives certain statistical methods and explains why they represent optimal practice; however, it is not the purpose of Math 452 to teach you how to apply these methods to actual scientific problems.

Like most introductory statistics courses at William & Mary, Math 351 emphasizes using statistical methods to analyze data. Such "methods" courses come in a variety of flavors. Most describe recipes for analyzing data and use a statistical software package in which these recipes have been implemented. Math 351 differs from typical methods courses in the following respects:

The general sense that I have received from conversations with former Math 351 students is that it is a fairly challenging course. If you're taking a difficult course load this semester and looking for an easy way to fulfill a statistics requirement, then perhaps you'll be happier in another course.

In a nutshell: Students in the empirical sciences collect and analyze data, often using computer software that they don't understand. Math 351 was designed for students who really want to understand what they're doing when they perform such analyses. If you're not certain what you want, then I suggest examining the text (see below). If you like it, then you'll probably like Math 351; if you hate it, then you'll probably be happier in another course.

Prerequisites
No previous knowledge of probability is assumed; Math 351 is recommended for students who wish to take a single, self-contained semester of statistics that emphasizes analyzing data. We will use several basic concepts from calculus; hence, Math 351 has a prerequisite of Math 112.

It is worth noting that statistics is an intrinsically mathematical discipline. The better one's mathematical preparation, the more one can learn about statistics. This does not mean, however, that one must already know a great deal of mathematics in order to begin to understand many of the fundamental concepts of statistics. (This is indeed fortunate, as many consumers of statistical methods do not have strong math backgrounds.) For this course, some familiarity with sets, functions, and limits is more than adequate preparation.

Students interested in the mathematical theory of statistics should follow this course with the more theoretical Math 401 and Math 452.

Basic Information
Math 351 will meet on Monday-Wednesday-Friday, from 1:00 to 1:50 p.m., in Room 301 of Jones Hall. The final exam is scheduled for Thursday, December 8, from 1:30 to 4:30 p.m.

Enrollment is currently capped at 35 students. Please register early! If the course fills before you can register, then please contact me and I will put you on a waiting list.

Tentative Office Hours
Please come to see me! My office is in Jones Hall, Room 127. My office hours (when I'll definitely be in my office) are Mondays and Wednesdays, noon-12:50 p.m. and 2:00-2:50 p.m. Typically, I'll also be around on Monday, Wednesday, and Friday afternoons after 3:00 p.m. Please don't worry if your schedule appears to be incompatible with mine: just let me know that you want to meet with me (tell me before/after class, send email to trosset@math.wm.edu, or telephone 1-2040), and we'll schedule an appointment at a mutually convenient time.

Attendance
Class attendance is not formally required, but it is strongly encouraged. Absence from class never excuses ignorance of supplementary material presented, or announcements made, by the instructor. In class you are expected to behave appropriately, e.g., please refrain from conversing with other students while the instructor is lecturing.

Text
The primary text is the current draft of the book, "An Introduction to Statistical Inference and Its Applications," that I am presently writing. Here is a PDF file that contains the current version and here is the PostScript file from which the PDF file was created (by the ps2pdf utility).

If you want/need to install software that will allow you to view and print PostScript files, then I suggest AFPL Ghostscript 8.51 and GSview 4.7. This software is available for virtually all popular platforms. For my laptop, which runs the Windows XP operating system, I downloaded gs851w32.exe and gs47w32.exe. These programs install Ghostscript and GSview; after running them, you should be able to view and print PostScript files.

I recommend saving the text electronically, then printing several chapters at a time, punching holes in the pages, and storing them in a loose-leaf notebook.

Some of the data sets that appear in the text are available here: Exercise 12.3.3, Exercise 12.3.4, Exercise 12.3.5, Exercise 13.5.1.

Here is data collected by former Math 351 students for the penny-spinning experiment (Exercise 1.5.1).

There are some obvious advantages and disadvantages to using an unpublished book manuscript as the text for Math 351. The most compelling advantage is that this book is customized for this course. I began writing it because the textbook that I had been using went out of print, and because it was my professional judgment that the best way to teach my students was to provide them with precisely the information that I wanted them to have. A second advantage is that it's free! However, the current draft lacks several amenities of a published text, e.g., an index. Life is full of tradeoffs...

Several students have expressed an interest in examining other introductory statistics texts. There are a great many in the stacks at Swem Library, but not the two that I consider most valuable. I have placed personal copies of the following (classic, but out-of-print) books at Swem Library:

Finally, I have quite a collection of introductory statistics textbooks in my office, including old editions and duplicate copies. Most of these were complimentary copies, sent to me by publishers who hoped that I would adopt their text. If you drop by my office and ask nicely, I may give one to you. (Offer good while supplies last.)

Computer Software
We will make selective use of the statistical programming language R. R is free, open-source software that can be downloaded in compiled or source-code form. It runs on a variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and MacOS. The primary web site for R is The R Project for Statistical Computing. To download software, documentation, etc., efficiently, you should use one of the nearby CRAN (Comprehensive R Archive Network) mirror sites, e.g., Statlib at CMU.

Information about R is included in the text, particularly Appendix R. Here are some additional instructions for entering data into R.
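As a rough illustration of what entering data into R can look like, here is a minimal sketch that uses only standard R functions (c, scan, and read.table). It is not taken from the text or from Appendix R; the numbers and the names x, y, and d are made up for the example.

```r
# Hypothetical measurements, typed directly into a vector with c()
x <- c(4.2, 5.1, 3.8, 6.0, 4.9)

# The same values read from whitespace-separated text with scan();
# scan(file = "...") reads from a file in the same way
y <- scan(text = "4.2 5.1 3.8 6.0 4.9")

# A small two-column table read with read.table();
# header = TRUE takes the column names from the first line
d <- read.table(text = "group value
A 4.2
A 5.1
B 3.8
B 6.0", header = TRUE)

# Once the data are in R, summaries are one function call away
mean(x)
tapply(d$value, d$group, mean)
```

For a handful of numbers, c() is usually the quickest; for larger data sets, it is generally easier to keep the data in a file and read it with scan or read.table.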

We will rely primarily on functions that are already part of R. On certain occasions, however, it will be convenient to use functions that I have customized to complement the text. These functions are available here: Bivariate Normal Functions, Termite Foraging Functions. They can be read into R using the source function, described in Appendix R.
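To give a sense of how source works, here is a minimal sketch. The file contents and the function name sample.range are hypothetical stand-ins, not the actual course functions; the sketch writes a one-line script to a temporary file so that it can be run as is.

```r
# Write a stand-in script to a temporary file; in practice this would be
# a file of functions that you have already downloaded and saved.
path <- tempfile(fileext = ".R")
writeLines("sample.range <- function(x) max(x) - min(x)", path)

# source() evaluates the commands in the file, so any functions it
# defines become available in the current R session
source(path)

# The hypothetical function can now be called like any other R function
sample.range(c(3, 8, 5))
```

In actual use, you would replace path with the location of the saved file, e.g., source("myfile.R") if the (hypothetical) file myfile.R is in R's working directory.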

Syllabus
I will follow the text, so the table of contents serves as a syllabus.

Grades
For each student, a weighted course average will be calculated as follows:

Homework/Test Solutions
Here are links to PDF files that contain homework solutions: Homework 1, Homework 2, Homework 3, Homework 4, Homework 5, Homework 6, Homework 7, Homework 8, Homework 9, Homework 10, Homework 11, Homework 12, Homework 13.
Here are links to PDF files that contain test solutions: Test 1, Test 2.
(Note: Hand-drawn figures are not included in these files and will be distributed by other means.)