Math 352: Data Analysis

Michael W. Trosset

The following information is for Spring 2006:

General Description
This course is a sequel to Math 351 or its equivalent. In particular, a modest degree of familiarity with the statistical computing language R will be helpful. Math 352 continues the study of applied statistics, introducing statistical methods in the context of case studies. Thus, instead of organizing the material by method and illustrating each method with simple examples (as in 351), the material will be organized by application. Each case study will be holistic: we will discuss experimental design, data collection, data management, exploratory and inferential analyses, and presentation of results. This course was designed especially for students who are involved in research projects that entail collecting and analyzing experimental data. When appropriate, some of the case studies may be derived from such projects.

Prerequisites
Math 351, or its equivalent. I am especially interested in recruiting students who are engaged in research projects. Ideally, I would like the class to comprise roughly equal numbers of student "scientists" and student "statisticians". Of course, some students may be both!

In order to provide the quality of experience that I envision, I am obliged to limit enrollment. Enrollment is by permission of the instructor, i.e., me. If you are interested in taking Math 352, then please send me an email message that describes your interest/background in statistics.

Basic Information
Math 352 will meet on Tuesdays and Thursdays, from 2:00 to 3:20 p.m., in Room 302 of Jones Hall.

Tentative Office Hours
Please come to see me! My office is in Jones Hall, Room 127. My office hours are (tentatively) 11:15 a.m. to 1:45 p.m. on Tuesdays and Thursdays. If these hours are bad for you, then please contact me (tell me before/after class, send email to trosset@math.wm.edu, or telephone 1-2040), and we'll schedule an appointment at a mutually convenient time.

Attendance
This course will contain many elements of a seminar. Although I will lecture frequently, there will be considerable class discussion, as well as student presentations. It is likely that we will cover material that is not in the text. For all of these reasons, attendance is required. If you are unable to attend class, then please contact me at your earliest convenience.

Text and Syllabus
The primary text is the second edition of The Statistical Sleuth: A Course in Methods of Data Analysis, by F.L. Ramsey and D.W. Schafer, published by Duxbury (ISBN 0-534-38670-9). Please note the accompanying web page.

I will use The Statistical Sleuth to structure the course, but I may replace case studies and/or material in it with student research projects and material directly related to them. Roughly speaking, Chapters 1-8 of The Statistical Sleuth cover material studied in Math 351, albeit in a very different way. Please prepare for the start of the semester by reading Chapter 1.

On Thursday, January 19, we will meet for the first time and get acquainted. I will describe the course in greater detail. Please be prepared to introduce yourself and describe your interests to the class. I will be especially interested in hearing a few words about any research projects in which you may be involved or any particular methods in which you are interested.

To get up to speed, we will begin by covering Chapters 7-8. These chapters discuss simple linear regression, the last topic covered in Math 351, and hence provide a natural transition from 351 to 352. Please note that I will be attending a conference in California on January 24 & 26, when Prof. Eva Czabarka will cover Chapter 7. I expect to cover Chapter 8 on January 31 and February 2.

After we cover Chapters 7-8, we will dedicate several meetings to student presentations in which (some of you) describe your interests in greater detail. In particular, I will ask anyone who is involved in a research project to give a presentation that describes it. What we cover subsequently will depend on your interests.

I will post homework assignments here:

  1. Work Computational Exercise 16 in Chapter 1 using R. This assignment is due Thursday, February 2.

  2. Do one (whichever one appeals to you) of the following Data Problems in Chapter 7:
    • 27. Big Bang II
    • 28. Numbers of Stories and Building Height
    • 29. Male Displays
    • 30. Brain Activity in Violin and String Players
    This assignment is due Thursday, February 16.

Data
The data sets analyzed in The Statistical Sleuth are included on an accompanying CD-ROM. Here are instructions for reading Statistical Sleuth data sets into R.
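As a rough illustration of what reading a data set into R can look like, here is a minimal sketch. It assumes the data have been saved as a comma-separated text file; the file, the column names (Group, Score), and the values are made up for this example and are not taken from The Statistical Sleuth or from the instructions mentioned above.

```r
# Hypothetical example: write a tiny CSV file to a temporary location,
# then read it back. Column names and values are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Group,Score",
             "A,1.5",
             "A,2.0",
             "B,3.5",
             "B,4.0"), tmp)

# read.csv() returns a data frame; the first line is treated as a header
# by default.
toy <- read.csv(tmp)

# Inspect the column types, then compute the mean Score within each Group.
str(toy)
group_means <- tapply(toy$Score, toy$Group, mean)
print(group_means)
```

For a real data set, the temporary file would be replaced by the path to the file on your own computer.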

I will also endeavor to post all of the data sets that we analyze to this web page.

Computer Software
We will use the statistical programming language R. If you took Math 351 recently, then you have already installed R on your computer and have some experience using it. Please note, however, that we will tend to use R somewhat differently than we did in Math 351: in 352, we will tend to rely on high-level R functions that perform entire analyses.
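To give a sense of what a high-level function that performs an entire analysis can look like, here is a small sketch using lm(), which fits a simple linear regression in one call. It uses cars, a data set that ships with R, rather than any course data. The by-hand computation at the end is included only for contrast and is an assumption about what a lower-level approach might look like.

```r
# cars ships with R: stopping distance (dist) recorded at various speeds.
fit <- lm(dist ~ speed, data = cars)

# One call each for the usual pieces of a regression analysis.
print(summary(fit))   # coefficient table, residual standard error, R-squared
print(confint(fit))   # 95% confidence intervals for intercept and slope
print(anova(fit))     # analysis-of-variance table for the regression

# For contrast, the least-squares estimates computed from basic summaries.
x <- cars$speed
y <- cars$dist
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
print(c(intercept = intercept, slope = slope))
```

The two approaches give the same estimates. The high-level call also returns standard errors, tests, and diagnostics without further work.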

If you have not already installed R for another course or project, then you should do so immediately. R is free, Open Source software that can be downloaded in compiled or source code form. It runs on a variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and MacOS. The primary web site for R is The R Project for Statistical Computing. To download software, documentation, etc., efficiently, you should use one of the nearby CRAN (Comprehensive R Archive Network) mirror sites, e.g., Statlib at CMU.

Some basic information about R is included in Appendix R of the Math 351 text. Your installation includes manuals and on-line help. I will post additional information about the functions that we use on this page.

Grades
I will grade this course as though it were a seminar, so there will be no tests and no final exam. There will be lots of homework and a project. To ensure that everyone stays on the same page, I will call upon students to answer questions in class and/or administer short quizzes, e.g., "What is a Bernoulli trial?" Your grade will be based on homework (60%), a project (20%), your answers to the aforementioned questions (10%), and your general level of involvement in class discussion (10%).

Additional Resources
This semester, Professor Barbara Bailey is teaching Math 4830: Applied Statistics at the University of Colorado at Denver, using The Statistical Sleuth and R. Although that course has slightly different objectives than ours, you may find her web page interesting and/or useful.