# Design, Conduct and Analysis: Division D

1. VI. Analysis follows design
1. Introduction
1. i. It is not always clear which analysis should be carried out on the data. A key consideration is how the randomization was done, since randomization provides the basis for tests of significance and other inferential procedures.
2. Rule of Thumb
1. i. The analysis should follow design.
3. Illustration
1. i. Consider the following two designs for assessing the effects of pollution and exercise on asthmatic subjects. In Design A, 40 asthmatic subjects are assigned randomly to one of the 4 treatment combinations. In Design B, each of 10 asthmatic subjects receives all 4 treatments in random order; thus each subject constitutes a complete randomized block. Both designs generate 40 observations, but the analyses are very different. Design A is a completely randomized design with the four treatment combinations randomly assigned to subjects; the randomization is carried out so that equal numbers of subjects are assigned to the four treatment combinations.
2. ii. Design B is a type of repeated measures design, with all treatment combinations assessed within each subject, so its analysis must treat subjects as blocks.
4. The basis of the Rule
1. i. The sampling strategy used to generate the data determines the appropriate analysis, tying the two together tightly.
5. Discussion and Extensions
1. i. The difference between the two analyses is most clearly seen in the degrees of freedom for the error terms. In Design A, the error term has 36 degrees of freedom, based on 9 degrees of freedom for each of the 4 cells. It is a between-subject error term. In Design B, 9 of those 36 degrees of freedom are subtracted to account for between-subject variability, leaving 27 degrees of freedom for the error term. This is a within-subject error term. Ordinarily, it will be smaller than the error term for Design A, so there will be greater power to detect a treatment effect.
2. ii. Violations of the rule usually affect the error term, but they do not necessarily affect the estimate. For example, analyzing paired data as two independent data sets still provides a valid estimate of the difference but an invalid estimate of the precision of that estimate.
3. iii. There are subtle issues that often go unrecognized. Strictly speaking, it is incorrect to analyze a two-factor design with one treatment factor and one classification factor (for example, high school education) as a factorial design, since education is not randomly allocated to the subjects. Such an analysis may be valid but will require some of the hand-waving discussed in Rule I and Rule II. Technically, the treatments are nested within education.
4. iv. Just as sample size calculations should be based on the proposed analysis (Rule II), the statistical analysis should follow the design. These rules indicate the tight linkage between analysis and design; both force the investigator to look forward and backward.
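
The degrees-of-freedom accounting and the paired-data example in the discussion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the `before`/`after` values are invented placeholders, not data from any study. The sketch shows that Design A and Design B yield 36 and 27 error degrees of freedom, and that ignoring the pairing leaves the estimated difference unchanged while changing its standard error.

```python
from math import sqrt
from statistics import mean, variance


def error_df_completely_randomized(n_subjects, n_treatments):
    """Error df when each subject receives one treatment (Design A)."""
    return n_subjects - n_treatments


def error_df_randomized_block(n_subjects, n_treatments):
    """Error df when every subject receives all treatments (Design B)."""
    total_df = n_subjects * n_treatments - 1
    # Remove the treatment df and the between-subject (block) df.
    return total_df - (n_treatments - 1) - (n_subjects - 1)


def paired_vs_independent(first, second):
    """Return (estimate, SE assuming independence, SE respecting the pairing)."""
    n = len(first)
    diffs = [b - a for a, b in zip(first, second)]
    estimate = mean(second) - mean(first)  # identical to mean(diffs)
    se_independent = sqrt(variance(first) / n + variance(second) / n)
    se_paired = sqrt(variance(diffs) / n)
    return estimate, se_independent, se_paired


print(error_df_completely_randomized(40, 4))  # Design A: 36
print(error_df_randomized_block(10, 4))       # Design B: 27

# PLACEHOLDER measurements on the same five subjects, invented for illustration.
before = [10, 12, 14, 16, 18]
after = [11, 14, 15, 18, 20]
print(paired_vs_independent(before, after))
```

With these placeholder values the two standard errors differ markedly because the measurements within a subject are strongly correlated; with real data the size of the gap depends on that correlation.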

1. Plan to graph the results of an analysis
1. Introduction
1. i. As will be discussed, statistical packages usually do a poor job of presenting the results of an analysis graphically.
2. Rule of Thumb
1. i. For every analysis, there is an appropriate graphical display.
3. Illustration
1. i. Nitrogen dioxide (NO2) is an automobile emission pollutant. Sherwin and Layfield (1976) studied protein leakage in the lungs of mice exposed to NO2 at 0.5 parts per million (ppm) for 10, 12, and 14 days. Half of a total group of 38 animals was exposed to the NO2; the other half served as controls. This is a two-factor factorial design with one factor (Pollutant) at two levels and the other factor (Days) at three levels.
2. ii. Since the controls are intended to adjust for day-to-day variation in response, it is appropriate to subtract the control mean from the treatment mean on each day. These differences provide the basis for the test of interaction between treatment and days.
4. The basis of the Rule
1. i. Every statistical analysis involves estimates, relationships among the estimates, the precision of these estimates, and possibly covariates. These features can all be displayed pictorially.
5. Discussion and Extensions
1. i. For treatment comparisons, use interaction plots rather than bar graphs. A row in an analysis of variance table with, say, k-1 degrees of freedom involves a comparison of k quantities. These quantities can be meaningfully graphed to illustrate the effects and their significance. Factorial designs are particularly easy to graph. Interactions involve some kind of non-parallelism, and this can be illustrated nicely in a variety of ways.
2. ii. This rule illustrates the importance of thinking graphically when interpreting the results of a study.
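
As a rough sketch of the quantities such a display would show, the Python below computes the treatment-minus-control difference and its standard error for each exposure day, then draws a crude text plot. The function names and all numeric values are assumptions made for illustration; the numbers are placeholders and are NOT the Sherwin and Layfield measurements. If the differences line up at about the same position on every day, the treatment and control profiles are roughly parallel, which is what the absence of interaction looks like.

```python
from math import sqrt
from statistics import mean, variance


def difference_by_day(exposed, control):
    """For each day, return (exposed mean - control mean, standard error)."""
    summary = {}
    for day in sorted(exposed):
        e, c = exposed[day], control[day]
        diff = mean(e) - mean(c)
        se = sqrt(variance(e) / len(e) + variance(c) / len(c))
        summary[day] = (diff, se)
    return summary


def text_plot(summary, units_per_char=0.5, half_width=20):
    """Render each difference as '*' with a +/- 1 SE bar; '|' marks zero."""
    lines = []
    last = 2 * half_width
    for day, (diff, se) in summary.items():
        row = [" "] * (last + 1)
        row[half_width] = "|"  # zero line: no treatment effect
        lo = round((diff - se) / units_per_char) + half_width
        hi = round((diff + se) / units_per_char) + half_width
        for i in range(max(lo, 0), min(hi, last) + 1):
            row[i] = "-"
        mid = round(diff / units_per_char) + half_width
        if 0 <= mid <= last:
            row[mid] = "*"
        lines.append(f"day {day:>2} " + "".join(row))
    return lines


# PLACEHOLDER responses keyed by exposure day, invented for illustration only.
exposed = {10: [5.1, 6.0, 5.5], 12: [6.8, 7.4, 7.1], 14: [8.9, 9.6, 9.1]}
control = {10: [4.0, 4.6, 4.3], 12: [4.1, 4.9, 4.4], 14: [4.2, 4.8, 4.5]}

for line in text_plot(difference_by_day(exposed, control)):
    print(line)
```

The same per-day differences and standard errors could be handed to any plotting library; the text rendering is used here only to keep the sketch free of dependencies.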