Description of the procedure for deriving the decomposition/skeleton analysis-of-variance table

(Brien and Bailey, 2006; Brien and Bailey, 2009; Bailey and Brien, 2016)

The following diagram illustrates the procedure for deriving the decomposition and skeleton analysis-of-variance tables. These tables are useful in assessing the properties of a proposed design, irrespective of whether an analysis of variance is to be used to analyse the data. A description of the procedure follows the diagram. The first two steps of the procedure, which are the same as those for formulating an allocation-based mixed model for an experiment, amount to establishing the factor-allocation description for a design.

The function designAnatomy from dae, a package for the R statistical computing environment, and the GenStat procedure ACANONICAL can be used to produce the decomposition table from the intratier formulae, without the need to include pseudofactors. The expected mean squares can then be added manually, as described below, to form the skeleton analysis-of-variance table.

Two examples of the derivation of the decomposition table are available, including R scripts and output: a two-phase sensory experiment and a two-phase wheat experiment. Further examples are available in the Supplementary materials for Brien (2017c).

Decomposition table derivation
  1. Sets of objects and observational unit
    Firstly, the sets of objects involved in the allocations in the experiment are identified. Then the set of objects that are the observational units is identified. Federer (1975) defines the observational unit to be 'the smallest unit on which an observation is made'. An advantage of using the observational unit rather than the experimental unit is that, for each response variable, there is only one type of observational unit in an experiment, whereas it is clear from Federer (1975) that there might be several different types of experimental unit. Thus, it should be easier to identify the observational unit.

  2. Tiers
    The crucial feature of the procedure is that the factors are divided into sets or tiers, as described by Brien, Harch, Correll and Bailey (2011), according to their status in the allocations that were performed in designing the experiment. The factors that are nested within other factors, and the factors that nest them, also need to be identified. It can be useful to depict the allocations in a factor-allocation diagram, in which there is a panel for each tier. For multitiered experiments there will be at least three tiers. It is vital for determining the tiers that all the factors involved in the experiment are identified.

  3. Intratier formulae
    One then uses each tier, and the nesting and crossing relationships between the factors in the tier, to form an intratier formula for the tier. It may also be necessary to include pseudofactors in some formulae and to indicate that some factors are independent of others. The notation we use in the formulae is that described by Brien and Demétrio (2009, Table 1): A*B indicates that the factors A and B are crossed, A/B indicates that the factor B is nested within the factor A, A+B indicates that the factors A and B are independent and A//B indicates that B is a pseudofactor to A. A factor-allocation diagram is useful in formulating these, the factors in a tier being those within a panel of the diagram.

  4. Analysis formulae
    These are obtained from the intratier formulae by considering, for each formula, whether crossing or nesting relationships between the factors in the current intratier formula and those in other formulae are appropriate.
    Often, but not always, there is a one-to-one correspondence between the analysis formulae and the tiers. There is not when some factors occur in more than one analysis formula, because factors can occur in only one tier. Also, sometimes there are fewer and sometimes more analysis formulae than tiers.

  5. Decomposition table
    Now form the decomposition table by going around the loop shown in the above figure. The process begins with the analysis formula involving only factors whose levels are intrinsically associated with the observational units; the terms involving these factors make up the initial decomposition table. This table is extended by incorporating the terms from the second analysis formula into it, as described below. The extended decomposition table is then further extended by incorporating the terms from each of the other analysis formulae in turn, until the terms from all analysis formulae have been incorporated.
    For each analysis formula in turn, a circuit of the loop to extend the decomposition table proceeds as follows:
    1. Derive the terms and sources from the current formula
      Expand the analysis formula using rules such as those given in Wilkinson and Rogers (1973) or Heiberger (1989); Monod and Bailey (1992) give details on the handling of pseudofactors. If the factors A and B are crossed (A*B in a formula), these rules lead to the terms A, B and A^B being included in the analysis, where A^B represents the generalized factor formed from the factors A and B. If factor B is nested within factor A (A/B in a formula), the standard rules lead to the terms A and A^B.
      More generally, for formulae L and M:
      L / M = L + gf(L)^M
      L * M = L + M + L^M
      where gf(L) is the generalized factor formed from the tier factors in L and L^M is the sum of the products of all pairs of terms, one taken from L and one from M.
      As an example of using the rules for a more complicated formula we expand (A*B)/(C*D):
      (A*B)/(C*D) = (A*B) + A^B^(C*D)
        = (A + B + A^B) + A^B^(C + D + C^D)
        = A + B + A^B + A^B^C + A^B^D + A^B^C^D
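The two expansion rules can be sketched in Python. This is only an illustration of the rules as stated above, not the algorithm used by designAnatomy or ACANONICAL. The representation of a term as a tuple of factor names and the helper names `factor`, `cross`, `nest` and `show` are assumptions made for this example; pseudofactors (`//`) and independence (`+`) are not handled.

```python
# A formula is held as its expanded list of terms; each term is a tuple of
# factor names. All names here are hypothetical, chosen for illustration.

def factor(name):
    """A formula consisting of the single factor `name`."""
    return [(name,)]

def _join(s, t):
    """Generalized factor s^t, keeping each factor once, in order of appearance."""
    return s + tuple(f for f in t if f not in s)

def cross(L, M):
    """L * M = L + M + L^M, where L^M holds every pairwise product of terms."""
    return L + M + [_join(s, t) for s in L for t in M]

def nest(L, M):
    """L / M = L + gf(L)^M, where gf(L) combines all the factors in L."""
    gf = ()
    for term in L:
        gf = _join(gf, term)
    return L + [_join(gf, t) for t in M]

def show(formula):
    """Write the terms of a formula as a '+'-separated string."""
    return " + ".join("^".join(term) for term in formula)

A, B, C, D = (factor(x) for x in "ABCD")
expanded = nest(cross(A, B), cross(C, D))
print(show(expanded))  # A + B + A^B + A^B^C + A^B^D + A^B^C^D
```

The printed expansion agrees with the hand expansion of (A*B)/(C*D) given above.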
      The source for each term is derived as follows:
      1. Form the generalized factor from those factors in the term that nest at least one of the other factors in the term.
      2. List all the factors that are not in the generalized factor of the nesting factors, each separated by ‘#’. Then add to the end of the list the generalized factor of the nesting factors, placing it between square brackets.
      The sources corresponding to the terms derived from (A*B)/(C*D) are obtained using this rule as follows:

      A + B + A#B + C[A^B] + D[A^B] + C#D[A^B]

      In this set of sources, C#D[A^B] stands for the interaction between C and D nested within each combination of the levels of A and B; that is, [A^B] represents the combinations of A and B.
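The two-step rule for forming a source from a term can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `source` and the `nested_in` mapping (each factor mapped to the set of factors that nest it) are hypothetical inputs introduced for this example, and the terms are those obtained from (A*B)/(C*D).

```python
def source(term, nested_in):
    """Source label for `term`, a tuple of factor names.

    `nested_in[f]` is the set of factors that nest factor f (hypothetical input).
    """
    # Step 1: the factors in the term that nest at least one other factor in it.
    nesting = [f for f in term
               if any(f in nested_in.get(g, ()) for g in term if g != f)]
    # Step 2: the remaining factors separated by '#', followed by the
    # generalized factor of the nesting factors in square brackets.
    others = [f for f in term if f not in nesting]
    label = "#".join(others)
    if nesting:
        label += "[" + "^".join(nesting) + "]"
    return label

# In (A*B)/(C*D), both C and D are nested within A and B.
nested_in = {"C": {"A", "B"}, "D": {"A", "B"}}
terms = [("A",), ("B",), ("A", "B"),
         ("A", "B", "C"), ("A", "B", "D"), ("A", "B", "C", "D")]
sources = [source(t, nested_in) for t in terms]
print(" + ".join(sources))  # A + B + A#B + C[A^B] + D[A^B] + C#D[A^B]
```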

    2. Incorporate current sources and their degrees of freedom into the decomposition table
      Add a major column to the decomposition table consisting of columns for the sources, degrees of freedom and, for nonorthogonal experiments, efficiency factors for the current analysis formula. If the current formula is the first formula, which contains only recipient factors, the column will consist of a row for each source from that formula. When incorporating sources from a formula other than the first, place them in the new major column alongside the sources already in the decomposition table with which they are confounded. This amounts to determining the experimental units for the generalized factor corresponding to a source. All sources from the same formula that are confounded with a particular term will be listed one under the other, with the row for the term with which they are confounded expanded to fit them. Also, if there are Residual degrees of freedom, a Residual source will need to be added under the list of terms from the current formula. The number of Residual degrees of freedom is equal to the difference between the degrees of freedom of the original source and the sum of the degrees of freedom of the sources incorporated under it.
      Sources that arise in two consecutive formulae will not have a line entered for the formula incorporated last. When two sources are totally aliased, such as can occur with fractional factorial experiments, one will be omitted from the analysis and a note of this made separately from the decomposition table.
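The Residual degrees-of-freedom calculation is simple arithmetic, sketched here in Python. The function name `residual_df` and the design (a randomized complete-block design with 4 blocks of 3 plots and 3 treatments) are hypothetical, chosen only to give concrete numbers.

```python
def residual_df(original_df, incorporated_dfs):
    """Residual df: df of the original source minus the df of the sources under it."""
    residual = original_df - sum(incorporated_dfs)
    if residual < 0:
        raise ValueError("incorporated sources exceed the df of the original source")
    return residual

# Hypothetical randomized complete-block design: 4 blocks of 3 plots, 3 treatments.
blocks, plots, treatments = 4, 3, 3
plots_within_blocks_df = blocks * (plots - 1)   # Plots[Blocks]: 8 df
treatments_df = treatments - 1                  # Treatments: 2 df

# Treatments is confounded with Plots[Blocks], leaving 8 - 2 Residual df.
print(residual_df(plots_within_blocks_df, [treatments_df]))  # 6
```

A result of zero means that no Residual source needs to be added under the original source.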

  6. Categorize terms as fixed or random
    1. One possible categorization of the terms is that all are classified as random, except those terms that have only ever been allocated. This would lead to an analysis that is equivalent to a randomization analysis when all allocation is by randomization.
    2. Another possibility is that each factor could be categorized as fixed or random. Then a term is fixed if it involves only fixed factors and random if it involves at least one random factor.
    3. Otherwise, one could independently categorize each term as fixed or random. In the end, fixed terms are ones that allow for arbitrary differences between the effects, whereas random terms require that the effects conform to a probability distribution, usually normal.
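The second option, in which the category of a term follows from the categories of its factors, can be sketched as below. The function name `categorize` and the choice of Blocks and Plots as the random factors are hypothetical, made only for illustration.

```python
def categorize(term, random_factors):
    """A term is random if it involves any random factor; otherwise it is fixed."""
    return "random" if any(f in random_factors for f in term) else "fixed"

random_factors = {"Blocks", "Plots"}  # hypothetical categorization of the factors
terms = [("Blocks",), ("Blocks", "Plots"), ("Treatments",), ("Blocks", "Treatments")]
categories = {"^".join(t): categorize(t, random_factors) for t in terms}
for name, category in categories.items():
    print(name, category)
```

Under this rule a term such as Blocks^Treatments is random because it involves the random factor Blocks, even though Treatments is fixed.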

  7. Derive the expected mean squares and add to form the skeleton analysis-of-variance table
    The rules for deriving the expected mean squares (EMS) given here are based on results given by Brien (1992) and Bailey and Brien (2016). They apply to experiments in which all phases are structure balanced and may be able to be used when .

    For each row in the decomposition table, determine its expected mean square as follows:
    Obtain the contribution of the source from each tier in the row:
    Beginning with the left-most column of sources and continuing across to the right-most source for that row (ignore Residual sources), obtain the contribution for a source from a tier as follows:
    1. Identify the generalized factor for the current source: it comprises all the factors in the source.
    2. Determine whether the generalized factor represents a fixed or random term.
      If it is fixed:
      The contribution is a quadratic form in the expected value for the response, written as q(G) where G is the generalized factor for the current source; the matrix of the quadratic form is the projection matrix for the current source. For brevity, just the first letters of G are given. The q-function is premultiplied by the canonical efficiency factor for the current source in the current row.
      If it is random:
      The contribution is the linear combination of the canonical components for the generalized factor for the current source and for all random terms whose factors are a superset of the factors in this generalized factor. The coefficient of a canonical component for a random term, in the linear combination, is the replication of the generalized factor for the random term multiplied by the canonical efficiency factor for the current source in the current row.
    Form the expected mean square for the row:
    It is the sum of the contributions of its sources.

    The resulting table will be a skeleton analysis-of-variance table.
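The assembly of an expected mean square from the contributions of the sources in a row can be sketched in Python. This is a simplified illustration, not a full implementation of the rules in Brien (1992) and Bailey and Brien (2016). It assumes an equally replicated design, so that the replication of a generalized factor is the number of observational units divided by its number of level combinations. The function names, the symbol `phi(...)` for a canonical component and the randomized complete-block design with 4 blocks of 3 plots are all hypothetical.

```python
from math import prod

def replication(term, levels, n_units):
    """Replication of the generalized factor for `term`, assuming equal replication."""
    return n_units // prod(levels[f] for f in term)

def contributions(factors, is_random, efficiency, random_terms, levels, n_units):
    """Contribution of one source as a list of (coefficient, symbol) pairs."""
    if not is_random:
        # Fixed: a quadratic form q(G), premultiplied by the efficiency factor.
        return [(efficiency, "q(" + "^".join(factors) + ")")]
    g = set(factors)
    # Random: every random term whose factors include those of the source.
    return [(efficiency * replication(t, levels, n_units), "phi(" + "^".join(t) + ")")
            for t in random_terms if g <= set(t)]

def expected_mean_square(row, random_terms, levels, n_units):
    """Sum the contributions of the sources in a row, taken left to right."""
    parts = []
    for factors, is_random, efficiency in row:
        parts += contributions(factors, is_random, efficiency,
                               random_terms, levels, n_units)
    return " + ".join(sym if c == 1 else f"{c} {sym}" for c, sym in parts)

# Hypothetical design: 4 blocks of 3 plots, 3 treatments, all efficiencies 1.
levels = {"Blocks": 4, "Plots": 3, "Treatments": 3}
n_units = 12
random_terms = [("Blocks",), ("Blocks", "Plots")]

blocks_row = [(("Blocks",), True, 1)]
plots_row = [(("Blocks", "Plots"), True, 1), (("Treatments",), False, 1)]

ems_blocks = expected_mean_square(blocks_row, random_terms, levels, n_units)
ems_plots = expected_mean_square(plots_row, random_terms, levels, n_units)
print(ems_blocks)  # 3 phi(Blocks) + phi(Blocks^Plots)
print(ems_plots)   # phi(Blocks^Plots) + q(Treatments)
```

Each row is given as a list of (factors, is_random, efficiency) triples with Residual sources left out, which mirrors the instruction to ignore Residual sources when collecting contributions.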