ANOVA Calculator w/ Pooling

by Cengiz Akinli
<cakinli@gmail.com>

This program performs ANOVA (analysis of variance) for a small dataset. If you don't know what ANOVA is, you probably don't want to, and you almost certainly won't care what this program does.

Various intermediate results are calculated and stored, and the relative contribution of each factor is determined. The user may then perform a new analysis on the same dataset after selecting some factors for pooling, save the results to a data variable, return the results as a matrix to the caller, or redisplay the inputs or the current ANOVA table.

Usage

canova(Fmat,Rvec,[lbls]) - Perform ANalysis Of VAriance.

Example

canova(factors, results, labels)
where factors is a TI variable holding the L8 Taguchi Orthogonal Array, results is a column vector of values, and labels is a list (row vector) of strings:
factors=              results=       labels={"A","C","AxC","B","D","BxC","E"}
    1 1 1 1 1 1 1          42
    1 1 1 2 2 2 2          50
    1 2 2 1 1 2 2          36
    1 2 2 2 2 1 1          45
    2 1 2 1 2 1 2          35
    2 1 2 2 1 2 1          35
    2 2 1 1 2 2 1          30
    2 2 1 2 1 1 2          54
The first screen displays the supplied data so you can verify you've got the inputs right, and shows the results as they are calculated, in an unformatted fashion. The next screen shows the formatted results.

In this example, since the variance of the error (Ve) vanishes, the factor variance ratios (Fi = Vi/Ve) are undefined and the pure factor sums of squares (S'i = Si - Ve*fi, where Si is the sum of squares for factor i, and fi is the degrees of freedom for factor i) are the same as the regular sums, meaning that those sums have no error.
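
As a rough illustration of why the error term vanishes here, the sketch below recomputes the sums of squares for the example data in plain C. It assumes the usual formulas for a two-level orthogonal array, S = (T1 - T2)^2 / N for each column and ST = sum(y^2) - T^2 / N for the total. The function names total_ss() and column_ss() are made up for this example and are not taken from the program's source.

```c
#define RUNS 8
#define COLS 7

/* L8 orthogonal array and results, copied from the example above. */
static const int factors[RUNS][COLS] = {
    {1,1,1,1,1,1,1}, {1,1,1,2,2,2,2}, {1,2,2,1,1,2,2}, {1,2,2,2,2,1,1},
    {2,1,2,1,2,1,2}, {2,1,2,2,1,2,1}, {2,2,1,1,2,2,1}, {2,2,1,2,1,1,2}
};
static const double results[RUNS] = {42, 50, 36, 45, 35, 35, 30, 54};

/* Total sum of squares: sum(y^2) - T^2/N (hypothetical helper). */
double total_ss(void)
{
    double sum = 0.0, sumsq = 0.0;
    int r;
    for (r = 0; r < RUNS; r++) {
        sum   += results[r];
        sumsq += results[r] * results[r];
    }
    return sumsq - sum * sum / RUNS;
}

/* Sum of squares for one two-level column (0-based index):
 * (T1 - T2)^2 / N, where T1 and T2 are the level totals. */
double column_ss(int col)
{
    double diff = 0.0;
    int r;
    for (r = 0; r < RUNS; r++)
        diff += (factors[r][col] == 1) ? results[r] : -results[r];
    return diff * diff / RUNS;
}
```

For this dataset the seven column sums add up to the total (484.875), so nothing is left over for the error: all seven degrees of freedom are assigned to factors, and Ve stays at zero until some factors are pooled.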

Once the formatted ANOVA table is displayed, the user may choose from among the six options shown. Options 1 and 2 will simply redisplay the input data and the ANOVA table, respectively. Option 3 will save the contents of the current ANOVA results set to a data variable, which can later be viewed in the Data/Matrix Editor or imported into the TI CellSheet app, which I actually like better. Option 4 will return the data as a matrix to the command line or whatever function may have called the program.

Option 5 allows for pooling of insignificant factors. The user is prompted for a comma-separated list of factor column numbers whose relative contributions to the error are considered "small." The factors chosen, along with their column labels, are shown as confirmation, and the remaining factors, which are considered the only significant ones, are also shown.

If any factor chosen for pooling has a relative contribution to the variability of the result of more than 5% of the total, a warning message is printed. Then the updated ANOVA results table is printed.

This 5% check is not an objective test, but merely a rule of thumb. If the choice produces a result with a factor whose pure sum of squares, and thus relative contribution, is negative, then you've made a bad choice. The factor(s) with the negative S' should be considered as candidates for exclusion.
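
The pooling arithmetic can be sketched as follows. This is a minimal illustration, not the program's actual code: it assumes that pooling folds the chosen factors' sums of squares and degrees of freedom into the error term, and then applies the formulas given earlier (Fi = Vi/Ve and S'i = Si - Ve*fi). The struct anova_row type and both function names are hypothetical, and the 5% check here uses the raw sum of squares, which is also an assumption.

```c
#include <stddef.h>

/* One row of a hypothetical ANOVA table. */
struct anova_row {
    double ss;       /* sum of squares, Si */
    int    dof;      /* degrees of freedom, fi */
    int    pooled;   /* nonzero if this factor is pooled into the error */
    double pure_ss;  /* S'i = Si - Ve*fi */
    double f_ratio;  /* Fi = Vi/Ve, left at 0 when Ve is 0 */
    double percent;  /* S'i as a percentage of the total */
};

/* Folds the pooled rows into the error term and fills in the derived
 * columns of the remaining rows. Returns the error variance Ve, or 0
 * when there are no error degrees of freedom. */
double pool_and_update(struct anova_row *rows, size_t n,
                       double err_ss, int err_dof, double total_ss)
{
    double ve;
    size_t i;

    for (i = 0; i < n; i++) {
        if (rows[i].pooled) {
            err_ss  += rows[i].ss;
            err_dof += rows[i].dof;
        }
    }
    ve = (err_dof > 0) ? err_ss / err_dof : 0.0;

    for (i = 0; i < n; i++) {
        if (rows[i].pooled)
            continue;
        rows[i].pure_ss = rows[i].ss - ve * rows[i].dof;
        rows[i].f_ratio = (ve > 0.0) ? (rows[i].ss / rows[i].dof) / ve : 0.0;
        rows[i].percent = 100.0 * rows[i].pure_ss / total_ss;
    }
    return ve;
}

/* Rule-of-thumb check: nonzero if the factor contributes more than 5%
 * of the total and so probably should not be pooled. */
int pooling_warning(const struct anova_row *row, double total_ss)
{
    return 100.0 * row->ss / total_ss > 5.0;
}
```

With the example data, pooling columns 2 and 5 (C and D) gives an error sum of squares of 1.125 + 6.125 = 7.25 on 2 degrees of freedom, so Ve = 3.625 and the pure sum of squares for A becomes 45.125 - 3.625 = 41.5. A negative pure_ss after a call like this is the sign of a bad pooling choice described above.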

Notes

  1. Numbers truncated to fit in available spaces are tagged with a pound sign (#) as the last character. Be aware that this truncation occurs AFTER the number is converted to a string, so it is possible that the decimal may be removed, changing the meaning of a number display.
  2. The formatting of output is not very sophisticated. The treatment of the precision specifier in the TI/TIGCC implementations of the printf() family of functions is not very rigorous (or maybe there's some background, value-dependent type demotion going on). Doing printf("%5.2f", 1.0) does not produce "1.00" as it should, but just "1"; the precision is treated as an upper limit. So I abandoned the use of the field width and precision specifications and just let the routines format the numbers without them. My formatting routine, prAlnStr(), then truncates the string to fit in the space allowed.
  3. Strings (factor labels) are truncated without any indication.
  4. The executable is not as compact as it absolutely could be. For cleanliness and order, I made explicit functions for some things that could have been done by hand (or as inlines), among other things.
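
The truncation behavior described in notes 1 and 2 can be sketched in standard C as below. This is only an approximation of the idea and not the program's prAlnStr() routine: the function name fit_number() is hypothetical, and the use of "%g" as the unadorned format is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Formats value without width or precision, then truncates the string
 * to at most `width` characters. A truncated number ends in '#'.
 * `out` must have room for width + 1 bytes. (Hypothetical helper.) */
void fit_number(char *out, size_t width, double value)
{
    char buf[32];
    size_t len;

    if (width == 0) {
        out[0] = '\0';
        return;
    }
    snprintf(buf, sizeof buf, "%g", value);
    len = strlen(buf);
    if (len <= width) {
        memcpy(out, buf, len + 1);       /* fits: copy with terminator */
    } else {
        memcpy(out, buf, width - 1);     /* keep the leading characters */
        out[width - 1] = '#';            /* flag the truncation */
        out[width] = '\0';
    }
}
```

Because the cut happens on the string, a narrow field can lose the decimal point entirely: 1234.5678 in a 3-character field comes out as "12#", which is why the pound sign matters when reading the table.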

    Code Reuse

    Permission is granted to reuse any source code contained herein for any lawful purpose, provided the source is properly attributed in your work.

    The files mlib.h and mlib.c are an almost self-contained matrix library. Though I am an experienced C programmer, I knew nothing about TIGCC or programming for TI calculators when I started this project last week, and those two files contain examples of most of what I've learned. Many things were easy to find in documentation, but others, though surprisingly fundamental in nature, were not.

    I spent hours trying to figure out some things that were apparently so obvious to folks experienced in this realm that there was little or no relevant documentation or reference material. Most of these things were the minutiae involved in reading/writing complex data (i.e. matrices) to/from the expression stack, and the issues involved in writing formatted text to the screen.

    Also, I've never written a program in a run-time environment that doesn't clean up allocated memory at program termination. So I was forced to write a couple of functions that comprise a very rudimentary memory management system. smalloc(), sfree(), and freeAP(), together with the global allocptrs, keep track of allocated memory and provide a simple means to make sure it is all freed before the program exits.
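
The general idea of such a tracker can be sketched in standard C as follows. This is not the code from mlib.c: the names tracked_alloc(), tracked_free(), tracked_free_all(), and tracked_count(), the fixed-size table, and the use of malloc()/free() are all assumptions made for illustration.

```c
#include <stdlib.h>

#define MAX_TRACKED 32

/* Table of live allocations; NULL marks a free slot. */
static void *tracked[MAX_TRACKED];

/* Allocates memory and records the pointer.
 * Returns NULL if the table is full or the allocation fails. */
void *tracked_alloc(size_t size)
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == NULL) {
            tracked[i] = malloc(size);
            return tracked[i];
        }
    }
    return NULL;
}

/* Frees one tracked pointer and clears its slot.
 * Pointers that are not in the table are ignored. */
void tracked_free(void *p)
{
    int i;
    if (p == NULL)
        return;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == p) {
            free(p);
            tracked[i] = NULL;
            return;
        }
    }
}

/* Frees everything still recorded; meant to be called on every exit path. */
void tracked_free_all(void)
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++) {
        free(tracked[i]);   /* free(NULL) is a no-op */
        tracked[i] = NULL;
    }
}

/* Number of allocations currently recorded. */
int tracked_count(void)
{
    int i, n = 0;
    for (i = 0; i < MAX_TRACKED; i++)
        if (tracked[i] != NULL)
            n++;
    return n;
}
```

Calling tracked_free_all() just before the program returns gives the same guarantee a hosted environment would provide automatically, at the cost of a small fixed table.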

    If you're new to TIGCC programming, some of this code may prove to be useful examples. To that end, I've been pretty verbose in my commenting, and have even gone back and added comments just now.