8b: Statistics (Significance Testing)(cont’d) – Enghouse Insights (Powered by Survox) Documentation

8.4 SIGNIFICANCE TESTING ON WEIGHTED TABLES

When doing significance testing with weighted data, it is recommended that you create the effective base row, even when the percentage base is the System Weighted Total or the System Weighted Any Response row. The effective base row is needed to verify any tests on weighted data.

The effective base is an estimation of how the weighting is affecting the test. It is the actual base number that is used when determining whether two samples are significantly different. The effective base will never be higher than the original unweighted base and will usually be slightly less. As the variance in the weights increases, the effective base decreases in order to compensate for the likely change in the percentages that will occur. Without this correction, some weighting factor could always be applied, which would make any item significantly greater than any other. Since the effective base is so integral to the test, it is recommended that it be printed on the table so that it can be determined how the weighting might be affecting the significance testing.

There are two different ways to create the effective base row. You can use either the $[BASE] or $[EFFECTIVE_N] keywords. The $[BASE] keyword creates two different rows, both of which are needed for the significance tests: the weighted total (which is needed to properly calculate all the percentages) and the effective base (which is used as the base for the significance test). The $[EFFECTIVE_N] keyword can be used to create the effective base when the percentage base is either the System Total or the Any Response row. In this case you do not need to specify the percentage base again because the system has already calculated it.

In the example below, the $[BASE] keyword is used to create the effective base. A new STUB_PREFACE is defined because the SET UNWEIGHTED_TOP option is used to also produce an unweighted total row.

NOTE: The following set of commands defines a standard front end for the next set of examples.

>PURGE_SAME
>PRINT_FILE STAT4
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T401
~DEFINE
TABLE_SET= { BAN1:
  EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STATISTICS_DECIMALS=2,
  -PERCENT_SIGN,DO_STATISTICS=.95,RUNNING_LINES=1 }
  BANNER=:
    | GENDER AGE ADVERTISING AWARENESS
    | <=========> <=================> <=========================>
    | TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
    | ----- ---- ------ ----- ----- ----- ------ ------ ------ ------
    }
  COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
   }

Example:

STUB= STUB_TOP_UNWGT:
      [-VERTICAL_PERCENT] UNWEIGHTED TOTAL
      [SUPPRESS] UNWEIGHTED NO ANSWER
      [SUPPRESS] WEIGHTED TOTAL
      [SUPPRESS] WEIGHTED NO ANSWER }
TABLE_SET= { TAB401:
   WEIGHT=: SELECT_VALUE([6^1//3/X],VALUES(1.021,.880,1.130,1))
   SET UNWEIGHTED_TOP
   STATISTICS=: I=BC,I=DEF,GHIJ;
   STUB_PREFACE= STUB_TOP_UNWGT
   HEADER=: WEIGHTED TABLE WITH STATISTICAL TESTING
   (USING BASE ROW) }
   TITLE=: RATING OF SERVICE }
   TITLE_4=: BASE= TOTAL SAMPLE }
   STUB=:
     [BASE_ROW,VERTICAL_PERCENT=*] WEIGHTED TOTAL (% BASE)
     [-VERTICAL_PERCENT] EFFECTIVE BASE (STAT BASE)
     NET GOOD
     | VERY GOOD
     | GOOD
     FAIR
     NET POOR
     | POOR
     | VERY POOR
     DON’T KNOW/REFUSED
     [STAT,LINE=0] MEAN
     [STAT,LINE=0] STD DEVIATION
     [STAT,LINE=0] STD ERROR }
   ROW=: $[BASE] TOTAL $[] [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
   STORE_TABLES=*
   }

Here is the table that is printed:

WEIGHTED TABLE WITH STATISTICAL TESTING (USING BASE ROW)
TABLE 401
RATING OF SERVICE
BASE= TOTAL SAMPLE

                                       GENDER            AGE            ADVERTISING AWARENESS
                                    <=========>  <=================> <=========================>
                            TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                            -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
UNWEIGHTED TOTAL              400    196    204    125    145    113     91    108    107    176
WEIGHTED TOTAL (% BASE)       400    194    206    128    128    128     91    108    106    176
                            100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                                %      %      %      %      %      %      %      %      %      %
                                     (B)    (C)    (D)    (E)    (F)    (G)    (H)    (I)    (J)

EFFECTIVE BASE (STAT BASE)    396    194    202    125    145    113     90    107    106    174

NET GOOD                      200    120C    80     58     55     84DE   43     55     54    106G
                             50.1   62.0   38.9   45.6   43.4   65.5   47.5   51.3   51.0   60.6

  VERY GOOD                   102     69C    33     36     33     31     19     32     30     60G
                             25.4   35.7   15.8   28.0   26.2   23.9   21.1   29.5   27.8   33.9

  GOOD                         99     51     48     22     22     53DE   24     24     25     47
                             24.6   26.3   23.1   17.6   17.2   41.6   26.3   21.8   23.2   26.7

FAIR                           89     42     47     29F    39F    16     19     20     23     35
                             22.3   21.8   22.8   22.4   30.3   12.4   20.7   18.3   21.7   19.9

NET POOR                       83     17     65B    30     26     23     19     27J    22     24
                             20.7    8.9   31.7   23.2   20.7   17.7   20.4   24.9   20.8   13.4

  POOR                         39      8     30B    12     15     11      8     13     13     12
                              9.6    4.2   14.7    9.6   11.7    8.8    8.8   11.8   12.0    7.0

  VERY POOR                    44      9     35B    17     11     11     11     14J     9     11
                             11.0    4.7   17.0   13.6    9.0    8.8   11.6   13.1    8.8    6.4

DON’T KNOW/REFUSED             28     14     14     11      7      6     10      6      7     11
                              7.0    7.3    6.7    8.8    5.5    4.4   11.4    5.5    6.5    6.1

MEAN                         3.47   3.91C  3.07   3.40   3.42   3.66   3.41   3.45   3.53   3.79GHI
STD DEVIATION                1.31   1.12   1.35   1.41   1.28   1.22   1.31   1.40   1.30   1.20
STD ERROR                    0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ

In the preceding table, the unweighted total, weighted total, and effective base are all printed. Compare the three numbers and notice that in most columns the effective base is only a little less than the unweighted total. This is because all the weights were close together (near 1.00), so the weighting did not substantially change the percentages on the table. Compare these percentages with those that were printed on Table 101. If the weights had a greater variance (for example, respondents were assigned weights between .2 and 5), the effective base would have been much less than the unweighted total.

To see how the effective base works, look at the TOTAL column in the above table. Notice that the weighted and unweighted totals are both 400, because weights were chosen to weight the sample back to its original size. Also notice that the effective base is only 396, which is due to the minor variation in the weights that were applied to this table. The formula for the effective base is as follows:

EB= WEIGHTED TOTAL SQUARED DIVIDED BY THE SUM OF THE SQUARE OF EACH WEIGHT

Reproduce the number 396 from above by plugging in all the appropriate numbers from TABLE 401.

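EB= (WT)**2 / SUM( Fn*(Wn**2))

Here WT is the weighted total, Wn is each distinct weight, and Fn is the number of respondents who received that weight.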

EB= ((400)**2) / ((125*(1.021**2)) + (145*(.880**2)) + (113*(1.13**2)) + (17*(1**2)))

EB= 160000 / ( 130.30 + 112.29 + 144.29 + 17)

EB= 160000 / 403.88

EB= 396.16
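The arithmetic above can be checked with a short script. This is a minimal Python sketch, not part of Mentor: the function name effective_base and the groups list are hypothetical, and the respondent counts and weights are the ones used in the worked example. It computes the result twice, once with the printed weighted total of 400 and once with the unrounded sum of the weights. Both round to 396; the 396.16 shown above comes from rounding the intermediate sums to two decimals.

```python
# Sketch: effective base from (respondent count, weight) pairs.
# Counts and weights are taken from the worked example for TABLE 401.
groups = [(125, 1.021), (145, 0.880), (113, 1.130), (17, 1.0)]


def effective_base(groups, weighted_total=None):
    """Return (weighted total)**2 / SUM(Fn * Wn**2)."""
    if weighted_total is None:
        # Use the unrounded weighted total when none is supplied.
        weighted_total = sum(f * w for f, w in groups)
    sum_of_squares = sum(f * w ** 2 for f, w in groups)
    return weighted_total ** 2 / sum_of_squares


eb_printed_total = effective_base(groups, weighted_total=400)  # printed total
eb_exact_total = effective_base(groups)                        # unrounded total
print(round(eb_printed_total), round(eb_exact_total))
```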

An important characteristic of the effective base is demonstrated in the 31-50 AGE column, where the unweighted total and the effective base are both 145 while the weighted total is only 128. Since the weighting on this table was based on AGE and everyone in that column was weighted by the same factor of 0.880, the weighted total drops to 128. However, since there is no variance in the weighting, the effective base remains unchanged. Furthermore, the percentages in that column are exactly the same as those in Table 101.
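The same property can be seen numerically. The following Python sketch is illustrative only: effective_base is a hypothetical helper that takes one weight per respondent, and the "spread" weights are invented for contrast (they echo the weights between 5 and .2 mentioned earlier) and do not come from the example data.

```python
def effective_base(weights):
    """Effective base from one weight per respondent: (sum w)**2 / sum(w**2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# 31-50 column: 145 respondents, every one weighted by 0.880.
uniform = [0.880] * 145

# Hypothetical contrast: the same 145 respondents with weights
# alternating between 5 and 0.2 (high variance in the weights).
spread = [5.0 if i % 2 == 0 else 0.2 for i in range(145)]

eb_uniform = effective_base(uniform)  # stays at the unweighted count
eb_spread = effective_base(spread)    # drops well below the unweighted count

print(round(sum(uniform)), round(eb_uniform, 2), round(eb_spread, 1))
```

With a constant weight the weighted total changes (to about 128) while the effective base stays at 145; with widely varying weights the effective base falls far below 145.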

Exactly the same table could be produced by using the $[EFFECTIVE_N] keyword instead of the $[BASE] keyword and a different STUB_PREFACE.

STUB= STUB_TOP_WGT:
      [-VERTICAL_PERCENT] UNWEIGHTED TOTAL
      [SUPPRESS] UNWEIGHTED NO ANSWER
      [BASE_ROW] WEIGHTED TOTAL
      [SUPPRESS] WEIGHTED NO ANSWER }
TABLE_SET= { TAB402:
   STUB_PREFACE= STUB_TOP_WGT
   HEADER=: WEIGHTED TABLE WITH STATISTICAL TESTING (USING EFFECTIVE_N) }
   TITLE=: RATING OF SERVICE }
   TITLE_4=: BASE= TOTAL SAMPLE }
   STUB=:
     [-VERTICAL_PERCENT] EFFECTIVE BASE (STAT BASE)
     NET GOOD
     | VERY GOOD
     | GOOD
     FAIR
     NET POOR
     | POOR
     | VERY POOR
     DON’T KNOW/REFUSED
     [STAT,LINE=0] MEAN
     [STAT,LINE=0] STD DEVIATION
     [STAT,LINE=0] STD ERROR }
   ROW=: $[EFFECTIVE_N] TOTAL $[] [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
   STORE_TABLES=* }

The printed table would be basically the same as Table 401.

8.4.1 Weighted Tables with Different Weights

When performing significance testing in conjunction with applying different weights to different columns, use the SET option MULTIPLE_WEIGHT_STATISTICS. This option allows significance testing on similarly weighted columns when the table has columns with varying weights. However, it does not allow a given respondent to have a different weight in the same test. You can test independent columns with different weights, but dependent columns must have the same weights applied to them. For instance, MULTIPLE_WEIGHT_STATISTICS allows significance testing on a table where both a weighted and unweighted total column have been created, but it does not allow the unweighted total to be tested against any of the weighted columns. If this statement is not used, then the program will print an error message if a STATISTICS statement is used in conjunction with a COLUMN_SHORT_WEIGHT or COLUMN_WEIGHT table element.

The example below shows how to produce an unweighted total column and still do significance testing on the rest of the table. Notice that every letter in the STATISTICS statement is shifted one position later in the alphabet than in the previous statements (for example, BC becomes CD), because an additional category has been added to the column variable.

TABLE_SET= { TAB403:
   HEADER=: TABLE WITH STATISTICAL TESTING AND DIFFERENT WEIGHTS APPLIED
   TO DIFFERENT COLUMNS OF THE TABLE }
   SET MULTIPLE_WEIGHT_STATISTICS
   COLUMN_SHORT_WEIGHT=: TOTAL WITH &
     SELECT_VALUE([6^1//3/X],VALUES(1.021,.880,1.130,1))
   STATISTICS=: I=CD,I=EFG,HIJK;
   BANNER=:
     | SEX AGE ADVERTISING AWARENESS
     | UNWGHT WGHT <==========> <===================> <============================>
     | TOTAL TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
     | ------ ----- ---- ------ ----- ----- ----- ------ ------ ------ ------}
   COLUMN=: TOTAL WITH TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
   TITLE= TAB402
   TITLE_4= TAB402
   STUB= TAB402
   ROW= TAB402
   STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH STATISTICAL TESTING AND DIFFERENT WEIGHTS APPLIED TO DIFFERENT COLUMNS OF THE TABLE
TABLE 403
RATING OF SERVICE
BASE= TOTAL SAMPLE

                                              SEX              AGE              ADVERTISING AWARENESS
                           UNWGHT   WGHT <==========> <===================> <============================>
                            TOTAL  TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                           ------  -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
UNWEIGHTED TOTAL              400    400    196    204    125    145    113     91    108    107    176
WEIGHTED TOTAL                400    400    194    206    128    128    128     91    108    106    176
                            100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                                %      %      %      %      %      %      %      %      %      %      %

EFFECTIVE BASE (STAT BASE)    400    396    194    202    125    145    113     90    107    106    174

NET GOOD                      197    200    120D    80     58     55     84EF   43     55     54    106H
                             49.2   50.1   62.0   38.9   45.6   43.4   65.5   47.5   51.3   51.0   60.6

  VERY GOOD                   102    102     69D    33     36     33     31     19     32     30     60H
                             25.5   25.4   35.7   15.8   28.0   26.2   23.9   21.1   29.5   27.8   33.9

  GOOD                         95     99     51     48     22     22     53EF   24     24     25     47
                             23.8   24.6   26.3   23.1   17.6   17.2   41.6   26.3   21.8   23.2   26.7

FAIR                           92     89     42     47     29G    39G    16     19     20     23     35
                             23.0   22.3   21.8   22.8   22.4   30.3   12.4   20.7   18.3   21.7   19.9

NET POOR                       83     83     17     65C    30     26     23     19     27K    22     24
                             20.8   20.7    8.9   31.7   23.2   20.7   17.7   20.4   24.9   20.8   13.4

  POOR                         39     39      8     30C    12     15     11      8     13     13     12
                              9.8    9.6    4.2   14.7    9.6   11.7    8.8    8.8   11.8   12.0    7.0

  VERY POOR                    44     44      9     35C    17     11     11     11     14K     9     11
                             11.0   11.0    4.7   17.0   13.6    9.0    8.8   11.6   13.1    8.8    6.4

DON’T KNOW/REFUSED             28     28     14     14     11      7      6     10      6      7     11
                              7.0    7.0    7.3    6.7    8.8    5.5    4.4   11.4    5.5    6.5    6.1

MEAN                         3.46   3.47   3.91D  3.07   3.40   3.42   3.66   3.41   3.45   3.53   3.79HIJ
STD DEVIATION                1.31   1.31   1.12   1.35   1.41   1.28   1.22   1.31   1.40   1.30   1.20
STD ERROR                    0.07   0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09
--------------------------------
(sig=.05) (all_pairs) columns tested CD, EFG, HIJK

8.5 PRINT PHASE STATISTICAL TESTING

Print phase statistical testing is calculated from the numbers that are printed on the table. This approach has several advantages, but it also has several limitations.

The advantages:

Table processing time will be much faster.
Tables can be loaded into table manipulation, altered, and still be tested.
A different type of test can be performed on each row in the table, including a Kruskal-Wallis test on the COLUMN_MEAN.
The columns being tested can be changed without rereading any data.

The limitations:

The columns must be independent or inclusive.
The data cannot be weighted.
Means created during the data reading phase with $[MEAN] will not be tested. Only means created using the EDIT option COLUMN_MEAN can be tested.
If testing errors are made (for example, dependent or weighted columns are tested, or inclusive tests are not marked as inclusive), no error messages will be generated and possibly incorrect statistical markings will be printed on the table.

NOTE: Both print phase and table building phase significance testing can be performed on the same table, although it is recommended that each test different sets of columns.

8.5.1 EDIT Options

To produce print phase significance testing, you need to use the EDIT options DO_PRINTER_STATISTICS and DO_STATISTICS_TESTS instead of the STATISTICS statement that is used for table building phase tests. The DO_PRINTER_STATISTICS option lets the program know you are going to be doing print phase testing and the DO_STATISTICS_TESTS specifies the columns to be tested.

Like the STATISTICS statement, the DO_STATISTICS_TESTS option uses letters to designate which columns are being tested and a comma to separate multiple tests. Unlike the STATISTICS statement, all tests must be independent (there is no I= option). However, tests may be inclusive and must be marked as such by using T=. T values may be printed as with the STATISTICS statement, but the PRINTABLE_T option cannot be used to error check the tests. See 8.7 PRINTING THE ACTUAL T AND SIGNIFICANCE VALUES for more information about printing T values for print phase based tests and an example of using the T= option.

To test the first three banner points against each other, the EDIT statement would look something like:

EDIT= EDIT1: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=ABC }

The other EDIT options, such as DO_STATISTICS= to set the confidence level, and NEWMAN_KEULS_TEST to perform a Newman Keuls test, work the same way with the same defaults.
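As a rough illustration of how the letters in DO_STATISTICS_TESTS group columns, the following Python sketch expands a test list such as BC,DEF into the column pairs that an all possible pairs test would compare. The function name column_pairs is hypothetical, and the sketch ignores T= and any other modifiers. It is an aid to reading the option, not a description of how Mentor parses it.

```python
from itertools import combinations


def column_pairs(tests):
    """Expand a test list such as 'BC,DEF' into the pairs compared in each group."""
    return {group: list(combinations(group, 2)) for group in tests.split(",")}


pairs = column_pairs("BC,DEF")
for group, group_pairs in pairs.items():
    print(group, group_pairs)
```

Each comma-separated group is tested on its own, so column B is compared with C, but never with D, E, or F.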

NOTE: The following set of commands defines a standard front end for the next set of examples.

>PURGE_SAME
>PRINT_FILE STAT5
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T501
~DEFINE
STUB= STUBTOP1:
      [BASE_ROW] TOTAL
      [SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
  EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STATISTICS_DECIMALS=2,
  -PERCENT_SIGN,RUNNING_LINES=1 }
  STUB_PREFACE= STUBTOP1
  BANNER=:
    | SEX AGE ADVERTISING AWARENESS
    | <=========> <=================> <=========================>
    | TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
    | ----- ---- ------ ----- ----- ----- ------ ------ ------ ------}
  COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
   }

Example:

TABLE_SET= { TAB501:
   LOCAL_EDIT=: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=BC,DEF ALL_POSSIBLE_PAIRS_TEST DO_STATISTICS=.95
   COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1) COLUMN_MEAN,COLUMN_STD COLUMN_SE }
   HEADER=: TABLE WITH STATISTICAL TESTING DONE DURING THE PRINT PHASE }
   TITLE=: RATING OF SERVICE }
   TITLE_4=: BASE= TOTAL SAMPLE }
   STUB=:
     NET GOOD
     | VERY GOOD
     | GOOD
     FAIR
     NET POOR
     | POOR
     | VERY POOR
     DON’T KNOW/REFUSED
     [PRINT_ROW=MEAN,LINE=0] MEAN
     [PRINT_ROW=STD,LINE=0] STD DEV
     [PRINT_ROW=SE,LINE=0] STD ERR }
   ROW=: [11^4,5/5/4/3/1,2/2/1/X]
   STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH STATISTICAL TESTING DONE ON THE PRINTED NUMBERS
TABLE 501
RATING OF SERVICE
BASE= TOTAL SAMPLE

                             SEX              AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                          (B)    (C)    (D)    (E)    (F)

NET GOOD           197    120C    77     57     63     74DE   42     55     54    105
                  49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7

  VERY GOOD        102     70C    32     35     38     27     19     32     30     60
                  25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1

  GOOD              95     50     45     22     25     47DE   23     23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6

FAIR                92     44     48     28F    44F    14     20     20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5

NET POOR            83     18     65B    29     30     20     19     27     22     24
                  20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6

  POOR              39      9     30B    12     17     10      8     13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4

  VERY POOR         44      9     35B    17     13     10     11     14      9     11
                  11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2

DON’T KNOW/REFUSED  28     14     14     11      8      5     10      6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2

MEAN              3.46   3.90C  3.05   3.40   3.42   3.66   3.38   3.45   3.53   3.79
STD DEV           1.31   1.12   1.35   1.41   1.28   1.22   1.32   1.40   1.29   1.21
STD ERR           0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF

This table is very similar to Table 101 at the beginning of this chapter. The only difference is that columns G, H, I, and J are not tested in this table because they are not independent. Notice that the footnote has nothing in it to differentiate between tests done during the print phase or table building phase.
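Because print phase tests work only from the printed numbers, the MALE versus FEMALE result on the NET GOOD row can be approximated by hand. The Python sketch below uses a standard pooled two-proportion z statistic as a stand-in. That choice is an assumption made for illustration: it is not necessarily the formula Mentor applies, and the function name two_proportion_z is hypothetical. The counts and bases are the ones printed on Table 501.

```python
import math


def two_proportion_z(count_a, base_a, count_b, base_b):
    """Pooled z statistic for the difference between two independent proportions."""
    p_a = count_a / base_a
    p_b = count_b / base_b
    pooled = (count_a + count_b) / (base_a + base_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / base_a + 1 / base_b))
    return (p_a - p_b) / std_error


# NET GOOD on Table 501: MALE 120 of 196, FEMALE 77 of 204.
z = two_proportion_z(120, 196, 77, 204)

# 1.96 is the two-sided critical value for a 95% confidence level.
significant = abs(z) > 1.96
print(round(z, 2), significant)
```

Under this approximation the difference is well beyond the 95% threshold, which is consistent with the C printed next to the MALE count.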

8.5.2 Changing the Confidence Level and the Type of Test

Changing the default significance levels or type of test procedure is done in exactly the same way as with the table building phase tests. For example, if you wanted to test at the 90% confidence level using the N-K test procedure you would add the options DO_STATISTICS=.90 and NEWMAN_KEULS_TEST onto your EDIT statement. Bi-level testing and using the approximation formula are also specified the same way. See 8.1.3 through 8.1.7 for more information on the EDIT option DO_STATISTICS. See 8.4 SIGNIFICANCE TESTING ON WEIGHTED TABLES for more information about changing the type of test.

TABLE_SET= { TAB502:
   LOCAL_EDIT=: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=BC,DEF
   NEWMAN_KEULS_TEST,DO_STATISTICS=.90
   COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1) COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
   HEADER=: TABLE WITH STATISTICAL TESTING DONE DURING THE PRINT PHASE
   CHANGING THE CONFIDENCE LEVEL AND USING THE NEWMAN-KEULS PROCEDURE }
   TITLE= TAB501
   TITLE_4= TAB501
   STUB= TAB501
   ROW= TAB501
   STORE_TABLES=* }

The printed table would be similar to Table 501 except for some of the statistical markings and the footnote. The footnote would be as follows:

(sig=.10) (n_k) columns tested BC,DEF

8.5.3 Changing the Type of Test by Row

When doing print phase statistical testing you can use the STUB option DO_STATISTICS to change the type of test being performed on that row or to exclude that row entirely from testing. This means that you can test some of the rows using the APP test, test other rows using the N-K test, and not test other rows. DO_STATISTICS can be set to any of ALL_POSSIBLE_PAIRS_TEST, NEWMAN_KEULS_TEST, ANOVA_SCAN, FISHER, or KRUSKAL_WALLIS_TEST. If -DO_STATISTICS is used, then that row will be excluded from the test.

In the example below the NET GOOD, FAIR, and NET POOR rows are tested with the APP test as specified on the EDIT statement. The COLUMN_MEAN is tested using the Kruskal-Wallis test. The rest of the rows are not tested.

TABLE_SET= { TAB503:
   LOCAL_EDIT=: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=BC,DEF ALL_POSSIBLE_PAIRS_TEST,DO_STATISTICS=.95
   COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1) COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
   HEADER=: TABLE WITH STATISTICAL TESTING DONE DURING THE PRINT PHASE
   USING THE KRUSKAL WALLIS TEST ON THE MEAN }
   TITLE= TAB501
   TITLE_4= TAB501
   TITLE_5=:\2NNETS AND FAIR MENTIONS ARE TESTED USING ALL PAIRS TEST
   MEAN IS TESTED USING KRUSKAL WALLIS TEST }
   STUB=:
     NET GOOD
     [-DO_STATISTICS] | VERY GOOD
     [-DO_STATISTICS] | GOOD
     FAIR
     NET POOR
     [-DO_STATISTICS] | POOR
     [-DO_STATISTICS] | VERY POOR
     [-DO_STATISTICS] DON’T KNOW/REFUSED
     [PRINT_ROW=MEAN,DO_STATISTICS=KRUSKAL_WALLIS_TEST,LINE=0] MEAN
     [PRINT_ROW=STD,LINE=0] STD DEV
     [PRINT_ROW=SE,LINE=0] STD ERR }
   ROW= TAB501
   STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH STATISTICAL TESTING DONE ON THE PRINTED NUMBERS
USING THE KRUSKAL WALLIS TEST ON THE MEAN
TABLE 503
RATING OF SERVICE
BASE= TOTAL SAMPLE

                             SEX              AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                          (B)    (C)    (D)    (E)    (F)

NET GOOD           197    120C    77     57     63     74DE   42     55     54    105
                  49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7

  VERY GOOD        102     70     32     35     38     27     19     32     30     60
                  25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1

  GOOD              95     50     45     22     25     47     23     23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6

FAIR                92     44     48     28F    44F    14     20     20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5

NET POOR            83     18     65B    29     30     20     19     27     22     24
                  20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6

  POOR              39      9     30     12     17     10      8     13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4

  VERY POOR         44      9     35     17     13     10     11     14      9     11
                  11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2

DON’T KNOW/REFUSED  28     14     14     11      8      5     10      6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2

MEAN              3.46   3.90C  3.05   3.40   3.42   3.66   3.38   3.45   3.53   3.79
STD DEV           1.31   1.12   1.35   1.41   1.28   1.22   1.32   1.40   1.29   1.21
STD ERR           0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09

NETS AND FAIR MENTIONS ARE TESTED USING ALL PAIRS TEST
MEAN IS TESTED USING KRUSKAL WALLIS TEST
--------------------------------
(sig=.05) (all_pairs) (k_w) columns tested BC, DEF

Notice that the standard footnote mentions that both the APP and Kruskal-Wallis tests were used, but does not say which rows were tested with which test. As in this example, you may want to use the TITLE_5 keyword to create a customized footnote.

8.6 EXCLUDING ROWS/COLUMNS FROM SIGNIFICANCE TESTING

Often when doing significance testing there will be only a particular row or couple of rows in the table that are of interest. The processing time and the number of undesirable letters that print on the table can be greatly reduced by only performing the statistical tests on the specific rows.

The Mentor program can be instructed to test mean rows only, test only specified rows, or to drop columns with low sample sizes from the testing.

8.6.1 Testing Mean Rows Only

To perform statistical testing on mean rows only you can use the SET option MEAN_STATISTICS_ONLY. This option causes the program to ignore all testing for percentages. Return to testing both means and percentages by turning the option off with -MEAN_STATISTICS_ONLY.

NOTE: The following set of commands defines a standard front end for the next set of examples.

>PURGE_SAME
>PRINT_FILE STAT6
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T601
~DEFINE
STUB= STUB_TOP_TOT:
      TOTAL
      [SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
  EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,
  -COLUMN_TNA,STATISTICS_DECIMALS=2,
  -PERCENT_SIGN,RUNNING_LINES=1 }
  STATISTICS=: I=BC,I=DEF,GHIJ;
  BANNER=:
    |    SEX             AGE             ADVERTISING AWARENESS
    | <=========> <=================> <=========================>
    | TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
    | ----- ---- ------ ----- ----- ----- ------ ------ ------ ------}
  COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
   }

EX:
TABLE_SET= { TAB601:
             STUB_PREFACE= STUB_TOP_TOT
             SET MEAN_STATISTICS_ONLY
             LOCAL_EDIT=: DO_STATISTICS=.95 }
             HEADER=: TABLE WITH STATISTICAL TESTS PERFORMED ON MEANS ONLY }
             TITLE=: RATING OF SERVICE }
             TITLE_4=: BASE= TOTAL SAMPLE }
             STUB=:
             NET GOOD
             | VERY GOOD
             | GOOD
             FAIR
             NET POOR
             | POOR
             | VERY POOR
             DON'T KNOW/REFUSED
             [STATISTICS_ROW] MEAN
             [STATISTICS_ROW] STD DEVIATION
             [STATISTICS_ROW] STD ERROR }
             ROW=: [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
             STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH STATISTICAL TESTS PERFORMED ON MEANS ONLY
TABLE 601
RATING OF SERVICE
BASE= TOTAL SAMPLE

                             SEX              AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                          (B)    (C)    (D)    (E)    (F)    (G)    (H)    (I)    (J)

NET GOOD           197    120     77     57     63     74     42     55     54    105
                  49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7

  VERY GOOD        102     70     32     35     38     27     19     32     30     60
                  25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1

  GOOD              95     50     45     22     25     47     23     23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6

FAIR                92     44     48     28     44     14     20     20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5

NET POOR            83     18     65     29     30     20     19     27     22     24
                  20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6

  POOR              39      9     30     12     17     10      8     13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4

  VERY POOR         44      9     35     17     13     10     11     14      9     11
                  11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2

DON'T KNOW/REFUSED  28     14     14     11      8      5     10      6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2

MEAN              3.46   3.90C  3.05   3.40   3.42   3.66   3.38   3.45   3.53   3.79GH
STD DEVIATION     1.31   1.12   1.35   1.41   1.28   1.22   1.32   1.40   1.29   1.21
STD ERROR         0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ

Compare this table with Table 101 and notice that only the MEAN row is marked with any letter because the tests on all the other rows were suppressed.

NOTE: There is no change in the footnote on the table, so you may want to include a customized note somewhere on the table pointing out which rows were tested.

8.6.2 Excluding any Row from Statistical Testing

If only specific rows in the table are to be tested, you can either mark the rows that are to be tested or the rows that are to be excluded. If a simple variable is being defined, the keyword STATISTICS may be used inside parentheses in front of the code as part of the data definition. If the variable definition is complex (it uses joiners or functions), then the keyword $[DO_STATISTICS] must be used to mark the parts of the table that will be tested and the keyword $[-DO_STATISTICS] to mark which parts of the table will not be tested.

For example, suppose you only wanted to test the top two box (codes 5 and 4) and the bottom two box (codes 1 and 2) in a 5 point scale stored in data position 21. Using the STATISTICS keyword method the variable would look like:

[21^(STATISTICS)4,5/5/4/3/(STATISTICS)1,2/2/1]

This would cause only those categories marked with the STATISTICS keyword to be tested, while all other categories would not be tested.

Using the $[DO_STATISTICS] keyword method the variable would look like:

[21^4,5] $[-DO_STATISTICS] [21^5/4/3] $[DO_STATISTICS] [21^1,2] &
$[-DO_STATISTICS] [21^2/1]

The default is that categories are tested so the net of 4 and 5 will be tested. All categories after $[-DO_STATISTICS] are not tested, while $[DO_STATISTICS] turns testing back on.

If table printing phase statistical testing is being done, you can exclude a row from the test by using the STUB option -DO_STATISTICS. See 8.5.3 Changing the Type of Test by Row for more information.

NOTE: The $[DO_STATISTICS] keyword should not be confused with either the EDIT statement option DO_STATISTICS or the STUB option DO_STATISTICS.

The following example shows how to test only the top two box, bottom two box, and mean on a rating scale:

TABLE_SET= { TAB602:
   LOCAL_EDIT=: DO_STATISTICS=.95 }
   HEADER=: TABLE WITH STATISTICAL TESTS PERFORMED ON SELECTED ROWS ONLY }
   TITLE=: RATING OF SERVICE }
   TITLE_4=: BASE= TOTAL SAMPLE }
   TITLE_5=:\2N ONLY ROWS WITH (*) ARE TESTED }
   STUB=:
     NET GOOD (*)
     | VERY GOOD
     | GOOD
     FAIR
     NET POOR (*)
     | POOR
     | VERY POOR
     DON'T KNOW/REFUSED
     [STATISTICS_ROW] MEAN (*)
     [STATISTICS_ROW] STD DEVIATION
     [STATISTICS_ROW] STD ERROR }
   ROW=: [11^(STATISTICS)4,5/5/4/3/(STATISTICS)1,2/2/1/X] $[MEAN,STD,SE] [11]
   STORE_TABLES=* }

Here is an alternate way to write the row variable:

ROW602A:[11^4,5] $[-DO_STATISTICS] [11^5/4/3] &
   $[DO_STATISTICS] [11^1,2] $[-DO_STATISTICS] [11^2/1/X] &
   $[DO_STATISTICS, MEAN,STD,SE] [11]

Here is the table that is printed:

TABLE WITH STATISTICAL TESTS PERFORMED ON SELECTED ROWS ONLY
TABLE 602
RATING OF SERVICE
BASE= TOTAL SAMPLE

                             SEX              AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                          (B)    (C)    (D)    (E)    (F)    (G)    (H)    (I)    (J)

NET GOOD (*)       197    120C    77     57     63     74DE   42     55     54    105G
                  49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7

  VERY GOOD        102     70     32     35     38     27     19     32     30     60
                  25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1

  GOOD              95     50     45     22     25     47     23     23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6

FAIR                92     44     48     28     44     14     20     20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5

NET POOR (*)        83     18     65B    29     30     20     19     27J    22     24
                  20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6

  POOR              39      9     30     12     17     10      8     13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4

  VERY POOR         44      9     35     17     13     10     11     14      9     11
                  11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2

DON'T KNOW/REFUSED  28     14     14     11      8      5     10      6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2

MEAN  (*)         3.46   3.90C  3.05   3.40   3.42   3.66   3.38   3.45   3.53   3.79GH
STD DEVIATION     1.31   1.12   1.35   1.41   1.28   1.22   1.32   1.40   1.29   1.21
STD ERROR         0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09

ONLY ROWS WITH (*) ARE TESTED
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ

If you compare this table with Table 601, note that only the NET GOOD, NET POOR, and MEAN rows in this table have statistical markings. In addition, a TITLE_5 variable was defined to create a customized footnote.

8.6.3 Excluding Columns with Low Bases from Statistical Testing

If some of the columns in the test could have low bases you might want to exclude them from the testing. You may want to do this because the small bases might skew the tests, or because the sample is such that you do not want to report on any small base. The EDIT option MINIMUM_BASE= can be used to suppress not only statistical testing but also all the other values in that column. If MINIMUM_BASE is set to some value like 50, then any column that has a base less than 50 will print an asterisk (*) under the base row where the letter usually prints, and the rest of the column will be blank. The FLAG_MINIMUM_BASE option can be used in conjunction with MINIMUM_BASE. Instead of blanking the column, the program will print all the numbers in that column followed by an asterisk where the statistical markings would normally print.
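
The effect of a minimum base on the test groups can be sketched in a few lines of Python. This is only an illustration of the idea described above, not Mentor's implementation; the function name `apply_minimum_base` and its return shape are assumptions made for this example. The bases are those of the banner used in the tables in this section.

```python
# Illustrative sketch only: the function name and return shape are
# assumptions for this example, not part of Mentor.
def apply_minimum_base(bases, test_groups, minimum_base):
    """Return (low_base_columns, groups_still_compared).

    bases        -- dict mapping a column letter to its base
    test_groups  -- list of strings such as "GHIJ" (columns tested together)
    minimum_base -- columns with a base below this value are excluded
    """
    low = sorted(col for col, base in bases.items() if base < minimum_base)
    remaining = []
    for group in test_groups:
        kept = "".join(col for col in group if col not in low)
        # A comparison needs at least two columns.
        if len(kept) >= 2:
            remaining.append(kept)
    return low, remaining


bases = {"B": 196, "C": 204, "D": 125, "E": 145, "F": 113,
         "G": 91, "H": 108, "I": 107, "J": 176}
low, remaining = apply_minimum_base(bases, ["BC", "DEF", "GHIJ"], 100)
print("low base columns:", low)
print("columns still compared:", remaining)
```

With a minimum base of 100, only the column with a base of 91 falls out of the comparisons, while the other columns in its group are still tested against each other.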

TABLE_SET= { TAB603:
   STATISTICS=: I=BC,I=DEF,GHIJ;
   LOCAL_EDIT=: MINIMUM_BASE=100,DO_STATISTICS=.95 }
   HEADER=: USING MINIMUM BASE OPTION TO SUPPRESS A COLUMN WITH A LOW BASE }
   TITLE=: RATING OF SERVICE }
   TITLE_4=: BASE= TOTAL SAMPLE }
   STUB=:
     NET GOOD
     | VERY GOOD
     | GOOD
     FAIR
     NET POOR
     | POOR
     | VERY POOR
     DON'T KNOW/REFUSED
     [STATISTICS_ROW] MEAN
     [STATISTICS_ROW] STD DEVIATION
     [STATISTICS_ROW] STD ERROR }
   ROW=: [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
   STORE_TABLES=* }

Here is the table that is printed:

USING MINIMUM BASE OPTION TO SUPPRESS A COLUMN WITH A LOW BASE
TABLE 603
RATING OF SERVICE
BASE= TOTAL SAMPLE

                             SEX              AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                          (B)    (C)    (D)    (E)    (F)    (*)    (H)    (I)    (J)

NET GOOD           197    120C    77     57     63     74DE          55     54    105
                  49.2   61.2   37.7   45.6   43.4   65.5          50.9   50.5   59.7

  VERY GOOD        102     70C    32     35     38     27            32     30     60
                  25.5   35.7   15.7   28.0   26.2   23.9          29.6   28.0   34.1

  GOOD              95     50     45     22     25     47DE          23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6          21.3   22.4   25.6

FAIR                92     44     48     28F    44F    14            20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4          18.5   22.4   20.5

NET POOR            83     18     65B    29     30     20            27J    22     24
                  20.8    9.2   31.9   23.2   20.7   17.7          25.0   20.6   13.6

  POOR              39      9     30B    12     17     10            13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8          12.0   12.1    7.4

  VERY POOR         44      9     35B    17     13     10            14J     9     11
                  11.0    4.6   17.2   13.6    9.0    8.8          13.0    8.4    6.2

DON'T KNOW/REFUSED  28     14     14     11      8      5             6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4           5.6    6.5    6.2

MEAN              3.46   3.90C  3.05   3.40   3.42   3.66          3.45   3.53   3.79H
STD DEVIATION     1.31   1.12   1.35   1.41   1.28   1.22          1.40   1.29   1.21
STD ERROR         0.07   0.08   0.10   0.13   0.11   0.12          0.14   0.13   0.09

--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ
* - small base

Notice that the column BRND A is blank except for the base value. Also notice that the footnote includes a note that the asterisk denotes a small base. You can suppress only the statistical testing instead of the entire column by also using the EDIT option FLAG_MINIMUM_BASE. In the example below the only difference from Table 603 is this option.

TABLE_SET= { TAB604:
   HEADER=: USING MINIMUM BASE OPTION TO FLAG A COLUMN WITH A LOW BASE }
   LOCAL_EDIT=:
   MINIMUM_BASE=100,FLAG_MINIMUM_BASE,DO_STATISTICS=.95 }
   TITLE= TAB603
   TITLE_4= TAB603
   STUB= TAB603
   ROW= TAB603
   STORE_TABLES=* }

Here is the table that is printed:

USING MINIMUM BASE OPTION TO FLAG A COLUMN WITH A LOW BASE
TABLE 604
RATING OF SERVICE
BASE= TOTAL SAMPLE

                             SEX              AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                          (B)    (C)    (D)    (E)    (F)    (*)    (H)    (I)    (J)

NET GOOD (*)       197    120C    77     57     63     74DE   42*    55     54    105
                  49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7

  VERY GOOD        102     70     32     35     38     27     19*    32     30     60
                  25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1

  GOOD              95     50     45     22     25     47     23*    23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6

FAIR                92     44     48     28     44     14     20*    20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5

NET POOR (*)        83     18     65B    29     30     20     19*    27J    22     24
                  20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6

  POOR              39      9     30     12     17     10      8*    13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4

  VERY POOR         44      9     35     17     13     10     11*    14      9     11
                  11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2

DON'T KNOW/REFUSED  28     14     14     11      8      5     10*     6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2

MEAN  (*)         3.46   3.90C  3.05   3.40   3.42   3.66   3.38*  3.45   3.53   3.79H
STD DEVIATION     1.31   1.12   1.35   1.41   1.28   1.22   1.32*  1.40   1.29   1.21
STD ERROR         0.07   0.08   0.10   0.13   0.11   0.12   0.15*  0.14   0.13   0.09

--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ
* - small base

8.7 PRINTING THE ACTUAL T AND SIGNIFICANCE VALUES

Both t values and their significance can be printed on the table, either in addition to or instead of the statistical letter markings. The STUB options DO_T_TEST and DO_SIG_T are used to print these values. If you are printing values from a STATISTICS statement, you can also use PRINTABLE_T for error checking. PRINTABLE_T checks that the STATISTICS= tests are specified only in pairs and that no column is the second column of more than one pair (only one t value can be printed per column).

The STUB options DO_T_TEST and DO_SIG_T can be set to any of the following:

DO_T_TEST=*            Print the t value for the last data row seen
DO_T_TEST=n            Print the t value for the Nth row in the table
DO_T_TEST=-n           Print the t value for the Nth row above this row in the table
DO_T_TEST=PRINT_MEAN   Print the t value for the COLUMN_MEAN

In the example below the t values and their significance are printed for the top box, bottom box, and mean rows in the table.

NOTE: The following set of commands defines a standard front end for the next set of examples:

>PURGESAME
>PRINT_FILE STAT7
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T701
~DEFINE
STUB= STUB_TOP_TOT:
   TOTAL
   [SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
EDIT=:  COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STATISTICS_DECIMALS=2,
   -PERCENT_SIGN }
BANNER=:
   |      SEX             AGE                  ADVERTISING AWARENESS
   |  <==========> <==================>  <===============================>
   |  TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
   |  -----   ---- ------  -----  -----  ----- ------ ------ ------ ------}
COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
}

EX:
TABLE_SET= { TAB701:
STUB_PREFACE= STUB_TOP_TOT
STATISTICS=: PRINTABLE_T
   T=AB,T=AC,T=AD,T=AE,T=AF,T=AG,T=AH,T=AI,T=AJ
LOCAL_EDIT=: DO_STATISTICS=.95 }
HEADER=: TABLE WITH T AND SIGNIFICANCE VALUES PRINTED ON THE TABLE }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
   NET GOOD
   [DO_T_TEST=*,SKIP_LINES=0] T-VALUE
   [DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
   | VERY GOOD
   | GOOD
   FAIR
   NET POOR
   [DO_T_TEST=*,SKIP_LINES=0] T-VALUE
   [DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
   | POOR
   | VERY POOR
   DON'T KNOW/REFUSED
   [STATISTICS_ROW] MEAN
   [STATISTICS_ROW] STD DEVIATION
   [STATISTICS_ROW] STD ERROR
   [DO_T_TEST=9] T-VALUE
   [DO_SIG_T=9] SIGNIFICANCE OF T }
ROW=: [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH T AND SIGNIFICANCE VALUES PRINTED ON THE TABLE
TABLE 701
RATING OF SERVICE
BASE= TOTAL SAMPLE

                            SEX               AGE            ADVERTISING AWARENESS
                         <=========>  <=================> <=========================>
                 TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                 -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL              400    196    204    125    145    113     91    108    107    176
                 100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                     %      %      %      %      %      %      %      %      %      %
                   (A)    (B)    (C)    (D)    (E)    (F)    (G)    (H)    (I)    (J)

NET GOOD           197C   120A    77     57     63     74A    42     55     54    105A
                  49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7
T-VALUE                 -4.69   4.69   0.98   1.75  -4.07   0.67  -0.41  -0.29  -3.69
SIGNIFICANCE             0.00   0.00   0.33   0.08   0.00   0.51   0.69   0.77   0.00

  VERY GOOD        102C    70A    32     35     38     27     19     32     30     60A
                  25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1

  GOOD              95E    50     45     22     25     47A    23     23     24     45
                  23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6

FAIR                92F    44     48     28     44A    14     20     20     24     36
                  23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5

NET POOR            83BJ   18     65A    29     30     20     19     27     22     24
                  20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6
T-VALUE                  5.58  -5.58  -0.81   0.02   0.94  -0.03  -1.27   0.06   3.11
SIGNIFICANCE             0.00   0.00   0.42   0.98   0.35   0.97   0.20   0.95   0.00

  POOR              39B     9     30A    12     17     10      8     13     13     13
                   9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4

  VERY POOR         44BJ    9     35A    17     13     10     11     14      9     11
                  11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2

DON'T KNOW/REFUSED  28     14     14     11      8      5     10      6      7     11
                   7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2

MEAN              3.46C  3.90A  3.05   3.40   3.42   3.66   3.38   3.45   3.53   3.79A
STD DEVIATION     1.31   1.12   1.35   1.41   1.28   1.22   1.32   1.40   1.29   1.21
STD ERROR         0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09
T-VALUE                 -6.58   6.58   0.57   0.44  -1.84   0.62   0.10  -0.60  -4.38
SIGNIFICANCE OF T        0.00   0.00   0.57   0.67   0.06   0.54   0.91   0.55   0.00
---------------------------------
(sig=.05) (all_pairs) columns tested T= AB, T= AC, T= AD, T= AE, T= AF,
T= AG, T= AH, T= AI, T= AJ

Notice that a t value is positive when the first column of the tested pair (here the Total column) has the greater value, and negative when the second column does. Any pair with a significance of 0.05 or less is marked: the tested column carries the letter A when the t value is negative, and the Total column carries that column's letter when the t value is positive. Also notice that the t values for males and females are opposites of each other. This is because each is tested inclusively against the total, which is equivalent to testing them against each other.
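
The sign convention can be illustrated with a textbook pooled two-proportion test written in Python. This is a sketch under stated assumptions: the source does not give Mentor's formula, so the pooled standard error and the normal approximation used for the significance are stand-ins, and the function names are invented for this example. The counts are the NET GOOD frequencies and bases for the MALE and FEMALE columns of Table 701.

```python
import math


def two_proportion_t(count1, base1, count2, base2):
    """Pooled two-proportion statistic, first column minus second column.

    Assumption: a textbook formula chosen for illustration; Mentor's own
    calculation is not documented in this section.
    """
    p1 = count1 / base1
    p2 = count2 / base2
    pooled = (count1 + count2) / (base1 + base2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / base1 + 1 / base2))
    return (p1 - p2) / std_error


def two_sided_significance(t):
    """Two-sided significance using the normal approximation (an assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


# NET GOOD: MALE 120 of 196, FEMALE 77 of 204 (from Table 701).
t_male_first = two_proportion_t(120, 196, 77, 204)
t_female_first = two_proportion_t(77, 204, 120, 196)

print("male first:  ", round(t_male_first, 2))
print("female first:", round(t_female_first, 2))
print("significant at .05:", two_sided_significance(t_male_first) <= 0.05)
```

Swapping the order of the two columns only flips the sign of the statistic, which is why the MALE and FEMALE t values on the table are opposites of each other.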

Basically the same table could be produced using the print phase tests. For more information on print phase tests see 8.5 PRINT PHASE STATISTICAL TESTING. Here is an example of printing the t values when doing print phase tests.

TABLE_SET= { TAB702:
LOCAL_EDIT=:
    DO_PRINTER_STATISTICS,ALL_POSSIBLE_PAIRS_TEST,DO_STATISTICS=.95,
   DO_STATISTICS_TESTS=T=AB,T=AC,T=AD,T=AE,T=AF,T=AG,T=AH,T=AI,T=AJ
COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1)COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
HEADER=: TABLE WITH T AND SIGNIFICANCE VALUES PRINTED ON THE TABLE
  FOR TESTS PERFORMED ON THE NUMBERS ON THE PRINTED TABLE
}
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
   NET GOOD
   [DO_T_TEST=*,SKIP_LINES=0] T-VALUE
   [DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
   | VERY GOOD
   | GOOD
   FAIR
   NET POOR
   [DO_T_TEST=*,SKIP_LINES=0] T-VALUE
   [DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
   | POOR
   | VERY POOR
   DON'T KNOW/REFUSED
   [PRINT_ROW=MEAN] MEAN
   [PRINT_ROW=STD] STANDARD DEVIATION
   [PRINT_ROW=SE] STANDARD ERROR
   [DO_T_TEST=PRINT_MEAN] T-VALUE
   [DO_SIG_T=PRINT_MEAN] SIGNIFICANCE OF T }
ROW=: [11^4,5/5/4/3/1,2/2/1/X]
STORE_TABLES=* }

The printed table will look basically the same as Table 701.

8.8 SIGNIFICANCE TESTING ON ROWS (PREFERENCE TESTING)

Significance testing on rows can be performed in two ways: a direct comparison test or a distributed preference test. In both cases only two rows may be compared at one time, although multiple pairs of rows may be compared in a single table. As with column testing, the STATISTICS statement and the DO_STATISTICS option on the EDIT statement control the tests.

NOTE: Row testing cannot be performed during the table-printing phase.

On the STATISTICS statement, rows are designated numerically rather than alphabetically as are the columns. The first data row is assigned the number “1”, the second data row the number “2”, and so on. Every data row is included in this count even if it is not printed. To do a direct comparison of rows use the letter D followed by an equal sign (=) before the two row numbers. Separate the two row numbers with a comma. Separate different pairs of rows with a space. To do a distributed preference test use P instead of D. Rows can only be tested sequentially and a given row may only be in one test on the table. For example, if rows 1 and 4 are being compared, then rows 5 and 6 could also be compared, but row 3 could not be compared with row 7 (row 4 being in between), nor could row 1 be compared to row 2 (row 1 is already being compared to row 4).
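
The pairing rules above can be checked before a run with a small Python helper. This is a hypothetical sketch, not part of Mentor: it reads "tested sequentially" as meaning that the row spans of different tests may not overlap, which is an interpretation of the examples given in the paragraph. The names `validate_row_pairs` and `format_row_tests` are invented here.

```python
def validate_row_pairs(pairs):
    """Return a list of problems; an empty list means the pairs pass.

    Assumed interpretation: each test covers the span between its two
    rows, and the spans of different tests must not overlap.
    """
    problems = []
    spans = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for (low1, high1), (low2, high2) in zip(spans, spans[1:]):
        if low2 <= high1:
            problems.append(
                f"rows {low2},{high2} overlap the span of rows {low1},{high1}"
            )
    return problems


def format_row_tests(kind, pairs):
    """Build the row-test part of a STATISTICS statement.

    kind is "D" (direct comparison) or "P" (distributed preference).
    """
    if kind not in ("D", "P"):
        raise ValueError("kind must be 'D' or 'P'")
    return " ".join(f"{kind}={a},{b}" for a, b in pairs)


print(validate_row_pairs([(1, 4), (5, 6)]))   # allowed in the text above
print(validate_row_pairs([(1, 4), (3, 7)]))   # row 4 lies between 3 and 7
print(validate_row_pairs([(1, 4), (1, 2)]))   # row 1 is already in a test
print(format_row_tests("D", [(1, 2), (3, 4), (5, 6)]))
```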

8.8.1 Direct Comparison Testing

A direct comparison of two rows is similar to the test that is performed on columns, except that the letter D must be specified to indicate that it is a direct test. To do a direct comparison of rows 1 and 2 use the following STATISTICS statement:

STATISTICS= ROWSTAT1: D=1,2

The following statement would test row 1 versus row 2, row 3 versus row 4, and row 5 versus row 6.

STATISTICS= ROWSTAT2: D=1,2 D=3,4 D=5,6

Row testing can be combined with column testing by specifying both the column and row tests on the same STATISTICS statement. The following statement would do column testing on columns B, C, and D, in addition to testing row 4 versus row 6.

STATISTICS= ROWSTAT3: BCD, D=4,6

The DO_STATISTICS option on the EDIT statement is again used to set the confidence level. The same setting is used for both the row and column tests. As with column testing, a footnote will be printed to indicate which rows were tested and the significance level used. If the difference is significant, a lower case “s” will print under the second row tested.

This is an example of a direct comparison of rows.

NOTE: The following set of commands defines a standard front end for the next set of examples:

>PURGESAME
>PRINT_FILE STAT8
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T801
~DEFINE
STUB= STUBTOP1:
   TOTAL
   [SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
EDIT=:
   COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STUB_PREFACE=STUBTOP1,
   STATISTICS_DECIMALS=2,-PERCENT_SIGN,DO_STATISTICS=.95 }
BANNER=:
   |            SEX              AGE
   |         <=========>  <=================>
   | TOTAL   MALE FEMALE  18-30  31-50  51-70
   | -----   ---- ------  -----  -----  -----}
COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3]
}

And here is this example:

TABLE_SET= { TAB801:
STATISTICS=: D=1,2 D=4,5 D=7,8
HEADER=:
  TABLE WITH DIRECT STATISTICAL TESTING OF ROWS AT THE 95%
  CONFIDENCE LEVEL}
TITLE=: PREFERENCE OF PRODUCTS }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
  [COMMENT,UNDERLINE] FIRST TEST
  [STUB_INDENT=2] PREFER BRAND A
  [STUB_INDENT=2] PREFER BRAND B
  [STUB_INDENT=2] NO PREFERENCE A VS B
  [COMMENT,UNDERLINE] SECOND TEST
  [STUB_INDENT=2] PREFER BRAND C
  [STUB_INDENT=2] PREFER BRAND D
  [STUB_INDENT=2] NO PREFERENCE C VS D
  [COMMENT,UNDERLINE] THIRD TEST
  [STUB_INDENT=2] PREFER BRAND E
  [STUB_INDENT=2] PREFER BRAND F
  [STUB_INDENT=2] NO PREFERENCE E VS F
}
ROW=: [7,8,9^1/2/X]
STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH DIRECT STATISTICAL TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL
TABLE 801
PREFERENCE OF PRODUCTS
BASE= TOTAL SAMPLE

                                 SEX               AGE
                              <=========>  <=================>
                      TOTAL   MALE FEMALE  18-30  31-50  51-70
                      -----   ---- ------  -----  -----  -----
TOTAL                   500    251    249    140    223    101
                      100.0  100.0  100.0  100.0  100.0  100.0
                          %      %      %      %      %      %
FIRST TEST
----------
(a) PREFER BRAND A      236    111    125     88     98     30
                       47.2   44.2   50.2   62.9   43.9   29.7

(b) PREFER BRAND B      214    113    101     43    101     59
                       42.8   45.0   40.6   30.7   45.3   58.4
                                               s             s
NO PREFERENCE A VS B     50     27     23      9     24     12
                       10.0   10.8    9.2    6.4   10.8   11.9
SECOND TEST
-----------
(d) PREFER BRAND C      266    125    141     78    121     48
                       53.2   49.8   56.6   55.7   54.3   47.5

(e) PREFER BRAND D      190    108     82     49     84     43
                       38.0   43.0   32.9   35.0   37.7   42.6
                          s             s      s      s

NO PREFERENCE C VS D     44     18     26     13     18     10
                        8.8    7.2   10.4    9.3    8.1    9.9

THIRD TEST
----------
(g) PREFER BRAND E      248    132    116     87     93     44
                       49.6   52.6   46.6   62.1   41.7   43.6

(h) PREFER BRAND F      187     87    100     40     96     42
                       37.4   34.7   40.2   28.6   43.0   41.6
                          s      s             s

NO PREFERENCE E VS F     65     32     33     13     34     15
                       13.0   12.7   13.3    9.3   15.2   14.9
---------------------------------
(sig=.05) (all_pairs) rows tested a/b, d/e, g/h

Notice the “s” in the FEMALE column underneath the PREFER BRAND D row. This indicates that there is a significant difference between PREFER BRAND C and PREFER BRAND D for females. The blank under the MALE column in that row indicates that there is no significant difference for males. Also notice the additional lower case letter assigned to each row that was tested. This allows easy identification of which rows were tested against each other when compared to the footnote that prints at the bottom of the page.

8.8.2 Distributed Preference Testing

A distributed preference test allows a “No Preference” (or similar neutral third category) to be distributed between the two original categories while ensuring the integrity of the underlying statistical test. This is usually done for cosmetic purposes so that the percentages of the two preference categories add up to 100 percent.

The rules for the STATISTICS statement are the same as for the direct comparison, except the letter P is used instead of a D. To do a distributed preference test on rows 1 and 2, use the following STATISTICS statement:

STATISTICS= ROWSTAT4: P=1,2

In a distributed preference test, the SELECT_VALUE function is used to define the row variable. This ensures that the “No preference” response is evenly divided between the two categories (see 9.3.2 Functions for more information on the SELECT_VALUE function). A typical row definition might look like this:

ROW=: SELECT_VALUE([7^1/X],VALUES(1,.5)) WITH &
SELECT_VALUE([7^2/X],VALUES(1,.5))

This causes the X punch (“No preference”) to have a value of .5 for both categories, splitting it evenly between the two.

As with the direct comparison, significant differences are marked with an “s” underneath the second row being tested. However, unlike the direct comparison test, small (not significant) differences are marked with a lower case “ns” and statistically equal rows are marked with a lower case “e”.

The following example uses a distributed preference test to compare the same rows used in Table 801. Note the difference in the row variable definition.

TABLE_SET= { TAB802:
STATISTICS=: P=1,2 P=3,4 P=5,6
HEADER=:
  TABLE WITH DISTRIBUTED PREFERENCE TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL}
TITLE=: PREFERENCE OF PRODUCTS }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
  [COMMENT,UNDERLINE] FIRST TEST
  [STUB_INDENT=2] PREFER BRAND A
  [STUB_INDENT=2] PREFER BRAND B
  [COMMENT,UNDERLINE] SECOND TEST
  [STUB_INDENT=2] PREFER BRAND C
  [STUB_INDENT=2] PREFER BRAND D
  [COMMENT,UNDERLINE] THIRD TEST
  [STUB_INDENT=2] PREFER BRAND E
  [STUB_INDENT=2] PREFER BRAND F
}
ROW=: SELECT_VALUE([7^1/X],VALUES(1,.5)) WITH &
  SELECT_VALUE([7^2/X],VALUES(1,.5)) WITH &
  SELECT_VALUE([8^1/X],VALUES(1,.5)) WITH &
  SELECT_VALUE([8^2/X],VALUES(1,.5)) WITH &
  SELECT_VALUE([9^1/X],VALUES(1,.5)) WITH &
  SELECT_VALUE([9^2/X],VALUES(1,.5))
STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH DISTRIBUTED PREFERENCE TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL
TABLE 802
PREFERENCE OF PRODUCTS
BASE= TOTAL SAMPLE

                                 SEX               AGE
                              <=========>  <=================>
                      TOTAL   MALE FEMALE  18-30  31-50  51-70
                      -----   ---- ------  -----  -----  -----
TOTAL                   500    251    249    140    223    101
                      100.0  100.0  100.0  100.0  100.0  100.0
                          %      %      %      %      %      %
FIRST TEST
----------
(a) PREFER BRAND A      261    124    136     92    110     36
                       52.2   49.6   54.8   66.1   49.3   35.6

(b) PREFER BRAND B      239    126    112     48    113     65
                       47.8   50.4   45.2   33.9   50.7   64.4
                          e      e     ns      s      e      s
SECOND TEST
-----------
(c) PREFER BRAND C      288    134    154     84    130     53
                       57.6   53.4   61.8   60.4   58.3   52.5

(d) PREFER BRAND D      212    117     95     56     93     48
                       42.4   46.6   38.2   39.6   41.7   47.5
                          s     ns      s      s      s      e
THIRD TEST
----------
(e) PREFER BRAND E      280    148    132     94    110     52
                       56.1   59.0   53.2   66.8   49.3   51.0

(f) PREFER BRAND F      220    103    116     46    113     50
                       43.9   41.0   46.8   33.2   50.7   49.0
                          s      s     ns      s      e      e
---------------------------------
(sig=.05) (all_pairs) rows tested a/b, c/d, e/f

Compare this table with Table 801 and notice how the frequencies and percentages have changed. Each number in this table equals the corresponding number in Table 801 plus half of the number in the matching NO PREFERENCE row. Also notice that “s” appears in the same places, but that cells that were previously blank now contain either “ns” or “e”, depending on the difference between the two cells. Finally, notice that the footnote has the same form as the one in Table 801 and does not name the test, so the only way to tell which test was done is to look for “ns” or “e” markings on the table.
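
The arithmetic behind the distributed rows can be reproduced with a short Python sketch. It mirrors the VALUES(1,.5) weighting described earlier by adding half of the neutral count to each preference row; the function name `distribute_no_preference` is an assumption for this example, and the inputs are the TOTAL column frequencies from Table 801.

```python
def distribute_no_preference(first, second, no_preference):
    """Split the neutral responses evenly between the two preference rows."""
    half = no_preference / 2
    return first + half, second + half


# TOTAL column of Table 801: (first brand, second brand, no preference).
table_801_totals = {
    "A vs B": (236, 214, 50),
    "C vs D": (266, 190, 44),
    "E vs F": (248, 187, 65),
}
base = 500

for test, (first, second, neutral) in table_801_totals.items():
    new_first, new_second = distribute_no_preference(first, second, neutral)
    print(test, new_first, new_second,
          round(100 * new_first / base, 1),
          round(100 * new_second / base, 1))
```

Because each neutral response contributes one half to each row, the two distributed rows always add up to the full base, so their percentages sum to 100.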

8.9 CHI-SQUARE AND ANOVA TESTS

Statistical significance testing is often desirable as a part of cross-tabulation reporting. Such testing is used to determine whether or not a statistically significant relationship exists between two or more tabulated factors. Tests commonly used for this are chi-square and analysis of variance (ANOVA). ANOVA tests for significance between means. So, if significance testing is desired on any question for which means are calculated, ANOVA would be the likely choice. The chi-square test is for significance between parts of the column axis (banner) and the entire row axis (stub). It would be chosen for questions where calculation of means is not applicable. For both ANOVA and chi-square testing, row categories must be mutually exclusive, as must column categories within tested parts of the banner.
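
As a reminder of what a chi-square test measures, the following Python sketch computes the standard Pearson chi-square statistic for a block of mutually exclusive rows and columns. It is a generic textbook calculation, not Mentor's code, and the function name `chi_square` is invented for this example. The counts are the MALE and FEMALE frequencies for the first preference question in Table 801; the sketch assumes no row or column of the block is entirely empty.

```python
def chi_square(observed):
    """Return (statistic, degrees_of_freedom) for a table of counts.

    observed is a list of rows; each row is a list of cell counts.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    degrees_of_freedom = (len(observed) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom


# Rows: prefer A, prefer B, no preference. Columns: MALE, FEMALE.
counts = [[111, 125],
          [113, 101],
          [27, 23]]
statistic, df = chi_square(counts)
print("chi-square:", round(statistic, 2), "df:", df)
```

The statistic grows as the observed counts move away from what independence between the stub and the tested banner region would predict.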

The ANOVA and chi-square tests discussed here are invoked by EDIT= statements rather than by variable/axis definition expressions, and are performed by Mentor at the time of printing rather than at the time of numeric calculation. This approach shortens run times and improves overall efficiency. There are occasions when statistics must be computed at the time of numeric calculation. A discussion of invoking statistical testing with variable/axis definition expressions can be found in Appendix B: TILDE COMMANDS.

The following are some relevant keywords for creating tables with ANOVA and chi-square tests.

EDIT
Used in the ~DEFINE block, this controls numerous printing and percentage options. Each table can have its own EDIT statement, so options can be changed as required by varying question types. Some options that pertain to ANOVA and chi-square testing are:

TABLE_TESTS=<region> is used to specify table regions to be tested. Text labeling for the test is included in the region definition. The EDIT statement must include a separate TABLE_TESTS= command for each region tested.

-TABLE_TESTS causes statistical testing not to be performed. This option should be included in an EDIT statement separate from that which includes TABLE_TESTS=<region> options, and invoked for tables that do not require statistical testing.

COLUMN_STATISTICS_VALUES=VALUES(<values>) assigns response weights to the row categories. These weights are used in the calculation of statistics such as mean and standard deviation, and for ANOVA testing.

MEAN, STD, ANOVA, and CHI_SQUARE cause the corresponding statistical calculations to be performed and printed as part of the table. (STD is the abbreviation for standard deviation.)

-CHI_SQUARE_ANOVA_FORMAT places chi-square and ANOVA statistics in list form after the corresponding table. If this option is not used, these statistics will be printed directly under the table regions tested, provided the regions are wide enough. ANOVA and chi-square statistics for table regions with narrow column widths, such as yes/no questions in the banner, will not print and will cause Mentor to generate error messages.

SHOW_SIGNIFICANCE_ONLY causes only the significance to show under the table regions tested. This will not work with -CHI_SQUARE_ANOVA_FORMAT, so it cannot be done in list form.

MARK_CHI_SQUARE marks cells as significant based on chi-square testing. This is an alternative to Newman_Keuls, ANOVA_SCAN, and ALL_PAIRS testing. For each significant CHI_SQUARE test on the table, a formula is used to determine which cells are the most extreme. For bi-level testing, the process is repeated.

The syntax is:

EDIT={edit1: MARK_CHI_SQUARE=abcde }

See Mentor, Volume II, ~DEFINE EDIT for more details and examples.

LOCAL_EDIT=<name>
Used in the ~EXECUTE block, this invokes a previously defined EDIT statement. Options specified in the EDIT statement named in a LOCAL_EDIT command take precedence over the same options in any other EDIT statements previously invoked. Options that are in a previously invoked EDIT statement, but not in the LOCAL_EDIT command, stay in effect. If ~SET DROP_LOCAL_EDIT is used, a LOCAL_EDIT command is in effect only for the first table following it.

STUB
Text that will label each vertical category on the printed tables is defined using this statement. Control is offered over various options, but one is of particular interest:

[-COLUMN_STATISTICS_VALUES] excludes that category from statistical calculations. It is often used to exclude the “Don’t know” category. More detailed descriptions of the capabilities and syntax of these keywords can be found in Appendix B: TILDE COMMANDS, STUB=.

The example that follows illustrates how to create tables with ANOVA and chi-square tests. Also demonstrated are the following design characteristics:

Separate definition of table regions for testing makes the specs more readable and easier to understand should maintenance be required in the future.
Region definitions ($R) are created to be banner (column) specific, but not stub (row) specific by always typing “1 to LAST” for the row part of the definition. This way, it is necessary to type the column parts of the region definitions for a banner only once since the same banner regions are tested each time a specific banner is used. Testing of row categories is then controlled on a question-by-question basis through use of [-COLUMN_STATISTICS_VALUES].
Separate EDIT statements invoked by LOCAL_EDIT commands control which statistical tests are to be performed on each question.
For banner 1, the -CHI_SQUARE_ANOVA_FORMAT option is used to print ANOVA and chi-square statistics in list form following their corresponding tables.

~INPUT DATACLN
~SET AUTOMATIC_TABLES,DROP_LOCAL_EDIT
~DEFINE
EDIT={STATS_OFF: -TABLE_TESTS }
''Banner 1 definitions
''                                                column         row
'' Stat tests must be labeled                     region      region
'' since they will appear in list form         in banner     in stub
'' ------------------------------------        ---------     -------
BAN1_REG1: [$T="STAT TEST FOR SERVICE TYPE     " $R 2 TO 3   BY 1 TO LAST]
BAN1_REG2: [$T="STAT TEST FOR NEW SERVICE      " $R 4 TO 5   BY 1 TO LAST]
BAN1_REG3: [$T="STAT TEST FOR TAX PREPARATION  " $R 6 TO 7   BY 1 TO LAST]
BAN1_REG4: [$T="STAT TEST FOR FREQUENCY OF USE " $R 8 TO 10  BY 1 TO LAST]
BAN1_REG5: [$T="STAT TEST FOR SALES            " $R 11 TO 14 BY 1 TO LAST]

EDIT={ BAN1_EDIT: -COLUMN_TNA,PERCENT_DECIMALS=0,
  COLUMN_WIDTH=5,STUB_WIDTH=25,
  -CHI_SQUARE_ANOVA_FORMAT, ''puts stat tests in
                            ''list following table
TABLE_TESTS=BAN1_REG1,TABLE_TESTS=BAN1_REG2,
TABLE_TESTS=BAN1_REG3,TABLE_TESTS=BAN1_REG4,
TABLE_TESTS=BAN1_REG5
}
BANNER={BAN1_BANNER:
''          REG1       REG2          REG3         REG4                 REG5
''        <------->  <------->    <------->  <--------------> <--------------------->
|          SERVICE      NEW          TAX         FREQUENCY             SALES
|            TYPE     SERVICE       PREP.         OF USE      =======================
|         =========  ==========   =========   ===============       500-    1-
|                     SW    NW                     ME-        <500   <1    4.9    5+
| TOTAL   NEW   OLD  AREA  AREA   YES    NO   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
| -----   ---   ---  ----  ----   ---    --   ---  ----  ----  ----  ----  ----  ----
}
BAN1_COL: TOTAL WITH &
   [199^1/2] WITH &
   ([217#S] AND [15^1,2]) WITH &
   ([199^1] AND (([217#S] AND &
   [15^N1,2]) OR [217#L])) WITH &
   [78^1/2] WITH &
   [147^1,2,6/3/4,5] WITH &
   [182.8#1-499999/500000-999999/1000000-4999999/5000000-99999999]
''End of banner 1 definitions
''Banner 2 definitions
''               column      row
''               region   region
''             in banner  in stub
''             --------- ---------
BAN2_REG1: [$R 2 TO 4 BY 1 TO LAST]
BAN2_REG2: [$R 5 TO 8 BY 1 TO LAST]
''Since -CHI_SQUARE_ANOVA_FORMAT is not used in this edit
''statement, statistics tests for BAN2 will appear under their
''corresponding table regions.
EDIT={BAN2_EDIT:-COLUMN_TNA,PERCENT_DECIMALS=0,
   COLUMN_WIDTH=5,STUB_WIDTH=25,
   TABLE_TESTS=BAN2_REG1,TABLE_TESTS=BAN2_REG2
}
BANNER={BAN2_BANNER:
''             REG1                 REG2
''       <-------------->  <-------------------->
|           FREQUENCY              SALES
|            OF USE        ======================
|        ================       500-    1-
|               ME-        <500   <1   4.9    5+
| TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
| -----  ----  ----  ----  ----  ----  ----  ----
}
BAN2_COL: TOTAL WITH [147^1,2,6/3/4,5] WITH &
  [182.8#1-499999/500000-999999/1000000-4999999/5000000-99999999]
''End of banner 2 definitions
''Stub definitions
''Question 4
TITLE={Q4_TITLE:
Q4. Please indicate on a scale from 1 to 4 how satisfied you are with  
your overall relationship with this company.
}
STUB={Q4_STUB:
   4 - Very satisfied
   3 - Satisfied
   2 - Dissatisfied
   1 - Very dissatisfied
   [-COLUMN_STATISTICS_VALUES] Don't know/not sure
''excluded from
''statistics
}
Q4_ROW: [163^4//1/Y]
EDIT={Q4_EDIT:
   COLUMN_STATISTICS_VALUES=VALUES(4,3,2,1),MEAN,STD,ANOVA
}

''Question 19
TITLE={Q19_TITLE:
Q19. Does this company prepare taxes?
}
STUB={Q19_STUB:
   Yes
   No
   [-COLUMN_STATISTICS_VALUES] Don't know/not sure
''excluded from
''statistics
}
Q19_ROW: [78^1//3]
EDIT={Q19_EDIT: CHI_SQUARE }
EDIT={Q19SIG_EDIT: CHI_SQUARE,SHOW_SIGNIFICANCE_ONLY
}
>CREATE_DB TABLES
>PRINT_FILE TABLES
~EXECUTE
''BAN1's EDIT statement causes statistics tests to be printed as
''lists after the corresponding tables.
BANNER=ban1_banner,EDIT=ban1_edit,COLUMN=ban1_col
''Question 4
LOCAL_EDIT=q4_edit,TITLE=q4_title,STUB=q4_stub,ROW=q4_row
''Question 19 with statistics test
LOCAL_EDIT=q19_edit,TITLE=q19_title,STUB=q19_stub,ROW=q19_row
''BAN2's EDIT statement allows statistics tests to be printed
''under their corresponding table regions (default).
BANNER=ban2_banner,EDIT=ban2_edit,COLUMN=ban2_col
''Question 19 with statistics test
LOCAL_EDIT=q19_edit,TITLE=q19_title,STUB=Q19_STUB,ROW=Q19_ROW
''Question 19 with statistics test showing significance only
LOCAL_EDIT=q19sig_edit,TITLE=q19_title,STUB=q19_stub,ROW=q19_row
''Question 19 without statistics test
LOCAL_EDIT=stats_off,TITLE=q19_title,STUB=q19_stub,ROW=q19_row
RESET,PRINT_ALL

Here are the tables that are printed:

TABLE 001
Q4. Please indicate on a scale from 1 to 4 how satisfied
you are with your overall relationship with this company.

                          SERVICE    NEW       TAX      FREQUENCY           SALES
                           TYPE    SERVICE     PREP.      OF USE     ===================
                          ======= =========  ========  =============    500-    1-
                                   SW   NW                  ME-     <500  <1   4.9   5+
                  TOTAL  NEW  OLD AREA AREA  YES   NO  LOW DIUM HIGH MIL. BIL. BIL. BIL.
                  -----  ---  --- ---- ----  ---   --  --- ---- ---- ---- ---  ---- ----
Total               151   96   55   12   84   64   79   56   30   41   37   18   49   25
                    100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%

N/A                   -    -    -    -    -    -    -    -    -    -    -    -    -    -

4 - Very satisfied   31   22    9    -   22   12   19    9    7    9   12    3    7    3
                     21%  23%  16%       26%  19%  24%  16%  23%  22%  32%  17%  14%  12%

3 - Satisfied        68   39   29    6   33   34   30   25   16   17   12   12   22   12
                     45%  41%  53%  50%  39%  53%  38%  45%  53%  41%  32%  67%  45%  48%

2 - Dissatisfied     38   28   10    5   23   11   25   16    5   11   12    1   14    6
                     25%  29%  18%  42%  27%  17%  32%  29%  17%  27%  32%   6%  29%  24%

1 - Very dissatisfied 8    4    4    1    3    5    2    4    1    2    -    1    4    2
                      5%   4%   7%   8%   4%   8%   3%   7%   3%   5%        6%   8%   8%

Don't know/not sure   6    3    3    -    3    2    3    2    1    2    1    1    2    2
                      4%   3%   5%        4%   3%   4%   4%   3%   5%   3%   6%   4%   8%

Mean                2.8  2.8   2.8 2.4  2.9  2.9  2.9  2.7  3.0  2.8  3.0  3.0  2.7  2.7
Standard Deviation  0.8  0.8   0.8 0.7  0.8  0.8  0.8  0.8  0.8  0.8  0.8  0.7  0.8  0.8

STAT TEST FOR SERVICE TYPE
anova = 0.02, df1,df2 = (1,143) prob = 0.8694
STAT TEST FOR NEW SERVICE
anova = 3.83, df1,df2 = (1,91) prob = 0.0504
STAT TEST FOR TAX PREPARATION
anova = 0.01, df1,df2 = (1,136) prob = 0.9204
STAT TEST FOR FREQUENCY OF USE
anova = 1.10, df1,df2 = (2,119) prob = 0.3374
STAT TEST FOR SALES
anova = 1.50, df1,df2 = (3,119) prob = 0.2178

TABLE 002
Q19. Does this company prepare taxes?

                          SERVICE    NEW       TAX      FREQUENCY           SALES
                           TYPE    SERVICE     PREP.      OF USE     ===================
                          ======= =========  ========  =============    500-    1-
                                   SW   NW                  ME-     <500  <1   4.9   5+
                  TOTAL  NEW  OLD AREA AREA  YES   NO  LOW DIUM HIGH MIL. BIL. BIL. BIL.
                  -----  ---  --- ---- ----  ---   --  --- ---- ---- ---- ---  ---- ----
Total               151   96   55   12   84   64   79   56   30   41   37   18   49   25
                    100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%

N/A                   -    -    -    -    -    -    -    -    -    -    -    -    -    -

Yes                  64   23   41    4   19   64    -    25   14   22   15   11   18   14
                     42%  24%  75%  33%  23% 100%        45%  47%  54%  41%  61%  37%  56%

No                   79   69   10    8   61    -   79    31   15   15   20    5   27   11
                     52%  72%  18%  67%  73%      100%   55%  50%  37%  54%  28%  55%  44%

Don't know/not sure   8    4    4    -    4    -    -     -    1    4    2    2    4    -
                      5%   4%   7%        5%                   3%  10%   5%  11%   8%

STAT TEST FOR SERVICE TYPE
chi_square = 38.51, d_f = 1, prob = 0.0000
STAT TEST FOR NEW SERVICE
chi_square = 0.13, d_f = 1, prob E<5
STAT TEST FOR TAX PREPARATION
chi_square = 138.98, d_f = 1, prob = 0.0000
STAT TEST FOR FREQUENCY OF USE
chi_square = 2.00, d_f = 2, prob = 0.3694
STAT TEST FOR SALES
chi_square = 4.93, d_f = 3, prob = 0.1764

TABLE 003
Q19. Does this company prepare taxes?

                           FREQUENCY              SALES
                              OF USE      ======================
                         ================      500-    1-
                               ME-        <500   <1   4.9    5+
                 TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
                 -----   ---  ----  ----  ----  ----  ----  ----

Total              151    56    30    41    37    18    49    25
                   100%  100%  100%  100%  100%  100%  100%  100%

N/A                  -     -     -     -     -     -     -     -

Yes                 64    25    14    22    15    11    18    14
                    42%   45%   47%   54%   41%   61%   37%   56%

No                  79    31    15    15    20     5    27    11
                    52%   55%   50%   37%   54%   28%   55%   44%

Don't know/not sure  8     -     1     4     2     2     4     -
                     5%         3%    10%    5%   11%    8%

CHI-SQUARE:             <--   2.00   -->   <--     4.93      -->
D.F.:                          2                    3
SIG:                         0.3694               0.1764

TABLE 004
Q19. Does this company prepare taxes?

                            FREQUENCY              SALES
                              OF USE      ======================
                         ================      500-    1-
                               ME-        <500   <1   4.9    5+
                 TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
                 -----   ---  ----  ----  ----  ----  ----  ----
Total              151    56    30    41    37    18    49    25
                   100%  100%  100%  100%  100%  100%  100%  100%

N/A                  -     -     -     -     -     -     -     -

Yes                 64    25    14    22    15    11    18    14
                    42%   45%   47%   54%   41%   61%   37%   56%

No                  79    31    15    15    20     5    27    11
                    52%   55%   50%   37%   54%   28%   55%   44%

Don't know/not sure  8     -     1     4     2     2     4     -
                     5%          3%   10%    5%   11%    8%

CHI-SQUARE (SIG):       <--  0.3694  -->    <--    0.1764    -->

TABLE 005
Q19. Does this company prepare taxes?


                            FREQUENCY              SALES
                              OF USE      ======================
                         ================      500-    1-
                               ME-        <500   <1   4.9    5+
                 TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
                 -----   ---  ----  ----  ----  ----  ----  ----
Total              151    56    30    41    37    18    49    25
                   100%  100%  100%  100%  100%  100%  100%  100%

N/A                  -     -     -     -     -     -     -     -

Yes                 64    25    14    22    15    11    18    14
                    42%   45%   47%   54%   41%   61%   37%   56%

No                  79    31    15    15    20     5    27    11
                    52%   55%   50%   37%   54%   28%   55%   44%

Don't know/not sure  8     -     1     4     2     2     4     -
                     5%          3%   10%   5%    11%    8%

DISCUSSION OF OUTPUT

While interpretation of these statistical tests for specific reports is beyond the scope of this manual, some explanation of the results might be helpful.

For table 001, output from the ANOVA was as follows:

STAT TEST FOR SERVICE TYPE
anova = 0.02, df1,df2 = (1,143) prob = 0.8694
STAT TEST FOR NEW SERVICE
anova = 3.83, df1,df2 = (1,91) prob = 0.0504
STAT TEST FOR TAX PREPARATION
anova = 0.01, df1,df2 = (1,136) prob = 0.9204
STAT TEST FOR FREQUENCY OF USE
anova = 1.10, df1,df2 = (2,119) prob = 0.3374
STAT TEST FOR SALES
anova = 1.50, df1,df2 = (3,119) prob = 0.2178

The number following “anova =” is the value of the F statistic for the region tested. “df1,df2” are the degrees of freedom for the F statistic’s numerator and denominator, respectively. “prob =” is the probability that a difference this large would arise by chance alone if the factors tested were unrelated, so a small value indicates a more than coincidental relationship.

To check whether the desired row and banner categories are in fact being used in the ANOVA calculation, the following equations can be used:

df1 = (the number of banner points included in the ANOVA) – 1

df2 = (the sum of frequencies of all cells included in the ANOVA) – (the number of banner points included in the ANOVA)

If a column is blank, it is not considered as being included in the test.

Checking the “STAT TEST FOR SALES” ANOVA above, there are four banner points included in the test, resulting in

df1 = 4 – 1 = 3

Adding together all cells included in this test (“DON’T KNOW/NOT SURE” is excluded) gives “<500 MIL.”=36, “500-<1 BIL.”=17, “1-4.9 BIL.”=47, and “5+ BIL.”=23.

The number of banner points is four, resulting in

df2 = (36 + 17 + 47 + 23) – 4 = 119.
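
The same check can be scripted outside Mentor. The Python sketch below is not part of Mentor; it re-derives df1, df2, and the F statistic for the “STAT TEST FOR SALES” region from the frequencies printed in table 001. The names counts and one_way_anova are illustrative, and the F computation is the textbook one-way ANOVA, which is assumed (not confirmed here) to match what Mentor does internally.

```python
# Frequencies from table 001 for the four SALES banner points.
# Keys are the scale values named in COLUMN_STATISTICS_VALUES=VALUES(4,3,2,1);
# the "Don't know/not sure" row is excluded from the statistics.
counts = {
    "<500 MIL.":   {4: 12, 3: 12, 2: 12, 1: 0},
    "500-<1 BIL.": {4: 3,  3: 12, 2: 1,  1: 1},
    "1-4.9 BIL.":  {4: 7,  3: 22, 2: 14, 1: 4},
    "5+ BIL.":     {4: 3,  3: 12, 2: 6,  1: 2},
}

def one_way_anova(groups):
    """Return (df1, df2, F) for a dict of {column: {value: frequency}}."""
    # A blank column is not considered part of the test.
    groups = {k: g for k, g in groups.items() if sum(g.values()) > 0}
    n = {k: sum(g.values()) for k, g in groups.items()}
    total = {k: sum(v * f for v, f in g.items()) for k, g in groups.items()}
    sumsq = sum(v * v * f for g in groups.values() for v, f in g.items())

    n_all = sum(n.values())
    between_raw = sum(total[k] ** 2 / n[k] for k in groups)
    ss_between = between_raw - sum(total.values()) ** 2 / n_all
    ss_within = sumsq - between_raw

    df1 = len(groups) - 1          # banner points - 1
    df2 = n_all - len(groups)      # sum of frequencies - banner points
    f_stat = (ss_between / df1) / (ss_within / df2)
    return df1, df2, f_stat

df1, df2, f_stat = one_way_anova(counts)
print(f"anova = {f_stat:.2f}, df1,df2 = ({df1},{df2})")
```

With the table 001 frequencies this prints anova = 1.50, df1,df2 = (3,119), which agrees with the list output for the SALES region.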

For table 002, the chi-square test produced the following output:

STAT TEST FOR SERVICE TYPE
chi_square = 38.51, d_f = 1, prob = 0.0000
STAT TEST FOR NEW SERVICE
chi_square = 0.13, d_f = 1, prob E<5
STAT TEST FOR TAX PREPARATION
chi_square = 138.98, d_f = 1, prob = 0.0000
STAT TEST FOR FREQUENCY OF USE
chi_square = 2.00, d_f = 2, prob = 0.3694
STAT TEST FOR SALES
chi_square = 4.93, d_f = 3, prob = 0.1764

Similar to the ANOVA output, the number following “chi_square=” is the value of the chi-square statistic for the region tested. “d_f” represents the degrees of freedom for this statistic. The same facts apply to “prob=” as did for ANOVA.

“E<5” means that the expected value of the frequency for 25% or more of the cells in the tested region is less than 5, possibly making probability calculations for the region invalid.
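
To illustrate the E<5 rule, the Python sketch below computes expected frequencies for the “STAT TEST FOR NEW SERVICE” region of table 002 using the usual row total times column total divided by grand total. That formula is an assumption about how the expected values are formed, not a description of Mentor's code, and expected_counts is a hypothetical helper.

```python
# Observed frequencies from table 002, NEW SERVICE region.
# Rows: Yes, No ("Don't know/not sure" is excluded); columns: SW AREA, NW AREA.
observed = [
    [4, 19],   # Yes
    [8, 61],   # No
]

def expected_counts(table):
    """Expected cell frequencies under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
cells = [e for row in expected for e in row]
share_below_5 = sum(e < 5 for e in cells) / len(cells)

# The rule described above: flag the region when 25% or more of the
# cells have an expected frequency below 5.
flag_e_lt_5 = share_below_5 >= 0.25
print(expected, share_below_5, flag_e_lt_5)
```

Here one of the four cells (Yes / SW AREA) has an expected frequency of 3, which is 25% of the cells, consistent with the E<5 printed for this region.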

Checking that the desired categories are being tested is easier for the chi-square test than for the ANOVA. For degrees of freedom, the following equation applies:

d_f = (number of stub points included in the test – 1) * (number of banner points included in the test – 1).

If a row or column is blank, it is not considered as being included in the test.

Using “STAT TEST FOR SALES” as an example, “number of stub points included in the test”=2 since “DON’T KNOW/NOT SURE” is excluded, and “number of banner points included in the test”=4. Putting these numbers in the equation:

d_f = (2 – 1) * (4 – 1) = 1 * 3 = 3.
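
The degrees-of-freedom check can be combined with a recomputation of the statistic itself. The Python sketch below uses the uncorrected Pearson chi-square on the “STAT TEST FOR SALES” frequencies from table 002. The function name chi_square is illustrative, and it is not confirmed here whether Mentor applies any correction in other cases (for example, regions with d_f = 1), so treat this as a cross-check for this region only.

```python
# Observed frequencies from table 002, SALES region.
# Rows: Yes, No; columns: <500 MIL., 500-<1 BIL., 1-4.9 BIL., 5+ BIL.
observed = [
    [15, 11, 18, 14],  # Yes
    [20,  5, 27, 11],  # No
]

def chi_square(table):
    """Return (chi_square, d_f) for a table of observed frequencies."""
    # Blank rows and columns are not considered part of the test.
    rows = [r for r in table if sum(r) > 0]
    keep = [j for j, col in enumerate(zip(*rows)) if sum(col) > 0]
    rows = [[r[j] for j in keep] for r in rows]

    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)

    stat = 0.0
    for i, r in enumerate(rows):
        for j, obs in enumerate(r):
            exp = row_totals[i] * col_totals[j] / grand
            stat += (obs - exp) ** 2 / exp
    # d_f = (stub points - 1) * (banner points - 1)
    d_f = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, d_f

stat, d_f = chi_square(observed)
print(f"chi_square = {stat:.2f}, d_f = {d_f}")
```

For the SALES region this prints chi_square = 4.93, d_f = 3, matching the list output above.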

OTHER ANOVA AND CHI-SQUARE OPTIONS

It is possible to mix ANOVA and chi-square tests on the same table. This is true only when the -CHI_SQUARE_ANOVA_FORMAT option is used to output the results in list format. Here is an example of an EDIT statement that would do this. The regions have been defined as in the above example.

EDIT={Q47_EDIT2:
      CHI_SQUARE=BAN1_REG1
      CHI_SQUARE=BAN1_REG2
      CHI_SQUARE=BAN1_REG3
      CHI_SQUARE=BAN1_REG4
      ANOVA=BAN1_REG5
}

NOTE: An alternative way of defining regions and invoking statistical tests is as follows:

EDIT={EDIT1:
     TABLE_TESTS=[$R 1 TO 2 BY 1 TO LAST]
     TABLE_TESTS=[$R 3 TO 4 BY 1 TO LAST]
     TABLE_TESTS=[$R 5 TO 6 BY 1 TO LAST]
     ANOVA
}

If this method is used, Mentor will not allow both ANOVA and chi-square on the same table.

Two other interesting possibilities are testing overlapping ranges and testing columns that are not next to each other. Again, -CHI_SQUARE_ANOVA_FORMAT must be used. Also, doing this might require some creative labeling of the horizontal axis.

Here are some example specs:

BAN3_REG1: [$T="GENDER" $R 2 TO 3 BY 1 TO LAST]
BAN3_REG2: [$T="DRIVE CAR" $R 4 TO 5 BY 1 TO LAST]
BAN3_REG3: [$T="1-10, 11-49, & 50+ YRS OF DRIVING" $R 6,7,9 BY 1 TO LAST]
BAN3_REG4: [$T="<50 & 50+ YRS OF DRIVING" $R 8,9 BY 1 TO LAST]

''BAN3_REG3 defines columns which are not next to each other.
''BAN3_REG4 defines a region that overlaps BAN3_REG3.

EDIT={BAN3_EDIT: -COLUMN_TNA,PERCENT_DECIMALS=0
      COLUMN_WIDTH=7,STUB_WIDTH=25
      -CHI_SQUARE_ANOVA_FORMAT
      TABLE_TESTS=BAN3_REG1
      TABLE_TESTS=BAN3_REG2
      TABLE_TESTS=BAN3_REG3
      TABLE_TESTS=BAN3_REG4
}
BANNER={BAN3_BANNER:
|            GENDER        DRIVE CAR        YEARS OF DRIVING
|          ===========    ==========   =========================
|  TOTAL   MALE FEMALE    YES     NO   1-10  11-49    <50    50+
|  -----   ---- ------    ---     --   ----  -----    ---    ---
}
BAN3_COL: TOTAL WITH &
   [5^1/2] WITH &
   [10^1/2] WITH &
   [30.2*P#1-10/11-49/1-49/50-99]
EDIT={Q49_EDIT:      ''Used as a LOCAL_EDIT to invoke tests for a
                     ''specific question.
   CHI_SQUARE=BAN3_REG1
   CHI_SQUARE=BAN3_REG2
   CHI_SQUARE=BAN3_REG3
   CHI_SQUARE=BAN3_REG4
}

8.10 NOTES ON SIGNIFICANCE TESTING

This section contains some additional notes on significance testing in order to help you understand some of the problems or errors that might occur.

When using other sources to verify significance testing accuracy, be aware that the majority of other programs or textbooks do not take into consideration the consequences of dependent or inclusive tests, the Newman-Keuls procedure, and pooled variances. Mentor uses the additional information available to produce more reliable results in general.

8.10.1 What Can and Cannot Be Tested

Significance testing can only be performed on means or percentages produced from simple frequencies. As a general rule, a “simple” frequency counts respondents, not responses, nor does it include any mathematical calculations. Basically each data case must return either a 1 (it is there) or a 0 (it is not there) for the cells being tested.

The only exceptions to this are means and weighted tables. You can test cells with different weighting schemes, but only if a given data case has the same weight everywhere in that test.

The following constructions will not produce simple frequencies and therefore cannot be tested for significance:

*L modifier
all summary statistics other than means
any arithmetic operation
sigma
sums
NUMBER_OF_ITEMS or any other number returning function

These rules apply to both ROW and COLUMN definitions. In addition, if you have any kind of summary statistic such as a $[MEAN] in the COLUMN variable, you cannot test any part of the table. If you have any of the above constructions in the ROW other than summary statistics, you will not be able to test any part of the table, unless you turn off the testing around that item using $[-DO_STATISTICS] (See Section 8.6 EXCLUDING ROWS/COLUMNS FROM THE SIGNIFICANCE TESTING).

Also, the AXIS commands $[BREAK] and $[BASE] are acceptable in ROW definitions but not in COLUMN definitions. $[OVERLAY] and $[NETOVERLAY] tables can be tested. If you wish to do a test on responses, you must use either an $[OVERLAY] or a READPROCEDURE to read each data case multiple times.

Tables that are created or modified by table manipulation cannot be tested for significance. Market share tables which use sums cannot be tested. If you are trying to test items based on market share we recommend that you test the mean of the items. Testing a straight market share is very dangerous as a single outlier can easily skew the results.

Any construction can be tested using print phase tests, but remember that Mentor does not check or complain if your testing is illogical. For example, you should never use Mentor to test two numbers that were created as a calculation.

8.10.2 Degrees of Freedom

Degrees of freedom (df) is a measure of sample size for use in statistical tests. The higher the degrees of freedom, the more reliable the resulting t values are. The degrees of freedom should be calculated as follows:

Let’s assume a sample of size n (either simple count or effective_n) with n1 in group 1, n2 in group 2, and n_both in the overlap.

The degrees of freedom for the All Possible Pairs Test procedure, regardless of the variance specified, or for the Newman-Keuls procedure when only testing two groups (all simple t-tests), is calculated as follows:

no overlap, means              df = n1 + n2 – 2
no overlap, percents           df = n1 + n2 – 1
overlap, means or percents     df = n1 + n2 – n_both – 1

When you use the T= (for inclusive tests) option to take out the overlap, the program utilizes the no overlap formula which results in n – 2 for means and n – 1 for percents. If one group is completely contained in the other it always results in n – 1.

The degrees of freedom for the Newman-Keuls procedure when testing more than two groups would be:

no overlap     df = n1 + n2 – 2
overlap        df = n1 + n2 – n_both – 1
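
The simple t-test formulas (All Possible Pairs, or Newman-Keuls with only two groups) can be written as a small helper for spot checks. The Python sketch below is illustrative only: t_test_df is a hypothetical name, and the sample sizes in the calls are made up for the example.

```python
def t_test_df(n1, n2, n_both=0, kind="means"):
    """Degrees of freedom for a simple t-test between two groups.

    n1, n2  -- base (simple count or effective_n) of each group
    n_both  -- number of cases that fall in both groups (the overlap)
    kind    -- "means" or "percents"
    """
    if kind not in ("means", "percents"):
        raise ValueError("kind must be 'means' or 'percents'")
    if n_both > 0:
        # Overlap: the same formula applies to means and percents.
        return n1 + n2 - n_both - 1
    # No overlap: means lose one more degree of freedom than percents.
    return n1 + n2 - (2 if kind == "means" else 1)

print(t_test_df(12, 16, kind="means"))      # no overlap, means
print(t_test_df(12, 16, kind="percents"))   # no overlap, percents
print(t_test_df(40, 30, n_both=10))         # overlap of 10 cases
```

The three calls print 26, 27, and 59 respectively.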

8.10.3 Verifying Statistical Tests

It may be desirable to verify the results on tables that utilize Mentor’s statistical tests. We could use the formulas shown in Appendix A: STATISTICAL FORMULAS, but more often a cursory check that the correct columns and rows were tested is all that is needed.

The ~SET keyword STATISTICS_DUMP does this by sending a printout to the list file showing key elements of the statistical procedure.

For example, if the test performed was an independent test of the means which included a test of column A against column B, then a portion of the list file pertaining to this statistical test will look similar to this:

test 1 (216 len, 2 groups, err=0, base_row=0):I=AB row/col=(15,15; 1,2) means
ncases=28
group 1 12, 76, 564, 0,
group 2 16, 106, 778, 0,
effn,mean,std: 12 6.33333 2.74138 -- sumsq,sumsqadj,effn: 82.6667 1 12
effn,mean,std: 16 6.625 2.24722 -- sumsq,sumsqadj,effn: 75.75 1 16
tags: 1, 2,
getpoolv: 6.09295,2:26=158.417/26
doqs (6.09295,26) 0-1 tags: 1, 2,
-->0: (-0.437582,26)
SIGFAREA(26,-0.437582->0.05 for 1) returns 0.7572 from -0.309417 -> 0,0.7572
differences[ in AB: ]

This information can be used in numerous ways to check the statistical test performed. The first line tells us this is a test of two groups, the base row for statistics is the System Total row (base_row=0), it is an independent test of columns A and B (:I=AB), it is using row 15 and columns 1 and 2, and it is a test of means.

The number of cases in the statistics base is 28 (or 16 + 12). The line with “-->0: (-0.437582,26)” shows 0:(q-value, df). In the next line down, “returns 0.7572 from -0.309417 -> 0,0.7572” means “returns <significance> from <t-value>”.

A printout like the one shown above would be generated for each pair of columns tested.
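
The figures in the means dump can be reproduced by hand. The Python sketch below assumes that the first three numbers on each “group” line are the base, the sum of the values, and the sum of the squared values. That reading is inferred from the numbers themselves rather than stated in this manual. Under that assumption, a standard pooled-variance t-test gives the same pooled variance and t-value as the dump, and the q-value equals the t-value times the square root of 2.

```python
import math

# "group" lines from the STATISTICS_DUMP example: (n, sum, sum of squares).
# The meaning of these fields is an assumption inferred from the dump.
group1 = (12, 76, 564)
group2 = (16, 106, 778)

def summarize(n, total, sumsq):
    """Return (mean, sum of squared deviations, standard deviation)."""
    mean = total / n
    ss = sumsq - total ** 2 / n
    return mean, ss, math.sqrt(ss / (n - 1))

mean1, ss1, std1 = summarize(*group1)
mean2, ss2, std2 = summarize(*group2)

df = group1[0] + group2[0] - 2                 # no overlap, means
pooled_var = (ss1 + ss2) / df                  # compare with "getpoolv"
std_err = math.sqrt(pooled_var * (1 / group1[0] + 1 / group2[0]))
t_value = (mean1 - mean2) / std_err            # compare with "from -0.309417"
q_value = t_value * math.sqrt(2)               # compare with "-->0: (-0.437582,26)"

print(f"mean,std: {mean1:.5f} {std1:.5f} / {mean2:.5f} {std2:.5f}")
print(f"pooled variance = {pooled_var:.5f}, df = {df}")
print(f"t = {t_value:.6f}, q = {q_value:.6f}")
```

If the recomputed values differ from the dump, the most likely cause is that a different row or pair of columns was tested than intended.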

If our test was a dependent test of columns A, B, C, D, E, and F for all percent rows and the mean row, then a portion of the list file pertaining to this statistical test would look similar to this:

test 1 (1232 len, 6 groups, err=0, base_row=0):ABCDEF
row/col=(3,3; 1,6) percents ncases=166
group 1 12, 4, 4, 0,
group 2 16, 6, 6, 0,
group 3 14, 2, 2, 0,
group 4 27, 13, 13, 0,
group 5 69, 9, 9, 0,
group 6 28, 9, 9, 0,
sxy matrix all zero
effn,mean,std: 12 0.333333 0.492366 --
sumsq,sumsqadj,effn: 2.66667 1 12
effn,mean,std: 16 0.375 0.5 -- sumsq,sumsqadj,effn: 3.75 1 16
effn,mean,std: 14 0.142857 0.363137 --
sumsq,sumsqadj,effn: 1.71429 1 14
effn,mean,std: 27 0.481481 0.509175 --
sumsq,sumsqadj,effn: 6.74074 1 27
effn,mean,std: 69 0.130435 0.339248 --
sumsq,sumsqadj,effn: 7.82609 1 69
effn,mean,std: 28 0.321429 0.475595 --
sumsq,sumsqadj,effn: 6.10714 1 28
tags: 1, 2, 3, 4, 5, 6, multiple comparisons
getpoolv: 0.1931,2:165=43/166
doq (0.1931,165) 0-5 tags: 1, 2, 3, 4, 5, 6,
-->0: (-0.351143,26) (1.55823,24) (-1.37423,37)(2.08774,79) (0.111041,38)
-->1: (2.04147,28) (-1.08619,41) (2.83657,83) (0.550136,42)
-->2: (-3.309,39) (0.136389,81) (-1.75572,40)
-->3: (4.97691,94) (1.90971,53)
-->4: (-2.74322,95)
doqs (0.1931,165) 0-5 tags: 1, 2, 3, 4, 5, 6,
-->0: (-0.316228,27) (1.59364,25) (-1.2021,38) (2.48386,80) (0.102869,39)
-->1: (1.99451,29) (-0.94989,42) (3.25042,84) (0.50417,43)
-->2: (-2.98179,40) (0.175692,82) (-1.73374,41)
-->3: (5.17632,95) (1.69734,54)
-->4: (-3.08478,96)
differences[ in ABCDEF: D vs E; ]
differences[ D vs E; ]

The first line tells us this is a test of six groups (ABCDEF), the base row is the System Total row (base_row=0), it is a dependent test of columns A, B, C, D, E, and F. The test is using row 3 and columns 1 to 6, and it is a test of percents. The line beginning with “doqs (0.1931,165)” shows:

(q-value,df) for
-->0: (A vs B) (A vs C) (A vs D) (A vs E) (A vs F)
-->1: (B vs C) (B vs D) (B vs E) (B vs F)
-->2: (C vs D) (C vs E) (C vs F)
-->3: (D vs E) (D vs F)
-->4: (E vs F)

A printout like this would be available for each row tested.
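
The pair ordering shown above is the order in which pairs are generated when each column is compared with every column to its right. The Python sketch below lists the pairs in that order and compares the df values printed on the “doqs” lines with the no overlap, percents formula from Section 8.10.2 (df = n1 + n2 – 1). It assumes that the first number on each “group” line is that column's base and that no pair in this dump overlaps. Both are inferences from the printed numbers.

```python
from itertools import combinations

# Bases from the "group" lines of the six-column dump
# (assumed to be the first number on each line).
bases = {"A": 12, "B": 16, "C": 14, "D": 27, "E": 69, "F": 28}

# df values as printed on the "doqs" lines, row by row (-->0 through -->4).
doqs_df = [
    [27, 25, 38, 80, 39],
    [29, 42, 84, 43],
    [40, 82, 41],
    [95, 54],
    [96],
]

# combinations() yields pairs in the same order as the dump:
# A-B, A-C, ..., A-F, then B-C, ..., and finally E-F.
pairs = list(combinations(bases, 2))
printed = [df for row in doqs_df for df in row]

for (left, right), df in zip(pairs, printed):
    # No overlap, percents: df = n1 + n2 - 1
    expected = bases[left] + bases[right] - 1
    status = "ok" if expected == df else "MISMATCH"
    print(f"{left} vs {right}: df={df} expected={expected} {status}")
```

All 15 pairs report ok for this dump, including D vs E (df=95), the pair listed on the “differences” line.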

8.10.4 Error and Warning Messages

The following is a list of error and warning messages that can occur while doing statistical testing. Each message has a brief description of why it might occur and how to fix it.

(ERROR #603) printable_t collision with code C

This error is caused when the STATISTICS= PRINTABLE_T option is used, but the column list is not printable. Column “C” is probably used multiple times. Either do not print t values or check your column list.

(ERROR #5055) table T002, test 3 (=GHIJ) is an i= test but has cells 4 and 2

Test of columns GHIJ was marked as independent, but is dependent. Data is either dirty or you are using the wrong test.

(ERROR #5056) table T001, test 4 (=ABC) (at 1): duplicated value or missing base

Test of columns ABC has a row with items that are not in the base row, which makes the test illegal, so it cannot be run. If you do not want to actually test that row, you can change the error to a warning or silence it using the SET option MISSING_OR_DUPLICATED_BASE=ERROR/WARNING/OKAY.

(ERROR #5091) Newman_Keuls test not ok with approximate significance 0.75

This means you cannot use the DO_STATISTICS APPROXIMATELY option in conjunction with the Newman-Keuls test.

(ERROR #5520) tables with COL/ROW_WEIGHT= cannot have STATS= without SET MULTIWGT

COLUMN_SHORT_WEIGHT or ROW_SHORT_WEIGHT has been used in conjunction with a STATISTICS statement. Use the SET option MULTIPLE_WEIGHT_STATISTICS to override.

(ERROR #5524) stat test 3 (=GHIJ) had an error during construction

Test of columns GHIJ was marked as inclusive, but is not. Data is either dirty or you are using the wrong test.

(ERROR #6736) table T011, test 1 (at 2) has conflicting weight values 1 vs 0.88

A data case cannot have multiple weights in a single test, even if SET MULTIPLE_WEIGHT_STATISTICS is used. Table 011 has a data case with weights of both 1.00 and 0.88.

(ERROR #6831) col A has stats percent 102/372 different from vertical percent 102/295

There is a cell in column A in which the statistical base does not match the vertical percentage base. In particular the cell has a frequency of 102, the statistical base a value of 372, and the percentage base a value of 295.

(WARN #1141) to install this option, we have to clear out the existing tables first at (18):

SET option STATISTICS_BASE_AR or -STATISTICS_BASE_AR has been used in the middle of a run. No override possible.

(WARN #5157) we will use all_pairs tests as the default statistical test

No test was specified on the EDIT statement so the default All Possible Pairs Test will be used. Suppress this warning by specifying ALL_POSSIBLE_PAIRS_TEST on the main EDIT.

(WARN #5385) table T006 with STATS= TAB6_st but no stat tests to do or report

This warning can appear for multiple reasons. One is that there is a non-simple frequency in the table that was not tested.

(WARN #5621) MULTIPLE_WEIGHT override for stats testing: be careful!

SET option MULTIPLE_WEIGHT_STATISTICS has been used. No override possible.

8.10.5 Commands Summary

The following is a list of all statements/keywords/options that affect statistical testing. Information about these keywords can be found in either Appendix B: TILDE COMMANDS or in the section reference mentioned on the right.

STATEMENT/KEYWORD/OPTION              SECTION
AXIS
----
$[BASE]                               8.2.2 and 8.4
$[EFFECTIVE_N]                        8.4
$[DO_STATISTICS]                      8.6.2
$[MEDIAN]                             7.2.2
$[INTERPOLATED_MEDIAN]                7.2.2
[col.wid^code/(STATISTICS)/code/code] 7.2

EDIT=
-----
ALL_POSSIBLE_PAIRS_TEST               8.3.1
ANOVA_SCAN                            8.3.3
DO_PRINTER_STATISTICS                 8.5.1
DO_STATISTICS                         8.1.3, 8.1.5, 8.1.6, and 8.1.7
DO_STATISTICS_TEST                    8.5.1
FISHER                                8.3.3
FLAG_MINIMUM_BASE                     8.6.3
MARK_CHI_SQUARE                       8.9
MINIMUM_BASE                          8.6.3
NEWMAN_KEULS_TEST                     8.3.2
PAIRED_VARIANCE                       8.3.4
POOLED_VARIANCE                       8.3.4
SEPARATE_VARIANCE                     8.3.4
USUAL_VARIANCE                        8.3.4

STATISTICS=
-----------
ABCD                                  8.1.1
I=ABCD                                8.1.1
T=ABCD                                8.1.1 and 8.1.8
PRINTABLE_T                           8.1.1 and 8.7.1
D=1,2                                 8.8.1
P=1,2                                 8.8.2
RM=ABCD                               8.3.3

STUB=
-----
BASE_ROW                              8.2.1
DO_SIG_T=                             8.7.1
DO_STATISTICS=                        8.5.3
DO_T_TEST=                            8.7.1

SET OPTIONS
-----------
MEAN_STATISTICS_ONLY                  8.6.1
MISSING_OR_DUPLICATED_BASE
MULTIPLE_WEIGHT_STATISTICS            8.4.1
STATISTICS_BASE_AR                    8.2.1
STATISTICS_DUMP                       8.10.3