8.4 SIGNIFICANCE TESTING ON WEIGHTED TABLES

When doing significance testing with weighted data, it is recommended that you create the effective base row, even when the percentage base is the System Weighted Total or the System Weighted Any Response row. The effective base row is needed to verify any tests on weighted data.

The effective base is an estimation of how the weighting is affecting the test. It is the actual base number that is used when determining whether two samples are significantly different. The effective base will never be higher than the original unweighted base and will usually be slightly less. As the variance in the weights increases, the effective base decreases in order to compensate for the likely change in the percentages that will occur. Without this correction, some weighting factor could always be applied, which would make any item significantly greater than any other. Since the effective base is so integral to the test, it is recommended that it be printed on the table so that it can be determined how the weighting might be affecting the significance testing.

There are two different ways to create the effective base row. You can use either the $[BASE] or $[EFFECTIVE_N] keywords. The $[BASE] keyword creates two different rows, both of which are needed for the significance tests: the weighted total (which is needed to properly calculate all the percentages) and the effective base (which is used as the base for the significance test). The $[EFFECTIVE_N] keyword can be used to create the effective base when the percentage base is either the System Total or Any Response rows. In this case you do not need to again specify the percentage base because the system has already calculated it.

In the example below, the $[BASE] keyword is used to create the effective base. A new STUB_PREFACE is defined because the SET UNWEIGHTED_TOP option is used to also produce an unweighted total row.
NOTE: The following set of commands defines a standard front end for the next set of examples >PURGE_SAME >PRINT_FILE STAT4 ~INPUT DATA ~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T401 ~DEFINE TABLE_SET= { BAN1: EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STATISTICS_DECIMALS=2, -PERCENT_SIGN,DO_STATISTICS=.95,RUNNING_LINES=1 } BANNER=: | GENDER AGE ADVERTISING AWARENESS | <=========> <=================> <=========================> | TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D | —– —- —— —– —– —– —— —— —— —— } COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4] } Example: STUB= STUB_TOP_UNWGT: [-VERTICAL_PERCENT] UNWEIGHTED TOTAL [SUPPRESS] UNWEIGHTED NO ANSWER [SUPPRESS] WEIGHTED TOTAL [SUPPRESS] WEIGHTED NO ANSWER } TABLE_SET= { TAB401: WEIGHT=: SELECT_VALUE([6^1//3/X],VALUES(1.021,.880,1.130,1)) SET UNWEIGHTED_TOP STATISTICS=: I=BC,I=DEF,GHIJ; STUB_PREFACE= STUB_TOP_UNWGT HEADER=: WEIGHTED TABLE WITH STATISTICAL TESTING (USING BASE ROW) } TITLE=: RATING OF SERVICE } TITLE_4=: BASE= TOTAL SAMPLE } STUB=: [BASE_ROW,VERTICAL_PERCENT=*] WEIGHTED TOTAL (% BASE) [-VERTICAL_PERCENT] EFFECTIVE BASE (STAT BASE) NET GOOD | VERY GOOD | GOOD FAIR NET POOR | POOR | VERY POOR DON’T KNOW/REFUSED [STAT,LINE=0] MEAN [STAT,LINE=0] STD DEVIATION [STAT,LINE=0] STD ERROR } ROW=: $[BASE] TOTAL $[] [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11] STORE_TABLES=* } Here is the table that is printed: WEIGHTED TABLE WITH STATISTICAL TESTING (USING BASE ROW) TABLE 401 RATING OF SERVICE BASE= TOTAL SAMPLE GENDER AGE ADVERTISING AWARENESS <=========> <=================> <=========================> TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D —– —- —— —– —– —– —— —— —— —— UNWEIGHTED TOTAL 400 196 204 125 145 113 91 108 107 176 WEIGHTED TOTAL (% BASE) 400 194 206 128 128 128 91 108 106 176 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 % % % % % % % % % % (B) (C) (D) (E) (F) (G) (H) (I) (J) EFFECTIVE BASE (STAT BASE) 396 194 202 125 145 113 90 
107 106 174 NET GOOD 200 120C 80 58 55 84DE 43 55 54 106G 50.1 62.0 38.9 45.6 43.4 65.5 47.5 51.3 51.0 60.6 VERY GOOD 102 69C 33 36 33 31 19 32 30 60G 25.4 35.7 15.8 28.0 26.2 23.9 21.1 29.5 27.8 33.9 GOOD 99 51 48 22 22 53DE 24 24 25 47 24.6 26.3 23.1 17.6 17.2 41.6 26.3 21.8 23.2 26.7 FAIR 89 42 47 29F 39F 16 19 20 23 35 22.3 21.8 22.8 22.4 30.3 12.4 20.7 18.3 21.7 19.9 NET POOR 83 17 65B 30 26 23 19 27J 22 24 20.7 8.9 31.7 23.2 20.7 17.7 20.4 24.9 20.8 13.4 POOR 39 8 30B 12 15 11 8 13 13 12 9.6 4.2 14.7 9.6 11.7 8.8 8.8 11.8 12.0 7.0 VERY POOR 44 9 35B 17 11 11 11 14J 9 11 11.0 4.7 17.0 13.6 9.0 8.8 11.6 13.1 8.8 6.4 DON’T KNOW/REFUSED 28 14 14 11 7 6 10 6 7 11 7.0 7.3 6.7 8.8 5.5 4.4 11.4 5.5 6.5 6.1 MEAN 3.47 3.91C 3.07 3.40 3.42 3.66 3.41 3.45 3.53 3.79GHI STD DEVIATION 1.31 1.12 1.35 1.41 1.28 1.22 1.31 1.40 1.30 1.20 STD ERROR 0.07 0.08 0.10 0.13 0.11 0.12 0.15 0.14 0.13 0.09 ——————————— (sig=.05) (all_pairs) columns tested BC, DEF, GHIJ In the preceding table, the unweighted total, weighted total, and effective base are all printed. Compare the three numbers and notice that the effective base is usually a little less than the unweighted total. This is because all the weights were close together (near 1.00), so the weighting did not substantially change the percentages on the table. Compare these percentages with those that were printed on Table 101. If the weights had a greater variance (for example, respondents were assigned weights between 5 and .2), the effective base would have been much less than the unweighted total. To see how the effective base works, look at the TOTAL column in the above table. Notice that the weighted and unweighted totals are both 400, because weights were chosen to weight the sample back to its original size. Also notice that the effective base is only 396, which is due to the minor variation in the weights that were applied to this table. 
The formula for the effective base is as follows:

EB= WEIGHTED TOTAL SQUARED DIVIDED BY THE SUM OF THE SQUARE OF EACH WEIGHT

Reproduce the number 396 from above by plugging in all the appropriate numbers from TABLE 401. Here WT is the weighted total, Wn is each distinct weight, and Fn is the number of respondents who received that weight.

EB= (WT)**2 / SUM( Fn*(Wn**2) )
EB= ((400)**2) / ((125*(1.021**2)) + (145*(.880**2)) + (113*(1.13**2)) + (17*(1**2)))
EB= 160000 / ( 130.30 + 112.29 + 144.29 + 17)
EB= 160000 / 403.88
EB= 396.16

An important characteristic of the effective base is demonstrated in the 31-50 AGE column, where the unweighted total and the effective base are both 145 while the weighted total is only 128. Since the weighting on this table was based on AGE and everyone in that column was weighted by the same factor of 0.880, the weighted total drops to 128. However, since there is no variance in the weighting, the effective base remains unchanged. Furthermore, the percentages in that column are exactly the same as those in Table 101.

The same table could be produced by using the $[EFFECTIVE_N] keyword instead of the $[BASE] keyword and a different STUB_PREFACE.

STUB= STUB_TOP_WGT:
[-VERTICAL_PERCENT] UNWEIGHTED TOTAL
[SUPPRESS] UNWEIGHTED NO ANSWER
[BASE_ROW] WEIGHTED TOTAL
[SUPPRESS] WEIGHTED NO ANSWER }

TABLE_SET= { TAB402:
STUB_PREFACE= STUB_TOP_WGT
HEADER=: WEIGHTED TABLE WITH STATISTICAL TESTING (USING EFFECTIVE_N) }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
[-VERTICAL_PERCENT] EFFECTIVE BASE (STAT BASE)
NET GOOD
| VERY GOOD
| GOOD
FAIR
NET POOR
| POOR
| VERY POOR
DON'T KNOW/REFUSED
[STAT,LINE=0] MEAN
[STAT,LINE=0] STD DEVIATION
[STAT,LINE=0] STD ERROR }
ROW=: $[EFFECTIVE_N] TOTAL $[] [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
STORE_TABLES=* }

The printed table would be essentially the same as Table 401.
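The effective base arithmetic worked through for Table 401 earlier in this section can be checked with a short script. This is a minimal Python sketch and not part of the program: the function name effective_base and the (count, weight) input layout are assumptions made for illustration. It uses the exact sum of the weights as the weighted total instead of the rounded 400, so the TOTAL column comes out very slightly under the hand-calculated 396.16 but still rounds to 396.

```python
def effective_base(groups):
    """Effective base = (sum of weights)**2 / (sum of squared weights).

    groups: iterable of (count, weight) pairs, one pair per distinct weight.
    This layout is an illustrative assumption, not the program's own input.
    """
    weighted_total = sum(count * weight for count, weight in groups)
    sum_of_squares = sum(count * weight ** 2 for count, weight in groups)
    return weighted_total ** 2 / sum_of_squares


# TOTAL column of Table 401: respondents per age group and their weights.
# The 17 respondents with no age answer keep a weight of 1.
total_column = [(125, 1.021), (145, 0.880), (113, 1.130), (17, 1.0)]

# 31-50 column: all 145 respondents share the single weight 0.880.
age_31_50 = [(145, 0.880)]

print(round(effective_base(total_column)))  # effective base for TOTAL
print(round(effective_base(age_31_50)))     # effective base for 31-50
```

The second call illustrates the point made about the 31-50 column: when every respondent carries the same weight, the weight cancels out of the formula and the effective base equals the unweighted count.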
8.4.1 Weighted Tables with Different Weights
When performing significance testing in conjunction with applying different weights to different columns, use the SET option MULTIPLE_WEIGHT_STATISTICS. This option allows significance testing on similarly weighted columns when the table has columns with varying weights. However, it does not allow a given respondent to have a different weight in the same test. You can test independent columns with different weights, but dependent columns must have the same weights applied to them. For instance, MULTIPLE_WEIGHT_STATISTICS allows significance testing on a table where both a weighted and an unweighted total column have been created, but it does not allow the unweighted total to be tested against any of the weighted columns. If this option is not set, the program will print an error message when a STATISTICS statement is used in conjunction with a COLUMN_SHORT_WEIGHT or COLUMN_WEIGHT table element.

The example below shows how to produce an unweighted total column and still do significance testing on the rest of the table. Notice in the STATISTICS statement that every letter is one later in the alphabet than in previous statements (for example, I=BC becomes I=CD) because an additional category has been added to the column variable.
TABLE_SET= { TAB403: HEADER=: TABLE WITH STATISTICAL TESTING AND DIFFERENT WEIGHTS APPLIED TO DIFFERENT COLUMNS OF THE TABLE } SET MULTIPLE_WEIGHT_STATISTICS COLUMN_SHORT_WEIGHT=: TOTAL WITH & SELECT_VALUE([6^1//3/X],VALUES(1.021,.880,1.130,1)) STATISTICS=: I=CD,I=EFG,HIJK; BANNER=: | SEX AGE ADVERTISING AWARENESS | UNWGHT WGHT <==========> <===================> <============================> | TOTAL TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D | —— —– —- —— —– —– —– —— —— —— ——} COLUMN=: TOTAL WITH TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4] TITLE= TAB402 TITLE_4= TAB402 STUB= TAB402 ROW= TAB402 STORE_TABLES=* } Here is the table that is printed: TABLE WITH STATISTICAL TESTING AND DIFFERENT WEIGHTS APPLIED TO DIFFERENT COLUMNS OF THE TABLE TABLE 403 RATING OF SERVICE BASE= TOTAL SAMPLE SEX AGE ADVERTISING AWARENESS UNWGHT WGHT <==========> <===================> <============================> TOTAL TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D —— —– —- —— —– —– —– —— —— —— —— UNWEIGHTED TOTAL 400 400 196 204 125 145 113 91 108 107 176 WEIGHTED TOTAL 400 400 194 206 128 128 128 91 108 106 176 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 % % % % % % % % % % % (C) (D) (E) (F) (G) (H) (I) (J) (K) EFFECTIVE BASE (STAT BASE) 400 396 194 202 125 145 113 90 107 106 174 NET GOOD 197 200 120D 80 58 55 84EF 43 55 54 106H 49.2 50.1 62.0 38.9 45.6 43.4 65.5 47.5 51.3 51.0 60.6 VERY GOOD 102 102 69D 33 36 33 31 19 32 30 60H 25.5 25.4 35.7 15.8 28.0 26.2 23.9 21.1 29.5 27.8 33.9 GOOD 95 99 51 48 22 22 53EF 24 24 25 47 23.8 24.6 26.3 23.1 17.6 17.2 41.6 26.3 21.8 23.2 26.7 FAIR 92 89 42 47 29G 39G 16 19 20 23 35 23.0 22.3 21.8 22.8 22.4 30.3 12.4 20.7 18.3 21.7 19.9 NET POOR 83 83 17 65C 30 26 23 19 27K 22 24 20.8 20.7 8.9 31.7 23.2 20.7 17.7 20.4 24.9 20.8 13.4 POOR 39 39 8 30C 12 15 11 8 13 13 12 9.8 9.6 4.2 14.7 9.6 11.7 8.8 8.8 11.8 12.0 7.0 VERY POOR 44 44 9 35C 17 11 11 11 14K 9 11 11.0 11.0 4.7 17.0 13.6 9.0 8.8 
11.6 13.1 8.8 6.4
DON'T KNOW/REFUSED 28 28 14 14 11 7 6 10 6 7 11
7.0 7.0 7.3 6.7 8.8 5.5 4.4 11.4 5.5 6.5 6.1
MEAN 3.46 3.47 3.91D 3.07 3.40 3.42 3.66 3.41 3.45 3.53 3.79HIJ
STD DEVIATION 1.31 1.31 1.12 1.35 1.41 1.28 1.22 1.31 1.40 1.30 1.20
STD ERROR 0.07 0.07 0.08 0.10 0.13 0.11 0.12 0.15 0.14 0.13 0.09
--------------------------------
(sig=.05) (all_pairs) columns tested CD, EFG, HIJK

8.5 PRINT PHASE STATISTICAL TESTING
Print phase statistical testing is calculated from the numbers that are printed on the table. This approach has several advantages, but it also has several limitations.

The advantages:
- Table processing time will be much faster.
- Tables can be loaded into table manipulation, altered, and still be tested.
- A different type of test can be performed on each row in the table, including a Kruskal-Wallis test on the COLUMN_MEAN.
- The columns being tested can be changed without rereading any data.

The limitations:
- The columns must be independent or inclusive.
- The data cannot be weighted.
- Means created during the data reading phase with $[MEAN] will not be tested. Only means created using the EDIT option COLUMN_MEAN can be tested.
- If testing errors are made, such as testing dependent or weighted columns or failing to mark inclusive tests as inclusive, no error messages will be generated and possibly incorrect statistical markings will be printed on the table.
8.5.1 EDIT Options
To produce print phase significance testing, use the EDIT options DO_PRINTER_STATISTICS and DO_STATISTICS_TESTS instead of the STATISTICS statement that is used for table building phase tests. The DO_PRINTER_STATISTICS option tells the program that you are doing print phase testing, and the DO_STATISTICS_TESTS option specifies the columns to be tested. Like the STATISTICS statement, the DO_STATISTICS_TESTS option uses letters to designate which columns are being tested and a comma to separate multiple tests. Unlike the STATISTICS statement, all tests must be independent (there is no I= option); however, tests may be inclusive and must be marked as such by using T=. T values may be printed as with the STATISTICS statement, but the PRINTABLE_T option cannot be used to error check the tests. See 8.7 PRINTING THE ACTUAL T AND SIGNIFICANCE VALUES for more information about printing T values for print phase based tests and an example of using the T= option.

To test the first three banner points against each other, the EDIT statement would look something like:

EDIT= EDIT1: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=ABC }

The other EDIT options, such as DO_STATISTICS= to set the confidence level and NEWMAN_KEULS_TEST to perform a Newman-Keuls test, work the same way with the same defaults.
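To make the letter-and-comma notation concrete, the following Python sketch expands a test list such as BC,DEF into the individual column pairs that would be compared. It is an illustration only: the function name expand_tests is hypothetical, and it assumes that each comma-separated group is compared pair by pair within the group (as an all-possible-pairs test would) and that a leading T= marks the group as inclusive.

```python
from itertools import combinations


def expand_tests(spec):
    """Expand a test list such as 'BC,DEF' into (first, second, inclusive) pairs.

    Assumption for illustration: columns are compared pairwise within each
    comma-separated group, and a 'T=' prefix marks an inclusive test.
    """
    pairs = []
    for group in spec.split(","):
        group = group.strip()
        inclusive = group.upper().startswith("T=")
        letters = group[2:] if inclusive else group
        for first, second in combinations(letters, 2):
            pairs.append((first, second, inclusive))
    return pairs


pairs = expand_tests("BC,DEF")
for first, second, inclusive in pairs:
    kind = "inclusive" if inclusive else "independent"
    print(first, "vs", second, "-", kind)
```

Under these assumptions, BC yields one comparison and DEF yields three, which is why adding a column to a group increases the number of tests faster than the number of columns.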
NOTE: The following set of commands defines a standard front end for the next set of examples >PURGESAME >PRINT_FILE STAT5 ~INPUT DATA ~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T501 ~DEFINE STUB= STUBTOP1: [BASE_ROW] TOTAL [SUPPRESS] NO ANSWER } TABLE_SET= { BAN1: EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STATISTICS_DECIMALS=2, -PERCENT_SIGN,RUNNING_LINES=1 } STUB_PREFACE= STUBTOP1 BANNER=: | SEX AGE ADVERTISING AWARENESS | <=========> <=================> <=========================> | TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D | —– —- —— —– —– —– —— —— —— ——} COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4] } Example: TABLE_SET= { TAB501: LOCAL_EDIT=: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=BC,DEF ALL_POSSIBLE_PAIRS_TEST DO_STATISTICS=.95 COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1) COLUMN_MEAN,COLUMN_STD COLUMN_SE } HEADER=: TABLE WITH STATISTICAL TESTING DONE DURING THE PRINT PHASE } TITLE=: RATING OF SERVICE } TITLE_4=: BASE= TOTAL SAMPLE } STUB=: NET GOOD | VERY GOOD | GOOD FAIR NET POOR | POOR | VERY POOR DON’T KNOW/REFUSED [PRINT_ROW=MEAN,LINE=0] MEAN [PRINT_ROW=STD,LINE=0] STD DEV [PRINT_ROW=SE,LINE=0] STD ERR } ROW=: [11^4,5/5/4/3/1,2/2/1/X] STORE_TABLES=* } Here is the table that is printed: TABLE WITH STATISTICAL TESTING DONE ON THE PRINTED NUMBERS TABLE 501 RATING OF SERVICE BASE= TOTAL SAMPLE SEX AGE ADVERTISING AWARENESS <=========> <=================> <=========================> TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D —– —- —— —– —– —– —— —— —— —— TOTAL 400 196 204 125 145 113 91 108 107 176 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 % % % % % % % % % % (B) (C) (D) (E) (F) NET GOOD 197 120C 77 57 63 74DE 42 55 54 105 49.2 61.2 37.7 45.6 43.4 65.5 46.2 50.9 50.5 59.7 VERY GOOD 102 70C 32 35 38 27 19 32 30 60 25.5 35.7 15.7 28.0 26.2 23.9 20.9 29.6 28.0 34.1 GOOD 95 50 45 22 25 47DE 23 23 24 45 23.8 25.5 22.1 17.6 17.2 41.6 25.3 21.3 22.4 25.6 FAIR 92 44 48 28F 44F 14 20 20 24 36 
23.0 22.4 23.5 22.4 30.3 12.4 22.0 18.5 22.4 20.5
NET POOR 83 18 65B 29 30 20 19 27 22 24
20.8 9.2 31.9 23.2 20.7 17.7 20.9 25.0 20.6 13.6
POOR 39 9 30B 12 17 10 8 13 13 13
9.8 4.6 14.7 9.6 11.7 8.8 8.8 12.0 12.1 7.4
VERY POOR 44 9 35B 17 13 10 11 14 9 11
11.0 4.6 17.2 13.6 9.0 8.8 12.1 13.0 8.4 6.2
DON'T KNOW/REFUSED 28 14 14 11 8 5 10 6 7 11
7.0 7.1 6.9 8.8 5.5 4.4 11.0 5.6 6.5 6.2
MEAN 3.46 3.90C 3.05 3.40 3.42 3.66 3.38 3.45 3.53 3.79
STD DEV 1.31 1.12 1.35 1.41 1.28 1.22 1.32 1.40 1.29 1.21
STD ERR 0.07 0.08 0.10 0.13 0.11 0.12 0.15 0.14 0.13 0.09
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF

This table is very similar to Table 101 at the beginning of this chapter. The only difference is that columns G, H, I, and J are not tested in this table because they are not independent. Notice that the footnote has nothing in it to differentiate between tests done during the print phase and tests done during the table building phase.

8.5.2 Changing the Confidence Level and the Type of Test
Changing the default significance levels or type of test procedure is done in exactly the same way as with the table building phase tests. For example, if you wanted to test at the 90% confidence level using the N-K test procedure, you would add the options DO_STATISTICS=.90 and NEWMAN_KEULS_TEST to your EDIT statement. Bi-level testing and using the approximation formula are also specified the same way. See 8.1.3 through 8.1.7 for more information on the EDIT option DO_STATISTICS. See 8.4 SIGNIFICANCE TESTING ON WEIGHTED TABLES for more information about changing the type of test.

TABLE_SET= { TAB502:
LOCAL_EDIT=: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=BC,DEF
NEWMAN_KEULS_TEST,DO_STATISTICS=.90
COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1)
COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
HEADER=: TABLE WITH STATISTICAL TESTING DONE DURING THE PRINT PHASE CHANGING THE CONFIDENCE LEVEL AND USING THE NEWMAN-KEULS PROCEDURE }
TITLE= TAB501
TITLE_4= TAB501
STUB= TAB501
ROW= TAB501
STORE_TABLES=* }

The printed table would be similar to Table 501 except for some of the statistical markings and the footnote. The footnote would be as follows:

(sig=.10) (n_k) columns tested BC,DEF

8.5.3 Changing the Type of Test by Row
When doing print phase statistical testing you can use the STUB option DO_STATISTICS to change the type of test being performed on that row or to exclude that row entirely from testing. This means that you can test some of the rows using the APP test, test other rows using the N-K test, and not test other rows. DO_STATISTICS can be set to any of ALL_POSSIBLE_PAIRS_TEST, NEWMAN_KEULS_TEST, ANOVA_SCAN, FISHER, or KRUSKAL_WALLIS_TEST. If -DO_STATISTICS is used, then that row will be excluded from the test.

In the example below the NET GOOD, FAIR, and NET POOR rows are tested with the APP test as specified on the EDIT statement. The COLUMN_MEAN is tested using the Kruskal-Wallis test. The rest of the rows are not tested.

TABLE_SET= { TAB503:
LOCAL_EDIT=: DO_PRINTER_STATISTICS,DO_STATISTICS_TESTS=BC,DEF
ALL_POSSIBLE_PAIRS_TEST,DO_STATISTICS=.95
COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1)
COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
HEADER=: TABLE WITH STATISTICAL TESTING DONE DURING THE PRINT PHASE USING THE KRUSKAL WALLIS TEST ON THE MEAN }
TITLE= TAB501
TITLE_4= TAB501
TITLE_5=:\2NNETS AND FAIR MENTIONS ARE TESTED USING ALL PAIRS TEST
MEAN IS TESTED USING KRUSKAL WALLIS TEST }
STUB=:
NET GOOD
[-DO_STATISTICS] | VERY GOOD
[-DO_STATISTICS] | GOOD
FAIR
NET POOR
[-DO_STATISTICS] | POOR
[-DO_STATISTICS] | VERY POOR
[-DO_STATISTICS] DON'T KNOW/REFUSED
[PRINT_ROW=MEAN,DO_STATISTICS=KRUSKAL_WALLIS_TEST,LINE=0] MEAN
[PRINT_ROW=STD,LINE=0] STD DEV
[PRINT_ROW=SE,LINE=0] STD ERR }
ROW= TAB501
STORE_TABLES=* }

Here is the table that is printed:

TABLE WITH STATISTICAL TESTING DONE ON THE PRINTED NUMBERS USING THE KRUSKAL WALLIS TEST ON THE MEAN
TABLE 503
RATING OF SERVICE
BASE= TOTAL SAMPLE
SEX AGE ADVERTISING AWARENESS
<=========> <=================> <=========================>
TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
----- ---- ------ ----- ----- ----- ------ ------ ------ ------
TOTAL 400 196 204 125 145 113 91 108 107 176
100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
% % % % % % % % % %
(B) (C) (D) (E) (F)
NET GOOD 197 120C 77 57 63 74DE 42 55 54 105
49.2 61.2 37.7 45.6 43.4 65.5 46.2 50.9 50.5 59.7
VERY GOOD 102 70 32 35 38 27 19 32 30 60
25.5 35.7 15.7 28.0 26.2 23.9 20.9 29.6 28.0 34.1
GOOD 95 50 45 22 25 47 23 23 24 45
23.8 25.5 22.1 17.6 17.2 41.6 25.3 21.3 22.4 25.6
FAIR 92 44 48 28F 44F 14 20 20 24 36
23.0 22.4 23.5 22.4 30.3 12.4 22.0 18.5 22.4 20.5
NET POOR 83 18 65B 29 30 20 19 27 22 24
20.8 9.2 31.9 23.2 20.7 17.7 20.9 25.0 20.6 13.6
POOR 39 9 30 12 17 10 8 13 13 13
9.8 4.6 14.7 9.6 11.7 8.8 8.8 12.0 12.1 7.4
VERY POOR 44 9 35 17 13 10 11 14 9 11
11.0 4.6 17.2 13.6 9.0 8.8 12.1 13.0 8.4 6.2
DON'T KNOW/REFUSED 28 14 14 11 8 5 10 6 7 11
7.0 7.1 6.9 8.8 5.5 4.4 11.0 5.6 6.5 6.2
MEAN 3.46 3.90C 3.05 3.40 3.42 3.66 3.38 3.45 3.53 3.79
STD DEV 1.31 1.12 1.35 1.41 1.28 1.22 1.32 1.40 1.29 1.21
STD ERR 0.07 0.08 0.10 0.13 0.11 0.12 0.15 0.14 0.13 0.09
NETS AND FAIR MENTIONS ARE TESTED USING ALL PAIRS TEST
MEAN IS TESTED USING KRUSKAL WALLIS TEST
--------------------------------
(sig=.05) (all_pairs) (k_w) columns tested BC, DEF

Notice that the standard footnote mentions that both the APP and Kruskal-Wallis tests were used, but does not say which rows were tested with which test. As in this example, you may want to use the TITLE_5 keyword to create a customized footnote.

8.6 EXCLUDING ROWS/COLUMNS FROM SIGNIFICANCE TESTING
8.6.1 Testing Mean Rows Only
>PURGESAME
>PRINT_FILE STAT6
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T601
~DEFINE
STUB= STUB_TOP_TOT:
TOTAL
[SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30, -COLUMN_TNA,STATISTICS_DECIMALS=2, -PERCENT_SIGN,RUNNING_LINES=1 }
STATISTICS=: I=BC,I=DEF,GHIJ;
BANNER=:
| SEX AGE ADVERTISING AWARENESS
| <=========> <=================> <=========================>
| TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
| ----- ---- ------ ----- ----- ----- ------ ------ ------ ------}
COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
}
EX:
TABLE_SET= { TAB601:
STUB_PREFACE= STUB_TOP_TOT
SET MEAN_STATISTICS_ONLY
LOCAL_EDIT=: DO_STATISTICS=.95 }
HEADER=: TABLE WITH STATISTICAL TESTS PERFORMED ON MEANS ONLY }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
NET GOOD
| VERY GOOD
| GOOD
FAIR
NET POOR
| POOR
| VERY POOR
DON'T KNOW/REFUSED
[STATISTICS_ROW] MEAN
[STATISTICS_ROW] STD DEVIATION
[STATISTICS_ROW] STD ERROR }
ROW=: [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
STORE_TABLES=* }

Here is the table that is printed:
TABLE WITH STATISTICAL TESTS PERFORMED ON MEANS ONLY
TABLE 601
RATING OF SERVICE
BASE= TOTAL SAMPLE
SEX AGE ADVERTISING AWARENESS
<=========> <=================> <=========================>
TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
----- ---- ------ ----- ----- ----- ------ ------ ------ ------
TOTAL 400 196 204 125 145 113 91 108 107 176
100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
% % % % % % % % % %
(B) (C) (D) (E) (F) (G) (H) (I) (J)
NET GOOD 197 120 77 57 63 74 42 55 54 105
49.2 61.2 37.7 45.6 43.4 65.5 46.2 50.9 50.5 59.7
VERY GOOD 102 70 32 35 38 27 19 32 30 60
25.5 35.7 15.7 28.0 26.2 23.9 20.9 29.6 28.0 34.1
GOOD 95 50 45 22 25 47 23 23 24 45
23.8 25.5 22.1 17.6 17.2 41.6 25.3 21.3 22.4 25.6
FAIR 92 44 48 28 44 14 20 20 24 36
23.0 22.4 23.5 22.4 30.3 12.4 22.0 18.5 22.4 20.5
NET POOR 83 18 65 29 30 20 19 27 22 24
20.8 9.2 31.9 23.2 20.7 17.7 20.9 25.0 20.6 13.6
POOR 39 9 30 12 17 10 8 13 13 13
9.8 4.6 14.7 9.6 11.7 8.8 8.8 12.0 12.1 7.4
VERY POOR 44 9 35 17 13 10 11 14 9 11
11.0 4.6 17.2 13.6 9.0 8.8 12.1 13.0 8.4 6.2
DON'T KNOW/REFUSED 28 14 14 11 8 5 10 6 7 11
7.0 7.1 6.9 8.8 5.5 4.4 11.0 5.6 6.5 6.2
MEAN 3.46 3.90C 3.05 3.40 3.42 3.66 3.38 3.45 3.53 3.79GH
STD DEVIATION 1.31 1.12 1.35 1.41 1.28 1.22 1.32 1.40 1.29 1.21
STD ERROR 0.07 0.08 0.10 0.13 0.11 0.12 0.15 0.14 0.13 0.09
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ

Compare this table with Table 101 and notice that only the MEAN row is marked with any letter because the tests on all the other rows were suppressed.

NOTE: There is no change in the footnote on the table, so you may want to include a customized notation somewhere on the table pointing out which rows were tested.
8.6.2 Excluding any Row from Statistical Testing
[21^(STATISTICS)4,5/5/4/3/(STATISTICS)1,2/2/1]

This would cause only those categories marked with the STATISTICS keyword to be tested, while all other categories would not be tested. Using the $[DO_STATISTICS] keyword method the variable would look like:

[21^4,5] $[-DO_STATISTICS] [21^5/4/3] $[DO_STATISTICS] [21^1,2] &
$[-DO_STATISTICS] [21^2/1]

The default is that categories are tested, so the net of 4 and 5 will be tested. All categories after $[-DO_STATISTICS] are not tested, while $[DO_STATISTICS] turns testing back on.

If table printing phase statistical testing is being done, you can exclude a row from the test by using the STUB option -DO_STATISTICS. See 8.5.3 Changing the Type of Test by Row for more information.

NOTE: The $[DO_STATISTICS] keyword should not be confused with either the EDIT statement option DO_STATISTICS or the STUB option DO_STATISTICS.

The following example shows how to test only the top box, bottom box, and mean on a rating scale:
TABLE_SET= { TAB602:
LOCAL_EDIT=: DO_STATISTICS=.95 }
HEADER=: TABLE WITH STATISTICAL TESTS PERFORMED ON SELECTED ROWS ONLY }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
TITLE_5=:\2N ONLY ROWS WITH (*) ARE TESTED }
STUB=:
NET GOOD (*)
| VERY GOOD
| GOOD
FAIR
NET POOR (*)
| POOR
| VERY POOR
DON'T KNOW/REFUSED
[STATISTICS_ROW] MEAN (*)
[STATISTICS_ROW] STD DEVIATION
[STATISTICS_ROW] STD ERROR }
ROW=: [11^(STATISTICS)4,5/5/4/3/(STATISTICS)1,2/2/1/X] $[MEAN,STD,SE] [11]
STORE_TABLES=* }

Here is an alternate way to write the row variable:

ROW602A:[11^4,5] $[-DO_STATISTICS] [11^5/4/3] &
$[DO_STATISTICS] [11^1,2] $[-DO_STATISTICS] [11^2/1/X] &
$[DO_STATISTICS, MEAN,STD,SE] [11]

Here is the table that is printed:
TABLE WITH STATISTICAL TESTS PERFORMED ON SELECTED ROWS ONLY
TABLE 602
RATING OF SERVICE
BASE= TOTAL SAMPLE
SEX AGE ADVERTISING AWARENESS
<=========> <=================> <=========================>
TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
----- ---- ------ ----- ----- ----- ------ ------ ------ ------
TOTAL 400 196 204 125 145 113 91 108 107 176
100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
% % % % % % % % % %
(B) (C) (D) (E) (F) (G) (H) (I) (J)
NET GOOD (*) 197 120C 77 57 63 74DE 42 55 54 105G
49.2 61.2 37.7 45.6 43.4 65.5 46.2 50.9 50.5 59.7
VERY GOOD 102 70 32 35 38 27 19 32 30 60
25.5 35.7 15.7 28.0 26.2 23.9 20.9 29.6 28.0 34.1
GOOD 95 50 45 22 25 47 23 23 24 45
23.8 25.5 22.1 17.6 17.2 41.6 25.3 21.3 22.4 25.6
FAIR 92 44 48 28 44 14 20 20 24 36
23.0 22.4 23.5 22.4 30.3 12.4 22.0 18.5 22.4 20.5
NET POOR (*) 83 18 65B 29 30 20 19 27J 22 24
20.8 9.2 31.9 23.2 20.7 17.7 20.9 25.0 20.6 13.6
POOR 39 9 30 12 17 10 8 13 13 13
9.8 4.6 14.7 9.6 11.7 8.8 8.8 12.0 12.1 7.4
VERY POOR 44 9 35 17 13 10 11 14 9 11
11.0 4.6 17.2 13.6 9.0 8.8 12.1 13.0 8.4 6.2
DON'T KNOW/REFUSED 28 14 14 11 8 5 10 6 7 11
7.0 7.1 6.9 8.8 5.5 4.4 11.0 5.6 6.5 6.2
MEAN (*) 3.46 3.90C 3.05 3.40 3.42 3.66 3.38 3.45 3.53 3.79GH
STD DEVIATION 1.31 1.12 1.35 1.41 1.28 1.22 1.32 1.40 1.29 1.21
STD ERROR 0.07 0.08 0.10 0.13 0.11 0.12 0.15 0.14 0.13 0.09
ONLY ROWS WITH (*) ARE TESTED
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ

Compare this table with Table 601 and notice that only the NET GOOD, NET POOR, and MEAN rows in this table have statistical markings. In addition, a TITLE_5 variable was defined to create a customized footnote.
8.6.3 Excluding Columns with Low Bases from Statistical Testing
TABLE_SET= { TAB603:
STATISTICS=: I=BC,I=DEF,GHIJ;
LOCAL_EDIT=: MINIMUM_BASE=100,DO_STATISTICS=.95 }
HEADER=: USING MINIMUM BASE OPTION TO SUPPRESS A COLUMN WITH A LOW BASE }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
NET GOOD
| VERY GOOD
| GOOD
FAIR
NET POOR
| POOR
| VERY POOR
DON'T KNOW/REFUSED
[STATISTICS_ROW] MEAN
[STATISTICS_ROW] STD DEVIATION
[STATISTICS_ROW] STD ERROR }
ROW=: [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
STORE_TABLES=* }

Here is the table that is printed:
USING MINIMUM BASE OPTION TO SUPPRESS A COLUMN WITH A LOW BASE
TABLE 603
RATING OF SERVICE
BASE= TOTAL SAMPLE
SEX AGE ADVERTISING AWARENESS
<=========> <=================> <=========================>
TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
----- ---- ------ ----- ----- ----- ------ ------ ------ ------
TOTAL 400 196 204 125 145 113 91 108 107 176
100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
% % % % % % % % % %
(B) (C) (D) (E) (F) (*) (H) (I) (J)
NET GOOD 197 120C 77 57 63 74DE 55 54 105
49.2 61.2 37.7 45.6 43.4 65.5 50.9 50.5 59.7
VERY GOOD 102 70C 32 35 38 27 32 30 60
25.5 35.7 15.7 28.0 26.2 23.9 29.6 28.0 34.1
GOOD 95 50 45 22 25 47DE 23 24 45
23.8 25.5 22.1 17.6 17.2 41.6 21.3 22.4 25.6
FAIR 92 44 48 28F 44F 14 20 24 36
23.0 22.4 23.5 22.4 30.3 12.4 18.5 22.4 20.5
NET POOR 83 18 65B 29 30 20 27J 22 24
20.8 9.2 31.9 23.2 20.7 17.7 25.0 20.6 13.6
POOR 39 9 30B 12 17 10 13 13 13
9.8 4.6 14.7 9.6 11.7 8.8 12.0 12.1 7.4
VERY POOR 44 9 35B 17 13 10 14J 9 11
11.0 4.6 17.2 13.6 9.0 8.8 13.0 8.4 6.2
DON'T KNOW/REFUSED 28 14 14 11 8 5 6 7 11
7.0 7.1 6.9 8.8 5.5 4.4 5.6 6.5 6.2
MEAN 3.46 3.90C 3.05 3.40 3.42 3.66 3.45 3.53 3.79H
STD DEVIATION 1.31 1.12 1.35 1.41 1.28 1.22 1.40 1.29 1.21
STD ERROR 0.07 0.08 0.10 0.13 0.11 0.12 0.14 0.13 0.09
--------------------------------
(sig=.05) (all_pairs) columns tested BC, DEF, GHIJ
* - small base

Notice that the column BRND A is blank except for the base value. Also notice that the footnote includes a note that the asterisk denotes a small base.

You can suppress only the statistical testing instead of the entire column by also using the EDIT option FLAG_MINIMUM_BASE. In the example below the only difference from Table 603 is this option.
TABLE_SET= { TAB604:
HEADER=: USING MINIMUM BASE OPTION TO FLAG A COLUMN WITH A LOW BASE }
LOCAL_EDIT=: MINIMUM_BASE=100,FLAG_MINIMUM_BASE,DO_STATISTICS=.95 }
TITLE= TAB603
TITLE_4= TAB603
STUB= TAB603
ROW= TAB603
STORE_TABLES=* }

Here is the table that is printed:
USING MINIMUM BASE OPTION TO FLAG A COLUMN WITH A LOW BASE TABLE 604 RATING OF SERVICE BASE= TOTAL SAMPLE SEX AGE ADVERTISING AWARENESS <=========> <=================> <=========================> TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D ----- ---- ------ ----- ----- ----- ------ ------ ------ ------ TOTAL 400 196 204 125 145 113 91 108 107 176 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 % % % % % % % % % % (B) (C) (D) (E) (F) (*) (H) (I) (J) NET GOOD (*) 197 120C 77 57 63 74DE 42* 55 54 105G 49.2 61.2 37.7 45.6 43.4 65.5 46.2 50.9 50.5 59.7 VERY GOOD 102 70 32 35 38 27 19* 32 30 60 25.5 35.7 15.7 28.0 26.2 23.9 20.9 29.6 28.0 34.1 GOOD 95 50 45 22 25 47 23* 23 24 45 23.8 25.5 22.1 17.6 17.2 41.6 25.3 21.3 22.4 25.6 FAIR 92 44 48 28 44 14 20* 20 24 36 23.0 22.4 23.5 22.4 30.3 12.4 22.0 18.5 22.4 20.5 NET POOR (*) 83 18 65B 29 30 20 19* 27J 22 24 20.8 9.2 31.9 23.2 20.7 17.7 20.9 25.0 20.6 13.6 POOR 39 9 30 12 17 10 8* 13 13 13 9.8 4.6 14.7 9.6 11.7 8.8 8.8 12.0 12.1 7.4 VERY POOR 44 9 35 17 13 10 11* 14 9 11 11.0 4.6 17.2 13.6 9.0 8.8 12.1 13.0 8.4 6.2 DON'T KNOW/REFUSED 28 14 14 11 8 5 10* 6 7 11 7.0 7.1 6.9 8.8 5.5 4.4 11.0 5.6 6.5 6.2 MEAN (*) 3.46 3.90C 3.05 3.40 3.42 3.66 3.38* 3.45 3.53 3.79H STD DEVIATION 1.31 1.12 1.35 1.41 1.28 1.22 1.32* 1.40 1.29 1.21 STD ERROR 0.07 0.08 0.10 0.13 0.11 0.12 0.15* 0.14 0.13 0.09 -------------------------------- (sig=.05) (all_pairs) columns tested BC, DEF, GHIJ * - small base
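As a rough illustration of how MINIMUM_BASE=100 singles out a column in Tables 603 and 604, the Python sketch below compares each column base with the minimum. The function name low_base_columns is hypothetical, and the strict "less than" comparison is an assumption: the tables only show that a base of 91 falls below the limit and a base of 107 does not.

```python
def low_base_columns(bases, minimum_base):
    """Return the labels of columns whose base is below minimum_base.

    Assumption for illustration: a column is treated as a small base when
    its base is strictly less than the minimum.
    """
    return [label for label, base in bases.items() if base < minimum_base]


# Column bases taken from the TOTAL row of Tables 603 and 604.
bases = {
    "TOTAL": 400, "MALE": 196, "FEMALE": 204,
    "18-30": 125, "31-50": 145, "51-70": 113,
    "BRND A": 91, "BRND B": 108, "BRND C": 107, "BRND D": 176,
}

print(low_base_columns(bases, 100))
```

Only BRND A is picked out, which matches the column that is blanked in Table 603 and flagged with an asterisk in Table 604.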
8.7 PRINTING THE ACTUAL T AND SIGNIFICANCE VALUES
- DO_T_TEST=*
- DO_T_TEST=n
- DO_T_TEST=-n
- DO_T_TEST=PRINT_MEAN
>PURGESAME
>PRINT_FILE STAT7
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T701
~DEFINE
STUB= STUB_TOP_TOT:
TOTAL
[SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STATISTICS_DECIMALS=2, -PERCENT_SIGN }
BANNER=:
| SEX AGE ADVERTISING AWARENESS
| <==========> <==================> <===============================>
| TOTAL MALE FEMALE 18-30 31-50 51-70 BRND A BRND B BRND C BRND D
| ----- ---- ------ ----- ----- ----- ------ ------ ------ ------}
COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] WITH [7^1//4]
}
EX:
TABLE_SET= { TAB701:
STUB_PREFACE= STUB_TOP_TOT
STATISTICS=: PRINTABLE_T T=AB,T=AC,T=AD,T=AE,T=AF,T=AG,T=AH,T=AI,T=AJ
LOCAL_EDIT=: DO_STATISTICS=.95 }
HEADER=: TABLE WITH T AND SIGNIFICANCE VALUES PRINTED ON THE TABLE }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
NET GOOD
[DO_T_TEST=*,SKIP_LINES=0] T-VALUE
[DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
| VERY GOOD
| GOOD
FAIR
NET POOR
[DO_T_TEST=*,SKIP_LINES=0] T-VALUE
[DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
| POOR
| VERY POOR
DON'T KNOW/REFUSED
[STATISTICS_ROW] MEAN
[STATISTICS_ROW] STD DEVIATION
[STATISTICS_ROW] STD ERROR
[DO_T_TEST=9] T-VALUE
[DO_SIG_T=9] SIGNIFICANCE OF T }
ROW=: [11^4,5/5/4/3/1,2/2/1/X] $[MEAN,STD,SE] [11]
STORE_TABLES=* }
Here is the table that is printed:
TABLE WITH T AND SIGNIFICANCE VALUES PRINTED ON THE TABLE
TABLE 701
RATING OF SERVICE
BASE= TOTAL SAMPLE

                                  SEX              AGE            ADVERTISING AWARENESS
                              <=========>  <=================> <=========================>
                      TOTAL   MALE FEMALE  18-30  31-50  51-70 BRND A BRND B BRND C BRND D
                      -----   ---- ------  -----  -----  ----- ------ ------ ------ ------
TOTAL                   400    196    204    125    145    113     91    108    107    176
                      100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0  100.0
                          %      %      %      %      %      %      %      %      %      %
                        (A)    (B)    (C)    (D)    (E)    (F)    (G)    (H)    (I)    (J)
NET GOOD               197C   120A     77     57     63    74A     42     55     54   105A
                       49.2   61.2   37.7   45.6   43.4   65.5   46.2   50.9   50.5   59.7
T-VALUE                      -4.69   4.69   0.98   1.75  -4.07   0.67  -0.41  -0.29  -3.69
SIGNIFICANCE                  0.00   0.00   0.33   0.08   0.00   0.51   0.69   0.77   0.00
  VERY GOOD            102C    70A     32     35     38     27     19     32     30    60A
                       25.5   35.7   15.7   28.0   26.2   23.9   20.9   29.6   28.0   34.1
  GOOD                  95E     50     45     22     25    47A     23     23     24     45
                       23.8   25.5   22.1   17.6   17.2   41.6   25.3   21.3   22.4   25.6
FAIR                    92F     44     48     28    44A     14     20     20     24     36
                       23.0   22.4   23.5   22.4   30.3   12.4   22.0   18.5   22.4   20.5
NET POOR               83BJ     18    65A     29     30     20     19     27     22     24
                       20.8    9.2   31.9   23.2   20.7   17.7   20.9   25.0   20.6   13.6
T-VALUE                       5.58  -5.58  -0.81   0.02   0.94  -0.03  -1.27   0.06   3.11
SIGNIFICANCE                  0.00   0.00   0.42   0.98   0.35   0.97   0.20   0.95   0.00
  POOR                  39B      9    30A     12     17     10      8     13     13     13
                        9.8    4.6   14.7    9.6   11.7    8.8    8.8   12.0   12.1    7.4
  VERY POOR            44BJ      9    35A     17     13     10     11     14      9     11
                       11.0    4.6   17.2   13.6    9.0    8.8   12.1   13.0    8.4    6.2
DON'T KNOW/REFUSED       28     14     14     11      8      5     10      6      7     11
                        7.0    7.1    6.9    8.8    5.5    4.4   11.0    5.6    6.5    6.2
MEAN                  3.46C  3.90A   3.05   3.40   3.42   3.66   3.38   3.45   3.53  3.79A
STD DEVIATION          1.31   1.12   1.35   1.41   1.28   1.22   1.32   1.40   1.29   1.21
STD ERROR              0.07   0.08   0.10   0.13   0.11   0.12   0.15   0.14   0.13   0.09
T-VALUE                      -6.58   6.58   0.57   0.44  -1.84   0.62   0.10  -0.60  -4.38
SIGNIFICANCE OF T             0.00   0.00   0.57   0.67   0.06   0.54   0.91   0.55   0.00
---------------------------------
(sig=.05) (all_pairs) columns tested T= AB, T= AC, T= AD, T= AE, T= AF, T= AG, T= AH, T= AI, T= AJ

Notice that t values are positive when the first item tested is greater than the second item, and negative when the opposite is true.
Also notice that wherever the significance is 0.05 or less, either the cell itself is marked with the letter A (negative t value) or the Total column is marked with that cell's column letter (positive t value). Further notice that the t values for males and females are opposites of each other. This is because each is being tested inclusively against the total, and since males and females together make up the total, this is actually the same as testing them against each other. Essentially the same table could be produced using the print phase tests. For more information on print phase tests see 8.5 PRINT PHASE STATISTICAL TESTING. Here is an example of printing the t values when doing print phase tests.
TABLE_SET= { TAB702:
LOCAL_EDIT=: DO_PRINTER_STATISTICS,ALL_POSSIBLE_PAIRS_TEST,DO_STATISTICS=.95,
DO_STATISTICS_TESTS=T=AB,T=AC,T=AD,T=AE,T=AF,T=AG,T=AH,T=AI,T=AJ
COLUMN_STATISTICS_VALUES=VALUES(,5,4,3,,2,1)
COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
HEADER=: TABLE WITH T AND SIGNIFICANCE VALUES PRINTED ON THE TABLE FOR TESTS PERFORMED ON THE NUMBERS ON THE PRINTED TABLE }
TITLE=: RATING OF SERVICE }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
NET GOOD
[DO_T_TEST=*,SKIP_LINES=0] T-VALUE
[DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
| VERY GOOD
| GOOD
FAIR
NET POOR
[DO_T_TEST=*,SKIP_LINES=0] T-VALUE
[DO_SIG_T=*,SKIP_LINES=0] SIGNIFICANCE
| POOR
| VERY POOR
DON'T KNOW/REFUSED
[PRINT_ROW=MEAN] MEAN
[PRINT_ROW=STD] STANDARD DEVIATION
[PRINT_ROW=SE] STANDARD ERROR
[DO_T_TEST=PRINT_MEAN] T-VALUE
[DO_SIG_T=PRINT_MEAN] SIGNIFICANCE OF T }
ROW=: [11^4,5/5/4/3/1,2/2/1/X]
STORE_TABLES=* }
The printed table will look essentially the same as Table 701.
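The marking rule described for Table 701 can be checked by hand or with a few lines of code. The following Python sketch is not Mentor syntax and does not reproduce Mentor's internal logic; it simply applies the rule as stated in the text (significance of 0.05 or less; a negative t marks the tested column with A, a positive t marks the Total column with the tested column's letter) to the NET GOOD values printed on Table 701. The function name `mark_cells` and the dictionary layout are hypothetical.

```python
# Hypothetical helper: reproduce the significance letters on Table 701
# from the printed T-VALUE and SIGNIFICANCE rows (tests T=AB ... T=AJ).

def mark_cells(t_values, sig_values, level=0.05):
    """Return (letters printed on the Total column, columns marked 'A')."""
    total_marks = []
    marked_a = set()
    for column, t in t_values.items():
        if sig_values[column] > level:
            continue  # not significant: no letter is printed
        if t < 0:
            marked_a.add(column)        # tested column is greater than Total
        else:
            total_marks.append(column)  # Total is greater than tested column
    return "".join(sorted(total_marks)), marked_a

# NET GOOD row of Table 701, columns B through J, as printed.
net_good_t = {"B": -4.69, "C": 4.69, "D": 0.98, "E": 1.75, "F": -4.07,
              "G": 0.67, "H": -0.41, "I": -0.29, "J": -3.69}
net_good_sig = {"B": 0.00, "C": 0.00, "D": 0.33, "E": 0.08, "F": 0.00,
                "G": 0.51, "H": 0.69, "I": 0.77, "J": 0.00}

total_letters, a_columns = mark_cells(net_good_t, net_good_sig)
print(total_letters, sorted(a_columns))
```

For the NET GOOD row this gives the letter C on the Total column and the letter A on columns B, F, and J, which agrees with the printed cells 197C, 120A, 74A, and 105A. Because the table prints significance to two decimals, a printed value of exactly 0.05 could be ambiguous; the sketch works only from the printed figures.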
8.8 SIGNIFICANCE TESTING ON ROWS (PREFERENCE TESTING)
8.8.1 Direct Comparison Testing
STATISTICS= ROWSTAT1: D=1,2
The following statement would test row 1 versus row 2, row 3 versus row 4, and row 5 versus row 6.
STATISTICS= ROWSTAT2: D=1,2 D=3,4 D=5,6
Row testing can be combined with column testing by specifying both the column and row tests on the same STATISTICS statement. The following statement would do column testing on columns B, C, and D, in addition to testing row 4 versus row 6.
STATISTICS= ROWSTAT3: BCD, D=4,6
The DO_STATISTICS option on the EDIT statement is again used to set the confidence level. The same setting is used for both the row and column tests. As with column testing, a footnote will be printed to indicate which rows were tested and the significance level used. If the difference is significant, a lowercase “s” will print under the second row tested. This is an example of a direct comparison of rows.
NOTE: The following set of commands defines a standard front end for the next set of examples:
>PURGE_SAME
>PRINT_FILE STAT8
~INPUT DATA
~SET DROP_LOCAL_EDIT,BEGIN_TABLE_NAME=T801
~DEFINE
STUB= STUBTOP1:
TOTAL
[SUPPRESS] NO ANSWER }
TABLE_SET= { BAN1:
EDIT=: COLUMN_WIDTH=7,STUB_WIDTH=30,-COLUMN_TNA,STUB_PREFACE=STUBTOP1,
STATISTICS_DECIMALS=2,-PERCENT_SIGN,DO_STATISTICS=1 }
BANNER=:
|              SEX              AGE
|          <=========>  <=================>
|  TOTAL   MALE FEMALE  18-30  31-50  51-70
|  -----   ---- ------  -----  -----  -----}
COLUMN=: TOTAL WITH [5^1/2] WITH [6^1//3] }
Here is the example:
TABLE_SET= { TAB801:
STATISTICS=: D=1,2 D=4,5 D=7,8
HEADER=: TABLE WITH DIRECT STATISTICAL TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL}
TITLE=: PREFERENCE OF PRODUCTS }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
[COMMENT,UNDERLINE] FIRST TEST
[STUB_INDENT=2] PREFER BRAND A
[STUB_INDENT=2] PREFER BRAND B
[STUB_INDENT=2] NO PREFERENCE A VS B
[COMMENT,UNDERLINE] SECOND TEST
[STUB_INDENT=2] PREFER BRAND C
[STUB_INDENT=2] PREFER BRAND D
[STUB_INDENT=2] NO PREFERENCE C VS D
[COMMENT,UNDERLINE] THIRD TEST
[STUB_INDENT=2] PREFER BRAND E
[STUB_INDENT=2] PREFER BRAND F
[STUB_INDENT=2] NO PREFERENCE E VS F }
ROW=: [7,8,9^1/2/X]
STORE_TABLES=* }
Here is the table that is printed:
TABLE WITH DIRECT STATISTICAL TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL
TABLE 801
PREFERENCE OF PRODUCTS
BASE= TOTAL SAMPLE

                                        SEX              AGE
                                    <=========>  <=================>
                            TOTAL   MALE FEMALE  18-30  31-50  51-70
                            -----   ---- ------  -----  -----  -----
TOTAL                         500    251    249    140    223    101
                            100.0  100.0  100.0  100.0  100.0  100.0
                                %      %      %      %      %      %
FIRST TEST
----------
(a) PREFER BRAND A            236    111    125     88     98     30
                             47.2   44.2   50.2   62.9   43.9   29.7
(b) PREFER BRAND B            214    113    101     43    101     59
                             42.8   45.0   40.6   30.7   45.3   58.4
                                                     s             s
    NO PREFERENCE A VS B       50     27     23      9     24     12
                             10.0   10.8    9.2    6.4   10.8   11.9
SECOND TEST
-----------
(d) PREFER BRAND C            266    125    141     78    121     48
                             53.2   49.8   56.6   55.7   54.3   47.5
(e) PREFER BRAND D            190    108     82     49     84     43
                             38.0   43.0   32.9   35.0   37.7   42.6
                                s             s      s      s
    NO PREFERENCE C VS D       44     18     26     13     18     10
                              8.8    7.2   10.4    9.3    8.1    9.9
THIRD TEST
----------
(g) PREFER BRAND E            248    132    116     87     93     44
                             49.6   52.6   46.6   62.1   41.7   43.6
(h) PREFER BRAND F            187     87    100     40     96     42
                             37.4   34.7   40.2   28.6   43.0   41.6
                                s      s             s
    NO PREFERENCE E VS F       65     32     33     13     34     15
                             13.0   12.7   13.3    9.3   15.2   14.9
---------------------------------
(sig=.05) (all_pairs) rows tested a/b, d/e, g/h

Notice the “s” in the FEMALE column underneath the PREFER BRAND D row. This indicates that there is a significant difference between PREFER BRAND C and PREFER BRAND D for females. The blank under the MALE column in that row indicates that there is no significant difference for males. Also notice the additional lowercase letter assigned to each row that was tested. This makes it easy to match the rows on the table to the row pairs listed in the footnote that prints at the bottom of the page.
8.8.2 Distributed Preference Testing
STATISTICS= ROWSTAT4: P=1,2
In a distributed preference test, the SELECT_VALUE function is used to define the row variable. This ensures that the “No preference” response is evenly divided between the two categories (see 9.3.2 Functions for more information on the SELECT_VALUE function). A typical row definition might look like this:
ROW=: SELECT_VALUE([7^1/X],VALUES(1,.5)) WITH &
SELECT_VALUE([7^2/X],VALUES(1,.5))
This causes the X punch (“No preference”) to have a value of .5 for both categories, splitting it evenly between the two. As with the direct comparison, significant differences are marked with an “s” underneath the second row being tested. However, unlike the direct comparison test, small (not significant) differences are marked with a lowercase “ns” and statistically equal rows are marked with a lowercase “e”. The following example uses a distributed preference test to compare the same rows used in Table 801. Note the difference in the row variable definition.
TABLE_SET= { TAB802:
STATISTICS=: P=1,2 P=3,4 P=5,6
HEADER=: TABLE WITH DISTRIBUTED PREFERENCE TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL}
TITLE=: PREFERENCE OF PRODUCTS }
TITLE_4=: BASE= TOTAL SAMPLE }
STUB=:
[COMMENT,UNDERLINE] FIRST TEST
[STUB_INDENT=2] PREFER BRAND A
[STUB_INDENT=2] PREFER BRAND B
[COMMENT,UNDERLINE] SECOND TEST
[STUB_INDENT=2] PREFER BRAND C
[STUB_INDENT=2] PREFER BRAND D
[COMMENT,UNDERLINE] THIRD TEST
[STUB_INDENT=2] PREFER BRAND E
[STUB_INDENT=2] PREFER BRAND F }
ROW=: SELECT_VALUE([7^1/X],VALUES(1,.5)) WITH &
SELECT_VALUE([7^2/X],VALUES(1,.5)) WITH &
SELECT_VALUE([8^1/X],VALUES(1,.5)) WITH &
SELECT_VALUE([8^2/X],VALUES(1,.5)) WITH &
SELECT_VALUE([9^1/X],VALUES(1,.5)) WITH &
SELECT_VALUE([9^2/X],VALUES(1,.5))
STORE_TABLES=* }
Here is the table that is printed:
TABLE WITH DISTRIBUTED PREFERENCE TESTING OF ROWS AT THE 95% CONFIDENCE LEVEL
TABLE 802
PREFERENCE OF PRODUCTS
BASE= TOTAL SAMPLE

                                        SEX              AGE
                                    <=========>  <=================>
                            TOTAL   MALE FEMALE  18-30  31-50  51-70
                            -----   ---- ------  -----  -----  -----
TOTAL                         500    251    249    140    223    101
                            100.0  100.0  100.0  100.0  100.0  100.0
                                %      %      %      %      %      %
FIRST TEST
----------
(a) PREFER BRAND A            261    124    136     92    110     36
                             52.2   49.6   54.8   66.1   49.3   35.6
(b) PREFER BRAND B            239    126    112     48    113     65
                             47.8   50.4   45.2   33.9   50.7   64.4
                                e      e     ns      s      e      s
SECOND TEST
-----------
(c) PREFER BRAND C            288    134    154     84    130     53
                             57.6   53.4   61.8   60.4   58.3   52.5
(d) PREFER BRAND D            212    117     95     56     93     48
                             42.4   46.6   38.2   39.6   41.7   47.5
                                s     ns      s      s      s      e
THIRD TEST
----------
(e) PREFER BRAND E            280    148    132     94    110     52
                             56.1   59.0   53.2   66.8   49.3   51.0
(f) PREFER BRAND F            220    103    116     46    113     50
                             43.9   41.0   46.8   33.2   50.7   49.0
                                s      s     ns      s      e      e
---------------------------------
(sig=.05) (all_pairs) rows tested a/b, c/d, e/f

Compare this table with Table 801 and notice how the frequencies and percentages have changed. Each number in this table equals the corresponding number in Table 801 plus half of the number in the matching NO PREFERENCE row. Also notice that “s” appears in the same places, but that cells that were previously blank now contain either an “ns” or an “e”, depending upon the difference between the two cells. One additional thing to notice is that, apart from the row letters, the footnote for this table has the same form as the one for Table 801, so the only way to tell which test was done is to look for “ns” or “e” markings on the table.
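The relationship between Tables 801 and 802 can be verified with simple arithmetic. The Python sketch below is only an illustration of that arithmetic, not of how Mentor computes the table; the function names `distribute` and `percent` are made up for this example, and the figures are the TOTAL and 18-30 columns of the first test as printed on Table 801.

```python
# Illustrative check of the distributed preference arithmetic, using the
# TOTAL and 18-30 columns of the first test (Brand A vs Brand B).

def distribute(prefer_first, prefer_second, no_preference):
    """Split the no-preference count evenly between the two brands."""
    half = 0.5 * no_preference  # mirrors VALUES(1,.5) for the X punch
    return prefer_first + half, prefer_second + half

def percent(count, base):
    """Percentage of the column base, to one decimal as on the table."""
    return round(100.0 * count / base, 1)

# (PREFER BRAND A, PREFER BRAND B, NO PREFERENCE, column base) from Table 801
columns = {"TOTAL": (236, 214, 50, 500), "18-30": (88, 43, 9, 140)}

for name, (brand_a, brand_b, none, base) in columns.items():
    dist_a, dist_b = distribute(brand_a, brand_b, none)
    print(name, dist_a, percent(dist_a, base), dist_b, percent(dist_b, base))
```

For the TOTAL column this gives 261 and 239 (52.2 and 47.8 percent), matching Table 802. For the 18-30 column it gives 92.5 and 47.5, which Table 802 prints as the whole numbers 92 and 48, while the printed percentages (66.1 and 33.9) agree with the unrounded values.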
8.9 CHI-SQUARE AND ANOVA TESTS
- Separate definition of table regions for testing makes the specs more readable and easier to understand should maintenance be required in the future.
- Region definitions ($R) are created to be banner (column) specific, but not stub (row) specific, by always typing “1 to LAST” for the row part of the definition. This way, it is necessary to type the column parts of the region definitions for a banner only once, since the same banner regions are tested each time a specific banner is used. Testing of row categories is then controlled on a question-by-question basis through use of [-COLUMN_STATISTICS_VALUES].
- Separate EDIT statements invoked by LOCAL_EDIT commands control which statistical tests are to be performed on each question.
- For banner 1, the -CHI_SQUARE_ANOVA_FORMAT option is used to print ANOVA and chi-square statistics in list form following their corresponding tables.
~INPUT DATACLN
~SET AUTOMATIC_TABLES,DROP_LOCAL_EDIT
~DEFINE
EDIT={STATS_OFF: -TABLE_TESTS }

''Banner 1 definitions
''                                               column      row
'' Stat tests must be labeled                    region      region
'' since they will appear in list form           in banner   in stub
'' ------------------------------------          ---------   -------
BAN1_REG1: [$T="STAT TEST FOR SERVICE TYPE "     $R 2 TO 3   BY 1 TO LAST]
BAN1_REG2: [$T="STAT TEST FOR NEW SERVICE "      $R 4 TO 5   BY 1 TO LAST]
BAN1_REG3: [$T="STAT TEST FOR TAX PREPARATION "  $R 6 TO 7   BY 1 TO LAST]
BAN1_REG4: [$T="STAT TEST FOR FREQUENCY OF USE " $R 8 TO 10  BY 1 TO LAST]
BAN1_REG5: [$T="STAT TEST FOR SALES "            $R 11 TO 14 BY 1 TO LAST]
EDIT={ BAN1_EDIT:
-COLUMN_TNA,PERCENT_DECIMALS=0,
COLUMN_WIDTH=5,STUB_WIDTH=25,
-CHI_SQUARE_ANOVA_FORMAT,    ''puts stat tests in
                             ''list following table
TABLE_TESTS=BAN1_REG1,TABLE_TESTS=BAN1_REG2,
TABLE_TESTS=BAN1_REG3,TABLE_TESTS=BAN1_REG4,
TABLE_TESTS=BAN1_REG5 }
BANNER={BAN1_BANNER:
''          REG1        REG2        REG3           REG4                REG5
''        <------->  <------->    <------->  <--------------> <--------------------->
|          SERVICE      NEW          TAX         FREQUENCY             SALES
|           TYPE      SERVICE       PREP.         OF USE      =======================
|         =========  ==========   =========   ===============        500-    1-
|                      SW    NW                     ME-        <500    <1   4.9    5+
| TOTAL   NEW   OLD  AREA  AREA   YES    NO   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
| -----   ---   ---  ----  ----   ---    --   ---  ----  ----  ----  ----  ----  ----
}
BAN1_COL: TOTAL WITH &
[199^1/2] WITH &
([217#S] AND [15^1,2]) WITH &
([199^1] AND (([217#S] AND &
[15^N1,2]) OR [217#L])) WITH &
[78^1/2] WITH &
[147^1,2,6/3/4,5] WITH &
[182.8#1-499999/500000-999999/1000000-4999999/5000000-99999999]
''End of banner 1 definitions

''Banner 2 definitions
''            column      column
''            region      region
''            in banner   in stub
''            ---------   ---------
BAN2_REG1: [$R 2 TO 4 BY 1 TO LAST]
BAN2_REG2: [$R 5 TO 8 BY 1 TO LAST]
''Since -CHI_SQUARE_ANOVA_FORMAT is not used in this edit
''statement, statistics tests for BAN2 will appear under their
''corresponding table regions.
EDIT={BAN2_EDIT:
-COLUMN_TNA,PERCENT_DECIMALS=0,
COLUMN_WIDTH=5,STUB_WIDTH=25,
TABLE_TESTS=BAN2_REG1,TABLE_TESTS=BAN2_REG2 }
BANNER={BAN2_BANNER:
''             REG1                 REG2
''       <-------------->  <-------------------->
|            FREQUENCY             SALES
|             OF USE       ======================
|        ================        500-    1-
|               ME-        <500    <1   4.9    5+
| TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
| -----  ----  ----  ----  ----  ----  ----  ----
}
BAN2_COL: TOTAL WITH [147^1,2,6/3/4,5] WITH &
[182.8#1-499999/500000-999999/1000000-4999999/5000000-99999999]
''End of banner 2 definitions

''Stub definitions
''Question 4
TITLE={Q4_TITLE:
Q4. Please indicate on a scale from 1 to 4 how satisfied you are with
your overall relationship with this company. }
STUB={Q4_STUB:
4 - Very satisfied
3 - Satisfied
2 - Dissatisfied
1 - Very dissatisfied
[-COLUMN_STATISTICS_VALUES] Don't know/not sure    ''excluded from
                                                   ''statistics
}
Q4_ROW: [163^4//1/Y]
EDIT={Q4_EDIT: COLUMN_STATISTICS_VALUES=VALUES(4,3,2,1),MEAN,STD,ANOVA }
''Question 19
TITLE={Q19_TITLE:
Q19. Does this company prepare taxes? }
STUB={Q19_STUB:
Yes
No
[-COLUMN_STATISTICS_VALUES] Don't know/not sure    ''excluded from
                                                   ''statistics
}
Q19_ROW: [78^1//3]
EDIT={Q19_EDIT: CHI_SQUARE }
EDIT={Q19SIG_EDIT: CHI_SQUARE,SHOW_SIGNIFICANCE_ONLY }
>CREATE_DB TABLES
>PRINT_FILE TABLES
~EXECUTE
''BAN1's EDIT statement causes statistics tests to be printed as
''lists after the corresponding tables.
BANNER=ban1_banner,EDIT=ban1_edit,COLUMN=ban1_col
''Question 4
LOCAL_EDIT=q4_edit,TITLE=q4_title,STUB=q4_stub,ROW=q4_row
''Question 19 with statistics test
LOCAL_EDIT=q19_edit,TITLE=q19_title,STUB=q19_stub,ROW=q19_row
''BAN2's EDIT statement allows statistics tests to be printed
''under their corresponding table regions (default).
BANNER=ban2_banner,EDIT=ban2_edit,COLUMN=ban2_col
''Question 19 with statistics test
LOCAL_EDIT=q19_edit,TITLE=q19_title,STUB=Q19_STUB,ROW=Q19_ROW
''Question 19 with statistics test showing significance only
LOCAL_EDIT=q19sig_edit,TITLE=q19_title,STUB=q19_stub,ROW=q19_row
''Question 19 without statistics test
LOCAL_EDIT=stats_off,TITLE=q19_title,STUB=q19_stub,ROW=q19_row
RESET,PRINT_ALL
Here are the tables that are printed:
TABLE 001
Q4. Please indicate on a scale from 1 to 4 how satisfied you are with your overall relationship with this company.

                                 SERVICE    NEW       TAX       FREQUENCY          SALES
                                  TYPE    SERVICE    PREP.       OF USE     ===================
                                 ======= =========  ========  =============      500-   1-
                                           SW   NW                 ME-      <500   <1  4.9   5+
                         TOTAL  NEW  OLD AREA AREA  YES   NO  LOW DIUM HIGH MIL. BIL. BIL. BIL.
                         -----  ---  --- ---- ----  ---   --  --- ---- ---- ----  --- ---- ----
Total                      151   96   55   12   84   64   79   56   30   41   37   18   49   25
                          100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
N/A                          -    -    -    -    -    -    -    -    -    -    -    -    -    -
4 - Very satisfied          31   22    9    -   22   12   19    9    7    9   12    3    7    3
                           21%  23%  16%       26%  19%  24%  16%  23%  22%  32%  17%  14%  12%
3 - Satisfied               68   39   29    6   33   34   30   25   16   17   12   12   22   12
                           45%  41%  53%  50%  39%  53%  38%  45%  53%  41%  32%  67%  45%  48%
2 - Dissatisfied            38   28   10    5   23   11   25   16    5   11   12    1   14    6
                           25%  29%  18%  42%  27%  17%  32%  29%  17%  27%  32%   6%  29%  24%
1 - Very dissatisfied        8    4    4    1    3    5    2    4    1    2    -    1    4    2
                            5%   4%   7%   8%   4%   8%   3%   7%   3%   5%        6%   8%   8%
Don't know/not sure          6    3    3    -    3    2    3    2    1    2    1    1    2    2
                            4%   3%   5%        4%   3%   4%   4%   3%   5%   3%   6%   4%   8%
Mean                       2.8  2.8  2.8  2.4  2.9  2.9  2.9  2.7  3.0  2.8  3.0  3.0  2.7  2.7
Standard Deviation         0.8  0.8  0.8  0.7  0.8  0.8  0.8  0.8  0.8  0.8  0.8  0.7  0.8  0.8

STAT TEST FOR SERVICE TYPE      anova = 0.02, df1,df2 = (1,143) prob = 0.8694
STAT TEST FOR NEW SERVICE       anova = 3.83, df1,df2 = (1,91) prob = 0.0504
STAT TEST FOR TAX PREPARATION   anova = 0.01, df1,df2 = (1,136) prob = 0.9204
STAT TEST FOR FREQUENCY OF USE  anova = 1.10, df1,df2 = (2,119) prob = 0.3374
STAT TEST FOR SALES             anova = 1.50, df1,df2 = (3,119) prob = 0.2178
TABLE 002
Q19. Does this company prepare taxes?

                                 SERVICE    NEW       TAX       FREQUENCY          SALES
                                  TYPE    SERVICE    PREP.       OF USE     ===================
                                 ======= =========  ========  =============      500-   1-
                                           SW   NW                 ME-      <500   <1  4.9   5+
                         TOTAL  NEW  OLD AREA AREA  YES   NO  LOW DIUM HIGH MIL. BIL. BIL. BIL.
                         -----  ---  --- ---- ----  ---   --  --- ---- ---- ----  --- ---- ----
Total                      151   96   55   12   84   64   79   56   30   41   37   18   49   25
                          100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
N/A                          -    -    -    -    -    -    -    -    -    -    -    -    -    -
Yes                         64   23   41    4   19   64    -   25   14   22   15   11   18   14
                           42%  24%  75%  33%  23% 100%       45%  47%  54%  41%  61%  37%  56%
No                          79   69   10    8   61    -   79   31   15   15   20    5   27   11
                           52%  72%  18%  67%  73%      100%  55%  50%  37%  54%  28%  55%  44%
Don't know/not sure          8    4    4    -    4    -    -    -    1    4    2    2    4    -
                            5%   4%   7%        5%                  3%  10%   5%  11%   8%

STAT TEST FOR SERVICE TYPE      chi_square = 38.51, d_f = 1, prob = 0.0000
STAT TEST FOR NEW SERVICE       chi_square = 0.13, d_f = 1, prob E<5
STAT TEST FOR TAX PREPARATION   chi_square = 138.98, d_f = 1, prob = 0.0000
STAT TEST FOR FREQUENCY OF USE  chi_square = 2.00, d_f = 2, prob = 0.3694
STAT TEST FOR SALES             chi_square = 4.93, d_f = 3, prob = 0.1764
TABLE 003
Q19. Does this company prepare taxes?

                                    FREQUENCY              SALES
                                      OF USE       ======================
                                 ================        500-    1-
                                        ME-        <500    <1   4.9    5+
                          TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
                          -----   ---  ----  ----  ----  ----  ----  ----
Total                       151    56    30    41    37    18    49    25
                           100%  100%  100%  100%  100%  100%  100%  100%
N/A                           -     -     -     -     -     -     -     -
Yes                          64    25    14    22    15    11    18    14
                            42%   45%   47%   54%   41%   61%   37%   56%
No                           79    31    15    15    20     5    27    11
                            52%   55%   50%   37%   54%   28%   55%   44%
Don't know/not sure           8     -     1     4     2     2     4     -
                             5%          3%   10%    5%   11%    8%

CHI-SQUARE:                       <-- 2.00 -->         <-- 4.93 -->
D.F.:                                  2                    3
SIG:                                 0.3694               0.1764
TABLE 004
Q19. Does this company prepare taxes?

                                    FREQUENCY              SALES
                                      OF USE       ======================
                                 ================        500-    1-
                                        ME-        <500    <1   4.9    5+
                          TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
                          -----   ---  ----  ----  ----  ----  ----  ----
Total                       151    56    30    41    37    18    49    25
                           100%  100%  100%  100%  100%  100%  100%  100%
N/A                           -     -     -     -     -     -     -     -
Yes                          64    25    14    22    15    11    18    14
                            42%   45%   47%   54%   41%   61%   37%   56%
No                           79    31    15    15    20     5    27    11
                            52%   55%   50%   37%   54%   28%   55%   44%
Don't know/not sure           8     -     1     4     2     2     4     -
                             5%          3%   10%    5%   11%    8%

CHI-SQUARE (SIG):                <-- 0.3694 -->       <-- 0.1764 -->
TABLE 005
Q19. Does this company prepare taxes?

                                    FREQUENCY              SALES
                                      OF USE       ======================
                                 ================        500-    1-
                                        ME-        <500    <1   4.9    5+
                          TOTAL   LOW  DIUM  HIGH  MIL.  BIL.  BIL.  BIL.
                          -----   ---  ----  ----  ----  ----  ----  ----
Total                       151    56    30    41    37    18    49    25
                           100%  100%  100%  100%  100%  100%  100%  100%
N/A                           -     -     -     -     -     -     -     -
Yes                          64    25    14    22    15    11    18    14
                            42%   45%   47%   54%   41%   61%   37%   56%
No                           79    31    15    15    20     5    27    11
                            52%   55%   50%   37%   54%   28%   55%   44%
Don't know/not sure           8     -     1     4     2     2     4     -
                             5%          3%   10%    5%   11%    8%

DISCUSSION OF OUTPUT
While interpretation of these statistical tests for specific reports is beyond the scope of this manual, some explanation of the results might be helpful. For table 001, output from the ANOVA was as follows:
STAT TEST FOR SERVICE TYPE      anova = 0.02, df1,df2 = (1,143) prob = 0.8694
STAT TEST FOR NEW SERVICE       anova = 3.83, df1,df2 = (1,91) prob = 0.0504
STAT TEST FOR TAX PREPARATION   anova = 0.01, df1,df2 = (1,136) prob = 0.9204
STAT TEST FOR FREQUENCY OF USE  anova = 1.10, df1,df2 = (2,119) prob = 0.3374
STAT TEST FOR SALES             anova = 1.50, df1,df2 = (3,119) prob = 0.2178

The number following “anova =” is the value of the F statistic for the region tested. “df1,df2” are the degrees of freedom for the F statistic's numerator and denominator, respectively. “prob =” is the significance of the F statistic: the probability of seeing a value this large if there were no more than a coincidental relationship between the factors tested.
To check whether the desired row and banner categories are in fact being used in the ANOVA calculation, the following equations can be used:
df1 = (the number of banner points included in the ANOVA) - 1
df2 = (the sum of frequencies of all cells included in the ANOVA) - (the number of banner points included in the ANOVA)
If a column is blank, it is not considered as being included in the test.
Checking the “STAT TEST FOR SALES” ANOVA above, there are four banner points included in the test, resulting in
df1 = 4 - 1 = 3
Adding together all cells included in this test for each banner point (“DON'T KNOW/NOT SURE” is excluded) gives “<500 MIL.”=36, “500-<1 BIL.”=17, “1-4.9 BIL.”=47, and “5+ BIL.”=23. The number of banner points is four, resulting in
df2 = (36 + 17 + 47 + 23) - 4 = 119
For table 002, the chi-square test produced the following output:
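The degrees-of-freedom check for the ANOVA can also be scripted. The Python sketch below only restates the two equations given above; it does not compute the F statistic or its probability. The function name `anova_degrees_of_freedom` is an assumption made for this illustration, and the cell frequencies are copied from the rows of table 001 that carry statistics values (4, 3, 2, 1), one list per banner point.

```python
# Sketch of the df1/df2 check for an ANOVA region, using table 001.

def anova_degrees_of_freedom(column_frequencies):
    """column_frequencies: one list of included cell counts per banner point."""
    # A blank column is not considered part of the test.
    included = [cells for cells in column_frequencies if sum(cells) > 0]
    banner_points = len(included)
    total_frequency = sum(sum(cells) for cells in included)
    return banner_points - 1, total_frequency - banner_points

# SALES region: <500 MIL., 500-<1 BIL., 1-4.9 BIL., 5+ BIL.
# Each list holds the counts for rows 4, 3, 2, 1 (Don't know/not sure excluded).
sales = [[12, 12, 12, 0], [3, 12, 1, 1], [7, 22, 14, 4], [3, 12, 6, 2]]
print(anova_degrees_of_freedom(sales))
```

For the SALES region this returns df1 = 3 and df2 = 119, as printed under table 001. The same function applied to the FREQUENCY OF USE columns (LOW, MEDIUM, HIGH) returns (2, 119), which also matches the printed output.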
STAT TEST FOR SERVICE TYPE      chi_square = 38.51, d_f = 1, prob = 0.0000
STAT TEST FOR NEW SERVICE       chi_square = 0.13, d_f = 1, prob E<5
STAT TEST FOR TAX PREPARATION   chi_square = 138.98, d_f = 1, prob = 0.0000
STAT TEST FOR FREQUENCY OF USE  chi_square = 2.00, d_f = 2, prob = 0.3694
STAT TEST FOR SALES             chi_square = 4.93, d_f = 3, prob = 0.1764

Similar to the ANOVA output, the number following “chi_square =” is the value of the chi-square statistic for the region tested. “d_f” represents the degrees of freedom for this statistic. “prob =” has the same meaning as it does for the ANOVA. “E<5” means that the expected value of the frequency for 25% or more of the cells in the tested region is less than 5, possibly making probability calculations for the region invalid.
Checking that the desired categories are being tested is easier for the chi-square test than for the ANOVA. For degrees of freedom, the following equation applies:
d_f = (number of stub points included in the test - 1) * (number of banner points included in the test - 1)
If a row or column is blank, it is not considered as being included in the test.
Using “STAT TEST FOR SALES” as an example, “number of stub points included in the test”=2 since “DON'T KNOW/NOT SURE” is excluded, and “number of banner points included in the test”=4. Putting these numbers in the equation:
d_f = (2 - 1) * (4 - 1) = 1 * 3 = 3

OTHER ANOVA AND CHI-SQUARE OPTIONS
It is possible to mix ANOVA and chi-square tests on the same table, but only when the -CHI_SQUARE_ANOVA_FORMAT option is used to output the results in list format. Here is an example of an EDIT statement that would do this. The regions have been defined as in the above example.
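The chi-square figures for a region can be cross-checked outside Mentor. The Python sketch below computes the ordinary Pearson chi-square statistic, the degrees of freedom from the equation above, and the share of cells whose expected frequency is below 5 (the condition behind “E<5”). It is an independent check written for this illustration, not Mentor's implementation; the function name `chi_square` and its return layout are assumptions. It has been compared only against the SALES region of table 002. The 2 by 2 regions on that table print values that are consistent with a continuity correction, which this sketch does not apply; that is an inference from the printed numbers, not something stated in this manual.

```python
# Sketch: Pearson chi-square, degrees of freedom, and the E<5 condition
# for one tested region. Rows are stub points, columns are banner points.

def chi_square(observed):
    """observed: list of rows, each a list of cell counts per banner point."""
    # Blank rows and columns are not considered part of the test.
    rows = [r for r in observed if sum(r) > 0]
    keep = [j for j in range(len(rows[0])) if sum(r[j] for r in rows) > 0]
    rows = [[r[j] for j in keep] for r in rows]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(r[j] for r in rows) for j in range(len(keep))]
    grand = sum(row_totals)
    expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(rows, expected)
               for o, e in zip(obs_row, exp_row))
    d_f = (len(rows) - 1) * (len(keep) - 1)
    cells = [e for exp_row in expected for e in exp_row]
    low_share = sum(e < 5 for e in cells) / len(cells)
    return stat, d_f, low_share

# SALES region of table 002: rows Yes and No (Don't know/not sure excluded).
sales = [[15, 11, 18, 14], [20, 5, 27, 11]]
stat, d_f, low_share = chi_square(sales)
print(round(stat, 2), d_f, low_share)

# NEW SERVICE region: one of four expected values is below 5 (25% of cells).
new_service = [[4, 19], [8, 61]]
print(chi_square(new_service)[2] >= 0.25)
```

For the SALES region the sketch reproduces the printed chi_square of 4.93 with d_f = 3, and no expected value falls below 5. For the NEW SERVICE region exactly one of the four cells has an expected value below 5, which meets the 25% condition and agrees with the “E<5” printed for that region.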
EDIT={Q47_EDIT2:
CHI_SQUARE=BAN1_REG1 CHI_SQUARE=BAN1_REG2
CHI_SQUARE=BAN1_REG3 CHI_SQUARE=BAN1_REG4
ANOVA=BAN1_REG5 }
NOTE: An alternative way of defining regions and invoking statistical tests is as follows:
EDIT={EDIT1:
TABLE_TESTS=[$R 1 TO 2 BY 1 TO LAST]
TABLE_TESTS=[$R 3 TO 4 BY 1 TO LAST]
TABLE_TESTS=[$R 5 TO 6 BY 1 TO LAST]
ANOVA }
If this method is used, Mentor will not allow both ANOVA and chi-square on the same table.
Two other interesting possibilities are testing overlapping ranges and testing columns that are not next to each other. Again, -CHI_SQUARE_ANOVA_FORMAT must be used. Also, doing this might require some creative labeling of the horizontal axis. Here are some example specs:
BAN3_REG1: [$T="GENDER" $R 2 TO 3 BY 1 TO LAST]
BAN3_REG2: [$T="DRIVE CAR" $R 4 TO 5 BY 1 TO LAST]
BAN3_REG3: [$T="1-10, 11-49, & 50+ YRS OF DRIVING" $R 6,7,9 BY 1 TO LAST]
BAN3_REG4: [$T="<50 & 50+ YRS OF DRIVING" $R 8,9 BY 1 TO LAST]
''BAN3_REG3 defines columns which are not next to each other.
''BAN3_REG4 defines a region that overlaps BAN3_REG3.
EDIT={BAN3_EDIT:
-COLUMN_TNA,PERCENT_DECIMALS=0
COLUMN_WIDTH=7,STUB_WIDTH=25
-CHI_SQUARE_ANOVA_FORMAT
TABLE_TESTS=BAN3_REG1 TABLE_TESTS=BAN3_REG2
TABLE_TESTS=BAN3_REG3 TABLE_TESTS=BAN3_REG4 }
BANNER={BAN3_BANNER:
|             GENDER      DRIVE CAR        YEARS OF DRIVING
|          ===========    ==========   =========================
|  TOTAL   MALE FEMALE    YES     NO   1-10  11-49    <50    50+
|  -----   ---- ------    ---     --   ----  -----    ---    ---
}
BAN3_COL: TOTAL WITH &
[5^1/2] WITH &
[10^1/2] WITH &
[30.2*P#1-10/11-49/1-49/50-99]
EDIT={Q49_EDIT: ''Used as a LOCAL_EDIT to invoke tests for a
                ''specific question.
CHI_SQUARE=BAN3_REG1 CHI_SQUARE=BAN3_REG2
CHI_SQUARE=BAN3_REG3 CHI_SQUARE=BAN3_REG4 }
8.10 NOTES ON SIGNIFICANCE TESTING
8.10.1 What Can and Cannot Be Tested
- *L modifier
- all summary statistics other than means
- any arithmetic operation
- sigma
- sums
- NUMBER_OF_ITEMS or any other number returning function
8.10.2 Degrees of Freedom
8.10.3 Verifying Statistical Tests
test 1 (216 len, 2 groups, err=0, base_row=0):I=AB row/col=(15,15; 1,2) means ncases=28
group 1 12, 76, 564, 0,
group 2 16, 106, 778, 0,
effn,mean,std: 12 6.33333 2.74138 -- sumsq,sumsqadj,effn: 82.6667 1 12
effn,mean,std: 16 6.625 2.24722 -- sumsq,sumsqadj,effn: 75.75 1 16
tags: 1, 2,
getpoolv: 6.09295,2:26=158.417/26
doqs (6.09295,26) 0-1 tags: 1, 2,
-->0: (-0.437582,26)
SIGFAREA(26,-0.437582->0.05 for 1) returns 0.7572 from -0.309417 -> 0,0.7572
differences[ in AB: ]

This information can be used in numerous ways to check the statistical test performed. The first line tells us that this is a test of two groups, that the base row for statistics is the System Total row (base_row=0), that it is an independent test of columns A and B (:I=AB), that it uses row 15 and columns 1 and 2, and that it is a test of means. The number of cases in the statistics base is 28 (12 + 16). The line with -->0: (-0.437582,26) shows 0:(q-value, df). In the next line down, “returns 0.7572 from -0.309417 -> 0,0.7572” means “returns <significance> from <t-value>”. A printout like the one shown above would be generated for each pair of columns tested.
If our test were a dependent test of columns A, B, C, D, E, and F for all percent rows and the mean row, then the portion of the list file pertaining to this statistical test would look similar to this:
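Most of the intermediate figures in the two-group dump above can be reproduced by hand. The Python sketch below does so under one stated assumption: that the first three numbers on each “group” line are the count, the sum, and the sum of squares for that column. The manual does not define those fields; the reading is adopted here because it reproduces the printed means, standard deviations, and pooled variance. The helper name `summarize` is hypothetical, and the sketch does not compute the significance value, which needs the t distribution.

```python
import math

# Sketch: recompute the intermediate values of the I=AB means test
# from the "group" lines of the dump.

def summarize(n, total, total_sq):
    """Return (mean, standard deviation, corrected sum of squares)."""
    mean = total / n
    sumsq = total_sq - total * total / n   # sum of squared deviations
    return mean, math.sqrt(sumsq / (n - 1)), sumsq

# Assumed reading of the group lines: count, sum, sum of squares.
group1 = (12, 76, 564)
group2 = (16, 106, 778)

mean1, std1, ss1 = summarize(*group1)
mean2, std2, ss2 = summarize(*group2)
df = group1[0] + group2[0] - 2
pooled_variance = (ss1 + ss2) / df
t_value = (mean1 - mean2) / math.sqrt(
    pooled_variance * (1 / group1[0] + 1 / group2[0]))
q_value = t_value * math.sqrt(2)   # matches the value on the -->0: line

print(round(mean1, 5), round(std1, 5), round(mean2, 5), round(std2, 5))
print(round(pooled_variance, 5), df, round(t_value, 6), round(q_value, 6))
```

Under that assumption the means (6.33333 and 6.625), standard deviations (2.74138 and 2.24722), sums of squares (82.6667 and 75.75), pooled variance (6.09295 on 26 degrees of freedom), and t value (-0.309417) all agree with the dump, and the q-value on the -->0: line equals the t value multiplied by the square root of 2.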
test 1 (1232 len, 6 groups, err=0, base_row=0):ABCDEF row/col=(3,3; 1,6) percents ncases=166
group 1 12, 4, 4, 0,
group 2 16, 6, 6, 0,
group 3 14, 2, 2, 0,
group 4 27, 13, 13, 0,
group 5 69, 9, 9, 0,
group 6 28, 9, 9, 0,
sxy matrix all zero
effn,mean,std: 12 0.333333 0.492366 -- sumsq,sumsqadj,effn: 2.66667 1 12
effn,mean,std: 16 0.375 0.5 -- sumsq,sumsqadj,effn: 3.75 1 16
effn,mean,std: 14 0.142857 0.363137 -- sumsq,sumsqadj,effn: 1.71429 1 14
effn,mean,std: 27 0.481481 0.509175 -- sumsq,sumsqadj,effn: 6.74074 1 27
effn,mean,std: 69 0.130435 0.339248 -- sumsq,sumsqadj,effn: 7.82609 1 69
effn,mean,std: 28 0.321429 0.475595 -- sumsq,sumsqadj,effn: 6.10714 1 28
tags: 1, 2, 3, 4, 5, 6,
multiple comparisons
getpoolv: 0.1931,2:165=43/166
doq (0.1931,165) 0-5 tags: 1, 2, 3, 4, 5, 6,
-->0: (-0.351143,26) (1.55823,24) (-1.37423,37)(2.08774,79) (0.111041,38)
-->1: (2.04147,28) (-1.08619,41) (2.83657,83) (0.550136,42)
-->2: (-3.309,39) (0.136389,81) (-1.75572,40)
-->3: (4.97691,94) (1.90971,53)
-->4: (-2.74322,95)
doqs (0.1931,165) 0-5 tags: 1, 2, 3, 4, 5, 6,
-->0: (-0.316228,27) (1.59364,25) (-1.2021,38) (2.48386,80) (0.102869,39)
-->1: (1.99451,29) (-0.94989,42) (3.25042,84) (0.50417,43)
-->2: (-2.98179,40) (0.175692,82) (-1.73374,41)
-->3: (5.17632,95) (1.69734,54)
-->4: (-3.08478,96)
differences[ in ABCDEF: D vs E; ]
differences[ D vs E; ]

The first line tells us that this is a test of six groups, that the base row is the System Total row (base_row=0), and that it is a dependent test of columns A, B, C, D, E, and F (:ABCDEF). The test uses row 3 and columns 1 to 6, and it is a test of percents. The lines following the one beginning with “doqs (0.1931,165)” show:
(q-value,df) for
-->0: (A vs B) (A vs C) (A vs D) (A vs E) (A vs F)
-->1: (B vs C) (B vs D) (B vs E) (B vs F)
-->2: (C vs D) (C vs E) (C vs F)
-->3: (D vs E) (D vs F)
-->4: (E vs F)
A printout like this would be available for each row tested.
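When a dump covers many columns, matching each (q-value,df) entry to its column pair by eye is error prone. The Python sketch below attaches the pair labels in the order shown in the layout above, using the values from the “doqs” block of the six-column example. The function name `label_pairs` is hypothetical, and the sketch makes no claim about how Mentor decides which differences are significant; it only labels the entries and reports the one with the largest absolute q-value.

```python
from itertools import combinations

# Sketch: label the (q-value, df) entries of a multi-column dump.

def label_pairs(columns, rows_of_values):
    """Pair each (q-value, df) entry with the two columns it compares."""
    pairs = list(combinations(columns, 2))  # AB, AC, ..., EF in dump order
    flat = [entry for row in rows_of_values for entry in row]
    if len(flat) != len(pairs):
        raise ValueError("expected one entry per column pair")
    return {f"{a} vs {b}": entry for (a, b), entry in zip(pairs, flat)}

# The -->0: through -->4: lines of the "doqs" block in the dump above.
doqs = [
    [(-0.316228, 27), (1.59364, 25), (-1.2021, 38), (2.48386, 80), (0.102869, 39)],
    [(1.99451, 29), (-0.94989, 42), (3.25042, 84), (0.50417, 43)],
    [(-2.98179, 40), (0.175692, 82), (-1.73374, 41)],
    [(5.17632, 95), (1.69734, 54)],
    [(-3.08478, 96)],
]

labelled = label_pairs("ABCDEF", doqs)
largest = max(labelled, key=lambda name: abs(labelled[name][0]))
print(labelled["A vs B"], largest)
```

The pair with the largest absolute q-value is D vs E, the same pair reported on the “differences[ D vs E; ]” line of the dump.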
8.10.4 Error and Warning Messages
8.10.5 Commands Summary
STATEMENT/KEYWORD/OPTION                  SECTION

AXIS
----
$[BASE]                                   8.2.2 and 8.4
$[EFFECTIVE_N]                            8.4
$[DO_STATISTICS]                          8.6.2
$[MEDIAN]                                 7.2.2
$[INTERPOLATED_MEDIAN]                    7.2.2
[col.wid^code/(STATISTICS)/code/code]     7.2

EDIT=
-----
ALL_POSSIBLE_PAIRS_TEST                   8.3.1
ANOVA_SCAN                                8.3.3
DO_PRINTER_STATISTICS                     8.5.1
DO_STATISTICS                             8.1.3, 8.1.5, 8.1.6, and 8.1.7
DO_STATISTICS_TEST                        8.5.1
FISHER                                    8.3.3
FLAG_MINIMUM_BASE                         8.6.3
MARK_CHI_SQUARE                           8.9
MINIMUM_BASE                              8.6.3
NEWMAN_KEULS_TEST                         8.3.2
PAIRED_VARIANCE                           8.3.4
POOLED_VARIANCE                           8.3.4
SEPARATE_VARIANCE                         8.3.4
USUAL_VARIANCE                            8.3.4

STATISTICS=
-----------
ABCD                                      8.1.1
I=ABCD                                    8.1.1
T=ABCD                                    8.1.1 and 8.1.8
PRINTABLE_T                               8.1.1 and 8.7.1
D=1,2                                     8.8.1
P=1,2                                     8.8.2
RM=ABCD                                   8.3.3

STUB=
-----
BASE_ROW                                  8.2.1
DO_SIG_T=                                 8.7.1
DO_STATISTICS=                            8.5.3
DO_T_TEST=                                8.7.1

SET OPTIONS
-----------
MEAN_STATISTICS_ONLY                      8.6.1
MISSING_OR_DUPLICATED_BASE
MULTIPLE_WEIGHT_STATISTICS                8.4.1
STATISTICS_BASE_AR                        8.2.1
STATISTICS_DUMP                           8.9.2