[toc]
INTRODUCTION
This chapter explains how to produce tables other than straightforward simple cross-tabulations. When producing more complex tables, the biggest problem is usually understanding how the table is constructed rather than using the proper syntax to construct the table. This chapter is designed to help you understand both.
This chapter is broken into major sections by the type of table you are trying to produce. Each section has a set of sequentially numbered tables starting from n01, where n is the sub-section number. The beginning of each section contains a description of when it is usually appropriate to use such a table construction. Each section is then broken down further into a particular type of table design. In each sub-section there is a description of the different ways to produce that table, and an example of how to produce it.
This chapter assumes a basic understanding of simple table building and of CfMC terminology. There are many references to things such as the EDIT statement and joiners like WITH; if you do not know what they are you may have difficulty following along. A review of Chapters 1 through 5 may be necessary.
6.1 TOP BOX/BOTTOM BOX SUMMARY TABLES
A Top Box table is one in which the highest rating for one of a series of brands or attributes is compared against the highest ratings of the others. The Top Box usually contains those respondents who have given the highest rating to that particular brand, but sometimes also contains those who have given the second and/or third highest rating as well. These tables may have a percentage base of either the total sample, or of the number of respondents that were asked about each brand if this might be different for each. To percentage each brand off the number responding to that brand you will need to define the percentage base along with the Top Box for each brand. In addition you will need to make sure that you use the stub options [VERTICAL_PERCENT=*,SUPPRESS] on each base row so that it does not print (SUPPRESS) and the next row is percentaged off of it (VERTICAL_PERCENT=*).
You may also need to produce a Bottom Box table which is similar in design to the Top Box except it contains those respondents who have given the lowest rating to each brand.
6.1.1 Top Box Tables with Constant Percentage Base
We’ll start with a Top Box table with a constant percentage base such as total or some sub-base which is applied to the entire table. For purposes of the example below, the respondent’s gender is stored in column 5 while columns 7 through 11 hold the rating for five different brands (A-E), where the rating scale is from 1 to 4, 4 being the highest possible rating.
There are usually several ways to write any variable definition. In the examples below the shortest and most computer efficient usage is included in the tabset, while other perfectly reasonable definitions are also noted after the tabset. Note that any of these definitions (and many others also) will produce the same table and although we recommend the one inside the tabset, you can use any that is easy for you to understand.
NOTE: The following set of commands define a standard front end for the all the examples in this section, except where noted.
>PRINT_FILE TOPBX
~INPUT TOPBX
>CREATEDB TOPBX,DUPLICATE=WARN
~SET AUTOMATIC_TABLES,DROP_LOCAL_EDIT, DROP_BASE,BEGIN_TABLE_NAME=T101
~DEFINE
TABLE_SET= {BAN1:
EDIT=: COLUMN_WIDTH=8, STUB_WIDTH=20, -COLUMN_TNA }
STUB_PREFACE=:
TOTAL
[SUPPRESS] NO ANSWER }
BANNER=:
| GENDER
| <———->
| TOTAL MALE FEMALE
| —– —- ——}
COLUMN=: TOTAL WITH [5#1/2]
}
~EXECUTE
TABLE_SET= BAN1
These commands are specific to this example:
~DEFINE
TABLE_SET= {TAB101:
HEADER=: TOP BOX TABLE EXAMPLE}
TITLE=: TOP BOX SUMMARY TABLE FOR OVERALL RATING}
STUB=:
BRAND A-TOP BOX
BRAND B-TOP BOX
BRAND C-TOP BOX
BRAND D-TOP BOX
BRAND E-TOP BOX }
ROW=: [07,…,11#4]
}
Here are some alternate ways to write the row variable:
ROW101A: [7#4] WITH [8#4] WITH [9#4] WITH [10^4] WITH [11#4]
ROW101B: &
>REPEAT $A=07,…,11;STRIP=”WITH &”
[$A#4] WITH &
>END_REPEAT
~EXECUTE
TABLE_SET= TAB101
Here is the table Mentor prints:
TOP BOX TABLE EXAMPLE
TABLE 101
TOP BOX SUMMARY TABLE FOR OVERALL RATING
GENDER
<———–>
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 52 48
100.0% 100.0% 100.0%
BRAND A-TOP BOX 21 9 12
21.0% 17.3% 25.0%
BRAND B-TOP BOX 27 14 13
27.0% 26.9% 27.1%
BRAND C-TOP BOX 30 15 15
30.0% 28.8% 31.3%
BRAND D-TOP BOX 19 11 8
19.0% 21.2% 16.7%
BRAND E-TOP BOX 25 13 12
25.0% 25.0% 25.0%
6.1.2 Top Box Tables with a Changing Percentage Base
When producing a Top Box table you may want to percentage each row off a different base. Suppose each rating scale was only asked of those who have used that brand; this base value will probably be different for each brand. Further suppose that the original rating scale tables were percentaged back to this base and for purposes of this summary table you want to maintain those same percentages. To do this, you will need to not only define each Top Box, but also its percentage base.
The syntax to produce this table is very similar to producing the Top Box table with constant percentage (See 6.1.1 Top Box Tables with Constant Percentage Base), so make sure you understand that before proceeding. One very important difference is that you will need to create a new label set which controls the printing and percentage base in the table. Each base row label will want to have the [SUPPRESS] option to suppress its printing and the [VERTICAL_PERCENT=*] option to cause the next row to be percentaged off of it.
The following example assumes the same scenario as the previous example. The only valid response to one of the rating scale questions is 1,2,3,4 or 9. If you compare the frequencies in Table 102 with Table 101 you will see that they are the same, but the percentages are generally much higher in Table 102.
~DEFINE
TABLE_SET= {TAB102:
HEADER=: TOP BOX TABLE WITH DIFFERENT PERCENTAGE BASE ON EACH ROW}
TITLE=: TOP BOX SUMMARY TABLE FOR OVERALL RATING}
TITLE_4=: BASE= RESPONDENTS WHO USED THE BRAND}
STUB=:
[VERTICAL_PERCENT=*,SUPPRESS] BRAND A BASE-WON’T PRINT
BRAND A TOP BOX
[VERTICAL_PERCENT=*,SUPPRESS] BRAND B BASE-WON’T PRINT BRAND B TOP BOX
[VERTICAL_PERCENT=*,SUPPRESS] BRAND C BASE-WON’T PRINT BRAND C TOP BOX
[VERTICAL_PERCENT=*,SUPPRESS] BRAND D BASE-WON’T PRINT
BRAND D TOP BOX
[VERTICAL_PERCENT=*,SUPPRESS] BRAND E BASE-WON’T PRINT BRAND E TOP BOX }
ROW=: [07,…,11#1-4,X/4]
}
Here is an alternate way to write the row variable:
ROW102A: [7#1-4,X/4] WITH [8#1-4,X/4] WITH [9#1-4,X/4] WITH &
[10#1-4,X/4] WITH [11#1-4,X/4]
~EXECUTE
TABLE_SET= TAB102
Here is the table that Mentor prints:
TOP BOX TABLE WITH DIFFERENT PERCENTAGE BASE ON EACH ROW
TABLE 102
TOP BOX SUMMARY TABLE FOR OVERALL RATING
BASE= RESPONDENTS WHO USED THE BRAND
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 52 48
100.0% 100.0% 100.0%
BRAND A TOP BOX 21 9 12
22.3% 18.0% 27.3%
BRAND B TOP BOX 27 14 13
40.3% 35.9% 46.4%
BRAND C TOP BOX 30 15 15
34.5% 34.9% 34.1%
BRAND D TOP BOX 19 11 8
43.2% 42.3% 44.4%
BRAND E TOP BOX 25 13 12
28.4% 29.5% 27.3%
NOTE: When the percentage base is changing, you may want to suppress the Total row, since it is not the percentage base for the rows under it and this might be confusing for anyone reading the table. To suppress the Total row you can use a different STUB_PREFACE that looks like the one below.
STUB_PREFACE=: [SUPPRESS] TOTAL
[SUPPRESS] NO ANSWER }
If you wish to do significance testing on a top box table with a changing percentage base, you need to create a $[base] row for every tox box in the table. See section 8.2.4 for an example of how to do this.
6.1.3 Ranking of Top Box Tables
Top box tables which are percentaged off a constant base, can be ranked by just using the RANK option on a LOCAL_EDIT statement. It is much more difficult to rank a Top Box with a changing percentage base. Since you most likely want to rank this table off of the percentages, that percentage row must be created in the table.
As with the Top Box table with a changing percentage base (See 6.1.2 Top Box Tables with a Changing Percentage Base), extra rows and stubs must be created in order to produce the desired result. In this instance, three rows will be created for each row printed in the table. The percentage needs to be created first, followed by the percentage base, followed by the frequency.
To create the percentage you must realize that a percentage is the same as a mean with a value of 100 assigned to those who are in the topbox and a value of 0 assigned to those who are in the base, but not in the topbox. This first row is set to rank level 1 while the 2 subsequent rows are set to rank level 2 so that however the percentage row is ranked the 2 other rows will remain below it. See 6.7.2 Ranking With Nets And Sub-Nets for more information about setting rank levels.
~DEFINE
TABLE_SET= {TAB103:
LOCAL_EDIT=: RANK_LEVEL=1, STUB_RANK_INDENT=0, RANK_COLUMN_BASE=1
}
HEAD=: RANKED TOP BOX TABLE WITH DIFFERENT PERCENTAGE BASE ON EACH ROW}
TITLE=: TOP BOX SUMMARY TABLE FOR OVERALL RATING}
TITLE_4=: BASE= RESPONDENTS WHO USED THE BRAND}
STUB=:
[SUPPRESS,RANK=1] BRAND A TOP BOX PERCENTAGE (WON’T PRINT)
[VERTICAL_PERCENT=*,SUPPRESS,RANK=2] BRAND A BASE (WON’T PRINT)
[RANK=2] BRAND A TOP BOX
[SUPPRESS,RANK=1] BRAND B TOP BOX PERCENTAGE (WON’T PRINT)
[VERTICAL_PERCENT=*,SUPPRESS,RANK=2] BRAND B BASE (WON’T PRINT)
[RANK=2] BRAND B TOP BOX
[SUPPRESS,RANK=1] BRAND C TOP BOX PERCENTAGE (WON’T PRINT)
[VERTICAL_PERCENT=*,SUPPRESS,RANK=2] BRAND C BASE (WON’T PRINT)
[RANK=2] BRAND C TOP BOX
[SUPPRESS,RANK=1] BRAND D TOP BOX PERCENTAGE (WON’T PRINT)
[VERTICAL_PERCENT=*,SUPPRESS,RANK=2] BRAND D BASE (WON’T PRINT)
[RANK=2] BRAND D TOP BOX
[SUPPRESS,RANK=1] BRAND E TOP BOX PERCENTAGE (WON’T PRINT)
[VERTICAL_PERCENT=*,SUPPRESS,RANK=2] BRAND E BASE (WON’T PRINT)
[RANK=2] BRAND E TOP BOX
}
ROW=: &
>REPEAT $A=07,…,11; STRIP=”&”
$[MEAN] SELECT_VALUE([$A#4/1-3,9],VALUES(100,0)) $[] [$A#1-4,X/4] &
>END_REPEAT
}
~EXECUTE
TABLE_SET= TAB103
Here is the table that Mentor prints:
RANKED TOP BOX TABLE WITH DIFFERENT PERCENTAGE BASE ON EACH ROW
TABLE 103
TOP BOX SUMMARY TABLE FOR OVERALL RATING
BASE= RESPONDENTS WHO USED THE BRAND
GENDER
<———–>
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 52 48
100.0% 100.0% 100.0%
BRAND D TOP BOX 19 11 8
43.2% 42.3% 44.4%
BRAND B TOP BOX 27 14 13
40.3% 35.9% 46.4%
BRAND C TOP BOX 30 15 15
34.5% 34.9% 34.1%
BRAND E TOP BOX 25 13 12
28.4% 29.5% 27.3%
BRAND A TOP BOX 21 9 12
22.3% 18.0% 27.3%
Notice in the above table, that BRAND D prints highest on the table because it has the highest percentage, even though it has the lowest frequency. The only difference between Table 103 and Table 102 is the order in which the stubs are printed.
6.2 SUMMARY STATISTICS (MEANS)
This section shows how to produce summary statistics such as the mean, the standard deviation, the standard error, and the median on a table. It shows how you can efficiently and effectively produce these statistics no matter how the data was originally coded. In this section whenever the creation of a mean is discussed, you can replace the mean with any similar summary statistic (except where noted). For a complete list of all the summary statistics that can be produced see Appendix B: TILDE COMMANDS, ~DEFINE AXIS=.
If you wish to do significance testing on a top box table with a changing percentage base, you need to create a $[BASE] row for every tox box in the table. See section 8.2.4 for an example of how to do this.
Means are usually produced on one of the following types of questions:
- Rating Scales
- A scale such as age where a number has been coded into a range and you want to use the midpoint of the range in order to calculate the mean.
- An actual value is stored in the data.
There are two different ways to produce means. You can either produce the mean directly on the variable definition or you can use the EDIT statement options. The variable definition method will allow you to produce an appropriate mean in all circumstances while the EDIT statement options are only appropriate on means of type 1 or 2 above if you are doing neither weighting nor dependent significance testing. See 6.2.5 Means And Medians Using The EDIT Options for an explanation on how to produce summary statistics using the EDIT options. That section also discusses the differences in the two methods and the pros and cons of each. See 5.3 Changing Table Specifications for a description of the syntax for creating a summary statistic as part of the variable definition.
It is very important to make sure that you create your own summary columns (total, no answer, etc.) when you are producing summary statistics. The values that will be in the system-generated columns will reflect the number of respondents who were used in that statistic and not the true value of the statistic.
The number of decimal places that the statistic will print may be changed by using the STATISTICS_DECIMALS=# option on the EDIT, STUB, or COLUMN_INFO statements. The default decimal significance is 1 and valid settings are 0-7.
NOTE: The program prints a question mark (?) when the statistic is missing; i.e., it cannot be calculated because there is no respondent in that cell who has a valid number in the calculation. See Table 263 in 6.2.8 Summary Statistics with Arithmetic for an example.
6.2.1 Means on Rating Scales Using the Variable Definition
Typically, creating a mean on a rating scale will follow one of these scenarios:
- No recode needed (values for mean = punch and DK coded as X or Y punch)
- Exclude values (values for mean = punch and DK coded as a number)
- Reverse the scale (values are in inverted order and DK coded as X or Y punch)
- Reverse the scale and exclude values (values are inverted and DK coded as a number)Scale is 1 to 10 with 10 coded as a 0, or scale is 0 to 10 with X or Y coded as 10.
WITH NO RECODING NEEDED
If the value of the rating has been coded with the corresponding punch and the Don’t Know has been coded as a non-numeric code (X or Y punch), then there is no need to do any recoding on the mean. When no recoding is required you only need to define the row variable and specify the data location after the keyword $[MEAN].
NOTE: In the stub you will need to include a label for the mean and you will want to make sure you mark it as a statistics row, so that it will be printed with the appropriate number of decimals and without any percentage.
In the example below, the gender question is stored in column 5 while column 7 is a 4 point rating scale with 4 equal to Excellent and Don’t Know coded as an X punch. STATISTICS_DECIMALS=2 is specified on the EDIT statement so that the statistics will print with two decimal places of significance.
NOTE: The following set of commands define a standard front end for the next set of examples.
>PRINT_FILE MEAN
~INPUT MEAN
~SET AUTOMATIC_TABLES,DROP_LOCAL_EDIT,DROP_BASE, BEGIN_TABLE_NAME=T201
~DEFINE
TABLE_SET= {BAN1:
EDIT=: COLUMN_WIDTH=8,STUB_WIDTH=20,-COLUMN_TNA,STATISTICS_DECIMALS=2 }
STUB_PREFACE=: TOTAL
[SUPPRESS] NO ANSWER }
BANNER=:
| GENDER
| <———->
| TOTAL MALE FEMALE
| —– —- ——}
COLUMN=: TOTAL WITH [5^1/2]
}
~EXECUTE
TABLE_SET= BAN1
These commands are exclusive to this example.
~DEFINE
TABLE_SET= {TAB201:
HEADER=: MEAN AND OTHER SUMMARY STATISTICS ON A RATING SCALE WITH NO RECODING}
TITLE=: OVERALL RATING OF PRODUCT A}
STUB=:
EXCELLENT (4)
OOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[STAT] MEAN
[STAT] STD DEV
[STAT] STD ERROR
[STAT, STAT_DECIMALS=0] MEDIAN }
ROW=: [7^4//1/X] $[MEAN,STD,SE,MEDIAN] [7]
}
~EXECUTE
TABLE_SET= TAB201
Here is the table Mentor prints:
MEAN AND OTHER SUMMARY STATISTICS ON A RATING SCALE WITH NO RECODING
TABLE 201
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
EXCELLENT (4) 15 11 4
15.0% 19.0% 9.5%
GOOD (3) 15 10 5
15.0% 17.2% 11.9%
FAIR (2) 23 12 11
23.0% 20.7% 26.2%
POOR (1) 24 15 9
24.0% 25.9% 21.4%
DON’T KNOW 23 10 13
23.0% 17.2% 31.0%
MEAN 2.27 2.35 2.14
STD DEV 1.11 1.16 1.03
STD ERROR 0.13 0.17 0.19
MEDIAN 2 2 2
WITH A NUMERIC DON’T KNOW EXCLUDED
If the value of the rating has been coded with the corresponding punch but the Don’t Know has been coded as a numeric code (like 5), then if you do not put a qualifier on the mean, the program will use that value (5) in calculating the mean rather than excluding respondents who said Don’t Know. In order to ensure proper calculation of the mean you need to use either the *RANGES= modifier or the WHEN joiner to only include appropriate items in the calculation. The *RANGES= modifier allows you to specify which numbers will be included in the statistic calculation. You can use either a dash to signify a range or a comma to separate individual items. See 5.1.2 Vector Joiners for an explanation of the WHEN joiner.
The assumptions in the example below are the same as the previous example, except that the rating scale is now stored in column 8 and the Don’t Know was coded as a 5 punch.
~DEFINE
TABLE_SET= {TAB202:
HEADER=: MEAN EXCLUDING A DON’T KNOW CODED AS A NUMBER}
TITLE=: OVERALL RATING OF PRODUCT B}
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[STAT] MEAN }
ROW=: [8^4//1/5] $[MEAN] [8*RANGES=1-4]
}
~EXECUTE
TABLE_SET= TAB202
Here is an alternate way to write the row variable:
ROW202A: [8^4//1/5] $[MEAN] [8] WHEN [8^1-4]
Here is the table Mentor prints:
MEAN EXCLUDING A DON’T KNOW CODED AS A NUMBER
TABLE 202
OVERALL RATING OF PRODUCT B
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
EXCELLENT (4) 15 11 4
15.0% 19.0% 9.5%
GOOD (3) 15 10 5
15.0% 17.2% 11.9%
FAIR (2) 23 12 11
23.0% 20.7% 26.2%
POOR (1) 24 15 9
24.0% 25.9% 21.4%
DON’T KNOW 23 10 13
23.0% 17.2% 31.0%
MEAN 2.60 2.46 2.81
WITH THE SCALE REVERSED
Sometimes rating scales are coded such that a 1 signifies the highest possible rating and the highest number in the scale signifies the lowest rating. If you produce a mean on this variable without doing any recoding, then the lower the value of the mean, the higher the overall rating. In order to make a higher mean reflect a higher rating, you need to reverse the scale so that a 1 punch now has the highest value and the highest punch now has a value of 1. There are basically two different approaches, the subtraction method and the function method.
Using the subtraction method, you reverse the scale by taking one more than the highest value in the scale and subtracting the variable or column location from it. Suppose you have a 7 point scale coded in column 20, but you want the 1 punch to have a value of 7 on down to the 7 punch having a value of 1. So you would define the mean as $[MEAN] 8 – [20]. This is how Mentor reverses a scale:
If there is a 1 in column 20, it becomes (8 – 1) = 7
2 (8 – 2) = 6
3 (8 – 3) = 5
4 (8 – 4) = 4
5 (8 – 5) = 3
6 (8 – 6) = 2
7 (8 – 7) = 1
See below for an example of the subtraction method inside of a table set.
A second way to reverse the scale is by using a function to reassign the values you want to use for purposes of calculating the mean. You can use either the SUBSCRIPT or the SELECT_VALUE function. The SUBSCRIPT function assigns the values as the subscript position of each category. This means that the first category in the variable is assigned a value of 1, the second a value or 2, and so on. If the variable is multiple then no value is assigned. The SELECT_VALUE function assigns the values based on the position of the category in the variable and by what values are written in the VALUES portion of the function. See 8.3.2 FUNCTIONS for more information on these functions.
In the example below, the 4 point rating scale is stored in column 10 with Excellent coded as a 1, Poor coded as a 4, and Don’t Know coded as an X punch.
~DEFINE
TABLE_SET= {TAB203:
HEADER=: MEAN REVERSING THE SCALE AND DON’T KNOW CODED AS AN X OR Y}
TITLE=: OVERALL RATING OF PRODUCT C}
TUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[STAT] MEAN }
ROW=: [10^1//4/X] $[MEAN] 5 – [10]
~EXECUTE
TABLE_SET= TAB203
}
Here are some alternate ways to write the row variable:
ROW203A: [10^1//4/X] $[MEAN] SUBSCRIPT([10^4//1])
ROW203B: [10^1//4/X] $[MEAN] SELECT_VALUE([10^4//1],VALUES(1,2,3,4))
The printed table will look fundamentally the same as Table 202 above.
WITH THE SCALE REVERSED AND DK/NA CODED AS NUMERIC
If the rating scale is coded in reverse as in table 203 and the Don’t Know is a numeric code as in Table 202, then you will need to use one of the following methods to properly calculate the mean. Either you need to combine the subtraction method of Table 203 with the exclusion method of Table 202 or you can just use either the SUBSCRIPT or SELECT_VALUE functions as in Table 203.
In the example below, the 4 point rating scale is coded in column 9 with a 1 punch signifying Excellent, a 4 punch signifying Poor and a 5 punch standing for Don’t Know.
~DEFINE
TABLE_SET= {TAB204:
HEADER=: MEAN REVERSING THE SCALE AND EXCLUDING DON’T KNOW}
TITLE=: OVERALL RATING OF PRODUCT D}
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[STAT] MEAN }
ROW=: [11^1//5] $[MEAN] SUBSCRIPT([11^4//1])
}
~EXECUTE
TABLE_SET= TAB204
Here are some alternate ways to write the row variable:
ROW204A: [11^1//5] $[MEAN] 5 – [11*RANGES=1-4]
ROW204B: [11^1//5] $[MEAN] (5 – [11]) WHEN [11^1-4]
ROW204C: [11^1//5] $[MEAN] SELECT_VALUE([11^1//4],VALUES(4,3,2,1))
The printed table will look fundamentally the same as Table 202 above.
WITH 10 CODED AS A ZERO, AN X, OR Y
If you have a 10 point rating scale that has been coded in one column to save space, the 0 is probably used to stand for a rating of 10. If you do no recoding on this variable, the mean will be low as all those who rated the item a 10 will be assigned the value of 0 for purposes of calculating the mean. In order to properly calculate this mean you will want to use the SUBSCRIPT function.
In the example below a 10 point rating scale is coded in column 12, with Don’t Know coded as a Y punch. If the scale is reversed, all you need to do is reverse the order of the categories within the SUBSCRIPT function.
~DEFINE
TABLE_SET= {TAB205:
HEADER=: MEAN USING THE SUBSCRIPT FUNCTION TO RECODE 0 AS 10}
TITLE=: 10 POINT OVERALL RATING OF PRODUCT E}
STUB=:
>REPEAT
$A=ONE,TWO,THREE,FOUR,FIVE,SIX,SEVEN,EIGHT,NINE,TEN,”DK/NA”
$A
>END_REPEAT
[STAT] MEAN }
ROW=: [12^1//0/Y] $[MEAN] SUBSCRIPT([12^1//0])
}
~EXECUTE
TABLE_SET= TAB205
Here is an alternate way to write the row variable:
ROW205A: [12^1//0/Y] $[MEAN] SELECT_VALUE([12^1//0],VALUES(1,2,…,10))
Here is the table Mentor prints:
MEAN USING THE SUBSCRIPT FUNCTION TO RECODE 0 AS 10
TABLE 205
10 POINT OVERALL RATING OF PRODUCT E
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
ONE 9 4 5
9.0% 6.9% 11.9%
TWO 13 8 5
13.0% 13.8% 11.9%
THREE 5 4 1
5.0% 6.9% 2.4%
FOUR 10 6 4
10.0% 10.3% 9.5%
FIVE 9 4 5
9.0% 6.9% 11.9%
SIX 9 4 5
9.0% 6.9% 11.9%
SEVEN 7 2 5
7.0% 3.4% 11.9%
EIGHT 8 6 2
8.0% 10.3% 4.8%
NINE 10 5 5
10.0% 8.6% 11.9%
TEN 11 9 2
11.0% 15.5% 4.8%
DK/NA 9 6 3
9.0% 10.3% 7.1%
MEAN 5.52 5.71 5.26
One other possible scenario very similar to this, is if the scale is from 0 to 10 with the 0 (or 10) punch standing for 0, and either the X or Y punch standing for 10. This row definition would look the same as above, except that you would probably always want to use the SELECT_VALUE function. In the example below, assume that the 0 stands for 0, the X for 10, and the Y punch for Don’t Know. Only the row variable definition is shown.
ROW205X: [12^0/1//9/X/Y] $[MEAN] &
SELECT_VALUE([12^0/1//9/X],VALUES(0,1,…,10))
6.2.2 Means For Range Type Variables
Sometimes you need to calculate a mean on data that has been coded as a range type variable with a code standing for a range of numbers. For example, you might have the age question coded so that a 1 punch means 18 to 30 and a 2 punch means 31 to 45 and so on. If you were to just do a straight mean on this table your result would be some number like 3.5, so to ensure you get a mean reflecting the real value you need to use the SELECT_VALUE function to assign the midpoint of each range to the category. To determine the midpoint of the range, add the starting point and the ending point of the range and divide by 2 (i.e., (18+30)/2 = 24). Open-ended categories (Under 18 or Over 60) will need someone to decide what the midpoint value should be. It is usually a good idea to include the value that was used on each stub to reduce possible confusion about how the mean was derived.
NOTE: If you try to calculate a regular median with this method, it will just be the midpoint value of where the middle respondent exists. Instead, you would want to use an interpolated median, which produces a median value that is the mid-point between the true median and next value beyond it. TABLE 206 below includes both the meidan and interpolated median for comparision.
INTERPOLATED MEDIANS
If you try to calculate a median with this method, it will just be the midpoint value of where the middle respondent exists. You might want to use INTERPOLATED_MEDIAN instead.
Mentor provides three different kinds of medians:
$[MEDIAN]
$[INTERPOLATED_MEDIAN] and
~EDIT COLUMN_MEDIAN.
$[MEDIAN] is best for numeric data. $[INTERPOLATED_MEDIAN] and COLUMN_MEDIAN are two different types of interpolated medians, and they are better for rating scales and range questions. It must be noted here that interpolated medians are considered a “junk statistic” by many marketing people.
A true median is always the value of an existing data element. To find this median, values must be sorted from low to high. If there are an even number of elements, the median will be the value in the N/2 position, where N is the number of data elements. If there are an odd number of elements, then the median will be the item in the (N+1)/2 position. The unfortunate result of this is that most rating scales and range questions will always have the midpoint of the middle category as their median, because if the median falls anywhere in that category, the value is that is assigned is the midpoint. Interpolated medians breaks the range of data into pieces and can make the median higher or lower, depending on the number of people in each category. If there is an odd number of values, then MEDIAN and INTERPOLATED_MEDIAN will produce the same result. If there is an even number of values, INTERPOLATED_MEDIAN will find and average the two middle values. For example, if the values are 1, 3, 5, and 7, the INTERPOLATED_MEDIAN is 4.
$INTERPOLATED_MEDIAN will work on unweighted data and data weighted with integer weights. Due to rounding issues, INTERPOLATED_MEDIAN is not recommended with fractional weights.
In the following example, both MEDIAN and INTERPOLATED_MEDIAN are included to show the difference between them. For details about ~EDIT COLUMN_MEDIAN, see 6.2.5 MEANS AND COLUMN_MEDIANS USING THE EDIT OPTIONS.
In the example below, the age question was coded as range variable in column 13 with a 1 punch signifying age 18 to 30, a 2 punch the 31 to 45 age group, a 3 punch the 46 to 60 group, a 4 punch the over 60 group, and a 5 punch for Don’t Know.
~DEFINE
TABLE_SET= {TAB206:
HEADER=: MEAN AND MEDIAN USING THE SELECT FUNCTION TO ASSIGN
MIDPOINT VALUES TO CATEGORIES}
TITLE: AGE OF RESPONDENT}
STUB=:
18 – 30 (24)
31 – 45 (38)
46 – 60 (53)
OVER 60 (61) DON’T KNOW
[STAT] MEAN
[STAT, STAT_DECIMALS=0] MEDIAN
[STAT, STAT_DECIMALS=0] INTERPOLATED MEDIAN }
ROW=: [13^1//5] $[MEAN,MEDIAN,INTERPOLATED_MEDIAN]&
SELECT_VALUE( [13^1//4], VALUES(24,38,53,61) ) }
~EXECUTE
TABLE_SET= TAB206
Here is the table Mentor prints:
MEAN AND MEDIAN USING THE SELECT FUNCTION TO ASSIGN MIDPOINT VALUES TO CATEGORIES
TABLE 206
AGE OF RESPONDENT
GENDER
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 200 57 66
100.0% 100.0% 100.0%
18 – 30 (24) 37 12 13
18.5% 21.1% 19.7%
31 – 45 (38) 35 8 13
17.5% 14.0% 19.7%
46 – 60 (53) 39 13 10
19.5% 22.8% 15.2%
OVER 60 (61) 36 7 13
18.0% 12.3% 19.7%
DON’T KNOW 30 10 8
15.0% 17.5% 12.1%
MEAN 44.09 42.70 43.45
MEDIAN 53 38 38
INTERPOLATED_MEDIAN 53 46 38
6.2.3 Means For Numeric Data
Doing a mean on a numeric type variable usually falls into one of these three basic scenarios:
- The valid range of numbers is from j to k (where j and k are any real numbers) and DK is coded as some non-numeric value (i.e., DK or XX or –).
- The valid range of numbers is from j to k (j and k as above) and DK is coded as a number (i.e., 99 or 98 or -1).
- The valid range of numbers is from j to k (j and k as above) and some non-numeric code is used for some outer value (i.e., – for 100 or && for .5).
WITH NO RECODING NECESSARY
If no recoding is necessary for the mean, then you can produce the mean by just referencing the data location or variable name in brackets. In the example below, suppose that the number of times the respondent has used product A in the last year is stored in columns 14 and 15, where 99 or more is coded as 99, and Don’t Know is coded as DK. Since all items you wish to include in the mean are numeric and all you wish to exclude are non-numeric, no recoding is needed and you can just specify the location of the mean in brackets after the $[MEAN] keyword.
~DEFINE
TABLE_SET= {TAB207:
HEADER=: MEAN AND MEDIAN ON NUMERIC TYPE VARIABLE, DON’T KNOW CODED AS DK}
TITLE=: NUMBER OF TIMES USED PRODUCT A IN LAST YEAR}
STUB=:
0 – 10
11 – 20
21 – 50
OVER 50
DON’T KNOW
[STAT] MEAN
[STAT, STAT_DECIMALS=0] MEDIAN }
ROW=: [14.2#0-10/11-20/21-50/51-99/”DK”] $[MEAN MEDIAN] [14.2]
}
~EXECUTE
TABLE_SET= TAB207
Here is the table Mentor prints:
MEAN AND MEDIAN ON NUMERIC TYPE VARIABLE, DON’T KNOW CODED AS DK
TABLE 207
NUMBER OF TIMES USED PRODUCT A IN LAST YEAR
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
0 – 10 11 6 5
11.0% 10.3% 11.9%
11 – 20 7 3 4
7.0% 5.2% 9.5%
21 – 50 28 16 12
28.0% 27.6% 28.6%
OVER 50 45 28 17
45.0% 48.3% 40.5%
DON’T KNOW 9 5 4
9.0% 8.6% 9.5%
MEAN 50.63 52.43 48.11
MEDIAN 50 56 45
WITH DON’T KNOW CODED AS A NUMBER
If the Don’t Know response has been coded as a numeric value like 99, then if you do not do any recoding the mean will be much too high as all the Don’t Know responses will be assigned a value of 99. You need to exclude these responses by either using the *RANGES modifier or the WHEN joiner. See 6.2.1 Means on Rating Scales Using the Variable Definition for a more detailed explanation of these.
In the example below the number of times the product was used in the past year has been coded as a two digit number in columns 16 and 17 while the Don’t Know response was coded as 99.
~DEFINE
TABLE_SET= {TAB208:
HEADER=: MEAN AND MEDIAN ON NUMERIC TYPE VARIABLE WITH DON’T KNOW CODED AS 99}
TITLE=: NUMBER OF TIMES USED PRODUCT B IN LAST YEAR}
STUB=:
0 – 10
11 – 20
21 – 50
OVER 50
DON’T KNOW
[STAT] MEAN
[STAT, STAT_DECIMALS=0] MEDIAN }
ROW=: [16.2#0-10/11-20/21-50/51-98/99] $[MEAN,MEDIAN] [16.2*RANGES=0-98]
}
~EXECUTE
TABLE_SET= TAB208
Here is an alternate way to write the row variable:
ROW208A: [16.2#0-10/11-20/21-50/51-98/99] &
$[MEAN,MEDIAN] [16.2] WHEN [16.2#0-98]
The printed table will look fundamentally the same as Table 207 above.
WITH A NUMERIC VALUE CODED AS A NON-NUMERIC
If the data has been coded so that a non-numeric code like XX has been used to code a numeric value like 100, then you will need to use either the OTHERWISE joiner or the SUM function in combination with the SELECT_VALUE function in order to properly recode the data. The SELECT_VALUE function is used to assign the values to the non-numeric codes and the OTHERWISE joiner or the SUM function is used to combine these values with the good numeric values that exist in the data. See 5.1.2 VECTOR JOINERS for more information on the OTHERWISE joiner and 8.3.2 FUNCTIONS for details on the SUM function.
In the example below the data was coded in columns 18 and 19. 100 or more was coded as ”–” and Don’t Know was coded as DK. The [18.2] after the [$MEAN] variable will assign the numeric value to all valid numeric codes, the SELECT_VALUE function will assign a value of 100 to the ”–” code, and the OTHERWISE joiner will cause the program to combine the two values using the SELECT_VALUE function only if there is no valid numeric.
NOTE: The X punch is referenced as a ”-” when you are inside a # type variable.
~DEFINE
TABLE_SET= {TAB209:
HEADER=: MEAN ON A NUMERIC TYPE VARIABLE WITH 100 CODED AS — (XX in punch mode)}
TITLE=: NUMBER OF TIMES USED PRODUCT C IN LAST YEAR}
STUB=:
0 – 10
11 – 20
21 – 50
51 – 99
100
DON’T KNOW
[STAT] MEAN }
ROW=: [18.2#0-10/11-20/21-50/51-99/”–“/DK] $[MEAN] &
[18.2] OTHERWISE SELECT_VALUE([18.2#”–“],VALUES(100))
}
~EXECUTE
TABLE_SET= TAB209
Here is an alternate way to write the row variable:
ROW209A: [18.2#0-10/11-20/21-50/51-99/”–“/DK] $[MEAN] &
SUM([18.2],SELECT_VALUE([18.2#”–“],VALUES(100)))
Here is the table Mentor prints:
MEAN ON A NUMERIC TYPE VARIABLE WITH 100 CODED AS — (XX in punch mode)
TABLE 209
NUMBER OF TIMES USED PRODUCT C IN LAST YEAR
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
Total 100 58 42
100.0% 100.0% 100.0%
0 – 10 7 4 3
7.0% 6.9% 7.1%
11 – 20 7 4 3
7.0% 6.9% 7.1%
21 – 50 24 14 10
24.0% 24.1% 23.8%
51 – 99 44 25 19
44.0% 43.1% 45.2%
100 OR MORE 10 8 2
10.0% 13.8% 4.8%
DON’T KNOW 8 3 5
8.0% 5.2% 11.9%
MEAN 56.95 57.96 55.43
6.2.4 Summary Statistics in the Column Variable
All the summary statistics that can be produced for a row variable can also be produced for a column variable. The syntax for the definition of the column variable is exactly the same as for the row definition and any recoding would be specified the same way.
The one major difference of producing a mean in the column is that you will need to use the COLUMN_INFO option on the EDIT statement to make sure the data is formatted correctly. The COLUMN_INFO option allows you to set different print options for each banner point of the table just as the STUB options allow you to control how each row prints. See 5.3 Changing Table Specifications for a description of the syntax for the COLUMN_INFO statement.
You will also need to make sure that you create your own summary rows (total, no answer, etc.) in the variable and suppress the system-generated ones, because the system-generated summary rows will print the number of valid respondents in the statistic and not the value of the statistic.In the example below suppose you want to produce a table similar to Table 207 in section 6.2.3, but wanted the number of times used the product in the column and the gender of the respondent as the row variable. Notice that there is a new STUB_PREFACE used to suppress the system-generated total and that both the STUB and the ROW have a user-generated total. Also note how the COLUMN_INFO option causes the 7th banner point to print as a statistic with 2 decimals and the 8th one as a statistic with 0 decimals.
~DEFINE
TABLE_SET= {TAB210:
EDIT=: COLUMN_WIDTH=6,STUB_WIDTH=10,-COLUMN_TNA,STATISTICS_DECIMALS=2,
PERCENT_DECIMALS=0,VERTICAL_PERCENT=1,
COLUMN_INFO=(COLUMN=7,WIDTH=8,STATISTICS_COLUMN/&
COLUMN=8,WIDTH=7,STATISTICS_COLUMN,STATISTICS_DECIMALS=0)}
STUB_PREFACE=:
[SUPPRESS] TOTAL
[SUPPRESS] NO ANSWER }
BANNER=:
| NUMBER OF TIMES USED PRODUCT A IN PAST YEAR
| <===========================================>
| TOTAL 0-10 11-20 21-50 51-99 DK MEAN MEDIAN
| —– —- —– —– —– — —- ——}
COLUMN=: TOTAL WITH [14.2#0-10/11-20/21-50/51-99/”DK”] $[MEAN,MEDIAN] [14.2]
HEADER=: SUMMARY STATISTICS IN THE BANNER}
TITLE=: GENDER OF RESPONDENT}
STUB=:
TOTAL
MALE
FEMALE }
ROW=: TOTAL WITH [5^1/2]
}
~EXECUTE
TABLE_SET= TAB210
Here is the table Mentor prints:
SUMMARY STATISTICS IN THE BANNER TABLE 210
GENDER OF RESPONDENT
NUMBER OF TIMES USED PRODUCT A IN PAST YEAR
<============================================>
TOTAL 0-10 11-20 21-50 51-99 DK MEAN MEDIAN
—– —- —– —– —– — —- ——
TOTAL 100 11 7 28 45 9 50.63 50
100% 100% 100% 100% 100% 100%
MALE 58 6 3 16 28 5 52.43 56
58% 55% 43% 57% 62% 56%
FEMALE 42 5 4 12 17 4 48.11 45
42% 45% 57% 43% 38% 44%
SUMMARY STATISTICS IN BOTH THE COLUMN AND THE ROW
You can have both the column and row variables have summary statistics in them, but the cross of the two statistics will always be missing and will print as a ”?”. You can change the ”?” to some other character by using the EDIT=PUT_CHARACTERS option.
The example below crosses the number of times product A was used last year with the number of times product B was used last year. This will use the column and banner variables from the previous Table 210 and a similar definition to the row and stub that were used for Table 208. The difference between this row and stub and those for Table 208 is that these must have a user-generated Total row.
~DEFINE
TABLE_SET= {TAB211:
HEADER=: MEAN AND MEDIAN IN BOTH THE COLUMN AND THE ROW VARIABLE}
TITLE=: NUMBER OF TIMES USED PRODUCT B IN LAST YEAR}
STUB=:
TOTAL
0 – 10
11 – 20
21 – 50
OVER 50
DON’T KNOW
[STAT] MEAN
[STAT, STAT_DECIMALS=0] MEDIAN }
ROW=: TOTAL WITH [16.2#0-10/11-20/21-50/51-98/99] &
$[MEAN,MEDIAN] [16.2*RANGES=0-98]
}
~EXECUTE
TABLE_SET= TAB211
Here is the table Mentor prints:
MEAN AND MEDIAN IN BOTH THE COLUMN AND THE ROW VARIABLE
TABLE 211
NUMBER OF TIMES USED PRODUCT B IN LAST YEAR
NUMBER OF TIMES USED PRODUCT A IN PAST YEAR
<=========================================>
TOTAL 0-10 11-20 21-50 51-99 DK MEAN MEDIAN
—– —- —– —– —– — —- ——
TOTAL 100 11 7 28 45 9 50.63 50
100% 100% 100% 100% 100% 100%
0 – 10 13 1 2 5 5 – 44.54 37
13% 9% 29% 18% 11%
11 – 20 12 1 1 2 5 3 47.33 54
12% 9% 14% 7% 11% 33%
21 – 50 22 2 1 7 10 2 51.05 44
22% 18% 14% 25% 22% 22%
OVER 50 40 4 3 12 19 2 52.87 48
40% 36% 43% 43% 42% 22%
DON’T KNOW 13 3 – 2 6 2 52.00 68
13% 27% 7% 13% 22%
MEAN 45.61 50.88 38.29 44.27 47.77 39.86 ? ?
MEDIAN 43 34 27 40 48 28 ? ?
6.2.5 Means And Medians Using The EDIT Options
The summary statistics mean, standard deviation, standard error, variance, and median can be produced by using either the variable definition as explained in the previous sections or by using the EDIT options, COLUMN_STATISTICS_VALUES, COLUMN_MEAN, COLUMN_STD, COLUMN_SE, COLUMN_VARIANCE, and COLUMN_MEDIAN to produce column statistics (extra rows will be generated at the bottom of the table) or ROW_STATISTICS_VALUES, ROW_MEAN, ROW_STD, ROW_SE, ROW_VARIANCE, and ROW_MEDIAN to produce row statistics (extra columns will be generated on the right hand side of the table).
The advantages to creating the summary statistics by using the EDIT options are:
- The processing time of the computer is significantly less
- There is no change in the syntax regardless of how the Don’t Know was coded
- The labels are automatically generated
- The system-generated summary columns/rows will have the correct values.
Limitations of creating summary statistics this way are:
- Values are only correct if every distinct number in the range is a category that is printed on the table (this will most likely not work for standard numeric type variables).
- Standard deviations, standard errors, and variances cannot be properly calculated on weighted data.
- Dependent statistical testing cannot be performed on the mean.
- Only one mean can be generated per table (No mean summaries).
The amount of processing time saved by doing summary statistics using the EDIT options is dependent on the number of respondents in the data file and the number of rows in the table. The greater the number of respondents causes the savings to be greater. For example, if you had 1000 respondents who answered a 5 point rating scale, then the EDIT options method would process the mean up to 100 times faster. This is not to imply that the entire table will process 100 times faster, but it might be as much as twice as fast. This could be a very significant difference if you increase the number of respondents to 100,000.
When you use the EDIT options to produce summary statistics you must tell the program what value you want to assign to each row in the table. Use the COLUMN_STATISTICS_VALUES=VALUES() option to set the values for a mean, standard deviation, or standard error. The COLUMN_MEDIAN=VALUES() option is used to set the values for a median. In either case separate each row value with a comma; use a double comma to exclude a row from the calculation, and use the ellipses to generate a string of values.
NOTE: If you have more rows than values, all the extra rows will be excluded from any calculation.
MEANS ON A RATING SCALE
Since it makes no difference how the Don’t Know was coded, this example can replace all the rating scale examples in 6.2.1 MEANS ON RATING SCALES USING THE VARIABLE DEFINITION. In this example the rating for brand A is stored in column 7. The COLUMN_STATISTICS_VALUES option is set to “VALUES(4,3,2,1)” which will cause the first row in the table to have a value of 4, the second row 3, the third row 2, and the fourth row 1. The fifth and any additional rows will not be included in the calculation.
NOTE: The statistical values for this table are exactly the same as Table 201.
~DEFINE
TABLE_SET= {TAB212:
HEADER=: MEAN AND OTHER SUMMARY STATISTICS ON A RATING SCALE USING THE EDIT OPTIONS}
TITLE=: OVERALL RATING OF PRODUCT A}
LOCAL_EDIT=: COLUMN_STATISTICS_VALUES=VALUES(4,3,2,1),COLUMN_MEAN,COLUMN_STD, COLUMN_SE }
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW }
ROW=: [7^4//1/X]
}
~EXECUTE
TABLE_SET= BAN1
TABLE_SET= TAB212
The printed table will look fundamentally the same as Table 201.
MEANS ON A RATING SCALE WITH ROWS IN THE MIDDLE THAT NEED TO BE EXCLUDED
If you have additional rows in the middle of the table that need to be excluded from the calculation such as a top 2 box or a bottom 2 box, then you need to make sure that the COLUMN_STATISTICS_VALUES command has the appropriate number of commas.
This example is the same as Table 212 except it has both a top 2 box and bottom 2 box. Notice the COLUMN_STATISTICS_VALUES command now looks like “VALUES(,4,32,1)” which will cause the first and fourth rows in the table to be excluded from the statistics. <code> ~DEFINE TABLE_SET= {TAB213: HEADER=: MEAN USING THE EDIT OPTION AND EXCLUDING INTERNAL ROWS FROM THE CALCULATION } TITLE=: OVERALL RATING OF PRODUCT A} LOCAL_EDIT=: COLUMN_STATISTICS_VALUES=VALUES(,4,32,1), COLUMN_MEAN } STUB=:
|TOP BOX
| EXCELLENT (4)
| GOOD (3)
|BOTTOM BOX
| FAIR (2)
| POOR (1)
|DON’T KNOW }
ROW=: [7^4,3/4/3/2,1/2/1/X] }
~EXECUTE TABLE_SET= TAB213 </code>
Here is the table Mentor prints:
MEAN USING THE EDIT OPTION AND EXCLUDING INTERNAL ROWS FROM THE CALCULATION
TABLE 213
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
TOP BOX 30 21 9
30.0% 36.2% 21.4%
EXCELLENT (4) 15 11 4
15.0% 19.0% 9.5%
GOOD (3) 15 10 5
15.0% 17.2% 11.9%
BOTTOM BOX 47 27 20
47.0% 46.6% 47.6%
FAIR (2) 23 12 11
23.0% 20.7% 26.2%
POOR (1) 24 15 9
24.0% 25.9% 21.4%
DON’T KNOW 23 10 13
23.0% 17.2% 31.0%
Mean 2.27 2.35 2.14
MEANS ON A RANGE VARIABLE
If you have a range variable you can produce the summary statistics by assigning the value of the midpoint of each range to each row. See 6.2.2 Means For Range Type Variables for more information on how and why to use the midpoint.
NOTE: This will produce the same mean values as Table 206 in section 6.2.2.
~DEFINE
TABLE_SET= {TAB214:
HEADER=: MEAN ON A RANGE TYPE VARIABLE USING THE EDIT OPTION }
TITLE=: AGE OF RESPONDENT}
LOCAL_EDIT=: COLUMN_STATISTICS_VALUES=VALUES(24,38,53,61), COLUMN_MEAN }
STUB=:
18 – 30 (24)
31 – 45 (38)
46 – 60 (53)
OVER 60 (61)
DON’T KNOW }
ROW=: [13^1//5]
}
~EXECUTE
TABLE_SET= TAB214
The printed table will look fundamentally the same as Table 206.
CHANGING THE DEFAULT PRINT OPTIONS
If you do not like the default labelling or default printing format for the system-generated summary statistics, you can use the PRINT_ROW option on the stub label to change them. The PRINT_ROW option allows you to print any system-generated row anywhere in the table, and when used in conjunction with all the other STUB options it allows you to format these lines any way in which you choose. The following example shows how to change the default labelling for the mean, standard deviation, and standard error.
NOTE: The numbers for this table are exactly the same as Table 212.
~DEFINE
TABLE_SET= {TAB215:
HEADER=: SUMMARY STATISTICS USING THE EDIT OPTION AND CHANGING THE WAY IT PRINTS }
TITLE=: OVERALL RATING OF PRODUCT A}
LOCAL_EDIT=: COLUMN_STATISTICS_VALUES=VALUES(4,3,2,1), COLUMN_MEAN,COLUMN_STD,COLUMN_SE }
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[PRINT_ROW=MEAN] AVERAGE
[PRINT_ROW=STD] STD DEV
[PRINT_ROW=SE] STD ERR }
ROW=: [7^4//1/X]
}
~EXECUTE
TABLE_SET= TAB215
Here is the table Mentor prints:
SUMMARY STATISTICS USING THE EDIT OPTION AND CHANGING THE WAY IT PRINTS
TABLE 215
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
EXCELLENT (4) 15 11 4
15.0% 19.0% 9.5%
GOOD (3) 15 10 5
15.0% 17.2% 11.9%
FAIR (2) 23 12 11
23.0% 20.7% 26.2%
POOR (1) 24 15 9
24.0% 25.9% 21.4%
DON’T KNOW 23 10 13
23.0% 17.2% 31.0%
AVERAGE 2.27 2.35 2.14
STD DEV 1.11 1.16 1.03
STD ERR 0.13 0.17 0.19
To rename the system-generated statistics for an entire run you can build a stub that formats the statistics the way you want and then use the STUB_SUFFIX table element on any table that you are producing this type of statistic on. The example below produces the exact same table as Table 215.
~DEFINE
STUB= STUBBOT1:
[PRINT_ROW=MEAN] AVERAGE
[PRINT_ROW=STD] STD DEV
[PRINT_ROW=SE] STD ERR }
TABLE_SET= {TAB216:
HEADER=: SUMMARY STATISTICS USING THE EDIT OPTION AND CHANGING THE WAY IT PRINTS }
TITLE=: OVERALL RATING OF PRODUCT A}
STUB_SUFFIX= STUBBOT1
LOCAL_EDIT=: COLUMN_STATISTICS_VALUES=VALUES(4,3,2,1), COLUMN_MEAN,COLUMN_STD, COLUMN_SE }
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW }
ROW=: [7^4//1/X]
}
~EXECUTE
TABLE_SET= TAB216
The printed table will look fundamentally the same as Table 215.
COLUMN MEDIANS
COLUMN_MEDIAN is type of interpolated median, which you can use instead of MEDIAN or INTERPOLATED_MEDIAN. Because of the way it is calculated, a true median on rating scales and range questions is usually the midpoint of the middle category. The COLUMN_MEDIAN does not use the midpoint of the median category. It breaks that category into ranges for each data element and, therefore, can make the median higher or lower, depending on the number of people in each category. (See 6.2.2 Means For Range Type Variables for more information on interpolated medians).
COLUMN_MEDIAN is best used with grouped data. The values for COLUMN_MEDIANS are not the midpoints, but rather the starting point of each range. Each category then goes from its starting point to the starting point of the next category. You will need one extra value at the end to act as the ending point for the last category. If you have a Don’t Know category, exclude it with an extra comma. Examples for setting up a COLUMN_MEDIAN on rating scales and ranges follows. If you get a question mark instead of a median, see the section entitled Lost Medians.
Column Medians on Rating Scales
When doing an column median on a rating scale with an ascending scale you will want to assign the starting value for each range as .5 less than its actual value. This will cause that value to have a range from .5 less than its value to .5 greater than its value. In the example below, the first row in the table has a range from .5 to 1.5, which is exactly what you want for the value of 1. You will also need to make sure the Don’t Know is excluded and that there is an upper boundary for the last category. Notice that there is one more value in the COLUMN_MEDIAN option than there are rows in the table. If you have a descending scale then you will want to assign the starting value as .5 higher than its actual value. The example below produces a column median on data column 7. Compare this with the median that was printed in Table 003 in 6.2.1 Means on Rating Scales Using the Variable Definition.
~DEFINE
TABLE_SET= {TAB216:
HEADER=: COLUMN MEDIAN ON A RATING SCALE}
TITLE=: OVERALL RATING OF PRODUCT A}
LOCAL_EDIT=: COLUMN_MEDIAN=VALUES(4.5,3.5,2.5,1.5,,.5) }
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW }
ROW=: [7^4//1/X]
}
~EXECUTE
TABLE_SET= TAB216
Here is the table Mentor prints:
COLUMN MEDIAN ON A RATING SCALE
TABLE 216
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
EXCELLENT (4) 15 11 4
15.0% 19.0% 9.5%
GOOD (3) 15 10 5
15.0% 17.2% 11.9%
FAIR (2) 23 12 11
23.0% 20.7% 26.2%
POOR (1) 24 15 9
24.0% 25.9% 21.4%
DON’T KNOW 23 10 13
23.0% 17.2% 31.0%
Median 2.13 2.25 2.00
Column Medians On Range Type Variables
The approach for defining the values for a median on a range type variable is similar to that for a rating scale, except the value you assign to each row is the starting point for that row. Again, you will need to exclude any Don’t Know category and make sure you assign an upper boundary to the last category.
If you do not define an upper boundary for the last category and the median should fall into that category, then a question mark (?) will print on the table denoting a missing median. When assigning the upper boundary, someone will have to decide which value to use. But, unlike the value assigned for a mean, the value of the upper boundary will only matter if the median falls into the last category. Otherwise, it will have no effect on the median at all.
~DEFINE
TABLE_SET= {TAB217:
HEADER=: COLUMN MEDIAN ON A RANGE TYPE VARIABLE }
TITLE=: AGE OF RESPONDENT}
LOCAL_EDIT=: COLUMN_MEDIAN=VALUES(18,31,46,61,,75) }
STUB=:
18 – 30
31 – 45
46 – 60
OVER 60
DON’T KNOW }
ROW=: [13^1//5]
}
~EXECUTE
TABLE_SET= TAB217
Here is the table Mentor prints:
COLUMN MEDIAN ON A RANGE TYPE VARIABLE
TABLE 217
AGE OF RESPONDENT
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
18 – 30 30 16 14
30.0% 27.6% 33.3%
31 – 45 17 7 10
17.0% 12.1% 23.8%
46 – 60 13 11 2
13.0% 19.0% 4.8%
OVER 60 20 10 10
20.0% 17.2% 23.8%
DON’T KNOW 20 14 6
20.0% 24.1% 14.3%
Median 39.82 43.86 37.00
Formula For Column Medians
This is the procedure the program uses to determine what the column median is. You can use it if you want to verify that the median that is printing on the table is the one you expect (i.e., you have properly assigned the values for the median calculation).
The formula for the interpolated median is:
S + ((D * (E – S)) / F)
where:
S is the starting point of the range of the row where the median exists
E is the ending point of the range of the row where the median exists
D is the difference that is left when you subtract out all the categories
before the row where the median exists.
F is the frequency of the row where the median exists
S, E, and F can all be determined once you know in which row the median exists. D can be determined in the process of finding out in which row the median exists.
To determine the row in which the median exists you must first sum up all the frequencies on the valid responses and divide that by 2. You then want to subtract the first valid frequency from the above total, and then subtract the second valid number from that, and so on until that subtraction would result in a number less than or equal to 0. The row that would cause this is the row where the median resides. The value you had before subtracting out this row is the value D above. So now you can calculate the median.
Here is an example to better explain this. Let us try to reproduce the median that the program produced in the total column for Table 217.
The pertinent numbers from that table are:
VALUES FROM
AGE FREQUENCY COLUMN_MEDIAN
— ——— ————-
18 – 30 30 18
31 – 45 17 31
46 – 60 13 46
OVER 60 20 60
75
The sum of the valid responses is 80 (30+17+13+20) and the midpoint is (80/2)= 40. Now you take 40 and subtract 30 from it and get 10. Now if you try to subtract 17 from 10 you will get -7 which is ⇐0 so the second row is the one where the median exists and 10 was the difference prior to the subtraction. So S=31, E=46, F=17, and D=10. Now substituting into the formula above:
S + (((D *(E – S)) / F)=
31 + (((10 *(46 – 31)) /17)=
31 + ((10 *15) /17)=
31 + (150 /17)=
31 + 8.82 = 39.82
NOTE: If you have an odd number of valid responses then N/2 will be a number with .5 in it and so will D.
Lost Medians
Calculating a median is a more complicated statistic for the computer to do because it must maintain the entire collection of values in an array. (Arrays are sometimes described as “the number of buckets to hold values.”) The default array size is 50, which is sometimes too low for Mentor to calculate a median. If the array size is too small and the median cannot be calculated, it is called a “lost median” and Mentor will print a question mark (?) where the median should be.
The three main causes of lost medians are as follows:
- There are too many values for Mentor to calculate the median. (For example, you are trying to get the median from a seven column field with hundreds of different values.)
- Over half of the values exist in either the highest or lowest value. (For example, over half of the values are zero.)
- The data is sorted by the variable you are trying to produce the median on. (This is NOT recommended! Try sorting the file on another variable.)
In each of these situations, you can increase your chances of recovering the lost medians by increase the array size when defining the median. (For example, $[MEAN, MEDIAN(100)] ). Setting the array size too high slows processing and may cause memory problems. (Conversely, if you know you have less than 50 values, you may set the array below 50 to speed processing.)
An example of a table with a lost median follows. Notice the ? in the median row in the columns where the median was lost. Also included is an example of the warning that will appear in your list file when the median is lost. The same table is run again with larger array size to allow Mentor to calculate the median.
TABLE_SET= {TAB219:
HEADER=: LOST MEDIANS }
TITLE=: NUMBER OF PRODUCTS USED }
STUB=:
1 – 1000
1001 – 10000
10001 – 20000
OVER 20000
DON’T KNOW
[STAT] MEAN
[STAT,STAT_DECIMALS=0] MEDIAN }
ROW=: [22.5#1-1000/1001-10000/10001-20000/20001-99999/DK] &
$[MEAN,MEDIAN] [22.5]
}
~EXECUTE
TABLE_SET= TAB219
Here is the table Mentor prints:
LOST MEDIANS
TABLE 219
NUMBER OF PRODUCTS USED
GENDER
<————>
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
1 – 1000 28 21 7
28.0% 36.2% 16.7%
1001 – 10000 11 6 5
11.0% 10.3% 11.9%
10001 – 20000 5 2 3
5.0% 3.4% 7.1%
OVER 20000 51 26 25
51.0% 44.8% 59.5%
DON’T KNOW 5 3 2
5.0% 5.2% 4.8%
MEAN 32913.46 28634.76 38796.68
MEDIAN ? ? 32232
Notice the question marks (?) that print in the median row for both the total column and the MALE column. This means that the program was unable to determine those medians.
Also the following message will print once in the list file for every median that is lost:
(ERROR #5070) for T219 number buckets is 50, try $[median(501) or [ptile.xx(501}]
**error** Randomizing the input file so that values are not consecutive may help.
**error** ~set median_cells= can be used to change the number of
**error** buckets for the entire run. If this is a common problem then
**error** this set option may be added to your mentinit file. Be aware
**error** however that increasing the number of buckets will cause runs
**error** with medians or ptfiles to take a bit longer.
The number 50 above is the current number of buckets that were used. The number 501 is just a guess by the program (1 plus 10 times the current setting). You can guarantee the median will be calculated if you set the number of buckets to the number of different categories that have values in the range. You can run a frequency count to determine this number.
And here is the example to recover the lost medians. Notice the medians are now correct for the total column and the MALE column.
~DEFINE
TABLE_SET= {TAB220:
HEADER=:CHANGING THE DEFAULT NUMBER OF BUCKETS TO RECOVER LOST MEDIANS }
ROW=: [22.5#1-1000/1001-10000/10001-20000/20001-99999/DK] &
$[MEAN,MEDIAN(200)] [22.5]
}
~EXECUTE
TABLE_SET= TAB220
Here is the printed table. Only the median row is printed for this table as the rest of the table would appear the same as Table 219.
CHANGING THE DEFAULT NUMBER OF BUCKETS TO RECOVER LOST MEDIANS
TABLE 220
NUMBER OF PRODUCTS USED
GENDER
<————>
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
MEDIAN 23849 12035 32232
PERCENTILES
Percentiles are values that exist in the data such that a certain percentage of data is below that value. A median is equivalent to the 50th percentile, since 50 percent of the values in the range are below it. Other percentiles that are often used are the 25th, 75th, and 90th percentiles. You define a percentile as follows: $[PERCENTILE=.NN], where NN is the percentile (percentage) you are looking for. Following is a sample table showing how to produce multiple percentiles on a single table.
~DEFINE
TABLE_SET= {TAB221:
HEADER=: PRODUCING PERCENTILES }
TITLE=: NUMBER OF PRODUCTS USED IN THE LAST MONTH }
STUB=:
1 – 25
26 – 50
51 – 75
OVER 75
DON’T KNOW
[STAT,STAT_DECIMALS=0] MINIMUM
[STAT,STAT_DECIMALS=0] 25TH PERCENTILE
[STAT,STAT_DECIMALS=0] 50TH PERCENTILE/MEDIAN
[STAT,STAT_DECIMALS=0] 75TH PERCENTILE
[STAT,STAT_DECIMALS=0] MAXIMUM }
ROW=: [25.2#1-25/26-50/51-75/76-99/” “] &
$[MINIMUM,PERCENTILE=.25,MEDIAN,PERCENTILE=.75,MAXIMUM] [25.2]
}
~EXECUTE
TABLE_SET= TAB221
Here is the printed table.
PRODUCING PERCENTILES
TABLE 221
NUMBER OF PRODUCTS USED IN THE LAST MONTH
GENDER
<————>
TOTAL MALE FEMALE
—– —- ——
TOTAL 100 58 42
100.0% 100.0% 100.0%
1 – 25 18 11 7
18.0% 19.0% 16.7%
26 – 50 31 16 15
31.0% 27.6% 35.7%
51 – 75 30 20 10
30.0% 34.5% 23.8%
OVER 75 16 8 8
16.0% 13.8% 19.0%
DON’T KNOW 5 3 2
5.0% 5.2% 4.8%
MINIMUM 1 1 3
25TH PERCENTILE 29 28 29
50TH PERCENTILE/ MEDIAN 49 51 45
75TH PERCENTILE 66 66 65
MAXIMUM 99 97 99
NOTE: Percentiles may be “lost” just as medians are and you can increase the array size in the same way as medians to recover those that are “lost”.
6.2.6 Mean Summary Tables
Mean summary tables are usually produced when you have a series of rating scales or a series of numeric type responses and you want to compare the means. Creating a mean summary table with no recoding is quite simple and can be done either by using the WITH joiner between each data location or by putting multiple data locations in the same set of brackets. If recoding of the mean is required then you will need to combine the recoding with either of the above approaches.
The first and most important thing to understand when doing a mean summary table is that once you have specified $[MEAN] all the categories after that will produce a mean until there is another $[ ] command (See 5.2 Axis Commands/Cross-Case Operations” and Appendix B: TILDE COMMANDS, ~DEFINE AXIS= for more explanation of $[ ]).
RATING SCALES WITH NO RECODING
A typical example of when a mean summary table would be used is when you have a series of rating scales. There are a number of different ways to construct the row variable definition for this type of table, but the easiest is to tell the program you are now creating means by using $[MEAN] and then following that with each location you wish to do the mean on in a single set of brackets. You can also simplify this if the locations are consecutive by using the ellipses.
In the following example the gender of the respondent has been stored in column 5 and the rating of 5 different brands has been stored in columns 6 through 10.
NOTE: The following set of commands define a standard front end for the next set of examples.
>PRINT_FILE MNSUM
~INPUT MNSUM
~SET AUTOMATIC_TABLES,DROP_LOCAL_EDIT, DROP_BASE,BEGIN_TABLE_NAME=T251
~DEFINE
STUB= STUBTOP1:
[SUPPRESS] TOTAL
[SUPPRESS] NO ANSWER }
TABLE_SET= {BAN1:
EDIT=: COLUMN_WIDTH=8,STUB_WIDTH=20,-COLUMN_TNA,STATISTICS_DECIMALS=2 }
STUB_PREFACE= STUBTOP1
BANNER=
| GENDER
| <———->
| TOTAL MALE FEMALE
| —– —- ——}
COLUMN=: TOTAL WITH [5^1/2]
}
~EXECUTE
TABLE_SET= BAN1
These commands are specific to this example:
~DEFINE
TABLE_SET= {TAB251:
HEADER=: STRAIGHT MEAN SUMMARY TABLE WITH NO RECODING}
TITLE=: MEAN OVERALL RATING FOR BRANDS}
LOCAL_EDIT=: STUB_EXTRA=[STAT]}
STUB=:
BRAND A
BRAND B
BRAND C
BRAND D
BRAND E}
ROW=: $[MEAN] [06,…,10]
}
~EXECUTE
TABLE_SET= TAB251
Here are some alternate ways to write the row variable:
ROW251A: $[MEAN] [6] $[MEAN] [7] $[MEAN] [8] $[MEAN] [9] $[MEAN] [10]
ROW251B: $[MEAN] [6] WITH [7] WITH [8] WITH [9] WITH [10]
ROW251C: $[MEAN] [6,7,8,9,10]
Here is the table Mentor prints:
STRAIGHT MEAN SUMMARY TABLE WITH NO RECODING
TABLE 251
MEAN OVERALL RATING FOR BRANDS
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
BRAND A 2.87 2.67 3.11
BRAND B 2.93 3.07 2.76
BRAND C 3.04 3.00 3.09
BRAND D 2.94 3.09 2.76
BRAND E 2.73 2.93 2.49
RATING SCALES WITH RECODING NEEDED
If the rating scales were coded so that some recoding is needed for the mean to be calculated you can combine the recoding process discussed in 6.2.1 Means on Rating Scales Using the Variable Definition with the processes discussed above. There are also some shortcuts you can use here to keep from having to explicitly define each mean. The four types of possible recoding as previously discussed are (A) Exclude a numeric Don’t Know code, (B) Reverse the scale, (C) Both A and B, and (D) Recode the zero punch as 10.
RATING SCALES WITH THE DON’T KNOW CODED AS A NUMERIC
If the Don’t Know response was coded as a numeric value you will need to exclude it from the calculation. If the same code was used for all the rating scales then you can just use the *RANGES modifier to exclude it from each mean. If different codes were used for different scales then you will have to exclude them individually (See ROW252A below). The example below assumes the same things as the example for TAB251, except that now the Don’t Know response was coded as a 5 for each rating scale. The TITLE, STUB, and LOCAL_EDIT are omitted here since they would be exactly the same as TAB251.
~DEFINE
TABLE_SET= {TAB252:
HEADER=: MEAN SUMMARY TABLE WITH DON’T KNOW CODED AS A NUMERIC}
ROW=: $[MEAN] [6,…,10*RANGES=1-4]
}
~EXECUTE
TABLE_SET= TAB252
Here are some alternate ways to write the row variable:
ROW252A: $[MEAN] [6*RANGES=1-4] WITH [7*RANGES=1-4] WITH &
[8*RANGES=1-4] WITH [9*RANGES=1-4] WITH [10*RANGES=1-4]
ROW252B: $[MEAN] ([6] WHEN [6^1-4]) WITH ([7] WHEN [7^1-4]) WITH &
([8] WHEN [8^1-4]) WITH ([9] WHEN [9^1-4]) WITH ([10] WHEN [10^1-4])
ROW252C: $[MEAN] [6,…,10] INTERSECT [6,…,10^1-4]
The printed table will look fundamentally the same as Table 251 above.
RATING SCALES WITH THE SCALE REVERSED
If the scale needs to be reversed you can use the subtraction method or the SUBSCRIPT function method (see 6.2.1 Means on Rating Scales Using the Variable Definition for more information about these methods). If you use the subtraction method you can use scaler/vector arithmetic to significantly reduce the specification writing needed.
Scaler/vector arithmetic works in the following manner. A scaler is a single numeric value (number, data location, etc.). If you join a scaler to a vector with any of the arithmetic joiners (+,-,*,/), then that arithmetic operation will be performed on every category in the vector. This allows you to multiply a series of fields by the same number or to reverse the scale on a series of fields. The example below assumes the same things as TAB251 except that the 5 point scale needs to be reversed. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB251.
~DEFINE
TABLE_SET= {TAB253:
HEADER=: MEAN SUMMARY TABLE WITH THE SCALE REVERSED}
ROW=: $[MEAN] 5 – [6,…,10]
}
~EXECUTE
TABLE_SET= TAB253
Here are some alternate ways to write the row variable:
ROW253A: $[MEAN] (5 – [6]) WITH (5 – [7]) WITH (5 – [8]) WITH &
(5 – [9]) WITH (5 – [10])
ROW253B: $[MEAN] &
>REPEAT $A=06,…,10; STRIP=”WITH &”
SUBSCRIPT([$A^4//1]) WITH &
>ENDREPEAT
The printed table will look fundamentally the same as Table 251 above.
RATING SCALES WITH THE SCALE REVERSED AND DON’T KNOW CODED AS A NUMERIC
If the scale needs to be reversed and there is a numeric Don’t Know code you can either combine methods from above or use the SUBSCRIPT function. In all cases where you use the SUBSCRIPT function on a mean summary table you need to individually define each mean and join them using the WITH joiner. This example assumes the same things as TAB251, except that the Don’t Know was coded as a 5, and the 4 point scale needs to be reversed. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB251.
~DEFINE
TABLE_SET= {TAB254:
HEADER=: MEAN SUMMARY TABLE WITH THE SCALE REVERSED AND DON’T KNOW CODED AS NUMERIC}
ROW=: $[MEAN] 5 – [06,…,10*RANGES=1-4]
}
~EXECUTE
TABLE_SET=TAB254
Here are some alternate ways to write the row variable:
ROW254B: $[MEAN} &
>REPEAT $A=06,…,10; STRIP=“WITH &”
SUBSCRIPT([$A^4//1]) WITH &
>ENDREPEAT
The printed table will look fundamentally the same as table 251 above.
RATING SCALES WITH 10 CODED AS A ZERO (0)
If you want to do a mean summary table on a series of ratings where the scale goes from 1 to 10 and 10 was coded as a 0, then you will want to use the SUBSCRIPT function to recode them and use the WITH joiner to combine all the different ratings. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB251. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB251.
~DEFINE
TABLE_SET= {TAB255:
HEADER=: MEAN SUMMARY TABLE WITH 10 CODED AS 0}
ROW=: $[MEAN] &
>REPEAT $A=06,…,10; STRIP=”WITH &”
SUBSCRIPT([$A^1//0]) WITH &
>END_REPEAT
}
~EXECUTE
TABLE_SET= TAB255
The printed table will look fundamentally the same as Table 251 above.
RANGE VARIABLES
To produce a mean summary table on range type variables you need to define each mean using the SELECT_VALUE function and connect them using the WITH joiner (See 6.2.2 Means For Range Type Variables for an explanation of how to use the SELECT_VALUE function to define each mean). The values chosen for the SELECT_VALUE should be the midpoint of each range in the variable. The example below assumes the ranges are 1 to 10, 11 to 20, 21 to 30, and 31 to 50. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB251.
~DEFINE
TABLE_SET= {TAB256:
HEADER=: MEAN SUMMARY TABLES OR RANGE VARIABLES}
ROW=: $[MEAN] &
>REPEAT $A=06,…,10; STRIP=”WITH &”
SELECT_VALUE([$A^1//4],VALUES(5.5,15.5,25.5,40.5)) WITH &
>ENDREPEAT
}
~EXECUTE
TABLE_SET= TAB256
The printed table will look fundamentally the same as Table 251 above.
NUMERIC DATA WITH THE DON’T KNOW CODED AS A NON-NUMERIC
Producing a mean summary table on numeric data consists of combining the production of a mean from numeric data (See 6.2.3 Means For Numeric Data) with the methods that were discussed for mean summary tables on rating scales. A mean summary table for a mean where the Don’t Know was coded as non-numeric and all the numeric values were coded as their existing values requires no recoding and looks very similar to the mean summary on a rating scale without any recoding.
In the example below, assume there are 6 brands and each respondent is asked how many times they have used the brand in the past year. The answers for Brand A through F are stored in consecutive 2 column fields starting with Brand A in columns 11 and 12. The Don’t Know was coded as DK.
~DEFINE
TABLE_SET= {TAB257:
HEADER=: MEAN SUMMARY TABLE OF NUMERIC FIELDS WITH DON’T KNOW CODED AS NON-NUMERIC}
TITLE=: AVERAGE NUMBER OF TIMES USED PRODUCT IN LAST YEAR}
LOCAL_EDIT=: STUB_EXTRA=[STAT]}
STUB=:
BRAND A
BRAND B
BRAND C
BRAND D
BRAND E
BRAND F}
ROW=: $[MEAN] [11.2,…,21]
}
~EXECUTE
TABLE_SET= TAB257
Here is the table Mentor prints:
MEAN SUMMARY TABLE OF NUMERIC FIELDS WITH DON’T KNOW CODED AS NON-NUMERIC
TABLE 257
AVERAGE NUMBER OF TIMES USED PRODUCT IN LAST YEAR
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
BRAND A 46.08 46.80 45.18
BRAND B 56.81 62.08 50.39
BRAND C 53.56 56.36 49.97
BRAND D 54.97 53.90 56.24
BRAND E 53.90 54.81 52.83
BRAND F 59.38 60.78 57.68
NUMERIC DATA WITH THE DON’T KNOW CODED AS NUMERIC
If the Don’t Know response is coded as a numeric value such as 99, then you must exclude this from the calculation. The easiest way to do this is to use the *RANGES modifier. The example below assumes the same things as TAB257, except here the Don’t Know was coded as 99 for all the questions. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB257.
~DEFINE
TABLE_SET= {TAB258:
HEADER=: MEAN SUMMARY TABLE OF NUMERIC FIELDS WITH DON’T KNOW CODED AS NUMERIC}
ROW=: $[MEAN] [11.2,…,21*RANGES=0-98]
}
~EXECUTE
TABLE_SET= TAB258
The printed table will look fundamentally the same as Table 257 above.
NUMERIC DATA WITH A NUMERIC VALUE CODED AS A NON-NUMERIC CODE
When producing a mean summary table on a set of numeric questions where a non-numeric code like “XX”, was used to mean a numeric value like 100, you must use one of the methods discussed in 6.2.2 Means For Range Type Variables to recode the mean and then use the WITH joiner to combine them all. The TITLE, STUB, and LOCAL_EDIT have been omitted since they are exactly the same as TAB257.
~DEFINE
TABLE_SET= {TAB259:
HEADER=: MEAN SUMMARY TABLE OF NUMERIC FIELDS WITH “XX” CODED AS 100}
ROW=: $[MEAN] &
>REPEAT $A=11,13,…,21; STRIP=”WITH &”
([$A.2] OTHERWISE SELECT_VALUE([$A.2#”–“],VALUES(100))) WITH &
>ENDREPEAT
}
~EXECUTE
TABLE_SET= TAB259
Here is an alternate way to write the row variable:
ROW258A: $[MEAN] &
>REPEAT $A=11,13,…,21; STRIP=”WITH &”
SUM([$A.2],SELECT_VALUE([$A.2#”–“],VALUES(100))) WITH&
>ENDREPEAT
The printed table will look fundamentally the same as Table 257 above.
USING THE “BY” JOINER
A mean summary table using BY is one in which both the banner and the stub are normal demographic type variables, but the entire table is reporting the mean of some other variable. For instance, you are crossing AGE by GENDER, but the entire table is reporting the mean on the overall rating of a product. A given cell tells you the mean for the age group of that gender. To produce this table you need to define the mean for the rating as you normally would and then use the BY joiner to break that mean out by all the age categories. See the example below for the exact syntax.
In the example below, the age question is stored in columns 11 and 12, and the overall rating of product A is in column 6 (no recoding needed). If recoding was needed you would just use whatever recoding method was needed before applying the BY joiner.
~DEFINE
TABLE_SET= {TAB260:
HEAD=: USING BY TO CREATE MEAN SUMMARY TABLE}
TITLE=: AGE BY MEAN FOR OVERALL RATING OF PRODUCT A}
LOCAL_EDIT=: STUB_EXTRA=[STAT]}
STUB=:
UNDER 18
18 – 30
31 – 45
46 – 60
OVER 60
REFUSED
NO ANSWER}
ROW=: $[MEAN] [6] BY [11.2#0-17/18-30/31-45/46-60/61-99/”RF”/”–“]
}
~EXECUTE
TABLE_SET= TAB260
Here is the printed table.
USING BY TO CREATE MEAN SUMMARY TABLE
TABLE 260
AGE BY MEAN FOR OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
UNDER 18 2.96 2.92 3.00
18 – 30 3.17 2.60 3.57
31 – 45 2.33 2.57 2.00
46 – 60 3.00 2.86 3.25
OVER 60 2.77 2.50 3.15
REFUSED 3.50 3.50 3.50
NO ANSWER 2.83 2.00 3.67
6.2.7 Means Scattered Throughout The Table
You may want to produce summary statistics (means) interspersed in the table with other frequency type numbers. In order to do this you must specify any time you want to switch from producing frequencies to means and vice-versa. As we have seen throughout this section, to switch from frequencies to means you need to use the keyword $[MEAN], but to switch from means to frequencies you will need to use the keyword $[ ]. The empty set of brackets tells the program that you want to go back to the default mode of producing whatever that category returns (usually a frequency, but sometimes a number). A good example of when you need to intersperse means and frequencies in a single table is a summary table where you are printing both the top box and the mean for each of several brands. In order to produce this table you will need to alternate between producing a frequency (top box) and a summary statistic (mean).
The example below is a combination of Table 102 and Table 251 in this chapter. The following set of commands define a standard front end for the next set of examples.
>PRINT_FILE MENIN
~INPUT MENIN
~SET AUTOMATIC_TABLES,DROP_LOCAL_EDIT, DROP_BASE,BEGIN_TABLE_NAME=T261
~DEFINE
STUB= STUBTOP1:
[SUPPRESS] TOTAL
[SUPPRESS] NO ANSWER }
TABLE_SET= {BAN2:
EDIT=: COLUMN_WIDTH=8,STUB_WIDTH=20,-COLUMN_TNA,STATISTICS_DEC IMALS=2 }
STUB_PREFACE= STUBTOP1
BANNER=:
| GENDER
| <———->
| TOTAL MALE FEMALE
| —– —- ——}
COLUMN=: TOTAL WITH [5^1/2]
}
~EXECUTE
TABLE_SET= BAN2
These commands are exclusive to this example.
~DEFINE
TABLE_SET= {TAB261:
HEADER=: EXAMPLE OF A TABLE WITH MEANS AND FREQUENCIES INTERMIXED}
TITLE=: TOPBOX AND MEAN SUMMARY TABLE}
TITLE_4=: BASE= RESPONDENTS WHO USED THE BRAND}
STUB=:
>REPEAT $A=A,B,C,D,E
[COMMENT,UNDERLINE] BRAND $A
[VERTICAL_PERCENT=*,SUPPRESS] PERCENT BASE FOR BRAND $A (DOESN’T PRINT)
[STUB_INDENT=2,-FREQUENCY] TOPBOX PERCENTAGE
[STUB_INDENT=2,STAT] MEAN
>END_REPEAT
}
ROW=: &
>REPEAT $A=07,…,11; STRIP=”$[] &”
[$A^1-4/4] $[MEAN] [$A] $[] &
>END_REPEAT
}
~EXECUTE
TABLE_SET= TAB261
An alternate way to write the row definition:
ROW261: [7^1-4/4] $[MEAN] [7] $[] &
[8^1-4/4] $[MEAN] [8] $[] &
[9^1-4/4] $[MEAN] [9] $[] &
[10^1-4/4] $[MEAN] [10] $[] &
[11^1-4/4] $[MEAN] [11]
Here is the printed table.
EXAMPLE OF A TABLE WITH MEANS AND FREQUENCIES INTERMIXED
TABLE 261
TOPBOX AND MEAN SUMMARY TABLE
BASE= RESPONDENTS WHO USED THE BRAND
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
BRAND A
——-
TOPBOX PERCENTAGE 28.4% 27.5% 29.3%
MEAN 2.70 2.70 2.71
BRAND B
——-
TOPBOX PERCENTAGE 30.8% 29.3% 32.4%
MEAN 2.62 2.68 2.54
BRAND C
——-
TOPBOX PERCENTAGE 26.3% 14.6% 38.5%
MEAN 2.66 2.49 2.85
BRAND D
——-
TOPBOX PERCENTAGE 22.2% 20.5% 24.3%
MEAN 2.35 2.30 2.41
BRAND E
——-
TOPBOX PERCENTAGE 15.8% 16.3% 15.2%
MEAN 2.29 2.28 2.30
MEAN/FREQUENCY SUMMARY TABLE
Another common type of mean summary table prints the number of respondents who make up each mean value, along with the mean value. This is sometimes referred to as a mean/frequency summary table. If you use the syntax in the next example the number generated in the frequency row will be the number of respondents who have a valid number in that data location. You can combine this with any of the recoding discussed above.
~DEFINE
TABLE_SET= TAB262:
HEADER=: EXAMPLE OF A SUMMARY TABLE OF MEANS AND EACH MEANS FREQUENCY }
TITLE=: MEAN OVERALL RATING FOR BRANDS}
LOCAL_EDIT=: -VERTICAL_PERCENT }
STUB=:
|BRAND A – FREQUENCY
[STAT,SKIP_LINES=0 ] | MEAN
|BRAND B – FREQUENCY
[STAT,SKIP_LINES=0 ] | MEAN
|BRAND C – FREQUENCY
[STAT,SKIP_LINES=0 ] | MEAN
|BRAND D – FREQUENCY
[STAT,SKIP_LINES=0 ] | MEAN
|BRAND E – FREQUENCY
[STAT,SKIP_LINES=0 ] | MEAN }
ROW=: $[FREQUENCY,MEAN] [07,…,11]
}
~EXECUTE
TABLE_SET= TAB262
Here is the printed table:
EXAMPLE OF A SUMMARY TABLE OF MEANS AND EACH MEANS FREQUENCY
TABLE 262
MEAN OVERALL RATING FOR BRANDS
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
BRAND A – FREQUENCY 81 40 41
MEAN 2.70 2.70 2.71
BRAND B – FREQUENCY 78 41 37
MEAN 2.62 2.68 2.54
BRAND C – FREQUENCY 80 41 39
MEAN 2.66 2.49 2.85
BRAND D – FREQUENCY 81 44 37
MEAN 2.35 2.30 2.41
BRAND E – FREQUENCY 76 43 33
MEAN 2.29 2.28 2.30
NOTE: To print the Mean before the frequency, reverse the order of the keywords MEAN and FREQUENCY inside the $[ ] and also flip the order of the stub definitions so that the text for the Mean row is first.
6.2.8 Summary Statistics with Arithmetic
You may sometimes need to perform some arithmetic operation on the mean before displaying it. This usually entails moving the decimal point to the right or to the left, or adding, subtracting, multiplying or dividing two or more fields. To move the decimal point to the left you can either divide the mean by the appropriate amount (10 or 100 or 1000) or you can use the *F modifier. When using a data location to define a numeric field you can put the modifier *F<number> right before the close bracket. The <number> says how many implied decimals to read that field with. In other words if 1028 is stored in columns 11 through 14 and you reference that field as [11.4*F2], the Mentor program will read that number as 10.28 (2 implied decimal places). To move the decimal place to the right you need to multiply the variable by the appropriate amount. If you wish to perform any arithmetic operation, just use the appropriate symbol (+,-,*,/,**).
In the following example frequencies, sums, means, and standard deviations are produced for a number of fields. First compare the statistics for the stubs [11.4] and [11.4*F2] noting that the frequency is the same, but the other numbers are all 100 times smaller. Second, compare the stubs [15.4] and 3*[15.4] again noting that the frequency is the same, but the other values are all 3 times higher. Then note that the stub [19.4] has only one respondent with a valid answer, so that the standard deviation is missing and the mean is even missing under the males column, because there are no males who have a valid answer.
NOTE: In the example below the mean of the stub [11.4] + [15.4] is equal to the mean of the stub [11.4 ] + the mean of the stub [15.4]. This will only occur when everyone who has a valid answer in one of those fields has a valid answer in both.
~DEFINE
TABLE_SET= {TAB263:
TITLE=:SUMMARY STATISTICS WITH ARITHMETIC }
LOCAL_EDIT=: -VERTICAL_PERCENT }
STUB=:
>REPEAT
$A=”[11.4]”,”[11.4*F2]”,”[15.4]”,”3*[15.4]”,”[19.4]”,”[11.4]+[15.4]”
[COMMENT,UNDERLINE] STATS FOR $A
[STUB_INDENT=2,SKIP_LINES=0] FREQUENCY
[STUB_INDENT=2,SKIP_LINES=0] SUM
[STAT,STUB_INDENT=2,SKIP_LINES=0] MEAN
[STAT,STUB_INDENT=2,SKIP_LINES=0] STD DEV
>END_REPEAT
}
ROW=: $[FREQUENCY,SUM,MEAN,STD] [11.4] WITH [11.4*F2] WITH [15.4] WITH &
(3*[15.4]) WITH [19.4] WITH ([11.4]+[15.4])
}
~EXECUTE
TABLE_SET= TAB263
Here is the printed table:
TABLE 263
SUMMARY STATISTICS WITH ARITHMETIC
GENDER
<————>
TOTAL MALE FEMALE
—– —- ——
STATS FOR [11.4]
—————-
FREQUENCY 110 65 45
SUM 307009 174141 132868
MEAN 2790.99 2679.09 2952.62
STD DEV 3210.92 3004.85 3516.00
STATS FOR [11.4*F2]
——————-
FREQUENCY 110 65 45
SUM 3070 1741 1329
MEAN 27.91 26.79 29.53
STD DEV 32.11 30.05 35.16
STATS FOR [15.4]
—————-
FREQUENCY 110 65 45
SUM 387726 263648 124078
MEAN 3524.78 4056.12 2757.29
STD DEV 3001.15 3236.99 2460.11
STATS FOR 3*[15.4]
——————
FREQUENCY 110 65 45
SUM 1163178 790944 372234
MEAN 10574.35 12168.37 8271.87
STD DEV 9003.44 9710.97 7380.33
STATS FOR [19.4]
—————-
FREQUENCY 1 – 1
SUM 945 – 945
MEAN 945.00 ? 945.00
STD DEV ? ? ?
STATS FOR
[11.4]+[15.4]
————-
FREQUENCY 110 65 45
SUM 694735 437789 256946
MEAN 6315.77 6735.22 5709.91
STD DEV 4384.49 4531.66 4137.25
6.3 WEIGHTED TABLES
A weighted table is one in which each case is multiplied by some appropriate factor so that it carries a higher or lower weight in the sample. This is usually done when you want the distribution in the tables across some demographic group to reflect the universal distribution rather than your sample distribution. A typical example would be as follows: You’re interviewing in some city and need to contact 110 respondents, but in this city females are much easier to contact than males, so you end up contacting 70 females and only 40 males. You know that in this city half the people are male and half are female, so for purposes of the tables you want to down weight the 70 females so it looks like there are only 55. To do this you would assign a weight of .786 to each (.786 X 70 = 55.02). You also want to up weight the 40 males to become 55. You would assign them a weight of 1.375 (40 X 1.375 = 55).
If the weighted total will be equal to the sample total, then you can calculate the weight by doing the following: take the universe for the group (expressed as a percentage or number) and divide it by the sample (percentage or number).
targeted # (or %)
Weight = —————–
actual # (or %)
In our example above, that would give us 55/40 (or 50/36 if using percentages) = 1.375 for the male group.
Sometimes when you weight responses you want to weight the numbers up so that they actually reflect the numbers in the universe. In the example above, suppose the city we were calling had 10,000 residents. Then you would want the 70 females in the sample to look like 5000 (10000 * 50%) and you would want the 40 males to look like 5000 also. The weights in this case would be 71.43 for females and 125 for males. In this case, use the same formula, but use numbers rather than percents. The target value is now 5000, not 55, for males, and our weight is 5000/40=125.
In order to do weighting, you need to either store this weight value somewhere in the data record or create an expression that holds the value. This is usually done with the SELECT function, although it can be done several other ways also. You can use the file GENWT.SPX located in the Mentor subdirectory to not only calculate the weights for you based on the target percentages, but also store them in each data record for future use.
You can either weight the entire table to affect every cell in the table, or you can weight either the column or row variable to apply different weights (including none) to different categories in the variable. Once you have determined the weights for the table and how the table is to be weighted you can use any of the table elements WEIGHT, COLUMN_WEIGHT, COLUMN_SHORT_WEIGHT, ROW_WEIGHT, or ROW_SHORT_WEIGHT to apply the weight to the table.
The WEIGHT element weights the entire table, while the others allow you to create cells with different weights. The WEIGHT element cannot be used in conjunction with any of the others, but you can do both column weighting and row weighting simultaneously.
NOTE:
- Weighted statistics are calculated properly, except for the EDIT options COLUMN_STD, COLUMN_SE, COLUMN_VARIANCE, ROW_STD, ROW_SE, and ROW_VARIANCE.
- When doing weighting, frequency counts may not add back to the total, and percentages may not add up to 100% due to rounding
- If data cases are not assigned a weight because either the weight field has a missing value of the SELECT function does not account for it, you will get the following warning message in the compile:
(WARN #8873) tables with MISSING table weights.
If you get this warning, it is likely there is an error in the weight definition and the data cases with a missing weight will be dropped from the table.
6.3.1 Weighting with Weight Value already Stored in the Data
It is quite simple to weight an entire table if the weight has already been stored in the data. All you need to do is use the table element WEIGHT= and equate it to the location where the weight is stored. If the weight is stored with implied decimals you will need to use the *F modifier. See 6.2.8 Summary Statistics with Arithmetic for more information on the *F modifier.
In the example below it is assumed that a weight value has previously been stored in columns 7-10 of the data file. The pertinent part of this example is the WEIGHT= line inside TABLE_SET TAB301. This weight value could have come from a generation done in a previous Mentor run. See 6.3.4 Storing the Weight in the Data to see how to store the weight in the data.
NOTE: The following set of commands define a standard front end for the all the examples in this section, except where noted.
>PRINT_FILE WGHT
~INPUT WGHT,ALLOW_UPDATE
~SET DROP_LOCAL_EDIT,DROP_BASE,BEGIN_TABLE_NAME=T301
~DEFINE
STUB= STUBTOP1:
WEIGHTED TOTAL
[SUPPRESS] WEIGHTED NO ANSWER }
TABLE_SET= {BAN1:
EDIT=: COLUMN_WIDTH=8,STUB_WIDTH=20,-COLUMN_TNA,STATISTICS_DECIMALS=2 }
STUB_PREFACE= STUBTOP1
BANNER=:
| GENDER
| <———->
| TOTAL MALE FEMALE
| —– —- —— }
COLUMN=: TOTAL WITH [5^1/2]
}
TABLE_SET= {TAB301:
TITLE=: OVERALL RATING OF PRODUCT A}
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[STAT] MEAN }
ROW=: [6^4//1/5] $[MEAN] [6*RANGES=1-4]
}
~EXECUTE
TABLE_SET= BAN1
TABLE_SET= ROW1
And here is our example:
~DEFINE
TABLE_SET= {TAB301:
HEADER=: WEIGHTED TABLE USING PREVIOUSLY STORED WEIGHT}
WEIGHT=: [7.4]
STORE_TABLES=*
}
~EXECUTE
TABLE_SET= TAB301
Here is the table that is printed.
WEIGHTED TABLE USING PREVIOUSLY STORED WEIGHT
TABLE 301
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
WEIGHTED TOTAL 100 55 45
100.0% 100.0% 100.0%
EXCELLENT (4) 17 6 11
17.0% 10.4% 25.2%
GOOD (3) 20 12 8
20.1% 22.0% 17.8%
FAIR (2) 31 20 11
31.3% 36.4% 25.0%
POOR (1) 14 9 5
13.8% 16.1% 10.9%
DON’T KNOW 18 8 9
17.8% 15.1% 21.1%
MEAN 2.49 2.31 2.73
Notice that ~SET AUTOMATIC_TABLES was not specified. This option looks for a ROW= to trigger the making of a table. In this example, we wanted to specify the ROW= separately, so we left off the AUTOMATIC_TABLES and put a STORE_TABLES=* in the TABLE_SET.
Also, notice in the above table that the DON’T KNOW row has a value of 18 for the Total column and values of 8 and 9 for the Male and Female columns. Since the data is weighted the actual values in those cells are 17.8, 8.4, and 9.4 respectively. Each of these values round up to produce numbers that do not seem to add up.
6.3.2 Weighting using the SELECT Function
If the weight has not been previously stored in the data you can directly create a variable that contains the weights by using the SELECT_VALUE function. In the following example the respondent’s city is stored in column 11 which is the variable being used to assign the weights. City 1 will have a weight factor of .86, City 2 a weight of 0.66, City 3 a weight of 1.39, and City 4 a weight of 1.67 (See 9.3.2 Functions, Number Returning Functions for detailed information on the SELECT_VALUE function). The only difference between this example and the previous one is the WEIGHT= table element.
~DEFINE
TABLE_SET= {TAB302:
HEADER=: WEIGHTED TABLE USING THE SELECT FUNCTION}
WEIGHT=: SELECT_VALUE([11^1//4],VALUES(.86,.66,1.39,1.67))
STORE_TABLES=*
}
The printed table will look fundamentally the same as Table 301 above.
6.3.3 Printing Both a Weighted and an Unweighted Total Row
You may want to print both a weighted and unweighted total row, so that you can easily tell both what the percentage base was for a particular column and the actual number of respondents in that column. The easiest way to do this is to use the SET option UNWEIGHTED_TOP in conjunction with a special STUB_PREFACE. The UNWEIGHTED_TOP option causes the program to create two additional summary rows at the top of the table, the unweighted total and the unweighted no answer.
The example below is the same as that for Table 301, except for the SET option and the different STUB_PREFACE. In the STUB_PREFACE notice that the vertical percentage has been turned off on the unweighted total because it makes no sense on this row.
~DEFINE
STUB= STUBTOP2:
[-VERTICAL_PERCENT] UNWEIGHTED TOTAL
[SUPPRESS] UNWEIGHTED NO ANSWER
WEIGHTED TOTAL
[SUPPRESS] WEIGHTED NO ANSWER }
TABLE_SET= {TAB303:
HEADER=: WEIGHTED TABLE WITH UNWEIGHTED TOTAL ROW }
SET UNWEIGHTED_TOP
STUB_PREFACE= STUBTOP2
STORE_TABLES=*
}
~EXECUTE
TABLE_SET= TAB303
Here is the table Mentor prints:
WEIGHTED TABLE WITH UNWEIGHTED TOTAL ROW
TABLE 303
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
UNWEIGHTED TOTAL 100 56 44
WEIGHTED TOTAL 100 55 45
100.0% 100.0% 100.0%
EXCELLENT (4) 17 6 11
17.0% 10.4% 25.2%
GOOD (3) 20 12 8
20.1% 22.0% 17.8%
FAIR (2) 31 20 11
31.3% 36.4% 25.0%
POOR (1) 14 9 5
13.8% 16.1% 10.9%
DON’T KNOW 18 8 9
17.8% 15.1% 21.1%
MEAN 2.49 2.31 2.73
If you wish to print the Unweighted Any Response row in addition to, or instead of the Unweighted Total row, you need to use the keyword PRINT_ROW=UAR in your STUB_PREFACE. If you only wanted to print the Unweighted and Weighted Any Response Rows, the STUB_PREFACE would look like the following:
STUB_PREFACE=:
[SUPPRESS] UNWEIGHTED TOTAL
[SUPPRESS] UNWEIGHTED NO ANSWER
[PRINT_ROW=UAR, -VERTICAL_PERCENT] UNWEIGHTED ANY RESPONSE
[SUPPRESS] WEIGHTED TOTAL
[SUPPRESS] WEIGHTED NO ANSWER
[PRINT_ROW=AR] WEIGHTED ANY RESPONSE }
6.3.4 Storing the Weight in the Data
If the weight variable is not already stored in the data, you may want to store it there for ease of future reference. Once the weights are stored in the data it is a very easy process to weight the tables. You may want to do this if you are going to be running multiple runs all with the same weighting scheme and you don’t want the program to have to recalculate the weights each time.
To store the weights in the data you will need to define and execute a procedure. Inside the procedure you will want to use the SELECT_VALUE function to store the various weights in the data. You can either explicitly put the decimal point in the data or you can store the weight as a whole number and then read the field as having implied decimals using the *F modifier. In the example below the MODIFY command is storing the values without a decimal point, while the PRINT_DATA command is storing it with a decimal point. See TAB304 for how to reference the data from the MODIFY command and see TAB305 for the data from the PRINT_DATA command. On an actual table run you would use one or the other.
~DEFINE
PROCEDURE= {GENWT1:
MODIFY [51.3] = SELECT_VALUE([11^1//4],VALUES(86,66,139,167))
PRINT_DATA [61.4] “\4.2F” SELECT_VALUE([11^1//4],VALUES(.86,.66,1.39,1.67))
}
~EXECUTE
PROCEDURE= GENWT1
The weight statement will look like this if you used the MODIFY command above:
~DEFINE
TABLE_SET= {TAB304:
HEADER=: WEIGHTED TABLE USING A WEIGHT STORED WITH AN IMPLIED DECIMAL POINT}
WEIGHT=: [51.3*F2]
STORE_TABLES=* }
~EXECUTE
TABLE_SET= TAB304
The printed table will look basically the same as Table 301 above.
The weight statement will look like this if you used the PRINT_DATA command above:
~DEFINE
TABLE_SET= {TAB305:
HEADER=: WEIGHTED TABLE USING A WEIGHT STORED WITH AN EXPLICIT DECIMAL POINT}
WEIGHT=: [61.4]
STORE_TABLES=* }
~EXECUTE
TABLE_SET= TAB305
The printed table will look fundamentally the same as Table 301 above.
6.3.5 Assigning Different Weights to Different Banner Points
Sometimes when producing weighted tables, you will need to apply different weighting factors to different banner points. For example, you might want some of the banner points weighted by the sex variable and others weighted by the city variable. An even more common occurrence is that you want some of the banner points weighted and others unweighted. An unweighted column is just a column where the weight for everyone in that column is one.
To produce a table with different weights across the banner, you will need to use the COLUMN_WEIGHT table element to set the weights. This element needs to have the same number of categories as the number of categories in your banner definition, if you do not, you will get an appropriate error message. Since most banners have a fair amount of banner points, it is usually a good idea to predefine any weight variables you need to use and then just reference them by name on the COLUMN_WEIGHT statement.
In the example below the banner is broken into two parts for each of the original banner points, one based on one weight variable, and the second on a different weight variable. Since the column variable has six categories in it, then the COLUMN_WEIGHT variable must also have six categories in it. Notice in Table 306 how the numbers and frequencies change between the two columns with the different weights.
NOTE: All system-generated columns will be unweighted when COLUMN_WEIGHT or COLUMN_SHORT_WEIGHT are specified.
~DEFINE
WGHT1: SELECT([11^1//4],VALUES(.86,.66,1.39,1.67))
WGHT2: SELECT([12^1//4],VALUES(.89,.59,1.92,1.47))
TABLE_SET= {BAN2:
EDIT=: COLUMN_WIDTH=8,STUB_WIDTH=20,-COLUMN_TNA,STATISTICS_DECIMALS=2 }
STUB_PREFACE= STUBTOP1
BANNER=:
| GENDER
| <===========================>
| WGHT1 WGHT2 WGHT1 WGHT2 WHGT1 WGHT2
| TOTAL TOTAL MALE MALE FEMALE FEMALE
| —– —— —– —– —— ——}
COLUMN_WEIGHT=: WGHT1 WITH WGHT2 WITH WGHT1 WITH WGHT2 WITH WGHT1 WITH WGHT2
COLUMN=: TOTAL WITH TOTAL WITH [5^1/1/2/2]
}
TABLE_SET= {TAB306:
HEADER=: WEIGHTED TABLE USING DIFFERENT WEIGHTS ON DIFFERENT BANNER POINTS}
STORE_TABLES=*
}
~EXECUTE
TABLE_SET= BAN2
TABLE_SET= TAB306
Here is the table that is printed.
WEIGHTED TABLE USING DIFFERENT WEIGHTS ON DIFFERENT BANNER POINTS
TABLE 306
OVERALL RATING OF PRODUCT A
GENDER
<===========================>
WGHT1 WGHT2 WGHT1 WGHT2 WHGT1 WGHT2
TOTAL TOTAL MALE MALE FEMALE FEMALE
—– —– —– —– —— ——
WEIGHTED TOTAL 100 100 55 58 45 41
100.0% 100.0% 100.0% 100.0% 100.0% 100.0%
EXCELLENT (4) 17 23 6 9 11 15
17.0% 23.5% 10.4% 14.7% 25.2% 35.9%
GOOD (3) 20 18 12 11 8 6
20.1% 17.9% 22.0% 19.5% 17.8% 15.7%
FAIR (2) 31 28 20 20 11 8
31.3% 28.4% 36.4% 34.3% 25.0% 20.3%
POOR (1) 14 14 9 9 5 4
13.8% 13.8% 16.1% 16.0% 10.9% 10.7%
DON’T KNOW 18 16 8 9 9 7
17.8% 16.3% 15.1% 15.5% 21.1% 17.5%
MEAN 2.49 2.61 2.31 2.39 2.73 2.93
6.3.6 Printing Both a Weighted and an Unweighted Total Column
If you need to print both a weighted and an unweighted total column you can easily do this by using the COLUMN_SHORT_WEIGHT table element. This option is very similar to the COLUMN_WEIGHT option in that it allows you to assign different weights to different columns in the banner, but it also allows you to define more categories in your column variable than in your weight variable and it just uses the last weight variable for all the additional categories. This means if your first banner point is an unweighted total followed by the rest of the banner points weighted, you can just define a two category weight variable. In order to create an unweighted category in a weight variable you need to assign it the keyword TOTAL. So in general, your weight variable will look like TOTAL WITH WEIGHTNAME. Notice that the column definition starts with TOTAL WITH TOTAL to create both the unweighted and weighted total columns.
~DEFINE
TABLE_SET= {BAN3:
EDIT=: COLUMN_WIDTH=8,STUB_WIDTH=20,-COLUMN_TNA, STATISTICS_DECIMALS=2 }
BANNER=:
| GENDER
| UNWGHT WGHT <———->
| TOTAL TOTAL MALE FEMALE
| —— —– —- ——}
COLUMN=: TOTAL WITH TOTAL WITH [5^1/2]
COLUMN_SHORT_WEIGHT=: TOTAL WITH WGHT1
}
TABLE_SET= {TAB307:
HEADER=: WEIGHTED TABLE WITH UNWEIGHTED AND WEIGHTED TOTAL COLUMNS}
STORE_TABLES=*
}
~EXECUTE
TABLE_SET= BAN3
TABLE_SET= TAB307
Here is the table that is printed.
WEIGHTED TABLE WITH UNWEIGHTED AND WEIGHTED TOTAL COLUMNS
TABLE 307
OVERALL RATING OF PRODUCT A
GENDER
UNWGHT WGHT <———->
TOTAL TOTAL MALE FEMALE
—— —– —- ——
WEIGHTED TOTAL 100 100 55 45
100.0% 100.0% 100.0% 100.0%
EXCELLENT (4) 21 17 6 11
21.0% 17.0% 10.4% 25.2%
GOOD (3) 19 20 12 8
19.0% 20.1% 22.0% 17.8%
FAIR (2) 30 31 20 11
30.0% 31.3% 36.4% 25.0%
POOR (1) 14 14 9 5
14.0% 13.8% 16.1% 10.9%
DON’T KNOW 16 18 8 9
16.0% 17.8% 15.1% 21.1%
MEAN 2.56 2.49 2.31 2.73
6.3.7 Assigning Different Weights To Different Rows
The process for weighting a table by the row variable is exactly the same as for weighting by the column variable, except you will want to use the table elements ROW_WEIGHT and ROW_SHORT_WEIGHT instead of COLUMN_WEIGHT and COLUMN_SHORT_WEIGHT. See “6.3.5 Assigning Different Weights to Different Banner Points” and 6.3.6 Printing Both a Weighted and an Unweighted Total Column for the process of weighting by the column variable.
If you want to create a row in the middle of the table that is unweighted, you can also do this by using the $[RAW_COUNT] keyword. This will cause all categories defined after it to be unweighted. In the following example, suppose you wanted to created both a weighted and unweighted Don’t Know row on a table. You would want to define the weighted Don’t Know row as you normally do and then define it again, after you have specified the $[RAW_COUNT] keyword.
~DEFINE
TABLE_SET= {TAB308:
TITLE=: OVERALL RATING OF PRODUCT A}
STUB=:
EXCELLENT (4)
GOOD (3)
FAIR (2)
POOR (1)
DON’T KNOW
[-VERTICAL_PERCENT] UNWEIGHTED DK
[STAT] MEAN }
ROW=: [6^4//1/5] $[RAW_COUNT] [6^5] $[MEAN] [6*RANGES=1-4]
}
TABLE_SET= {TAB308:
HEADER=: USING THE RAWCOUNT OPTION TO PRODUCE WEIGHTED AND UNWEIGHTED ROWS }
STORE_TABLES=* }
~EXECUTE
TABLE_SET= BAN1
TABLE_SET= TAB308
Here is the table that is printed.
USING THE RAWCOUNT OPTION TO PRODUCE WEIGHTED AND UNWEIGHTED ROWS
TABLE 308
OVERALL RATING OF PRODUCT A
GENDER
<———->
TOTAL MALE FEMALE
—– —- ——
WEIGHTED TOTAL 100 55 45
100.0% 100.0% 100.0%
EXCELLENT (4) 21 6 11
21.0% 10.4% 25.2%
GOOD (3) 19 12 8
19.0% 22.0% 17.8%
FAIR (2) 30 20 11
30.0% 36.4% 25.0%
POOR (1) 14 9 5
14.0% 16.1% 10.9%
DON’T KNOW 16 8 9
16.0% 15.1% 21.1%
UNWEIGHTED DK 16 8 8
MEAN 2.56 2.31 2.73
6.3.8 WEIGHTING USING MULTIPLE FACTORS
Sometimes, when tables are weighted, multiple factors are used. For instance, you might want to weight the table by both GENDER and AGE variables. There are three different approaches to accomplishing this.
The first and easiest approach is to just multiply the two weights together. The problem with doing this is that it does not take into account how the different weights will affect each other. The second approach is actually to assign weights to each cross section of the two variables. For instance, in the above example, you would have to assign a weight for Males under the age of 25, Females under the age of 25, and so on. If you do not know the cross-sectional universe percentages, you can estimate them by multiplying the two target percentages that make up the cross-section. The last approach is to use a procedure called “sample balancing.” You need to use this if you are weighting by enough factors such that a respondent does not exist in each and every cross-section.