Example of how to calculate and store weights for weighted data in tables
Attached is a spec GENERATEWEIGHTS.SPX that shows how to calculate the weights for various cases and then puts those weights into the data records.
~Comment This program can be used to calculate the weights for various cases and then puts those weights into the data records. Although this will work for a single weighting factor, it is designed to deal with multiple factors and uses a process often referred to as "sample balancing". This causes the final percentages for each factor to be unaffected by the weights for all the other factors. This is accomplished by re-coding the data by group within factor into a temporary data set and using that to calculates weights based on those factors. You must know the data definitions and target percentages for each category/group in each weighting factor before making the changes to this spec. A weighting factor is typically a demographic of some type like age, gender, or income. Each factor is made up of some number of categories/groups/break outs. For instance, gender would always be 2, Age is often 3 (young, middle aged, and old), but could be any value. The proc "recode" does the actual recoding, and must be adjusted for each study. This is where you specify the specific description of each group. Recoded data and initial weight factor is stored in the file generateweights.asc. The proc "check_recode" checks that no respondents are missing group assignments for any of the factors. This proc must also be modified for each study. The weights are written to a work file (generateweights.asc) and then merged back into the original datafile. NOTE: If you have DUPLICATE CASE IDS this will not work!!!! Although the spec can be modified to run off any unique identifier <= 30 characters wide in the data set. *** FOLLOWING DEFINES MUST ALL BE MODIFIED TO MATCH YOUR STUDY ***>Define @Input Generateweights.Dat ''Input Data file name>Define @InputOpt ASCII=110 ''Input File options including Select>Define @Output Generateweights ''Output Data file name>Define @OutputOpt ASCII ''Output file options >Define @IDLoc 1 ''Case ID location>Define @IdWidth 4 ''Width of ID variable>Define @Weightloc 51.10 ''Location to store weight in data file (must be 10 wide)>Define @WeightDec 5 ''Number of decimals to save in the weight>Define @Numfactors 3 ''Number of weighting factors>Define @MaxGroupsPerFactor 25 ''Maximum number of groups in a single factor (absolute max is 97)The following defines the actual data that makes up each weighting factor. You must have the same number of categories/groups in each expression below as you have in the factor percentages set up in the next set of defines below. In the example below the first factor has 2 categories/groups which are defined as code 1 and 2 in column 5. The 2nd factor has 4 categories/groups which are defined as 1 through 4 in column 6. The 3rd factor is slightly different in that its codes are NOT sequential. There are 5 categories/groups in the weighting factor which are defined as codes 1, 2, 3, 5, and 8 in column 7. You need to make sure you have set the proper number of factors. Comment out the ones you don't need if you have fewer factors, and uncomment and modify if you have more factors. >Define @Factor1 [5#1/2]>Define @Factor2 [6#1//5]>Define @Factor3 [7#1/2/3/5/8]''>Define @Factor4''>Define @Factor5''>Define @Factor6''>Define @Factor7''>Define @Factor8''>Define @Factor9 Set the desired weighted percents by factor. All values() must add to 100% or the run will be terminated below. The number of values in each factor below must EXACTLY match the number of categories/groups set in the corresponding define above.For instance, in the example below factor 1 has 2 categories/groupings. The first one has a desired percent of 60 and the second one is 40 percent. Factor 2 has 4 categories/groups with desired percents of 10.4, 45.3, 34.2, and 10.1. Factor 3 has 5 categories/groups with desired percents of 24.3, 25.9, 22.8, 18.5, and 8.5 respectively. >Define @Factor1Per 59.8,40.2>Define @Factor2Per 10.4,45.3,34.2,9.1,1.0>Define @Factor3Per 24.3,25.9,22.8,18.5,8.5>Define @Factor4Per 0 '' not used unless have 4 factors>Define @Factor5Per 0 '' not used unless have 5 factors>Define @Factor6Per 0 '' not used unless have 6 factors>Define @Factor7Per 0 '' not used unless have 7 factors>Define @Factor8Per 0 '' not used unless have 8 factors>Define @Factor9Per 0 '' not used unless have 9 factors Set desired weighted total or comment this out if you want to weight back to original total''>Define @Wt_Total 1000 Uncomment and modify this if using variable names from a.db file''>Usedb <dbfilename> Set this to 0 to balance to the nearest whole percent, set it to 1 to balance to the nearest 10th, and 2 to balance to the nearest 100th. Please be aware that balancing to the nearest 10th or 100th causes more iterations in the procedure to be needed, and increasing the probability that the program may not be able calculate appropriate values. If you set this to 2, you may need to increase the value of maxiter below.>Define @Decimal 2 Don't touch the next set of >If statements.>If @Decimal = 0>Define @CompValue 0.45>Endif>If @Decimal = 1>Define @CompValue 0.045>Endif>If @Decimal = 2>Define @CompValue 0.0045>Endif Maximum iterations trying to determine weight. Set this too high and program can get into a loop before you see the problem. Set it too low and it might stop before it is able to calculate the weights. The more decimal significance you are looking for in the above define, the larger you need this to be set to. If the procedure hasn't be able to calculate the weights by 50 iterations (and certainly by 100) it likely will never be able to find appropriate values.>Define @Maxiter 50 Comment out the following define if you want the program to continue if there are errors in the recoding. Only do this if you know what you are doing.>Define @StopIfError Comment out the following define if you do NOT want to run the test tables at the end to check the percents. See the file Generateweights.rpt for the test tables.>Define @RunTestTabs ''--------------------------------------------------------------''*** DON'T TOUCH ANYTHING BEYOND THIS LINE!!! ***''-------------------------------------------------------------- >AllowIndent>-FillDefinesInQuotes>PurgeSame>-PrintRep>EchoDefines>DbSizes= 0 0, 0 0, 0 0, 20000 0 Q, 0 0 ''To make localDB dup=quiet >Repeat $A=02,...,97; $B=04,...,99>if @MaxGroupsPerFactor = $A>Define @MaxGroups $B>Endif>EndRepeat~Define'' These are where it stores factor values (1-n)'' see Generateweights.asc>Repeat $A=1,...,9; $B=31,33,...,47F$A: [$B.2]>EndRepeat ID: [@IDLoc~.@IDwidth$] '' Location of respondent idNewID: [1.@IDwidth$] '' Location of ID in temporary fileWt: [61.10*zd@WeightDec] '' Temp location to hold respondent's weight Proc= {Recode: If FirstcaseChoosefile "Outpt"NextChoosefile "Orig"EndifBlank Outpt![1.80]Modify Outpt![f1] = Subscript(@Factor1)>IfDefine @Factor2Modify Outpt![f2] = Subscript(@Factor2)>Endif>IfDefine @Factor3Modify Outpt![f3] = Subscript(@Factor3)>Endif>IfDefine @Factor4Modify Outpt![f4] = Subscript(@Factor4)>Endif>IfDefine @Factor5Modify Outpt![f5] = Subscript(@Factor5)>Endif>IfDefine @Factor6Modify Outpt![f6] = Subscript(@Factor6)>Endif>IfDefine @Factor7Modify Outpt![f7] = Subscript(@Factor7)>Endif>IfDefine @Factor8Modify Outpt![f8] = Subscript(@Factor8)>Endif>IfDefine @Factor9Modify Outpt![f9] = Subscript(@Factor9)>EndifModify Outpt!Wt = 1 '' Everyone gets a weight of one to start.Print "s31gs" Id Outpt![31.40$]}Vf1: Values(@Factor1Per)Vf2: Values(@Factor2Per)Vf3: Values(@Factor3Per)Vf4: Values(@Factor4Per)Vf5: Values(@Factor5Per)Vf6: Values(@Factor6Per)Vf7: Values(@Factor7Per)Vf8: Values(@Factor8Per)Vf9: Values(@Factor9Per)Flag: [91#1]Count: [!93.8]'' A lot of trouble with weighting comes from dirty data, this checks for errors in the recoding/weighting scheme.Proc={Check_Recode:Modify Count += 1>Rep $A=1,...,@NumfactorsIf F$a > 0 ElseError "Factor $a is missing." F$AModify Flag = TrueEndif>EndRepIf Wt = 1 ElseError "Wt wasn't set to 1." WTModify Flag = TrueEndif>IfDefine @StopIfErrorIf LastCase and FlagHalt "Error in this Run!"Endif>Endif>IfDefine @Wt_Total>ElseIf LastCaseModify [1.80$] = ">Define @Wt_Total " Join [Count$]WriteEndif>Endif}~Input @Input Study=Orig NumBuf=2>IfDefine @InputOpt@InputOpt>Endif~Input $ NewBuf Length=80 Study=Outpt>Printfile $Generateweights.Asc Page_Width=80 -Formfeed~Exc Proc=Recode On Orig>Printfile '' Close Generateweights.asc so it can be used as input.~In~Input Generateweights.Asc ASCII=80:1.@IdWidth Total_Length=100>IfDefine @WT_Total>Else~Output Generateweights.Tmp ASCII length=40>Endif~Exc Proc= Check_Recode~Out ;>IfDefine @Wt_Total>Else&Generateweights.Tmp>Delete Generateweights.Tmp>Endif~CommentThis step reads the recoded data and collects information from it needed to do the sample balancing. Specifically it collects the number of factors and the number of groups within each of these factors.~Set Table_Drop_Mode=3~DefNumfactors[$R=Info 1 By T]Numgroups[$R=Info 1 By Na]Gf1[$R=Info 1 By 1]Gf2[$R=Info 1 By 2]Gf3[$R=Info 1 By 3]Gf4[$R=Info 1 By 4]Gf5[$R=Info 1 By 5]Gf6[$R=Info 1 By 6]Gf7[$R=Info 1 By 7]Gf8[$R=Info 1 By 8]Gf9[$R=Info 1 By 9]Proc={Get_Info:If first_case Then Create Info(=3, =@MaxGroups)=0 Endif'' Get highest group number in each factor>Repeat $F=1,...,9If F$F > Gf$F ThenModify Gf$F = F$FEndif>EndRepeatExeofPrint "NSample balancing information" &&"N============================">Repeat $F=1,...,9If Gf$F > 0 ThenModify Numfactors += 1Endif>EndRepeatPrint "Number of factors = s" Numfactors>Repeat $F=1,...,9If Numfactors >= $F ThenPrint "Number of groups in factor $F = s" Gf$FModify Numgroups += Gf$FEndif>EndRepeatPrint "Total number of groups = sn" Numgroups}FirstTime: [$R=Info T by T] '' Counts iterationsTmpwt: [$R=Weight Na by Na]Proc={Generateweights:If Firstcase ThenIf FirstTime = 1 ThenCreate Goals(=11, =@MaxGroups)=0>Repeat $G=1,...,9If $G <= NumFactors ThenModify [$$R=Goals $G ; 1 to Gf$G]=vf$GIf Not sum([$$R=Goals $G ; 1 to Gf$G]) = 100 ThenDo_Meta ">Define @Alldone"Error "Goals for factor $G not adding to 100%. Adds to s NN" &&"**********************************************************************************N" &&"Check your assigned percentages for each factor to make sure they add to 100%.N" &&"However, if you have an ERROR #5475 just above, that looks like this:N" &&"ERROR #5475) VEC with N items doesn't match table(1,M)N" &&"then it is very likely you are trying to assign a weight to a cateogry that hasN" &&"no data cases in it. This creates an impossible weighting situation. Run frequencyn" &&"counts on all the factors to check you have no categories that are empty.N" &&"**********************************************************************************" &sum([$$R=Goals $G ; 1 to Gf$G])Halt "NGoals for factor $G not adding to 100%N"EndifEndif>EndRepeatCreate Weight(=11, =@MaxGroups)=1Create Freq(=11, =@MaxGroups)=0Create Percent(=11, =@MaxGroups)=0Create HowClose(=11, =@MaxGroups)=0ElseModify [$R=Freq]=0Modify [$R=Percent]=0Modify [$R=HowClose]=0EndifEndif'' Calculate respondent's average weight.Modify tmpwt=0>Repeat $W=1,...,9If $W <= NumFactors ThenModify tmpwt += [$$R=Weight $W ; F$W]Endif>EndRepeatModify Wt = (wt * tmpwt) / NumFactors'' Accumulate Generateweights frequencysModify [$R=Freq T by T] += Wt>Repeat $F=1,...,9If $F <= NumFactors ThenModify [$$R=Freq $F ; F$F] += WtEndif>EndRepeatExeofModify [$R=Freq T by T]= @Wt_totalModify [$R=Percent All By All] = ([$R=Freq] * 100) / [$R=Freq T by T]>Repeat $Z=1,...,@NumFactorsModify [$$R=Percent T By All] += [$$R=Percent $Z by All]>EndRepeatModify [$R=HowClose] = [$R=Goals] - [$R=Percent]If Max([$R=HowClose 1 To Last by 1 To Last]) < @CompValue And &Min([$R=HowClose 1 To Last by 1 To Last]) > -@CompValue ThenPrint "Weighting is complete."Print "Balancing this sample took s iterations." FirstTimeDometa ">Define @Alldone"ElseModify [$R=Weight] = [$R=Goals] / [$R=Percent]'' say [$R=Freq] [$R=Percent] [$R=HowClose] [$R=Weight]EndifPrint_table [$R=Percent]If FirstTime = @maxiter ThenError "Maximum specified number of iterations reached!"Do_meta ">Define @HitMaxIterations"Do_meta ">Define @Alldone"EndifModify FirstTime += 1Store [$R=Goals]Store [$R=Weight]Store [$R=Freq]Store [$R=Percent]Store [$R=HowClose]}Tabset={Wttable:Edit={: -Vper -Coltna Colwid=6 FDec=@Decimal MinFreq=0.01 Extra_Stubs_Ok }banner={: