Languages & Encoding

This section explains how to program studies in multiple languages and how character encodings are used.

Encodings and character sets

What is a character set? Let’s distinguish between the concept of character set and character encoding.

A character set is a set of characters used for a particular purpose, e.g. the set of characters needed to support Western European languages on computers, or the set of Japanese characters learned at school. A coded character set is a set of characters for which there is a one-to-one correspondence between each character and a number; these numbers are called code points. Examples of coded character sets and encodings are Shift_JIS (Japanese); Big5 (Traditional Chinese, a multi-byte set used in Taiwan); ISO_8859-1:1987 (latin1); UTF-8; UTF-16. The last two are encodings of Unicode. Unicode aims to be a universal character set, so that there is a single definition of all the characters used in computers; it unifies and supersedes the existing character sets.

Up to and including CfMC version 7.7, you could not mix and match languages (character sets) at will in a study. The rule of thumb was to have one study for languages that have alphabets (with character_set set to extended_ascii in the study header), one study for multi-byte languages (character_set set to multi-byte) and one study for Japanese (shift_jis). English could go with any of these sets, since the English characters are a subset of all three. From version 8.1 onwards, CfMC also supports the utf-8 character_set, and the practical result is that you can have any mix of languages you want in a single study or, to be more precise, in a single qff.

In the simplest case there is a one-to-one correspondence between the code point of a character and its computer representation. For example, the ISO_8859-1:1987 coded character set always assigns the letter A to code point 65, and the computer encoding is a single byte of value 65. For Unicode (and also for the coded character sets that are not extended ASCII), things are more complicated, as there is no such direct link. For example, the letter à always has the same code point (224, or U+00E0), but UTF-8 represents it with two bytes. Moreover, Unicode has three different encodings, UTF-8, UTF-16 and UTF-32, and the same character can be represented differently in each. CfMC only uses UTF-8, which uses one byte for each ASCII character and two, three or four bytes for every other character.
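The difference between a code point and its encoded bytes can be checked with a few lines of Python. This is only an illustrative sketch using Python's built-in codecs; Python is not part of the CfMC tools, and the function name describe is made up for the example.

```python
def describe(ch):
    """Return the code point of ch and its size in bytes in several encodings."""
    return {
        "code_point": ord(ch),
        "iso-8859-1": len(ch.encode("iso-8859-1")),
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # "-le" variant: no byte order mark added
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in ("A", "\u00e0"):  # "\u00e0" is the letter à
    print(repr(ch), describe(ch))
```

The letter A takes one byte in both ISO 8859-1 and UTF-8, while à takes one byte in ISO 8859-1 and two in UTF-8, even though its code point is the same in both.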

~prepare and the study header

Mentor checks the validity of the characters for you when it compiles the study with ~prepare. For every character in a question text or response list item, Mentor checks whether the byte sequence is valid in the character set of the study. Mentor recognizes five character sets: ascii, extended_ascii, multi-byte, shift_jis and utf-8. Extended_ascii is the default.
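If you want to pre-check a text before handing it to Mentor, the same kind of test can be approximated outside CfMC. The sketch below is not Mentor's actual check: the mapping from study header names to Python codec names is an assumption (in particular, "multi-byte" is mapped to Big5 just as one possible multi-byte set), and is_valid is a hypothetical helper.

```python
# Hypothetical mapping from character_set names to Python codec names.
CODECS = {
    "ascii": "ascii",
    "extended_ascii": "iso-8859-1",  # assumption: Western European
    "multi-byte": "big5",            # assumption: one possible multi-byte set
    "shift_jis": "shift_jis",
    "utf-8": "utf-8",
}


def is_valid(data: bytes, character_set: str) -> bool:
    """Return True if data is a valid byte sequence in character_set."""
    try:
        data.decode(CODECS[character_set])
        return True
    except UnicodeDecodeError:
        return False


print(is_valid("日本語".encode("utf-8"), "utf-8"))  # complete UTF-8 sequences
print(is_valid(b"\xe3\x81", "utf-8"))               # a character cut in half
```

Note that every byte value is a legal ISO 8859-1 character, so a check of this kind can never reject extended_ascii text; it is only informative for the other character sets.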

Encodings, browser side

If you are running a web job, it is not enough to choose the right study header options; we also have to take proper care of the web layer. When a page is sent to the browser from the CfMC cgi program, the proper character encoding has to be specified. We inform the browser using the html meta charset declaration, for example:

<meta http-equiv="Content-type" content="text/html;charset=UTF-8" /> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta http-equiv="Content-Type" content="text/html; charset=Big5" />

The meta tag should be placed in the html head block of the page, usually in either pagetop.tmpl or header.tmpl.

How to save from a Word document file

Translations in MS Word format must be saved as text before Mentor can compile them. The “save as” Word menu usually presents a list of possible encodings to choose from. In our example, we have a Traditional Chinese MS Word document <save_from_MS/Traditional Chinese.doc>. When saving to text, we can choose from three Traditional Chinese encodings, for example Big5 <save_from_MS/Traditional Chinese Big5.txt> (this file will not display correctly in your browser unless you manually set the character encoding to Big5). In the text editor I use, I then have to choose the MingLiu font and the Big5 encoding to display the ideograms properly. We can also choose UTF-8 <save_from_MS/Traditional Chinese UTF-8.txt>, and then there is usually no need to change the font in the text editor. You could then use this text file as a qpx and build the syntax around it, or copy and paste from the saved text file into the qpx (being careful not to break any ideograms). Alternatively, you could use external language files and make the saved text file one of them.

How to save from an Excel document file

Saving from a Japanese Excel document <save_from_MS/Japanese sample.xls> is conceptually the same process as saving from a Word doc, although slightly more convoluted (at least in my “old” version of Excel). Excel does not present a list of encodings for text files when you want to “save as”, so you choose Unicode text when saving, producing a Unicode tab delimited <save_from_MS/Japanese sample unicode text.txt> text file (no specification of actual encoding). If you open this file in Word you can “save as” choosing the encoding you need: either Shift_Jis <save_from_MS/Japanese sample shift_jis.txt> or UTF-8 <save_from_MS/Japanese sample UTF-8 text.txt>.
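The detour through Word can also be scripted. The sketch below assumes that the “Unicode text” file written by Excel is UTF-16 with a byte order mark, which is what Excel normally produces but is not stated above; the function name and the file names in the usage note are hypothetical.

```python
def convert_text_file(src, dst, src_encoding="utf-16", dst_encoding="utf-8"):
    """Read src in src_encoding and write the same text to dst in dst_encoding."""
    # newline="" keeps the line endings exactly as they are in the source file.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)
```

For example, convert_text_file("sample unicode text.txt", "sample utf8.txt") would produce a UTF-8 file, and passing dst_encoding="shift_jis" a Shift_JIS one. The call raises an error if the text contains a character the target encoding cannot represent.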

Apache and the AddDefaultCharset directive

Even if we explicitly specify the encoding in header.tmpl or pagetop.tmpl, the Apache Web server can still override its value. The file httpd.conf has an entry called AddDefaultCharset, whose value can be On, Off or a charset name. When it is set, the charset it sends should override any charset specified in the body of the response via a META element, though the exact behavior is often dependent on the user’s client configuration. A setting of AddDefaultCharset Off disables this functionality. AddDefaultCharset On enables a default charset of iso-8859-1. In general, if we run multi-language studies, we want to have AddDefaultCharset set to Off.

You can find more information on this here: <http://httpd.apache.org/docs/2.0/mod/core.html#adddefaultcharset>.

Survent syntax

I will be using two web studies to show a few of the features discussed in this document.

Please note that in both the Japanese and Chinese versions of the studies, the text in the questionnaire and tmpl files is not meant to be a translation of the English. I just copied some random text from the Japanese and Chinese Google pages to prove the concepts. Apologies to the Japanese and Chinese speakers.

The webCati study is called wcex and can be found here </wcex/index.html>. The webSurvent study is called wsexam and can be found here: </wsexam/index.html>. I will also point to the relevant sections of the two qpx when appropriate. wcex’s qpx is here: <./wcex/wcex.qpx> and these are the language files:

english <./wcex/langen>
japanese <./wcex/langja>
chinese <./wcex/langzh>

wsexam’s qpx is here: <./wsexam/wsexam.qpx> and these are the language files:

english <./wsexam/langen>
japanese <./wsexam/langja>
chinese <./wsexam/langzh>

The study header language statement

An example of study header for a multi language study is:

[study,,language=(set=(en,fr) check_for_missing_languages)]

The syntax for the language header keyword is:

 LANGUAGE=(SET=(x=lang1,y=lang2,z=langn) CHARACTER_SET=name SPEAKING=langn DEFAULT_LANGUAGE=langx CHECK_FOR_MISSING_LANGUAGES)

Only “set=” is required. The legal values for set are the language codes specified in the msgfile. The languages being used can have 2 character codes or can have the 2 character codes converted to a 1 character code for ease of coding, for instance:

LANGUAGE=(SET=(e=en f=fr s=sp))

This says to use English, French and Spanish, and allows them to be referred to with 1 character codes in the questionnaire (e.g. \le, \lf, \ls).

The “SPEAKING=” parameter sets the starting language; if you don’t specify it, it defaults to the first language on the list.

The “DEFAULT_LANGUAGE” is the language used if you do not specify \L on some text; it defaults to the first language on the “SET=()” list. If you set the default language to “**”, unmarked text is displayed in all languages. If you set the default language to “!!”, you get all the text, including the “\Lxx” part of the text (useful for debugging problems). “CHECK_FOR_MISSING_LANGUAGES” makes sure all languages are specified on each question that has any language specified. The default is not to check. “\L**” accounts for all languages, unless there are other \Ls in the text, in which case there must be one for each language in the “set=”. For example:

[study,,language=(set=(en,fr) check_for_missing_languages)] 
{Q1: 
\L**This is fine. 
!disp} 
{Q2: 
\L**And so is this.\LenEnglish\LfrFrench. 
!disp}
{Q3: 
\L**But not this.\LenEnglish. 
!disp}

As said previously, the “CHARACTER_SET” parameter can take any of the ascii, extended_ascii, multi-byte, shift_jis and utf-8 values.

Note: there is currently a bug (mt3937) with setups where you keep all languages together for recode lists: CHECK_FOR_MISSING_LANGUAGES errors on compile even if you have all languages specified. The current workaround is to have -CHECK_FOR_MISSING_LANGUAGES in the study header.

The \L language control specification

We have already seen that in order to control languages in the text part of a question we use \L and the language code. A language stays in effect until another \L code is recognized. We have two different ways to specify language in the code lists. You may use “\Lxx” in the response list item text to change to language xx. However, if you have many languages or many codes, or you wish to send the entire list to a translator, it is easier to keep all the codes together for each language.

First approach:

!FLD
1 \len English text A\lfr French text A\lsp Spanish text A
2 \len English text B\lfr French text B\lsp Spanish text B

Second approach:

!FLD
[\len]
1 English text A
2 English text B
[\lfr]
1 French text A
2 French text B
[\lsp]
1 Spanish text A
2 Spanish text B

Although the first way contains less text, it is more difficult to deal with when translating, because you have to copy and paste each line to its proper position on the response list. Using the second way, you can send the response lists out to separate translators and just insert the response list for each language into the appropriate place on the list. As we will see, this approach is also well suited to external language files.
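If you already have a response list written the first way, it can be regrouped mechanically. The sketch below is a hypothetical helper, not a CfMC utility: it assumes that every item starts with its code followed by a space, and that every piece of text is introduced by \l plus a two-letter language code (it does not handle \L** or 1 character codes).

```python
import re

# "\l" or "\L" followed by a two-letter language code.
LANG_MARK = re.compile(r"\\l([a-z]{2})", re.IGNORECASE)


def group_by_language(lines):
    """Turn 'code \\lxx text\\lyy text' items into one block of items per language."""
    grouped = {}  # language code -> list of "code text" items, in first-seen order
    for line in lines:
        code, rest = line.split(None, 1)
        parts = LANG_MARK.split(rest)
        # parts is ['', 'en', ' English text A', 'fr', ' French text A', ...]
        for lang, text in zip(parts[1::2], parts[2::2]):
            grouped.setdefault(lang.lower(), []).append(f"{code} {text.strip()}")
    out = []
    for lang, items in grouped.items():
        out.append(f"[\\l{lang}]")
        out.extend(items)
    return out


first_way = [
    r"1 \len English text A\lfr French text A\lsp Spanish text A",
    r"2 \len English text B\lfr French text B\lsp Spanish text B",
]
print("\n".join(group_by_language(first_way)))
```

Run on the two lines of the first approach, this prints the list laid out as in the second approach.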

Using external language files

If you use the external files approach, you can go ahead and test your survey logic using just one language while the translators work on the questionnaire. You can then add the languages to the survey simply by changing one >define and recompiling the qpx.

Let’s see how we can bring in external files into the qpx. In general, you will have as many files as languages. What we want to achieve is to include a file but only reference a section of it. You can think that there is a one to one correspondence between a section in the external file and a question text or code list.

The syntax to reference a region of an include file is:

&filename('start/'end)

The ampersand must be the first character of a new line. The markers within the parentheses point to the beginning and end of a section of text in the include file. In the following example, only the section (‘q1b/’q1e) of the include file is brought in:

&languageXX('q1b/'q1e) ''include file for multibyte

So if you have a !fld question:

{Q1: 
Question text
!FLD
1 Response list item 1
2 Response list item 2
3 Response list item 3
}

We can identify two different sections: the question text and the code list. We could then have an external “language” include file called “incen” that looks like this:

'q1s
Question text
'q1m
1 Response list item 1
2 Response list item 2
3 Response list item 3
'q1e

We can delimit the two sections using the q1s, q1m and q1e delimiters. If we were using just one language and this external file, the scripted question would look like:

{Q1:
&incen('q1s/'q1m)
!FLD
[\Len]
&incen('q1m/'q1e)
}
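To make the effect of the markers concrete, the selection can be mimicked outside Survent. This is only a sketch of the idea, not the compiler's implementation: it assumes that each marker sits alone on its own line and that the marker lines themselves are not part of the included text. The function name extract_region is hypothetical.

```python
def extract_region(lines, start, end):
    """Return the lines strictly between the marker lines start and end."""
    region = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == start:
            inside = True          # the region begins after the start marker
        elif stripped == end:
            if inside:
                break              # the region stops before the end marker
        elif inside:
            region.append(line)
    return region


incen = [
    "'q1s",
    "Question text",
    "'q1m",
    "1 Response list item 1",
    "2 Response list item 2",
    "3 Response list item 3",
    "'q1e",
]
print(extract_region(incen, "'q1s", "'q1m"))  # the question text
print(extract_region(incen, "'q1m", "'q1e"))  # the code list
```

Because 'q1m closes the first region and opens the second, three markers are enough to delimit two sections.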

For example, we could have six languages, with external language files called incen, incfr, incge, incit, incct and incja:

{Q1: 
\LEn 
&incen('q1s/'q1m) 
\LFR 
&incfr('q1s/'q1m) 
\LGE 
&incge('q1s/'q1m)
\LIT 
&incit('q1s/'q1m)
\LCT
&incct('q1s/'q1m)
\LJA
&incja('q1s/'q1m) 
!fld 
[\Len] 
&incen('q1m/'q1e)
[\LFR]
&incfr('q1m/'q1e)
[\LGE]
&incge('q1m/'q1e)
[\LIT]
&incit('q1m/'q1e)
[\LCT]
&incct('q1m/'q1e)
[\LJA]
&incja('q1m/'q1e)
}

At this point, the repetitive code can be placed inside a ”>repeat” structure; we can also use a define for the language codes:

>def @lang en,fr,ge,it,ct,ja
{Q1:
>repeat $a=@lang 
\L$a 
&inc$a('q1s/'q1m) 
>endrep 
!fld 
>repeat $a=@lang
[\L$a] 
&inc$a('q1m/'q1e)
>endrep 
}

A note on displaying exception text on numeric questions

One way to display the “Don’t Know” and “Refused” text in the !num questions exception boxes is shown here: <./wsexam/wsexam.qpx#!num> in wsexam.qpx.

A note on escape sequences

Some multi-lingual web studies make extensive use of Numeric Character References (NCR) or character entities. An example of a character and its corresponding entity and NCR is:

Character   Description             Entity   Decimal NCR
ä           small a, umlaut mark    &auml;   &#228;

Such escape sequences should be used only in exceptional circumstances, not as a matter of practice, as the file size increases and it becomes difficult to read and maintain extended ASCII or non ASCII text. In general, routine use of escape sequences should be restricted to ”<”, ”>” and ”&”.

For example, this paragraph in Czech:

Jako efektivnější se nám jeví pořádání tzv. Road Show prostřednictvím našich autorizovaných dealerů v Čechách a na Moravě, které proběhnou v průběhu září a října.

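expands to this if we use an NCR for every non ASCII character:

Jako efektivn&#283;j&#353;&#237; se n&#225;m jev&#237; po&#345;&#225;d&#225;n&#237; tzv. Road Show prost&#345;ednictv&#237;m na&#353;ich autorizovan&#253;ch dealer&#367; v &#268;ech&#225;ch a na Morav&#283;, kter&#233; prob&#283;hnou v pr&#367;b&#283;hu z&#225;&#345;&#237; a &#345;&#237;jna.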
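The growth in size is easy to measure. As an illustration only (Python is not part of the CfMC tools, and to_ncr is a made-up name), Python's standard xmlcharrefreplace error handler replaces every non ASCII character with its decimal NCR:

```python
def to_ncr(text):
    """Replace every non ASCII character in text with its decimal NCR."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


word = "z\u00e1\u0159\u00ed"  # the Czech word září
print(to_ncr(word), len(word), len(to_ncr(word)))
```

A four character word becomes a 19 character string, which shows why text written this way is both larger and much harder to proofread.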

Managing error messages

In general, you can edit the existing CfMC error messages, whether they are produced by Survent or by a javascript function. Version 7.7 only supports error messages in the extended ascii, multi-byte or shift_jis encodings. Version 8.1 supports both 7.7’s encodings and utf-8. Version 8.2+ will only support utf-8 error messages.

Managing the msgfile

The msgfile is a central repository for the error messages. Any error message that is changed there will be shared by all the studies and processes in that environment. The msgfile is a binary file, so it cannot be edited directly. Instead, we edit a text file called msgfile.raw and then run the “makemsg” utility, which reads msgfile.raw and builds msgfile. Version 7.7’s msgfile.raw contains lines in the extended ascii, multi-byte and shift_jis encodings. Version 8.1’s msgfile.raw is shipped in the same format as 7.7’s, but it can easily be converted to utf-8 with the makemsg.pl utility that we provide. If you call it thus:

  mkmsg.pl utf8

the program will read in the non utf-8 file msgfile.raw and output a utf-8 msgfile.raw and a utf-8 binary msgfile, which you can then copy to ${CFMC}control. The original msgfile.raw has these lines:

  zh0010 zh multibyte chinese_simplified 
  de0010 de extended_ascii german
  dk0010 dk extended_ascii denmark
  fr0010 fr extended_ascii french
  it0010 it extended_ascii italian
  pt0010 pt extended_ascii portguese
  sp0010 sp extended_ascii spanish
  sv0010 sv extended_ascii swedish
  ct0010 ct multibyte chinese_traditional
  ja0010 ja shift_jis japanese
  ko0010 ko multibyte korean
  0010   -- ascii standard_english

automatically converted to:

  ct0010 ct utf8 chinese_traditional
  de0010 de utf8 german
  dk0010 dk utf8 denmark
  fr0010 fr utf8 french
  it0010 it utf8 italian
  ja0010 ja utf8 japanese
  ko0010 ko utf8 korean
  pt0010 pt utf8 portugese
  sp0010 sp utf8 spanish
  sv0010 sv utf8 swedish
  zh0010 zh utf8 chinese_simplified
  0010   -- ascii standard_english

If you want to add new language codes, you will need to add them in the section above. In general, we advise running all existing 7.7 jobs in a 7.7 or 8.1 environment with a non utf-8 msgfile. If you need to move a 7.7 job to an 8.1 utf-8 environment, it is much easier to convert the qpx and all language files (if any) to utf-8 with the linux/unix utility iconv <http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html> than to build a msgfile with utf-8 and non utf-8 error messages. An example of running iconv is:

  iconv -f iso_8859-1 -t utf-8 -o utf8en.raw  langen.raw

The above will convert the iso_8859-1 file langen.raw to the utf-8 file utf8en.raw.
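One thing to watch for is converting a file that is already in utf-8: the conversion succeeds silently and every accented character turns into two wrong ones. The sketch below illustrates the problem and one possible guard. It is an illustration in Python, not part of the CfMC tools, and the test used by looks_like_utf8 is only a heuristic (plain ASCII also passes it, which is harmless because ASCII bytes are the same in both encodings).

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8, as the iconv call above does."""
    return data.decode("iso-8859-1").encode("utf-8")


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_once(data: bytes) -> bytes:
    """Convert to UTF-8 only if the data does not already look like UTF-8."""
    return data if looks_like_utf8(data) else latin1_to_utf8(data)


original = "r\u00e9ponses".encode("iso-8859-1")
once = latin1_to_utf8(original)
twice = latin1_to_utf8(once)      # the mistake: converting a second time
print(once.decode("utf-8"))
print(twice.decode("utf-8"))      # the é is now shown as two characters
```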

If you wish to create an all encompassing msgfile.raw with all encodings, you can use your editor of choice, Mentor, perl or other programming language to build a msgfile.raw that contains messages in all encodings. You will find out, however, that you might run into a problem with the choice of the language codes if you want to stick to the “official” ones.

!error_msg vs rebuilding the msgfile

If you need to control Survent error messages on a study basis, then you can take advantage of the !error_msg compiler directive. In general the command allows you to change the text of an error message in the qpx as many times as you wish. The syntax is:

    {!error_msg #### error message text}

The command can be issued multiple times in the qpx. One thing to notice about the command is that once it is executed it stays in place even if you back up over it (this is by design: if an error message text has been changed due to a language change, we do not want to revert to the original language when backing up). When we want to handle Survent error messages with multiple languages, the syntax is:

  {!error_msg 4377
  ct4377 答案個數太多(只接受 %d 個答案)
  de4377 zu viele Antworte (nur %d anerkannt)
  dk4377 For mange svar (kun %d er tilladt)
  fr4377 trop de réponses(%d autorisées)
  it4377 troppe risposte (consentita/e solo %d)
  ja4377 回答が多すぎます（ %d つのみ）
  ko4377 너무 많은 답(%d개의 답만 허용)
  pt4377 respostas demais (somente %d é permitido)
  sp4377 Demasiadas respuestas (Solamente %d permitido)
  sv4377 för många svar (endast %d tillåts)
  zh4377 答案个数太多(只接受 %d 个答案)
  4377 too many responses (Only %d allowed)  }

The wcex example has the command !error_msg here <./wcex/wcex.qpx#err_msg>. The structure of the error message must follow the structure of its counterpart in msgfile.raw, including the %s, %d, etc. placeholders. These can go anywhere within the message, as long as there are as many of them as in the original message: if the original has three % placeholders, the new message must have three as well. The character limit for this compiler directive is 5000 in total in 8.1 and 2000 in 7.7, inclusive of system variables such as %s and %d.
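When many translations are involved, the placeholder rule can be checked before compiling. The sketch below is a hypothetical helper, not a CfMC utility. It only looks at %s and %d (other placeholder types are not covered), and it compares how many of each appear without checking their order.

```python
import re

# Assumption: only %s and %d placeholders are considered.
PLACEHOLDER = re.compile(r"%[sd]")


def same_placeholders(original, replacement):
    """True if both messages contain the same number of %s and of %d."""
    return sorted(PLACEHOLDER.findall(original)) == sorted(PLACEHOLDER.findall(replacement))


original = "too many responses (Only %d allowed)"
print(same_placeholders(original, "troppe risposte (consentita/e solo %d)"))
print(same_placeholders(original, "troppe risposte"))
```

The first call accepts the Italian message from the example above; the second rejects a translation in which the %d has been lost.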

If you wish to add or change error messages on a shop-wide basis, then it is more efficient to rebuild the message file. You can either add them one by one to msgfile.raw or create a file with all the messages for a language and then merge it into msgfile.raw. If you choose the latter option, you can use the mknewmsg.spx <error_msg/mknewmsg.spx> spec, which reads two CfMC “raw” msgfiles and merges them such that the “new” file adds to and/or replaces lines in the “old” file. You just need to keep the messages in a separate file, for example msgnew.raw <error_msg/msgnew.raw>.

Setting and changing the interviewing language

The interviewing language can be changed at any time during the course of the interview, but it is common practice not to give a webSurvent respondent the ability to do so, instead setting the language from the index page.

Setting the language from the index in webSurvent

Adding the “language” hidden input tag to the “cfmclogin” form of a web study gives the ability to set the language from the index page. This is the preferred method for webSurvent. The tag looks like:

  <input type="hidden" name="LANGUAGE" value="fr" />

If you are running a 7.7 study, you might need an extra hidden tag to point to the right qff. If, for example, you have a study with multi-byte and extended ascii languages, you cannot have the same qff handle all the languages. The idea here is to share all the files across the study except the qff ones. If you put the “qffname” hidden tag in the index page, you will be able to point to the correct qff. For example, you might have an index page that starts a session with language “fr” and points to the qff “doceu”:

  <input type="hidden" name="CFMC" value="/cfmc/test7.7/"/>
  <input type="hidden" name="CFMCCFG" value="/cfmc/test7.7/ipcfiles/"/>
  <input type="hidden" name="STUDY_DIR" value="/cfmc/test7.7/websurv/studies/danitest/doctor/"/>
  <input type="hidden" name="TMPLS_DIR" value="/cfmc/test7.7/websurv/tmpl/WS_default/"/>
  <input type="hidden" name="USE_PASSWDS" value="NO"/>
  <input type="hidden" name="STUDYCODE" value="doctor"/> 
  <input type="hidden" name="MAILTO" value="xyz@cfmc.com"/>
  <input type="hidden" name="LANGUAGE" value="fr"/>
  <input type="hidden" name="QFFNAME" value="doceu"/>

and then another index page that sets the language to “ct” and points to the multi-byte qff docmu:

 <input type="hidden" name="CFMC" value="/cfmc/test7.7/"/>
 <input type="hidden" name="CFMCCFG" value="/cfmc/test7.7/ipcfiles/"/>
 <input type="hidden" name="STUDY_DIR" value="/cfmc/test7.7/websurv/studies/danitest/doctor/"/>
 <input type="hidden" name="TMPLS_DIR" value="/cfmc/test7.7/websurv/tmpl/WS_default/"/>
 <input type="hidden" name="USE_PASSWDS" value="NO"/>
 <input type="hidden" name="STUDYCODE" value="doctor"/> 
 <input type="hidden" name="MAILTO" value="xyz@cfmc.com"/>
 <input type="hidden" name="LANGUAGE" value="ct"/>
 <input type="hidden" name="QFFNAME" value="docmu"/>

The language and qffname tags should also be placed in the “cfmclogin” form of suspend.tmpl, so that when you resume the language variable is passed back into the study. You can pass the value of the language to suspend.tmpl with a !html_define in the qpx, like in the wsexam study here <./wsexam/wsexam.qpx#htmldef_currlang> and suspend.tmpl here <./wsexam/suspend.tmpl> where I used @currentlang to set the language hidden input tag. The “cfmclogin” form on suspend.tmpl is in fact equivalent to the one on the index page.

If you use the 7.7 approach, the Survent command to put in the qpx to handle the qff name is the header option “qffname=”:

    [doctor,32000,qffname=doceu,language=(set=(en,fr) character_set=extended_ascii)] 
    [doctor,32000,qffname=docmu,language=(set=(ct,ko) character_set=multi-byte)]

You can send respondents direct links to the index pages or you can direct them to an entry point where they can choose the language they prefer. If you choose the latter, you can have a simple javascript applied to every anchor link that re-directs them to the chosen index page, carrying over the info you sent with the hotlink, populating the user name and password fields.

For example, if you are placing a password in the hotlink you send out with your emails:

  http://www.mysite.com/doctor/index.html?password=123

you can have an onclick javascript event handler on the anchor tags fire the “append” function and carry the password over to the webSurvent index.

The anchor tags and the javascript will be in the entry point file:

    <A HREF="http://www.mysite.com/doctor/indexen.html" onClick="append(this)">English</A>
    <A HREF="http://www.mysite.com/doctor/indexfr.html" onClick="append(this)">French</A>

    <script language="javascript">
      // Carry the query string of the entry page (e.g. ?password=123)
      // over to the index page the respondent clicked on.
      function append(thislink) {
        var query = window.location.search;  // "" when there is no query string
        if (query) {
          thislink.href = thislink.href + query;
        }
      }
    </script>

The wsexam study shows this approach here: Try this link </wsexam/index.html?password=testxyz> and then choose a language to conduct the interview in.

Changing the language in webCati during the interview

You might need to give an interviewer the ability to change the language mid-interview. In CATI this can be done by issuing the “L=<language code>” command from the terminal at any question prompt. Once this is typed, the current question redisplays in the language specified. In webCati you can only change the language programmatically with !sys,L:

    {!sys,l,<lang code>}

If you want to give webCati interviewers the ability to change languages at any time during the interview, one way to do this is to set up the special block for this purpose and have a !sys,L there. The following is an example (also in wcex here <./wcex/wcex.qpx#special_block>).

  {!special}
  {!-AllowSuspend}
  {!-Allowbackup} ''only the "next" button is presented in  the special block
  {!-Allowspecial}
  {!-AllowTerminate}
  {setlang:
  \L**set language
  !fld
  1 \L**English
  2 \L**French}
  >rep $a=1,2;$b=en,fr
  {!if setlang($a)
  !sys,L,$b}
  >endrep
  {!endspecial}

This would be the “special block” button in btnsbtm.tmpl or buttons.tmpl:

<INPUT TYPE="submit" NAME="Special.x" VALUE=" SwitchLang ">

.tmpl files set-up

WebSurvent and webCATI setups need proper language handling in the tmpl files. Typically you will have to handle presentation in multiple languages only in webSurvent, as a webCati interviewer only needs to have the questions translated. In any case, what follows may apply to both.

In 7.7, the software must have a set of tmpl files for each character set, so we have to point to the appropriate set of tmpl files from the index file. This <http://demo.cfmc.com/mulaex/> is a comprehensive training site specific to 7.7 setups.

For example, if you have a study with extended_ascii and multi-byte languages:

<input type="hidden" name="STUDY_DIR" value="/cfmc/test7.7/websurv/studies/doctor/extended_ascii/"/>

The instruction above will be present in the index pages of the extended_ascii languages, e.g. indexen.html, indexfr.html.

<input type="hidden" name="STUDY_DIR" value="/cfmc/test7.7/websurv/studies/doctor/multi-byte/"/>

The instruction above will be present in the indexct.html file.

In 8.1+ we do not need to split the .tmpl files in character sets, so the “STUDY_DIR” input tag can be:

<input type="hidden" name="STUDY_DIR" value="/cfmc/test7.7/websurv/studies/doctor/"/>

.tmpl files may contain the appropriate text for all the languages of your survey. An !html_define can be used to tell the .tmpl file which language should be displayed. We can get the language the survey is currently being conducted in with a !spc,7 command, like so:

{CurrentLang:
!spc,7,110,2}
{!html_define CurrentLang \:CurrentLang}

Then you can have switches in the .tmpl file that control which language to display on the page:

>if @CurrentLang = "en"
  display some English text here
>elseif @CurrentLang = "fr"
  display some French text here
>elseif @CurrentLang = "ct"
  display some Chinese traditional text here
>endif

For example, you might want to set the language of the buttons in the file buttons.tmpl. Click here <wsexam/buttons.tmpl> to see how the study wsexam controls the buttons’ language in buttons.tmpl and here <wsexam/suspend.tmpl> to see the text in suspend.tmpl.

You will also need to specify the appropriate character encoding in the META tag in either header.tmpl or pagetop.tmpl. In version 8.1+ you just need to say:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In version 7.7 we need to have the appropriate meta for every encoding we might use, for example:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

for western european languages,

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

for (some) central european languages.

We can see that in 7.7 even running a study with just extended_ascii languages might require quite a few meta tag switches. For example, a study with Western European languages like French, Central European languages like Czech and Eastern European languages like Estonian will require iso-8859-1, iso-8859-2 and iso-8859-4. In this case too, we can use !html_define’s to pass the language info to header.tmpl:

{currlang:
!spc,7,110,2} ''Currlang will hold a two-character code that specifies the current language.

Then, you can use:

{!if currlang$<>"fr"
!goto arounden}
{!HTML_DEFINE METATAG <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">}
{arounden:
!goto}
{!if currlang$<>"cs"
!goto aroundcs}
{!HTML_DEFINE METATAG <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">}
{aroundcs:
!goto}
{!if currlang$<>"et"
!goto aroundet}
{!HTML_DEFINE METATAG <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-4">}
{aroundet:
!goto}
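If you are not sure which charset a given translation needs, you can test which of the candidate encodings is able to represent its text. The following is an illustrative Python sketch, not a CfMC tool; the function name and the sample words are made up for the example.

```python
def usable_charsets(text, candidates=("iso-8859-1", "iso-8859-2", "iso-8859-4")):
    """Return the candidate charsets that can represent every character of text."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
            usable.append(name)
        except UnicodeEncodeError:
            pass  # at least one character does not exist in this charset
    return usable


print(usable_charsets("r\u00e9ponses"))          # French sample
print(usable_charsets("z\u00e1\u0159\u00ed"))    # Czech sample (září)
print(usable_charsets("\u0161\u00f5u"))          # Estonian sample (šõu)
```

A text that fits none of the candidates is a sign that the language should be moved to another character set or, in 8.1+, to utf-8.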

You can find the full list of encodings here: <http://www.iana.org/assignments/character-sets>.

You can also find a full list of language codes here: <http://www.loc.gov/standards/iso639-2/php/code_list.php>.

Error messages in error.tmpl

Some webSurvent error messages are presented before Survent runs, i.e. before we enter the spec, for example a password that is too short. Try to click this </wsexam/indexja.html?password=a> link and submit the index page to get an error about the password being too short.

Therefore these error messages, presented in error.tmpl, cannot be controlled from the qpx with !error_msg, and the webSurvent cgi program presents error messages coming directly from the msgfile. If you look at an error.tmpl file, you will see the string ”@error_text”, which is the error message itself. This message is presented in the appropriate language if the translation is present in the msgfile, otherwise in the default language (English). In order to control the language you must have the “language” input tag in index.html, as seen earlier in “Setting the language from the index in webSurvent”. We know that in 8.1 any presented message can be encoded in utf-8, so if that is the case a single meta tag can suffice at the top of error.tmpl:

<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />

The error message (@error_text) is presented in the correct language, but in order to have the rest of error.tmpl in the appropriate language, we can set-up conditional sections based on the value of the @language variable. For example:

    >if  @language="en" 
    <h2> An Error Has Occurred! </h2> <font color="#aa00ff"> @error_text </font> 
    <p>If this problem persists, please contact us by clicking on this email link: 
    <a href="mailto:@mailto?subject=Study:@studycode,User:@name,Password:@password, ErrorCode @error_code">@mailto</a> 
    <p>Please describe the problem in as much detail as possible. 
    >endif 
    
    
    >if @language="it"
    <h2> Si è prodotto un errore! </h2> <font color="#aa00ff"> @error_text </font>
    <p> Se il problema persiste, contattateci cliccando sul link email:
    <a href="mailto:@mailto?subject=Study:@studycode,User:@name,Password:@password, ErrorCode @error_code">@mailto</a> 
    <p> Per cortesia descrivete il problema nel maggiore dettaglio possibile. 
    >endif

Click here <./wsexam/error.tmpl> to see wsexam’s error.tmpl file.

javascript error messages and user_settings81.js

In order to display javascript error messages in the different languages of your survey, we must have a copy of user_settings81.js for every language of the survey and point to the one appropriate for the current language in use. We have to keep in mind that any character encoding that is not ASCII must be done in UTF-8 in order to display properly in javascript alerts, so even if you are running a 7.7 study and have, for example, a chinese questionnaire encoded in Big5, the copy of user_settings.js that handles the chinese has to be encoded in UTF-8.

Passing information about the current language in use from the qff to the tmpl files enables us to point to the correct user_settings81.js. We can set an !html_define in the qpx to store the language and pass the info to the .tmpl file:

  
{CurrentLang:
!spc,7,110,2}
{!html_define CurrentLang \:CurrentLang}

Now that we have the language in the !html_define, we can point to the correct user_settings81.js in either pagetop.tmpl or header.tmpl:

>if CurrentLang = "en"
  <script src="/cfmcweb/js/user_settings_english.js" type="text/javascript"></script>
>elseif CurrentLang = "sp"
  <script src="/cfmcweb/js/user_settings_spanish.js" type="text/javascript"></script>
>elseif CurrentLang = "fr"
  <script src="/cfmcweb/js/user_settings_french.js" type="text/javascript"></script>
>endif

Click here <./wsexam/pagetop.tmpl> to see how wsexam implements the feature in pagetop.tmpl.

Building the fone file

Building a multi-lingual phone file is no different from building a single-language one, but a few considerations are worth keeping in mind.

First off, in versions 7.7 and 8.1 you have 4900 /columns/ available in the fone file text area. Note that a column does not necessarily correspond to a character: that holds only if your sample contains extended ASCII characters. In general, a column in the text area is a byte. If you run a job whose sample text is in shift_jis, multi-byte or utf-8 with Asian languages, you will find that the sample fields take up more space than a visual inspection might suggest.
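The difference between characters and bytes can be checked quickly outside of CfMC. The Python sketch below is only an illustration (the helper columns_needed and the sample strings are hypothetical); it compares the number of characters in a field with the number of bytes, and hence text-area columns, the field would need in a given encoding.

```python
def columns_needed(text: str, encoding: str) -> int:
    """Number of bytes (text-area columns) the text occupies in the encoding."""
    return len(text.encode(encoding))


# Hypothetical sample fields paired with the encoding they are stored in.
samples = [
    ("Orléans", "latin-1"),    # extended ASCII: one byte per character
    ("Orléans", "utf-8"),      # the accented letter takes two bytes
    ("東京都", "shift_jis"),    # two bytes per character
    ("東京都", "utf-8"),        # three bytes per character
]

for text, encoding in samples:
    print(f"{text!r} in {encoding}: {len(text)} characters, "
          f"{columns_needed(text, encoding)} columns")
```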

A step by step guide to building a multi-language sample file might be the following:

Save from Excel following the steps outlined earlier.
Once you have a text file from Excel, for example a tab-delimited <fone_file/provincialue.txt> one, you can run delimit.spx <fone_file/delimit.spx> on it.
This spx will output a map <fone_file/delimit.out> of the fields, as well as a fixed width output <fone_file/provincial.fix> file. A quick visual inspection of the output file and the map tells us that the maximum of 89 columns reported by the map refers to bytes and not characters, since no Field_2 value has 89 characters (or ideograms).
You can then run wsmover to create a raw sample file <fone_file/provincial.fbl> that Fonebuld can read, using the information from the map.
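To see why the map reports widths in bytes, the following Python sketch mimics the general idea of turning tab-delimited text into fixed-width records. It is not delimit.spx and makes no claim about how that spec works internally; the function build_fixed_width and the sample rows are assumptions made for illustration only.

```python
def build_fixed_width(tab_text, encoding="utf-8"):
    """Return (widths, records): byte width per field and padded byte records."""
    rows = [line.split("\t") for line in tab_text.splitlines() if line]
    encoded = [[field.encode(encoding) for field in row] for row in rows]
    n_fields = max(len(row) for row in encoded)

    # The width of each field is its longest value measured in bytes.
    widths = [0] * n_fields
    for row in encoded:
        for i, field in enumerate(row):
            widths[i] = max(widths[i], len(field))

    # Pad every field with spaces up to its byte width and join the fields.
    records = []
    for row in encoded:
        padded = row + [b""] * (n_fields - len(row))
        records.append(b"".join(f.ljust(w) for f, w in zip(padded, widths)))
    return widths, records


# Hypothetical two-record sample with accented and Chinese text.
sample = "Québec\t北京市\nOntario\t上海\n"
widths, records = build_fixed_width(sample)
print(widths)
for record in records:
    print(len(record), record.decode("utf-8"))
```

Every record has the same length in bytes, even though the records differ in the number of characters they display.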

This </testsample/index.html> is a webSurvent study that uses the file provincial.fbl as its sample file.

Managing a multi-lingual job in webCati

Here we will try to give some hints and tips about study management. A webSurvent study is pretty straightforward to manage as typically a respondent is given a link to an index page that will set up the language of the interview for her. We have also seen that you can give respondents a link to an entry page where they can choose from a list of links.

Controlling the interviewing language

In webCati you can control the language when starting the interviewer or have interviewers choose their own interviewing language.

If you want to control the starting language, you can:

  * Flag the employee.xxx <manage/employee.xxx> file in the user area (columns 46 to 240) with a language code, and use that info to set the language with !sys,L. The command !spc,5 reads from employee.xxx. For example, you could have your language codes in columns 46 to 51 (three 2-character language codes):

              {lang:
              !spc,5,46,6}
              {langx: [lang] hide
              !fld,,3
              >rep $a=fr,de,sp
              $a
              >endrep
              }
              >rep $a=fr,de,sp
              {!if langx($a)
              I speak $a
              !disp
              }
              >endrep

    The study wcex has the section here <./wcex/wcex.qpx#emp.xxx>.
  * Start an interviewer as special and then detect the special type in the qpx with the !spc,7 command. You can flag an interviewer as special in the employee.xxx file, in columns 32-40. (Of course, interviewers can also start themselves as special from either the index page or the login.php page in websuper, by appending s=<special type> to their interviewer id.) !spc,7 gathers various types of info from the system: we have seen that !spc,7,110,2 is the current interviewing language, and !spc,7,35,9 tells whether the interviewer was started as special. It returns code 0 in the first column if the interviewer was not started as special, or 1 in the first column, 2 in the second, 3 in the third, etc., if started as special. You might have this snippet of code in your qpx:

                  {spec:
                  !spc,7,35,9}              
                  {specx: [spec]hide
                  !fld,,9
                  >rep $a=0,...,9
                  $a
                  >endrep
                  }                   
                  >rep $a=0,...,9
                  {!if specx($a)
                  I am special type $a
                  !disp}
                  >endrep
                  
    Once you have this bit of info in hand, you can issue a !sys,L,<lang code> accordingly. Study wcex has this section here <./wcex/wcex.qpx#special_type>.

If you allow interviewers to choose the interviewing language you can use the special block to hold the !sys,L commands as seen previously.

Languages as markets

When you can identify groups in advance in the sample file, you can take advantage of the Survent market feature. A sample group can be anything that defines a subset of the sample, so if you know the language of each sample record in advance, you can treat a language as a market. To use markets, you need to flag the sample records in the fone file text area; the identifier can be a string up to 20 (8 in 7.7) characters long. Once you have a job set up with markets, you can control sample using market weights and dedicated !phone commands. One useful command is !phone,L, which is used to call a number in the specified market:

{Retcode: 
!PHONE,L,< label or location >}

It is important to note that Survent follows a very specific sequence of steps when looking for a number to release to the interviewer. For example, timed callbacks take precedence over a request for a number in a specific market. In practice, this means that I might ask for a number in market “french” but get a timed callback from the “german” market. If we reassign this “unwanted” timed number to the Holding Area (Bucket #9), the record will be called as soon as an interviewer is available to call that market. For this to work, you also need to set the following parameters:

RELEASE_HOLDING_AREA =  YES 
USE_HOLDING_AREA =  4

RELEASE_HOLDING_AREA set to yes allows for numbers to be available straight away. USE_HOLDING_AREA set to 4 means that the numbers in bucket #9 should ignore the minimum system callback time and be called immediately if released.

An example setup is the phonl <manage/phonl.qpx> spec, which is part of the Survent examples (that can be found in ${CFMC}survent). The study is also here </phonl/index.html>. This spec puts into the holding area timed calls that are coming from a market that was not requested.

You might want the phonl setup to mimic the timed callback behavior, where a timed callback is rescheduled for the following day at the same time (it is actually put in the “timed later” stack and subsequently integrated) if the max callback age has been exceeded. In practice, you might want to put the number into the holding area (bucket #9), but not call it if it comes out of the holding area too late. Instead, the number is rescheduled for the next day at the same time. This is the qpx <manage/buck9.qpx> that I used. All the necessary files to run the example are here <manage/buck9.zip> and here is the link to the running webCati study buck9.

index.html>
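The decision described above (call the number if it leaves the holding area in time, otherwise move it to the next day at the same time) can be sketched in a few lines of Python. This is only a model of the logic, not the buck9 spec and not Survent's internal behavior; the function name, the return values and the 30-minute maximum age are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def handle_released_callback(scheduled, released, max_age):
    """Decide what to do with a timed callback leaving the holding area.

    Returns ("call", released) if the callback is still within max_age of
    its scheduled time, otherwise ("reschedule", same time on the next day).
    """
    if released - scheduled <= max_age:
        return ("call", released)
    return ("reschedule", scheduled + timedelta(days=1))


# Hypothetical callback scheduled for 14:00 with a 30-minute maximum age.
scheduled = datetime(2024, 1, 10, 14, 0)
max_age = timedelta(minutes=30)

on_time = handle_released_callback(scheduled, datetime(2024, 1, 10, 14, 20), max_age)
too_late = handle_released_callback(scheduled, datetime(2024, 1, 10, 15, 0), max_age)
print(on_time)
print(too_late)
```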

Special interviewer type records

Another approach to managing a multi-language study is to use special interviewer type fone file records. If we flag fone file records in column 22.1 with a number from 1 to 9, those records will be accessed only by interviewers started as “special type” 1 to 9 respectively. In the context of a multi-language job, we can identify each language with an interviewer special identifier. There are some considerations to be made when using this approach, but the rule of thumb is that the share of fone file records flagged with a given special type should roughly match the share of interviewing time allocated to interviewers with that special type. You can refer to this document <manage/specintv.doc> for a full description of using special interviewer types.
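The rule of thumb can be checked with simple arithmetic. The Python sketch below is an illustration only: it assumes fixed-width records held as strings with the special type flag in column 22 (1-based), and the record layout, function name and interviewing hours are all made-up figures.

```python
from collections import Counter


def special_type_shares(records):
    """Share of all records flagged with each special type (column 22, 1-based)."""
    flags = Counter(
        rec[21] for rec in records
        if len(rec) >= 22 and rec[21] in "123456789"
    )
    total = len(records)
    return {flag: count / total for flag, count in flags.items()}


# Hypothetical sample: 6 records for type 1, 3 for type 2, 1 unflagged.
records = [" " * 21 + "1"] * 6 + [" " * 21 + "2"] * 3 + [" " * 22]
record_shares = special_type_shares(records)

# Hypothetical interviewing hours allocated to each special type,
# out of 100 interviewing hours in total.
hours = {"1": 60, "2": 30}
total_hours = 100

for flag in sorted(record_shares):
    time_share = hours[flag] / total_hours
    print(f"type {flag}: {record_shares[flag]:.0%} of records, "
          f"{time_share:.0%} of interviewing time")
```

If the two percentages for a special type drift far apart, either the sample flags or the interviewer allocation may need adjusting.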

Controlling the language in tables

If you want to control the language in Mentor tables, you have to use the

  >language speaking=<language code>

meta command at the top of your tables spec. For example, if we run SCAN on the data generated by this questionnaire <reporting/spec.qpx> and place

  >language speaking=fr

at the top of the SCAN spec <reporting/scan.spx>, we will get this output <reporting/spec.htm>

Similarly, the REFORMAT utility will output all languages unless you specify

>language speaking=<language code>

at the top of your Reformat spec.

For example, if we choose the SPSS fixed width output for the data generated by the same questionnaire <reporting/spec.qpx> as before and put

>language speaking=fr

at the top of the saved Reformat spec <reporting/refspss.spx>, we will get these rfl <reporting/spec.rfl> and sps <reporting/spec.sps> files.