Data Tools

The Data Tools Utility performs post-processing functions such as copying, sorting and subsetting data files. These are the COPYFILE functions (see Utilities, Copyfile for more information).

 

Data Tools Function | Description
Fix TR File (Raw Copy) | Attempts to recover corrupted data files
Change Data Case Length | Increases or decreases the length of your data case file
Index Directory Utility | Removes or changes the size of the index directory. Often used to find cases quicker for coding mode applications.
Set Starting Case ID and Reorder | Reorders CaseIDs, sets the CaseID to a variable or sets the CaseID to a location
Subset Data File | Subsets the file according to a base, a randomized selection, every ‘Nth’ Case or the first ‘N’ Cases
Sort Data File | Sorts a Survox System file randomly or sequentially based on the key sort variables or locations defined
Modify Text Area | Shifts your text area and can also change the length of your data file

Running Data Tools

When running Data Tools, you can report using Survox data files or ASCII data files. Once you make a selection, Data Tools finds all available files and displays them on the screen. You can report on a single file or on multiple files. Select the file(s) you wish to report on and click the right arrow to copy them to the Selected files box. To continue the Data Tools run, you MUST choose a .db file from the drop-down menu; if you omit it, a JavaScript popup reminder message appears. You can then click the “Get Selected Files” button, which brings you to the Data Tools main menu.

To run the Data Tools Utility, choose the menu item of your choice, select an Output File Type from the drop-down menu and specify an output file name. By default, the output file name is your study code with “_cp” appended; you can change it. After specifying all parameters, click the “Run Copyfile” button to process.

 

data_tools_select_file

 

 

The Data Tools Main Menu displays the Data File Information for the TR file, along with the Select Tasks options and the output format and name.

data_tools

Converting Data to Other Formats

Each time you run a Data Tools function, you also have the option to convert your data to another format by choosing an Output File Type from the drop-down menu.

Output File Types include: 

Output File Type | Description
Survox TR File | A specially formatted Survox file that holds data. In a system file, one record is the length of an entire case, no matter how many columns of data are in the case. In addition, it has a special segment (called the case header) in each record that holds the internal case ID and flags for deleting, cleaning and updating the case. System files have an extension of TR. This is the default output type.
ASCII | Pronounced “askee”; acronym for American Standard Code for Information Interchange. A coding scheme that assigns numeric values to letters, numbers, punctuation marks and special characters. ASCII files are generally transportable across all computer systems. They are sometimes called text files, text-only files or flat files.
ASCII Card | 80 columns of data. Each data case can be divided into 80-column records. By default, Mentor references a data field by its record number, column and width. Multiple records from one respondent are called a CASE.
Binary | (1) The numbering system based on twos (just as the decimal system is based on tens). Numbers are represented using only the digits 0 and 1. (2) IBM360 column binary coding, which is the format for Survox System files. Also see SWAPPED BINARY. (3) When data is stored as multiple punches in each column, also called multi-punched data.
Swapped Binary | A type of file where the first and last bits are swapped. You can recognize this type of file when seeing sixes and sevens where you would expect to see zeros and ones (binary punches 01-9XY are swapped binary punches 6-9YX012354). If you transfer a standard IBM-360 column binary file from the HP3000 to the PC, you should treat it as a swapped binary file.
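The ASCII Card layout described above (80-column records, with a field addressed by record number, column and width) can be illustrated with a short sketch. This is not Survox code: the function names, the space padding of the last card and the 1-based numbering are assumptions made for illustration.

```python
CARD_WIDTH = 80  # one card record holds 80 columns


def split_into_cards(case: str, width: int = CARD_WIDTH) -> list[str]:
    """Split one case into fixed-width card records.

    The last card is padded with spaces (an assumption for this sketch).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    cards = [case[i:i + width] for i in range(0, len(case), width)]
    return [card.ljust(width) for card in cards]


def field(cards: list[str], record: int, column: int, width: int) -> str:
    """Read a field by 1-based record number, 1-based column and width."""
    return cards[record - 1][column - 1:column - 1 + width]
```

For a 120-column case, `split_into_cards` yields two cards, and `field(cards, 2, 1, 3)` reads the first three columns of the second card.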

Saving Parameters for Future Runs

To save your specified parameters to a file, click the “Save Copyfile Spec” button at the bottom of the Data Tools screen. The file is automatically saved as studycode_copyf.spx and can be chosen from a drop-down menu the next time you run the Data Tools utility.
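The two default naming rules in this document (output file name and saved spec file name) are simple enough to show as a sketch. The helper names are hypothetical, and the study code "demo" is only an example value.

```python
def default_output_name(study_code: str) -> str:
    """Default output file name: the study code with "_cp" appended."""
    return f"{study_code}_cp"


def default_spec_name(study_code: str) -> str:
    """Default saved spec file name: studycode_copyf.spx."""
    return f"{study_code}_copyf.spx"
```

With a study code of `demo`, these give `demo_cp` and `demo_copyf.spx`.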

Fix TR File (Raw Copy)

Raw Copy attempts to recover corrupted Survox data files on a case-by-case basis, discarding the cases that are not valid. Possible causes of file corruption include system crashes while the data file is open, bad sectors on hard disks or diskettes, errors in file transfers, or files that have been altered.

To fix your data file, click on the Fix TR File option, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.

data_tools_fixtr

 

Change Data Case Length

Use this option to increase or decrease the length of your data case file. The minimum case length is 200 and the maximum case length is 640000. A JavaScript popup error message will appear if you enter a length outside that range.

To change the data case length, click on the Change Data Case Length option and specify the New Case Length, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
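As a rough illustration of what changing the case length means, the sketch below validates the documented limits (200 to 640000) and resizes a plain-text record. Padding with spaces and truncating from the right are assumptions; the document does not describe how the utility treats the actual TR format.

```python
MIN_CASE_LENGTH = 200
MAX_CASE_LENGTH = 640000


def validate_case_length(new_length: int) -> None:
    """Reject lengths outside the documented 200..640000 range."""
    if not MIN_CASE_LENGTH <= new_length <= MAX_CASE_LENGTH:
        raise ValueError(
            f"case length must be between {MIN_CASE_LENGTH} and {MAX_CASE_LENGTH}"
        )


def resize_case(record: str, new_length: int) -> str:
    """Pad a record with spaces, or truncate it, to the new case length."""
    validate_case_length(new_length)
    return record[:new_length].ljust(new_length)
```

Note that decreasing the length in this sketch drops any data beyond the new length, so check where your data ends before shortening a case.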

data_tools_caselength

 

Index Directory Utility

A CaseID index directory is automatically created for each data file. The index stores a list of the CaseIDs and where they are located in the file. When looking for a particular case, the program can therefore go directly to that case instead of reading through the whole file, which makes access quicker.

The index is most often used by View mode in Survent, by Survent coding applications, and by cleaning utilities, all of which need to find a record quickly.

The default index holds 20,000 entries. The Index Directory Utility allows you to either remove the index directory completely or change its size. Removing the index directory will decrease the file size significantly.
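The idea behind the index directory (a list of CaseIDs and their positions, so a case can be reached without scanning the file) can be sketched as follows. The layout here is an assumption for illustration only: fixed-length ASCII records with the CaseID in the first `id_width` columns. The real TR index layout is not described in this document.

```python
import io


def build_index(stream, record_length: int, id_width: int) -> dict[str, int]:
    """Map each case ID to the byte offset of its record."""
    index = {}
    offset = 0
    while True:
        stream.seek(offset)
        record = stream.read(record_length)
        if len(record) < record_length:
            break  # end of file (or a trailing partial record)
        case_id = record[:id_width].decode("ascii").strip()
        index[case_id] = offset
        offset += record_length
    return index


def read_case(stream, index: dict[str, int], case_id: str, record_length: int) -> bytes:
    """Jump straight to a case using the index instead of scanning."""
    stream.seek(index[case_id])
    return stream.read(record_length)


# Hypothetical three-case file held in memory.
data = b"0001aaaaaa" + b"0002bbbbbb" + b"0003cccccc"
demo_stream = io.BytesIO(data)
demo_index = build_index(demo_stream, record_length=10, id_width=4)
```

Looking up `"0002"` then takes a single seek to offset 10, however many cases precede it.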

Remove Directory:

The only reason to remove the directory is to make the resultant file smaller if you don’t use any of the features above.

To remove the CaseID index directory, click on the Remove Directory option, choose the Output File Type from the drop-down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button. In the example below, the Index Directory is removed by clicking Remove Directory.

Add Directory/Change Directory Size:

When changing the directory size, the minimum directory size is 1000 and the maximum is 500000. A JavaScript popup error message will appear if you enter a size outside that range.

To add a directory or change its size, click on the Add Directory/Change Directory Size option, specify the directory size, choose the Output File Type from the drop-down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button. If you expect more than 10,000 cases you should increase this value.

data_tools_dirsize

Set Starting Case ID and Reorder

After you’ve collected your data, you may need to renumber your CaseIDs. This Data Tools option allows you to:

  • Reorder your case IDs with a new starting number
  • Set your Case ID to a variable
  • Set your Case ID to a location

Reorder CaseID:

A typical reason to reorder your CaseIDs is a wave study. For the second wave, you may want your CaseIDs to start with 2001 to signify that the cases are from the second wave. When reordering, the minimum CaseID is 0 and a CaseID can be at most 32 digits long.

To Reorder your CaseIDs, click on the Reorder CaseID option, specify your starting CaseID, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
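The wave-study example above amounts to renumbering the cases sequentially from a new starting value. A minimal sketch, assuming cases are held as dictionaries with a hypothetical `case_id` key and that numbering is sequential in file order:

```python
MAX_ID_DIGITS = 32  # documented maximum CaseID length


def reorder_case_ids(cases: list[dict], start: int) -> list[dict]:
    """Return copies of the cases renumbered sequentially from `start`."""
    if start < 0:
        raise ValueError("the minimum CaseID is 0")
    last_id = start + max(len(cases) - 1, 0)
    if len(str(last_id)) > MAX_ID_DIGITS:
        raise ValueError("CaseIDs may be at most 32 digits long")
    return [{**case, "case_id": start + i} for i, case in enumerate(cases)]
```

Calling `reorder_case_ids(cases, 2001)` leaves the rest of each case untouched and assigns 2001, 2002, and so on.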

Set CaseID to a variable:

You can set the CaseID to match a variable in your questionnaire. The only requirement is the length of your CaseID must match the length of the variable you are referencing.

To set your CaseID to a variable, click on the Set CaseID to a variable option, specify the variable you want to use as your CaseID, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.

Set CaseID to a location:

You can also set the CaseID to match a location in your questionnaire. The only requirement is the length of your CaseID must match the length of the location you are referencing.

To set your CaseID to a location, click on the Set CaseID to a location option, specify the location you want to use as your CaseID, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
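Setting the CaseID from a location means reading a fixed run of columns out of each case. The sketch below assumes a plain-text record and 1-based column numbers; the function name is hypothetical and this is not how the utility itself is implemented.

```python
def case_id_from_location(record: str, start_col: int, width: int) -> str:
    """Return the text in `width` columns starting at 1-based `start_col`."""
    if start_col < 1 or width < 1:
        raise ValueError("start_col and width must be at least 1")
    end = start_col - 1 + width
    if end > len(record):
        raise ValueError("location extends past the end of the record")
    return record[start_col - 1:end]
```

Here `width` plays the role of the CaseID length, which the text above says must match the length of the location.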

data_tools_caseid

 

Subset Data File

This option allows you to subset your data file using the following criteria:

  • Base Subset
  • Create Randomized Subset
  • Write every ‘Nth’ Case
  • Write the First ‘N’ Cases
  • Write an Exception File

Base Subset:

If using a Base to subset your data file, you need to define a select statement to tell the program which records to subset out of the data file.

To subset using a base, click on the Base Subset option, define the base to use, choose the Output File Type from the drop-down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
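Conceptually, a base is a condition that each record either meets or does not. The sketch below expresses the base as a Python function rather than in the utility's own select-statement syntax, which is not shown in this document; the field names are hypothetical.

```python
from typing import Callable, Iterable


def base_subset(cases: Iterable[dict], base: Callable[[dict], bool]) -> list[dict]:
    """Keep only the cases for which the base condition is true."""
    return [case for case in cases if base(case)]


# Hypothetical data: keep respondents aged 35 or over.
sample_cases = [
    {"case_id": 1, "age": 34},
    {"case_id": 2, "age": 52},
    {"case_id": 3, "age": 41},
]
over_35 = base_subset(sample_cases, lambda case: case["age"] >= 35)
```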

Create Randomized Subset:

When creating a randomized subset, you can obtain a random percent of records or a random number of records. If you request a random percent of records you will be prompted for the percentage of cases to get.  If you request a random number of records, you will be prompted for the exact number of cases to get.

To create a randomized subset, click on the Create Randomized Subset option, specify whether you want a random percent of records or a random number of records, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
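The two randomized modes can be sketched with Python's `random` module. Two assumptions are made here: a percentage is converted to an exact count by rounding, and the selected cases are written in their original file order. The utility's actual selection method is not documented in this section.

```python
import random


def random_number_subset(cases: list, count: int, seed=None) -> list:
    """Pick exactly `count` cases at random, keeping file order."""
    if not 0 <= count <= len(cases):
        raise ValueError("count must be between 0 and the number of cases")
    rng = random.Random(seed)
    picks = sorted(rng.sample(range(len(cases)), count))
    return [cases[i] for i in picks]


def random_percent_subset(cases: list, percent: float, seed=None) -> list:
    """Pick a random percentage of the cases, keeping file order."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    count = round(len(cases) * percent / 100)
    return random_number_subset(cases, count, seed)
```

Passing a `seed` makes a run repeatable, which is convenient when the same subset has to be regenerated.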

Write Every ‘Nth’ Case:

Writing out every ‘Nth’ Case is typically used when creating subsets of sample files and distributing them to other interviewing houses. For example, if you write out every 10th record of a sample file that has 100 records, you will wind up with 10 records in total (which will include records 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100).

To write out every ‘Nth’ Case, click on the Every ‘Nth’ Case option, specify the number of records, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
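The example above (every 10th record of a 100-record sample file) corresponds to a simple stride over the records. A minimal sketch, with a hypothetical function name:

```python
def every_nth(cases: list, n: int) -> list:
    """Return records n, 2n, 3n, ... (counting from 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return cases[n - 1::n]
```

For records numbered 1 to 100, `every_nth(records, 10)` returns records 10, 20, and so on up to 100, matching the description above.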

Write First ‘N’ Cases:

Writing out the First ‘N’ Cases subsets the data file by that many cases. For example, if you have a data file with 100 cases and you choose the first 18 cases, the program will subset out the first 18 cases to another file.

To write out the First ‘N’ Cases, click on the First ‘N’ Cases option, specify the number of records, choose the Output File Type from the drop-down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
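Taking the first ‘N’ cases is the simplest subset. The sketch below uses `itertools.islice` so that it also works when the cases are read one at a time from a file; the function name is hypothetical.

```python
from itertools import islice
from typing import Iterable


def first_n(cases: Iterable, n: int) -> list:
    """Return the first n cases, or all of them if there are fewer than n."""
    if n < 0:
        raise ValueError("n must not be negative")
    return list(islice(cases, n))
```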

Write Exception File:

When subsetting your data, you have the option to write out a second data file for the cases that don’t match your criteria. This is called an exception file.

To write out an exception file, choose your subset option as described above, click on the Write Exception File option, specify the name of your exception file, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.

This option can only be chosen along with another subset option. If you choose this option by itself, you will receive a JavaScript pop-up message reminding you to select an additional subset option.
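An exception file is the complement of the subset: every case lands in exactly one of the two outputs. A minimal sketch, assuming the subset criterion can be written as a function of a case (the names are hypothetical):

```python
from typing import Callable, Iterable


def subset_with_exceptions(
    cases: Iterable, keep: Callable[[object], bool]
) -> tuple[list, list]:
    """Split cases into (matched, exceptions) in a single pass."""
    matched, exceptions = [], []
    for case in cases:
        (matched if keep(case) else exceptions).append(case)
    return matched, exceptions
```

Because each case is appended to one list or the other, the two outputs together always account for every case in the input.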

 

data_tools_subset

Sort Data File

This option sorts a Survox System file. You can perform either of the following:

  • Random Sort
  • Sequential sort based on key sort variables or locations

Random Sort:

A random sort will randomly sort the file. This is useful for scrambling phone numbers in a sample file.

To randomly sort your file, choose the Random Sort option, choose the Output File Type from the drop down menu  and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile”  button.
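A random sort is a shuffle of the records. The sketch below uses `random.Random.shuffle` on a copy so the original order is left untouched; the function name and the optional `seed` are assumptions, not options of the utility.

```python
import random


def random_sort(cases: list, seed=None) -> list:
    """Return a shuffled copy of the cases."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```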

Select Sort Key(s):

You can specify up to three sort keys to sort your file. A sort key is a variable or data location that you provide; the program sorts the file sequentially based on that data. For example, you may have a sample file with a market field. Rather than doing a fully random sort, you may want to randomize records within each market. To do this, specify the market location as a sort key: the program keeps the records for each market together and randomizes the records within each market. This is often used for replicate fields as well.

To specify sort keys, choose the Select Sort Keys option, specify each sort key variable or location as well as whether you want it sorted in ascending or descending order, choose the Output File Type from the drop down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.
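Sorting on up to three keys, each ascending or descending, can be sketched with repeated stable sorts, applied from the least significant key to the most significant. Shuffling first and then sorting on the market key gives the "random within market" result described above. The function and field names are hypothetical, and this is only one way to get that behavior.

```python
import random


def sort_cases(cases, keys, shuffle_first=False, seed=None):
    """Sort cases on up to three (field, descending) keys.

    With shuffle_first=True, records that tie on every key end up in
    random order, because Python's sort is stable.
    """
    if len(keys) > 3:
        raise ValueError("at most three sort keys are supported")
    result = list(cases)
    if shuffle_first:
        random.Random(seed).shuffle(result)
    # Apply stable sorts from the last key to the first.
    for field, descending in reversed(keys):
        result.sort(key=lambda case: case[field], reverse=descending)
    return result


sample = [
    {"market": "B", "age": 30},
    {"market": "A", "age": 25},
    {"market": "A", "age": 40},
    {"market": "B", "age": 20},
]
by_market_then_age = sort_cases(sample, [("market", False), ("age", True)])
```

Here `by_market_then_age` lists market A before market B, with ages descending inside each market.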

data_tools_sort

 

Modify Text Area

This option allows you to shift your text area and, if needed, change the length of your data file. You must know the column where the text area currently starts before you can shift it to a new location. The text start marks the beginning of the text area, where compressed text is stored at the end of the data file. Each text question is stored as a text pointer that takes up 1 column in the text area, regardless of how many characters the question takes up. You may need to shift the text area to make more room for collecting actual data and, as a result, you may also need to increase the total length of your data file. The Modify Text Area option lets you perform both functions in one step. This is a safe way to shift your text area without losing the text pointers for text data already collected.

To shift your text area, choose the Modify Text Area option and specify your New Text Start. If you also want to change your case length, specify a new case length. Then choose the Output File Type from the drop-down menu and specify an Output File Name. The default name of studycode_cp will automatically be used unless specified otherwise. To process, click the “Run Copyfile” button.

data_tools_textarea

Downloading Files Created

  1. Available files for download will appear on the screen
  2. You can click the “All” or “Check All” box to select all the files, the “None” or “Uncheck All” box to deselect all the files, or you can select the specific file hyperlink that you want to download
  3. These files will be automatically compressed and saved as a zip file
  4. Click the “Download” button to download the file

data_tools_save

 
