Using ~input to read delimited data

To use the ~input to read in delimited data output from spreadsheet programs such as Excel, you will need to use one or more of the following ~input options:

delimiterdelimit_mapdelimit_variable_name_prefix=delimit_name_firstsave_delimit_map= use_delimit_map=max_delimit_fields=delimit_drop_firstdelimit_quote_alpha

NOTE:Thesave_delimit_map= and use_delimit_map= options cannot be used together.

Delimited input files must be ascii with only one record per case. Only one delimited file at a time may be read, so the files= and join option may notbe used with delimited input.

Minimum requirements

When reading delimited input, you must always use the delimiter= option in order to indicate that the data is delimited, and in order to specify what delimiter is to be used. This is the only required option for reading delimited data.

The syntax of this option is:


NOTE: Blank may not be used as a delimiter.

Din1.spxcontains an example of the minimum required to read a delimited input file.

NOTE: Although the delimiter= option is the only option required to read a delimited input file, this option used alone will probably not be satisfactory. This is because the only way to tell how to reference the data is to have a look at it, to see which columns the fields ended up in. The delimit_map option provides a map of the data as seen by the control array.

The file din2.spx contains an example of a delimit_map.

The delimit_map shows the data location of each of the delimited fields, the field width, contents of the field (alpha vs. numeric), plus various stats for the numeric fields. This information may be used as a guide to accessing the data without creating any variables. Although the delimited map calls the fields Field_1, Field_2, etc., these names may not be used to make data references.

If you need to have Mentor create variables of the appropriate type having sequential names, you can use the delimit_variable_name_prefix= option.

For example,delimit_variable_name_prefix=var would cause the fields to be named var1, var2, etc., if there are less than 10 fields, or var01, var02, etc. if there are less than 100 fields.

These variables may be stored in an open DBfile, or used from the local DBfile if no permanent DBfile is being used. Using delimit_variable_name_prefix= is most appropriate when you only need to use several of the the data fields from the original delimited input.

For example, you need to merge fields 3, 9, and 15 into an existing data file. This option is also useful when you want to do simple operations such a ~freq on a number of fields.

The file din3.spx is an example of this.

When you want to give the fields meaningful variable names, you can use the delimit_name_first option. Using this option causes Mentor to view the first record of the delimited data file as a list of variable names for the fields. This record should be a list of legal Mentor variable names separated by the delimiter= character.

The file din4.spx is an example of using this option.

NOTE: The delimit_name_first option overrides the delimit_variable_name_prefix option.

The delimit_drop_first option may be used to skip the first record of a delimited data file in which the first record is a list of names. You can easily create variables with sequential names and variables with meaningful names from the same data file.

The file din5.spx is an example of doing this.

NOTE: The delimit_name_first option automatically calls the delimit_drop_first option.

The save_delimit_name= option allows you to store the delimited control array into a DBfile for later use. If you have a saved control array, you can use it to read the delimited data on subsequent occasions without Mentor needing and building a new control array. Remember, if the input file has been changed so that a field’s type (text vs. numeric) is no longer what it was when the control array was saved, or if a field’s maximum size has changed, the control array must be remade. If you want to write out a new data file from the delimited input, you do not need to use the control array. Variables created and stored in the DBfile as the delimited data was being read will be accurate for the output data file.

Din6a.spx and din6b.spx show how to reuse the control array.

NOTE: When using use_delimit_name=, ascii= is no longer necessary on the input statement.

Din6c.spx shows how to use a data file output from delimited input.

The default number of fields in the delimited data is 1000. If you need to read in a delimited data file that has more than 1000 fields, you should usethe max_delimit_fields= option to set this limit to an appropriate number.

Delimit_quote_alpha – The default for this option is true. When this option is set to true, any field enclosed in double quotes will be treated as an alpha (string) field, even if only numbers ever appear within the quotes. Some programs always output all of their fields in quotes.

In order to get Mentor to analyze the contents of quoted fields, you must set -delimit_quote_alpha. In the reformatted data, string variables are left justified within the new field, whereas numbers are right justified. A field of left justified numbers will not convert to a “good” number because it has trailing blanks. You can avoid this problem by using -delimit_quote_alpha.

The file din8.spx is an example of using this option.