Next
Topic: Advanced Load Options: Analyzing and Manipulating Data with NDL++ Expressions |
Setting Up the Data Load with the Load Specification Script
Advanced Load Options: Verifying
and Filtering Data with the PIC Clause
PIC clauses can be included in the record specification sections of the load specification script to specify verification and filtering operations to be applied to the contents of a field. The PIC clause is included immediately after the field-format declaration, and has the following basic syntax:
PIC "< formatting-string >"
The formatting string contains special characters (described below) that are
used to check for, delete, or add characters to the input field value. It
must be delimited by double quotation marks as illustrated above (single quotes
are used to denote a pattern within the formatting string). The formatting
string has a maximum length of 256 characters.
The PIC clause also offers options for truncating a field to the length of the formatting string, and for specification of an alternate formatting string. For complete information about the PIC clause, consult the SAND CDBMS Tools Reference Guide.
The Formatting String
The formatting string operates in terms of a character's position in a field, reading from left to right. That is, the first character in the formatting string is associated with the leftmost character in the field, and all subsequent characters follow going from left to right. Any spaces present within the formatting string are ignored.
Three types of symbols are used in the formatting string: control symbols, data-type symbols, and standard ASCII characters. Control symbols define the filtering operations to be performed, while data-type symbols and ASCII characters specify the input data to be affected by these operations. Note that these symbols are specific to the PIC clause and are not used in other components of load specification scripts.
Control Symbols
Control symbols specify actions that should be taken on a given field position, on the whole field, or on a script section. They can be included before any of the data-type symbols listed below, and can also be used with any ASCII character (which will then be interpreted as itself). If a pattern of characters enclosed in single-quotes follows the control symbol, ndlm will use that pattern literally in checking, deleting, or inserting characters; if a pattern of data-type symbols enclosed in single-quotes follows the control symbol, ndlm will use that pattern to check the data types in the field.
A control symbol affects the symbol or character that immediately follows it in the formatting string. Any number of control symbol/character pairs may appear consecutively in the formatting string; each pair is associated with one character position in the input field.
\x
(backslash)
Check for the presence of x anywhere in the field, and remove it. If x appears anywhere in the field, it is removed and the record is imported.Note:
So that it will not interfere with other specified operations, this symbol must appear first in the formatting string./x
(slash)
Check for the presence of x at the specified character position. If x is not found at the specified position in the field, the record is not imported.-x
(hyphen)
Check for the presence of x at the specified character position, and remove it. If x is found, it is removed and the record is imported.+x
(plus)
Insert the character x. x may not be single or double quotes. The character will not replace any of the input data, but will be inserted at the specified position, displacing data to the right. If the addition of a character to a field causes the field to exceed the database table column size, the character insertion operation will not be performed.Note:
When all the fields in the input data are declared as fixed length, and adding one or more characters with a PIC clause causes the new field to exceed the width defined in the record specification, this field will be truncated on the right. This will not occur, however, if at least one field is declared as variable length. If necessary, the last field in a record can be declared as a variable-length field that has the end-of-record character(s) (\n in UNIX; \r\n in Windows) as a trailing delimiter (since this character always appears at the end of a record).
Records that do not satisfy the check operation are not imported, but are saved to a file named input-file-name.pct (where input-file-name is the name of the source data file for the import).
Data-Type Symbols
Data-type symbols indicate the classes of ASCII data that can be manipulated using control symbols. Therefore, they must be preceded by a control symbol in a formatting string, unless they appear in a pattern statement enclosed by single quotes. If a pattern of data-type symbols appears between single quotes after a control character, ndlm will look for data conforming to this data-type pattern (for example, ' $#$#$# ' might be used to check fields containing Canadian postal codes).
The following data-type symbols can be used in the formatting string:
# numeric $ alphabetic % alphanumeric (containing both alphabetic and numeric characters) _ (underscore) spaces ? alphabetic and spaces together ^ other symbols
Next
Topic: Advanced Load Options: Analyzing and Manipulating Data with NDL++ Expressions |