Previous
Topic:
Data Loader Overview |
Next
Topic: SAND Compacted Table (SCT) Functionality |
The SAND CDBMS Data Loader utility can be set to operate in an "advanced" mode when loading data into a database from flat files. When operating in this mode, the utility is called the Parallel Loader. The Parallel Loader was designed to take advantage of certain hardware and data conditions, so that it performs data loads faster than the regular ndlm. There is no absolute guideline for when to use the Parallel Loader, but a good rule of thumb is to operate in this mode when one or more of the following conditions apply:
- Data considerations - If the columns tend to be very wide, are variable length, or there is not much repetitious data (that is, most values are unique), the Parallel Loader will perform more efficiently than the regular loader. Under these conditions, the regular loader often reaches the limits of real memory, and the subsequent high paging demands result in "memory thrashing", which stalls the load operation. The Parallel Loader is designed to reduce the real memory requirement; consequently, memory thrashing is rarely a factor in Parallel Loader operations.
- Multiprocessor system - The Parallel Loader is more efficient when it runs on a system with multiple processors. If the system has low processing power, use the regular loader instead.
- Large flat file or database - If the flat file is greater than 10 GB in size, or the database is larger than 100 GB, the Parallel Loader will outperform the regular loader.
- Batch windows constraint - For end users who are operating the loader under a "batch windows" constraint, where only a limited amount of time is allocated for incremental loads, the Parallel Loader is preferable, as it can process more data in a short period than the regular loader.
When activated, the Parallel Loader prepares for the load by accumulating and sorting the data, assembling SAND CDBMS domains, and then merging the data. The actual data load is accomplished in three phases:
- The domain insert / column assignment phase: SAND CDBMS domains are populated with all unique data values in sorted order. A reference to each domain value is then created in the associated column locations where the value is used.
- The bulk insert phase: The column data are inserted into the database tables many rows at a time, and relations (tables) are validated with regard to all associated constraints.
- The indexing phase: SAND CDBMS creates an internal index for the loaded data.
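The domain insert / column assignment phase described above resembles dictionary encoding: each distinct value is stored once in a sorted domain, and the column holds references to domain entries. The following is a minimal Python sketch of that idea only. It is not SAND CDBMS's internal implementation, and the function name `build_domain` and the sample data are hypothetical.

```python
def build_domain(values):
    """Return (domain, column) for a list of raw column values.

    domain: the unique values in sorted order (stored once each).
    column: for every input row, the position of its value in the domain.
    Illustrative only; not the actual SAND CDBMS storage format.
    """
    domain = sorted(set(values))
    # Map each unique value to its position in the sorted domain.
    position = {value: i for i, value in enumerate(domain)}
    column = [position[value] for value in values]
    return domain, column


# Hypothetical sample column with repeated values.
cities = ["Montreal", "Boston", "Montreal", "Austin", "Boston"]
domain, column = build_domain(cities)

# The original rows can be recovered by following the references.
decoded = [domain[i] for i in column]
```

In this sketch, a column with many repeated values yields a small domain and a list of compact references, while a column of mostly unique values yields a domain nearly as large as the input.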
The Parallel Loader command syntax is the same as that of the regular ndlm, with additional flags for setting the number of load processing threads (-k) and the number of reading threads (-j). To activate the Parallel Loader operating mode, simply include the -k flag in the ndlm invocation. In addition, include the -b flag with the command, as the Parallel Loader can operate only in batch mode.
There are additional configuration options for the Parallel Loader. Refer to "The NUCLEUS environment variable / nucleus.ini file" in the previous section for further information.
IMPORTANT!
Ensure that the amount of memory required by the Parallel Loader is not greater than the amount of real memory in the system where the load operation is taking place. The following condition must be true:
((buffer-size/1024 * reading-threads * 3) + 550) <= real-memory
where:
- buffer-size is the buffer size (the -f flag value) in kilobytes
- reading-threads is the number of reading threads (the -j flag value)
- real-memory is the amount of real memory (in megabytes) in the system where the Parallel Loader is run.
In addition, under Windows, the amount of memory required by the Parallel Loader operation must not exceed 1.5 GB (1,536 MB):
((buffer-size/1024 * reading-threads * 3) + 550) <= 1536
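The memory conditions can be checked before starting a load. The sketch below evaluates the formula exactly as written, with the buffer size (-f) in kilobytes and the result in megabytes. The function names and the sample values (a 102,400 KB buffer, 2 or 4 reading threads) are assumptions chosen for illustration, not recommended settings.

```python
WINDOWS_LIMIT_MB = 1536  # 1.5 GB ceiling stated for Windows


def required_memory_mb(buffer_size_kb, reading_threads):
    """Memory needed by the Parallel Loader, in MB, per the documented formula."""
    return buffer_size_kb / 1024 * reading_threads * 3 + 550


def load_fits(buffer_size_kb, reading_threads, real_memory_mb, windows=False):
    """Return True if the documented memory conditions hold."""
    required = required_memory_mb(buffer_size_kb, reading_threads)
    if required > real_memory_mb:
        return False
    if windows and required > WINDOWS_LIMIT_MB:
        return False
    return True


# Worked example: 102,400 KB buffer (100 MB) with 4 reading threads.
# 100 * 4 * 3 + 550 = 1750 MB
needed_4_threads = required_memory_mb(102400, 4)

# Same buffer with 2 reading threads: 100 * 2 * 3 + 550 = 1150 MB
needed_2_threads = required_memory_mb(102400, 2)
```

With these sample values, four reading threads need 1,750 MB, which fits in a system with 4,096 MB of real memory but exceeds the Windows ceiling. Dropping to two reading threads lowers the requirement to 1,150 MB.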
Note: If the UNIX version of the Parallel Loader exits with a memory error, try increasing the maximum data segment size for the current execution environment using the ulimit command. Consult Minimum ulimit Settings (UNIX only) in the Troubleshooting section for further details.