SAND CDBMS Tools Reference Guide
Data Loader (ndlm)

 


 

The Parallel Loader


The SAND CDBMS Data Loader utility can be set to operate in an "advanced" mode when loading data into a database from flat files. When operating in this mode, the utility is called the Parallel Loader.

The Parallel Loader was designed to take advantage of certain hardware and data conditions, so that it performs data loads faster than the regular ndlm. There is no absolute guideline for when to use the Parallel Loader, but a good rule of thumb is to operate in this mode when one or more of the following conditions apply:

When activated, the Parallel Loader prepares for the load by accumulating and sorting the data, assembling SAND CDBMS domains, and then merging the data. The actual data load is accomplished in three phases:

  1. The domain insert / column assignment phase: SAND CDBMS domains are populated with all unique data values in sorted order. A reference to each domain value is then created in the associated column locations where the value is used.
  2. The bulk insert phase: The column data are inserted into the database tables many rows at a time, and relations (tables) are validated with regard to all associated constraints.
  3. The indexing phase: SAND CDBMS creates an internal index for the loaded data.
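The first phase can be pictured with a small conceptual sketch in Python. This is only an illustration of the idea of a sorted domain of unique values with per-row references; it is not SAND CDBMS code, and the function name build_domain and the sample data are hypothetical.

```python
def build_domain(values):
    """Return (domain, references) for one column of raw values.

    domain     -- the unique values of the column, in sorted order
    references -- for each row, the position of its value in the domain
    """
    domain = sorted(set(values))
    # Map each unique value to its position in the sorted domain.
    position = {value: index for index, value in enumerate(domain)}
    references = [position[value] for value in values]
    return domain, references


# Hypothetical column data read from a flat file.
column = ["Paris", "Lyon", "Paris", "Nice", "Lyon"]
domain, references = build_domain(column)
print(domain)      # ['Lyon', 'Nice', 'Paris']
print(references)  # [2, 0, 2, 1, 0]
```

Each distinct value is stored once in the domain, and the column itself holds only references to it, which is why the data must be accumulated and sorted before the load begins.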

The Parallel Loader command syntax is the same as that of the regular ndlm, with additional flags for setting the number of load processing threads (-k) and the number of reading threads (-j). To activate the Parallel Loader operating mode, simply include the -k flag in the ndlm invocation. In addition, include the -b flag with the command, as the Parallel Loader can operate only in batch mode.

There are additional configuration options for the Parallel Loader. Refer to "The NUCLEUS environment variable / nucleus.ini file" in the previous section for further information.


IMPORTANT!

Ensure that the amount of memory required by the Parallel Loader is not greater than the amount of real memory in the system where the load operation is taking place. The following condition must be true:

((buffer-size/1024 * reading-threads * 3) + 550) <= real-memory

where:

  • buffer-size is the buffer size (the -f flag value) in kilobytes
  • reading-threads is the number of reading threads (the -j flag value)
  • real-memory is the amount of real memory (in megabytes) in the system where the Parallel Loader is run.

In addition, under Windows, the amount of memory required by the Parallel Loader operation must not exceed 1.5 GB (1,536 MB):

((buffer-size/1024 * reading-threads * 3) + 550) <= 1536
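The two conditions above can be checked before starting a load. The following Python sketch simply evaluates the formulas as stated; the function names and the sample values (a 65,536 KB buffer, 2,048 MB of real memory) are illustrative assumptions and are not part of ndlm.

```python
WINDOWS_LIMIT_MB = 1536  # 1.5 GB ceiling that applies under Windows


def required_memory_mb(buffer_size_kb, reading_threads):
    """Memory needed by the Parallel Loader, in megabytes.

    buffer_size_kb  -- the -f flag value, in kilobytes
    reading_threads -- the -j flag value
    """
    return (buffer_size_kb / 1024) * reading_threads * 3 + 550


def load_fits(buffer_size_kb, reading_threads, real_memory_mb, windows=False):
    """Return True if the memory conditions are satisfied."""
    required = required_memory_mb(buffer_size_kb, reading_threads)
    if windows and required > WINDOWS_LIMIT_MB:
        return False
    return required <= real_memory_mb


# 64 MB buffer, 4 reading threads: (64 * 4 * 3) + 550 = 1318 MB
print(required_memory_mb(65536, 4))            # 1318.0
print(load_fits(65536, 4, 2048))               # True
# 8 reading threads: (64 * 8 * 3) + 550 = 2086 MB
print(load_fits(65536, 8, 2048))               # False
print(load_fits(65536, 8, 4096, windows=True)) # False
```

If a check fails, reducing the buffer size or the number of reading threads lowers the requirement; the fixed 550 MB term cannot be reduced.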


Note: If the UNIX version of the Parallel Loader exits with a memory error, try increasing the maximum data segment size for the current execution environment using the ulimit command. Consult “Minimum ulimit Settings (UNIX only)” in the Troubleshooting section for further details.
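Before changing the limit, it can help to see what the current data segment limit is. The sketch below uses Python's standard resource module (available on UNIX-like systems only) to read the same limit that ulimit -d reports; the helper names are hypothetical, and the required minimum values are those given in the Troubleshooting section, not shown here.

```python
import resource


def format_limit(value):
    """Render a limit in kilobytes, the unit used by ulimit -d."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KB"  # getrlimit reports RLIMIT_DATA in bytes


def data_segment_limits():
    """Return the (soft, hard) data segment limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = data_segment_limits()
    print(f"data segment size: soft={soft}, hard={hard}")
```

The soft limit is the one in effect for the current execution environment; it can be raised only up to the hard limit without administrative privileges.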

 
