Octopus Administrator's Guide
Octopus and Partitioned Tables (MPP)

 


Octopus can be used with SAND partitioned tables as part of a Massively Parallel Processing (MPP) framework. A partitioned table is a special type of view that points to multiple partitions of data across different nodes in the network. A table is effectively split into smaller pieces on remote databases to take advantage of parallel processing.

For information about creating and using partitioned tables, and the related dimension tables and distributed domains, refer to the Massively Parallel Processing (MPP) chapter in the SAND CDBMS Administration Guide.

Several models can be adopted for combining Octopus with data partitioning. The two basic ones are the standard Octopus MPP model and the Octopus ODDB MPP model.

 

Standard Octopus MPP

In this setup, the head node and each partition node run their own Octopus database instance (which might itself make use of multiple nodes). Whenever a user queries a partitioned table in the head database, the server transparently redirects the query to all of the associated partition nodes, where the queries are handled in parallel according to the respective Octopus configurations.

Because the head node and every partition node each run an independent Octopus instance, all of the instances must be started and running at the same time whenever the MPP system is in use. Enabling the AutoStartup option is one way to ensure that the Octopus instances start automatically with the server machines.
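
One rough way to confirm that every instance is reachable before querying the head node is a TCP connection test. The following Python sketch assumes that each Octopus instance accepts TCP connections on the Port given in its configuration. The helper names is_listening and unreachable are hypothetical, and the host/port pairs are taken from the standard Octopus MPP configuration example in this chapter. A successful connection shows only that something is listening on the port, not that the database itself is healthy.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Host/port pairs copied from the example nucleus.ini fragments (assumed values).
NODES = {
    "mpp_head": ("Alpha", 1234),
    "mpp_oddb1": ("Beta", 1235),
    "mpp_oddb2": ("Gamma", 1236),
    "mpp_oddb3": ("Delta", 1237),
}


def unreachable(nodes, timeout=2.0):
    """Return the sorted names of nodes that did not accept a connection."""
    return sorted(
        name
        for name, (host, port) in nodes.items()
        if not is_listening(host, port, timeout)
    )
```

Calling unreachable(NODES) returns the names of any nodes that could not be reached; an empty list suggests that all four instances are at least accepting connections.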

 

Octopus ODDB MPP

In the ODDB version of Octopus/MPP, a single Octopus manages the head database, and all of the partition databases are run as ODDB instances. Unlike the standard Octopus MPP setup, which is decentralized across nodes, most of the configuration settings in the Octopus ODDB MPP setup are consolidated in one nucleus.ini file on the head node.

One thing to note about ODDB with MPP is that Octopus acts as a client to itself when starting partition nodes as ODDB instances. When a user queries a partitioned table, the query is sent to each of the remote tables that make up the partitioned table. If any of those partition nodes is not currently running, Octopus starts it as an ODDB instance by calling itself with the database name as a parameter.
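
The start-on-demand behavior described above can be pictured with a small Python sketch. This is only an illustration of the control flow, not the actual Octopus implementation: the function ensure_partitions_running and its is_running and start_oddb callbacks are hypothetical stand-ins for whatever Octopus does internally.

```python
def ensure_partitions_running(partitions, is_running, start_oddb):
    """Start every partition database that is not yet running.

    partitions  -- database names that make up the partitioned table
    is_running  -- hypothetical callback: name -> bool
    start_oddb  -- hypothetical callback that starts one ODDB instance by name
    Returns the names that had to be started, in order.
    """
    started = []
    for database_name in partitions:
        if not is_running(database_name):
            # Octopus "calls itself" with the database name as a parameter.
            start_oddb(database_name)
            started.append(database_name)
    return started
```

For example, if only mpp_oddb1 is already running when a query arrives, the sketch would start mpp_oddb2 and mpp_oddb3 and leave mpp_oddb1 untouched.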

 

Examples

Configuration Example (Standard Octopus MPP)

In the following setup, there is one main (head) Octopus machine, and several remote machines where Octopus is running independently. The nucleus.ini file on the head node contains the following configuration fragment:

[OCTOPUS Octohead]
DatabaseName=mpp_head
Connection=mpp_head
Port=1234
DatabasePath=/store/db
...

[CONNECTION mpp_head]
Port=1234
Host=Alpha

[DATABASE mpp_head]
DatabasePath=/store/db

On the "Alpha" machine, Octopus runs an instance of database "mpp_head", which acts as the head node in the MPP system.

The Alpha machine nucleus.ini file also contains CONNECTION sections for each of the remote nodes that will be used for table partitioning:

# Remote Node 1
[CONNECTION mpp_oddb1]
Port=1235
Host=Beta

# Remote Node 2
[CONNECTION mpp_oddb2]
Port=1236
Host=Gamma

# Remote Node 3
[CONNECTION mpp_oddb3]
Port=1237
Host=Delta

On the "Beta" machine, the nucleus.ini file contains settings for its own Octopus instance:

[OCTOPUS Octo1]
DatabaseName=mpp_oddb1
Connection=mpp_oddb1
Port=1235
DatabasePath=/store/db
...

[CONNECTION mpp_oddb1]
Port=1235
Host=Beta

[DATABASE mpp_oddb1]
DatabasePath=/store/db

In this case, the "mpp_oddb1" database will have a table that acts as a partition for a partitioned table in the "mpp_head" database (head node).

Similarly, the "Gamma" and "Delta" machines both have their own nucleus.ini files, and will run Octopus database instances that contain table partitions.

Gamma machine (remote node #2) nucleus.ini settings:

[OCTOPUS Octo2]
DatabaseName=mpp_oddb2
Connection=mpp_oddb2
Port=1236
DatabasePath=/store/db
...

[CONNECTION mpp_oddb2]
Port=1236
Host=Gamma

[DATABASE mpp_oddb2]
DatabasePath=/store/db

Delta machine (remote node #3) nucleus.ini settings:

[OCTOPUS Octo3]
DatabaseName=mpp_oddb3
Connection=mpp_oddb3
Port=1237
DatabasePath=/store/db
...

[CONNECTION mpp_oddb3]
Port=1237
Host=Delta

[DATABASE mpp_oddb3]
DatabasePath=/store/db

In this standard Octopus MPP system, each node's Octopus instance must be started and running concurrently. When the MPP system is fully operational, linked tables for remote partitions will be created in the head node database, using the CONNECTION sections that point to the remote nodes. These linked tables can then be included in the definition of a partitioned table, effectively distributing the table's data among the remote Octopus nodes. A query executed against the partitioned table on the head node will be redirected to each of the remote Octopus instances for parallel processing, and the results will be collated and returned to the client.
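
Because each node keeps its own nucleus.ini file in this model, it can help to check every file for internal consistency. The following Python sketch reads a fragment with the standard configparser module and reports OCTOPUS sections whose Connection or DatabaseName does not resolve to a matching CONNECTION or DATABASE section. The function name check_octopus_sections is hypothetical, and the rule that Port and DatabasePath must agree between sections is an assumption drawn from the example values above, not a documented requirement. The "..." placeholder lines are left out because configparser would not accept them.

```python
import configparser

# Head-node fragment copied from the example above (without the "..." line).
HEAD_INI = """
[OCTOPUS Octohead]
DatabaseName=mpp_head
Connection=mpp_head
Port=1234
DatabasePath=/store/db

[CONNECTION mpp_head]
Port=1234
Host=Alpha

[DATABASE mpp_head]
DatabasePath=/store/db
"""


def check_octopus_sections(ini_text):
    """Return a list of human-readable problems found in one nucleus.ini text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    problems = []
    for name in cfg.sections():
        # Only [OCTOPUS <name>] sections are checked; agent sections are skipped.
        if not name.startswith("OCTOPUS ") or name == "OCTOPUS AGENT":
            continue
        octo = cfg[name]
        conn_name = "CONNECTION " + octo.get("Connection", "")
        db_name = "DATABASE " + octo.get("DatabaseName", "")
        if conn_name not in cfg:
            problems.append(f"{name}: missing [{conn_name}]")
        elif cfg[conn_name].get("Port") != octo.get("Port"):
            problems.append(f"{name}: Port differs from [{conn_name}]")
        if db_name not in cfg:
            problems.append(f"{name}: missing [{db_name}]")
        elif cfg[db_name].get("DatabasePath") != octo.get("DatabasePath"):
            problems.append(f"{name}: DatabasePath differs from [{db_name}]")
    return problems
```

Running check_octopus_sections on the head fragment returns an empty list. The same function could be applied to the Beta, Gamma, and Delta fragments, since they follow the same pattern.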


Configuration Example (Octopus ODDB MPP)

In an alternative MPP setup, the main Octopus instance is configured to start the remote databases as ODDB instances. This way, in contrast to the standard Octopus MPP setup, the remote nodes do not have to be started independently of the head node. However, the head node nucleus.ini file becomes more complex, as most of the configuration details for the remote nodes are consolidated at the head.

The main OCTOPUS section might look similar to the following:

[OCTOPUS Octopus]
DatabaseName=octohead
Connection=octohead
Port=1233
DatabasePath=/store/db
ODDBDatabases=mpp_oddb1,mpp_oddb2,mpp_oddb3
Classes=Login,ODDB_MPP1,ODDB_MPP2,ODDB_MPP3
OctoRunMode=ODDB
ODDBKeepAlive=true
ODDBKeepAliveTimeOut=3600
ODDBRunMode=ReadOnly
LoginClass=true
Nodes=NODE1,NODE2,NODE3

...

Here, "octohead" is specified as the main (head) database, while the remote ODDB databases are listed as "mpp_oddb1", "mpp_oddb2", and "mpp_oddb3". Note that the ODDBRunMode parameter is set to ReadOnly, which means that the ODDB databases cannot be updated by users. If configured this way, all remote database tables must be loaded with the required partitioned data before ODDB is used. This also means that new dimension tables and distributed domains on the head, or changes to existing dimension tables, will not propagate automatically to the other nodes. Therefore, ReadOnly mode should be used only if the MPP system does not need to be changed further.

The main nucleus.ini file must include the CONNECTION and DATABASE sections for the head database:

[CONNECTION octohead]
Port=1233
Host=Alpha

[DATABASE octohead]
DatabasePath=/store/db

The nucleus.ini file must also include settings for each remote node/ODDB database:

# Node1 (mpp_oddb1)

[DATABASE mpp_oddb1]
DatabasePath=/store/db

[SUBCLASS NODE1 ODDB_MPP1]
ODDBSubClass=true
NumberOfEngines=0
DeltaPath=/store/db
ODDBKeepAlive=true

[NODE NODE1]
DatabasePath=/store/db
Host=Beta
Port=9997
StartOctoEngTimeOut=1


# Node2 (mpp_oddb2)

[DATABASE mpp_oddb2]
DatabasePath=/store/db

[SUBCLASS NODE2 ODDB_MPP2]
ODDBSubClass=true
NumberOfEngines=0
DeltaPath=/store/db
ODDBKeepAlive=true

[NODE NODE2]
DatabasePath=/store/db
Host=Gamma
Port=9998
StartOctoEngTimeOut=1


# Node3 (mpp_oddb3)

[DATABASE mpp_oddb3]
DatabasePath=/store/db

[SUBCLASS NODE3 ODDB_MPP3]
ODDBSubClass=true
NumberOfEngines=0
DeltaPath=/store/db
ODDBKeepAlive=true

[NODE NODE3]
DatabasePath=/store/db
Host=Delta
Port=9999
StartOctoEngTimeOut=1

Lastly, the nucleus.ini file must contain a CONNECTION section for each remote node, which is needed for the connection objects/linked tables that will be created for use with partitioned tables on the head:

[CONNECTION mpp_oddb1]
Port=1235
Host=Beta

[CONNECTION mpp_oddb2]
Port=1236
Host=Gamma

[CONNECTION mpp_oddb3]
Port=1237
Host=Delta

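In addition to the head node nucleus.ini settings described above, each remote machine's own nucleus.ini file must contain an OCTOPUS AGENT section. In this example, each agent's Port matches the Port of the corresponding NODE section on the head node (9997, 9998, and 9999). The three fragments below belong to three separate files, one per host: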

#Host=Beta
[OCTOPUS AGENT]
Port=9997
LogFile=/store/log

#Host=Gamma
[OCTOPUS AGENT]
Port=9998
LogFile=/store/log

#Host=Delta
[OCTOPUS AGENT]
Port=9999
LogFile=/store/log
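
Since the ODDB model concentrates most settings on the head node, a cross-check between the head file and the remote agent ports can catch mismatches early. The following Python sketch is one possible approach under stated assumptions: the function names split_list and check_oddb_mpp are hypothetical, the SUBCLASS section name is assumed to follow the pattern "SUBCLASS <node> <class>" seen in the example, and the requirement that a NODE Port equal the agent Port on the same host is inferred from the example values. The head fragment is trimmed to one remote node for brevity.

```python
import configparser

# Head-node fragment trimmed to a single remote node (values from the example).
HEAD_INI = """
[OCTOPUS Octopus]
DatabaseName=octohead
Connection=octohead
Port=1233
ODDBDatabases=mpp_oddb1
Classes=Login,ODDB_MPP1
Nodes=NODE1

[DATABASE mpp_oddb1]
DatabasePath=/store/db

[SUBCLASS NODE1 ODDB_MPP1]
ODDBSubClass=true

[NODE NODE1]
Host=Beta
Port=9997

[CONNECTION mpp_oddb1]
Port=1235
Host=Beta
"""

# Agent ports as they appear in each remote nucleus.ini, keyed by host name.
AGENT_PORTS = {"Beta": "9997"}


def split_list(value):
    """Split a comma-separated setting into a list of trimmed names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_oddb_mpp(head_text, agent_ports):
    """Return a list of problems found in the head-node configuration."""
    cfg = configparser.ConfigParser()
    cfg.read_string(head_text)
    problems = []
    # Assumes exactly one main [OCTOPUS <name>] section is present.
    octo = next(
        cfg[s] for s in cfg.sections()
        if s.startswith("OCTOPUS ") and s != "OCTOPUS AGENT"
    )
    # Every ODDB database needs both a DATABASE and a CONNECTION section.
    for db in split_list(octo.get("ODDBDatabases", "")):
        for kind in ("DATABASE", "CONNECTION"):
            if f"{kind} {db}" not in cfg:
                problems.append(f"missing [{kind} {db}]")
    subclasses = [s.split() for s in cfg.sections() if s.startswith("SUBCLASS ")]
    for node in split_list(octo.get("Nodes", "")):
        section = f"NODE {node}"
        if section not in cfg:
            problems.append(f"missing [{section}]")
            continue
        host = cfg[section].get("Host")
        port = cfg[section].get("Port")
        if agent_ports.get(host) != port:
            problems.append(f"[{section}] Port {port} does not match agent on {host}")
        if not any(len(parts) == 3 and parts[1] == node for parts in subclasses):
            problems.append(f"no SUBCLASS section for {node}")
    return problems
```

With the values shown, check_oddb_mpp(HEAD_INI, AGENT_PORTS) returns an empty list. Extending HEAD_INI and AGENT_PORTS with the Gamma and Delta entries would cover the full three-node example.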

 
