SAND CDBMS Administration Guide - Concurrency Control

SAND CDBMS Administration Guide
SAND CDBMS Concurrency Control

An Optimistic Approach to Concurrency

To maintain data integrity, SAND CDBMS employs a special non-locking mechanism for detecting and resolving concurrent database access conflicts. This method of concurrency control differs from other RDBMS products in its "optimistic" approach, which assumes that the possibility of two or more processes trying to update the same block of data is highly unlikely -- an assumption that is in fact valid for the high-read, low-update database environments that are typical of decision support systems.

This does not mean, however, that SAND CDBMS cannot handle concurrency conflicts when they arise. Instead of employing complex and expensive locking schemes to protect users from situations that very rarely occur, SAND CDBMS handles these conflicts by alerting the application to the situation (with an error message) and then allowing the user to decide how to proceed.

Generation-Based Concurrency Control

A database transaction is basically a set of SQL statements ending with a COMMIT [WORK] or ROLLBACK [WORK] command. SAND CDBMS tracks each transaction, recording information about which database structures have been accessed, as well as the nature (read or write) of the access. Changes made within a given transaction are actually stored in a workspace that is "private" to that transaction, called a generation log; one such log is kept in memory for each concurrent user (or thread) connected to the database — that is, for each database session. The pages that make up the generation log are swapped out to the system as memory requirements dictate.
SAND CDBMS' method of concurrency control is generation-based. When a COMMIT command is executed to complete a transaction, the private record (or "generation") of database accesses for that transaction is compared to the public generation maintained automatically by SAND CDBMS: this public generation represents a consistent version of the database, and includes any changes that have been successfully made to the database by other users since the transaction was initiated. If no conflict is detected, the changes contained in the private generation are merged with the public generation. If a transaction conflict is detected, the COMMIT command will fail, an error message will be issued, and the transaction will be automatically rolled back (that is, the transaction's private workspace will be removed from the system). The automatic ROLLBACK is executed for the user who receives the error; in other words, the transaction that caused the error is the one that gets rolled back.
If the COMMIT fails in this way, the entire transaction must be resubmitted. In order to do this, client SQL programs must expand their already existing logic to handle COMMIT failures: they will need to determine whether the transaction still applies to the data in the database (as would be the case if the failure was due to a VALIDATION error, see below). If this is the case, the transaction must re-executed and the COMMIT reissued.
Important Note!
For purposes of generation-based concurrency control, a SAND CDBMS transaction begins not at the previous transaction boundary (ROLLBACK, COMMIT, or connect), but rather when the first non-transaction-boundary SQL statement is issued. This is important to remember, as it determines which public generation will be the basis of a session's private generation.

ROLLBACK and COMMIT

It is important to note that for SAND CDBMS, the execution of a ROLLBACK involves significantly less physical work than a COMMIT. To optimize performance, therefore, it is best to conclude a transaction that does not modify the database (that is, SELECT-only transactions) with a ROLLBACK statement. Transactions that are intended to make changes to the database (for example, INSERT, UPDATE, DELETE, CREATE TABLE and so on) should of course be terminated with the COMMIT WORK command.
Ultimately, it is up to the user to decide whether "read-only" queries should be run against a "snapshot" of the database as it existed when the session was started, or should take into account any changes made by others during the session (as, for example, when the database is frequently updated via incremental loads). If users simply want to work with a snapshot, they can connect to the database and then proceed to issue read-only queries without worrying about concurrency issues. With SAND CDBMS' generation-based concurrency control, there is no need to be concerned about resource locking -- a common problem when working with other RDBMS systems, where certain "commit scopes" can cause table/row/column locking on SELECT statements.
If the user wants to be able to see the changes made by others (that is, those who have committed their work), a ROLLBACK [WORK] command should be issued between SELECT statements. The ROLLBACK command will refresh the user's private generation with the current public generation. The option of issuing a COMMIT command is available, but will be more costly; in most cases, a SAND CDBMS ROLLBACK command is very fast.

Related Errors

If a transaction fails due to a concurrency conflict, one of the following error codes is returned. This enables applications to detect conflict-related failures and retry transactions automatically:
0B001 Exclusive use not possible while other transactions are active
Returned when two or more users are connected to a database instance, and one of the users attempts to execute an ALTER TABLE...ADD CONSTRAINT command. For concurrency control reasons, SAND CDBMS does not allow constraints to be added to a table while other user sessions are active.
40001 Transaction error - a possible conflict forced a rollback
Returned when a user attempts to COMMIT database changes that conflict with changes committed by one or more other concurrent user connections to the same database instance. The entire transaction fails and must be re-executed.

Examples of Error Conditions

The following are examples of situations where a COMMIT would fail due to the errors listed above.

Unique Constraint Violation
1) USER1 and USER2 connect to the database instance.
2) USER1 issues the following SQL command to insert a new row into the employee table, which has a PRIMARY KEY constraint defined on column SSN:

INSERT INTO employee(ssn, f_name, l_name, city, dept) VALUES('222-56-4736', 'JANE', 'DOE', 'Chicago', 'D447');

COMMIT;

3) USER2 also attempts to insert the same row (for whatever reason). The INSERT statement will succeed, since it is USER2�s Generation Log that is being updated and not the actual database.
4) When USER2 issues the COMMIT command, however, an error occurs:
Error state 40001. Transaction error - a possible conflict forced a rollback
since the key field (SSN) is being duplicated.
5) A ROLLBACK is then automatically issued for USER2's session, which means USER2�s view of the database is refreshed and now includes the new row inserted by USER1.
6) It is now up to the logic of the application to determine the next step: retry the INSERT (in this case, the INSERT will fail with a SAND CDBMS Error 23500 Unique constraint violation - duplicate value), ignore the error and move on, or perform a different action based on the error.

Null Constraint Violation
A DBA user creates a state look-up table with two columns:
CREATE TABLE state_lookup (st_abbr CHAR(2), st_name VARCHAR(20));
The intention is to insert the appropriate information for all US states (NJ New Jersey, FL Florida, NY New York, and so on).
A business decision is subsequently made to add a REGION column; each state must be in a region. It is also decided to add US territories and possessions (for example, Puerto Rico, US Virgin Islands, and Guam) to the state_lookup table.

1) USER1 connects in order to start inserting the territories and possessions into the database.
2) Meanwhile, the DBA connects and alters the state_lookup table to add the REGION column:

ALTER TABLE state_lookup ADD COLUMN region VARCHAR(20) NOT NULL;

COMMIT WORK;

3) USER1 then issues the INSERT statements to add the territories and possessions:
INSERT INTO state_lookup(st_abbr, st_name) VALUES('PR', 'Puerto Rico');
INSERT INTO state_lookup(st_abbr, st_name) VALUES('VI', 'US Virgin Islands');
INSERT INTO state_lookup(st_abbr, st_name) VALUES('GU', 'Guam');
�
�
�
4) USER1 issues a COMMIT WORK;
5) Since NULL values will be inserted in the REGION column for the new rows, the COMMIT WORK will fail, and the 40001 transaction error message will be returned.

Column values changed by another user session
This error occurs when two applications attempt to update the same row at the same time.
Assume that in the employee table described above (for the UNIQUE constraint violation example), a record for "Jane Doe" already exists:

INSERT INTO employee(ssn f_name, l_name, city, dept) VALUES('222-56-4736', 'JANE', 'DOE', 'Chicago', 'D442');

COMMIT WORK;
1) USER1 and USER2 connect to the database instance.
2) USER1 issues this command:

UPDATE employee SET dept='D555';

COMMIT WORK;

3) USER2 then issues the following command:

UPDATE employee SET l_name='JONES';

COMMIT WORK;

4) After executing the COMMIT WORK, USER2 will receive the 40001 transaction error. This is because the database has changed as a result of USER1�s COMMIT WORK, but the change was not reflected in USER2�s generation log.
USER2�s COMMIT WORK causes their generation log to be merged with the database. SAND CDBMS determines that a change has been made to the row being merged, but also that USER2 is not aware of what that change is or how it may effect what they are trying to accomplish. Therefore, SAND CDBMS returns error 40001 to tell USER2 (or the application) that the data has changed and that USER2�s update will be rolled back.
5) At this point, a decision needs to be made: should USER2 reissue the update statement? Select the row to see what changed? Just move on and accept the error? Take some other action?

Attempting to add a constraint

This is a special case where a transaction error is returned even if no COMMIT is issued. This occurs if a user attempts to add a constraint to a table while there are other user sessions active.

Assume that the database instance contains the state_lookup table from the null constraint violation example above.

1) USER1 and USER2 connect to the database.

2) USER1 then issues the following command:

ALTER TABLE state_lookup ADD UNIQUE (st_abbr);

3) After executing the command, USER1 will receive the following error:

Error state 0B001. Exclusive use not possible while other transactions are active

This error is returned because USER2 is connected to the same database instance. Even if the state_lookup table is not being accessed in USER2's transaction, error 0B001 is returned in USER1's transaction simply because USER1 does not have exclusive use of the database instance. The only way for USER1 to add the constraint to the state_lookup table is to wait for USER2 to disconnect their session.

Further Example Scenarios

These scenarios involve two users, USER1 and USER2. In the diagrams, time proceeds from the top downwards; transaction boundaries are in boldface. Note that initial transactions are preceded by a ROLLBACK WORK command which ends the previous transaction, and refreshes the user's private version of the database with the current public generation (CPG).
Scenario 1:
connect: USER1
                                         connect: USER2
ROLLBACK WORK;
CREATE TABLE unique_tst(keycol INT,
nonkey INT, PRIMARY KEY (keycol));
COMMIT WORK;
                                         ROLLBACK WORK;
                                         INSERT INTO
                                         unique_tst
                                         (keycol) VALUES
                                         (123);
                                         COMMIT WORK;

SET TRANSACTION READ WRITE; SELECT * FROM unique_tst;
1 row selected
keycol   nonkey
------- -------
  123
INSERT INTO unique_tst (keycol) VALUES (123);
Error state 23500. Unique constraint violation - duplicate value
COMMIT WORK;

Explanation
1) USER1 and USER2 connect.
2) USER1 ends a transaction with ROLLBACK WORK, then creates a table called unique_tst, and commits the transaction successfully. It is now part of the current public generation CPG1.
3) Then USER2 ends a transaction with a ROLLBACK WORK, so the first subsequent statement (INSERT) starts a private generation based upon CPG1. The insert succeeds in putting a row into unique_tst with the value 123 in the column keycol and a null value in the nonkey column. The COMMIT WORK successfully ends transaction 2 for USER2 and the new record shows up in the new current public generation CPG2.
4) USER1 starts a new transaction with the SET TRANSACTION READ WRITE statement. USER1 then executes the first statement of their third transaction (a SELECT). Since this is the "beginning" of the transaction, the private generation is based upon CPG2. Therefore, USER1 sees the record inserted previously by USER2.
5) The next statement (INSERT) fails with a "unique constraint violation" error — the correct result, as USER1 sees what is in CPG2, a record with 123 in the keycol column.

Scenario 2 (a variation on scenario 1):
connect: USER1
                                      connect: USER2
ROLLBACK WORK;
CREATE TABLE unique_tst(keycol INT,
nonkey INT, PRIMARY KEY (keycol));
COMMIT WORK;
SELECT * FROM unique_tst;
0 rows selected
                                     ROLLBACK WORK;
                                    INSERT INTO unique_tst
                                    (keycol) VALUES (123);
                                     COMMIT WORK;
SELECT * FROM unique_tst;
0 rows selected
INSERT INTO unique_tst (keycol) VALUES (123);
COMMIT WORK;
Error state 40001. Transaction error - a possible conflict forced a rollback
INSERT INTO unique_tst (keycol) VALUES (123);
Error state 23500. Unique constraint violation - duplicate value

Explanation
1) USER1 and USER2 connect.
2) USER1 ends a transaction with ROLLBACK WORK, then creates a table called unique_tst and commits the transaction successfully. It is now part of the current public generation CPG1.
3) USER1 issues a select against unique_tst. The result is 0 rows selected. This also has the effect of basing the current transaction (the third transaction for USER1) on CPG1.
4) USER2 ends a transaction with a ROLLBACK WORK. USER2's first statement (INSERT), starts a private generation based upon CPG1. The insert succeeds in putting a row into unique_tst with the value 123 in the column keycol and a null value in the nonkey column. The COMMIT WORK successfully ends transaction 2 for USER2 and so the new record shows up in the new current public generation, CPG2.
5) USER1 executes the second statement of a transaction, another select returning 0 rows. (Recall that the private generation for USER is based upon CPG1 and not CPG2.)
6) The next statement (INSERT) succeeds, unlike in scenario 1. The reason for this is that the current transaction is based on CPG1, which shows no records in the unique_tst table. The COMMIT WORK, however, fails with the accurate transaction error message. (SAND CDBMS has to deal with the ambiguity of two reasons for transaction failure here: the failure of the prior SELECTs, and the failure of the INSERT. The read failure occurs first and therefore is reported.)
7) Due to the COMMIT WORK that was just executed, the subsequent transaction for USER1 works from CPG2. The retry of the INSERT fails this time, as it should, with the message "unique constraint violation."

Note that if the SELECT statements had been executed against some other table that had not changed, the transaction would have failed due to the merging of the INSERT with the previous INSERT. The message would have been "unique constraint violation..."

Scenario 3 (a variation on scenario 2):
connect: USER1
                                      connect: USER2
ROLLBACK WORK;
CREATE TABLE unique_tst(keycol INT,
nonkey INT, PRIMARY KEY (keycol));
COMMIT WORK;
INSERT INTO unique_tst(keycol,nonkey)
VALUES (456,456);
                                     ROLLBACK WORK;
                                     INSERT INTO unique_tst
                                     (keycol) VALUES(123);
                                     COMMIT WORK;
INSERT INTO unique_tst VALUES (123, 123);
COMMIT WORK;
Error state 40001. Transaction error - a possible conflict forced a rollback
INSERT INTO unique_tst VALUES (123, 123);
Error state 23500. Unique constraint violation - duplicate value
In this situation (unlike scenario 2), there are no SELECT statements in transaction 3 for USER1. In place of the first SELECT, however, there is an INSERT.
Both INSERTS for USER1 succeed within the context of the private generation, but the COMMIT WORK fails with a "transaction error " message. When USER1 commits this transaction, the two INSERTs must be merged together with the previous insert from USER2. The first INSERT would actually have succeeded; however, the second INSERT merged against the INSERT from USER2 causes the transaction to fail with the message "unique constraint violation".

SAND CDBMS Administration Guide SAND CDBMS Concurrency Control

SAND CDBMS Administration Guide
SAND CDBMS Concurrency Control