2020-12-16 07:12

CP363 : Transactions

Database Concurrency is defined by the number of users who may access the database simultaneously:

single-user: at most one user can access the database at a time - mostly restricted to PC-based databases
multi-user: more than one user can access the database at the same time - 'real' databases

Transactions

A database transaction is a logical unit of database processing. It consists of a series of database access operations: insertions, deletions, selections, and updates.

A transaction can be part of a user program or an interactive session.

Transactions can be read-only (selection) or updating (insertion, deletion, update).

Transactions have a clearly defined beginning and end. Example: an ATM session. There is an explicit login followed by a series of other transactions (deposit, withdrawal, transfer, balance) followed by an implicit logout. Each transaction must be finished before moving on to the next one.

At the physical level, transactions involve a series of reads and writes to and from memory and disk. (We will worry about the physical layer in only a few special cases.)

ACID properties. Transactions are:

Atomic - all or nothing
Consistency preservation - database state is consistent at beginning and end of transaction
- database fulfills all constraints at both beginning and end of transaction
Isolation - transaction acts as though it is alone, and unaffected by concurrent transactions
- solves problems of dirty reads, lost updates, etc.
Durability - once a transaction commits, its changes to the database are permanent
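
The 'all or nothing' property can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The account table, its CHECK constraint, and the transfer function are made up for this example and are not part of the course material. The second UPDATE fails when the source account would go negative, and the first UPDATE is then rolled back with it.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic transaction (hypothetical helper)."""
    try:
        # 'with conn' commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are applied
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK: neither update survives
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

The failed transfer leaves both balances exactly as the successful transfer left them, which is also the consistency property: the constraint holds before and after every transaction.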

Concurrency Problems

Lost Update

Occurs when two transactions that update the same item are interleaved so that one write overwrites the other. For account balance updates, assume that two transactions ( A and B ) are submitted simultaneously.

A	B
read( N ) N = N - i
	read( N ) N = N + k
write( N ) (this update is lost)
	write( N )
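
The schedule above can be replayed step by step in a single-threaded sketch. The dictionary db stands in for the database, and the starting values (N = 100, i = 30, k = 50) are assumptions chosen for illustration.

```python
# Shared data item N; each transaction computes on its own private copy.
db = {"N": 100}
i, k = 30, 50

a_local = db["N"] - i   # A: read(N); N = N - i
b_local = db["N"] + k   # B: read(N); N = N + k  (B still sees the old value 100)
db["N"] = a_local       # A: write(N), N is now 70
db["N"] = b_local       # B: write(N), N is now 150 and A's update is gone

interleaved = db["N"]
serial = 100 - i + k    # what any serial order (A then B, or B then A) would give
print(interleaved, serial)
```

The interleaved result differs from the serial result by exactly i, the update that was overwritten.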

Dirty Read

A transaction reads an item that another transaction has updated but not yet committed, and the updating transaction then fails and is rolled back.

A	B
read( N ) N = N - i write( N )
	read( N )
failure rollback( N )

The N read by B is 'dirty data'
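
A minimal replay of this schedule, again with an assumed starting value of N = 100 and i = 30. The variable before_image is a hypothetical stand-in for whatever the DBMS keeps in order to undo A's write.

```python
db = {"N": 100}
i = 30

before_image = db["N"]   # value saved so that A's write can be undone
db["N"] = db["N"] - i    # A: read(N); N = N - i; write(N)  (not yet committed)
b_seen = db["N"]         # B: read(N) sees A's uncommitted value
db["N"] = before_image   # A: failure, rollback(N)

print(b_seen, db["N"])
```

B is left holding a value that, after the rollback, never officially existed in the database.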

Incorrect Summary

Occurs when an aggregate function is applied to a number of records while another transaction is updating them.

A	B
	sum = 0 sum = sum + N1 sum = sum + N2
read( N5 ) N5 = N5 - i write( N5 )	...
	sum = sum + N5 sum = sum + N6
read( N6 ) N6 = N6 - i write( N6 )

(assume reads of the appropriate values are done before each summation)
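
The same schedule as a small sketch. For brevity it assumes only four records (N1, N2, N5, N6) with made-up values and i = 5; the records elided by '...' in the table are left out.

```python
accounts = {"N1": 10, "N2": 20, "N5": 50, "N6": 60}
i = 5
total_before = sum(accounts.values())   # correct sum before A runs

total = 0
total += accounts["N1"]   # B: sum = sum + N1
total += accounts["N2"]   # B: sum = sum + N2
accounts["N5"] -= i       # A: read(N5); N5 = N5 - i; write(N5)
total += accounts["N5"]   # B: sum = sum + N5  (sees the new N5)
total += accounts["N6"]   # B: sum = sum + N6  (sees the old N6)
accounts["N6"] -= i       # A: read(N6); N6 = N6 - i; write(N6)

total_after = sum(accounts.values())    # correct sum after A runs
print(total_before, total, total_after)
```

B's total matches neither the state before A nor the state after A, because it mixes one updated value with one that had not yet been updated.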

Unrepeatable Read

A value is changed by another transaction between two reads in the same transaction, so the two reads do not return the same value.

A	B
	read( N ) display N
read( N ) N = N - i write( N )
	read( N ) N = N + j write( N )

B's update is done to the 'wrong' value of N: not the value that B displayed.
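
Replaying the schedule with assumed values (N = 100, i = 30, j = 10) shows the two different reads:

```python
db = {"N": 100}
i, j = 30, 10

first_read = db["N"]         # B: read(N); display N
db["N"] = db["N"] - i        # A: read(N); N = N - i; write(N)
second_read = db["N"]        # B: read(N) again, now a different value
db["N"] = second_read + j    # B: N = N + j; write(N)

print(first_read, second_read, db["N"])
```

A user who saw the displayed value and expected it to grow by j would be surprised by the final value.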

Transaction States

Begin Transaction
Read
Write
End Transaction - reads and writes have ended, must choose one of next two
Commit
Rollback
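
One way to read the list above is as a small state machine. The sketch below is an assumption about how the states connect, based only on the list: reads and writes may repeat in any order, and commit or rollback is allowed only after End Transaction. The names ALLOWED and is_valid are hypothetical.

```python
# Which operation may follow which, under the reading described above.
ALLOWED = {
    "begin": {"read", "write", "end"},
    "read": {"read", "write", "end"},
    "write": {"read", "write", "end"},
    "end": {"commit", "rollback"},
    "commit": set(),
    "rollback": set(),
}

def is_valid(history):
    """Return True if history is a complete, well-formed transaction."""
    if not history or history[0] != "begin":
        return False
    for current, following in zip(history, history[1:]):
        if following not in ALLOWED[current]:
            return False
    # A finished transaction must have chosen commit or rollback.
    return history[-1] in ("commit", "rollback")

print(is_valid(["begin", "read", "write", "end", "commit"]))
print(is_valid(["begin", "write", "commit"]))
```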

System Log

Log records are kept of every transaction, including:
- Transaction ID
- write( ID, old value, new value )
- read( ID, value )
- commit( ID )
- rollback( ID )
- checkpoint (more later) - when DBMS buffers are written to disk
The old and new values in write records allow updates to be undone (UNDO) or reapplied (REDO).
Reads are not necessarily recorded; when they are, they are useful for auditing.
Logs are kept on disk. After a system crash, only transactions whose commit records were written to disk are considered committed. All others are rolled back.
Log entries from different transactions may interleave.
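
A minimal sketch of what such a log might look like in memory. The LogRecord class and its field names are illustrative assumptions, not any particular DBMS's log format. In particular, the item field (which data item a write touched) and the explicit "begin" records are added here so the example is usable.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogRecord:
    kind: str                   # "begin", "write", "read", "commit", "rollback", "checkpoint"
    tid: Optional[str] = None   # transaction ID (None for checkpoint records)
    item: Optional[str] = None  # data item touched (assumed field)
    old: Optional[int] = None   # old value, needed for UNDO
    new: Optional[int] = None   # new value, needed for REDO

# Entries from T1 and T2 interleave in the order they happened.
log = [
    LogRecord("begin", "T1"),
    LogRecord("begin", "T2"),
    LogRecord("write", "T1", "N", 100, 70),
    LogRecord("write", "T2", "M", 5, 9),
    LogRecord("commit", "T1"),
]

# After a crash, only transactions with a commit record count as committed.
committed = {r.tid for r in log if r.kind == "commit"}
started = {r.tid for r in log if r.kind == "begin"}
to_roll_back = started - committed
print(committed, to_roll_back)
```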

Types of Failures

System Failures

database server is halted abnormally
SQL commands are halted abnormally
connections to clients are broken
memory buffers are lost
database files are not damaged with these failures

Media Failures

database files are damaged
may cause system failure

The log can be used to recover from:

system failure - determine which transactions were active when the failure occurred, and undo their uncommitted updates. Redo committed updates that were lost.
media failure - redo committed transactions lost since the most recent backup
aborted transaction - undo updates done by a rolled back transaction

The log is scanned from tail (most recent record) to head to create a list of committed transactions, and then from head to tail to redo the updates of those transactions.
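
The two scans can be sketched as follows. This is a simplified illustration, not a real recovery manager: the log is a list of tuples, write records are assumed to have the form ("write", ID, item, old value, new value), and the backward scan also undoes uncommitted writes as it goes. The function name recover and the sample data are hypothetical.

```python
def recover(db, log):
    """Undo uncommitted writes and redo committed ones on db (a dict)."""
    committed = set()
    # Tail-to-head scan: a commit record is seen before that transaction's
    # writes, so any write whose ID is not yet in 'committed' is undone.
    for rec in reversed(log):
        if rec[0] == "commit":
            committed.add(rec[1])
        elif rec[0] == "write":
            _, tid, item, old, _new = rec
            if tid not in committed:
                db[item] = old   # UNDO
    # Head-to-tail scan: reapply the writes of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _tid, item, _old, new = rec
            db[item] = new       # REDO
    return db

# Disk state after a crash: T1's committed write to N never reached the disk,
# while T2's uncommitted write to M did.
disk = {"N": 100, "M": 9}
log = [
    ("write", "T1", "N", 100, 70),
    ("write", "T2", "M", 5, 9),
    ("commit", "T1"),
]
recovered = recover(disk, log)
print(recovered)
```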

Checkpoints reduce the amount of log data that must be scanned after a failure. After a system failure, the tail-to-head scan goes back only as far as the last checkpoint.

Checkpoint algorithm:

prevent new transactions from starting, and wait for active transactions to finish
copy memory buffers to disk
write a checkpoint record in the log
allow new transactions to start
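
The four steps can be sketched with a toy, single-threaded class. MiniDBMS and all of its attributes are hypothetical names. Because nothing runs concurrently here, 'waiting' for active transactions is replaced by refusing to checkpoint while any are still active.

```python
class MiniDBMS:
    """Toy model: a disk dict, a buffer dict, and a list-based log."""

    def __init__(self):
        self.disk = {}
        self.buffers = {}
        self.log = []
        self.active = set()
        self.accepting = True

    def begin(self, tid):
        if not self.accepting:
            raise RuntimeError("checkpoint in progress")
        self.active.add(tid)
        self.log.append(("begin", tid))

    def write(self, tid, item, new):
        old = self.buffers.get(item, self.disk.get(item))
        self.log.append(("write", tid, item, old, new))
        self.buffers[item] = new          # change lives only in memory for now

    def commit(self, tid):
        self.log.append(("commit", tid))
        self.active.discard(tid)

    def checkpoint(self):
        self.accepting = False            # 1. prevent new transactions
        if self.active:                   #    (a real system would wait here)
            self.accepting = True
            raise RuntimeError("active transactions must finish first")
        self.disk.update(self.buffers)    # 2. copy memory buffers to disk
        self.log.append(("checkpoint",))  # 3. write a checkpoint record
        self.accepting = True             # 4. allow new transactions

dbms = MiniDBMS()
dbms.begin("T1")
dbms.write("T1", "N", 70)
dbms.commit("T1")
dbms.checkpoint()
print(dbms.disk, dbms.log[-1])
```

Since no transaction is active at the checkpoint and all buffers are on disk, nothing before the checkpoint record needs to be undone or redone after a system failure.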

Concurrency

For performance reasons, a database processes several transactions simultaneously. However, transactions appear to be processed serially.