Isolation

Why do we need concurrent execution?

  • Improved throughput and resource utilization: A transaction consists of many steps. Some involve I/O activity; others involve CPU activity. The CPU and the disks in a computer system can operate in parallel. Therefore, I/O activity can be done in parallel with processing at the CPU. The parallelism of the CPU and the I/O system can therefore be exploited to run multiple transactions in parallel. While a read or write on behalf of one transaction is in progress on one disk, another transaction can be running in the CPU, while another disk may be executing a read or write on behalf of a third transaction. All of this increases the throughput of the system—that is, the number of transactions executed in a given amount of time. Correspondingly, the processor and disk utilization also increase; in other words, the processor and disk spend less time idle, or not performing any useful work.

  • Reduced waiting time: There may be a mix of transactions running on a system, some short and some long. If transactions run serially, a short transaction may have to wait for a preceding long transaction to complete, which can lead to unpredictable delays in running a transaction. If the transactions are operating on different parts of the database, it is better to let them run concurrently, sharing the CPU cycles and disk accesses among them. Concurrent execution reduces the unpredictable delays in running transactions. Moreover, it also reduces the average response time: the average time for a transaction to be completed after it has been submitted.

Concurrency: CPU is switching between transactions while executing them. Higher degree of concurrency =>

  • Pros: more processor and disk utilization, more throughput, less average response time

  • Cons: can lead database to inconsistent state if concurrency control policies are not used

Concurrency Control

If every transaction has the property that it maintains database consistency if executed alone, then serializability ensures that concurrent executions maintain consistency.

The concurrency-control component of the database system ensure that any schedule that is executed will leave the database in a consistent state.

  • A transaction is composed of sequence of instructions.

  • If two transactions run concurrently (i.e. CPU is switching between transactions while executing them), their instructions will get interleaved.

  • The sequence of instructions written in chronological order in which they are executed in the system is called schedule.

  • A schedule will be called a serial schedule if it consists of a sequence of instructions from various transactions, where the instructions belonging to one single transaction appear together in that schedule.

  • If a schedule S, in some sense, is equivalent to some serial schedule Y, than we say that schedule S is a serializable schedule.

How to determine if a schedule is serializable

Equivalence can have multiple meanings. Here we will focus on conflict equivalence.

Consider two instructions I (from transaction T1) and J (from transaction T2)

  • If I and J are operating on different data, their order of execution does not matter.

  • If I and J are both operating on same data but both are read instruction, their order of execution doesn't matter.

  • If I and J are both operating on same data and at least one of them is a 'write' instruction, then their order does matter.

    We say that I and J conflict if they are

    i) operations by different transactions
    ii) on the same data item,
    iii) and at least one of these instructions is a write operation

    otherwise they are nonconflicting instructions (both read instruction or both operating on different data)

If a schedule S can be transformed into a schedule S′ by a series of swaps of nonconflicting instructions, we say that S and S′ are conflict equivalent

A schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

We now present a simple and efficient method for determining conflict serializability of a schedule.

Consider a schedule S. We construct a directed graph G, called a precedence graph, using S. The set of vertices (N1, N2, etc) represents all the transactions (T1, T2, etc) participating in the schedule. The set of edges consists of all edges Ni → Nj for which one of three conditions holds:

  • Some instruction in Ti executes write(Q) before Some instruction in Tj executes read(Q).

  • Some instruction in Ti executes read(Q) before Some instruction in Tj executes write(Q).

  • Some instruction in Ti executes write(Q) before Some instruction in Tj executes write(Q).

    Note: It is not "just" before.

If an edge Ni → Nj exists in the precedence graph, then, in any serial schedule S′ equivalent to S, Ti must appear before Tj.

If the precedence graph for S has a cycle, then schedule S is not conflict serializable. If the graph contains no cycles, then the schedule S is conflict serializable.

A serializability order of the transactions can be obtained using topologically sorting of precedence graph.

Recoverable Schedules

Consider the partial schedule 9, in which T7 is a transaction that performs only one instruction: read(A). We call this a partial schedule because we have not included a commit or abort operation for T6. Notice that T7 commits immediately after executing the read(A) instruction. Thus, T7 commits while T6 is still in the active state. Now suppose that T6 fails before it commits. T7 has read the value of data item A written by T6. Therefore, we say that T7 is dependent on T6. Because of this, we must abort T7 to ensure atomicity. However, T7 has already committed and cannot be aborted. Thus, we have a situation where it is impossible to recover correctly from the failure of T6. Schedule 9 is an example of a nonrecoverable schedule. A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti , the commit operation of Ti appears before the commit operation of Tj . For the example of schedule 9 to be recoverable, T7 would have to delay committing until after T6 commits.

Cascadeless Schedules

Even if a schedule is recoverable, to recover correctly from the failure of a transaction Ti , we may have to roll back several transactions. Such situations occur if transactions have read data written by Ti . As an illustration, consider the partial schedule 10. Transaction T8 writes a value of A that is read by transaction T9. Transaction T9 writes a value of A that is read by transaction T10. Suppose that, at this point, T8 fails. T8 must be rolled back. Since T9 is dependent on T8, T9 must be rolled back. Since T10 is dependent on T9, T10 must be rolled back. This phenomenon, in which a single transaction failure leads to a series of transaction rollbacks, is called cascading rollback.

Cascading rollback is undesirable, since it leads to the undoing of a significant amount of work. It is desirable to restrict the schedules to those where cascading rollbacks cannot occur. Such schedules are called cascadeless schedules. Formally, a cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti , the commit operation of Ti appears before the read operation of Tj . It is easy to verify that every cascadeless schedule is also recoverable.

Last updated