2011-09-20

fast_start_parallel_rollback

DATABASE HANG DUE TO PARALLEL TRANSACTION RECOVERY [ID 464246.1]

Symptoms

Database is hanging. Undo tablespace is growing.
The hang encountered because of parallel transaction recovery as the systemstate dump shows the significant waits for "Wait for a undo record" and "Wait for stopper event to be increased".

Cause

The systemstate dump shows the following waitevents:
SO: 70000008c9de498, type: 4, owner: 70000008c626cc0, flag: INIT/-/-/0x00
(session) sid: 1099 trans: 0, creator: 70000008c626cc0, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0008-00000003, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
waiting for 'wait for stopper event to be increased' blocking sess=0x0 seq=82 wait_time=0
seconds since wait started=156
=0, =0, =0
Dumping Session Wait History
for 'wait for stopper event to be increased' count=1 wait_time=97720 =0, =0, =0
for 'wait for stopper event to be increased' count=1 wait_time=97681 =0, =0, =0
for 'wait for stopper event to be increased' count=1 wait_time=97676 =0, =0, =0
for 'wait for stopper event to be increased' count=1 wait_time=97676 ...
SO: 70000008c9d6a58, type: 4, owner: 70000008c628460, flag: INIT/-/-/0x00
(session) sid: 1087 trans: 0, creator: 70000008c628460, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-000E-00000004, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
O/S info: user: oracle, term: UNKNOWN, ospid: 2998446, machine: ct-db3
program: oracle@ct-db3 (P000)
waiting for 'wait for a undo record' blocking sess=0x0 seq=69 wait_time=0 seconds since wait
started=243 =0, =0, =0
Dumping Session Wait History
for 'wait for a undo record' count=1 wait_time=97677 =0, =0, =0
for 'wait for a undo record' count=1 wait_time=97675 =0, =0, =0
for 'wait for a undo record' count=1 wait_time=97673 =0, =0, =0
for 'wait for a undo record' count=1 wait_time=97682 =0, =0, =0
for 'wait for a undo record' count=1 wait_time=97679 ...
The above 2 wait events are mainly meant for parallel transaction recovery.

Solution

Solution
======
To disable the parallel rollback by setting the following parameter
fast_start_parallel_rollback = false
Explanation
========
Sometimes Parallel Rollback of Large Transaction may become very slow. After killing a large running transaction (either by killing the shadow process or aborting the database) then database seems to hang, or SMON and parallel query servers taking all the available CPU.

In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.

Fast start parallel rollback is mainly useful when a system has transactions that run a long time before a commit, especially parallel Inserts, Updates, Deletes operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes.

There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the PQ slaves are interfering with each other. It looks like the changes made by this transaction cannot be recovered in parallel without causing a performance problem. The parallel rollback slave processes are most likely contending for the same resource, which results in even worse rollback performance compared to a serial rollback.

Niciun comentariu:

Trimiteți un comentariu