Oracle DBA Hanganalyze ,system state dump,wait_chains

0
190
eod_performance_oem_snapshot
eod_performance_oem_snapshot

Many times Oracle database get hangs due to locks or latching issue and Nobody can login to the system. It becomes critical and  And Usually we do shutdown abort to immediately resolve the issue. But  it is good to take system state dump/hanganalyze  at 1 min interval to analyze the locking or latching  issue afterwards before doing shutdown abort so that we can find the fix for the problem and avoid that in future.Oracle Support also usually ask for these when the database hangs for some reason

Here are the steps to take system state dump

In RAC system this need to be executed on all the instance

Syntax for system state dump:

Logon to sqlplus as sysdba
SQL> oradebug setmypid
SQL> oradebug unlimit
SQL> oradebug dump systemstate 266
…….. Wait at least 1 min
SQL> oradebug dump systemstate 266
…….. Wait at lease 1 min
SQL> oradebug dump systemstate 266
SQL> oradebug tracefile_name

With releases 11g & RAC and above You should attach to the DIAG process and change its file size limit

The following commands will create the system state dump on each node in the diag trace file.No need to execute on all instances

sqlplus "/ as sysdba"
select SPID from v$process where program like '%DIAG%';
oradebug setospid <OS_PID> -- pid of the diag process
oradebug unlimit
oradebug dump systemstate 266
oradebug tracefile_name
exit

Similarly Hanganalyze is a good utility to analysis.

eod_performance_oem_snapshot
eod_performance_oem_snapshot

Here are the steps

Syntax for hanganalyze for single instance

Logon to sqlplus as sysdba
oradebug setmypid;
oradebug unlimit;
oradebug hanganalyze 3;
oradebug tracefile_name

Syntax for hanganalyze for RAC

Logon to sqlplus as sysdba
oradebug setmypid;
oradebug unlimit;
oradebug setinst all
oradebug -g def hanganalyze 3
oradebug tracefile_name

Sometimes it is not possible to even login with sqlplus “/ as sysdba ” at the time of hang, In that situation we can use below command to enter with 11g onwards

sqlplus -prelim / as sysdba

V$wait_chains

From 11gR2 onwards,oracle has provided a dynamic performance view called v$wait_chains. This also contain same information which we gather by running the hanganalyze command. So instead of using hanganalyze, you can use the below query to find wait chains

This is done using dia0 background processes starts collecting hanganalyze information and stores this in memory in the “hang analysis cache”. It does this every 3 seconds for local hanganalyze information and every 10 seconds for global (RAC) hanganalyze information

There is no gv$ equivalent as v$wait_chains would report on multiple instances in a multi-instance (RAC) environment

There is no gv$ equivalent as v$wait_chains would report on multiple instances in a multi-instance (RAC) environment

SQL> SELECT chain_id, num_waiters, in_wait_secs, osid, blocker_osid, substr(wait_event_text,1,30)
 FROM v$wait_chains; 
SELECT decode( a.blocker_sid , NULL , '<chain id#' ||a.chain_id||'>' ) chain_id,
RPAD( '+' , LEVEL , '-' ) ||a.sid sid,
RPAD( ' ' , LEVEL , ' ' ) ||a.wait_event_text wait_event
FROM V$WAIT_CHAINS a
CONNECT BY PRIOR a.sid=a.blocker_sid
AND PRIOR a.sess_serial#=a.blocker_sess_serial#
AND PRIOR a.instance = a.blocker_instance START WITH a.blocker_is_valid='FALSE'
ORDER BY a.chain_id ,
LEVEL
/

Query for Top 100 wait chain processes

set pages 1000
set lines 120
set heading off
column w_proc format a50 tru
column instance format a20 tru
column inst format a28 tru
column wait_event format a50 tru
column p1 format a16 tru
column p2 format a16 tru
column p3 format a15 tru
column Seconds format a50 tru
column sincelw format a50 tru
column blocker_proc format a50 tru
column fblocker_proc format a50 tru
column waiters format a50 tru
column chain_signature format a100 wra
column blocker_chain format a100 wra

SELECT * 
FROM (SELECT 'Current Process: '||osid W_PROC, 'SID '||i.instance_name INSTANCE, 
 'INST #: '||instance INST,'Blocking Process: '||decode(blocker_osid,null,'',blocker_osid)|| 
 ' from Instance '||blocker_instance BLOCKER_PROC,
 'Number of waiters: '||num_waiters waiters,
 'Final Blocking Process: '||decode(p.spid,null,'',
 p.spid)||' from Instance '||s.final_blocking_instance FBLOCKER_PROC, 
 'Program: '||p.program image,
 'Wait Event: ' ||wait_event_text wait_event, 'P1: '||wc.p1 p1, 'P2: '||wc.p2 p2, 'P3: '||wc.p3 p3,
 'Seconds in Wait: '||in_wait_secs Seconds, 'Seconds Since Last Wait: '||time_since_last_wait_secs sincelw,
 'Wait Chain: '||chain_id ||': '||chain_signature chain_signature,'Blocking Wait Chain: '||decode(blocker_chain_id,null,
 '',blocker_chain_id) blocker_chain
FROM v$wait_chains wc,
 gv$session s,
 gv$session bs,
 gv$instance i,
 gv$process p
WHERE wc.instance = i.instance_number (+)
 AND (wc.instance = s.inst_id (+) and wc.sid = s.sid (+)
 and wc.sess_serial# = s.serial# (+))
 AND (s.final_blocking_instance = bs.inst_id (+) and s.final_blocking_session = bs.sid (+))
 AND (bs.inst_id = p.inst_id (+) and bs.paddr = p.addr (+))
 AND ( num_waiters > 0
 OR ( blocker_osid IS NOT NULL
 AND in_wait_secs > 10 ) )
ORDER BY chain_id,
 num_waiters DESC)
WHERE ROWNUM < 101;