Oracle 10g RAC Cluster interconnects

0 comments

Oracle 10g RAC Cluster interconnects

Posted by Arul Ramachandran on January 3, 2008

Yet another post on … say, you’ve just joined a new client, you are in a new RAC environment and you are trying to figure out the different components of the cluster.

This time it is on the Cluster private interconnects.

The private interconnects used in a RAC are typically configured during the Clusterware install and would be registered in OCR. However this default config could thereafter be overrided using cluster_interconnects initialization parameter (not recommended though ) for various reasons.

In order to troubleshoot interconnect related issues, a starting point would be to find out what interconnects were originally configured and/or what interconnects are presently in use by the cluster. Here are a few ways to do this.

1. Using the dynamic view gv$cluster_interconnects:

select * from gv$cluster_interconnects ;

INST_ID NAME IP_ADDRESS IS_ SOURCE
---------- --------------- ---------------- --- -------------------------------
1 eth0 192.168.10.1 NO Oracle Cluster Repository
1 eth1 192.168.11.1 NO Oracle Cluster Repository
2 eth0 192.168.10.2 NO Oracle Cluster Repository
2 eth1 192.168.11.2 NO Oracle Cluster Repository

In the above output, the column SOURCE indicates where the private interconnect info was derived from; this column could be one of OCR, OS dependent software or cluster_interconnects parameter.

2. Using the clusterware command oifcfg:

$oifcfg getif
eth2 10.104.95.0 global public
eth0 192.168.10.0 global cluster_interconnect
eth1 192.168.11.0 global cluster_interconnect

3. Using oradebug ipc:

sqlplus “/ as sysdba”

SQL>oradebug setmypid
Statement processed.
SQL>oradebug ipc
Information written to trace file.

The above command would dump a trace to user_dump_dest. The last few lines of the trace would indicate the IP of the cluster interconnect. Below is a sample output of those lines.

From the trace file on node1:

SSKGXPT 0x5edf558 flags SSKGXPT_READPENDING socket no 9 IP 192.168.11.1 UDP 18852

From the trace file on node2:

SSKGXPT 0x5edf558 flags SSKGXPT_READPENDING socket no 9 IP 192.168.10.2 UDP 38967

Change or switch undo tablespace in Oracle database

0 comments

Change or switch undo tablespace in Oracle database


http://www.oracleflash.com/32/Change-or-switch-undo-tablespace-in-Oracle-database.html

Posted by Zahid Qureshi on July 8, 2010.

This is a mostly asked question on different oracle forums; How do I change the UNDO tablespace for my database and drop the old one? and the answer often is; Its very simple daa. But I don't think its that simple. The complexity comes in if there were pending transactions in your old undo tablespace and you have already switched to the new one and when you try to drop the old one it says "Tablespace is in use".

How to switch the database to a new UNDO tablespace and drop the old one

$ sqlplus / as sysdba

SQL> show parameter undo

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS1
SQL>

The current undo tablespace as suggested by the initialization parameter undo_tablespace is UNDOTBS1. Leave this sysdba as is, open another console, log in as user SCOTT and initiate a transaction.

$ sqlplus scott/tiger

update emp set sal = sal + 1000 where empno=7839;

1 row updated.

With an update on emp table we have initiated a transaction. The undo data is written to a segment in the UNDOTBS1 tablespace. Now leave this SCOTT's session intact and go back to the sysdba console without issuing any COMMIT or ROLLBACK.

CREATE UNDO TABLESPACE undotbs2
DATAFILE '/d01/apps/oradata/oraxpo/undotbs201.dbf'
SIZE 50M AUTOEXTEND ON NEXT 5M;

Tablespace created.

-- We created a new UNDO tablespace named UNDOTBS2

ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS2 SCOPE=BOTH;



System altered.

-- Switch the database to the new UNDO tablespace.

SQL> DROP TABLESPACE undotbs1 INCLUDING CONTENTS AND DATAFILES;
DROP TABLESPACE undotbs1 INCLUDING CONTENTS AND DATAFILES
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use

-- Try to drop the tablespace but failed.

SQL>

With the alter system set undo_tablespace=UNDOTBS2, the database UNDO tablespace is changed and any new transaction's undo data will go to the new tablespace i.e. UNDOTBS2. But the undo data for already pending transaction (e.g. the one initiated by SCOTT before the database UNDO tablespace switch) is still in the old tablespace with a status of PENDING OFFLINE. As far as it is there you cannot drop the old tablespace.

set lines 10000
column name format a10

SELECT a.name,b.status
FROM v$rollname a,v$rollstat b
WHERE a.usn = b.usn
AND a.name IN (
SELECT segment_name
FROM dba_segments
WHERE tablespace_name = 'UNDOTBS1'
);

NAME STATUS
---------- ---------------
_SYSSMU8$ PENDING OFFLINE

The above query shows the name of the UNDO segment in the UNDOTBS1 tablespace and its status. Now lets see which users/sessions are running this pending transaction.

column username format a6

SELECT a.name,b.status , d.username , d.sid , d.serial#
FROM v$rollname a,v$rollstat b, v$transaction c , v$session d
WHERE a.usn = b.usn
AND a.usn = c.xidusn
AND c.ses_addr = d.saddr
AND a.name IN (
SELECT segment_name
FROM dba_segments
WHERE tablespace_name = 'UNDOTBS1'
);

NAME STATUS USERNA SID SERIAL#
---------- --------------- ------ ---------- ----------
_SYSSMU8$ PENDING OFFLINE SCOTT 147 4

So this is SCOTT with SID=147 and SERIAL#=4. Since we know now the user, we can go to him/her and request to end the transaction gracefully i.e. issue a ROLLBACK or COMMIT. However, if this is not possible (say the user initiated the transaction and left for annual leave :) and trust me this happens) you may go ahead and kill the session to release the undo segments in the UNDOTBS1 tablespace.

SQL> alter system kill session '147,4';

System altered.

SELECT a.name,b.status , d.username , d.sid , d.serial#
FROM v$rollname a,v$rollstat b, v$transaction c , v$session d
WHERE a.usn = b.usn
AND a.usn = c.xidusn
AND c.ses_addr = d.saddr
AND a.name IN (
SELECT segment_name
FROM dba_segments
WHERE tablespace_name = 'UNDOTBS1'
);

no rows selected

As we can see once the session is kills we don't see anymore segments occupied in the UNDOTBS1 tablespace. Lets drop UNDOTBS1.

SQL> DROP TABLESPACE undotbs1 INCLUDING CONTENTS AND DATAFILES;
DROP TABLESPACE undotbs1 INCLUDING CONTENTS AND DATAFILES
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS1' is currently in use

If you are retaining undo data then you still won't be able to drop the tablespace because it is still in use by undo_retention. Let the UNDO_RETENTION time pass and then try to drop the tablespace. In my case it is 900 seconds i.e. 15 minutes.

-- After 15 minutes.
SQL> DROP TABLESPACE undotbs1 INCLUDING CONTENTS AND DATAFILES;

Tablespace dropped.

from webpage:
http://www.oracleflash.com/32/Change-or-switch-undo-tablespace-in-Oracle-database.html

RAC IMPORTANT NOTES

0 comments

What are Oracle Clusterware processes for 10g on Unix and Linux

Cluster Synchronization Services (ocssd) Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user

Event manager daemon (evmd) —A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

What are Oracle database background processes specific to RAC

•LMS—Global Cache Service Process

•LMD—Global Enqueue Service Daemon

•LMON—Global Enqueue Service Monitor

•LCK0—Instance Enqueue Process

To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

What are Oracle Clusterware Components

Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster

How do you troubleshoot node reboot

Please check metalink ...

Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

How do you backup the OCR

There is an automatic backup mechanism for OCR. The default location is : $ORA_CRS_HOME\cdata\"clustername"\

To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore

With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manaual backup of the OCR with the command:
# ocrconfig -manualbackup

How do you backup voting disk

#dd if=voting_disk_name of=backup_file_name

How do I identify the voting disk location

#crsctl query css votedisk

How do I identify the OCR file location

check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck

Is ssh required for normal Oracle RAC operation ?

"ssh" are not required for normal Oracle RAC operation. However "ssh" should be enabled for Oracle RAC and patchset installation.

What is SCAN?

Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.

Click here for more details from Oracle

What is the purpose of Private Interconnect ?

Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

Why do we have a Virtual IP (VIP) in Oracle RAC?

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?

This is most likely due to a fault in interconnect network.
Check netstat -s
if you see "fragments dropped" or "packet reassemblies failed" , Work with your system administrator find the fault with network.

How many nodes are supported in a RAC Database?

10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?

Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.

what is the purpose of the ONS daemon?

The Oracle Notification Service (ONS) daemon is an daemon started by the CRS clusterware as part of the nodeapps. There is one ons daemon started per clustered node.
The Oracle Notification Service daemon receive a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forward those events to application subscribers and to the local listeners.

This in order to facilitate:

a. the FAN or Fast Application Notification feature or allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permit load balancing accross different rac nodes dependent of the load on the different nodes. The rdbms MMON is creating an advisory for distribution of work every 30seconds and forward it via racgimon and ONS to listeners and applications.

from blog:
http://dbaregistry.blogspot.com/2010/02/oracle-rac-interview-questions.html