When installing 11.2.0.2 RAC, the first step is installing Grid Infrastructure. Running root.sh on the second node failed with the following errors:
Start of resource "ora.ctssd" failed
CRS-2672: Attempting to start 'ora.ctssd' on 'xsh-server2'
CRS-2674: Start of 'ora.ctssd' on 'xsh-server2' failed
CRS-4000: Command Start failed, or completed with errors.
Cluster Time Synchronisation Service start in exclusive mode failed at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6455.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed
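The message shows that the ctssd process failed to start. (Earlier in the output, cssd is reported as started successfully. That distinguishes this case from the other second-node root.sh failures documented on MOS, where the failure happens when cssd starts.) The ctssd startup log (under $GRID_HOME/log/ctssd) contains the following errors: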
2010-11-12 18:55:46.132: [ GIPC][2424495392] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 687], original from [clsss.c : 5325]
[ default][2424495392]Failure 4 in trying to open SV key SYSTEM.version.localhost
[ default][2424495392]procr_open_key error 4 errorbuf : PROCL-4: The local registry key to be operated on does not exist.
2010-11-12 18:55:46.135: [ CTSS][2424495392]clsctss_r_av2: Error [3] retrieving Active Version from OLR. Returns [19].
2010-11-12 18:55:46.138: [ CTSS][2424495392](:ctss_init16:): Error [19] retrieving active version. Returns [19].
2010-11-12 18:55:46.138: [ CTSS][2424495392]ctss_main: CTSS init failed [19]
2010-11-12 18:55:46.138: [ CTSS][2424495392]ctss_main: CTSS daemon aborting [19].
2010-11-12 18:55:46.138: [ CTSS][2424495392]CTSS daemon aborting
The crsctl output confirms that ora.cssd started successfully while ora.ctssd is OFFLINE:
$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE
ora.cluster_interconnect.haip
      1        OFFLINE OFFLINE
ora.crf
      1        OFFLINE OFFLINE
ora.crsd
      1        OFFLINE OFFLINE
ora.cssd
      1        ONLINE  ONLINE       xsh-server2
ora.cssdmonitor
      1        ONLINE  ONLINE       xsh-server2
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  ONLINE       xsh-server2
ora.drivers.acfs
      1        OFFLINE OFFLINE
ora.evmd
      1        OFFLINE OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       xsh-server2
ora.gpnpd
      1        ONLINE  ONLINE       xsh-server2
ora.mdnsd
      1        ONLINE  ONLINE       xsh-server2
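The key signal in this listing is a resource whose TARGET is ONLINE but whose STATE is not. A minimal Python sketch that flags such resources is shown below. It assumes the `crsctl stat res -t -init` layout above (a line starting with `ora.` names the resource, and the following data line holds the instance number, TARGET, STATE and an optional SERVER); the function name `find_stuck_resources` is hypothetical and not part of any Oracle tool.

```python
from typing import List


def find_stuck_resources(crsctl_output: str) -> List[str]:
    """Return resources whose TARGET is ONLINE but whose STATE is not ONLINE.

    Assumed layout: a line starting with 'ora.' names the resource; the next
    data line is '<instance> <TARGET> <STATE> [SERVER] [STATE_DETAILS]'.
    """
    stuck: List[str] = []
    current = None
    for raw in crsctl_output.splitlines():
        line = raw.strip()
        if line.startswith("ora."):
            current = line  # resource name line
            continue
        fields = line.split()
        # Data line: instance number followed by TARGET and STATE.
        if current and len(fields) >= 3 and fields[0].isdigit():
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                stuck.append(current)
    return stuck


sample = """ora.cssd
      1        ONLINE  ONLINE       xsh-server2
ora.ctssd
      1        ONLINE  OFFLINE
ora.evmd
      1        OFFLINE OFFLINE
"""
print(find_stuck_resources(sample))  # ['ora.ctssd']
```

On a live node the text could be captured with `subprocess.run(["crsctl", "stat", "res", "-t", "-init"], capture_output=True, text=True).stdout`, assuming crsctl is on the PATH of the user running the script.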
Running the same command on the first node at this point shows all resources ONLINE as expected. Checking cssd.log (under $GRID_HOME/log/cssd) next reveals errors during ASM disk discovery:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]UFS discovery with :ORCL:VOL*:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]OSS discovery with :ORCL:VOL*:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str :ORCL:VOL*:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL1:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL2:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL3:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
)
2010-11-12 13:44:30.505: [ SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
)
2010-11-12 13:44:30.505: [ SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
Notably, the same errors also appear on the first node, yet all resources there, including the ASM disk group, run normally.
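Because the same discovery errors show up on both nodes, it can help to extract them from each node's cssd.log and compare. The sketch below is written only against the SKGFD lines quoted above; `scan_cssd_log` and both regular expressions are hypothetical and assume that exact line format rather than any documented log specification.

```python
import re
from typing import Dict, List

# Hypothetical patterns, matched against the SKGFD lines quoted above.
FETCH_RE = re.compile(r"Fetching asmlib disk :([^:]+:[^:]+):")
OPEN_ERR_RE = re.compile(r"op asm_open error (.+)$")


def scan_cssd_log(text: str) -> Dict[str, List[str]]:
    """Collect ASMLib disks fetched during discovery and any asm_open errors."""
    disks: List[str] = []
    errors: List[str] = []
    for line in text.splitlines():
        fetch = FETCH_RE.search(line)
        if fetch:
            disks.append(fetch.group(1))  # e.g. 'ORCL:VOL1'
        err = OPEN_ERR_RE.search(line)
        if err:
            errors.append(err.group(1).strip())  # e.g. 'Operation not permitted'
    return {"disks": disks, "asm_open_errors": errors}


sample = """\
2010-11-12 13:44:30.505: [ SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL1:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]Fetching asmlib disk :ORCL:VOL2:
2010-11-12 13:44:30.505: [ SKGFD][1087203648]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted
)
"""
result = scan_cssd_log(sample)
print(result["disks"])  # ['ORCL:VOL1', 'ORCL:VOL2']
print(result["asm_open_errors"])  # ['Operation not permitted']
```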
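To fix the cssd.log errors above, follow MOS Note [1050164.1] and edit /etc/sysconfig/oracleasm-_dev_oracleasm to specify which disks ASMLib should scan and which it should ignore during discovery. Our environment uses Multipath to provide multiple paths to each disk, so the scan must include devices whose names start with dm and exclude those starting with sd. This problem should only occur with disks managed by Multipath.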
# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER="dm"
# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"
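To make the effect of these two settings concrete, the sketch below models them as simple name-prefix filters over device names such as those listed in /proc/partitions: devices matching an excluded prefix are dropped, and the rest are ordered by the first SCANORDER prefix they match. This is a simplified model assumed for illustration, not ASMLib's actual scanning code, and `scan_candidates` is a hypothetical name.

```python
from typing import List


def scan_candidates(devices: List[str], scanorder: str, scanexclude: str) -> List[str]:
    """Simplified model of SCANORDER/SCANEXCLUDE as name-prefix filters (assumption)."""
    order = scanorder.split()      # e.g. "dm" -> ["dm"]
    exclude = scanexclude.split()  # e.g. "sd" -> ["sd"]

    # Drop every device whose name starts with an excluded prefix.
    kept = [d for d in devices if not any(d.startswith(p) for p in exclude)]

    def rank(dev: str) -> int:
        # Index of the first matching SCANORDER prefix; unmatched devices go last.
        for i, prefix in enumerate(order):
            if dev.startswith(prefix):
                return i
        return len(order)

    # sorted() is stable, so devices with equal rank keep their original order.
    return sorted(kept, key=rank)


devices = ["sda", "sdb", "dm-1", "dm-2"]
print(scan_candidates(devices, "dm", ""))    # ['dm-1', 'dm-2', 'sda', 'sdb']
print(scan_candidates(devices, "dm", "sd"))  # ['dm-1', 'dm-2']
```

In this model, SCANORDER alone only moves the dm devices to the front while the sd paths remain candidates; adding SCANEXCLUDE removes the single-path sd devices from consideration entirely.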
You can confirm whether you have hit this problem as follows:
# ls -l /dev/oracleasm/disks
brw-rw---- 1 oracle dba 3, 65 May 14 12:08 CRSVOL
# cat /proc/partitions
3 65 4974448 sda
253 1 4974448 dm-1
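Here the ASM disk CRSVOL, created with oracleasm, has major/minor numbers 3, 65. Those belong to /dev/sda, not to /dev/dm-1 (253, 1), which means the Multipath device was not used when the ASM disk group was created. Typically node 1 is correct and node 2 is not, and that mismatch causes the failure.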
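The same check can be scripted. The sketch below reads the major/minor numbers of an ASMLib disk node with `os.stat` and looks them up in the contents of /proc/partitions; a backing device whose name starts with `dm-` suggests the disk sits on the Multipath device, while an `sd` name points to a single path. The helper names `parse_partitions`, `backing_device` and `asm_disk_numbers` are hypothetical, and the sample text mirrors the example above.

```python
import os
from typing import Dict, Optional, Tuple


def parse_partitions(text: str) -> Dict[Tuple[int, int], str]:
    """Map (major, minor) -> device name from /proc/partitions content."""
    table: Dict[Tuple[int, int], str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines are 'major minor #blocks name'; skip the header and blanks.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            table[(int(fields[0]), int(fields[1]))] = fields[3]
    return table


def backing_device(major: int, minor: int, partitions_text: str) -> Optional[str]:
    """Name of the block device with these numbers, or None if not listed."""
    return parse_partitions(partitions_text).get((major, minor))


def asm_disk_numbers(path: str) -> Tuple[int, int]:
    """Major/minor numbers of a device node such as /dev/oracleasm/disks/CRSVOL."""
    rdev = os.stat(path).st_rdev
    return os.major(rdev), os.minor(rdev)


sample = """major minor  #blocks  name

   3    65    4974448 sda
 253     1    4974448 dm-1
"""
dev = backing_device(3, 65, sample)
print(dev)  # sda
print("multipath OK" if dev and dev.startswith("dm-") else "NOT on a multipath device")
```

On a real node you would pass `asm_disk_numbers("/dev/oracleasm/disks/CRSVOL")` together with the text of /proc/partitions; running it on both nodes shows which one is bound to a raw sd path.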
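After fixing the above, you must deconfigure and then reconfigure the Grid environment; simply rerunning root.sh on the failed node is not enough (I wasted a lot of time on this). For the reconfiguration steps, see MOS Note [942166.1] – How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation. After that, root.sh ran successfully on the second node.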
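Looking back at the earlier installation output once the error was resolved: although the first node showed all resources as normal, its root.sh output was missing a few lines compared with a normal run.
A normal run looks like this: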
# $GRID_HOME/root.sh
Running Oracle 11g root script...
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-2672: Attempting to start 'ora.mdnsd' on 'xsh-server1'
CRS-2676: Start of 'ora.mdnsd' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'xsh-server1'
CRS-2676: Start of 'ora.gpnpd' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'xsh-server1'
CRS-2672: Attempting to start 'ora.gipcd' on 'xsh-server1'
CRS-2676: Start of 'ora.cssdmonitor' on 'xsh-server1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'xsh-server1'
CRS-2672: Attempting to start 'ora.diskmon' on 'xsh-server1'
CRS-2676: Start of 'ora.diskmon' on 'xsh-server1' succeeded
CRS-2676: Start of 'ora.cssd' on 'xsh-server1' succeeded
ASM created and started successfully.
Disk Group CRSDG created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk 67463e71af084f76bf98b3ee55081e40.
Successfully replaced voting disk group with +CRSDG.
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   67463e71af084f76bf98b3ee55081e40 (ORCL:VOL1) [CRSDG]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'xsh-server1'
CRS-2676: Start of 'ora.asm' on 'xsh-server1' succeeded
CRS-2672: Attempting to start 'ora.CRSDG.dg' on 'xsh-server1'
CRS-2676: Start of 'ora.CRSDG.dg' on 'xsh-server1' succeeded
ACFS-9200: Supported
ACFS-9200: Supported
CRS-2672: Attempting to start 'ora.registry.acfs' on 'xsh-server1'
CRS-2676: Start of 'ora.registry.acfs' on 'xsh-server1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
The earlier output was missing the following 4 lines:
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
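A quick way to check a saved root.sh log for this symptom is sketched below. Two of these four lines ("Creating OCR keys..." and "Operation successful.") also appear later in a normal run, after "clscfg: -install mode specified", so searching for each line separately could miss the problem; the sketch instead looks for the four lines as one consecutive block. `has_block` and `OLR_BLOCK` are hypothetical names, and the sample logs are trimmed excerpts of the output above.

```python
from typing import List

# The four lines missing from the faulty node-1 run (copied from above).
OLR_BLOCK = [
    "LOCAL ADD MODE",
    "Creating OCR keys for user 'root', privgrp 'root'..",
    "Operation successful.",
    "OLR initialization - successful",
]


def has_block(output: str, block: List[str] = OLR_BLOCK) -> bool:
    """True if `block` appears as consecutive lines in `output` (whitespace-trimmed)."""
    lines = [line.strip() for line in output.splitlines()]
    n = len(block)
    return any(lines[i:i + n] == block for i in range(len(lines) - n + 1))


header = "Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params"
later_lines = [
    "clscfg: -install mode specified",
    "Creating OCR keys for user 'root', privgrp 'root'..",
    "Operation successful.",
]
normal = "\n".join([header, *OLR_BLOCK, "Adding daemon to inittab", *later_lines])
faulty = "\n".join([header, "Adding daemon to inittab", *later_lines])
print(has_block(normal))  # True
print(has_block(faulty))  # False
```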
Oracle obviously won't admit this is a bug. Oh well, as long as the problem is solved.