Ocfs2 timeout
From CoolSolutionsWiki
Error
Jan 16 18:24:59 hql4 kernel: (3980,0):o2net_connect_expired:1570 ERROR: no connection established with node 0 after 10.0 seconds, giving up and returning errors.
Symptom
The SLES 10 SP1 server would boot but not automatically mount the ocfs2 partition. After boot, running mount -a manually mounted everything listed in /etc/fstab.
Facts
- /etc/fstab
- /dev/emcpowerb1 /vservers ocfs2 _netdev 0 0
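The _netdev option tells the boot scripts that the device needs the network, so the mount is deferred until networking is up. As a quick sanity check, the sketch below lists ocfs2 entries in an fstab-style file that lack _netdev. The function name ocfs2_without_netdev is made up for this example, and the file path defaults to /etc/fstab.

```shell
#!/bin/sh
# Sketch: print device and mount point of ocfs2 entries missing _netdev.
# ocfs2_without_netdev is a hypothetical helper name, not part of any tool.
# Pass an alternate fstab-style file as $1 (defaults to /etc/fstab).
ocfs2_without_netdev() {
    awk '$1 !~ /^#/ && $3 == "ocfs2" && $4 !~ /(^|,)_netdev(,|$)/ { print $1, $2 }' \
        "${1:-/etc/fstab}"
}
```

Running ocfs2_without_netdev with no arguments on the server above should print nothing, because the /vservers entry already carries _netdev.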
Troubleshooting
- Run dmesg | less, then type / inside less and search for o2cb
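The same search can be done non-interactively with grep. This is a minimal sketch: the function name o2_messages and the pattern list are assumptions for this example. The pattern is widened beyond o2cb because the error shown above is logged by o2net.

```shell
#!/bin/sh
# Sketch: filter kernel log text on stdin for OCFS2 cluster-stack messages.
# o2_messages is a hypothetical helper; the pattern list is an assumption
# and can be narrowed to just o2cb.
o2_messages() {
    grep -i -E 'o2cb|o2net|o2hb|ocfs2'
}

# Typical use on a live system (may require root):
#   dmesg | o2_messages
```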
Fix
- Edit /etc/sysconfig/o2cb (for example with gedit)
- We set the following parameters. Your system may require different values
- O2CB_HEARTBEAT_THRESHOLD=61
- O2CB_IDLE_TIMEOUT_MS=30000
- After modifying the o2cb file on EVERY server, do the following on each node (or simply reboot all ocfs2 nodes)
- umount every ocfs2 partition on the node
- rco2cb stop
- rcocfs2 stop
- rco2cb start
- rcocfs2 start
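The steps above can be sketched as a small script to run on each node. It is a dry run by default and only prints the commands. The function names and the RUN variable are made up for this example, and umount -a -t ocfs2 is used here as one way to unmount every ocfs2 filesystem. The helper heartbeat_seconds assumes the relation given in the OCFS2 documentation, timeout in seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2, so verify it against your OCFS2 version.

```shell
#!/bin/sh
# Sketch only. heartbeat_seconds, restart_ocfs2_stack and RUN are
# hypothetical names introduced for this example.

# Convert O2CB_HEARTBEAT_THRESHOLD to seconds, assuming
# timeout = (threshold - 1) * 2 as described in the OCFS2 documentation.
heartbeat_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# Print (default) or execute the restart sequence for one node.
# Leave RUN unset for a dry run; set RUN="" and run as root to execute.
restart_ocfs2_stack() {
    run=${RUN-echo}
    $run umount -a -t ocfs2   # unmount every mounted ocfs2 filesystem
    $run rco2cb stop
    $run rcocfs2 stop
    $run rco2cb start
    $run rcocfs2 start
}

echo "disk heartbeat timeout: $(heartbeat_seconds 61) s"
restart_ocfs2_stack
```

With the values used here, a threshold of 61 corresponds to 120 seconds under that formula, and O2CB_IDLE_TIMEOUT_MS=30000 is a 30 second network idle timeout.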
Tried
- Tried modifying the boot parameters to allow only one processor, to slow things down
- Tried disabling apparmor and firewall
- Considered removing NIC bonding (NIC teaming)
- Tried booting without the XEN kernel
