Ocfs2 timeout

From CoolSolutionsWiki

Error

Jan 16 18:24:59 hql4 kernel: (3980,0):o2net_connect_expired:1570 ERROR: no connection established with node 0 after 10.0 seconds, giving up and returning errors.

Symptom

The SLES 10 SP1 server would boot, but it would not automatically mount the ocfs2 partition. After boot, running mount -a by hand mounted everything listed in /etc/fstab.


Facts

  1. /etc/fstab
    1. /dev/emcpowerb1 /vservers ocfs2 _netdev 0 0

Troubleshooting

  1. Run dmesg | less, then inside less type / followed by o2cb to search for o2cb messages.
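The search above can also be done non-interactively. The following is a minimal sketch, not part of the original procedure: the function name filter_cluster_messages is made up for illustration, and the pattern is widened on the assumption that related messages may be tagged o2net or ocfs2 rather than o2cb (the error quoted at the top of this page comes from o2net_connect_expired).

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that mention the OCFS2
# cluster stack (o2cb, o2net, or ocfs2), ignoring case.
filter_cluster_messages() {
    grep -i -E 'o2cb|o2net|ocfs2'
}

# Usage on a live system (left commented out here):
#   dmesg | filter_cluster_messages
```

Piping dmesg through the function prints only the matching lines, which is convenient when comparing output from several nodes.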

Fix

  1. Edit /etc/sysconfig/o2cb (for example, with gedit /etc/sysconfig/o2cb)
  2. We put in the following parameters. Your system may require different values
    1. O2CB_HEARTBEAT_THRESHOLD=61
    2. O2CB_IDLE_TIMEOUT_MS=30000
  3. After you modify the o2cb file on EVERY server, do the following on each node, or simply reboot all ocfs2 nodes
    1. Unmount (umount) every ocfs2 partition on every node.
    2. rco2cb stop
    3. rcocfs2 stop
    4. rco2cb start
    5. rcocfs2 start
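To help pick values for your own system, the sketch below converts the two parameters into seconds. It assumes the commonly documented OCFS2 relationship in which the disk heartbeat runs every 2 seconds, so the timeout is (threshold - 1) * 2 seconds; check this against the documentation for your OCFS2 version. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: disk heartbeat interval is 2 seconds, so
#   timeout_seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
hb_threshold_to_seconds() {
    echo $(( ($1 - 1) * 2 ))
}

# Inverse: threshold needed for a desired timeout in seconds.
hb_seconds_to_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Convert a millisecond value such as O2CB_IDLE_TIMEOUT_MS to seconds.
ms_to_seconds() {
    echo $(( $1 / 1000 ))
}

hb_threshold_to_seconds 61    # prints 120
hb_seconds_to_threshold 120   # prints 61
ms_to_seconds 30000           # prints 30
```

Under that assumption, the values used here give a 120 second disk heartbeat timeout and a 30 second network idle timeout. The 10.0 seconds in the error message is consistent with an idle timeout of 10000 ms having been in effect before the change.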


Tried

  1. Tried modifying the boot parameter to allow only one processor, to slow things down.
  2. Tried disabling apparmor and firewall
  3. Considered removing NIC bonding (NIC teaming)
  4. Tried without XEN kernel boot