Jun  6 09:48:01 ehyperion354 Lustre: DEBUG MARKER: mds has failed over 6 times, and counting...
Jun  6 09:48:14 ehyperion354 LustreError: 10231:0:(ldlm_request.c:1039:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Jun  6 09:48:14 ehyperion354 Lustre: lustre-MDT0000-mdc-ffff81021762a000: Connection restored to service lustre-MDT0000 using nid 192.168.120.126@o2ib.
Jun  6 09:48:14 ehyperion354 LustreError: 10231:0:(ldlm_request.c:1597:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Jun  6 09:50:44 ehyperion354 LustreError: 11-0: an error occurred while communicating with 192.168.120.126@o2ib. The ldlm_enqueue operation failed with -107
Jun  6 09:50:44 ehyperion354 Lustre: lustre-MDT0000-mdc-ffff81021762a000: Connection to service lustre-MDT0000 via nid 192.168.120.126@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Jun  6 09:50:44 ehyperion354 LustreError: 167-0: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
Jun  6 09:50:44 ehyperion354 Lustre: Server lustre-MDT0000_UUID version (2.1.1.0) is much newer than client version (1.8.8)
Jun  6 09:50:44 ehyperion354 LustreError: 25302:0:(namei.c:256:ll_mdc_blocking_ast()) ### data mismatch with ino 144115305952584894/0 (ffff8101b4b8e9a0) ns: lustre-MDT0000-mdc-ffff81021762a000 lock: ffff810160e1f400/0x793342fb2430adf9 lrc: 3/0,0 mode: PR/PR res: 8589941618/9406 bits 0x1 rrc: 2 type: IBT flags: 0x2002c90 remote: 0xe77c1b87ad8301fe expref: -99 pid: 25116 timeout: 0
Jun  6 09:50:44 ehyperion354 LustreError: 25116:0:(mdc_locks.c:653:mdc_enqueue()) ldlm_cli_enqueue error: -4
Jun  6 09:50:44 ehyperion354 LustreError: 25116:0:(file.c:3331:ll_inode_revalidate_fini()) failure -4 inode 180355073
Jun  6 09:50:44 ehyperion354 LustreError: 25303:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff810162637400 x1403992855059565/t0 o101->lustre-MDT0000_UUID@192.168.120.126@o2ib:12/10 lens 544/1232 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun  6 09:50:44 ehyperion354 LustreError: 25303:0:(mdc_locks.c:653:mdc_enqueue()) ldlm_cli_enqueue error: -108
Jun  6 09:50:44 ehyperion354 LustreError: 25303:0:(file.c:3331:ll_inode_revalidate_fini()) failure -108 inode 144115305952584894
Jun  6 09:50:44 ehyperion354 LustreError: 25304:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff81022a8d6800 x1403992855059566/t0 o101->lustre-MDT0000_UUID@192.168.120.126@o2ib:12/10 lens 544/1232 e 0 to 1 dl 0 ref 1 fl Rpc:/0/0 rc 0/0
Jun  6 09:50:44 ehyperion354 LustreError: 25302:0:(namei.c:256:ll_mdc_blocking_ast()) Skipped 767 previous similar messages
Jun  6 09:50:44 ehyperion354 Lustre: lustre-MDT0000-mdc-ffff81021762a000: Connection restored to service lustre-MDT0000 using nid 192.168.120.126@o2ib.
Jun  6 09:51:14 ehyperion354 LustreError: 17936:0:(o2iblnd_cb.c:2534:kiblnd_rejected()) 192.168.117.3@o2ib rejected: o2iblnd fatal error
Jun  6 09:51:14 ehyperion354 LustreError: 17936:0:(o2iblnd_cb.c:2534:kiblnd_rejected()) Skipped 39 previous similar messages
Jun  6 09:57:22 ehyperion354 mrshd[25309]: root@hyperion318.llnl.gov as root: cmd='PATH=/admin/scripts:/admin/bin:/bin:/usr/bin:/sbin:/usr/sbin;(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; sh -c "/usr/sbin/lctl mark Duration:               86400 Server failover period: 600 seconds Exited after:           3041 seconds Number of failovers before exit: mds: 6 times ost1: 0 times ost2: 0 times ost3: 0 times ost4: 0 times ost5: 0 times ost6: 0 times ost7: 0 times ost8: 0 times Status: FAIL: rc=1");echo XXRETCODE:$?'
Jun  6 09:57:22 ehyperion354 Lustre: DEBUG MARKER: Duration: 86400
Jun  6 09:57:23 ehyperion354 mrshd[25328]: root@hyperion318.llnl.gov as root: cmd='PATH=/admin/scripts:/admin/bin:/bin:/usr/bin:/sbin:/usr/sbin;(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; sh -c "test -f /tmp/client-load.pid &&         { kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }");echo XXRETCODE:$?'
Jun  6 09:57:23 ehyperion354 mrshd[25337]: root@hyperion318.llnl.gov as root: cmd='PATH=/admin/scripts:/admin/bin:/bin:/usr/bin:/sbin:/usr/sbin;(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; sh -c "/usr/sbin/lctl dk > /home/white215/test_logs/2012-06-06/085655/recovery-mds-scale.test_failover_mds.debug_log.\$(hostname -s).1339001843.log;          dmesg > /home/white215/test_logs/2012-06-06/085655/recovery-mds-scale.test_failover_mds.dmesg.\$(hostname -s).1339001843.log");echo XXRETCODE:$?'
Jun  6 10:00:17 ehyperion354 mrshd[25352]: root@ehyperion0 as root: cmd='rdistd -S'
Jun  6 10:01:21 ehyperion354 LustreError: 17936:0:(o2iblnd_cb.c:2534:kiblnd_rejected()) 192.168.117.3@o2ib rejected: o2iblnd fatal error
Jun  6 10:01:21 ehyperion354 LustreError: 17936:0:(o2iblnd_cb.c:2534:kiblnd_rejected()) Skipped 39 previous similar messages
