Log for LU-7372

MDT 1
{noformat}
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 26: dbench and tar with mds failover == 07:07:09 \(1453792029\)
LustreError: 166-1: MGC10.1.4.32@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
Lustre: DEBUG MARKER: == replay-dual test 26: dbench and tar with mds failover == 07:07:09 (1453792029)
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
Turning device dm-0 (0xfd00000) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times
Lustre: DEBUG MARKER: test_26 fail mds1 1 times
Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds1
[root@shadow-4vm4 ~]#
[root@shadow-4vm4 ~]# LustreError: 29132:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880052360cc0 x1524408556505644/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 29132:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
LustreError: 29133:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x1010000:0x0], rc:-5
LustreError: 29133:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 1 previous similar message
[root@shadow-4vm4 ~]# INFO: task umount:29131 blocked for more than 120 seconds.
 Not tainted 2.6.32-573.12.1.el6_lustre.gd050cac.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount D 0000000000000001 0 29131 29130 0x00000080
 ffff88005f857b48 0000000000000082 0000000000000000 00000000000d2657
 000071cb00000000 000000ae00000000 000002bd0aebb5ab ffff88005f857b98
 ffff88005f857b58 0000000100295daa ffff880037c505f8 ffff88005f857fd8
Call Trace:
 [] __mutex_lock_slowpath+0x96/0x210
 [] mutex_lock+0x2b/0x50
 [] mgc_process_config+0x1dd/0x1210 [mgc]
 [] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [] obd_process_config.clone.0+0x8d/0x2e0 [obdclass]
 [] ? libcfs_debug_msg+0x41/0x50 [libcfs]
 [] lustre_end_log+0x262/0x6a0 [obdclass]
 [] server_put_super+0x911/0xed0 [obdclass]
 [] ? invalidate_inodes+0xf6/0x190
 [] generic_shutdown_super+0x5b/0xe0
 [] kill_anon_super+0x16/0x60
 [] lustre_kill_super+0x36/0x60 [obdclass]
 [] deactivate_super+0x57/0x80
 [] mntput_no_expire+0xbf/0x110
 [] sys_umount+0x7b/0x3a0
 [] system_call_fastpath+0x16/0x1b
LustreError: 29140:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88007ce6cc80 x1524408556529256/t0(0) o101->lustre-MDT0000-lwp-MDT0000@0@lo:23/10 lens 456/496 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 29140:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 1 previous similar message
LustreError: 29140:0:(qsd_reint.c:56:qsd_reint_completion()) lustre-MDT0000: failed to enqueue global quota lock, glb fid:[0x200000006:0x10000:0x0], rc:-5
LustreError: 29140:0:(qsd_reint.c:56:qsd_reint_completion()) Skipped 1 previous similar message
LustreError: 27268:0:(ldlm_request.c:106:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1453792022, 300s ago); not entering recovery in server code, just going back to sleep ns: MGS lock: ffff88007860a8c0/0x3d46acaafc52bb2c lrc: 3/0,1 mode: --/EX res: [0x65727473756c:0x2:0x0].0x0 rrc: 14 type: PLN flags: 0x40210400000020 nid: local remote: 0x0 expref: -99 pid: 27268 timeout: 0 lvb_type: 0
LustreError: dumping log to /tmp/lustre-log.1453792322.27268
Lustre: MGS: Client 087857b6-d8da-f267-04ff-41d4f56e21ca (at 10.1.4.29@tcp) reconnecting
Lustre: Skipped 3 previous similar messages
Lustre: MGS: Connection restored to 46797eba-1270-d602-fc49-2de2a2c82c60 (at 10.1.4.29@tcp)
Lustre: Skipped 154 previous similar messages
LustreError: 166-1: MGC10.1.4.32@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
LustreError: 27252:0:(ldlm_request.c:125:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1453792030, 300s ago), entering recovery for MGS@10.1.4.32@tcp ns: MGC10.1.4.32@tcp lock: ffff88004a23f880/0x3d46acaafc52bb95 lrc: 4/1,0 mode: --/CR res: [0x65727473756c:0x2:0x0].0x0 rrc: 1 type: PLN flags: 0x1000000000000 nid: local remote: 0x3d46acaafc52bb9c expref: -99 pid: 27252 timeout: 0 lvb_type: 0
LustreError: 29145:0:(ldlm_resource.c:887:ldlm_resource_complain()) MGC10.1.4.32@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ffff88005223a480) refcount nonzero (1) after lock cleanup; forcing cleanup.
LustreError: 29145:0:(ldlm_resource.c:887:ldlm_resource_complain()) Skipped 1 previous similar message
LustreError: 29145:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x65727473756c:0x2:0x0].0x0 (ffff88005223a480) refcount = 2
LustreError: 29145:0:(ldlm_resource.c:1523:ldlm_resource_dump()) Waiting locks:
LustreError: 29145:0:(ldlm_resource.c:1525:ldlm_resource_dump()) ### ### ns: MGC10.1.4.32@tcp lock: ffff88004a23f880/0x3d46acaafc52bb95 lrc: 4/1,0 mode: --/CR res: [0x65727473756c:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1106400000000 nid: local remote: 0x3d46acaafc52bb9c expref: -99 pid: 27252 timeout: 0 lvb_type: 0
Lustre: Failing over lustre-MDT0000
Lustre: Skipped 7 previous similar messages
Lustre: lustre-MDT0000: Not available for connect from 10.1.4.29@tcp (stopping)
Lustre: Skipped 2 previous similar messages
LustreError: 29131:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8800585b6c80 x1524408556537952/t0(0) o1000->lustre-MDT0001-osp-MDT0000@10.1.4.36@tcp:24/4 lens 304/4320 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 29131:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 1 previous similar message
LustreError: 29131:0:(osp_object.c:588:osp_attr_get()) lustre-MDT0001-osp-MDT0000:osp_attr_get update error [0x240000402:0x1:0x0]: rc = -5
LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.1.4.29@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 111 previous similar messages
Lustre: 29131:0:(client.c:2058:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1453792330/real 1453792330] req@ffff8800585b6c80 x1524408556537968/t0(0) o251->MGC10.1.4.32@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1453792336 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 29131:0:(client.c:2058:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
Removing read-only on unknown block (0xfd00000)
Lustre: server umount lustre-MDT0000 complete
Lustre: Skipped 7 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/mds1
LDISKFS-fs (dm-0): recovery complete
LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: Skipped 7 previous similar messages
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790214, ql: 5, comp: 2, conn: 7, next: 158913790220, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790224, ql: 5, comp: 2, conn: 7, next: 158913790226, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790230, ql: 5, comp: 2, conn: 7, next: 158913790232, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790233, ql: 5, comp: 2, conn: 7, next: 158913790235, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790238, ql: 5, comp: 2, conn: 7, next: 158913790242, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790244, ql: 5, comp: 2, conn: 7, next: 158913790245, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790248, ql: 5, comp: 2, conn: 7, next: 158913790249, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790252, ql: 5, comp: 2, conn: 7, next: 158913790253, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790254, ql: 5, comp: 2, conn: 7, next: 158913790259, next_update 158913790286 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790288, ql: 5, comp: 2, conn: 7, next: 158913790289, next_update 158913790307 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790290, ql: 5, comp: 2, conn: 7, next: 158913790295, next_update 158913790307 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790300, ql: 5, comp: 2, conn: 7, next: 158913790302, next_update 158913790307 last_committed: 158913790205)
LustreError: 29442:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0000: waking for gap in transno, VBR is OFF (skip: 158913790308, ql: 5, comp: 2, conn: 7, next: 158913790313, next_update 158913790346 last_committed: 158913790205)
....
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 6 times
Lustre: DEBUG MARKER: test_26 fail mds2 6 times
LustreError: 11-0: lustre-MDT0001-osp-MDT0000: operation out_update to node 10.1.4.36@tcp failed: rc = -107
Lustre: lustre-MDT0001-osp-MDT0000: Connection to lustre-MDT0001 (at 10.1.4.36@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec
Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true
Lustre: DEBUG MARKER: rc=0; val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); if [[ $? -eq 0 && $val -ne 0 ]]; then echo $(hostname -s): $val; rc=$val; fi; exit $rc
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test complete, duration 3281 sec == 07:17:36 \(1453792656\)
Lustre: DEBUG MARKER: == replay-dual test complete, duration 3281 sec == 07:17:36 (1453792656)
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=ug
Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: replay-dual ============----- Tue Jan 26 07:18:20 UTC 2016
Lustre: DEBUG MARKER: -----============= acceptance-small: replay-dual ============----- Tue Jan 26 07:18:20 UTC 2016
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test complete, duration -o sec == 07:18:21 \(1453792701\)
Lustre: DEBUG MARKER: == replay-dual test complete, duration -o sec == 07:18:21 (1453792701)
{noformat}

MDT 2/3/4
{noformat}
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 26: dbench and tar with mds failover == 07:07:09 \(1453792029\)
Lustre: DEBUG MARKER: == replay-dual test 26: dbench and tar with mds failover == 07:07:09 (1453792029)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times
Lustre: DEBUG MARKER: test_26 fail mds1 1 times
[root@shadow-4vm8 ~]#
[root@shadow-4vm8 ~]# mount
/dev/vda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.1.0.1:/export/scratch on /scratch type nfs (rw,addr=10.1.0.1)
10.1.0.1:/home on /home type nfs (rw,addr=10.1.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/mapper/lvm--Role_MDS-P3 on /mnt/mds3 type lustre (rw)
/dev/mapper/lvm--Role_MDS-P4 on /mnt/mds4 type lustre (rw)
/dev/mapper/lvm--Role_MDS-P2 on /mnt/mds2 type lustre (rw)
[root@shadow-4vm8 ~]#
[root@shadow-4vm8 ~]# LustreError: 166-1: MGC10.1.4.32@tcp: Connection to MGS (at 10.1.4.32@tcp) was lost; in progress operations using this service will fail
LustreError: Skipped 8 previous similar messages
LustreError: 11225:0:(ldlm_request.c:125:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1453792030, 300s ago), entering recovery for MGS@10.1.4.32@tcp ns: MGC10.1.4.32@tcp lock: ffff88006246b700/0x3cf7993a67ce73cf lrc: 4/1,0 mode: --/CR res: [0x65727473756c:0x2:0x0].0x0 rrc: 1 type: PLN flags: 0x1000000000000 nid: local remote: 0x3d46acaafc52bb72 expref: -99 pid: 11225 timeout: 0 lvb_type: 0
Lustre: MGC10.1.4.32@tcp: Connection restored to 10.1.4.32@tcp (at 10.1.4.32@tcp)
Lustre: Skipped 91 previous similar messages
Lustre: 11205:0:(client.c:2058:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1453792335/real 1453792335] req@ffff8800674f6380 x1524408581847232/t0(0) o400->MGC10.1.4.32@tcp@10.1.4.32@tcp:26/25 lens 224/224 e 0 to 1 dl 1453792342 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 11205:0:(client.c:2058:ptlrpc_expire_one_request()) Skipped 56 previous similar messages
Lustre: Evicted from MGS (at 10.1.4.32@tcp) after server handle changed from 0x3d46acaafc52b513 to 0x3d46acaafc5a42d3
Lustre: Skipped 8 previous similar messages
LustreError: 167-0: lustre-MDT0000-lwp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
LustreError: Skipped 21 previous similar messages
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 47 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 47 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 123 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 123 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 readonly
LustreError: 30491:0:(osd_handler.c:1737:osd_ro()) *** setting lustre-MDT0001 read-only ***
Turning device dm-1 (0xfd00001) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times
Lustre: DEBUG MARKER: test_26 fail mds2 2 times
Lustre: DEBUG MARKER: grep -c /mnt/mds2' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds2
Lustre: Failing over lustre-MDT0001
Lustre: lustre-MDT0001: Not available for connect from 10.1.4.29@tcp (stopping)
Lustre: Skipped 10 previous similar messages
LustreError: 30711:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff88004bbcbc80 x1524408581928272/t0(0) o1000->lustre-MDT0000-osp-MDT0001@10.1.4.32@tcp:24/4 lens 304/4320 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
LustreError: 30711:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 2 previous similar messages
LustreError: 30711:0:(osp_object.c:588:osp_attr_get()) lustre-MDT0000-osp-MDT0001:osp_attr_get update error [0x200000401:0x1:0x0]: rc = -5
LustreError: 30711:0:(osp_object.c:588:osp_attr_get()) Skipped 2 previous similar messages
Removing read-only on unknown block (0xfd00001)
Lustre: server umount lustre-MDT0001 complete
LustreError: 137-5: lustre-MDT0001_UUID: not available for connect from 10.1.4.29@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
LustreError: Skipped 48 previous similar messages
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P2
Lustre: DEBUG MARKER: mkdir -p /mnt/mds2; mount -t lustre /dev/lvm-Role_MDS/P2 /mnt/mds2
LDISKFS-fs (dm-1): recovery complete
LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788661, ql: 5, comp: 2, conn: 7, next: 30064788662, next_update 30064788668 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788665, ql: 5, comp: 2, conn: 7, next: 30064788667, next_update 30064788668 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788673, ql: 5, comp: 2, conn: 7, next: 30064788674, next_update 30064788675 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788692, ql: 5, comp: 2, conn: 7, next: 30064788693, next_update 30064788707 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788697, ql: 5, comp: 2, conn: 7, next: 30064788698, next_update 30064788707 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788699, ql: 5, comp: 2, conn: 7, next: 30064788700, next_update 30064788707 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788710, ql: 5, comp: 2, conn: 7, next: 30064788711, next_update 30064788719 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788713, ql: 5, comp: 2, conn: 7, next: 30064788714, next_update 30064788719 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788721, ql: 5, comp: 2, conn: 7, next: 30064788724, next_update 30064788724 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788727, ql: 5, comp: 2, conn: 7, next: 30064788728, next_update 30064788741 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788738, ql: 5, comp: 2, conn: 7, next: 30064788740, next_update 30064788741 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788750, ql: 5, comp: 2, conn: 7, next: 30064788751, next_update 30064788761 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788754, ql: 5, comp: 2, conn: 7, next: 30064788755, next_update 30064788761 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788759, ql: 5, comp: 2, conn: 7, next: 30064788760, next_update 30064788761 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788766, ql: 5, comp: 2, conn: 7, next: 30064788769, next_update 30064788770 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788771, ql: 5, comp: 2, conn: 7, next: 30064788774, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788776, ql: 4, comp: 3, conn: 7, next: 30064788777, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788785, ql: 2, comp: 5, conn: 7, next: 30064788788, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788798, ql: 2, comp: 5, conn: 7, next: 30064788801, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788808, ql: 2, comp: 5, conn: 7, next: 30064788809, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788810, ql: 2, comp: 5, conn: 7, next: 30064788811, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788812, ql: 2, comp: 5, conn: 7, next: 30064788813, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788814, ql: 2, comp: 5, conn: 7, next: 30064788815, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788820, ql: 2, comp: 5, conn: 7, next: 30064788823, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788824, ql: 2, comp: 5, conn: 7, next: 30064788825, next_update 0 last_committed: 30064788628)
LustreError: 30973:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 30064788826, ql: 2, comp: 5, conn: 7, next: 30064788827, next_update 0 last_committed: 30064788628)
Lustre: lustre-MDT0001: Recovery over after 0:07, of 7 clients 7 recovered and 0 were evicted.
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0002 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0002 readonly
LustreError: 31634:0:(osd_handler.c:1737:osd_ro()) *** setting lustre-MDT0002 read-only ***
Turning device dm-2 (0xfd00002) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds3 REPLAY BARRIER on lustre-MDT0002
Lustre: DEBUG MARKER: mds3 REPLAY BARRIER on lustre-MDT0002
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times
Lustre: DEBUG MARKER: test_26 fail mds3 3 times
Lustre: DEBUG MARKER: grep -c /mnt/mds3' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds3
Lustre: Failing over lustre-MDT0002
Removing read-only on unknown block (0xfd00002)
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P3
Lustre: DEBUG MARKER: mkdir -p /mnt/mds3; mount -t lustre /dev/lvm-Role_MDS/P3 /mnt/mds3
LDISKFS-fs (dm-2): recovery complete
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P3 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P3 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P3 2>/dev/null
Lustre: lustre-MDT0002: Will be in recovery for at least 1:00, or until 7 clients reconnect
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987329, ql: 5, comp: 2, conn: 7, next: 4294987331, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987332, ql: 5, comp: 2, conn: 7, next: 4294987334, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987344, ql: 5, comp: 2, conn: 7, next: 4294987345, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987346, ql: 5, comp: 2, conn: 7, next: 4294987347, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987352, ql: 5, comp: 2, conn: 7, next: 4294987353, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987357, ql: 5, comp: 2, conn: 7, next: 4294987359, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987360, ql: 5, comp: 2, conn: 7, next: 4294987365, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987380, ql: 5, comp: 2, conn: 7, next: 4294987382, next_update 4294987413 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987427, ql: 5, comp: 2, conn: 7, next: 4294987430, next_update 4294987430 last_committed: 4294987317)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987461, ql: 5, comp: 2, conn: 7, next: 4294987462, next_update 4294987469 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987502, ql: 5, comp: 2, conn: 7, next: 4294987518, next_update 4294987583 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987519, ql: 5, comp: 2, conn: 7, next: 4294987521, next_update 4294987583 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987524, ql: 5, comp: 2, conn: 7, next: 4294987527, next_update 4294987583 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987529, ql: 5, comp: 2, conn: 7, next: 4294987531, next_update 4294987583 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987533, ql: 5, comp: 2, conn: 7, next: 4294987538, next_update 4294987583 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987588, ql: 4, comp: 3, conn: 7, next: 4294987590, next_update 0 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987591, ql: 4, comp: 3, conn: 7, next: 4294987593, next_update 0 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987596, ql: 4, comp: 3, conn: 7, next: 4294987598, next_update 0 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987599, ql: 4, comp: 3, conn: 7, next: 4294987603, next_update 0 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987606, ql: 4, comp: 3, conn: 7, next: 4294987610, next_update 0 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987617, ql: 4, comp: 3, conn: 7, next: 4294987619, next_update 0 last_committed: 4294987459)
LustreError: 32116:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0002: waking for gap in transno, VBR is OFF (skip: 4294987624, ql: 3, comp: 4, conn: 7, next: 4294987626, next_update 0 last_committed: 4294987459)
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 3 sec
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 3 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 3 sec
Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 3 sec
Lustre: DEBUG MARKER: sync; sync; sync
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 notransno
Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0003 readonly
LustreError: 309:0:(osd_handler.c:1737:osd_ro()) *** setting lustre-MDT0003 read-only ***
Turning device dm-3 (0xfd00003) read-only
Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: mds4 REPLAY BARRIER on lustre-MDT0003
Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times
Lustre: DEBUG MARKER: test_26 fail mds4 4 times
Lustre: DEBUG MARKER: grep -c /mnt/mds4' ' /proc/mounts
Lustre: DEBUG MARKER: umount -d /mnt/mds4
Lustre: Failing over lustre-MDT0003
Removing read-only on unknown block (0xfd00003)
Lustre: server umount lustre-MDT0003 complete
Lustre: Skipped 1 previous similar message
Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
Lustre: DEBUG MARKER: hostname
Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P4
Lustre: DEBUG MARKER: mkdir -p /mnt/mds4; mount -t lustre /dev/lvm-Role_MDS/P4 /mnt/mds4
LDISKFS-fs (dm-3): recovery complete
LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. quota=on. Opts:
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}'
Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P4 2>/dev/null
LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987652, ql: 5, comp: 2, conn: 7, next: 4294987654, next_update 4294987660 last_committed: 4294987645)
LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987655, ql: 5, comp: 2, conn: 7, next: 4294987657, next_update 4294987660 last_committed: 4294987645)
LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987658, ql: 5, comp: 2, conn: 7, next: 4294987660, next_update 4294987660 last_committed: 4294987646)
LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987662, ql: 5, comp: 2, conn: 7, next: 4294987666, next_update 4294987693 last_committed: 4294987646)
LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987668, ql: 5, comp: 2, conn: 7, next: 4294987672, next_update 4294987693 last_committed: 4294987646)
LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987673, ql: 5, comp: 2, conn: 7, next: 4294987675, next_update 4294987693 last_committed: 4294987646)
LustreError:
795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987680, ql: 5, comp: 2, conn: 7, next: 4294987683, next_update 4294987693 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987698, ql: 5, comp: 2, conn: 7, next: 4294987699, next_update 4294987719 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987700, ql: 5, comp: 2, conn: 7, next: 4294987701, next_update 4294987719 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987721, ql: 5, comp: 2, conn: 7, next: 4294987722, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987733, ql: 5, comp: 2, conn: 7, next: 4294987734, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987739, ql: 5, comp: 2, conn: 7, next: 4294987740, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987741, ql: 5, comp: 2, conn: 7, next: 4294987742, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987743, ql: 5, comp: 2, conn: 7, next: 4294987744, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987745, ql: 5, comp: 2, conn: 7, next: 4294987750, 
next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987751, ql: 5, comp: 2, conn: 7, next: 4294987754, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987768, ql: 5, comp: 2, conn: 7, next: 4294987769, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987771, ql: 5, comp: 2, conn: 7, next: 4294987773, next_update 4294987773 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987778, ql: 5, comp: 2, conn: 7, next: 4294987780, next_update 4294987803 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987781, ql: 5, comp: 2, conn: 7, next: 4294987785, next_update 4294987803 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987797, ql: 5, comp: 2, conn: 7, next: 4294987801, next_update 4294987803 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987804, ql: 5, comp: 2, conn: 7, next: 4294987808, next_update 4294987808 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987822, ql: 5, comp: 2, conn: 7, next: 4294987828, next_update 4294987835 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF 
(skip: 4294987829, ql: 5, comp: 2, conn: 7, next: 4294987830, next_update 4294987835 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987834, ql: 5, comp: 2, conn: 7, next: 4294987835, next_update 4294987835 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987840, ql: 5, comp: 2, conn: 7, next: 4294987843, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987845, ql: 5, comp: 2, conn: 7, next: 4294987849, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987857, ql: 5, comp: 2, conn: 7, next: 4294987865, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987866, ql: 5, comp: 2, conn: 7, next: 4294987870, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987871, ql: 5, comp: 2, conn: 7, next: 4294987873, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987874, ql: 4, comp: 3, conn: 7, next: 4294987875, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987877, ql: 3, comp: 4, conn: 7, next: 4294987878, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF 
(skip: 4294987881, ql: 3, comp: 4, conn: 7, next: 4294987882, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987886, ql: 3, comp: 4, conn: 7, next: 4294987887, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987892, ql: 2, comp: 5, conn: 7, next: 4294987894, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987901, ql: 2, comp: 5, conn: 7, next: 4294987904, next_update 0 last_committed: 4294987646) LustreError: 795:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0003: waking for gap in transno, VBR is OFF (skip: 4294987917, ql: 1, comp: 6, conn: 7, next: 4294987919, next_update 0 last_committed: 4294987646) Lustre: lustre-MDT0003: Recovery over after 0:03, of 7 clients 7 recovered and 0 were evicted. Lustre: Skipped 1 previous similar message Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 3 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 3 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 5 times Lustre: DEBUG MARKER: test_26 fail mds1 5 times LustreError: 11204:0:(client.c:2874:ptlrpc_replay_interpret()) @@@ request replay timed out. 
req@ffff88007b1e4cc0 x1524408581947664/t163208765020(163208765020) o1000->lustre-MDT0000-osp-MDT0002@10.1.4.32@tcp:24/4 lens 328/4288 e 1 to 1 dl 1453792605 ref 2 fl Interpret:EX/4/ffffffff rc -110/-1 Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec Lustre: DEBUG MARKER: sync; sync; sync Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 notransno Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0001 readonly LustreError: 1581:0:(osd_handler.c:1737:osd_ro()) *** setting lustre-MDT0001 read-only *** Turning device dm-1 (0xfd00001) read-only Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: mds2 REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 6 times Lustre: DEBUG MARKER: test_26 fail mds2 6 times Lustre: DEBUG MARKER: grep -c /mnt/mds2' ' /proc/mounts Lustre: DEBUG MARKER: umount -d /mnt/mds2 Lustre: Failing over lustre-MDT0001 LustreError: 1806:0:(client.c:1130:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff880050dd8680 x1524408581957548/t0(0) o1000->lustre-MDT0000-osp-MDT0001@10.1.4.32@tcp:24/4 lens 304/4320 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1 LustreError: 1806:0:(client.c:1130:ptlrpc_import_delay_req()) Skipped 3 previous similar messages LustreError: 1806:0:(osp_object.c:588:osp_attr_get()) lustre-MDT0000-osp-MDT0001:osp_attr_get update error [0x200000401:0x1:0x0]: rc = -5 LustreError: 1806:0:(osp_object.c:588:osp_attr_get()) Skipped 3 previous similar messages Lustre: lustre-MDT0001: Not available for connect from 10.1.4.29@tcp (stopping) Lustre: Skipped 2 previous similar messages Removing read-only on unknown block (0xfd00001) Lustre: server umount lustre-MDT0001 complete Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' LustreError: 11-0: 
lustre-MDT0001-osp-MDT0002: operation out_update to node 0@lo failed: rc = -107 LustreError: Skipped 14 previous similar messages Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 18 previous similar messages Lustre: DEBUG MARKER: hostname Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P2 Lustre: DEBUG MARKER: mkdir -p /mnt/mds2; mount -t lustre /dev/lvm-Role_MDS/P2 /mnt/mds2 LDISKFS-fs (dm-1): recovery complete LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts: Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1 Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null | grep -E ':[a-zA-Z]{3}[0-9]{4}' Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P2 2>/dev/null Lustre: lustre-MDT0001: Will be in recovery for at least 1:00, or until 7 clients reconnect Lustre: Skipped 1 previous similar message LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740457, ql: 5, comp: 2, conn: 7, next: 34359740461, next_update 34359740580 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740484, ql: 5, comp: 2, conn: 7, next: 34359740489, next_update 34359740580 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740490, ql: 5, comp: 2, conn: 7, next: 
34359740492, next_update 34359740580 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740493, ql: 5, comp: 2, conn: 7, next: 34359740494, next_update 34359740580 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740495, ql: 5, comp: 2, conn: 7, next: 34359740496, next_update 34359740580 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740521, ql: 5, comp: 2, conn: 7, next: 34359740522, next_update 34359740580 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740654, ql: 3, comp: 4, conn: 7, next: 34359740657, next_update 34359740666 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740662, ql: 3, comp: 4, conn: 7, next: 34359740663, next_update 34359740666 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740692, ql: 3, comp: 4, conn: 7, next: 34359740693, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740696, ql: 3, comp: 4, conn: 7, next: 34359740697, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740705, ql: 3, comp: 4, conn: 7, next: 34359740706, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for 
gap in transno, VBR is OFF (skip: 34359740733, ql: 3, comp: 4, conn: 7, next: 34359740734, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740738, ql: 3, comp: 4, conn: 7, next: 34359740739, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740740, ql: 3, comp: 4, conn: 7, next: 34359740741, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740743, ql: 3, comp: 4, conn: 7, next: 34359740744, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740745, ql: 3, comp: 4, conn: 7, next: 34359740746, next_update 0 last_committed: 34359740421) LustreError: 2068:0:(ldlm_lib.c:1885:check_for_next_transno()) lustre-MDT0001: waking for gap in transno, VBR is OFF (skip: 34359740779, ql: 2, comp: 5, conn: 7, next: 34359740781, next_update 0 last_committed: 34359740421) Lustre: lustre-MDT0001: Recovery over after 0:03, of 7 clients 7 recovered and 0 were evicted. Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true Lustre: DEBUG MARKER: rc=0; val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); if [[ $? 
-eq 0 && $val -ne 0 ]]; then echo $(hostname -s): $val; rc=$val; fi; exit $rc Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test complete, duration 3281 sec == 07:17:36 \(1453792656\) Lustre: DEBUG MARKER: == replay-dual test complete, duration 3281 sec == 07:17:36 (1453792656) Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: replay-dual ============----- Tue Jan 26 07:18:20 UTC 2016 Lustre: DEBUG MARKER: -----============= acceptance-small: replay-dual ============----- Tue Jan 26 07:18:20 UTC 2016 Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test complete, duration -o sec == 07:18:21 \(1453792701\)
{noformat}
client
{noformat}
Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 26: dbench and tar with mds failover == 07:07:09 \(1453792029\) Lustre: DEBUG MARKER: == replay-dual test 26: dbench and tar with mds failover == 07:07:09 (1453792029) Lustre: DEBUG MARKER: running=$(mount | grep -c /mnt/lustre' '); rc=0; if [ $running -eq 0 ] ; then mkdir -p /mnt/lustre; mount -t lustre -o user_xattr,flock shadow-4vm4@tcp:/lustre /mnt/lustre; rc=$?; fi; exit $rc Lustre: DEBUG MARKER: mount | grep /mnt/lustre' ' Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname) Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 1 times Lustre: DEBUG MARKER: test_26 fail mds1 1 times [root@shadow-4vm1 ~]# [root@shadow-4vm1 ~]# [root@shadow-4vm1 ~]# LustreError: 166-1: MGC10.1.4.32@tcp: Connection to MGS (at 10.1.4.32@tcp) was lost; in progress operations using this service will fail LustreError: Skipped 8 previous similar messages LustreError: 24388:0:(ldlm_request.c:125:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1453792029, 300s ago), entering recovery for MGS@10.1.4.32@tcp
ns: MGC10.1.4.32@tcp lock: ffff880079512bc0/0x7cad107cf28ec18f lrc: 4/1,0 mode: --/CR res: [0x65727473756c:0x2:0x0].0x0 rrc: 1 type: PLN flags: 0x1000000000000 nid: local remote: 0x3d46acaafc52bb48 expref: -99 pid: 24388 timeout: 0 lvb_type: 0 LustreError: 11-0: lustre-MDT0000-mdc-ffff8800795a7c00: operation ldlm_enqueue to node 10.1.4.32@tcp failed: rc = -107 LustreError: Skipped 35 previous similar messages Lustre: lustre-MDT0000-mdc-ffff8800795a7c00: Connection to lustre-MDT0000 (at 10.1.4.32@tcp) was lost; in progress operations using this service will wait for recovery to complete Lustre: Skipped 38 previous similar messages Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl get_param -n at_max LustreError: 18567:0:(ldlm_resource.c:887:ldlm_resource_complain()) MGC10.1.4.32@tcp: namespace resource [0x65727473756c:0x2:0x0].0x0 (ffff88005eb7b180) refcount nonzero (1) after lock cleanup; forcing cleanup. 
LustreError: 18567:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x65727473756c:0x2:0x0].0x0 (ffff88005eb7b180) refcount = 2 LustreError: 18567:0:(ldlm_resource.c:1523:ldlm_resource_dump()) Waiting locks: LustreError: 18567:0:(ldlm_resource.c:1525:ldlm_resource_dump()) ### ### ns: MGC10.1.4.32@tcp lock: ffff880079512bc0/0x7cad107cf2964a0f lrc: 4/1,0 mode: --/CR res: [0x65727473756c:0x2:0x0].0x0 rrc: 2 type: PLN flags: 0x1106400000000 nid: local remote: 0x3d46acaafc5a3d2a expref: -99 pid: 24388 timeout: 0 lvb_type: 0 LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff880037725980 x1524408468416392/t158913790227(158913790227) o101->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 808/544 e 0 to 0 dl 1453792433 ref 2 fl Interpret:R/4/0 rc 301/301 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff880077c113c0 x1524408468442476/t158913790890(158913790890) o36->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 624/424 e 0 to 0 dl 1453792436 ref 2 fl Interpret:R/4/0 rc -75/-75 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff8800783a2380 x1524408468465252/t158913791694(158913791694) o36->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 616/424 e 0 to 0 dl 1453792438 ref 2 fl Interpret:R/4/0 rc -75/-75 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff880077c8e080 x1524408468488596/t158913792149(158913792149) o101->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 808/544 e 0 to 0 dl 1453792439 ref 2 fl Interpret:R/4/0 rc -75/-75 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) Skipped 2 previous similar messages Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff8800782f36c0 x1524408468514668/t158913792980(158913792980) 
o36->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 616/424 e 0 to 0 dl 1453792441 ref 2 fl Interpret:R/4/0 rc -75/-75 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) Skipped 2 previous similar messages Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff8800607de980 x1524408468584216/t158913794774(158913794774) o101->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 808/544 e 0 to 0 dl 1453792445 ref 2 fl Interpret:R/4/0 rc -75/-75 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) Skipped 6 previous similar messages Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) @@@ Version mismatch during replay req@ffff880063d6a680 x1524408468692876/t158913797470(158913797470) o36->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 624/424 e 0 to 0 dl 1453792453 ref 2 fl Interpret:R/4/0 rc -75/-75 Lustre: 15242:0:(client.c:2886:ptlrpc_replay_interpret()) Skipped 11 previous similar messages LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff88005fe120c0 x1524408468870324/t158913802184(158913802184) o101->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 808/544 e 0 to 0 dl 1453792465 ref 2 fl Interpret:R/4/0 rc 301/301 LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) Skipped 1056 previous similar messages Lustre: 15242:0:(client.c:2058:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1453792398/real 1453792398] req@ffff88005f7d7080 x1524408469007016/t0(0) o400->lustre-MDT0000-mdc-ffff88007c636400@10.1.4.32@tcp:12/10 lens 224/224 e 0 to 1 dl 1453792474 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1 Lustre: 15242:0:(client.c:2058:ptlrpc_expire_one_request()) Skipped 29 previous similar messages Lustre: 15242:0:(import.c:1339:completed_replay_interpret()) lustre-MDT0000-mdc-ffff88007c636400: version recovery fails, reconnecting LustreError: 167-0: 
lustre-MDT0000-mdc-ffff88007c636400: This client was evicted by lustre-MDT0000; in progress operations using this service will fail. LustreError: 18054:0:(mdc_request.c:1285:mdc_read_page()) lustre-MDT0000-mdc-ffff88007c636400: [0x200000402:0x8e:0x0] lock enqueue fails: rc = -5 LustreError: 16878:0:(vvp_io.c:1519:vvp_io_init()) lustre: refresh file layout [0x20000b7b3:0x165:0x0] error -5. LustreError: 19487:0:(file.c:180:ll_close_inode_openhandle()) lustre-clilmv-ffff88007c636400: inode [0x20000b7b3:0x154:0x0] mdc close failed: rc = -108 LustreError: 18054:0:(llite_nfs.c:321:ll_dir_get_parent_fid()) lustre: failure inode [0x2c0000407:0x12d:0x0] get parent: rc = -108 LustreError: 18054:0:(file.c:3202:ll_inode_revalidate_fini()) lustre: revalidate FID [0x200000007:0x1:0x0] error: rc = -108 Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 123 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 123 sec Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname) Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 2 times Lustre: DEBUG MARKER: test_26 fail mds2 2 times Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl get_param -n at_max LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff88005f5b79c0 x1524408469098768/t30064788630(30064788630) o101->lustre-MDT0001-mdc-ffff8800795a7c00@10.1.4.36@tcp:12/10 lens 816/544 e 0 to 0 dl 1453792516 ref 2 fl Interpret:R/4/0 rc 
301/301 LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) Skipped 373 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 5 sec Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname) Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0002 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds3 3 times Lustre: DEBUG MARKER: test_26 fail mds3 3 times Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl get_param -n at_max Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 3 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0002-mdc-*.mds_server_uuid in FULL state after 3 sec Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname) Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0003 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds4 4 times Lustre: DEBUG MARKER: test_26 fail mds4 4 times Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl get_param -n at_max Lustre: DEBUG MARKER: /usr/sbin/lctl mark 
mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 3 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0003-mdc-*.mds_server_uuid in FULL state after 3 sec Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname) Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds1 5 times Lustre: DEBUG MARKER: test_26 fail mds1 5 times Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl get_param -n at_max Lustre: Evicted from MGS (at 10.1.4.32@tcp) after server handle changed from 0x3d46acaafc5a4320 to 0x3d46acaafc5ddd67 Lustre: Skipped 5 previous similar messages Lustre: MGC10.1.4.32@tcp: Connection restored to 10.1.4.32@tcp (at 10.1.4.32@tcp) Lustre: Skipped 13 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 18 sec Lustre: DEBUG MARKER: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname) Lustre: DEBUG MARKER: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0001 Lustre: DEBUG MARKER: /usr/sbin/lctl mark test_26 fail mds2 6 times Lustre: DEBUG MARKER: test_26 fail mds2 6 times Lustre: DEBUG MARKER: 
PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u Lustre: DEBUG MARKER: lctl get_param -n at_max LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff88006124f9c0 x1524408469200348/t34359740416(34359740416) o101->lustre-MDT0001-mdc-ffff8800795a7c00@10.1.4.36@tcp:12/10 lens 800/544 e 0 to 0 dl 1453792695 ref 2 fl Interpret:R/4/0 rc 301/301 LustreError: 15242:0:(client.c:2929:ptlrpc_replay_interpret()) Skipped 163 previous similar messages Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec Lustre: DEBUG MARKER: mdc.lustre-MDT0001-mdc-*.mds_server_uuid in FULL state after 4 sec Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null || true Lustre: DEBUG MARKER: rc=0; val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1); if [[ $? -eq 0 && $val -ne 0 ]]; then echo $(hostname -s): $val; rc=$val; fi; exit $rc Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test complete, duration 3281 sec == 07:17:36 \(1453792656\) Lustre: DEBUG MARKER: == replay-dual test complete, duration 3281 sec == 07:17:36 (1453792656) {noformat}