Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.8.0
-
None
-
3
Description
I saw a few bulk timeout errors with patch
http://review.whamcloud.com/#/c/13786/
11:44:13:Lustre: DEBUG MARKER: == replay-single test 110f: DNE: create striped dir, fail MDT1/MDT2 == 11:32:16 (1435663936)
11:44:13:Lustre: DEBUG MARKER: sync; sync; sync
11:44:13:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 notransno
11:44:13:Lustre: DEBUG MARKER: /usr/sbin/lctl --device lustre-MDT0000 readonly
11:44:13:Turning device dm-0 (0xfd00000) read-only
11:44:13:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 REPLAY BARRIER on lustre-MDT0000
11:44:13:Lustre: DEBUG MARKER: mds1 REPLAY BARRIER on lustre-MDT0000
11:44:13:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
11:44:13:Lustre: DEBUG MARKER: umount -d /mnt/mds1
11:44:13:Removing read-only on unknown block (0xfd00000)
11:44:13:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
11:44:13:Lustre: DEBUG MARKER: hostname
11:44:13:Lustre: DEBUG MARKER: test -b /dev/lvm-Role_MDS/P1
11:44:13:Lustre: DEBUG MARKER: mkdir -p /mnt/mds1; mount -t lustre /dev/lvm-Role_MDS/P1 /mnt/mds1
11:44:13:LDISKFS-fs (dm-0): recovery complete
11:44:13:LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. quota=on. Opts:
11:44:13:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
11:44:13:Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
11:44:13:Lustre: DEBUG MARKER: e2label /dev/lvm-Role_MDS/P1 2>/dev/null
11:44:13:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
11:44:13:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
11:44:13:Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
11:44:13:Lustre: DEBUG MARKER: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 4 sec
11:44:13:Lustre: 2930:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1435664041/real 1435664041] req@ffff88006043d9c0 x1505397080987136/t0(0) o400->lustre-MDT0001-osp-MDT0000@10.1.4.127@tcp:24/4 lens 224/224 e 1 to 1 dl 1435664046 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
11:44:13:Lustre: 2930:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 35 previous similar messages
11:44:13:LustreError: 4290:0:(ldlm_lib.c:3030:target_bulk_io()) @@@ timeout on bulk WRITE after 100+0s req@ffff88006a3c3050 x1505397094695176/t0(0) o1000->lustre-MDT0001-mdtlov_UUID@10.1.4.127@tcp:638/0 lens 248/16608 e 4 to 0 dl 1435664093 ref 1 fl Interpret:/0/0 rc 0/0
11:44:13:LNet: Service thread pid 4290 completed after 100.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
11:44:13:Lustre: 2930:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1435664645/real 1435664645] req@ffff88006043d9c0 x1505397080993520/t0(0) o400->lustre-MDT0001-osp-MDT0000@10.1.4.127@tcp:24/4 lens 224/224 e 1 to 1 dl 1435664647 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
12:24:19:Lustre: 2930:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 73 previous similar messages
12:24:19:Lustre: 2930:0:(client.c:2018:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1435665245/real 1435665245] req@ffff88006043d9c0 x1505397081000080/t0(0) o400->lustre-MDT0001-osp-MDT0000@10.1.4.127@tcp:24/4 lens 224/224 e 1 to 1 dl 1435665247 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
12:24:19:Lustre: 2930:0:(client.c:2018:ptlrpc_expire_one_request()) Skipped 99 previous similar messages
12:24:19:Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY
12:24:19:Lustre: DEBUG MARKER: /usr/sbin/lctl mark rpc : @@@@@@ FAIL: can\'t put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY_WAIT
12:24:19:Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY_WAIT
12:24:19:Lustre: DEBUG MARKER: rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY
Landed for 2.8