Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.9.0
- Labels: None
- Environment: Server/client: RHEL 7, master branch, build# 39282
- Severity: 3
Description
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ee2b9da2-25b6-11e6-a3be-5254006e85c2.
The sub-test test_0a failed with the following error:
test failed to respond and timed out
Client dmesg:
[ 4291.067685] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: == replay-single test 0a: empty replay =============================================================== 08:54:47 (1464512087)
[ 4291.460662] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
[ 4291.727658] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
[ 4292.566872] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: local REPLAY BARRIER on lustre-MDT0000
[ 4298.096880] LustreError: 11-0: lustre-MDT0000-mdc-ffff8800558b5800: operation obd_ping to node 10.9.4.84@tcp failed: rc = -107
[ 4298.104735] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection to lustre-MDT0000 (at 10.9.4.84@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 4300.110135] Lustre: 22656:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512089/real 1464512089] req@ffff88007912e100 x1535651801466544/t0(0) o400->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1464512096 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 4300.125416] LustreError: 166-1: MGC10.9.4.84@tcp: Connection to MGS (at 10.9.4.84@tcp) was lost; in progress operations using this service will fail
[ 4306.131160] Lustre: 22655:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512096/real 1464512096] req@ffff88007912f000 x1535651801466784/t0(0) o250->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1464512102 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 4309.041466] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib
[ 4309.311583] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: lctl get_param -n at_max
[ 4310.133413] Lustre: Evicted from MGS (at 10.9.4.84@tcp) after server handle changed from 0x3cefc44a406cb1f5 to 0x3cefc44a406cb671
[ 4310.144216] Lustre: MGC10.9.4.84@tcp: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
[ 4312.245146] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
[ 4312.528473] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: /usr/sbin/lctl mark trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
[ 4312.670970] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
[ 7896.917424] SysRq : Show State
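The client-side sequence above (obd_ping failure to the MDT, slow-reply timeouts, eviction from the MGS after the server handle changed, then reconnection) is what precedes the test timeout. As a triage aid, a minimal sketch for pulling just those events out of a saved client dmesg capture; the file name client-dmesg.txt is hypothetical:

    # client-dmesg.txt: hypothetical file holding the client's dmesg output
    grep -E 'was lost|Evicted from MGS|timed out for slow reply|failed: rc =' client-dmesg.txt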
This issue first appeared on 05/11/2016 and has since occurred roughly every other day (about once every two days).