[LU-8222] replay-single test_0a: test failed to respond and timed out Created: 31/May/16 Updated: 13/Oct/21 Resolved: 13/Oct/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Server/client: RHEL 7, master branch, build #39282 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ee2b9da2-25b6-11e6-a3be-5254006e85c2.

The sub-test test_0a failed with the following error:

test failed to respond and timed out

Client dmesg:

[ 4291.067685] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: == replay-single test 0a: empty replay =============================================================== 08:54:47 (1464512087)
[ 4291.460662] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
[ 4291.727658] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
[ 4292.566872] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: local REPLAY BARRIER on lustre-MDT0000
[ 4298.096880] LustreError: 11-0: lustre-MDT0000-mdc-ffff8800558b5800: operation obd_ping to node 10.9.4.84@tcp failed: rc = -107
[ 4298.104735] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection to lustre-MDT0000 (at 10.9.4.84@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 4300.110135] Lustre: 22656:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512089/real 1464512089] req@ffff88007912e100 x1535651801466544/t0(0) o400->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1464512096 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 4300.125416] LustreError: 166-1: MGC10.9.4.84@tcp: Connection to MGS (at 10.9.4.84@tcp) was lost; in progress operations using this service will fail
[ 4306.131160] Lustre: 22655:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512096/real 1464512096] req@ffff88007912f000 x1535651801466784/t0(0) o250->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1464512102 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 4309.041466] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib
[ 4309.311583] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: lctl get_param -n at_max
[ 4310.133413] Lustre: Evicted from MGS (at 10.9.4.84@tcp) after server handle changed from 0x3cefc44a406cb1f5 to 0x3cefc44a406cb671
[ 4310.144216] Lustre: MGC10.9.4.84@tcp: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
[ 4312.245146] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
[ 4312.528473] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: /usr/sbin/lctl mark trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
[ 4312.670970] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
[ 7896.917424] SysRq : Show State

This issue first appeared on 05/11/2016 and has since recurred roughly every other day (about once every 2 days). |
| Comments |
| Comment by Bob Glossman (Inactive) [ 03/Jun/16 ] |
|
Another failure on master: |
| Comment by nasf (Inactive) [ 06/Jun/16 ] |
|
Another failure instance on master: |
| Comment by Jian Yu [ 07/Jun/16 ] |
|
More failures on master: |