Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.9.0
- Labels: None
- Environment: Server/client: RHEL 7, master branch, build# 39282
- Severity: 3
Description
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>.
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ee2b9da2-25b6-11e6-a3be-5254006e85c2.
The sub-test test_0a failed with the following error:
test failed to respond and timed out
Client dmesg:
[ 4291.067685] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: == replay-single test 0a: empty replay =============================================================== 08:54:47 (1464512087)
[ 4291.460662] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
[ 4291.727658] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
[ 4292.566872] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: local REPLAY BARRIER on lustre-MDT0000
[ 4298.096880] LustreError: 11-0: lustre-MDT0000-mdc-ffff8800558b5800: operation obd_ping to node 10.9.4.84@tcp failed: rc = -107
[ 4298.104735] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection to lustre-MDT0000 (at 10.9.4.84@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 4300.110135] Lustre: 22656:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512089/real 1464512089] req@ffff88007912e100 x1535651801466544/t0(0) o400->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1464512096 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 4300.125416] LustreError: 166-1: MGC10.9.4.84@tcp: Connection to MGS (at 10.9.4.84@tcp) was lost; in progress operations using this service will fail
[ 4306.131160] Lustre: 22655:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512096/real 1464512096] req@ffff88007912f000 x1535651801466784/t0(0) o250->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1464512102 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 4309.041466] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib
[ 4309.311583] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: lctl get_param -n at_max
[ 4310.133413] Lustre: Evicted from MGS (at 10.9.4.84@tcp) after server handle changed from 0x3cefc44a406cb1f5 to 0x3cefc44a406cb671
[ 4310.144216] Lustre: MGC10.9.4.84@tcp: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
[ 4312.245146] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
[ 4312.528473] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: /usr/sbin/lctl mark trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
[ 4312.670970] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
[ 7896.917424] SysRq : Show State
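The client-side sequence above (obd_ping failure to the MDT, slow-reply timeouts, eviction from the MGS after the server handle changed, then reconnection) is what precedes the test timeout. As a triage aid, a minimal sketch for pulling just those events out of a saved client dmesg capture; the file name client-dmesg.txt is hypothetical:

    # client-dmesg.txt: hypothetical file holding the client's dmesg output
    grep -E 'was lost|Evicted from MGS|timed out for slow reply|failed: rc =' client-dmesg.txt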
This issue first appeared on 05/11/2016 and has since occurred roughly every other day (about once every two days).