LU-8222: replay-single test_0a: test failed to respond and timed out

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.9.0
    • Labels: None
    • Environment: Server/client: RHEL 7, master branch, build #39282
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/ee2b9da2-25b6-11e6-a3be-5254006e85c2.

      The sub-test test_0a failed with the following error:

      test failed to respond and timed out
      

      Client dmesg:

      [ 4291.067685] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: == replay-single test 0a: empty replay =============================================================== 08:54:47 (1464512087)
      [ 4291.460662] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mcreate /mnt/lustre/fsa-$(hostname); rm /mnt/lustre/fsa-$(hostname)
      [ 4291.727658] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: if [ -d /mnt/lustre2 ]; then mcreate /mnt/lustre2/fsa-$(hostname); rm /mnt/lustre2/fsa-$(hostname); fi
      [ 4292.566872] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: local REPLAY BARRIER on lustre-MDT0000
      [ 4298.096880] LustreError: 11-0: lustre-MDT0000-mdc-ffff8800558b5800: operation obd_ping to node 10.9.4.84@tcp failed: rc = -107
      [ 4298.104735] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection to lustre-MDT0000 (at 10.9.4.84@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [ 4300.110135] Lustre: 22656:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512089/real 1464512089]  req@ffff88007912e100 x1535651801466544/t0(0) o400->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 224/224 e 0 to 1 dl 1464512096 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [ 4300.125416] LustreError: 166-1: MGC10.9.4.84@tcp: Connection to MGS (at 10.9.4.84@tcp) was lost; in progress operations using this service will fail
      [ 4306.131160] Lustre: 22655:0:(client.c:2067:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1464512096/real 1464512096]  req@ffff88007912f000 x1535651801466784/t0(0) o250->MGC10.9.4.84@tcp@10.9.4.84@tcp:26/25 lens 520/544 e 0 to 1 dl 1464512102 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [ 4309.041466] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib
      [ 4309.311583] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: lctl get_param -n at_max
      [ 4310.133413] Lustre: Evicted from MGS (at 10.9.4.84@tcp) after server handle changed from 0x3cefc44a406cb1f5 to 0x3cefc44a406cb671
      [ 4310.144216] Lustre: MGC10.9.4.84@tcp: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
      [ 4312.245146] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection restored to 10.9.4.84@tcp (at 10.9.4.84@tcp)
      [ 4312.528473] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: /usr/sbin/lctl mark trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
      [ 4312.670970] Lustre: DEBUG MARKER: trevis-8vm1.trevis.hpdd.intel.com: mdc.lustre-MDT0000-mdc-*.mds_server_uuid in FULL state after 3 sec
      [ 7896.917424] SysRq : Show State
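
      The timing in the log above can be checked with a short script. This is a minimal sketch, not part of the original report: the `DMESG` excerpt copies only the timestamps and abbreviated messages from the lines quoted above, and the helper names `parse` and `largest_gap` are hypothetical. It finds the longest silent interval between consecutive kernel messages.

```python
import re

# Excerpt of the client dmesg quoted above. Timestamps are copied verbatim;
# hostnames and long message tails are abbreviated for readability.
DMESG = """\
[ 4291.067685] Lustre: DEBUG MARKER: == replay-single test 0a: empty replay ==
[ 4292.566872] Lustre: DEBUG MARKER: local REPLAY BARRIER on lustre-MDT0000
[ 4298.104735] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection to lustre-MDT0000 was lost
[ 4312.245146] Lustre: lustre-MDT0000-mdc-ffff8800558b5800: Connection restored
[ 4312.670970] Lustre: DEBUG MARKER: mds_server_uuid in FULL state after 3 sec
[ 7896.917424] SysRq : Show State
"""

# A dmesg line starts with "[ seconds.micros]" followed by the message.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse(text):
    """Return (seconds_since_boot, message) pairs, one per dmesg line."""
    events = []
    for line in text.splitlines():
        match = STAMP.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def largest_gap(events):
    """Return (gap_seconds, earlier_event, later_event) for the longest silence."""
    pairs = zip(events, events[1:])
    return max(((b[0] - a[0], a, b) for a, b in pairs), key=lambda g: g[0])


events = parse(DMESG)
gap, before, after = largest_gap(events)
print(f"longest silence: {gap:.1f} s")
print(f"  from: {before[1]}")
print(f"  to:   {after[1]}")
print(f"test start to SysRq: {events[-1][0] - events[0][0]:.1f} s")
```

      With these timestamps, the client logs nothing for about 3584 s between the "FULL state" marker and the SysRq dump, and the dump arrives about 3606 s after the test started. That would be consistent with a timeout of roughly one hour, although the report does not state the timeout value.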
      

      This failure was first seen on 05/11/2016 and has since recurred roughly every other day.

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Maloo (maloo)