Lustre / LU-13202

recovery-small test 65 fails with 'test_65 failed with 1'


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4, Lustre 2.16.0
    • Severity: 3

    Description

      recovery-small test_65 fails with 'test_65 failed with 1'. Looking at the client test_log, we see

      == recovery-small test 65: lock enqueue for destroyed export ========================================= 02:06:47 (1580522807)
      Starting client: trevis-47vm5.trevis.whamcloud.com:  -o user_xattr,flock trevis-47vm11:trevis-47vm12:/lustre /mnt/lustre2
      CMD: trevis-47vm5.trevis.whamcloud.com mkdir -p /mnt/lustre2
      CMD: trevis-47vm5.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-47vm11:trevis-47vm12:/lustre /mnt/lustre2
      mount.lustre: mount trevis-47vm11:trevis-47vm12:/lustre at /mnt/lustre2 failed: Input/output error
      Is the MGS running?
      lfs setstripe: setstripe error for '/mnt/lustre2/f65.recovery-small': Inappropriate ioctl for device
       recovery-small test_65: @@@@@@ FAIL: test_65 failed with 1 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5900:error()
        = /usr/lib64/lustre/tests/test-framework.sh:6202:run_one()
      

      Looking at a recent failure at https://testing.whamcloud.com/test_sets/e14f1d26-44d3-11ea-bffa-52540065bddc, we see communication errors between the client and the MGS; the -5 (-EIO) returned when fetching the 'lustre-client' configuration log matches the mount's Input/output error. Looking at dmesg for client1 (vm5), we see

      [263001.363134] Lustre: DEBUG MARKER: == recovery-small test 65: lock enqueue for destroyed export ========================================= 02:06:47 (1580522807)
      [263001.413695] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
      [263001.422615] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-47vm11:trevis-47vm12:/lustre /mnt/lustre2
      [263004.000599] LustreError: 15c-8: MGC10.9.3.131@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      [263004.005019] Lustre: Unmounted lustre-client
      [263004.006116] LustreError: 28814:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-5)
      [263004.233327] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_65: @@@@@@ FAIL: test_65 failed with 1 
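
      As a rough triage aid (not part of the test framework), a grep like the following can flag this signature in collected console logs. The file path and helper are hypothetical; the log line is copied from the dmesg excerpt above.

      ```shell
      # Hypothetical triage sketch: scan a saved console log for the MGC
      # configuration-log failure signature seen in this ticket.
      # The sample line is copied verbatim from the client dmesg excerpt.
      cat > /tmp/lu13202_console.log <<'EOF'
      [263004.000599] LustreError: 15c-8: MGC10.9.3.131@tcp: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      EOF

      # -5 is -EIO, which matches the mount.lustre "Input/output error" above.
      if grep -q "configuration from log 'lustre-client' failed (-5)" /tmp/lu13202_console.log; then
          echo "MGS config-log fetch failure (-EIO) found"
      fi
      ```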
      

      On the OSS (vm10) dmesg we see

      [91380.311234] Lustre: DEBUG MARKER: == recovery-small test 65: lock enqueue for destroyed export ========================================= 02:06:47 (1580522807)
      [91383.075128] Lustre: 8612:0:(client.c:2133:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1580522766/real 1580522766]  req@ffff8bd1d0144000 x1657293567846016/t0(0) o400->MGC10.9.3.131@tcp@10.9.3.131@tcp:26/25 lens 224/224 e 0 to 1 dl 1580522810 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [91383.081667] Lustre: 8612:0:(client.c:2133:ptlrpc_expire_one_request()) Skipped 8 previous similar messages
      [91383.083309] LustreError: 166-1: MGC10.9.3.131@tcp: Connection to MGS (at 10.9.3.131@tcp) was lost; in progress operations using this service will fail
      [91383.085504] LustreError: Skipped 1 previous similar message
      [91383.186218] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_65: @@@@@@ FAIL: test_65 failed with 1 
      

      We’ve seen this error since at least 16 July 2019. Here are links to a few failures:
      https://testing.whamcloud.com/test_sets/8f74e7f4-1f28-11ea-b1e8-52540065bddc
      https://testing.whamcloud.com/test_sets/41964346-dcd1-11e9-b62b-52540065bddc
      https://testing.whamcloud.com/test_sets/350e54fa-acf8-11e9-8fc1-52540065bddc

      We also see recovery-small test 65 fail in PPC client testing, but without the LustreError above; https://testing.whamcloud.com/test_sets/6599a846-ac16-11e9-861b-52540065bddc. So it is not clear whether the PPC failures have the same root cause as documented here.


People

    Assignee: WC Triage (wc-triage)
    Reporter: James Nunez (jamesanunez) (Inactive)
