Lustre / LU-12251

sanity-hsm tests fail with ‘Device or resource busy’

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5
    • PPC clients
    • 3
    • 9223372036854775807

    Description

      Many sanity-hsm tests fail with a variety of messages about failing to release a file, but all share the common error message ‘Device or resource busy’. We see this in PPC client testing only.

      sanity-hsm test_1a, 1b, 1d, 12q, 21, 22, 23 and 58 all fail with error messages similar to

      Cannot send HSM request (use of /mnt/lustre/d1a.sanity-hsm/f1a.sanity-hsm): Device or resource busy
       sanity-hsm test_1a: @@@@@@ FAIL: could not release file 
      

      sanity-hsm test_12c, 12f, 12g, 12h, 12m, and 12o all fail with error messages similar to

      Cannot send HSM request (use of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm): Device or resource busy
       sanity-hsm test_12f: @@@@@@ FAIL: release of /mnt/lustre/d12f.sanity-hsm/f12f.sanity-hsm failed 

      sanity-hsm test_12p, 24a, 24e, 24f, and 37 all fail with error messages similar to

      Cannot send HSM request (use of /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm): Device or resource busy
       sanity-hsm test_12p: @@@@@@ FAIL: cannot release /mnt/lustre/d12p.sanity-hsm/f12p.sanity-hsm 
      

      sanity-hsm test_24b, 30c, and 228 all fail with error messages similar to

      Cannot send HSM request (use of /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm): Device or resource busy
       sanity-hsm test_24b: @@@@@@ FAIL: hsm flags on /mnt/lustre/d24b.sanity-hsm/f24b.sanity-hsm are 0x00000009 != 0x0000000d 
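
      The two flag values in the test_24b message differ by a single bit. As a rough sketch of how to read them, the snippet below decodes the values using the HS_* bit definitions as I understand them from Lustre's lustre_user.h; the values in the table are an assumption and should be checked against the tree under test. The helper name decode_hsm_flags is made up for illustration.

```python
# HSM state bits, assumed to match the HS_* enum in lustre_user.h.
# Verify against the Lustre source tree you are testing.
HSM_FLAGS = {
    0x01: "HS_EXISTS",
    0x02: "HS_DIRTY",
    0x04: "HS_RELEASED",
    0x08: "HS_ARCHIVED",
    0x10: "HS_NORELEASE",
    0x20: "HS_NOARCHIVE",
    0x40: "HS_LOST",
}


def decode_hsm_flags(value):
    """Return the names of the bits set in an HSM flags value (hypothetical helper)."""
    return [name for bit, name in HSM_FLAGS.items() if value & bit]


actual = decode_hsm_flags(0x00000009)
expected = decode_hsm_flags(0x0000000D)
# Bits the test expected but did not find on the file.
missing = [name for name in expected if name not in actual]
print("actual:  ", actual)
print("expected:", expected)
print("missing: ", missing)
```

      Under that assumption, the file is archived but was never marked released, which is consistent with the release request itself having been rejected with ‘Device or resource busy’.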
      

      sanity-hsm test_25b fails with

      Cannot send HSM request (use of /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm): Device or resource busy
      0006f7423be4c48158f1a88ef512b3fe  /mnt/lustre/d25b.sanity-hsm/f25b.sanity-hsm
       sanity-hsm test_25b: @@@@@@ FAIL: lost file access should failed (returns 0) 
      

      sanity-hsm test_57 fails with

      trevis-77vm2: Cannot send HSM request (use of /mnt/lustre/d57.sanity-hsm/test_archive_remote): Device or resource busy
       sanity-hsm test_57: @@@@@@ FAIL: hsm_release failed 
      

      sanity-hsm test_90 fails with

      Cannot send HSM request (use of /mnt/lustre/d90.sanity-hsm/f90.sanity-hsm.1): Device or resource busy
       sanity-hsm test_90: @@@@@@ FAIL: cannot release a file list 
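
      To check which failures in a test log share this error, a small script can pair each FAIL line with a preceding ‘Device or resource busy’ line. This is only a sketch based on the log lines quoted above; the function name ebusy_failures and the regular expression are assumptions for illustration, not part of the test framework.

```python
import re

EBUSY_TEXT = "Device or resource busy"
# Matches lines such as:
#   sanity-hsm test_1a: @@@@@@ FAIL: could not release file
FAIL_RE = re.compile(r"sanity-hsm (test_\w+): @+ FAIL: (.*\S)")


def ebusy_failures(log_text):
    """Map test name -> FAIL message for failures preceded by an EBUSY line.

    A FAIL line counts only if an EBUSY line appeared since the previous
    FAIL line (hypothetical heuristic, based on the excerpts in this ticket).
    """
    failures = {}
    saw_ebusy = False
    for line in log_text.splitlines():
        if EBUSY_TEXT in line:
            saw_ebusy = True
            continue
        match = FAIL_RE.search(line)
        if match:
            if saw_ebusy:
                failures[match.group(1)] = match.group(2)
            saw_ebusy = False
    return failures
```

      Running it over a full suite log would give a quick list of tests whose failure follows the busy error, separate from any unrelated failures.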
      

      Looking at the logs for a recent failure, https://testing.whamcloud.com/test_sets/82a14510-668f-11e9-8bb1-52540065bddc, nothing is obviously wrong and there are no error messages in the node console logs. The only thing that seems out of place is that, in the client 1 (vm1) console log, we see OSC reconnect messages for several of these failures:

      [  342.773914] Lustre: DEBUG MARKER: == sanity-hsm test 1a: mmap
      [  342.854968] Lustre: lustre-OST0003-osc-c00000007461d000: reconnect after 25s idle
      [  344.521132] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-hsm test_1a: @@@@@@ FAIL: could not release file 
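
      To see how often a reconnect message immediately precedes a failure, the console log timestamps can be compared. The following is a minimal sketch that assumes the console format shown above; the function names and the 5 second window are arbitrary choices for illustration, and a match does not by itself show that the reconnect causes the failure.

```python
import re

# Console lines look like: "[  342.854968] Lustre: ...".
LINE_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_console(text):
    """Return (timestamp_seconds, message) tuples for timestamped lines."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def reconnects_before_fail(events, window=5.0):
    """Pair each OSC reconnect with the nearest FAIL marker that follows it.

    Returns (reconnect_message, seconds_until_fail) for reconnects that are
    followed by a FAIL marker within `window` seconds (hypothetical cutoff).
    """
    fail_times = [t for t, msg in events if "FAIL:" in msg]
    pairs = []
    for t, msg in events:
        if "reconnect after" in msg:
            gaps = [f - t for f in fail_times if 0 <= f - t <= window]
            if gaps:
                pairs.append((msg, min(gaps)))
    return pairs
```

      Applied to the three lines above, this pairs the lustre-OST0003 reconnect with the test_1a failure marker logged about 1.67 seconds later.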
      

      Logs for other failures are at
      https://testing.whamcloud.com/test_sets/0352f31e-6322-11e9-8bb1-52540065bddc
      https://testing.whamcloud.com/test_sets/00dc12be-65bb-11e9-a6f9-52540065bddc

            People

              Assignee: WC Triage (wc-triage)
              Reporter: James Nunez (jamesanunez, Inactive)