Lustre / LU-11788

sanity test 104a fails with ‘lfs df failed’

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.0
    • Severity: 3
    • Rank: 9223372036854775807

    Description

      sanity test_104a fails with ‘lfs df failed’ on ARM clients. We have seen this only once in the past four months: https://testing.whamcloud.com/test_sets/ec527b2c-fdef-11e8-b837-52540065bddc . After test 104a fails, a series of other tests also fail (107, 118k, 118i, 119c, 119d, 120a, 123a, 124a, 124b, 129, 130a/b/d/e, 131a/d/e, 133a/b/c/d), and test 133g hangs.

      It’s clear from the suite_log that something is wrong with some of the MDTs/MDSs at the beginning of the test: ‘lfs df’ returns ‘Input/output error’ for MDT0001, MDT0002 and MDT0003.

      == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID      1165900       22572     1040132   2% /mnt/lustre[MDT:0]
      lustre-MDT0001_UUID : Input/output error
      lustre-MDT0002_UUID : Input/output error
      lustre-MDT0003_UUID : Input/output error
      lustre-OST0000_UUID      1933276       34688     1777348   2% /mnt/lustre[OST:0]
      lustre-OST0001_UUID      1933276       45700     1766336   3% /mnt/lustre[OST:1]
      lustre-OST0002_UUID      1933276       37288     1774748   2% /mnt/lustre[OST:2]
      lustre-OST0003_UUID      1933276       31024     1781012   2% /mnt/lustre[OST:3]
      lustre-OST0004_UUID      1933276       30168     1781868   2% /mnt/lustre[OST:4]
      lustre-OST0005_UUID      1933276       40068     1771968   2% /mnt/lustre[OST:5]
      lustre-OST0006_UUID      1933276       41116     1770920   2% /mnt/lustre[OST:6]
      lustre-OST0007_UUID      1933276       32700     1779336   2% /mnt/lustre[OST:7]
      
      filesystem_summary:     15466208      292752    14203536   2% /mnt/lustre
      
       sanity test_104a: @@@@@@ FAIL: lfs df failed 
      

      So, it’s no surprise that ‘lfs df’ failed.
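
The error rows in the ‘lfs df’ output above can be picked out mechanically. The following is a minimal sketch in Python, assuming the output has already been captured as text. The helper name `failed_targets` and the regular expression are illustrative only and are not part of the Lustre test suite.

```python
import re

# Matches error rows such as "lustre-MDT0001_UUID : Input/output error".
# Normal rows have a number (not a colon) after the UUID, so they do not match.
ERROR_ROW = re.compile(r"^\s*(?P<uuid>\S+_UUID)\s*:\s*(?P<error>.+?)\s*$")


def failed_targets(lfs_df_output):
    """Return (uuid, error) pairs for targets that reported an error
    instead of usage numbers."""
    failures = []
    for line in lfs_df_output.splitlines():
        match = ERROR_ROW.match(line)
        if match:
            failures.append((match.group("uuid"), match.group("error")))
    return failures


# Abbreviated copy of the output quoted in this ticket.
sample = """\
UUID                   1K-blocks        Used   Available Use% Mounted on
lustre-MDT0000_UUID      1165900       22572     1040132   2% /mnt/lustre[MDT:0]
lustre-MDT0001_UUID : Input/output error
lustre-MDT0002_UUID : Input/output error
lustre-OST0000_UUID      1933276       34688     1777348   2% /mnt/lustre[OST:0]
"""

for uuid, error in failed_targets(sample):
    print(uuid, error)
```

Run against the full output in this ticket, such a check would list MDT0001, MDT0002 and MDT0003, the three targets that returned ‘Input/output error’.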

      sanity test 104a does deactivate an OST and should expect to see that the OST is in a ‘FULL’ state, but in this test session’s logs we instead see ‘Connection restored’ messages from the MDTs. From MDS2 and MDS4 (vm5), we see:

      [ 8408.485447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 \(1544469511\)
      [ 8409.091430] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
      [ 8409.525890] Lustre: lustre-MDT0001: Connection restored to 310dfc65-ad7d-537f-6815-c0fd7f0fb43b (at 10.9.8.38@tcp)
      [ 8409.940081] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_104a: @@@@@@ FAIL: lfs df failed 
      

      with a similar message from MDS1 and MDS3 (vm4):

      [ 8408.748317] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 \(1544469511\)
      [ 8409.370355] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test ======================================================== 19:18:31 (1544469511)
      [ 8409.831727] Lustre: lustre-MDT0002: Connection restored to 56904d5f-959b-023e-bc98-099190cbfba6 (at 10.9.8.38@tcp)
      [ 8409.832746] Lustre: Skipped 2 previous similar messages
      [ 8410.186046] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_104a: @@@@@@ FAIL: lfs df failed 
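
To compare the two MDS consoles, the ‘Connection restored’ lines can be extracted along with their timestamps. This is a rough sketch in Python, assuming the console log is available as text in the format quoted above. The function name `connection_restored_events` and the pattern are hypothetical helpers, not an existing Lustre tool.

```python
import re

# Matches console lines of the form:
# [ 8409.525890] Lustre: lustre-MDT0001: Connection restored to <uuid> (at <nid>)
RESTORED = re.compile(
    r"^\s*\[\s*(?P<ts>\d+\.\d+)\]\s+Lustre:\s+(?P<target>\S+): "
    r"Connection restored to (?P<peer>\S+) \(at (?P<nid>[^)]+)\)"
)


def connection_restored_events(console_log):
    """Return (timestamp, target, peer NID) for each 'Connection restored' line."""
    events = []
    for line in console_log.splitlines():
        match = RESTORED.match(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("target"), match.group("nid"))
            )
    return events


# Two lines taken from the MDS2, 4 (vm5) console excerpt in this ticket.
console = """\
[ 8409.091430] Lustre: DEBUG MARKER: == sanity test 104a: lfs df [-ih] [path] test
[ 8409.525890] Lustre: lustre-MDT0001: Connection restored to 310dfc65-ad7d-537f-6815-c0fd7f0fb43b (at 10.9.8.38@tcp)
"""

for ts, target, nid in connection_restored_events(console):
    print(ts, target, nid)
```

In both excerpts the reconnecting peer is at 10.9.8.38@tcp, and the message appears between the test’s start marker and its FAIL marker.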
      

            People

              Assignee: Xinliang Liu (xinliang)
              Reporter: James Nunez (Inactive) (jamesanunez)