LU-11725: replay-single test 41 fails with 'dd on client failed'


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version: Lustre 2.10.6
    • Environment: failover test group configuration
    • Severity: 3

    Description

      replay-single test_41 fails in failover test sessions with the error message 'dd on client failed'.

      Looking at the client test_log for https://testing.whamcloud.com/test_sets/57dc5c3c-ee96-11e8-86c0-52540065bddc , we see

      == replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
      error on ioctl 0x4008669a for '/mnt/lustre/f41.replay-single' (3): No space left on device
      error: setstripe: create striped file '/mnt/lustre/f41.replay-single' failed: No space left on device
      CMD: trevis-34vm1.trevis.whamcloud.com dd if=/dev/zero of=/mnt/lustre/f41.replay-single bs=4k count=1
      dd: opening `/mnt/lustre/f41.replay-single': No space left on device
       replay-single test_41: @@@@@@ FAIL: dd on client failed 
      
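      The setstripe call fails before dd even runs, and both report ENOSPC. A rough sketch of the failing sequence, reconstructed from the log above (the stripe count and OST index here are assumptions; the actual test_41 parameters may differ):

      lfs setstripe -c 1 -i 0 /mnt/lustre/f41.replay-single             # fails with 'No space left on device'
      dd if=/dev/zero of=/mnt/lustre/f41.replay-single bs=4k count=1    # then fails the same way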

      Most likely, no OST is actually full: looking at the suite_log, we see that every OST has free space in test 39

      == replay-single test 39: test recovery from unlink llog (test llog_gen_rec) == 17:32:13 (1542907933)
      total: 800 open/close in 1.19 seconds: 674.75 ops/second
      CMD: trevis-34vm8 sync; sync; sync
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID      5825660       47228     5255600   1% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID      1933276       25792     1786244   1% /mnt/lustre[OST:0]
      lustre-OST0001_UUID      1933276       25792     1786244   1% /mnt/lustre[OST:1]
      lustre-OST0002_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:2]
      lustre-OST0003_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:3]
      lustre-OST0004_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:4]
      lustre-OST0005_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:5]
      lustre-OST0006_UUID      1933276       25832     1786204   1% /mnt/lustre[OST:6]
      
      filesystem_summary:     13532932      180552    12502804   1% /mnt/lustre
      
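      The numbers above appear to be client-side 'lfs df' output. Since ENOSPC can also be returned when an OST runs out of inodes/objects rather than blocks, a quick check of both (mount point /mnt/lustre, as in the logs) would be along the lines of:

      lfs df    /mnt/lustre     # per-OST block usage (the output quoted above)
      lfs df -i /mnt/lustre     # per-OST inode usage, to rule out object exhaustion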

      Test 40 is skipped, and both tests 41 and 42 fail with errors indicating that an OST is full. Test 43 then shows again that no OST is full

      == replay-single test 43: mds osc import failure during recovery; don't LBUG == 17:33:37 (1542908017)
      CMD: trevis-34vm7 sync; sync; sync
      UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID      5825660       47292     5255536   1% /mnt/lustre[MDT:0]
      lustre-OST0000_UUID      1933276       25792     1786244   1% /mnt/lustre[OST:0]
      lustre-OST0001_UUID      1933276       25792     1786244   1% /mnt/lustre[OST:1]
      lustre-OST0002_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:2]
      lustre-OST0003_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:3]
      lustre-OST0004_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:4]
      lustre-OST0005_UUID      1933276       25784     1786028   1% /mnt/lustre[OST:5]
      lustre-OST0006_UUID      1933276       25832     1786204   1% /mnt/lustre[OST:6]
      
      filesystem_summary:     13532932      180552    12502804   1% /mnt/lustre
      

      Looking at the MDS (vm7) console log, we see the following for both test 41 and test 42

      [  128.628474] Lustre: DEBUG MARKER: == replay-single test 41: read from a valid osc while other oscs are invalid == 17:33:32 (1542908012)
      [  128.639586] LustreError: 2271:0:(lod_qos.c:1354:lod_alloc_specific()) can't lstripe objid [0x200050929:0x643:0x0]: have 0 want 1
      [  128.874693] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  replay-single test_41: @@@@@@ FAIL: dd on client failed 
      
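      The 'have 0 want 1' in the lod_alloc_specific() message means the MDT allocated none of the one stripe object requested, and that allocation failure is what comes back to the client as ENOSPC; it points at object allocation on the MDS rather than at actual space usage. One way to check the state of the MDT's OST connections (OSP devices) on the MDS at that point, assuming these parameter names are available in this Lustre version, would be:

      lctl dl                               # device list; the osp device for each OST should be UP
      lctl get_param osp.*.prealloc_status  # last precreate error per OST; 0 when object precreation is healthy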

      There are several JIRA tickets with similar failures and the same message in the MDS console log. For example, this looks like LU-10613, but here we also see an error on ioctl on the client side, which is not seen in LU-10613.
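
      A small grep of the sort below (log file names here are hypothetical; adjust them to the test session layout) could help tell the two signatures apart when scanning other sessions:

      grep "error on ioctl 0x4008669a" client_test_log            # present in this failure, not in LU-10613
      grep "lod_alloc_specific.*can't lstripe" mds_console_log    # the message shared with LU-10613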

      We have seen this failure and these error messages in other test sessions:
      https://testing.whamcloud.com/test_sets/0ea8f4f0-d350-11e8-b589-52540065bddc
      https://testing.whamcloud.com/test_sets/5892b56a-ba69-11e8-9df3-52540065bddc


People

    Assignee: WC Triage
    Reporter: James Nunez (Inactive)