[LU-7105] sanityn test_28 fails with 'error() without useful message, please fix'

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.8.0
    • autotest
    • 3
    • 9,977

    Description

      sanityn test 28 was recently removed from the ALWAYS_EXCEPT list by accident and is still failing. There is no real error message; the only output from the test on failure is:

      'error() without useful message, please fix' 
      

      This test has failed many times recently, so there are many logs of the failures. Here are just a couple:
      https://testing.hpdd.intel.com/test_sets/d0ec87b2-530f-11e5-8228-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/7a462a70-5301-11e5-b798-5254006e85c2

      From the test log output, it’s clear that this test needs to be updated; newdev was removed as an option to lctl many years ago:

      == sanityn test 28: read/write/truncate file with lost stripes == 08:31:03 (1441355463)
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.0383377 s, 54.7 MB/s
      No such command, type help
      error: setup: Operation already in progress
      error: destroy: invalid objid '12745:0'
      destroy OST object <objid> [num [verbose]]
      usage: destroy <num> objects, starting at objid <objid>
      run <command> after connecting to device <devno>
      --device <devno> <command [args ...]>
      

      Until we fix the obvious issues, we don’t really know if the original bug/reason for ALWAYS_EXCEPT test 28 is still valid.

      In sanityn, this test was put on the ALWAYS_EXCEPT list because of bz=9977.

          Activity

            adilger Andreas Dilger added a comment -

            The remnants of test_28 should be removed from lustre/tests/sanityn.sh:

            • remove test_28() itself
            • remove "ALWAYS_EXCEPT+= 28" and "LU-7105" on the previous line
            fdilger Fred Dilger added a comment -

            The test has not been supported since 06/05/2022: https://review.whamcloud.com/c/fs/lustre-release/+/47239

            tappro Mikhail Pershin added a comment - - edited

            While working on unrelated test fixes, I tried to revive test_28 by deleting an OST object with debugfs, but the test still fails. The general idea of the test is that reading from a missing stripe should return an error, but the stripe can be recreated by writing to it. The test name also mentions truncate, but the test does not actually truncate anything. Using debugfs, I removed stripe #2 of the file and got the following:

            # read from stripe #1, successful
            1048576 bytes (1,0 MB) copied, 0,00574064 s, 183 MB/s
            # read from stripe #2 failed as expected
            dd: cannot fstat '/mnt/lustre2/f28.sanityn': No such file or directory
            # write to both stripes again fails also with ENOENT
            dd: failed to open '/mnt/lustre/f28.sanityn': No such file or directory
             sanityn test_28: @@@@@@ FAIL: re-creating write failed 
            
            

            I am not sure how this is really supposed to work. Is it actually an error that the write fails, or is the write not supposed to work, so that the test is simply obsolete?

            A patch for the test is attached to the ticket.
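
            To make the "stripe #1" and "stripe #2" reads above easier to follow, here is a minimal sketch of how a byte offset maps to a stripe in a RAID-0 striped file. It assumes a 2-stripe file with a 1 MiB stripe size, which is inferred from the 2097152-byte dd in the description and is not stated in the ticket. The function names are hypothetical and are not part of sanityn.sh; indices are zero-based, so "stripe #2" in the comment corresponds to index 1.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the Lustre test framework.

# stripe_index OFFSET STRIPE_SIZE STRIPE_COUNT
# Print the zero-based index of the stripe that holds byte OFFSET,
# assuming plain round-robin (RAID-0) striping.
stripe_index() {
	local offset=$1 stripe_size=$2 stripe_count=$3
	echo $(( (offset / stripe_size) % stripe_count ))
}

# dd_skip_for_stripe INDEX STRIPE_SIZE BLOCK_SIZE
# Print the dd skip= value (in units of BLOCK_SIZE) that positions a read
# at the start of stripe INDEX within the first round of stripes.
dd_skip_for_stripe() {
	local index=$1 stripe_size=$2 block_size=$3
	echo $(( index * stripe_size / block_size ))
}
```

            With these assumed values, `dd bs=1M count=1 skip=0` reads only the first stripe and `skip=1` reads only the second, which matches the pattern of one successful read followed by one failing read.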


            adilger Andreas Dilger added a comment -

            It is possible to use fail_loc added for LFSCK to create files that are missing stripes. That would be a lot less heavyweight than configuring the echo_client to delete one object.
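
            As a rough illustration of the fail_loc approach, the sketch below wraps a command between setting and clearing fail_loc with `lctl set_param`. The helper names are hypothetical, the specific LFSCK fail_loc value is not given in this ticket and must be supplied by the caller, and the node on which fail_loc has to be set is also an assumption left to the caller. By default the sketch only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the Lustre test framework.
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run_or_print() {
	if [ "$DRY_RUN" = "1" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# with_fail_loc VALUE COMMAND [ARGS...]
# Set fail_loc to VALUE, run COMMAND, then reset fail_loc to 0.
# VALUE is a placeholder: this ticket does not name the LFSCK fail_loc to use.
with_fail_loc() {
	local fail_loc_value=$1
	shift
	run_or_print lctl set_param fail_loc="$fail_loc_value"
	run_or_print "$@"
	local rc=$?
	# Always clear fail_loc so later commands are not affected.
	run_or_print lctl set_param fail_loc=0
	return $rc
}
```

            A test would call something like `with_fail_loc <value> dd if=/dev/zero of=<file> bs=1M count=2` to create the file while the fault injection is active, and then perform the reads and writes against the resulting file.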

            People

              fdilger Fred Dilger
              jamesanunez James Nunez (Inactive)