[LU-5373] Failure on test suite sanity test_33b: FAIL: test_33b failed with 2

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.6.0
    • Labels: None
    • Environment:
      server: lustre-b2_6-rc2 RHEL6 ldiskfs
      client: SLES11 SP3
    • Severity: 3
    • Rank (Obsolete): 14978

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/cd5613b2-0dd2-11e4-972c-5254006e85c2.

      The sub-test test_33b failed with the following error:

      test_33b failed with 2

      == sanity test 33b: test open file with malformed flags (No panic) =================================== 18:24:17 (1405473857)
      running as uid/gid/euid/egid 500/500/500/500, groups:
       [openfile] [-f] [1286739555] [/mnt/lustre/d33/f33]
      Error in opening file "/mnt/lustre/d33/f33"(flags=1286739555) 2: No such file or directory
       sanity test_33b: @@@@@@ FAIL: test_33b failed with 2 
      

        Activity

          jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master.

          gerrit Gerrit Updater added a comment -
          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12992/
          Subject: LU-5373 test: ignore command return value in sanity test_33b
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 9b1569f56a1504e89e29c769900fedcbaad4abe7

          adilger Andreas Dilger added a comment - Bob, I'd prefer to keep the test around. The whole point of a test that always passes is that you know when it fails in the future.

          bogl Bob Glossman (Inactive) added a comment - Following Andreas' suggestion, I've pushed a simple fix that just ignores the command return value. If anybody objects to this solution, please add a review comment.

          I still think it might be better to just delete the test, since it doesn't seem to be testing anything useful.

          gerrit Gerrit Updater added a comment -
          Bob Glossman (bob.glossman@intel.com) uploaded a new patch: http://review.whamcloud.com/12992
          Subject: LU-5373 test: ignore command return value in sanity test_33b
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 8eb4a5d72b3bfbe4febc921391da0fbb329832e8

          bogl Bob Glossman (Inactive) added a comment - Andreas, yes, you are correct. I think it may be as simple as teaching the test to ignore the return value of the $OPENFILE command. As far as I can tell, this test is just trying to check that a panic doesn't happen; it doesn't really care whether the command succeeds or not. Or at least it shouldn't care.

          I'm just wondering if it's even worth keeping this test around. I've never seen it actually cause the panic it's checking against. I suspect it went in a long time ago to look for a problem that was fixed a long time ago.

          adilger Andreas Dilger added a comment - Bob, is this just a matter of fixing the test to ignore the return code of this test? The test comment is "no panic", which we would detect via "/proc/sys/lnet/catastrophe", so it might be enough to add "|| true" at the end.
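The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the landed patch: the helper names `run_ignoring_status` and `no_catastrophe` are hypothetical, `false` stands in for the failing $OPENFILE invocation, and treating a missing or zero-valued catastrophe file as "no panic" is an assumption made so the sketch can run on a machine without LNet loaded.

```shell
#!/bin/bash
# Hypothetical helper names; not taken from sanity.sh.

# Path mentioned in the comment above; overridable so the sketch can
# run on a machine without Lustre/LNet loaded.
CATASTROPHE_FILE=${CATASTROPHE_FILE:-/proc/sys/lnet/catastrophe}

# Run a command and deliberately discard its exit status ("|| true").
run_ignoring_status() {
	"$@" || true
}

# Return 0 if no catastrophe is flagged, non-zero otherwise.
# Assumption: a missing file or a value of 0 means "no panic".
no_catastrophe() {
	local value=0
	[ -r "$CATASTROPHE_FILE" ] && value=$(cat "$CATASTROPHE_FILE")
	[ "${value:-0}" -eq 0 ]
}

# Example: `false` stands in for the failing $OPENFILE invocation.
run_ignoring_status false
if no_catastrophe; then
	echo "no panic detected"
else
	echo "catastrophe flagged"
fi
```

With this shape, the test outcome depends only on the catastrophe check, not on whether the open itself succeeds.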
          bogl Bob Glossman (Inactive) added a comment (edited) - I'm starting to think this test is invalid in kernels newer than 2.6. I see consistent, repeatable failures in any manual invocation of openfile with any illegal combination of open flags. The set of illegal flags in test 33b is just one example. I see similar errors with opens not on lustre, in /tmp or other directories. It always fails and returns ENOENT. This is seen in sles11sp3 (3.0 kernels), sles12 (3.12 kernels), and el7 (3.10 kernels).

          Strongly suspect generic open code is more picky about open flags in newer kernels and is returning an error before ever reaching lustre.
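To see what the "illegal combination" in test 33b contains, the flag value from the log (1286739555) can be broken into named bits. The sketch below assumes the bit values from Linux `asm-generic/fcntl.h` as used on x86; other architectures may number them differently. The function name `decode_open_flags` is hypothetical. It only lists which known bits are set and does not establish which of them makes the kernel return ENOENT.

```shell
#!/bin/bash
# Assumption: flag bits as defined in Linux asm-generic/fcntl.h (x86).
# The names are listed in ascending bit order, starting at 0x40 and
# doubling for each entry.
FLAG_NAMES=(O_CREAT O_EXCL O_NOCTTY O_TRUNC O_APPEND O_NONBLOCK
	O_DSYNC FASYNC O_DIRECT O_LARGEFILE O_DIRECTORY O_NOFOLLOW
	O_NOATIME O_CLOEXEC __O_SYNC O_PATH __O_TMPFILE)

# Print the access mode (low two bits) and every known flag bit set.
# Bits outside the table above are silently skipped.
decode_open_flags() {
	local flags=$1 bit=$((0x40)) name out
	out="accmode=$((flags & 3))"
	for name in "${FLAG_NAMES[@]}"; do
		(( flags & bit )) && out+=" $name"
		bit=$((bit << 1))
	done
	printf '%s\n' "$out"
}

# Flag value taken from the test_33b log.
printf '0x%x\n' 1286739555
decode_open_flags 1286739555
```

An access mode of 3 is not one of O_RDONLY (0), O_WRONLY (1), or O_RDWR (2), which is consistent with the test description "malformed flags".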
          bogl Bob Glossman (Inactive) added a comment (edited) - Seen again in master, el7 client: https://testing.hpdd.intel.com/test_sets/31a088ae-6c81-11e4-8bd3-5254006e85c2
          mdiep Minh Diep added a comment - Seen here with an el7 client: https://testing.hpdd.intel.com/test_sets/06458494-2fc7-11e4-957a-5254006e85c2

          People

            Assignee: bogl Bob Glossman (Inactive)
            Reporter: maloo Maloo
            Votes: 0
            Watchers: 7
