
Failure on test suite sanity test_151 test_156: roc_hit is not safe to use

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.8.0
    • Labels: None
    • Environment: client and server: lustre-b2_6-rc2 ldiskfs;
      client is SLES11 SP3
    • Severity: 3
    • Rank: 14980

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/97672104-0dca-11e4-b3f5-5254006e85c2.

      The sub-test test_151 failed with the following error:

      roc_hit is not safe to use

      == sanity test 151: test cache on oss and controls ================================= 19:31:03 (1405477863)
      CMD: onyx-40vm8 /usr/sbin/lctl get_param -n obdfilter.lustre-OST*.read_cache_enable 		osd-*.lustre-OST*.read_cache_enable 2>&1
      CMD: onyx-40vm8 /usr/sbin/lctl get_param -n obdfilter.lustre-OST*.read_cache_enable 		osd-*.lustre-OST*.read_cache_enable 2>&1
      CMD: onyx-40vm8 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.writethrough_cache_enable=1 		osd-*.lustre-OST*.writethrough_cache_enable=1 2>&1
      CMD: onyx-40vm8 /usr/sbin/lctl get_param -n obdfilter.lustre-OST*.writethrough_cache_enable 		osd-*.lustre-OST*.writethrough_cache_enable 2>&1
      4+0 records in
      4+0 records out
      16384 bytes (16 kB) copied, 0.00947514 s, 1.7 MB/s
      CMD: onyx-40vm8 /usr/sbin/lctl get_param -n obdfilter.*OST*0000.stats 		osd-*.*OST*0000.stats 2>&1
      CMD: onyx-40vm8 /usr/sbin/lctl get_param -n obdfilter.*OST*0000.stats 		osd-*.*OST*0000.stats 2>&1
      BEFORE:11 AFTER:12
       sanity test_151: @@@@@@ FAIL: roc_hit is not safe to use 
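      For context, the BEFORE/AFTER numbers in the log come from sampling an OST read-cache
      hit counter around a read that should be served from cache; test_151 appears to fail with
      "roc_hit is not safe to use" when that counter check does not behave as expected. Below is
      a minimal sketch of the sampling pattern, run directly on the OSS for simplicity (sanity.sh
      goes through do_facet/do_nodes); the stats paths, the "cache_hit" counter name, and the
      file path are illustrative assumptions, not the actual roc_hit() helper from test-framework.sh.

      # Sum the read-cache hit counters reported by all OSTs. The parameter
      # patterns mirror the get_param calls in the log above; the counter
      # name "cache_hit" is an assumption for this sketch.
      sum_cache_hits() {
              lctl get_param -n "obdfilter.*OST*.stats" "osd-*.*OST*.stats" 2>/dev/null |
                      awk '/cache_hit/ { sum += $2 } END { print sum + 0 }'
      }

      before=$(sum_cache_hits)
      # Re-read a small file whose data should still be in the OSS read cache
      # (the file path is hypothetical).
      dd if=/mnt/lustre/f151 of=/dev/null bs=4k count=4
      after=$(sum_cache_hits)

      echo "BEFORE:$before AFTER:$after"
      (( after > before )) || echo "no read-cache hit detected"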
      

    Attachments

    Issue Links

    Activity

            [LU-5375] Failure on test suite sanity test_151 test_156: roc_hit is not safe to use

            adilger Andreas Dilger added a comment -

            Recent failures were triggered by the LU-11607 patch landing, but it turns out the problem was in the original LU-2261 patch. See LU-11889 for details.


            simmonsja James A Simmons added a comment -

            I'm seeing this bug again in 2.12.50 testing.


            adilger Andreas Dilger added a comment -

            Recent failures reported against this ticket were caused by the LU-11347 patch; otherwise this issue hasn't been hit in a long time.


            simmonsja James A Simmons added a comment -

            Can we close this?

            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for interop - EL6.7 Server/2.5.5 Client, tag 2.7.90.
            https://testing.hpdd.intel.com/test_sessions/f99a2d60-d567-11e5-bc47-5254006e85c2
            Another instance found for interop - EL7 Server/2.5.5 Client, tag 2.7.90.
            https://testing.hpdd.intel.com/test_sessions/93baffee-d2ae-11e5-8697-5254006e85c2


            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for interop tag 2.7.66 - EL6.7 Server/2.5.5 Client, build# 3316
            https://testing.hpdd.intel.com/test_sets/9ed7c1d8-cc9f-11e5-963e-5254006e85c2

            Another instance found for interop tag 2.7.66 - EL7 Server/2.5.5 Client, build# 3316
            https://testing.hpdd.intel.com/test_sets/5ea975e2-cc46-11e5-901d-5254006e85c2


            standan Saurabh Tandan (Inactive) added a comment - edited

            Another instance found for interop: EL6.7 Server/2.5.5 Client
            Server: master, build# 3303, RHEL 6.7
            Client: 2.5.5, b2_5_fe/62
            https://testing.hpdd.intel.com/test_sets/24b4b54e-bad6-11e5-9137-5254006e85c2


            simmonsja James A Simmons added a comment -

            globstrerr only handled 3 error cases; the move to cfs_get_paths() expanded the possible errors. I have a working solution now and have just pushed the patch.


            adilger Andreas Dilger added a comment -

            No, because bash always returns positive error numbers and not negative ones. You could check [ $? -ne 0 ], but that might as well just be return $?, which is also the default behaviour when returning from a function - to return the exit code of the last command. The other question is whether "lctl set_param" actually returns an error code on errors, or just prints a message.

            In this case, you might be better off using | egrep -v 'Found no match|no such file or directory' or similar, to ensure it works for both old and new lctl, since this will also run in interop mode with servers that do not have your patches. Is there a reason you got rid of globerrstr() and went to strerror()?

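            A rough sketch of the interop-safe filtering suggested above, assuming a set_osd_param()-style
            wrapper like the one quoted in the diff elsewhere in this ticket (the function name and argument
            layout are assumptions; the real code lives in test-framework.sh):

            # Try both the old obdfilter.* and the new osd-*.* parameter names so
            # the same call works against old and new servers, and filter out the
            # "not found" noise from whichever name is absent.
            set_osd_param() {
                    local nodes=$1
                    local device=$2
                    local name=$3
                    local value=$4

                    do_nodes $nodes "$LCTL set_param -n obdfilter.$device.$name=$value \
                            osd-*.$device.$name=$value 2>&1" |
                            egrep -v 'Found no match|[Nn]o such file or directory'
            }

            One caveat, tied to the exit-status discussion in this thread: with the filter at the end of
            the pipeline, the function's status is egrep's, not set_param's, so callers cannot rely on it
            to detect a failed set_param.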
            simmonsja James A Simmons added a comment - edited

            I'm thinking the "grep -v 'Found no match'" test might not always work, so I'm exploring testing the return value "$?" of the command. I'd like to test whether "$?" is less than zero. Would something like this work?

            do_nodes $nodes "$LCTL set_param -n obdfilter.$device.$name=$value \
            -              osd-*.$device.$name=$value 2>&1" | grep -v 'Found no match'
            +              osd-*.$device.$name=$value 2>&1" || return [ $? -lt 0 ]
            

            Sorry, I'm not the greatest bash scripter.

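            On the bash question above: exit statuses are unsigned (0-255), so "[ $? -lt 0 ]" can never
            be true. A minimal sketch of the plain status-propagation alternative, inside the wrapper
            function and using the same do_nodes call as the quoted diff (surrounding variables assumed
            from that context):

            # Skip the grep filter and let the status of the set_param command
            # decide; "|| return $?" propagates the failing status, and a bash
            # function with no explicit return would do the same implicitly.
            do_nodes $nodes "$LCTL set_param -n obdfilter.$device.$name=$value \
                    osd-*.$device.$name=$value 2>&1" || return $?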

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Maloo (maloo)
              Votes: 0
              Watchers: 9

            Dates

              Created:
              Updated:
              Resolved: