  1. Lustre
  2. LU-4421

Failure on test suite sanity test_120e: 1 blocking RPC occured

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • None
    • 3
    • 12139

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/8cbadcc4-706c-11e3-a3b4-52540035b04c.

      The sub-test test_120e failed with the following error:

      1 blocking RPC occured.

      The test log shows:

      == sanity test 120e: Early Lock Cancel: unlink test == 10:25:53 (1388255153)
      ldlm.namespaces.lustre-MDT0000-mdc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-MDT0001-mdc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-MDT0002-mdc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-MDT0003-mdc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0000-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0001-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0002-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0003-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0004-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0005-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0006-osc-ffff8800697aa400.lru_size=200
      ldlm.namespaces.lustre-OST0007-osc-ffff8800697aa400.lru_size=200
      1+0 records in
      1+0 records out
      512 bytes (512 B) copied, 0.00208266 s, 246 kB/s
      1+0 records in
      1+0 records out
      512 bytes (512 B) copied, 0.00540223 s, 94.8 kB/s
      error: get_param: /proc/{fs,sys}/{lnet,lustre}/ldlm/services/ldlm_canceld/stats: Found no match
      error: get_param: /proc/{fs,sys}/{lnet,lustre}/ldlm/services/ldlm_canceld/stats: Found no match
       sanity test_120e: @@@@@@ FAIL: 1 blocking RPC occured. 
      

      I hit this issue while testing patch http://review.whamcloud.com/#/c/7087/. I am not sure whether this is a problem on master or specific to the patch.
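      The log above contains two separate symptoms: the get_param "Found no match" errors and the final FAIL line. When triaging several runs, it can help to extract both from each log so they can be compared. The following is only a minimal sketch, assuming the log text has been saved locally; the function name summarize_log and the regular expressions are illustrative and are not part of the Lustre test framework.

```python
import re

# Patterns modeled on the log lines quoted in this ticket (assumed format).
FAIL_RE = re.compile(r"sanity (test_\w+): @@@@@@ FAIL: (.*)")
GET_PARAM_ERR_RE = re.compile(r"error: get_param: (\S+): Found no match")


def summarize_log(text):
    """Return the failed sub-test, its message, and any get_param misses."""
    summary = {"test": None, "message": None, "missing_params": []}
    for line in text.splitlines():
        match = GET_PARAM_ERR_RE.search(line)
        if match:
            # Record the parameter path that lctl could not resolve.
            summary["missing_params"].append(match.group(1))
            continue
        match = FAIL_RE.search(line)
        if match:
            summary["test"] = match.group(1)
            summary["message"] = match.group(2).strip()
    return summary


if __name__ == "__main__":
    # Two lines copied from the test log in this ticket.
    sample = (
        "error: get_param: /proc/{fs,sys}/{lnet,lustre}/ldlm/services/"
        "ldlm_canceld/stats: Found no match\n"
        " sanity test_120e: @@@@@@ FAIL: 1 blocking RPC occured. \n"
    )
    print(summarize_log(sample))
```

      A log with an empty missing_params list but the same FAIL message would suggest that the get_param error and the blocking RPC failure are independent.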

          Activity

            adilger Andreas Dilger added a comment - Duplicate of LU-4206.

            jamesanunez James Nunez (Inactive) added a comment - I've hit the same error at https://maloo.whamcloud.com/test_sets/805a8136-c27f-11e3-a886-52540035b04c in review-dne-part-1. The error message is the same but, as the log below shows, the get_param error does not appear:

            ...
            ldlm.namespaces.lustre-OST0005-osc-ffff88007e7b4c00.lru_size=200
            ldlm.namespaces.lustre-OST0006-osc-ffff88007e7b4c00.lru_size=200
            ldlm.namespaces.lustre-OST0007-osc-ffff88007e7b4c00.lru_size=200
            1+0 records in
            1+0 records out
            512 bytes (512 B) copied, 0.00499895 s, 102 kB/s
            1+0 records in
            1+0 records out
            512 bytes (512 B) copied, 0.00506382 s, 101 kB/s
            CMD: client-32vm3 /usr/sbin/lctl get_param -n ldlm.services.ldlm_canceld.stats
            CMD: client-32vm3 /usr/sbin/lctl get_param -n ldlm.services.ldlm_canceld.stats
             sanity test_120e: @@@@@@ FAIL: 1 blocking RPC occured. 

            Looking at the two other cases reported in this ticket, https://maloo.whamcloud.com/test_sets/a9708b10-b799-11e3-97ab-52540035b04c and https://maloo.whamcloud.com/test_sets/42cc4338-b92c-11e3-a578-52540035b04c, there is no get_param error in those logs either. So I think the get_param error was corrected in LU-4747, but there is still an issue with the blocking RPC.

            Should we close this ticket and open a new one with the slightly modified client test log?
            bogl Bob Glossman (Inactive) added a comment - another: https://maloo.whamcloud.com/test_sets/42cc4338-b92c-11e3-a578-52540035b04c
            spimpale Swapnil Pimpale (Inactive) added a comment - Another instance where this bug was hit: https://maloo.whamcloud.com/test_sets/a9708b10-b799-11e3-97ab-52540035b04c

            jlevi Jodi Levi (Inactive) added a comment - Fan Yong, can you look into this one and comment?

            People

              wc-triage WC Triage
              maloo Maloo
              Votes: 0
              Watchers: 8
