Lustre / LU-14203

sanityn test 40a fails with 'parallel operation is blocked'


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.6
    • None
    • 3

    Description

      sanityn test_40a fails with 'parallel operation is blocked'.

      We’ve seen this failure three times:
      18 AUG 2020 DNE/ZFS 2.12.5.20 for ticket/patch LU-13471/39576 - https://testing.whamcloud.com/test_sets/15350d90-9f69-4900-b036-9fc3f73140e5
      07 NOV 2020 DNE/ZFS 2.12.5.83 for branch testing - https://testing.whamcloud.com/test_sets/bbd2f426-8f1b-4e0d-a27f-951e75976ab6
      07 DEC 2020 DNE/ldiskfs 2.12.6 RC2 branch testing - https://testing.whamcloud.com/test_sets/bc0cbb1c-6e57-40df-9eef-47d8b7b8962d

      Looking at the failure for RC2 testing, the suite_log shows the following output for the test:

      == sanityn test 40a: pdirops: create vs others ======================================================= 17:18:20 (1607361500)
      CMD: trevis-52vm5,trevis-52vm6 /usr/sbin/lctl set_param -n ldlm.namespaces.*mdt*.lru_size=clear
      CMD: trevis-52vm5,trevis-52vm6 /usr/sbin/lctl get_param ldlm.namespaces.*mdt*.lock_unused_count ldlm.namespaces.*mdt*.lock_count
      ldlm.namespaces.mdt-lustre-MDT0000_UUID.lock_count=57
      ldlm.namespaces.mdt-lustre-MDT0001_UUID.lock_count=1
      CMD: trevis-52vm5 lctl set_param fail_loc=0x80000145
      fail_loc=0x80000145
      No conflict
      No conflict
      No conflict
      No conflict
      No conflict
      No conflict
      Conflict
       sanityn test_40a: @@@@@@ FAIL: parallel operation is blocked 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:5907:error()
        = /usr/lib64/lustre/tests/sanityn.sh:1517:test_40a()
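      For readers unfamiliar with the fail_loc value set above, here is a minimal sketch of how 0x80000145 splits into a flag and a fail ID. The bit values are assumptions based on my reading of the libcfs fail-injection header (a "fire once" flag in the top bit and the ID in the low 16 bits); the helper name decode_fail_loc is hypothetical. The resulting ID matches the "cfs_fail_timeout id 145" messages on the MDS console.

```python
# Assumed bit layout (from my reading of libcfs's fail-injection header;
# verify against your Lustre tree before relying on it).
CFS_FAIL_ONCE = 0x80000000      # assumed: trigger the fail point only once
CFS_FAIL_MASK_LOC = 0x0000FFFF  # assumed: low 16 bits carry the fail ID


def decode_fail_loc(value: int) -> dict:
    """Split a fail_loc value into its 'once' flag and numeric fail ID."""
    return {
        "once": bool(value & CFS_FAIL_ONCE),
        "id": value & CFS_FAIL_MASK_LOC,
    }


decoded = decode_fail_loc(0x80000145)
print(f"once={decoded['once']} id={decoded['id']:x}")
```

      Under these assumptions, the value decodes to the once flag plus ID 145 (hex).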
      

      The only interesting messages in the console logs are the ‘errors’ below, although we also see them in the MDS console when 40a passes. Looking at the console log for MDS1/3 (vm5), we see:

      [31916.082787] Lustre: DEBUG MARKER: == sanityn test 40a: pdirops: create vs others ======================================================= 17:18:20 (1607361500)
      [31916.531769] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n ldlm.namespaces.*mdt*.lru_size=clear
      [31917.009765] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param ldlm.namespaces.*mdt*.lock_unused_count ldlm.namespaces.*mdt*.lock_count
      [31917.449680] Lustre: DEBUG MARKER: lctl set_param fail_loc=0x80000145
      [31917.647130] LustreError: 8427:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 sleeping for 15000ms
      [31917.648962] LustreError: 8427:0:(fail.c:129:__cfs_fail_timeout_set()) Skipped 4 previous similar messages
      [31932.650062] LustreError: 8427:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 awake
      [31932.651716] LustreError: 8427:0:(fail.c:133:__cfs_fail_timeout_set()) Skipped 4 previous similar messages
      [31933.827926] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanityn test_40a: @@@@@@ FAIL: parallel operation is blocked 
      [31934.090878] Lustre: DEBUG MARKER: sanityn test_40a: @@@@@@ FAIL: parallel operation is blocked
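      To put the console timestamps above in perspective, here is a small sketch that extracts the sleep, wake-up, and failure-marker times from three of the lines quoted above. The function name timeline and the matching rules are hypothetical and only illustrate the arithmetic; this is not part of the Lustre test framework.

```python
import re

# Three lines copied from the MDS1/3 console log quoted in this ticket.
CONSOLE = """\
[31917.647130] LustreError: 8427:0:(fail.c:129:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 sleeping for 15000ms
[31932.650062] LustreError: 8427:0:(fail.c:133:__cfs_fail_timeout_set()) cfs_fail_timeout id 145 awake
[31933.827926] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanityn test_40a: @@@@@@ FAIL: parallel operation is blocked
"""

# Kernel console prefix: "[seconds.microseconds] message"
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def timeline(text: str) -> dict:
    """Return the first timestamp seen for the sleep, awake, and fail events."""
    events = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        stamp, msg = float(match.group(1)), match.group(2)
        if "sleeping for" in msg:
            events.setdefault("sleep", stamp)
        elif msg.endswith("awake"):
            events.setdefault("awake", stamp)
        elif "FAIL:" in msg:
            events.setdefault("fail", stamp)
    return events


ev = timeline(CONSOLE)
slept = ev["awake"] - ev["sleep"]
after_wake = ev["fail"] - ev["awake"]
print(f"slept {slept:.3f}s, failure marked {after_wake:.3f}s after wake-up")
```

      For these lines the sleep spans about 15.003 s, consistent with the logged 15000ms timeout, and the failure is marked about 1.178 s after the wake-up.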
      

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)
