Lustre / LU-15893

replay-dual test_30: ldlm_cli_cancel on converting lock


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for John Hammond <jhammond@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/95e574e7-8b99-41dc-9287-a2ac4e2d5965

      test_30 failed with the following error:

      onyx-44vm3 crashed during replay-dual test_30
      

      This is a failure of LASSERT(!ldlm_is_converting(lock)) in
      ldlm_cli_cancel(): after the client is evicted, ptlrpc_invalidate_import()
      cleans up the LDLM namespace and cancels every lock it finds, and a lock
      still flagged as converting trips the assertion.

      [ 4806.105733] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == replay-dual test 30: layout lock replay is not blocked on IO ========================================================== 18:12:34 \(1653502354\)
      [ 4806.511990] Lustre: DEBUG MARKER: == replay-dual test 30: layout lock replay is not blocked on IO ========================================================== 18:12:34 (1653502354)
      [ 4806.576387] LustreError: 37471:0:(fail.c:138:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e sleeping for 4000ms
      [ 4810.637737] LustreError: 37471:0:(fail.c:149:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e awake
      [ 4810.640700] LustreError: 11-0: lustre-MDT0000-mdc-ffff98bd03d49800: operation ost_write to node 10.240.23.231@tcp failed: rc = -107
      [ 4826.632462] LustreError: 166-1: MGC10.240.23.231@tcp: Connection to MGS (at 10.240.23.231@tcp) was lost; in progress operations using this service will fail
      [ 4826.635829] LustreError: Skipped 2 previous similar messages
      [ 4826.638669] Lustre: Evicted from MGS (at 10.240.23.231@tcp) after server handle changed from 0xbde5217ab8283ded to 0xbde5217ab829a8b2
      [ 4826.640966] Lustre: Skipped 2 previous similar messages
      [ 4826.660227] LustreError: 130333:0:(fail.c:138:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e sleeping for 4000ms
      [ 4826.662942] LustreError: 130333:0:(fail.c:138:__cfs_fail_timeout_set()) Skipped 1 previous similar message
      [ 4830.716829] LustreError: 130334:0:(fail.c:149:__cfs_fail_timeout_set()) cfs_fail_timeout id 32e awake
      [ 4830.737968] LustreError: 8013:0:(import.c:701:ptlrpc_connect_import_locked()) already connecting
      [ 4830.740450] LustreError: 167-0: lustre-MDT0000-mdc-ffff98bd03d49800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      [ 4830.743247] LustreError: Skipped 1 previous similar message
      [ 4830.744635] Lustre: 8017:0:(llite_lib.c:3512:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.240.23.231@tcp:/lustre/fid: [0x200012511:0x162:0x0]// may get corrupted (rc -5)
      [ 4830.748634] LustreError: 130392:0:(ldlm_request.c:1546:ldlm_cli_cancel()) ASSERTION( !((((lock))->l_flags & (1ULL << 25)) != 0) ) failed: 
      [ 4830.750999] LustreError: 130392:0:(ldlm_request.c:1546:ldlm_cli_cancel()) LBUG
      [ 4830.752406] Pid: 130392, comm: ll_imp_inval 4.18.0-348.2.1.el8_5.x86_64 #1 SMP Tue Nov 16 14:42:35 UTC 2021
      [ 4830.754255] Call Trace TBD:
      [ 4830.755020] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
      [ 4830.756060] [<0>] lbug_with_loc+0x43/0x80 [libcfs]
      [ 4830.757298] [<0>] ldlm_cli_cancel+0x245/0x510 [ptlrpc]
      [ 4830.758367] [<0>] cleanup_resource+0x132/0x310 [ptlrpc]
      [ 4830.759447] [<0>] ldlm_resource_clean+0x30/0x50 [ptlrpc]
      [ 4830.760514] [<0>] cfs_hash_for_each_relax+0x253/0x450 [libcfs]
      [ 4830.761669] [<0>] cfs_hash_for_each_nolock+0x11b/0x1f0 [libcfs]
      [ 4830.762848] [<0>] ldlm_namespace_cleanup+0x2b/0xb0 [ptlrpc]
      [ 4830.763999] [<0>] mdc_import_event+0x32d/0xcf0 [mdc]
      [ 4830.765009] [<0>] ptlrpc_invalidate_import+0x28d/0x9f0 [ptlrpc]
      [ 4830.766218] [<0>] ptlrpc_invalidate_import_thread+0x6d/0x260 [ptlrpc]
      [ 4830.767522] [<0>] kthread+0x116/0x130
      [ 4830.768304] [<0>] ret_from_fork+0x35/0x40
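
      For reference, the raw flag test in the assertion is the expansion of
      ldlm_is_converting(). Below is a minimal sketch of that expansion,
      reconstructed from the "(1ULL << 25)" in the log and the usual
      flag-accessor pattern of lustre/include/lustre_dlm_flags.h (bit
      positions may differ between Lustre versions):

      /* flag-accessor pattern (reconstruction, not verbatim source) */
      #define LDLM_TEST_FLAG(_l, _b)  (((_l)->l_flags & (_b)) != 0)
      /* bit 25: the lock has a mode convert in flight */
      #define ldlm_is_converting(_l)  LDLM_TEST_FLAG((_l), 1ULL << 25)

      /* ldlm_request.c:1546, in ldlm_cli_cancel(): a lock must not be
       * cancelled while its convert is still outstanding */
      LASSERT(!ldlm_is_converting(lock));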
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-dual test_30 - onyx-44vm3 crashed during replay-dual test_30

People

    • Assignee: WC Triage
    • Reporter: Maloo