  1. Lustre
  2. LU-4844

ost_prolong_lock_one() ASSERTION( lock->l_export == opd->opd_exp )


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 3
    • 13347

    Description

      While testing Lustre 2.4.0-28chaos (see github.com/chaos/lustre) on an ldiskfs filesystem, we hit the following assertion on an OSS:

      2014-03-30 10:14:22 Lustre: lc2-OST000b: deleting orphan objects from 0x0:70377775 to 0x0:70380721
      2014-03-30 10:14:22 Lustre: lc2-OST000b: Recovery over after 1:45, of 143 clients 143 recovered and 0 were evicted.
      2014-03-30 10:20:28 LNetError: 2672:0:(o2iblnd_cb.c:2635:kiblnd_rejected()) 10.1.1.161@o2ib9 rejected: o2iblnd fatal error
      2014-03-30 10:20:28 LNetError: 2672:0:(o2iblnd_cb.c:2635:kiblnd_rejected()) Skipped 19 previous similar messages
      2014-03-30 10:21:03 LustreError: 0:0:(ldlm_lockd.c:403:waiting_locks_callback()) ### lock callback timer expired after 150s: evicting client at 192.168.121.90@o2i
      2014-03-30 10:21:06 LustreError: 18813:0:(client.c:1049:ptlrpc_import_delay_req()) @@@ IMP_CLOSED   req@ffff88018e191400 x1463232807590408/t0(0) o104->lc2-OST0017
      2014-03-30 10:21:06 LustreError: 18813:0:(client.c:1049:ptlrpc_import_delay_req()) Skipped 8 previous similar messages
      2014-03-30 10:21:06 LustreError: 18813:0:(ldlm_lockd.c:736:ldlm_handle_ast_error()) ### client (nid 192.168.121.90@o2ib2) returned 0 from blocking AST ns: filter-
      2014-03-30 10:21:06 LustreError: 18813:0:(ldlm_lockd.c:736:ldlm_handle_ast_error()) Skipped 1 previous similar message
      2014-03-30 10:21:07 LustreError: 18937:0:(ldlm_lib.c:2734:target_bulk_io()) @@@ bulk GET failed: rc -107  req@ffff8801097bbc00 x1463677594325660/t0(0) o4->e5fffb3
      2014-03-30 10:21:07 Lustre: lc2-OST0017: Bulk IO write error with e5fffb36-4dc9-0a2e-f74b-66de9283e46f (at 192.168.121.90@o2ib2), client will retry: rc -107
      2014-03-30 10:21:07 Lustre: Skipped 1 previous similar message
      2014-03-30 10:21:07 LustreError: 18937:0:(ldlm_lib.c:2734:target_bulk_io()) Skipped 1 previous similar message
      2014-03-30 10:21:09 LustreError: 5121:0:(ldlm_lib.c:2734:target_bulk_io()) @@@ bulk GET failed: rc -107  req@ffff880053a75c00 x1463677594328176/t0(0) o4->e5fffb36
      2014-03-30 10:21:09 Lustre: lc2-OST0017: Bulk IO write error with e5fffb36-4dc9-0a2e-f74b-66de9283e46f (at 192.168.121.90@o2ib2), client will retry: rc -107
      2014-03-30 10:21:09 Lustre: Skipped 1 previous similar message
      2014-03-30 10:21:13 LustreError: 18877:0:(ldlm_lib.c:2734:target_bulk_io()) @@@ bulk GET failed: rc -107  req@ffff8800341cdc00 x1463677594328180/t0(0) o4->e5fffb3
      2014-03-30 10:21:13 Lustre: lc2-OST0017: Bulk IO write error with e5fffb36-4dc9-0a2e-f74b-66de9283e46f (at 192.168.121.90@o2ib2), client will retry: rc -107
      2014-03-30 10:21:13 LustreError: 18877:0:(ldlm_lib.c:2734:target_bulk_io()) Skipped 3 previous similar messages
      2014-03-30 10:21:17 LustreError: 18964:0:(ldlm_lib.c:2734:target_bulk_io()) @@@ bulk GET failed: rc -107  req@ffff88006e5a7400 x1463677594328172/t0(0) o4->e5fffb3
      2014-03-30 10:21:17 Lustre: lc2-OST0017: Bulk IO write error with e5fffb36-4dc9-0a2e-f74b-66de9283e46f (at 192.168.121.90@o2ib2), client will retry: rc -107
      2014-03-30 10:21:17 Lustre: Skipped 3 previous similar messages
      2014-03-30 10:21:17 LustreError: 18829:0:(ost_handler.c:1909:ost_prolong_lock_one()) ASSERTION( lock->l_export == opd->opd_exp ) failed: 
      2014-03-30 10:21:17 LustreError: 18834:0:(ost_handler.c:1909:ost_prolong_lock_one()) ASSERTION( lock->l_export == opd->opd_exp ) failed: 
      2014-03-30 10:21:17 LustreError: 18834:0:(ost_handler.c:1909:ost_prolong_lock_one()) LBUG
      2014-03-30 10:21:17 Pid: 18834, comm: ll_ost_io00_019
      2014-03-30 10:21:17 
      2014-03-30 10:21:17 Call Trace:
      2014-03-30 10:21:17  [<ffffffffa032d8f5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
      2014-03-30 10:21:17  [<ffffffffa032def7>] lbug_with_loc+0x47/0xb0 [libcfs]
      2014-03-30 10:21:17  [<ffffffffa0f473b7>] ost_prolong_lock_one+0xe7/0x170 [ost]
      2014-03-30 10:21:17  [<ffffffffa07bf579>] ? __ldlm_handle2lock+0x39/0x320 [ptlrpc]
      2014-03-30 10:21:17  [<ffffffffa0f474dc>] ost_prolong_locks+0x9c/0x340 [ost]
      2014-03-30 10:21:17  [<ffffffffa0f4caab>] ost_rw_hpreq_check+0x25b/0x500 [ost]
      2014-03-30 10:21:17  [<ffffffffa080d620>] ? lustre_swab_niobuf_remote+0x0/0x30 [ptlrpc]
      2014-03-30 10:21:17  [<ffffffffa081cb53>] ptlrpc_main+0x1113/0x1700 [ptlrpc]
      2014-03-30 10:21:17  [<ffffffffa081ba40>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      2014-03-30 10:21:17  [<ffffffff8100c10a>] child_rip+0xa/0x20
      2014-03-30 10:21:17  [<ffffffffa081ba40>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      2014-03-30 10:21:17  [<ffffffffa081ba40>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
      2014-03-30 10:21:17  [<ffffffff8100c100>] ? child_rip+0x0/0x20
      

      It looks like this has been seen in the past by multiple people in LU-2232, but that ticket was closed without a resolution.
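      For anyone triaging a similar console log, here is a minimal sketch for reducing it to a per-function error count. It assumes the `LustreError`/`LNetError` line format seen in the excerpt above; the names `LINE_RE`, `parse_console_line`, and `summarize` are hypothetical helpers, not part of any Lustre tooling.

```python
import re
from collections import Counter

# Assumed format, taken from the console excerpt in this ticket, e.g.
# "2014-03-30 10:21:17 LustreError: 18834:0:(ost_handler.c:1909:ost_prolong_lock_one()) LBUG"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<kind>LustreError|LNetError): "
    r"(?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)$"
)


def parse_console_line(line):
    """Return a dict of fields for an error line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines):
    """Count error lines per reporting function, ignoring 'Skipped N ...' repeats."""
    counts = Counter()
    for line in lines:
        rec = parse_console_line(line)
        if rec and not rec["msg"].startswith("Skipped"):
            counts[rec["func"]] += 1
    return counts
```

      Plain `Lustre:` informational lines and the stack trace do not match the pattern and are skipped, so only the error sources (for example `target_bulk_io` and `ost_prolong_lock_one`) are counted.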

            People

              laisiyao Lai Siyao
              morrone Christopher Morrone (Inactive)
