Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-1592

ASSERTION(cfs_atomic_read(&imp->imp_refcount) == 0) failed: value: -1

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.3.0, Lustre 2.1.2
    • None
    • 2.1.2 servers and clients.
    • 2
    • 4468

    Description

      We had three mds/mgs crashed within an hour.
      The service110 crashed in class_import_destroy() while service150 and service170 crashed in class_import_put(). The class_import_destroy() failed on an ASSERT because the refcount was -1. The other two cases look like ORI-710.

      Unfortunately, we were not able to get a vmcore. Kdump crashed.

      Service110:
      Lustre: nbp4-MDT0000: Export ffff8806560aa400 already connecting from 10.151.5.8@o2ib^M
      Lustre: nbp4-MDT0000: denying duplicate export for 81811d25-ee59-5ea0-fbaf-31ee49f5aeb7, -114^M
      Lustre: Skipped 1 previous similar message^M
      LustreError: 1439:0:(o2iblnd_cb.c:2615:kiblnd_rejected()) 10.151.46.203@o2ib rejected: o2iblnd fatal error^M
      LustreError: 1439:0:(o2iblnd_cb.c:2615:kiblnd_rejected()) Skipped 7 previous similar messages^M
      LustreError: 4621:0:(mgs_handler.c:782:mgs_handle()) MGS handle cmd=250 rc=-114^M
      Lustre: nbp4-MDT0000: denying duplicate export for 000566b2-2e24-9e9d-b38c-24016bb34ecd, -114^M
      Lustre: nbp3-MDT0000: Export ffff880756260c00 already connecting from 10.151.4.136@o2ib^M
      Lustre: Skipped 3 previous similar messages^M
      LustreError: 1439:0:(o2iblnd_cb.c:2615:kiblnd_rejected()) 10.151.17.141@o2ib rejected: o2iblnd fatal error^M
      LustreError: 1439:0:(o2iblnd_cb.c:2615:kiblnd_rejected()) Skipped 6 previous similar messages^M
      LustreError: 4650:0:(mgs_handler.c:782:mgs_handle()) MGS handle cmd=250 rc=-114^M
      LustreError: 3541:0:(genops.c:930:class_import_destroy()) ASSERTION(cfs_atomic_read(&imp->imp_refcount) == 0) failed: value: -1^M
      Lustre: nbp4-MDT0000: denying duplicate export for b7f2dcde-c1be-3701-2f14-fec0e7c1b513, -114^M
      LustreError: 3541:0:(genops.c:930:class_import_destroy()) LBUG^M
      Pid: 3541, comm: obd_zombid^M

      We were not able to product a vmcore. Kdump crashed.

      Service150 (and also service170):
      (The crash on both MDS looks like ORI-710)
      Lustre: 3437:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request x1404442415660130 sent from nbp5-OST002c-osc-MDT0000 to NID 10.151.25.241@o2ib has timed out for sent delay: [sent 1341262190] [real_sent 0] [current 1341262295] [deadline 105s] [delay 0s] req@ffff880549250800 x1404442415660130/t0(0) o-1->nbp5-OST002c_UUID@10.151.25.241@o2ib:28/4 lens 368/512 e 0 to 1 dl 1341262295 ref 2 fl Rpc:XN/ffffffff/ffffffff rc 0/-1^M
      Lustre: 3437:0:(client.c:1778:ptlrpc_expire_one_request()) Skipped 599 previous similar messages^M
      Lustre: nbp5-MDT0000: haven't heard from client b04675d7-083f-bc2b-0fa5-2863afc271db (at 10.151.41.133@o2ib) in 279 seconds. I think it's dead, and I am evicting it. exp ffff880bfaff3800, cur 1341262407 expire 1341262257 last 1341262128^M
      Lustre: Skipped 43832 previous similar messages^M
      LustreError: 1812:0:(o2iblnd_cb.c:2613:kiblnd_rejected()) 10.151.13.180@o2ib rejected: o2iblnd fatal error^M
      LustreError: 1812:0:(o2iblnd_cb.c:2613:kiblnd_rejected()) Skipped 3013 previous similar messages^M
      LustreError: 3694:0:(genops.c:934:class_import_put()) ASSERTION(__v > 0 && __v < ((int)0x5a5a5a5a5a5a5a5a)) failed: value: 0^M
      LustreError: 12396:0:(ldlm_lib.c:965:target_handle_connect()) ee0eaddd-4f30-488b-720c-5ffbddbd6ae9: 10.151.25.237@o2ib already connected at higher conn_cnt: 8 > 6^M
      LustreError: 12389:0:(ldlm_lib.c:965:target_handle_connect()) ee0eaddd-4f30-488b-720c-5ffbddbd6ae9: 10.151.25.237@o2ib already connected at higher conn_cnt: 8 > 7^M
      LustreError: 12396:0:(mgs_handler.c:783:mgs_handle()) MGS handle cmd=250 rc=-114^M
      LustreError: 12396:0:(mgs_handler.c:783:mgs_handle()) Skipped 1 previous similar message^M
      LustreError: 3694:0:(genops.c:934:class_import_put()) LBUG^M
      Pid: 3694, comm: ll_mgs_01^M

      Attachments

        Activity

          People

            bobijam Zhenyu Xu
            jaylan Jay Lan (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: