  1. Lustre
  2. LU-7975

"(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 )" LBUG/Assert on MDS

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.9.0
    • None
    • 3

    Description

      A site has encountered multiple crashes with the same signature. The messages and stack trace are as follows:

      LustreError: 89879:0:(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11
      Lustre: composit-OST0009-osc-MDT0000: Connection to composit-OST0009 (at 10.0.14.31@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) ASSERTION( lc->ldo_stripenr == 0 ) failed: 
      LustreError: 89879:0:(lod_object.c:700:lod_ah_init()) LBUG
      Pid: 89879, comm: mdt01_006
      
      Call Trace:
       [<ffffffffa057e895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
       [<ffffffffa057ee97>] lbug_with_loc+0x47/0xb0 [libcfs]
       [<ffffffffa266c0af>] lod_ah_init+0x58f/0x5d0 [lod]
       [<ffffffffa26c7ad3>] mdd_object_make_hint+0x83/0xa0 [mdd]
       [<ffffffffa26d4502>] mdd_create_data+0x332/0x7d0 [mdd]
       [<ffffffffa25a93f0>] mdt_finish_open+0x1350/0x19a0 [mdt]
       [<ffffffffa257e5f4>] ? mdt_object_lock+0x14/0x20 [mdt]
       [<ffffffffa25a9fbd>] mdt_open_by_fid_lock+0x57d/0x910 [mdt]
       [<ffffffffa25aabac>] mdt_reint_open+0x56c/0x21a0 [mdt]
       [<ffffffffa059b14c>] ? upcall_cache_get_entry+0x29c/0x890 [libcfs]
       [<ffffffffa0983930>] ? lu_ucred+0x20/0x30 [obdclass]
       [<ffffffffa2572945>] ? mdt_ucred+0x15/0x20 [mdt]
       [<ffffffffa258f8ec>] ? mdt_root_squash+0x2c/0x410 [mdt]
       [<ffffffffa123bad6>] ? __req_capsule_get+0x166/0x710 [ptlrpc]
       [<ffffffffa2593ab1>] mdt_reint_rec+0x41/0xe0 [mdt]
       [<ffffffffa2578f83>] mdt_reint_internal+0x4c3/0x780 [mdt]
       [<ffffffffa257950e>] mdt_intent_reint+0x1ee/0x520 [mdt]
       [<ffffffffa2576cee>] mdt_intent_policy+0x3ae/0x770 [mdt]
       [<ffffffffa11ca2f5>] ldlm_lock_enqueue+0x135/0x980 [ptlrpc]
       [<ffffffffa11f43fb>] ldlm_handle_enqueue0+0x51b/0x10c0 [ptlrpc]
       [<ffffffffa25771b6>] mdt_enqueue+0x46/0xe0 [mdt]
       [<ffffffffa257c84a>] mdt_handle_common+0x52a/0x1470 [mdt]
       [<ffffffffa25b98f5>] mds_regular_handle+0x15/0x20 [mdt]
       [<ffffffffa12238d5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
       [<ffffffffa05904fa>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
       [<ffffffffa121c289>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
       [<ffffffff81057849>] ? __wake_up_common+0x59/0x90
       [<ffffffffa122605d>] ptlrpc_main+0xaed/0x1780 [ptlrpc]
       [<ffffffffa1225570>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
       [<ffffffff8109e78e>] kthread+0x9e/0xc0
       [<ffffffff8100c28a>] child_rip+0xa/0x20
       [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
       [<ffffffff8100c280>] ? child_rip+0x0/0x20
      

      Looking at existing tickets, I found that this kind of problem has already been (partially?) addressed in LU-4260, LU-4791 and LU-5346.
      Since the fixes for both LU-4260 and LU-4791 are already integrated, we appear to be hitting a new situation during OST object pre-creation. It is likely caused by a specific file metadata pattern, which I have identified as use of the "deferred layout" feature, i.e. open(, ...|O_LOV_DELAY_CREATE|...,) followed by a non-zero truncate() that triggers object preallocation. This leads to a case similar to the one described in LU-5346, on an error return path that is still not fixed.
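      The access pattern described above can be sketched from the client side roughly as follows. This is only an illustration of the open-then-truncate sequence, not a confirmed reproducer: the helper name delayed_create_then_truncate() is hypothetical, and the fallback definition of O_LOV_DELAY_CREATE is an assumption that only keeps the sketch compilable where the Lustre user headers are not available (it has no deferred-layout meaning on a non-Lustre file system).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * ASSUMPTION: on a Lustre client this flag is provided by the Lustre
 * user headers. The fallback below is a placeholder so the sketch
 * builds elsewhere; it does not request a deferred layout there.
 */
#ifndef O_LOV_DELAY_CREATE
#define O_LOV_DELAY_CREATE (O_NOCTTY | O_ASYNC)
#endif

/*
 * Hypothetical helper: create a file with deferred layout, then
 * truncate it to a non-zero size, which per the description above
 * is what triggers object preallocation.
 * Returns 0 on success or a negative errno value on failure.
 */
static int delayed_create_then_truncate(const char *path, off_t size)
{
    int rc = 0;
    int fd = open(path, O_CREAT | O_RDWR | O_LOV_DELAY_CREATE, 0644);

    if (fd < 0)
        return -errno;

    /* Non-zero truncate on a file that has no layout yet. */
    if (ftruncate(fd, size) < 0)
        rc = -errno;

    close(fd);
    return rc;
}
```

      According to the analysis in this ticket, the assert would additionally require an OST to be unavailable while the truncate is being processed, so running this sequence on a healthy file system is not expected to trigger it.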

      BTW, I have also determined that these MDT asserts always occur just after an OSS crash, which explains the -EAGAIN/-EWOULDBLOCK error in the "(osp_precreate.c:1222:osp_object_truncate()) can't punch object: -11" message immediately preceding the assert.
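      For readers decoding the "-11" in the log line: kernel code conventionally returns negated errno values, and on Linux errno 11 is EAGAIN, with EWOULDBLOCK defined to the same value. A minimal sketch of that mapping is below; the helper names rc_to_text() and rc_is_eagain() are hypothetical and are not part of Lustre.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: describe a kernel-style (negative) return code. */
static const char *rc_to_text(int rc)
{
    return strerror(rc < 0 ? -rc : rc);
}

/* Hypothetical helper: true if rc is the "try again" code seen in the log. */
static int rc_is_eagain(int rc)
{
    return rc == -EAGAIN || rc == -EWOULDBLOCK;
}
```

      With these, rc_is_eagain(-11) is true on Linux, which matches the "can't punch object: -11" message being a retryable error reported while the OST connection was lost.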

            People

              bfaccini Bruno Faccini (Inactive)