Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-10949

LBUG lu_object.h:688:lu_object_get()) ASSERTION( atomic_read(&o->lo_header->loh_ref) > 0 ) failed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0, Lustre 2.12.6
    • Lustre 2.12.0, Lustre 2.10.4
    • Soak cluster - lustre-master-ib build 81
    • 3
    • 9223372036854775807

    Description

      This also occurred under 2.10.4, but I missed it due to some syslog issues.
      System failed over soak-8 (MDS) to soak-9, upon failback, soak-9 crashed.
      vmcore available.

      [67387.761484] LustreError: 167-0: soaked-MDT0000-osp-MDT0001: This client was evicted by soaked-MDT0000; in progress operations using this service will fail.
      [67387.810434] LustreError: 8279:0:(mdt_reint.c:2181:mdt_reint_rename_or_migrate()) soaked-MDT0001: can't lock FS for rename: rc  = -5
      [67387.810699] LNet: Service thread pid 8242 completed after 336.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [67387.810700] LNet: Service thread pid 8615 completed after 335.77s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [67387.811812] LustreError: 8615:0:(lu_object.h:688:lu_object_get()) ASSERTION( atomic_read(&o->lo_header->loh_ref) > 0 ) failed:
      [67387.811814] LustreError: 8615:0:(lu_object.h:688:lu_object_get()) LBUG
      [67387.811815] Pid: 8615, comm: mdt01_014
      [67387.811816]
      Call Trace:
      [67387.811831]  [<ffffffffc0a567ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [67387.811837]  [<ffffffffc0a5683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [67387.811847] LustreError: 8692:0:(lu_object.h:688:lu_object_get()) ASSERTION( atomic_read(&o->lo_header->loh_ref) > 0 ) failed:
      [67387.811850] LustreError: 8692:0:(lu_object.h:688:lu_object_get()) LBUG
      [67387.811851] Pid: 8692, comm: mdt01_017
      [67387.811851]
      Call Trace:
      [67387.811860]  [<ffffffffc12527c9>] err_serious.part.38+0x0/0x36 [mdt]
      [67387.811861]  [<ffffffffc0a567ae>] libcfs_call_trace+0x4e/0x60 [libcfs]
      [67387.811867]  [<ffffffffc0a5683c>] lbug_with_loc+0x4c/0xb0 [libcfs]
      [67387.811873]  [<ffffffffc1204146>] mdt_remote_object_lock_try+0x736/0x770 [mdt]
      [67387.811884]  [<ffffffffc12527c9>] err_serious.part.38+0x0/0x36 [mdt]
      [67387.811895]  [<ffffffffc1204146>] mdt_remote_object_lock_try+0x736/0x770 [mdt]
      [67387.811901]  [<ffffffffc0b74be4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
      [67387.811912]  [<ffffffffc12041aa>] mdt_remote_object_lock+0x2a/0x30 [mdt]
      [67387.811918]  [<ffffffffc0b74be4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
      [67387.811927]  [<ffffffffc122ccf9>] mdt_reint_open+0x2769/0x3270 [mdt]
      [67387.811928]  [<ffffffffc12041aa>] mdt_remote_object_lock+0x2a/0x30 [mdt]
      [67387.811940]  [<ffffffffc122ccf9>] mdt_reint_open+0x2769/0x3270 [mdt]
      [67387.811949]  [<ffffffffc0baae51>] ? upcall_cache_get_entry+0x211/0x8e0 [obdclass]
      [67387.811965]  [<ffffffffc0baae51>] ? upcall_cache_get_entry+0x211/0x8e0 [obdclass]
      [67387.811968]  [<ffffffffc0bb0b2e>] ? lu_ucred+0x1e/0x30 [obdclass]
      [67387.811980]  [<ffffffffc120fd35>] ? mdt_ucred+0x15/0x20 [mdt]
      [67387.811984]  [<ffffffffc0bb0b2e>] ? lu_ucred+0x1e/0x30 [obdclass]
      [67387.811990]  [<ffffffffc12105c1>] ? mdt_root_squash+0x21/0x430 [mdt]
      [67387.811995]  [<ffffffffc120fd35>] ? mdt_ucred+0x15/0x20 [mdt]
      [67387.812002]  [<ffffffffc12208e3>] mdt_reint_rec+0x83/0x210 [mdt]  
      [67387.812006]  [<ffffffffc12105c1>] ? mdt_root_squash+0x21/0x430 [mdt]
      [67387.812012]  [<ffffffffc120020b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [67387.812017]  [<ffffffffc12208e3>] mdt_reint_rec+0x83/0x210 [mdt]
      [67387.812022]  [<ffffffffc120c787>] mdt_intent_reint+0x157/0x420 [mdt]
      [67387.812027]  [<ffffffffc120020b>] mdt_reint_internal+0x5fb/0x9c0 [mdt]
      [67387.812032]  [<ffffffffc1203365>] mdt_intent_opc+0x455/0xae0 [mdt]
      [67387.812036]  [<ffffffffc120c787>] mdt_intent_reint+0x157/0x420 [mdt]
      [67387.812046]  [<ffffffffc1203365>] mdt_intent_opc+0x455/0xae0 [mdt]
      [67387.812090]  [<ffffffffc0dd3920>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [67387.812099]  [<ffffffffc0dd3920>] ? lustre_swab_ldlm_intent+0x0/0x20 [ptlrpc]
      [67387.812102]  [<ffffffffc120afb3>] mdt_intent_policy+0x1a3/0x360 [mdt]
      [67387.812111]  [<ffffffffc120afb3>] mdt_intent_policy+0x1a3/0x360 [mdt]
      [67387.812127]  [<ffffffffc0d83f1e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
      [67387.812136]  [<ffffffffc0a6a63e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [67387.812137]  [<ffffffffc0d83f1e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
      [67387.812145]  [<ffffffffc0a6a63e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [67387.812164]  [<ffffffffc0dabb13>] ldlm_handle_enqueue0+0x8f3/0x1410 [ptlrpc]
      [67387.812173]  [<ffffffffc0dabb13>] ldlm_handle_enqueue0+0x8f3/0x1410 [ptlrpc]
      [67387.812198]  [<ffffffffc0dd39a0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
      [67387.812204]  [<ffffffffc0dd39a0>] ? lustre_swab_ldlm_request+0x0/0x30 [ptlrpc]
      [67387.812238]  [<ffffffffc0e33392>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [67387.812238]  [<ffffffffc0e33392>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [67387.812245]  [<ffffffffc0e33392>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [67387.812288]  [<ffffffffc0e3994a>] tgt_request_handle+0x92a/0x13b0 [ptlrpc]
      [67387.812294]  [<ffffffffc0e3994a>] tgt_request_handle+0x92a/0x13b0 [ptlrpc]
      [67387.812322]  [<ffffffffc0ddda53>] ptlrpc_server_handle_request+0x253/0xab0 [ptlrpc]
      [67387.812330]  [<ffffffffc0ddda53>] ptlrpc_server_handle_request+0x253/0xab0 [ptlrpc]
      [67387.812355]  [<ffffffffc0ddb5d8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [67387.812361]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      [67387.812364]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      [67387.812366]  [<ffffffffc0ddb5d8>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [67387.812370]  [<ffffffff810c7c82>] ? default_wake_function+0x12/0x20
      [67387.812372]  [<ffffffff810bdc4b>] ? __wake_up_common+0x5b/0x90
      [67387.812397]  [<ffffffffc0de1202>] ptlrpc_main+0xa92/0x1f40 [ptlrpc]
      [67387.812405]  [<ffffffffc0de1202>] ptlrpc_main+0xa92/0x1f40 [ptlrpc]
      [67387.812429]  [<ffffffffc0de0770>] ? ptlrpc_main+0x0/0x1f40 [ptlrpc]
      [67387.812433]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [67387.812435]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      [67387.812439]  [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
      [67387.812440]  [<ffffffffc0de0770>] ? ptlrpc_main+0x0/0x1f40 [ptlrpc]
      [67387.812441]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      [67387.812442]
      [67387.812443]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [67387.812443] Kernel panic - not syncing: LBUG
      [67387.812446] CPU: 25 PID: 8615 Comm: mdt01_014 Tainted: P        W  OE  ------------   3.10.0-693.21.1.el7_lustre.x86_64 #1
      [67387.812447]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      [67387.812448] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [67387.812449] Call Trace:
      [67387.812450]  [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
      [67387.812468]  [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
      [67387.812469]  [<ffffffff810b3f60>] ? kthread+0x0/0xe0
      [67387.812471]  [<ffffffff816a8634>] panic+0xe8/0x21f
      [67387.812472]
      [67387.812478]  [<ffffffffc0a56854>] lbug_with_loc+0x64/0xb0 [libcfs]
      [67387.812492]  [<ffffffffc12527c9>] lu_object_get.isra.33.part.34+0x36/0x36 [mdt]
      [67387.812502]  [<ffffffffc1204146>] mdt_remote_object_lock_try+0x736/0x770 [mdt]
      [67387.812521]  [<ffffffffc0b74be4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
      [67387.812531]  [<ffffffffc12041aa>] mdt_remote_object_lock+0x2a/0x30 [mdt]
      [67387.812543]  [<ffffffffc122ccf9>] mdt_reint_open+0x2769/0x3270 [mdt]
      [67387.812563]  [<ffffffffc0baae51>] ? upcall_cache_get_entry+0x211/0x8e0 [obdclass]
      [67387.812582]  [<ffffffffc0bb0b2e>] ? lu_ucred+0x1e/0x30 [obdclass]
      [67387.812592]  [<ffffffffc120fd35>] ? mdt_ucred+0x15/0x20 [mdt]
      .....
      

      Full vmcore at /scratch/dumps/soak-9.spirit.hpdd.intel.com/10.10.1.109-2018-04-24-18:20:36

      Attachments

        Issue Links

          Activity

            People

              laisiyao Lai Siyao
              cliffw Cliff White (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: