Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-11294

MDS kernel panic, LNetError: 4307:0:(lib-msg.c:594:lnet_incr_hstats()) LBUG

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.12.0
    • lustre-master-ib build #120 lustre version: 2.11.54_54_ga7debf8
    • 3
    • 9223372036854775807

    Description

      MDS2 hit LBUG after recovery completed

      soak-10 console

      [  485.435828] Lustre: soaked-MDT0002: Recovery over after 0:44, of 28 clients 28 recovered and 0 were evicted.
      Aug 29 03:10:01 soak-10 TIME: Time stamp for console
      [ 3453.242721] LNetError: 4307:0:(o2iblnd_cb.c:3321:kiblnd_check_txs_locked()) Timed out tx: active_txs, 0 seconds
      [ 3453.255641] LNetError: 4307:0:(o2iblnd_cb.c:3396:kiblnd_check_conns()) Timed out RDMA with 192.168.1.109@o2ib (28): c: 6, oc: 0, rc: 8
      [ 3453.272156] Lustre: 4335:0:(client.c:2126:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1535514904/real 1535514909]  re
      q@ffff9bf8f3dfe000 x1610100967659856/t0(0) o400->soaked-MDT0001-osp-MDT0002@192.168.1.109@o2ib:24/4 lens 224/224 e 0 to 1 dl 1535514911 ref 1 fl Rpc:eXN/0/ffffff
      ff rc 0/-1
      [ 3453.309301] Lustre: soaked-MDT0001-osp-MDT0002: Connection to soaked-MDT0001 (at 192.168.1.109@o2ib) was lost; in progress operations using this service will 
      wait for recovery to complete
      [ 3454.242963] LNet: 4307:0:(o2iblnd_cb.c:3367:kiblnd_check_conns()) Timed out tx for 192.168.1.109@o2ib: 1 seconds
      [ 3651.408451] Lustre: soaked-MDT0002: haven't heard from client soaked-MDT0001-mdtlov_UUID (at 192.168.1.109@o2ib) in 227 seconds. I think it's dead, and I am e
      victing it. exp ffff9bf8fbbc7400, cur 1535515108 expire 1535514958 last 1535514881
      Aug 29 04:01:01 soak-10 TIME: Time stamp for console
      [ 7026.178219] LNet: Service thread pid 6456 was inactive for 200.53s. The thread might be hung, or it might only be slow and will resume later. Dumping the stac
      k trace for debugging purposes:
      [ 7026.200999] Pid: 6456, comm: mdt01_016 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
      [ 7026.213853] Call Trace:
      [ 7026.218428]  [<ffffffffc1904ab8>] osp_precreate_reserve+0x2d8/0x800 [osp]
      [ 7026.227906]  [<ffffffffc18f9bb9>] osp_declare_create+0x199/0x5b0 [osp]
      [ 7026.236955]  [<ffffffffc1844e8f>] lod_sub_declare_create+0xdf/0x210 [lod]
      [ 7026.246232]  [<ffffffffc183e62e>] lod_qos_declare_object_on+0xbe/0x3a0 [lod]
      [ 7026.255759]  [<ffffffffc184b40f>] lod_alloc_rr.constprop.18+0xc16/0xf63 [lod]
      [ 7026.265270]  [<ffffffffc1844326>] lod_qos_prep_create+0x1416/0x17a0 [lod]
      [ 7026.274312]  [<ffffffffc18448c5>] lod_prepare_create+0x215/0x2e0 [lod]
      [ 7026.283060]  [<ffffffffc18367ee>] lod_declare_striped_create+0x1ee/0x980 [lod]
      [ 7026.292518]  [<ffffffffc183aec4>] lod_declare_create+0x204/0x590 [lod]
      [ 7026.301124]  [<ffffffffc18ad892>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
      [ 7026.311217]  [<ffffffffc189d1c8>] mdd_declare_create+0x48/0xc10 [mdd]
      [ 7026.319652]  [<ffffffffc18a15e9>] mdd_create+0x929/0x13f0 [mdd]
      [ 7026.327460]  [<ffffffffc174ce37>] mdt_reint_open+0x2117/0x3160 [mdt]
      [ 7026.335764]  [<ffffffffc1740ce3>] mdt_reint_rec+0x83/0x210 [mdt]
      [ 7026.343611]  [<ffffffffc17201d2>] mdt_reint_internal+0x6b2/0xa80 [mdt]
      [ 7026.352048]  [<ffffffffc172c6c2>] mdt_intent_open+0x82/0x350 [mdt]
      [ 7026.360063]  [<ffffffffc172a768>] mdt_intent_policy+0x2f8/0xd10 [mdt]
      [ 7026.368365]  [<ffffffffc12c8e9e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
      [ 7026.376957]  [<ffffffffc12f1483>] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
      [ 7026.385918]  [<ffffffffc1377932>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [ 7026.393851]  [<ffffffffc138027a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [ 7026.402592]  [<ffffffffc132340b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ 7026.412194]  [<ffffffffc1326c44>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
      [ 7026.420225]  [<ffffffffb98bb621>] kthread+0xd1/0xe0
      [ 7026.426693]  [<ffffffffb9f205f7>] ret_from_fork_nospec_end+0x0/0x39
      [ 7026.434746]  [<ffffffffffffffff>] 0xffffffffffffffff
      [ 7026.441364] LustreError: dumping log to /tmp/lustre-log.1535518483.6456
      [ 7081.097862] INFO: task mdt00_008:4632 blocked for more than 120 seconds.
      [ 7081.107689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7081.118799] mdt00_008       D ffff9bfd2cb7eeb0     0  4632      2 0x00000080
      [ 7081.128951] Call Trace:
      [ 7081.133752]  [<ffffffffb98c8204>] ? __wake_up+0x44/0x50
      [ 7081.141630]  [<ffffffffb9f14029>] schedule+0x29/0x70
      [ 7081.149050]  [<ffffffffb9f15915>] rwsem_down_write_failed+0x225/0x3a0
      [ 7081.158119]  [<ffffffffb9b691bd>] ? list_del+0xd/0x30
      [ 7081.165473]  [<ffffffffb9b5b587>] call_rwsem_down_write_failed+0x17/0x30
      [ 7081.174637]  [<ffffffffb9f1319d>] down_write+0x2d/0x3d
      [ 7081.181924]  [<ffffffffc1840f77>] lod_qos_statfs_update+0x97/0x2b0 [lod]
      [ 7081.190933]  [<ffffffffc184307a>] lod_qos_prep_create+0x16a/0x17a0 [lod]
      [ 7081.199838]  [<ffffffffc1503d70>] ? qsd_op_begin+0xb0/0x4d0 [lquota]
      [ 7081.208310]  [<ffffffffc161fa98>] ? osd_declare_qid+0x1d8/0x480 [osd_ldiskfs]
      [ 7081.217533]  [<ffffffffc18448c5>] lod_prepare_create+0x215/0x2e0 [lod]
      [ 7081.226064]  [<ffffffffc18367ee>] lod_declare_striped_create+0x1ee/0x980 [lod]
      [ 7081.235279]  [<ffffffffc1844e8f>] ? lod_sub_declare_create+0xdf/0x210 [lod]
      [ 7081.244170]  [<ffffffffc183aec4>] lod_declare_create+0x204/0x590 [lod]
      [ 7081.252490]  [<ffffffffc182a013>] ? lod_striping_from_default+0x93/0x5b0 [lod]
      [ 7081.261638]  [<ffffffffc10dd969>] ? lu_context_refill+0x19/0x50 [obdclass]
      [ 7081.270296]  [<ffffffffc18ad892>] mdd_declare_create_object_internal+0xe2/0x2f0 [mdd]
      [ 7081.279987]  [<ffffffffc189d1c8>] mdd_declare_create+0x48/0xc10 [mdd]
      [ 7081.288085]  [<ffffffffc18a15e9>] mdd_create+0x929/0x13f0 [mdd]
      [ 7081.295636]  [<ffffffffc174ce37>] mdt_reint_open+0x2117/0x3160 [mdt]
      [ 7081.303618]  [<ffffffffc10f2278>] ? upcall_cache_get_entry+0x218/0x8b0 [obdclass]
      [ 7081.312849]  [<ffffffffc1740ce3>] mdt_reint_rec+0x83/0x210 [mdt]
      [ 7081.320409]  [<ffffffffc17201d2>] mdt_reint_internal+0x6b2/0xa80 [mdt]
      [ 7081.328539]  [<ffffffffc172c6c2>] mdt_intent_open+0x82/0x350 [mdt]
      [ 7081.336260]  [<ffffffffc10bc6f9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
      [ 7081.345041]  [<ffffffffc172a768>] mdt_intent_policy+0x2f8/0xd10 [mdt]
      [ 7081.353013]  [<ffffffffc172c640>] ? mdt_intent_fixup_resent+0x220/0x220 [mdt]
      [ 7081.361816]  [<ffffffffc12c8e9e>] ldlm_lock_enqueue+0x34e/0xa50 [ptlrpc]
      [ 7081.370088]  [<ffffffffc0fa26ee>] ? cfs_hash_add+0xbe/0x1a0 [libcfs]
      [ 7081.377989]  [<ffffffffc12f1483>] ldlm_handle_enqueue0+0x903/0x1520 [ptlrpc]
      [ 7081.386676]  [<ffffffffc13192d0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
      [ 7081.395867]  [<ffffffffc1377932>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [ 7081.403500]  [<ffffffffc138027a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
      [ 7081.412018]  [<ffffffffc1358291>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
      [ 7081.421288]  [<ffffffffc0f93bde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
      [ 7081.430002]  [<ffffffffc132340b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
      [ 7081.439396]  [<ffffffffc1320295>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
      [ 7081.447788]  [<ffffffffb98cf682>] ? default_wake_function+0x12/0x20
      [ 7081.455596]  [<ffffffffb98c52ab>] ? __wake_up_common+0x5b/0x90
      [ 7081.462945]  [<ffffffffc1326c44>] ptlrpc_main+0xb14/0x1fb0 [ptlrpc]
      [ 7081.470780]  [<ffffffffc1326130>] ? ptlrpc_register_service+0xe90/0xe90 [ptlrpc]
      [ 7081.479853]  [<ffffffffb98bb621>] kthread+0xd1/0xe0
      [ 7081.486103]  [<ffffffffb98bb550>] ? insert_kthread_work+0x40/0x40
      [ 7081.493716]  [<ffffffffb9f205f7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 7081.501800]  [<ffffffffb98bb550>] ? insert_kthread_work+0x40/0x40
      [ 7091.714953] Pid: 4632, comm: mdt00_008 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
      ...
      [ 7225.649157] LNet: Service thread pid 6456 completed after 400.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [ 7225.649191] LNet: Service thread pid 4632 completed after 334.69s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      [ 7247.049160] LustreError: 11-0: soaked-OST0002-osc-MDT0002: operation ost_create to node 192.168.1.107@o2ib failed: rc = -19
      [ 7247.063727] LustreError: 4525:0:(osp_precreate.c:647:osp_precreate_send()) soaked-OST0002-osc-MDT0002: can't precreate: rc = -19
      [ 7247.080342] LustreError: 4525:0:(osp_precreate.c:1294:osp_precreate_thread()) soaked-OST0002-osc-MDT0002: cannot precreate objects: rc = -19
      Aug 29 05:01:01 soak-10 TIME: Time stamp for console
      [ 8225.327563] LNetError: 4307:0:(lib-msg.c:594:lnet_incr_hstats()) LBUG
      [ 8225.336875] Pid: 4307, comm: kiblnd_connd 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Fri Aug 17 20:37:05 UTC 2018
      [ 8225.349882] Call Trace:
      [ 8225.354452]  [<ffffffffc0f937cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 8225.363505]  [<ffffffffc0f9387c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 8225.372109]  [<ffffffffc100d61a>] lnet_finalize+0xb7a/0xe00 [lnet]
      [ 8225.380637]  [<ffffffffc0bd15cd>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [ 8225.389573]  [<ffffffffc0bd18e6>] kiblnd_txlist_done+0x46/0x60 [ko2iblnd]
      [ 8225.398487]  [<ffffffffc0bd2360>] kiblnd_abort_txs+0xf0/0x250 [ko2iblnd]
      [ 8225.407115]  [<ffffffffc0bd6825>] kiblnd_finalise_conn+0x75/0x120 [ko2iblnd]
      [ 8225.416096]  [<ffffffffc0bdb591>] kiblnd_connd+0x251/0xa00 [ko2iblnd]
      [ 8225.424360]  [<ffffffffb98bb621>] kthread+0xd1/0xe0
      [ 8225.430835]  [<ffffffffb9f205f7>] ret_from_fork_nospec_end+0x0/0x39
      [ 8225.438857]  [<ffffffffffffffff>] 0xffffffffffffffff
      [ 8225.445439] Kernel panic - not syncing: LBUG
      [ 8225.451189] CPU: 21 PID: 4307 Comm: kiblnd_connd Tainted: P           OE  ------------   3.10.0-862.9.1.el7_lustre.x86_64 #1
      [ 8225.464746] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [ 8225.478323] Call Trace:
      [ 8225.482095]  [<ffffffffb9f0e84e>] dump_stack+0x19/0x1b
      [ 8225.488889]  [<ffffffffb9f08b50>] panic+0xe8/0x21f
      [ 8225.495277]  [<ffffffffc0f938cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [ 8225.503226]  [<ffffffffc100d61a>] lnet_finalize+0xb7a/0xe00 [lnet]
      [ 8225.511156]  [<ffffffffc0bd15cd>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [ 8225.519586]  [<ffffffffc0bd18e6>] kiblnd_txlist_done+0x46/0x60 [ko2iblnd]
      [ 8225.528203]  [<ffffffffc0bd2360>] kiblnd_abort_txs+0xf0/0x250 [ko2iblnd]
      [ 8225.536723]  [<ffffffffc0bd6825>] kiblnd_finalise_conn+0x75/0x120 [ko2iblnd]
      [ 8225.545635]  [<ffffffffc0bdb591>] kiblnd_connd+0x251/0xa00 [ko2iblnd]
      [ 8225.553845]  [<ffffffffb98cf670>] ? wake_up_state+0x20/0x20
      [ 8225.561085]  [<ffffffffc0bdb340>] ? kiblnd_check_conns+0x870/0x870 [ko2iblnd]
      [ 8225.570069]  [<ffffffffb98bb621>] kthread+0xd1/0xe0
      [ 8225.576523]  [<ffffffffb98bb550>] ? insert_kthread_work+0x40/0x40
      [ 8225.584338]  [<ffffffffb9f205f7>] ret_from_fork_nospec_begin+0x21/0x21
      [ 8225.592644]  [<ffffffffb98bb550>] ? insert_kthread_work+0x40/0x40
      [ 8225.600518] Kernel Offset: 0x38800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      

      Attachments

        Issue Links

          Activity

            People

              ashehata Amir Shehata (Inactive)
              sarah Sarah Liu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: