Details
Bug
Resolution: Duplicate
Minor
None
Lustre 2.13.0
b2.13-ib #2
3
9223372036854775807
Description
Two of the four MDSs hit this issue after SOAK was restarted. SOAK was restarted because of LU-12990. After the restart, without the update_log having been cleaned, an MDS failover was performed and the problem was hit right away. The restart process was:
1. stop soak
2. umount everything
3. reboot
4. mount
5. restart soak
soak-9 console
[ 1731.742403] Lustre: soaked-MDT0002-osp-MDT0001: Connection restored to 192.168.1.110@o2ib (at 192.168.1.110@o2ib)
[ 1731.753872] Lustre: Skipped 2 previous similar messages
[ 1736.627026] Lustre: soaked-MDT0001: Recovery over after 5:49, of 12 clients 3 recovered and 9 were evicted.
[23485.235434] LustreError: 5279:0:(osd_handler.c:2165:osd_object_release()) ASSERTION( !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode ->i_nlink == 0) ) failed:
[23485.251769] LustreError: 5279:0:(osd_handler.c:2165:osd_object_release()) LBUG
[23485.259857] Pid: 5279, comm: mdt01_001 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019
[23485.270743] Call Trace:
[23485.273503] [<ffffffffc0dab8ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[23485.280832] [<ffffffffc0dab95c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[23485.287762] [<ffffffffc169466c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[23485.295670] [<ffffffffc0eef0a8>] lu_object_put+0x198/0x3e0 [obdclass]
[23485.303009] [<ffffffffc0eb3a8a>] llog_osd_regular_fid_add_name_entry+0x27a/0x500 [obdclass]
[23485.312472] [<ffffffffc0eb4a5f>] llog_osd_declare_create+0x3af/0x710 [obdclass]
[23485.320769] [<ffffffffc0ea07f5>] llog_declare_create+0x75/0x1f0 [obdclass]
[23485.328579] [<ffffffffc0ea6fbd>] llog_cat_prep_log+0x11d/0x360 [obdclass]
[23485.336290] [<ffffffffc0ea7260>] llog_cat_declare_add_rec+0x60/0x260 [obdclass]
[23485.344586] [<ffffffffc0e9e178>] llog_declare_add+0x78/0x1a0 [obdclass]
[23485.352100] [<ffffffffc124c52e>] top_trans_start+0x17e/0x940 [ptlrpc]
[23485.359490] [<ffffffffc18ce2b4>] lod_trans_start+0x34/0x40 [lod]
[23485.366340] [<ffffffffc1989f9a>] mdd_trans_start+0x1a/0x20 [mdd]
[23485.373197] [<ffffffffc196e2f2>] mdd_create+0xbe2/0x1630 [mdd]
[23485.379838] [<ffffffffc17f8e84>] mdt_create+0xb54/0x10e0 [mdt]
[23485.386519] [<ffffffffc17f957b>] mdt_reint_create+0x16b/0x360 [mdt]
[23485.393643] [<ffffffffc17feab3>] mdt_reint_rec+0x83/0x210 [mdt]
[23485.400378] [<ffffffffc17d89e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[23485.407703] [<ffffffffc17e46d7>] mdt_reint+0x67/0x140 [mdt]
[23485.414052] [<ffffffffc123b83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[23485.421781] [<ffffffffc11dda96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[23485.430384] [<ffffffffc11e15cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[23485.437434] [<ffffffff848c50d1>] kthread+0xd1/0xe0
[23485.442910] [<ffffffff84f8cd37>] ret_from_fork_nospec_end+0x0/0x39
[23485.449927] [<ffffffffffffffff>] 0xffffffffffffffff
[23485.455515] Kernel panic - not syncing: LBUG
[23485.460280] CPU: 12 PID: 5279 Comm: mdt01_001 Kdump: loaded Tainted: G OE ------------ 3.10.0-1062.1.1.el7_lustre.x86_64 #1
[23485.473964] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[23485.486499] Call Trace:
[23485.489230] [<ffffffff84f792c2>] dump_stack+0x19/0x1b
[23485.494965] [<ffffffff84f72941>] panic+0xe8/0x21f
[23485.500316] [<ffffffffc0dab9ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[23485.507217] [<ffffffffc169466c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[23485.515102] [<ffffffffc0eef0a8>] lu_object_put+0x198/0x3e0 [obdclass]
[23485.522401] [<ffffffffc0eb3a8a>] llog_osd_regular_fid_add_name_entry+0x27a/0x500 [obdclass]
[23485.531833] [<ffffffffc0eb4a5f>] llog_osd_declare_create+0x3af/0x710 [obdclass]
[23485.540099] [<ffffffffc0ea07f5>] llog_declare_create+0x75/0x1f0 [obdclass]
[23485.547881] [<ffffffffc0ea6fbd>] llog_cat_prep_log+0x11d/0x360 [obdclass]
[23485.555566] [<ffffffffc0ea7260>] llog_cat_declare_add_rec+0x60/0x260 [obdclass]
[23485.563832] [<ffffffffc0e9e178>] llog_declare_add+0x78/0x1a0 [obdclass]
[23485.571336] [<ffffffffc124c52e>] top_trans_start+0x17e/0x940 [ptlrpc]
[23485.578627] [<ffffffffc18ce2b4>] lod_trans_start+0x34/0x40 [lod]
[23485.585432] [<ffffffffc1989f9a>] mdd_trans_start+0x1a/0x20 [mdd]
[23485.592239] [<ffffffffc196e2f2>] mdd_create+0xbe2/0x1630 [mdd]
[23485.598859] [<ffffffffc17f8e84>] mdt_create+0xb54/0x10e0 [mdt]
[23485.605489] [<ffffffffc0ecc3c4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
[23485.613270] [<ffffffffc17f957b>] mdt_reint_create+0x16b/0x360 [mdt]
[23485.620370] [<ffffffffc17feab3>] mdt_reint_rec+0x83/0x210 [mdt]
[23485.627080] [<ffffffffc17d89e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[23485.634378] [<ffffffffc17e1f67>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[23485.641963] [<ffffffffc17e46d7>] mdt_reint+0x67/0x140 [mdt]
[23485.648312] [<ffffffffc123b83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[23485.656022] [<ffffffffc1212c41>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[23485.664474] [<ffffffffc0dabcbe>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[23485.672372] [<ffffffffc11dda96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[23485.680951] [<ffffffffc11da4e1>] ? ptlrpc_wait_event+0xd1/0x3a0 [ptlrpc]
[23485.688535] [<ffffffff848d2643>] ? __wake_up+0x13/0x20
[23485.694392] [<ffffffffc11e15cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[23485.701412] [<ffffffffc11e0a20>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[23485.709666] [<ffffffff848c50d1>] kthread+0xd1/0xe0
[23485.715115] [<ffffffff848c5000>] ? insert_kthread_work+0x40/0x40
[23485.721914] [<ffffffff84f8cd37>] ret_from_fork_nospec_begin+0x21/0x21
[23485.729199] [<ffffffff848c5000>] ? insert_kthread_work+0x40/0x40
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.10.0-1062.1.1.el7_lustre.x86_64 (jenkins@trevis-306-el7-x8664-2.trevis.whamcloud.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Fri Nov 8 18:37:40 UTC 2019
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-1062.1.1.el7_lustre.x86_64 ro console=ttyS0,115200 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd disable_cpu_apicid=0 elfcorehdr=869816K
[ 0.000000] e820: BIOS-provided physical RAM map:
soak-10 console
[ 2564.640197] Lustre: soaked-MDT0002: disconnecting 9 stale clients
[ 2564.727697] Lustre: soaked-MDT0002: Recovery over after 5:00, of 12 clients 3 recovered and 9 were evicted.
[ 2567.239742] Lustre: soaked-MDT0000-osp-MDT0002: Connection restored to 192.168.1.108@o2ib (at 192.168.1.108@o2ib)
[ 2567.251222] Lustre: Skipped 8 previous similar messages
[24318.991635] LustreError: 5459:0:(osd_handler.c:2165:osd_object_release()) ASSERTION( !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode ->i_nlink == 0) ) failed:
[24319.007997] LustreError: 5459:0:(osd_handler.c:2165:osd_object_release()) LBUG
[24319.016083] Pid: 5459, comm: mdt_out01_002 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019
[24319.027375] Call Trace:
[24319.030151] [<ffffffffc0c7d8ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[24319.037494] [<ffffffffc0c7d95c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[24319.044452] [<ffffffffc158166c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[24319.052389] [<ffffffffc0dc10a8>] lu_object_put+0x198/0x3e0 [obdclass]
[24319.059807] [<ffffffffc1110b9c>] out_tx_end+0x1ec/0x5c0 [ptlrpc]
[24319.066758] [<ffffffffc1114d52>] out_handle+0x1442/0x1bb0 [ptlrpc]
[24319.073859] [<ffffffffc110d83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[24319.081638] [<ffffffffc10afa96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[24319.090301] [<ffffffffc10b35cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[24319.097391] [<ffffffff8bcc50d1>] kthread+0xd1/0xe0
[24319.102906] [<ffffffff8c38cd37>] ret_from_fork_nospec_end+0x0/0x39
[24319.109926] [<ffffffffffffffff>] 0xffffffffffffffff
[24319.115541] Kernel panic - not syncing: LBUG
[24319.120317] CPU: 12 PID: 5459 Comm: mdt_out01_002 Kdump: loaded Tainted: G OE ------------ 3.10.0-1062.1.1.el7_lustre.x86_64 #1
[24319.134391] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[24319.146920] Call Trace:
[24319.149656] [<ffffffff8c3792c2>] dump_stack+0x19/0x1b
[24319.155410] [<ffffffff8c372941>] panic+0xe8/0x21f
[24319.160762] [<ffffffffc0c7d9ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[24319.167668] [<ffffffffc158166c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[24319.175562] [<ffffffffc0dc10a8>] lu_object_put+0x198/0x3e0 [obdclass]
[24319.182897] [<ffffffffc1110b9c>] out_tx_end+0x1ec/0x5c0 [ptlrpc]
[24319.189744] [<ffffffffc1114d52>] out_handle+0x1442/0x1bb0 [ptlrpc]
[24319.196784] [<ffffffffc110d83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[24319.204509] [<ffffffffc10e4c41>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[24319.212965] [<ffffffffc0c7dcbe>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[24319.220877] [<ffffffffc10afa96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[24319.229471] [<ffffffffc10ac4e1>] ? ptlrpc_wait_event+0xd1/0x3a0 [ptlrpc]
[24319.237076] [<ffffffff8bcd2643>] ? __wake_up+0x13/0x20
[24319.242946] [<ffffffffc10b35cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[24319.249988] [<ffffffffc10b2a20>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[24319.258245] [<ffffffff8bcc50d1>] kthread+0xd1/0xe0
[24319.263695] [<ffffffff8bcc5000>] ? insert_kthread_work+0x40/0x40
[24319.270500] [<ffffffff8c38cd37>] ret_from_fork_nospec_begin+0x21/0x21
[24319.277791] [<ffffffff8bcc5000>] ? insert_kthread_work+0x40/0x40
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
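Note on the assertion
The same ASSERTION text appears on both soak-9 and soak-10. As an aid to reading it, the condition can be restated as a small predicate. This is only an illustrative Python model of the logged expression, not Lustre code: the class and function names are hypothetical, and only the field names oo_destroyed, oo_inode and i_nlink are taken from the console output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inode:
    # Stand-in for the kernel inode; only the link count is modelled.
    i_nlink: int


@dataclass
class OsdObject:
    # Hypothetical model of the two fields named in the assertion text.
    oo_destroyed: int
    oo_inode: Optional[Inode]


def release_assertion_holds(o: OsdObject) -> bool:
    """Restates the logged condition:
    !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode->i_nlink == 0)
    """
    return not (
        o.oo_destroyed == 0
        and o.oo_inode is not None
        and o.oo_inode.i_nlink == 0
    )


# The only failing combination: not marked destroyed, inode present,
# link count already zero.
failing = OsdObject(oo_destroyed=0, oo_inode=Inode(i_nlink=0))
```

Read this way, the LBUG fires only when an object reaches release with an inode whose link count is zero while the object itself has not been marked destroyed. Every other combination satisfies the assertion.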