[LU-13000] MDS hit (osd_handler.c:2165:osd_object_release()) ASSERTION( !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode->i_nlink == 0) ) failed
Created: 22/Nov/19  Updated: 04/Dec/20  Resolved: 04/Dec/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sarah Liu Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: soak
Environment:

b2.13-ib #2


Issue Links:
Related
is related to LU-13980 Kernel panic on OST after removing fi... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

2 of 4 MDSs hit this issue after SOAK was restarted. SOAK was restarted because of LU-12990. After the restart, with the update_log not cleaned and MDS failover performed, the problem appeared right away. The restart process was:
1. stop soak
2. umount everything
3. reboot
4. mount
5. restart soak

soak-9 console

[ 1731.742403] Lustre: soaked-MDT0002-osp-MDT0001: Connection restored to 192.168.1.110@o2ib (at 192.168.1.110@o2ib)
[ 1731.753872] Lustre: Skipped 2 previous similar messages
[ 1736.627026] Lustre: soaked-MDT0001: Recovery over after 5:49, of 12 clients 3 recovered and 9 were evicted.
[23485.235434] LustreError: 5279:0:(osd_handler.c:2165:osd_object_release()) ASSERTION( !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode->i_nlink == 0) ) failed:
[23485.251769] LustreError: 5279:0:(osd_handler.c:2165:osd_object_release()) LBUG
[23485.259857] Pid: 5279, comm: mdt01_001 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019
[23485.270743] Call Trace:
[23485.273503]  [<ffffffffc0dab8ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[23485.280832]  [<ffffffffc0dab95c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[23485.287762]  [<ffffffffc169466c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[23485.295670]  [<ffffffffc0eef0a8>] lu_object_put+0x198/0x3e0 [obdclass]
[23485.303009]  [<ffffffffc0eb3a8a>] llog_osd_regular_fid_add_name_entry+0x27a/0x500 [obdclass]
[23485.312472]  [<ffffffffc0eb4a5f>] llog_osd_declare_create+0x3af/0x710 [obdclass]
[23485.320769]  [<ffffffffc0ea07f5>] llog_declare_create+0x75/0x1f0 [obdclass]
[23485.328579]  [<ffffffffc0ea6fbd>] llog_cat_prep_log+0x11d/0x360 [obdclass]
[23485.336290]  [<ffffffffc0ea7260>] llog_cat_declare_add_rec+0x60/0x260 [obdclass]
[23485.344586]  [<ffffffffc0e9e178>] llog_declare_add+0x78/0x1a0 [obdclass]
[23485.352100]  [<ffffffffc124c52e>] top_trans_start+0x17e/0x940 [ptlrpc]
[23485.359490]  [<ffffffffc18ce2b4>] lod_trans_start+0x34/0x40 [lod]
[23485.366340]  [<ffffffffc1989f9a>] mdd_trans_start+0x1a/0x20 [mdd]
[23485.373197]  [<ffffffffc196e2f2>] mdd_create+0xbe2/0x1630 [mdd]
[23485.379838]  [<ffffffffc17f8e84>] mdt_create+0xb54/0x10e0 [mdt]
[23485.386519]  [<ffffffffc17f957b>] mdt_reint_create+0x16b/0x360 [mdt]
[23485.393643]  [<ffffffffc17feab3>] mdt_reint_rec+0x83/0x210 [mdt]
[23485.400378]  [<ffffffffc17d89e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[23485.407703]  [<ffffffffc17e46d7>] mdt_reint+0x67/0x140 [mdt]
[23485.414052]  [<ffffffffc123b83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[23485.421781]  [<ffffffffc11dda96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[23485.430384]  [<ffffffffc11e15cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[23485.437434]  [<ffffffff848c50d1>] kthread+0xd1/0xe0
[23485.442910]  [<ffffffff84f8cd37>] ret_from_fork_nospec_end+0x0/0x39
[23485.449927]  [<ffffffffffffffff>] 0xffffffffffffffff
[23485.455515] Kernel panic - not syncing: LBUG
[23485.460280] CPU: 12 PID: 5279 Comm: mdt01_001 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.1.1.el7_lustre.x86_64 #1
[23485.473964] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[23485.486499] Call Trace:
[23485.489230]  [<ffffffff84f792c2>] dump_stack+0x19/0x1b
[23485.494965]  [<ffffffff84f72941>] panic+0xe8/0x21f
[23485.500316]  [<ffffffffc0dab9ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[23485.507217]  [<ffffffffc169466c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[23485.515102]  [<ffffffffc0eef0a8>] lu_object_put+0x198/0x3e0 [obdclass]
[23485.522401]  [<ffffffffc0eb3a8a>] llog_osd_regular_fid_add_name_entry+0x27a/0x500 [obdclass]
[23485.531833]  [<ffffffffc0eb4a5f>] llog_osd_declare_create+0x3af/0x710 [obdclass]
[23485.540099]  [<ffffffffc0ea07f5>] llog_declare_create+0x75/0x1f0 [obdclass]
[23485.547881]  [<ffffffffc0ea6fbd>] llog_cat_prep_log+0x11d/0x360 [obdclass]
[23485.555566]  [<ffffffffc0ea7260>] llog_cat_declare_add_rec+0x60/0x260 [obdclass]
[23485.563832]  [<ffffffffc0e9e178>] llog_declare_add+0x78/0x1a0 [obdclass]
[23485.571336]  [<ffffffffc124c52e>] top_trans_start+0x17e/0x940 [ptlrpc]
[23485.578627]  [<ffffffffc18ce2b4>] lod_trans_start+0x34/0x40 [lod]
[23485.585432]  [<ffffffffc1989f9a>] mdd_trans_start+0x1a/0x20 [mdd]
[23485.592239]  [<ffffffffc196e2f2>] mdd_create+0xbe2/0x1630 [mdd]
[23485.598859]  [<ffffffffc17f8e84>] mdt_create+0xb54/0x10e0 [mdt]
[23485.605489]  [<ffffffffc0ecc3c4>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
[23485.613270]  [<ffffffffc17f957b>] mdt_reint_create+0x16b/0x360 [mdt]
[23485.620370]  [<ffffffffc17feab3>] mdt_reint_rec+0x83/0x210 [mdt]
[23485.627080]  [<ffffffffc17d89e0>] mdt_reint_internal+0x7b0/0xba0 [mdt]
[23485.634378]  [<ffffffffc17e1f67>] ? mdt_thread_info_init+0xa7/0x1e0 [mdt]
[23485.641963]  [<ffffffffc17e46d7>] mdt_reint+0x67/0x140 [mdt]
[23485.648312]  [<ffffffffc123b83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[23485.656022]  [<ffffffffc1212c41>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[23485.664474]  [<ffffffffc0dabcbe>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[23485.672372]  [<ffffffffc11dda96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[23485.680951]  [<ffffffffc11da4e1>] ? ptlrpc_wait_event+0xd1/0x3a0 [ptlrpc]
[23485.688535]  [<ffffffff848d2643>] ? __wake_up+0x13/0x20
[23485.694392]  [<ffffffffc11e15cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[23485.701412]  [<ffffffffc11e0a20>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[23485.709666]  [<ffffffff848c50d1>] kthread+0xd1/0xe0
[23485.715115]  [<ffffffff848c5000>] ? insert_kthread_work+0x40/0x40
[23485.721914]  [<ffffffff84f8cd37>] ret_from_fork_nospec_begin+0x21/0x21
[23485.729199]  [<ffffffff848c5000>] ? insert_kthread_work+0x40/0x40
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-1062.1.1.el7_lustre.x86_64 (jenkins@trevis-306-el7-x8664-2.trevis.whamcloud.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Fri Nov 8 18:37:40 UTC 2019
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.10.0-1062.1.1.el7_lustre.x86_64 ro console=ttyS0,115200 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd disable_cpu_apicid=0 elfcorehdr=869816K
[    0.000000] e820: BIOS-provided physical RAM map:

soak-10 console

[ 2564.640197] Lustre: soaked-MDT0002: disconnecting 9 stale clients
[ 2564.727697] Lustre: soaked-MDT0002: Recovery over after 5:00, of 12 clients 3 recovered and 9 were evicted.
[ 2567.239742] Lustre: soaked-MDT0000-osp-MDT0002: Connection restored to 192.168.1.108@o2ib (at 192.168.1.108@o2ib)
[ 2567.251222] Lustre: Skipped 8 previous similar messages
[24318.991635] LustreError: 5459:0:(osd_handler.c:2165:osd_object_release()) ASSERTION( !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode->i_nlink == 0) ) failed:
[24319.007997] LustreError: 5459:0:(osd_handler.c:2165:osd_object_release()) LBUG
[24319.016083] Pid: 5459, comm: mdt_out01_002 3.10.0-1062.1.1.el7_lustre.x86_64 #1 SMP Fri Nov 8 18:37:40 UTC 2019
[24319.027375] Call Trace:
[24319.030151]  [<ffffffffc0c7d8ac>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[24319.037494]  [<ffffffffc0c7d95c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[24319.044452]  [<ffffffffc158166c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[24319.052389]  [<ffffffffc0dc10a8>] lu_object_put+0x198/0x3e0 [obdclass]
[24319.059807]  [<ffffffffc1110b9c>] out_tx_end+0x1ec/0x5c0 [ptlrpc]
[24319.066758]  [<ffffffffc1114d52>] out_handle+0x1442/0x1bb0 [ptlrpc]
[24319.073859]  [<ffffffffc110d83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[24319.081638]  [<ffffffffc10afa96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[24319.090301]  [<ffffffffc10b35cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[24319.097391]  [<ffffffff8bcc50d1>] kthread+0xd1/0xe0
[24319.102906]  [<ffffffff8c38cd37>] ret_from_fork_nospec_end+0x0/0x39
[24319.109926]  [<ffffffffffffffff>] 0xffffffffffffffff
[24319.115541] Kernel panic - not syncing: LBUG
[24319.120317] CPU: 12 PID: 5459 Comm: mdt_out01_002 Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.1.1.el7_lustre.x86_64 #1
[24319.134391] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[24319.146920] Call Trace:
[24319.149656]  [<ffffffff8c3792c2>] dump_stack+0x19/0x1b
[24319.155410]  [<ffffffff8c372941>] panic+0xe8/0x21f
[24319.160762]  [<ffffffffc0c7d9ab>] lbug_with_loc+0x9b/0xa0 [libcfs]
[24319.167668]  [<ffffffffc158166c>] osd_object_release+0x7c/0x80 [osd_ldiskfs]
[24319.175562]  [<ffffffffc0dc10a8>] lu_object_put+0x198/0x3e0 [obdclass]
[24319.182897]  [<ffffffffc1110b9c>] out_tx_end+0x1ec/0x5c0 [ptlrpc]
[24319.189744]  [<ffffffffc1114d52>] out_handle+0x1442/0x1bb0 [ptlrpc]
[24319.196784]  [<ffffffffc110d83a>] tgt_request_handle+0x98a/0x1630 [ptlrpc]
[24319.204509]  [<ffffffffc10e4c41>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
[24319.212965]  [<ffffffffc0c7dcbe>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
[24319.220877]  [<ffffffffc10afa96>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
[24319.229471]  [<ffffffffc10ac4e1>] ? ptlrpc_wait_event+0xd1/0x3a0 [ptlrpc]
[24319.237076]  [<ffffffff8bcd2643>] ? __wake_up+0x13/0x20
[24319.242946]  [<ffffffffc10b35cc>] ptlrpc_main+0xbac/0x1540 [ptlrpc]
[24319.249988]  [<ffffffffc10b2a20>] ? ptlrpc_register_service+0xf90/0xf90 [ptlrpc]
[24319.258245]  [<ffffffff8bcc50d1>] kthread+0xd1/0xe0
[24319.263695]  [<ffffffff8bcc5000>] ? insert_kthread_work+0x40/0x40
[24319.270500]  [<ffffffff8c38cd37>] ret_from_fork_nospec_begin+0x21/0x21
[24319.277791]  [<ffffffff8bcc5000>] ? insert_kthread_work+0x40/0x40
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct


 Comments   
Comment by Andreas Dilger [ 04/Dec/20 ]

This assertion is removed with the patch https://review.whamcloud.com/40058 "LU-13980 osd: remove osd_object_release LASSERT".
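
For readers unfamiliar with the check, the asserted expression can be modeled in user-space C. This is a minimal sketch, not the Lustre source: the struct names `model_inode` and `model_osd_object` and both functions are hypothetical, and only the field names and the boolean expression are taken from the console message in the description. It illustrates that the LBUG fires when an object that is not marked destroyed still holds an inode whose link count has dropped to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel inode and the OSD object.
 * Only the fields named in the assertion message are modeled. */
struct model_inode {
    unsigned int i_nlink;           /* link count of the backing inode */
};

struct model_osd_object {
    struct model_inode *oo_inode;   /* may be NULL if no inode is attached */
    unsigned int oo_destroyed;      /* non-zero once the object is marked destroyed */
};

/* Returns true when the asserted expression holds, false when the
 * assertion from the console log would fail. */
static bool release_assertion_holds(const struct model_osd_object *o)
{
    return !(o->oo_destroyed == 0 && o->oo_inode && o->oo_inode->i_nlink == 0);
}

/* Assumed shape of a release path that enforces the check; assert()
 * stands in for the kernel-side LASSERT and aborts on failure. */
static void model_object_release(const struct model_osd_object *o)
{
    assert(release_assertion_holds(o));
}
```

Under this model, the only failing combination is `oo_destroyed == 0` with a non-NULL `oo_inode` whose `i_nlink` is 0. A destroyed object, an object without an inode, or an object whose inode still has links all pass.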
