[LU-14110] Race during several client mount instances (--> rmmod lustre hang) Created: 03/Nov/20 Updated: 26/Apr/22 Resolved: 22/Mar/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.5 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Etienne Aujames | Assignee: | Etienne Aujames |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | obdclass |
| Environment: | VMs with Lustre 2.12.5/master on ldiskfs |
| Issue Links: | |
| Epic/Theme: | client |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I created this ticket to follow up on the issue seen by @apercher (cf.).

Here are the commands/configs to reproduce the issue:

fstab:

<serv1@ib1>:<serv2@ib1>:/fs1 /mnt/fs1 lustre defaults,_netdev,noauto,x-systemd.requires=lnet.service,flock,user_xattr,nosuid 0 0
<serv1@ib1>:<serv2@ib1>:/fs1/home /mnt/home lustre defaults,_netdev,noauto,x-systemd.requires=lnet.service,flock,user_xattr,nosuid 0 0

commands:

while true; do
  mount /mnt/home &
  mount /mnt/fs1
  umount /mnt/home
  umount /mnt/fs1
  lustre_rmmod
done

After some iterations, "rmmod lustre" will hang in "lu_context_key_degister".

dmesg (master branch):

[ 1560.484463] INFO: task rmmod:6430 blocked for more than 120 seconds.
[ 1560.484480] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1560.484496] rmmod D ffff9ddbdfd9acc0 0 6430 6396 0x00000080
[ 1560.484499] Call Trace:
[ 1560.484504] [<ffffffff8b0266d2>] ? kmem_cache_free+0x1e2/0x200
[ 1560.484508] [<ffffffff8b585da9>] schedule+0x29/0x70
[ 1560.484531] [<ffffffffc0a0284d>] lu_context_key_degister+0xcd/0x150 [obdclass]
[ 1560.484534] [<ffffffff8aec7880>] ? wake_bit_function_rh+0x40/0x40
[ 1560.484548] [<ffffffffc0a02a72>] lu_context_key_degister_many+0x72/0xb0 [obdclass]
[ 1560.484550] [<ffffffff8b0266d2>] ? kmem_cache_free+0x1e2/0x200
[ 1560.484564] [<ffffffffc0d67347>] vvp_type_fini+0x27/0x30 [lustre]
[ 1560.484577] [<ffffffffc09fc01b>] lu_device_type_fini+0x1b/0x20 [obdclass]
[ 1560.484586] [<ffffffffc0d68d75>] vvp_global_fini+0x15/0x30 [lustre]
[ 1560.484596] [<ffffffffc0d7beb4>] lustre_exit+0x31/0x17d [lustre]
[ 1560.484599] [<ffffffff8af1c46e>] SyS_delete_module+0x19e/0x310
[ 1560.484601] [<ffffffff8b592e09>] ? system_call_after_swapgs+0x96/0x13a
[ 1560.484603] [<ffffffff8b592e15>] ? system_call_after_swapgs+0xa2/0x13a
[ 1560.484604] [<ffffffff8b592e09>] ? system_call_after_swapgs+0x96/0x13a
[ 1560.484606] [<ffffffff8b592e15>] ? system_call_after_swapgs+0xa2/0x13a
[ 1560.484607] [<ffffffff8b592e09>] ? system_call_after_swapgs+0x96/0x13a
[ 1560.484609] [<ffffffff8b592ed2>] system_call_fastpath+0x25/0x2a
[ 1560.484611] [<ffffffff8b592e15>] ? system_call_after_swapgs+0xa2/0x13a

crash backtrace (master branch):
crash> bt -F 6430
PID: 6430 TASK: ffff9ddbd5c0c1c0 CPU: 3 COMMAND: "rmmod"
#0 [ffff9ddbd5d2bd18] __schedule at ffffffff8b5858fa
ffff9ddbd5d2bd20: 0000000000000082 ffff9ddbd5d2bfd8
ffff9ddbd5d2bd30: ffff9ddbd5d2bfd8 ffff9ddbd5d2bfd8
ffff9ddbd5d2bd40: 000000000001acc0 [task_struct]
ffff9ddbd5d2bd50: kmem_cache_free+482 [dm_rq_target_io]
ffff9ddbd5d2bd60: 0000000000000000 00000000a8325962
ffff9ddbd5d2bd70: 0000000000000246 ll_thread_key
ffff9ddbd5d2bd80: bit_wait_table+2664 ffff9ddbd5d2bdd8
ffff9ddbd5d2bd90: 0000000000000000 0000000000000000
ffff9ddbd5d2bda0: ffff9ddbd5d2bdb0 schedule+41
#1 [ffff9ddbd5d2bda8] schedule at ffffffff8b585da9
ffff9ddbd5d2bdb0: ffff9ddbd5d2be20 lu_context_key_degister+205
#2 [ffff9ddbd5d2bdb8] lu_context_key_degister at ffffffffc0a0284d [obdclass]
ffff9ddbd5d2bdc0: ll_thread_key+36 00000000ffffffff
ffff9ddbd5d2bdd0: 0000000000000000 0000000000000000
ffff9ddbd5d2bde0: [task_struct] var_wake_function
ffff9ddbd5d2bdf0: bit_wait_table+2672 bit_wait_table+2672
ffff9ddbd5d2be00: 00000000a8325962 fffffffffffffff5
ffff9ddbd5d2be10: __this_module 0000000000000800
ffff9ddbd5d2be20: ffff9ddbd5d2be80 lu_context_key_degister_many+114
#3 [ffff9ddbd5d2be28] lu_context_key_degister_many at ffffffffc0a02a72 [obdclass]
ffff9ddbd5d2be30: ffff9ddb00000008 ffff9ddbd5d2be90
ffff9ddbd5d2be40: ffff9ddbd5d2be50 00000000a8325962
ffff9ddbd5d2be50: kmem_cache_free+482 vvp_session_key
ffff9ddbd5d2be60: vvp_thread_key 0000000000000000
crash> sym ll_thread_key
ffffffffc0da4a00 (D) ll_thread_key [lustre]
crash> struct lu_context_key ll_thread_key
struct lu_context_key {
lct_tags = 1073741832,
lct_init = 0xffffffffc0d67d20 <ll_thread_key_init>,
lct_fini = 0xffffffffc0d67e30 <ll_thread_key_fini>,
lct_exit = 0x0,
lct_index = 14,
lct_used = {
counter = 1
},
lct_owner = 0xffffffffc0da8b80 <__this_module>,
lct_reference = {<No data fields>}
}
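For reference, the struct dump above shows ll_thread_key still holding lct_used == 1 (which appears to be its quiescent value after registration) while rmmod sleeps inside lu_context_key_degister(). The var_wake_function and bit_wait_table entries in the stack suggest a wait-for-counter pattern along the lines of the upstream wait_var_event()/wake_up_var() API (or a bit-wait equivalent on older kernels, as the wake_bit_function_rh frame hints). The fragment below is only a minimal sketch of that pattern, not the Lustre source; the demo_* names are invented for illustration.

/*
 * Minimal sketch of a "wait until the key's user count drains" pattern.
 * demo_key/demo_* are invented names; they only mirror the shape of
 * struct lu_context_key's lct_used handling, not the real code.
 */
#include <linux/atomic.h>
#include <linux/wait_bit.h>	/* wait_var_event(), wake_up_var() */

struct demo_key {
	atomic_t lct_used;	/* 1 == registered, no active users */
};

/* context side: take/drop a user reference on the key */
static void demo_key_get(struct demo_key *key)
{
	atomic_inc(&key->lct_used);
}

static void demo_key_put(struct demo_key *key)
{
	/* the decrement must be paired with a wake-up, otherwise a
	 * waiter that went to sleep just before it never runs again */
	if (atomic_dec_return(&key->lct_used) == 1)
		wake_up_var(&key->lct_used);
}

/* module-unload side: must not return while users remain */
static void demo_key_degister(struct demo_key *key)
{
	wait_var_event(&key->lct_used,
		       atomic_read(&key->lct_used) == 1);
}

With this kind of scheme, a task stuck in the degister path even though the counter already reads 1 (as in the dump) is consistent with a lost wake-up or with a concurrent mount instance registering/using the key in the wrong window, which is exactly what the mount/umount/lustre_rmmod loop above keeps provoking. See the linked patches below for the actual fix.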
The issue seems to occur more frequently on the b2_12 branch.
|
| Comments |
| Comment by Gerrit Updater [ 06/Nov/20 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/40561 |
| Comment by Gerrit Updater [ 06/Nov/20 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/40565 |
| Comment by Gerrit Updater [ 22/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40565/ |
| Comment by Peter Jones [ 22/Mar/21 ] |
|
Landed for 2.15 |