[LU-15880] ASSERTION( lqe->u.se.lse_pending_write == 0 ) Created: 22/May/22 Updated: 30/Nov/23 Resolved: 31/Jan/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Hongchao Zhang | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
[2893639.878072] LustreError: 11121:0:(osp_object.c:594:osp_attr_get()) crex-MDT0001-osp-MDT0000:osp_attr_get update error [0x20000000a:0x1:0x0]: rc = -5
[2893639.895830] LustreError: 11121:0:(osp_object.c:594:osp_attr_get()) Skipped 172 previous similar messages
[2893639.921219] LustreError: 11121:0:(llog_cat.c:458:llog_cat_close()) crex-MDT0001-osp-MDT0000: failure destroying log during cleanup: rc = -5
[2893639.921220] LustreError: 11121:0:(llog_cat.c:458:llog_cat_close()) Skipped 1 previous similar message
[2893643.015127] LustreError: 11121:0:(lquota_entry.c:132:lqe_iter_cb()) ASSERTION( lqe->u.se.lse_pending_write == 0 ) failed:
[2893643.029044] LustreError: 11121:0:(lquota_entry.c:132:lqe_iter_cb()) LBUG
[2893643.038158] Pid: 11121, comm: umount 3.10.0-1160.49.1.el7_lustre.ddn16.x86_64 #1 SMP Mon Dec 20 11:42:01 PST 2021
[2893643.038159] Call Trace:
[2893643.038177] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[2893643.038182] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[2893643.038190] [<0>] lqe_iter_cb+0x147/0x150 [lquota]
[2893643.038196] [<0>] cfs_hash_for_each_tight+0x11e/0x320 [libcfs]
[2893643.038201] [<0>] cfs_hash_for_each_safe+0x13/0x20 [libcfs]
[2893643.038205] [<0>] lquota_site_free+0x11c/0x310 [lquota]
[2893643.038209] [<0>] qsd_qtype_fini+0x94/0x530 [lquota]
[2893643.038213] [<0>] qsd_fini+0xd7/0x500 [lquota]
[2893643.038221] [<0>] osd_shutdown+0x66/0x2e0 [osd_ldiskfs]
[2893643.038229] [<0>] osd_process_config+0x277/0x360 [osd_ldiskfs]
[2893643.038238] [<0>] lod_process_config+0x24d/0x1540 [lod]
[2893643.038247] [<0>] mdd_process_config+0x146/0x5f0 [mdd]
[2893643.038260] [<0>] mdt_stack_fini+0x2c2/0xca0 [mdt]
[2893643.038266] [<0>] mdt_device_fini+0x34b/0x930 [mdt]
[2893643.038294] [<0>] class_cleanup+0x9b8/0xc50 [obdclass]
[2893643.038306] [<0>] class_process_config+0x65c/0x2830 [obdclass]
[2893643.038319] [<0>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[2893643.038335] [<0>] server_put_super+0xa35/0x1150 [obdclass]
[2893643.038338] [<0>] generic_shutdown_super+0x6d/0x100
[2893643.038340] [<0>] kill_anon_super+0x12/0x20
[2893643.038352] [<0>] lustre_kill_super+0x32/0x50 [obdclass]
[2893643.038353] [<0>] deactivate_locked_super+0x4e/0x70
[2893643.038355] [<0>] deactivate_super+0x46/0x60
[2893643.038356] [<0>] cleanup_mnt+0x3f/0x80
[2893643.038358] [<0>] __cleanup_mnt+0x12/0x20
[2893643.038360] [<0>] task_work_run+0xbb/0xe0
[2893643.038363] [<0>] do_notify_resume+0xa5/0xc0
[2893643.038365] [<0>] int_signal+0x12/0x17
[2893643.038379] [<0>] 0xfffffffffffffffe

The following errors are also related to this issue:

LustreError: 25405:0:(qmt_handler.c:516:qmt_dqacq0()) $$$ Release too much! uuid |
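To illustrate the failing path in the trace above: at unmount, lquota_site_free() walks the hash of per-ID quota entries and lqe_iter_cb() asserts that no entry still has a write in flight. A minimal sketch of that check, assuming a simplified entry structure (only the function and field names that appear in the stack trace are real; everything else here is illustrative):

```c
/* Hypothetical sketch of the check that LBUGs in lquota_entry.c:
 * during teardown, every quota entry must have lse_pending_write == 0,
 * i.e. no write was granted quota space but never accounted back. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lquota_entry {
	uint64_t lse_pending_write;	/* bytes granted to in-flight writes */
	struct lquota_entry *next;	/* stand-in for the cfs_hash linkage */
};

/* Mirrors cfs_hash_for_each_safe() invoking lqe_iter_cb() on each
 * entry at site-free time; returns the number of entries checked. */
static int lquota_site_free_check(struct lquota_entry *head)
{
	int checked = 0;

	for (struct lquota_entry *lqe = head; lqe != NULL; lqe = lqe->next) {
		/* the assertion from the trace: a pending write at
		 * unmount means quota state was left inconsistent */
		assert(lqe->lse_pending_write == 0);
		checked++;
	}
	return checked;
}
```

An entry freed while lse_pending_write is still non-zero is exactly the condition that trips the ASSERTION/LBUG reported here.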
| Comments |
| Comment by Gerrit Updater [ 22/May/22 ] |
|
"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47425 |
| Comment by Gerrit Updater [ 18/Jul/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47425/ |
| Comment by Peter Jones [ 20/Jul/22 ] |
|
Landed for 2.16 |
| Comment by Gerrit Updater [ 13/Oct/22 ] |
|
"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48856 |
| Comment by Gerrit Updater [ 29/Oct/22 ] |
|
"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48981 |
| Comment by Peter Jones [ 26/Nov/22 ] |
|
Reopening to track new patches |
| Comment by Sergey Cheremencev [ 30/Dec/22 ] |
|
I was thinking about how to automatically fix the problems causing the "Release too much! ..." message. There is already a mechanism to release space at the QSD: release when usage + qunit < granted. The problem is that the QMT cannot release more than what is stored in lqe_granted, and it is not clear how to obtain a correct lqe_granted in such a case. Even if the target sends its actual usage (which is always correct, since it comes from the lower level), the QMT does not know this target's contribution to the total granted. To get a 100% correct total granted, we would have to recalculate it from scratch. There is already a qmt_pool_recalc that recalculates for all IDs in the pool; we could probably change it to recalculate only a specific UID. Also, qsd release currently does not fill qb_usage, which would also need to change. I.e. the algorithm at the QMT (qmt_dqacq0): |
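The two quantities discussed above can be sketched in a few lines of C. This is not Lustre code, just an illustration of the arithmetic under the stated assumptions: the QSD offers back grant when usage + qunit < granted, and the QMT clamps any release at lqe_granted, emitting "Release too much!" when a target tries to return more than the QMT believes it granted (function names here are hypothetical):

```c
/* Illustration of the QSD release heuristic and the QMT-side clamp
 * described in the comment above. All names are made up for the
 * sketch; only lqe_granted corresponds to a real Lustre field. */
#include <assert.h>
#include <stdint.h>

/* QSD side: amount of grant a target would offer to release. */
static uint64_t qsd_release_amount(uint64_t usage, uint64_t granted,
				   uint64_t qunit)
{
	if (usage + qunit < granted)
		return granted - (usage + qunit);
	return 0;
}

/* QMT side: clamp the requested release to the recorded grant.
 * A request above lqe_granted is the "Release too much!" case. */
static uint64_t qmt_apply_release(uint64_t *lqe_granted, uint64_t requested,
				  int *too_much)
{
	uint64_t released = requested;

	*too_much = requested > *lqe_granted;
	if (*too_much)
		released = *lqe_granted; /* never go below zero granted */
	*lqe_granted -= released;
	return released;
}
```

The clamp is what keeps lqe_granted consistent on the QMT; the open question in the comment is how to repair lqe_granted itself once the clamp has fired, which is where a per-UID variant of qmt_pool_recalc would come in.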
| Comment by Gerrit Updater [ 31/Jan/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48981/ |
| Comment by Peter Jones [ 31/Jan/23 ] |
|
Landed for 2.16 |
| Comment by Stephane Thiell [ 06/Jul/23 ] |
|
Just hit this assertion with 2.15.3 during MDT unmount:

Jul 05 19:37:27 fir-md1-s2 kernel: LustreError: 25778:0:(lquota_entry.c:135:lqe_iter_cb()) ASSERTION( lqe->u.se.lse_pending_write == 0 ) failed:
Jul 05 19:37:27 fir-md1-s2 kernel: LustreError: 25778:0:(lquota_entry.c:135:lqe_iter_cb()) LBUG
Jul 05 19:37:27 fir-md1-s2 kernel: Pid: 25778, comm: umount 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1 SMP Tue Jun 20 15:47:49 PDT 2023
Jul 05 19:37:27 fir-md1-s2 kernel: Call Trace:
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lqe_iter_cb+0x153/0x160 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] cfs_hash_for_each_tight+0x11e/0x340 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] cfs_hash_for_each_safe+0x13/0x20 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lquota_site_free+0x104/0x300 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] qsd_qtype_fini+0x8c/0x510 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] qsd_fini+0x130/0x500 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] osd_shutdown+0x66/0x300 [osd_ldiskfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] osd_process_config+0x277/0x380 [osd_ldiskfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lod_process_config+0x270/0x1340 [lod]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] mdd_process_config+0x146/0x640 [mdd]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] mdt_stack_fini+0x2c2/0xd10 [mdt]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] mdt_device_fini+0x34b/0x9a0 [mdt]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] class_cleanup+0xa61/0xd20 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] class_process_config+0x537/0x2670 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] class_manual_cleanup+0x1c6/0x760 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] server_put_super+0xa25/0xf80 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] generic_shutdown_super+0x6d/0x100
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] kill_anon_super+0x12/0x20
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lustre_kill_super+0x2b/0x40 [lustre]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] deactivate_locked_super+0x56/0x70
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] deactivate_super+0x4a/0x70
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] cleanup_mnt+0x3f/0x90
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] __cleanup_mnt+0x12/0x20
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] task_work_run+0xbb/0xe0
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] do_notify_resume+0xad/0xd0
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] int_signal+0x12/0x17
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] 0xfffffffffffffffe
Jul 05 19:37:27 fir-md1-s2 kernel: Kernel panic - not syncing: LBUG |
| Comment by Gerrit Updater [ 06/Jul/23 ] |
|
"Stephane Thiell <sthiell@stanford.edu>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51588 |
| Comment by Gerrit Updater [ 30/Nov/23 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53290 |