[LU-15880] ASSERTION( lqe->u.se.lse_pending_write == 0 ) Created: 22/May/22  Updated: 30/Nov/23  Resolved: 31/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: Bug Priority: Minor
Reporter: Hongchao Zhang Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-17125 jbd2 commit deadlock Closed
Related
is related to LU-15927 LustreError: 25405:0:(qmt_handler.c:5... Resolved
is related to LU-16529 sanity-quota test_84: pool grant is n... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
[2893639.878072] LustreError: 11121:0:(osp_object.c:594:osp_attr_get()) crex-MDT0001-osp-MDT0000:osp_attr_get update error [0x20000000a:0x1:0x0]: rc = -5
[2893639.895830] LustreError: 11121:0:(osp_object.c:594:osp_attr_get()) Skipped 172 previous similar messages
[2893639.921219] LustreError: 11121:0:(llog_cat.c:458:llog_cat_close()) crex-MDT0001-osp-MDT0000: failure destroying log during cleanup: rc = -5
[2893639.921220] LustreError: 11121:0:(llog_cat.c:458:llog_cat_close()) Skipped 1 previous similar message
[2893643.015127] LustreError: 11121:0:(lquota_entry.c:132:lqe_iter_cb()) ASSERTION( lqe->u.se.lse_pending_write == 0 ) failed: 
[2893643.029044] LustreError: 11121:0:(lquota_entry.c:132:lqe_iter_cb()) LBUG
[2893643.038158] Pid: 11121, comm: umount 3.10.0-1160.49.1.el7_lustre.ddn16.x86_64 #1 SMP Mon Dec 20 11:42:01 PST 2021
[2893643.038159] Call Trace:
[2893643.038177] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
[2893643.038182] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
[2893643.038190] [<0>] lqe_iter_cb+0x147/0x150 [lquota]
[2893643.038196] [<0>] cfs_hash_for_each_tight+0x11e/0x320 [libcfs]
[2893643.038201] [<0>] cfs_hash_for_each_safe+0x13/0x20 [libcfs]
[2893643.038205] [<0>] lquota_site_free+0x11c/0x310 [lquota]
[2893643.038209] [<0>] qsd_qtype_fini+0x94/0x530 [lquota]
[2893643.038213] [<0>] qsd_fini+0xd7/0x500 [lquota]
[2893643.038221] [<0>] osd_shutdown+0x66/0x2e0 [osd_ldiskfs]
[2893643.038229] [<0>] osd_process_config+0x277/0x360 [osd_ldiskfs]
[2893643.038238] [<0>] lod_process_config+0x24d/0x1540 [lod]
[2893643.038247] [<0>] mdd_process_config+0x146/0x5f0 [mdd]
[2893643.038260] [<0>] mdt_stack_fini+0x2c2/0xca0 [mdt]
[2893643.038266] [<0>] mdt_device_fini+0x34b/0x930 [mdt]
[2893643.038294] [<0>] class_cleanup+0x9b8/0xc50 [obdclass]
[2893643.038306] [<0>] class_process_config+0x65c/0x2830 [obdclass]
[2893643.038319] [<0>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[2893643.038335] [<0>] server_put_super+0xa35/0x1150 [obdclass]
[2893643.038338] [<0>] generic_shutdown_super+0x6d/0x100
[2893643.038340] [<0>] kill_anon_super+0x12/0x20
[2893643.038352] [<0>] lustre_kill_super+0x32/0x50 [obdclass]
[2893643.038353] [<0>] deactivate_locked_super+0x4e/0x70
[2893643.038355] [<0>] deactivate_super+0x46/0x60
[2893643.038356] [<0>] cleanup_mnt+0x3f/0x80
[2893643.038358] [<0>] __cleanup_mnt+0x12/0x20
[2893643.038360] [<0>] task_work_run+0xbb/0xe0
[2893643.038363] [<0>] do_notify_resume+0xa5/0xc0
[2893643.038365] [<0>] int_signal+0x12/0x17
[2893643.038379] [<0>] 0xfffffffffffffffe

The following errors are also related to this issue:

LustreError: 25405:0:(qmt_handler.c:516:qmt_dqacq0()) $$$ Release too much! uuid
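
For context: the LBUG fires from the callback that lquota_site_free() runs over every cached quota entry at unmount (via cfs_hash_for_each_safe(), as in the trace above). A slave-side entry that still carries a non-zero pending-write reservation, i.e. quota space reserved for a write but never consumed or released (e.g. leaked on an error path, which is what the first patch below addresses), trips the assertion and panics the node. The following is a minimal sketch of that pattern, not the actual lquota_entry.c code; the struct layout and the plain assert() are simplified assumptions:

#include <assert.h>

/* Hypothetical sketch, NOT the verbatim lustre/quota/lquota_entry.c
 * code: types below are simplified assumptions. */
struct lquota_slv_entry {
	long lse_pending_write;	/* space reserved for writes, not yet
				 * committed or released */
};

struct lquota_entry {
	union {
		struct lquota_slv_entry se;	/* slave (QSD) side state */
	} u;
};

/* Called for each cached entry while tearing down the quota site. */
static int lqe_iter_cb(struct lquota_entry *lqe)
{
	/* Every reservation taken for an in-flight write must have been
	 * consumed or given back by unmount time; a leftover reservation
	 * fails this check and LBUGs, panicking the node. */
	assert(lqe->u.se.lse_pending_write == 0);
	return 0;
}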


 Comments   
Comment by Gerrit Updater [ 22/May/22 ]

"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47425
Subject: LU-15880 quota: free reserved in case of error
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7a2cd69247cbefd31ad97b50a4652b0afe9d6bff

Comment by Gerrit Updater [ 18/Jul/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47425/
Subject: LU-15880 quota: fix issues in reserving quota
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 40daa59ac41f450b60b42eb2bb0ff42ebd3c998b

Comment by Peter Jones [ 20/Jul/22 ]

Landed for 2.16

Comment by Gerrit Updater [ 13/Oct/22 ]

"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48856
Subject: LU-15880 quota: fix quota by offline file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 774e0b4d498f712ba97d09b0ef47cc0231063811

Comment by Gerrit Updater [ 29/Oct/22 ]

"Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48981
Subject: LU-15880 quota: fix insane grant quota
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1d97cca9265bfa517476c8826b385a1db4a3b188

Comment by Peter Jones [ 26/Nov/22 ]

Reopening to track new patches

Comment by Sergey Cheremencev [ 30/Dec/22 ]

I was thinking about how to fix the problems causing the "Release too much! ..." message automatically. There is already a mechanism to release space at the QSD: if usage + qunit < granted. The problem is that the QMT can't release more than is stored in lqe_granted, and it is not clear how to recover a correct lqe_granted in such a case. Even if the target sends its actual usage (which is always correct, since it comes from the lower level), the QMT does not know this target's contribution to the total granted. To get a 100% correct total granted, we would have to recalculate it from the beginning. There is already a qmt_pool_recalc that recalculates all IDs in the pool; we could probably change it to recalculate only a certain UID. Also, a QSD release currently doesn't fill qb_usage, which would also have to change. I.e., the algorithm at the QMT (qmt_dqacq0) would be (a rough sketch follows the list):
1. Release too much? (qb_count > slv_granted || qb_count > lqe_granted)
2. If yes, update the slave index with the new value (qb_usage).
3. Call qmt_pool_recalc for the global pool (plus pools that match the slave index) for this UID.
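
A rough, hypothetical C sketch of those three steps, assuming the qb_count/qb_usage fields of struct quota_body, lqe_granted, and qmt_pool_recalc as referenced above; the helpers qmt_slv_index_update() and qmt_pool_recalc_id(), and the simplified types, are illustrative stand-ins rather than the real QMT API:

#include <stdint.h>

/* Minimal stand-in types; the real ones live in the Lustre headers. */
struct quota_body {
	uint64_t qb_count;	/* amount the slave asks to release */
	uint64_t qb_usage;	/* slave's actual usage (to be filled by QSD) */
};

struct lquota_entry {
	uint64_t lqe_granted;	/* total granted across all slaves */
};

/* Hypothetical helpers, stubbed for illustration. */
static void qmt_slv_index_update(struct lquota_entry *lqe, uint64_t usage)
{
	(void)lqe; (void)usage;	/* would store usage as the new slave
				 * index value */
}

static int qmt_pool_recalc_id(struct lquota_entry *lqe)
{
	(void)lqe;		/* would re-sum this ID's grants for the
				 * global pool plus pools matching the
				 * slave index */
	return 0;
}

/* Recovery path for "Release too much!", following steps 1-3 above. */
static int qmt_release_recover(struct lquota_entry *lqe,
			       struct quota_body *qb,
			       uint64_t slv_granted)
{
	/* 1. Is the slave releasing more than its own grant, or more
	 *    than the total grant can account for? */
	if (qb->qb_count <= slv_granted && qb->qb_count <= lqe->lqe_granted)
		return 0;	/* normal release, nothing to repair */

	/* 2. Trust the usage reported by the slave (it came from the
	 *    lower level) and update the slave index with it. */
	qmt_slv_index_update(lqe, qb->qb_usage);

	/* 3. Recalculate total granted for this UID from scratch
	 *    instead of trusting the stale lqe_granted. */
	return qmt_pool_recalc_id(lqe);
}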

Comment by Gerrit Updater [ 31/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48981/
Subject: LU-15880 quota: fix insane grant quota
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: a2fd4d3aee9739dcb23ac3bf46d221a978808463

Comment by Peter Jones [ 31/Jan/23 ]

Landed for 2.16

Comment by Stephane Thiell [ 06/Jul/23 ]

Just hit this assertion with 2.15.3 during MDT unmount:

Jul 05 19:37:27 fir-md1-s2 kernel: LustreError: 25778:0:(lquota_entry.c:135:lqe_iter_cb()) ASSERTION( lqe->u.se.lse_pending_write == 0 ) failed: 
Jul 05 19:37:27 fir-md1-s2 kernel: LustreError: 25778:0:(lquota_entry.c:135:lqe_iter_cb()) LBUG
Jul 05 19:37:27 fir-md1-s2 kernel: Pid: 25778, comm: umount 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1 SMP Tue Jun 20 15:47:49 PDT 2023

Jul 05 19:37:27 fir-md1-s2 kernel: Call Trace:
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lqe_iter_cb+0x153/0x160 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] cfs_hash_for_each_tight+0x11e/0x340 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] cfs_hash_for_each_safe+0x13/0x20 [libcfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lquota_site_free+0x104/0x300 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] qsd_qtype_fini+0x8c/0x510 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] qsd_fini+0x130/0x500 [lquota]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] osd_shutdown+0x66/0x300 [osd_ldiskfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] osd_process_config+0x277/0x380 [osd_ldiskfs]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lod_process_config+0x270/0x1340 [lod]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] mdd_process_config+0x146/0x640 [mdd]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] mdt_stack_fini+0x2c2/0xd10 [mdt]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] mdt_device_fini+0x34b/0x9a0 [mdt]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] class_cleanup+0xa61/0xd20 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] class_process_config+0x537/0x2670 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] class_manual_cleanup+0x1c6/0x760 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] server_put_super+0xa25/0xf80 [obdclass]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] generic_shutdown_super+0x6d/0x100
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] kill_anon_super+0x12/0x20
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] lustre_kill_super+0x2b/0x40 [lustre]
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] deactivate_locked_super+0x56/0x70
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] deactivate_super+0x4a/0x70
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] cleanup_mnt+0x3f/0x90
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] __cleanup_mnt+0x12/0x20
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] task_work_run+0xbb/0xe0
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] do_notify_resume+0xad/0xd0
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] int_signal+0x12/0x17
Jul 05 19:37:27 fir-md1-s2 kernel: [<0>] 0xfffffffffffffffe
Jul 05 19:37:27 fir-md1-s2 kernel: Kernel panic - not syncing: LBUG

Comment by Gerrit Updater [ 06/Jul/23 ]

"Stephane Thiell <sthiell@stanford.edu>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51588
Subject: LU-15880 quota: fix issues in reserving quota
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: a1facc7c92d6fe12960d9b436f87c41234964041

Comment by Gerrit Updater [ 30/Nov/23 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53290
Subject: LU-15880 quota: fix insane grant quota
Project: fs/lustre-release
Branch: b2_15
Current Patch Set: 1
Commit: e65090f72b716d3aff4f45892623b0f38c7baeb0
