[LU-11289] ptlrpc_service_purge_all()) ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed
Created: 28/Aug/18
Updated: 28/Apr/21
Resolved: 14/Mar/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: Yang Sheng
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
Related
Severity: 3

Description

This just triggered on master-next, but it does not appear to be caused by any of the new patches.
It was hit while running conf-sanity test 23a.

[140702.621665] Lustre: Evicted from MGS (at 192.168.123.124@tcp) after server handle changed from 0xfc2c02f3fe266c89 to 0xfc2c02f3fe26733c
[140702.900336] Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect
[140703.520582] Lustre: lustre-MDT0000: Will be in recovery for at least 1:00, or until 3 clients reconnect
[140703.523277] Lustre: lustre-MDT0000: Denying connection for new client 32fa671c-9058-9e50-6bc8-7606ebe66b49(at 0@lo), waiting for 3 known clients (0 recovered, 0 in progress, and 0 evicted) to recover in -4:00
[140703.534338] LustreError: 167-0: lustre-MDT0000-lwp-MDT0001: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[140703.537878] LustreError: 167-0: lustre-MDT0000-lwp-MDT0002: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[140703.541355] LustreError: 167-0: lustre-MDT0000-lwp-OST0000: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
[140709.565760] LustreError: 21222:0:(lmv_obd.c:1391:lmv_statfs()) can't stat MDS #0 (lustre-MDT0000-mdc-ffff8802ffff5800), error -16
[140709.898351] LustreError: 21222:0:(lov_obd.c:831:lov_cleanup()) lustre-clilov-ffff8802ffff5800: lov tgt 0 not cleaned! deathrow=0, lovrc=1
[140710.039531] LustreError: 21222:0:(obd_mount.c:1599:lustre_fill_super()) Unable to mount  (-16)
[140713.560743] Lustre: lustre-OST0000: Not available for connect from 0@lo (stopping)
[140713.563671] Lustre: lustre-OST0000: Not available for connect from 0@lo (stopping)
[140713.564344] Lustre: lustre-OST0000: Not available for connect from 0@lo (stopping)
[140717.647627] LustreError: 21462:0:(service.c:3200:ptlrpc_service_purge_all()) ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed: 
[140717.661384] LustreError: 21462:0:(service.c:3200:ptlrpc_service_purge_all()) LBUG
[140717.663769] Pid: 21462, comm: umount 3.10.0-7.5-debug #1 SMP Sun Jun 3 13:35:38 EDT 2018
[140717.666057] Call Trace:
[140717.667293]  [<ffffffffa01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[140717.668701]  [<ffffffffa01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[140717.670037]  [<ffffffffa0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
[140717.673069]  [<ffffffffa005e122>] ost_cleanup+0x82/0x1b0 [ost]
[140717.674430]  [<ffffffffa08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
[140717.675653]  [<ffffffffa08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
[140717.676538]  [<ffffffffa08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
[140717.677432]  [<ffffffffa08f8030>] class_decref+0x80/0x160 [obdclass]
[140717.678431]  [<ffffffffa08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
[140717.681257]  [<ffffffffa08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
[140717.683267]  [<ffffffffa08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
[140717.685228]  [<ffffffffa092a115>] server_stop_servers+0xd5/0x160 [obdclass]
[140717.686384]  [<ffffffffa092f6c6>] server_put_super+0x126/0xca0 [obdclass]
[140717.687393]  [<ffffffff8121068a>] generic_shutdown_super+0x6a/0xf0
[140717.688273]  [<ffffffff81210a62>] kill_anon_super+0x12/0x20
[140717.689102]  [<ffffffffa09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
[140717.689994]  [<ffffffff81210e59>] deactivate_locked_super+0x49/0x60
[140717.691242]  [<ffffffff812115a6>] deactivate_super+0x46/0x60
[140717.692270]  [<ffffffff8123019f>] cleanup_mnt+0x3f/0x80
[140717.693491]  [<ffffffff81230232>] __cleanup_mnt+0x12/0x20
[140717.694669]  [<ffffffff810ab085>] task_work_run+0xb5/0xf0
[140717.695813]  [<ffffffff8102ac12>] do_notify_resume+0x92/0xb0
[140717.697035]  [<ffffffff81783c83>] int_signal+0x12/0x17
[140717.697939]  [<ffffffffffffffff>] 0xffffffffffffffff
[140717.699135] Kernel panic - not syncing: LBUG
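
For readers unfamiliar with the assertion, the following is a minimal user-space sketch of the invariant it checks: by the time a service partition is purged, its list of posted request buffers must already be empty. Everything here except the idea behind scp_rqbd_posted is hypothetical (struct svc_part, struct rqbd, rqbd_post(), rqbd_unlinked(), svc_part_purge() are illustration-only names, and the list helpers are simplified stand-ins for the kernel's). It does not reproduce the actual ptlrpc code, and it says nothing about the root cause or about what patch 41936 changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical request buffer descriptor (not the real Lustre struct). */
struct rqbd {
	struct list_head link;
};

/* Hypothetical service partition; only the two lists matter here. */
struct svc_part {
	struct list_head posted; /* plays the role of scp_rqbd_posted */
	struct list_head idle;   /* buffers that are no longer posted */
};

static void svc_part_init(struct svc_part *p)
{
	list_init(&p->posted);
	list_init(&p->idle);
}

/* Post a buffer: it stays on the posted list until it is unlinked. */
static void rqbd_post(struct svc_part *p, struct rqbd *b)
{
	list_add_tail(&b->link, &p->posted);
}

/* Simulate unlink completion: move the buffer from posted to idle. */
static void rqbd_unlinked(struct svc_part *p, struct rqbd *b)
{
	(void)p->posted; /* the buffer is assumed to be on p->posted */
	list_del_init(&b->link);
	list_add_tail(&b->link, &p->idle);
}

/* Purging is only safe once nothing is posted any more. */
static bool svc_part_can_purge(const struct svc_part *p)
{
	return list_empty(&p->posted);
}

/* Mirrors the shape of the failing check: assert, then drain idle. */
static void svc_part_purge(struct svc_part *p)
{
	assert(svc_part_can_purge(p));
	while (!list_empty(&p->idle))
		list_del_init(p->idle.next);
}
```

In this sketch, calling svc_part_purge() while any buffer is still on the posted list trips the assert, which is the user-space analogue of the LBUG in the trace above. A caller would be expected to wait until every posted buffer has gone through rqbd_unlinked() before purging.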


Comments
Comment by Gerrit Updater [ 08/Mar/21 ]

Yang Sheng (ys@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41936
Subject: LU-11289 ptlrpc: ASSERTION(list_empty(&svcpt->scp_rqbd_posted)
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4da0051001adfcd3c9dbf164866f4d60e8104247

Comment by Gerrit Updater [ 13/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41936/
Subject: LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b635a0435d13d8431a8344735322b84cb4613b68

Comment by Peter Jones [ 14/Mar/21 ]

Landed for 2.15
