[LU-14069] OBD_FAIL_LDLM_CANCEL_BL_CB_RACE is buggy in ldlm_handle_cp_callback Created: 23/Oct/20  Updated: 03/Nov/20  Resolved: 29/Oct/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0, Lustre 2.12.6

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13692 MDS slow/hung threads at mdt_object_l... Resolved
is related to LU-11300 LNet: Router Aliveness and Health Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There's this code in ldlm_handle_cp_callback:

        if (OBD_FAIL_CHECK(OBD_FAIL_LDLM_CANCEL_BL_CB_RACE)) {
                long to = cfs_time_seconds(1);

                ldlm_callback_reply(req, 0);

                while (to > 0) {
                        schedule_timeout_interruptible(to);
                        if (ldlm_is_granted(lock) ||
                            ldlm_is_destroyed(lock))
                                break;
                }
        }

This looks like it was supposed to be a time-bound wait, and indeed, when it was introduced (commit 022b1022, bz 11300), the variable "to" was assigned the return value of schedule_timeout.

This got broken by commit adde80ff, which is some squashed head commit that lost the assignment to "to", so the remaining time never decreases and the wait is no longer bounded.

Now this seems to be breaking the LU-13692 patch.
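
To illustrate the difference, here is a minimal userspace sketch, not the actual patch. It assumes the fix simply restores the assignment of the schedule_timeout return value to "to". fake_schedule_timeout(), the "slice" parameter, the "done" flag and the "max_iters" cap are all hypothetical stand-ins made up for this example. The stub only mimics the kernel convention of returning the time still remaining after an early wakeup, and 0 once the timeout has elapsed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for schedule_timeout_interruptible(): pretends the
 * sleeper is woken after at most `slice` ticks and returns the ticks still
 * remaining (0 once the whole timeout has elapsed). */
static long fake_schedule_timeout(long to, long slice)
{
        assert(slice > 0);
        return to > slice ? to - slice : 0;
}

/* Loop shape quoted in the description: the return value is dropped, so
 * `to` never shrinks. `done` stands in for the granted/destroyed check.
 * `max_iters` is a safety cap for this demo only; the real loop has none.
 * Returns the number of iterations executed. */
static long wait_without_assignment(long to, long slice, bool done,
                                    long max_iters)
{
        long iters = 0;

        while (to > 0 && iters < max_iters) {
                (void)fake_schedule_timeout(to, slice);
                iters++;
                if (done)
                        break;
        }
        return iters;
}

/* Assumed shape of the fix: feed the remaining time back into `to`, so the
 * loop ends once the timeout is used up even if `done` never becomes true. */
static long wait_with_assignment(long to, long slice, bool done,
                                 long max_iters)
{
        long iters = 0;

        while (to > 0 && iters < max_iters) {
                to = fake_schedule_timeout(to, slice);
                iters++;
                if (done)
                        break;
        }
        return iters;
}
```

With a timeout of 100 ticks, wakeups every 25 ticks and a condition that never becomes true, the first variant only stops at the demo cap, while the second stops after four wakeups.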



 Comments   
Comment by Gerrit Updater [ 23/Oct/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40375
Subject: LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 149269158a79aa568ad2cada58b4f4e6b91a273b

Comment by Gerrit Updater [ 27/Oct/20 ]

Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40411
Subject: LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 238056ea3a7a48b676b46b7dd93d5708df62f953

Comment by Gerrit Updater [ 29/Oct/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40375/
Subject: LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5da99051e58b9e9079b66a275d6c47e1e109eee5

Comment by Peter Jones [ 29/Oct/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 03/Nov/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40411/
Subject: LU-14069 ldlm: Fix unbounded OBD_FAIL_LDLM_CANCEL_BL_CB_RACE wait
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 5026a42944c2219d2a5d8f2692670dcc2727eda2
