[LU-15076] Add locking to ksocknal_find_timed_out_conn() for safe ksnc_tx_queue list processing Created: 11/Oct/21  Updated: 23/Aug/22  Resolved: 27/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Critical
Reporter: Artem Blagodarenko (Inactive) Assignee: Artem Blagodarenko (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

A GPF (general protection fault) occurred in this piece of code from ksocknal_find_timed_out_conn():

if ((!list_empty(&conn->ksnc_tx_queue) ||
     conn->ksnc_sock->sk->sk_wmem_queued != 0) &&
    ktime_get_seconds() >= conn->ksnc_tx_deadline) {
        /* Timed out messages queued for sending or
         * buffered in the socket's send buffer */
        ksocknal_conn_addref(conn);
        list_for_each_entry(tx, &conn->ksnc_tx_queue,
                            tx_list)
                tx->tx_hstatus = /* <---- GPF here */
                        LNET_MSG_STATUS_LOCAL_TIMEOUT;


It looks like ksnc_tx_queue processing requires additional locking,
for instance as is done in ksocknal_write_callback():

spin_lock_bh(&sched->kss_lock);
...
if (!conn->ksnc_tx_scheduled && /* not being progressed */
    !list_empty(&conn->ksnc_tx_queue)) { /* packets to send */
        list_add_tail(&conn->ksnc_tx_list, &sched->kss_tx_conns);
...
spin_unlock_bh(&sched->kss_lock);
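
To illustrate the idea only, here is a minimal userspace sketch of walking a tx queue under the scheduler lock. It is not the merged patch and not kernel code: a pthread mutex stands in for kss_lock, a singly linked list stands in for ksnc_tx_queue, and all names (struct tx, struct sched, struct conn, conn_mark_timed_out, conn_dequeue_tx, STATUS_LOCAL_TIMEOUT) are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace stand-ins; they only loosely mirror socklnd. */
enum { STATUS_OK = 0, STATUS_LOCAL_TIMEOUT = 1 };

struct tx {
        struct tx *next;        /* stand-in for the tx_list linkage */
        int        hstatus;     /* stand-in for tx_hstatus */
};

struct sched {
        pthread_mutex_t lock;   /* stand-in for kss_lock */
};

struct conn {
        struct sched *sched;        /* scheduler that owns this conn */
        struct tx    *tx_queue;     /* stand-in for ksnc_tx_queue */
        time_t        tx_deadline;  /* stand-in for ksnc_tx_deadline */
};

/*
 * Mark every queued tx as timed out and return how many were marked.
 * The queue is walked only while the scheduler lock is held, so a
 * dequeuer that takes the same lock cannot unlink an entry mid-walk.
 */
static int conn_mark_timed_out(struct conn *conn, time_t now)
{
        struct tx *tx;
        int marked = 0;

        assert(conn != NULL && conn->sched != NULL);

        pthread_mutex_lock(&conn->sched->lock);
        if (conn->tx_queue != NULL && now >= conn->tx_deadline) {
                for (tx = conn->tx_queue; tx != NULL; tx = tx->next) {
                        tx->hstatus = STATUS_LOCAL_TIMEOUT;
                        marked++;
                }
        }
        pthread_mutex_unlock(&conn->sched->lock);

        return marked;
}

/* Remove and return the head of the queue under the same lock. */
static struct tx *conn_dequeue_tx(struct conn *conn)
{
        struct tx *tx;

        assert(conn != NULL && conn->sched != NULL);

        pthread_mutex_lock(&conn->sched->lock);
        tx = conn->tx_queue;
        if (tx != NULL) {
                conn->tx_queue = tx->next;
                tx->next = NULL;
        }
        pthread_mutex_unlock(&conn->sched->lock);

        return tx;
}
```

The point of the sketch is that the walker and the dequeuer serialize on one lock. In the kernel the analogous choice would presumably be spin_lock_bh() on the connection's scheduler, as in the ksocknal_write_callback() excerpt above.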




 Comments   
Comment by Gerrit Updater [ 11/Oct/21 ]

"Artem Blagodarenko <artem.blagodarenko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45179
Subject: LU-15076 socklnd: lock ksnc_tx_queue list processing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7d2db40dbdfcfab6800b3cec4718ae62abec18c1

Comment by Gerrit Updater [ 27/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45179/
Subject: LU-15076 socklnd: lock ksnc_tx_queue list processing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 13c7c2e3c248c8cdba4853852bfaecceb7a75afe

Comment by Peter Jones [ 27/Oct/21 ]

Landed for 2.15
