[LU-13049] LNet: Handle shutdown properly Created: 04/Dec/19  Updated: 20/Oct/22  Resolved: 28/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.14.0

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There are two code paths which can look up a peer while the system is shutting down:

  1. lnet_shutdown_lndnet()->lnet_peer_tables_cleanup()->lnet_peer_table_del_rtrs_locked()->lnet_del_route()
  2. lnet_mt_event_handler()->lnet_handle_recovery_reply()->lnet_find_peer_ni_locked()

In both of these cases the_lnet.ln_state might no longer be LNET_STATE_RUNNING because shutdown is in progress. Currently lnet_get_peer_ni_locked() asserts on ln_state == LNET_STATE_RUNNING, so a lookup during shutdown trips the assertion.

This should be handled gracefully. If the state isn't LNET_STATE_RUNNING, then lnet_get_peer_ni_locked() should return NULL instead of asserting. Callers of the function already handle a NULL return value.
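
The proposed behaviour can be sketched as a small userspace model. This is not the code from the patch: every sketch_* name, the state enum, and the linked-list peer table are assumptions made for illustration, and locking is left out entirely. It only shows the lookup returning NULL when the state is not running, and a caller treating NULL as "no peer".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LNet state values. Only
 * LNET_STATE_RUNNING is named in this ticket. */
enum sketch_state {
	SKETCH_STATE_SHUTDOWN,
	SKETCH_STATE_RUNNING,
	SKETCH_STATE_STOPPING,
};

/* Hypothetical peer NI entry, kept in a simple singly linked list. */
struct sketch_peer_ni {
	unsigned long		nid;
	struct sketch_peer_ni	*next;
};

/* Hypothetical model of the global LNet instance. */
struct sketch_lnet {
	enum sketch_state	ln_state;
	struct sketch_peer_ni	*peers;
};

/* Look up a peer NI by NID. Returns NULL if LNet is not running
 * or if no entry matches. */
static struct sketch_peer_ni *
sketch_get_peer_ni_locked(const struct sketch_lnet *lnet, unsigned long nid)
{
	struct sketch_peer_ni *lpni;

	assert(lnet != NULL);

	/* Old behaviour, modelled: assert(ln_state == RUNNING).
	 * Proposed behaviour: report "not found" during shutdown. */
	if (lnet->ln_state != SKETCH_STATE_RUNNING)
		return NULL;

	for (lpni = lnet->peers; lpni != NULL; lpni = lpni->next) {
		if (lpni->nid == nid)
			return lpni;
	}

	return NULL;
}

/* Hypothetical caller: returns 0 if a peer was found and -1 otherwise,
 * so a lookup during shutdown is skipped instead of crashing. */
static int
sketch_handle_recovery_reply(const struct sketch_lnet *lnet, unsigned long nid)
{
	struct sketch_peer_ni *lpni = sketch_get_peer_ni_locked(lnet, nid);

	if (lpni == NULL)
		return -1;

	return 0;
}
```

In this model a caller on either shutdown path sees the same result for "peer not present" and "LNet is stopping", which is why no caller changes are needed as long as NULL is already handled.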



 Comments   
Comment by Gerrit Updater [ 04/Dec/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36925
Subject: LU-13049 lnet: peer lookup handle shutdown
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b38b0b827894a6d30002edd637c6b22180c9ae50

Comment by Gerrit Updater [ 28/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36925/
Subject: LU-13049 lnet: peer lookup handle shutdown
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f46b22aa6a284773328d91071a2b33ec7db1f9d1

Comment by Peter Jones [ 28/Jan/20 ]

Landed for 2.14

Generated at Sat Feb 10 02:57:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.