[LU-10937] Use sysfs to fix up sptlrpc handling. Created: 21/Apr/18  Updated: 16/Jan/19

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: James A Simmons Assignee: James A Simmons
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Lustre with sptlrpc/gss enabled.


Issue Links:
Related
is related to LU-9567 sptlrpc rules are not being updated Resolved
is related to LU-9034 Separate the config logs between diff... Resolved
is related to LU-9086 obd_config.c:1258:class_process_confi... Resolved
is related to LU-7183 lctl set_param -P does not work for s... Closed
is related to LU-7004 fix "lctl set_param -P" to allow depr... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Several problems exist for the GSS / sptlrpc code. The problems are:

1) lctl set_param -P does not work with sptlrpc (LU-7183)

2) Specific MGS bindings for sptlrpc are broken (LU-9034 / LU-9086 / LU-9567).

    With the move to kobjects with sysfs we could use the kobject instead.

3) After the move to sysfs we can use udev events instead of polling the proc files, as is now done in svgssd, for example.

 



 Comments   
Comment by Gerrit Updater [ 06/Oct/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33311
Subject: LU-10937 mgc: restore mgc binding for sptlrpc
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2347c82412b29c0777f5907551164d5cc27b80d4

Comment by James A Simmons [ 06/Oct/18 ]

Developers from Cray reported to me that the band-aid fix that landed for LU-9567 can actually cause an MGS failover to crash the node. So I looked at this this afternoon and figured out how to finally resolve LU-9034. This will be needed for the LTS release.

Comment by Gerrit Updater [ 13/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33311/
Subject: LU-10937 mgc: restore mgc binding for sptlrpc
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ca9300e53dc2b7bcaaa5482bb4234cce7d9a344e

Comment by James A Simmons [ 26/Nov/18 ]

I started a code review and noticed a number of potential bugs. This covers only the lctl conf_param case. The first question is whether the targets (the first *. field) are limited to the following types:

lctl conf_param lustre.srpc.****

lctl conf_param lustre-OST0000.srpc.***

lctl conf_param lustre-MDT0000.srpc.***

for example.

If that is the case we can simply replace obdname2fsname() with server_name2fsname(). If server_name2fsname() returns an error then we know the target is a file system.
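
To make the distinction concrete, here is a minimal sketch of the target check described above. It is not the real server_name2fsname(); classify_srpc_target() and the enum are hypothetical names, and the sketch assumes a server target always ends in "-MDT" or "-OST" followed by four hex digits.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum target_kind { TARGET_FS, TARGET_MDT, TARGET_OST };

/*
 * Hypothetical stand-in for the check discussed above; NOT the real
 * server_name2fsname(). Looks at the first '.'-separated field of a
 * conf_param string and copies the file system name into fsname.
 * Returns the kind of target, or -1 if the target is empty or the
 * file system name does not fit in the buffer.
 */
static int classify_srpc_target(const char *param, char *fsname, size_t len)
{
	const char *dot = strchr(param, '.');
	size_t tlen = dot ? (size_t)(dot - param) : strlen(param);
	size_t flen = tlen;
	int kind = TARGET_FS;
	size_t i;

	if (tlen == 0)
		return -1;

	/* Assumed naming: "<fsname>-MDTxxxx" or "<fsname>-OSTxxxx". */
	if (tlen > 8 && param[tlen - 8] == '-') {
		const char *sfx = param + tlen - 7;
		int hex = 1;

		for (i = 3; i < 7; i++)
			if (!isxdigit((unsigned char)sfx[i]))
				hex = 0;
		if (hex && strncmp(sfx, "MDT", 3) == 0)
			kind = TARGET_MDT;
		else if (hex && strncmp(sfx, "OST", 3) == 0)
			kind = TARGET_OST;
		if (kind != TARGET_FS)
			flen = tlen - 8;
	}

	if (flen >= len)
		return -1;
	memcpy(fsname, param, flen);
	fsname[flen] = '\0';
	return kind;
}
```

With this shape, a target that does not parse as a server name falls through to the file system case, which mirrors the "error means file system" idea.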

The next bug I noticed is that if we supply an obd device as a target plus a direction, we don't validate the direction. The following should fail but doesn't:

lctl conf_param lustre-MDT0000.srpc.flavor.default.cli2ost=skpi
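
One possible shape for the missing validation is sketched below. It is not existing Lustre code: srpc_direction_valid() is a hypothetical helper, the four direction names (cli2mdt, cli2ost, mdt2mdt, mdt2ost) are assumed, and so is the policy that a rule on a server target is only accepted when that target type is one endpoint of the direction.

```c
#include <string.h>

/*
 * Hypothetical check, not existing Lustre code. Assumes the direction
 * names cli2mdt, cli2ost, mdt2mdt and mdt2ost, and assumes a rule on a
 * server target only makes sense when that target type is one endpoint
 * of the direction. target_type is "MDT" or "OST".
 * Returns 1 if the pair is accepted, 0 otherwise.
 */
static int srpc_direction_valid(const char *target_type, const char *dir)
{
	static const char *const mdt_dirs[] = { "cli2mdt", "mdt2mdt", "mdt2ost" };
	static const char *const ost_dirs[] = { "cli2ost", "mdt2ost" };
	const char *const *dirs;
	size_t n, i;

	if (strcmp(target_type, "MDT") == 0) {
		dirs = mdt_dirs;
		n = sizeof(mdt_dirs) / sizeof(mdt_dirs[0]);
	} else if (strcmp(target_type, "OST") == 0) {
		dirs = ost_dirs;
		n = sizeof(ost_dirs) / sizeof(ost_dirs[0]);
	} else {
		return 0;	/* unknown target type */
	}

	for (i = 0; i < n; i++)
		if (strcmp(dir, dirs[i]) == 0)
			return 1;
	return 0;
}
```

Under these assumptions the MDT0000 plus cli2ost combination is rejected, while cli2mdt on the same target is accepted.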

Lastly, I noticed the network type supplied can easily break. Consider a file system named test with Cray clients, and routers in between that convert to o2ib1 with an InfiniBand storage backend. If you do

lctl conf_param test.srpc.flavor.o2ib1=skpi

Does this filter so that only the server back end is updated to skpi? I noticed we don't really test which LNet network interface is in use when setting the rule. Is it valid to do a partial setup in this case?

 

Comment by Gerrit Updater [ 30/Nov/18 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/33760
Subject: LU-10937 sptlrpc: split sptlrpc_process_config()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 62c61bb1c4692f1c608047210befd9cb9dedc449

Comment by Gerrit Updater [ 16/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33760/
Subject: LU-10937 sptlrpc: split sptlrpc_process_config()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0ff7d548eb7bf0c04836c7d6809cac163e0ffc2c
