[LU-9567] sptlrpc rules are not being updated Created: 27/May/17  Updated: 05/Oct/18  Resolved: 19/Jun/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Critical
Reporter: James A Simmons Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: patch

Attachments: dump-gss-broke-conf_param.log, dump-gss-works.log
Issue Links:
Blocker
is blocking LU-7854 sanity-gss test_1 fails with 'chmod /... Resolved
Related
is related to LU-9034 Separate the config logs between diff... Resolved
is related to LU-10937 Use sysfs to fix up sptlrpc handling. Open
is related to LU-9073 SSK: lgss_sk generates keys with inva... Resolved
is related to LU-9823 LNet fails to come up when using lctl... Resolved
is related to LU-9086 obd_config.c:1258:class_process_confi... Resolved
is related to LU-7183 lctl set_param -P does not work for s... Closed
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

With the landing of LU-9034, proper sptlrpc handling ended up broken. It was partially fixed by LU-9086, but currently the sptlrpc rules are not being updated on the clients. I have collected logs and it appears the import life cycle has changed. Before LU-9034 landed we observed the following:

import_select_connection -> sptlrpc_import_sec_adapt -> sptlrpc_sec_create

Now this no longer happens.



 Comments   
Comment by Gerrit Updater [ 27/May/17 ]

James Simmons (uja.ornl@yahoo.com) uploaded a new patch: https://review.whamcloud.com/27320
Subject: LU-9567 mgc: revert LU-9086 and LU-9034 work
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: deb326af1e19f44c955abe66b320c3a02bbd09ab

Comment by Peter Jones [ 31/May/17 ]

Hongchao

Can you please advise on this one?

Thanks

Peter

Comment by Peter Jones [ 31/May/17 ]

James

Does this also address LU-9073?

Peter

Comment by Hongchao Zhang [ 01/Jun/17 ]

Hi James,

Did you use the same "PTLDEBUG" parameter in the two tests? I have tested it by adding "sec" and "trace" to "PTLDEBUG" and
found the call sequence is still "import_select_connection -> sptlrpc_import_sec_adapt -> sptlrpc_sec_create"

PTLDEBUG=${PTLDEBUG:-"sec trace vfstrace rpctrace dlmtrace neterror ha config \
                      ioctl super lfsck"}
00000020:00000001:1.0:1492623939.848913:0:27769:0:(lustre_handles.c:156:class_handle2object()) Process entered
00000020:00000001:1.0:1492623939.848914:0:27769:0:(lustre_handles.c:179:class_handle2object()) Process leaving (rc=18446612132762781696 : -131940946769920 : ffff88001abc8800)
00000020:00000001:1.0:1492623939.848915:0:27769:0:(genops.c:725:class_conn2export()) Process leaving (rc=18446612132762781696 : -131940946769920 : ffff88001abc8800)
00000100:00000001:1.0:1492623939.848916:0:27769:0:(import.c:657:ptlrpc_connect_import()) Process entered
00000100:00080000:1.0:1492623939.848917:0:27769:0:(import.c:675:ptlrpc_connect_import()) ffff880071c53000 MGS: changing import state from NEW to CONNECTING
00000100:00000001:1.0:1492623939.848918:0:27769:0:(import.c:504:import_select_connection()) Process entered
00000100:00080000:1.0:1492623939.848919:0:27769:0:(import.c:519:import_select_connection()) MGC10.211.55.9@tcp: connect to NID 0@lo last attempt 0
00000100:00000001:1.0:1492623939.848920:0:27769:0:(connection.c:127:ptlrpc_connection_addref()) Process entered
00000100:00000001:1.0:1492623939.848921:0:27769:0:(connection.c:134:ptlrpc_connection_addref()) Process leaving (rc=18446612133249874752 : -131940459676864 : ffff880037c4fb40)
00000020:00000001:1.0:1492623939.848921:0:27769:0:(genops.c:711:class_conn2export()) Process entered
00000020:00000001:1.0:1492623939.848921:0:27769:0:(lustre_handles.c:156:class_handle2object()) Process entered
00000020:00000001:1.0:1492623939.848922:0:27769:0:(lustre_handles.c:179:class_handle2object()) Process leaving (rc=18446612132762781696 : -131940946769920 : ffff88001abc8800)
00000020:00000001:1.0:1492623939.848922:0:27769:0:(genops.c:725:class_conn2export()) Process leaving (rc=18446612132762781696 : -131940946769920 : ffff88001abc8800)
00000100:00000001:1.0:1492623939.848923:0:27769:0:(connection.c:127:ptlrpc_connection_addref()) Process entered
00000100:00000001:1.0:1492623939.848923:0:27769:0:(connection.c:134:ptlrpc_connection_addref()) Process leaving (rc=18446612133249874752 : -131940459676864 : ffff880037c4fb40)
00000100:00080000:1.0:1492623939.848924:0:27769:0:(import.c:597:import_select_connection()) MGC10.211.55.9@tcp: import ffff880071c53000 using connection MGC10.211.55.9@tcp_0/0@lo
00000100:00000001:1.0:1492623939.848925:0:27769:0:(import.c:601:import_select_connection()) Process leaving (rc=0 : 0 : 0)
02000000:00000001:1.0:1492623939.848926:0:27769:0:(sec.c:1434:sptlrpc_import_sec_adapt()) Process entered
02000000:00000001:1.0:1492623939.848927:0:27769:0:(sec.c:1331:sptlrpc_sec_create()) Process entered
02000000:08000000:1.0:1492623939.848927:0:27769:0:(sec.c:1349:sptlrpc_sec_create()) mgc MGC10.211.55.9@tcp: select security flavor null
02000000:00000001:1.0:1492623939.848929:0:27769:0:(sec.c:1370:sptlrpc_sec_create()) Process leaving (rc=18446744072108090944 : -1601460672 : ffffffffa08ba640)
02000000:00000001:1.0:1492623939.848930:0:27769:0:(sec.c:1505:sptlrpc_import_sec_adapt()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1492623939.848930:0:27769:0:(obd_class.h:848:obd_reconnect()) Process entered
00000100:00000001:1.0:1492623939.848931:0:27769:0:(obd_class.h:851:obd_reconnect()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1492623939.848935:0:27769:0:(client.c:700:ptlrpc_request_bufs_pack()) Process entered
02000000:00000001:1.0:1492623939.848936:0:27769:0:(sec.c:434:sptlrpc_req_get_ctx()) Process entered
02000000:00000001:1.0:1492623939.848937:0:27769:0:(sec.c:452:sptlrpc_req_get_ctx()) Process leaving (rc=0 : 0 : 0)
00000100:00000001:1.0:1492623939.848941:0:27769:0:(client.c:774:ptlrpc_request_bufs_pack()) Process leaving (rc=0 : 0 : 0)

Could you please look at it?
Thanks!
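
For comparing two debug dumps, the check described above (whether "import_select_connection -> sptlrpc_import_sec_adapt -> sptlrpc_sec_create" still occurs) can be scripted. The following is only a minimal sketch, not part of Lustre or its test suite: the regular expression is inferred from the log lines quoted in this comment, and the helper names `entered_functions` and `chain_present` are hypothetical. It assumes the dump was collected with "trace" enabled in PTLDEBUG, since it relies on the "Process entered" lines.

```python
import re

# Line layout inferred from the excerpt above (an assumption, not a documented format):
# subsys:mask:cpu:timestamp:n:pid:n:(file:line:function()) message
LINE_RE = re.compile(
    r"^[0-9a-f]{8}:[0-9a-f]{8}:[^:]*:(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)

# Call sequence named in the issue description.
EXPECTED_CHAIN = [
    "import_select_connection",
    "sptlrpc_import_sec_adapt",
    "sptlrpc_sec_create",
]


def entered_functions(lines):
    """Yield the function name of every 'Process entered' trace line."""
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match and match.group("msg") == "Process entered":
            yield match.group("func")


def chain_present(lines, chain=EXPECTED_CHAIN):
    """Return True if the functions in `chain` are entered in that order.

    Other calls may appear in between; only the relative order matters.
    """
    funcs = iter(entered_functions(lines))
    # Each any() consumes the shared iterator up to the next wanted function.
    return all(any(func == wanted for func in funcs) for wanted in chain)
```

Passing an open file object for each of the two attached dumps (for example `chain_present(open("dump-gss-works.log"))`) would show whether the sequence appears in one and not the other. The check does not filter by PID or import, so on a busy node it may need to be narrowed further.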

Comment by Sebastien Buisson (Inactive) [ 07/Jun/17 ]

Hi,

With this bug I confirm that sptlrpc rules are not applied, which means it is not possible to activate Kerberos or Shared Keys for a Lustre file system anymore.
I would tend to consider this issue as a blocker for 2.10.

Cheers,
Sebastien.

Comment by James A Simmons [ 07/Jun/17 ]

Currently we have patch https://review.whamcloud.com/#/c/27320 to work around this issue.

Comment by Sebastien Buisson (Inactive) [ 07/Jun/17 ]

Then I am all for landing https://review.whamcloud.com/27320
AFAICS, this patch is not a proper revert of the previous patches, but a fix that deactivates some of their effects.

Comment by Gerrit Updater [ 19/Jun/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27320/
Subject: LU-9567 mgc: set cfg_instance to NULL for sptlrpc case
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fb9a8a991f5c9e8649053e86b3147a99aaee9f84

Comment by Peter Jones [ 19/Jun/17 ]

Landed for 2.10
