Details
Type: Bug
Resolution: Fixed
Priority: Major
Description
When I try to mount Lustre with GSS (SSK) enabled, I get checksum errors on a multi-rail client that I do not get when using only a single interface. My guess is that the NID is included in the checksummed data, though I haven't dug into the cause yet. I also saw lots of errors when using GSS on multi-rail servers, although those errors were different.
[154311.786639] LustreError: 194908:0:(gss_sk_mech.c:388:sk_verify_hmac()) checksum mismatch
[154311.798154] LustreError: 194908:0:(sec_gss.c:242:gss_verify_msg()) mic verify error: 00060000
[154311.810015] LustreError: 194908:0:(sec_gss.c:2125:gss_svc_verify_request()) failed to verify request: 60000
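To make the guess above concrete, here is a minimal Python sketch of how a checksum could break if the sender's NID were part of the HMAC input. This is purely illustrative and is not the logic in gss_sk_mech.c: the `sign` and `verify` helpers, the key, the payload, and the example NIDs are all hypothetical, and whether the NID is actually covered by the checksum is unconfirmed.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes, nid: str) -> bytes:
    # Hypothetical scheme: the sender's NID is mixed into the HMAC input.
    return hmac.new(key, nid.encode() + b"|" + payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, nid: str, mac: bytes) -> bool:
    # The verifier recomputes the HMAC using the NID it believes the peer has.
    return hmac.compare_digest(sign(key, payload, nid), mac)


key = b"shared-secret"          # placeholder key, not a real SSK key
payload = b"request"            # placeholder message body
mac = sign(key, payload, "192.168.1.10@tcp")

# Same NID on both sides: the checksum verifies.
same_nid_ok = verify(key, payload, "192.168.1.10@tcp", mac)

# The verifier sees the peer under a different NID (as could happen when a
# multi-rail peer uses a second interface): the checksum no longer matches.
other_nid_ok = verify(key, payload, "192.168.2.10@tcp", mac)
```

Under that assumption, a single-interface client would always verify, while a multi-rail client would fail whenever the two sides disagree about which NID is in use, which would match the symptoms described here.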
When I brought the servers up with multi-rail, they were failing with GSS as well. So I moved them to a single interface and they came up fine. Adding a multi-rail client then fails with the checksum error, while the same client with a single interface works fine.
I'm pretty sure the ARP settings Whamcloud keeps recommending are wrong. I went through this with Amir a few months ago, but arp_filter and rp_filter should be set to 1 for multi-rail to function correctly. In every other case the behavior was intermittent.
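For anyone wanting to check their own nodes against the values mentioned above, here is a small Python sketch that reports interfaces whose arp_filter or rp_filter differs from 1. It assumes the standard Linux sysctl layout under /proc/sys/net/ipv4/conf; the function name `find_mismatched` and its parameters are made up for this example, and the target value of 1 is simply the setting argued for in this ticket, not an official recommendation.

```python
from pathlib import Path


def find_mismatched(base="/proc/sys/net/ipv4/conf", wanted=None):
    """Return {interface: {setting: actual_value}} for settings that differ
    from the wanted values. Missing files or directories are skipped."""
    if wanted is None:
        # Values argued for in this ticket; adjust if your setup differs.
        wanted = {"arp_filter": "1", "rp_filter": "1"}

    problems = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return problems

    for iface_dir in sorted(base_path.iterdir()):
        if not iface_dir.is_dir():
            continue
        for name, expected in wanted.items():
            setting_file = iface_dir / name
            if not setting_file.is_file():
                continue
            actual = setting_file.read_text().strip()
            if actual != expected:
                problems.setdefault(iface_dir.name, {})[name] = actual
    return problems
```

Calling `find_mismatched()` on a node returns an empty dict when every interface already has both settings at 1. The `base` argument exists only so the function can be pointed at a test directory.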
Are GSS and multi-rail actually being tested together? When I filed this, I was somewhat assuming that they were only being tested independently. If I get a chance this week, I'll try to dig into it more and get to the bottom of what's happening.