[LU-11928] Noisy "mdt_attr_valid_xlate()) Unknown attr bits: 0x60000" Created: 05/Feb/19  Updated: 25/Nov/19  Resolved: 06/Mar/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.10.5
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Aurelien Degremont (Inactive) Assignee: Aurelien Degremont (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-12021 Error message of mdt_attr_valid_xlate... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When testing 2.12.0 clients against 2.10.5 servers, the server log filled up with messages like these:

LustreError: 14950:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
LustreError: 14950:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 3853 previous similar messages

 

2.12 clients send RPCs carrying LSOM (Lazy Size on MDT) attributes, which correspond to the 0x60000 bits and which 2.10 servers do not understand. To me this looks more like a warning than an error, since the server simply ignores the LSOM attributes.

static __u64 mdt_attr_valid_xlate(__u64 in, struct mdt_reint_record *rr,
                                  struct md_attr *ma)
{
        ...

        /* Clear every attribute bit this server version knows about. */
        in &= ~(MDS_ATTR_MODE | MDS_ATTR_UID | MDS_ATTR_GID | MDS_ATTR_PROJID |
                MDS_ATTR_ATIME | MDS_ATTR_MTIME | MDS_ATTR_CTIME |
                MDS_ATTR_ATIME_SET | MDS_ATTR_CTIME_SET | MDS_ATTR_MTIME_SET |
                MDS_ATTR_SIZE | MDS_ATTR_BLOCKS | MDS_ATTR_ATTR_FLAG |
                MDS_ATTR_FORCE | MDS_ATTR_KILL_SUID | MDS_ATTR_KILL_SGID |
                MDS_ATTR_FROM_OPEN | MDS_OPEN_OWNEROVERRIDE);
        /* Any bit still set was sent by the client but is unknown here. */
        if (in != 0)
                CERROR("Unknown attr bits: %#llx\n", in);
        return out;
}
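
To illustrate how the leftover value ends up as 0x60000, here is a minimal user-space sketch of the same masking step. It is not the Lustre source: the `SK_*` names are hypothetical, and the two lazy bit values are assumptions chosen only because 0x20000 | 0x40000 matches the 0x60000 seen in the log.

```c
#include <stdint.h>

/* Hypothetical stand-ins for attribute bits (not taken from Lustre headers). */
#define SK_ATTR_MODE     0x00001ULL
#define SK_ATTR_SIZE     0x00008ULL
#define SK_ATTR_LSIZE    0x20000ULL   /* assumed lazy size bit */
#define SK_ATTR_LBLOCKS  0x40000ULL   /* assumed lazy blocks bit */

/* Bits an "old" server in this sketch knows how to translate. */
#define SK_KNOWN_OLD     (SK_ATTR_MODE | SK_ATTR_SIZE)

/* Return the bits of 'in' that are not part of the 'known' mask. */
static uint64_t sk_unknown_bits(uint64_t in, uint64_t known)
{
        return in & ~known;
}
```

With these assumed values, a request carrying `SK_ATTR_MODE | SK_ATTR_LSIZE | SK_ATTR_LBLOCKS` leaves 0x60000 behind on the old server, and nothing once the two lazy bits are added to the known mask.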

 

I think we should make that at least a warning, or maybe even a debug message.

What do you think? I can send a patch for that, based on your preference.

 



 Comments   
Comment by Andreas Dilger [ 05/Feb/19 ]

I agree that this shouldn't be spewing on the console. I think there are a few options, and more than one may be worth implementing:

  • add the lazy flags to the list of known flags, so we quiet these specific errors but are still notified of future unknown bits
  • quiet the error message to not print to the console. I don't think marking it a warning is helpful
  • add an OBD_CONNECT_LSOM flag so that clients don't send these flags to older servers that don't support them
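
The first two options differ in what still reaches the console. A rough user-space sketch of the two policies, assuming hypothetical names (`qk_*`, `QK_*`) and an assumed lazy mask of 0x60000 taken from the log message:

```c
#include <stdint.h>

/* Where a message about leftover bits would go in this sketch. */
enum qk_log_level { QK_LOG_NONE, QK_LOG_DEBUG, QK_LOG_CONSOLE };

/* Assumed mask of the lazy size/blocks bits, as seen in the log. */
#define QK_ATTR_LAZY 0x60000ULL

/* Option 1: treat the lazy bits as known; other unknown bits still
 * go to the console so future protocol additions are noticed. */
static enum qk_log_level qk_classify_known_lazy(uint64_t leftover)
{
        leftover &= ~QK_ATTR_LAZY;
        return leftover ? QK_LOG_CONSOLE : QK_LOG_NONE;
}

/* Option 2: keep the check unchanged but never print to the console;
 * unknown bits are only recorded at debug level. */
static enum qk_log_level qk_classify_quiet(uint64_t leftover)
{
        return leftover ? QK_LOG_DEBUG : QK_LOG_NONE;
}
```

Under option 1 a leftover of 0x60000 produces no message at all, while under option 2 it is still recorded, just not on the console.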
Comment by Stephane Thiell [ 05/Feb/19 ]

Just a quick note to say that we're seeing the same error messages on Lustre 2.8 servers (Regal, an old system that reaches EOL in 6 months, so it won't be upgraded anymore) and on Oak (Lustre 2.10, no problem to upgrade). The errors are not seen on Fir servers (Lustre 2.12, Regal's replacement). The messages started when we upgraded our first clients to 2.12. Example on Regal's MDS:

LustreError: 6605:0:(mdt_lib.c:876:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
LustreError: 6605:0:(mdt_lib.c:876:mdt_attr_valid_xlate()) Skipped 2012250 previous similar messages

That was only with a handful of clients, but we're upgrading all our clients (~1,400) to 2.12 today so let's hope that this won't have a significant impact on MDT performance.

Comment by Peter Jones [ 01/Mar/19 ]

degremoa do you still plan to submit a patch for this?

Comment by Aurelien Degremont (Inactive) [ 04/Mar/19 ]

I will try to find some time to do them.

Andreas, does this make sense to you:

> add the lazy flags to the list of known flags, so we quiet these specific errors but are still notified of future unknown bits

Add them to the mask that is cleared from `in` in `mdt_attr_valid_xlate()`, for 2.10 LTS?

> quiet the error message to not print to the console. I don't think marking it a warning is helpful

Replace `CERROR(...)` with `CDEBUG(D_INFO, ...)` in `mdt_attr_valid_xlate()` for 2.10 LTS, 2.12 LTS and master?

> add an OBD_CONNECT_LSOM flag so that clients don't send these flags to older servers that don't support it

A patch for master and maybe a backport for 2.12? Isn't it a bit overkill to add a connect flag for that?

 

Comment by Andreas Dilger [ 04/Mar/19 ]

There is already a patch https://review.whamcloud.com/34343 "LU-12021 lsom: Add an OBD_CONNECT2_LSOM connect flag". This is the only way to avoid this constant error on any old MDS from a new client without adding patches to all the old releases. The connect flag can be used for a few years and dropped eventually, as we've done with other flags. It would be good to still quiet the error message on the MDS so that we don't get this problem again in the future.
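
For illustration, client-side gating on a connect flag could look roughly like the sketch below. This is a user-space approximation with hypothetical names and values (`CK_CONNECT2_LSOM`, `CK_ATTR_LAZY`, `ck_*`), not the code from the patch above: the client drops the lazy bits unless the server advertised support at connect time.

```c
#include <stdint.h>

/* Hypothetical connect-flag bit; the real value is defined by the patch. */
#define CK_CONNECT2_LSOM 0x1ULL
/* Assumed mask of the lazy attribute bits, taken from the log message. */
#define CK_ATTR_LAZY     0x60000ULL

/* Features usable on a connection: supported by both client and server. */
static uint64_t ck_negotiate(uint64_t client_flags, uint64_t server_flags)
{
        return client_flags & server_flags;
}

/* Strip the lazy bits from an outgoing request when the server did not
 * advertise LSOM support, so an old MDS never sees bits it cannot parse. */
static uint64_t ck_client_attr_valid(uint64_t valid, uint64_t negotiated)
{
        if (!(negotiated & CK_CONNECT2_LSOM))
                valid &= ~CK_ATTR_LAZY;
        return valid;
}
```

In this sketch an old server never sets the flag, so the negotiated set lacks it and the client sends only bits the server already understands, with no change needed on the old release.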

Comment by Andreas Dilger [ 06/Mar/19 ]

Patch is under LU-12021 so let's use that for tracking.
