[LU-10588] lfsck generates "kernel: list passed to list_sort() too long for efficiency" Created: 30/Jan/18  Updated: 13/Feb/19  Resolved: 13/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Trivial
Reporter: Stephane Thiell Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

3.10.0-693.2.2.el7_lustre.pl1.x86_64


Rank (Obsolete): 9223372036854775807

 Description   

Just wanted to report these kernel messages on the MDS, seen shortly after starting lfsck_namespace and oi_scrub. I know they clearly lack some information, but they seem to be related to lfsck.

Jan 30 14:11:36 oak-md1-s2 kernel: list passed to list_sort() too long for efficiency

Stephane



 Comments   
Comment by Andreas Dilger [ 31/Jan/18 ]

Stephane,
There is no direct caller of list_sort() in the Lustre tree, nor any use of "sort" in LFSCK at all, so there is no way to know what is causing this message.

One option would be to change list_sort() to call WARN_ONCE(), like:

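        if (lev > max_lev) {
                /* WARN_ONCE() prints the message and a stack trace only
                 * the first time, but returns the condition on every call,
                 * so lev is still clamped each time the check fails. */
                if (WARN_ONCE(lev >= ARRAY_SIZE(part) - 1,
                              "list too long for efficiency (%d >= %d)\n",
                              lev, MAX_LIST_LENGTH_BITS))
                        lev--;
                max_lev = lev;
        }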

This will dump the stack trace of the thread that hits this problem.

Comment by Stephane Thiell [ 31/Jan/18 ]

Ah, thanks much Andreas! Will do when possible and will update the ticket if I get something. This might take some time though.

Stephane

Comment by Peter Jones [ 21/Mar/18 ]

Fan Yong

Anything else to suggest here?

Peter

Comment by nasf (Inactive) [ 23/Mar/18 ]

In Lustre, the only caller of list_sort() is __ldiskfs_es_shrink(), which the kernel calls under memory pressure to release RAM held by the extent status tree.

With only the message "Jan 30 14:11:36 oak-md1-s2 kernel: list passed to list_sort() too long for efficiency", I can NOT tell whether __ldiskfs_es_shrink() triggered it or not.

However, by default Lustre only enables file extents on the OST, NOT on the MDT, yet in this case the message was printed on the MDS. That is strange: even if LFSCK caused many inodes to be cached (which is controlled by the kernel), the extent status tree on the MDT would be almost empty and should not trigger the list_sort() warning.

Comment by nasf (Inactive) [ 23/Mar/18 ]

Anyway, we need the stack trace to know who triggered the list_sort() warning.

sthiell,

Have you got the stack trace of the list_sort() caller? Have you checked the RAM usage on the MDS when the list_sort() warning appeared?

Comment by Stephane Thiell [ 23/Mar/18 ]

Hi nasf,

Thanks for this useful information! We're indeed using ldiskfs. But this is currently low priority for us, and since we're already testing a patched kernel for another important issue, we decided not to include any other change for now. Reproducibility, you know...

When things get settled for us with 2.10.x, I'll patch the kernel to get more debugging from this warning as suggested by Andreas.

Thanks again.

Stephane

Comment by Andreas Dilger [ 13/Feb/19 ]

Closing this old issue since there is no further information; please re-open if more information becomes available.

Generated at Sat Feb 10 02:36:25 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.