[LU-10588] lfsck generates "kernel: list passed to list_sort() too long for efficiency" Created: 30/Jan/18 Updated: 13/Feb/19 Resolved: 13/Feb/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Trivial |
| Reporter: | Stephane Thiell | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
3.10.0-693.2.2.el7_lustre.pl1.x86_64 |
||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Just wanted to report these kernel messages on the MDS, seen shortly after starting lfsck_namespace and oi_scrub. I know they clearly lack some information, but they seem to be related to lfsck.
Jan 30 14:11:36 oak-md1-s2 kernel: list passed to list_sort() too long for efficiency
Stephane |
| Comments |
| Comment by Andreas Dilger [ 31/Jan/18 ] |
|
Stephane, One option would be to change list_sort() to call WARN_ONCE(), like:

    if (lev > max_lev) {
        if (WARN_ONCE(lev >= ARRAY_SIZE(part) - 1,
                      "list too long for efficiency (%d >= %d)\n",
                      lev, MAX_LIST_LENGTH_BITS))
            lev--;
        max_lev = lev;
    }

This will dump the stack trace of the thread that hits this problem. |
| Comment by Stephane Thiell [ 31/Jan/18 ] |
|
Ah thanks much Andreas! Will do when possible and update the ticket if I get something. This might take some time though. Stephane |
| Comment by Peter Jones [ 21/Mar/18 ] |
|
Fan Yong, anything else to suggest here? Peter |
| Comment by nasf (Inactive) [ 23/Mar/18 ] |
|
In Lustre, the only caller of list_sort() is __ldiskfs_es_shrink(), which the system calls under RAM pressure to release memory from the extent status tree. With only the message "Jan 30 14:11:36 oak-md1-s2 kernel: list passed to list_sort() too long for efficiency", I can NOT say whether __ldiskfs_es_shrink() triggered it. By default, Lustre enables file extents only for OSTs, NOT for MDTs, yet in this case the message was printed on the MDS. That is strange: even if LFSCK caused too many inodes to be cached (which is controlled by the system), the extent status tree on the MDT would be almost empty and should not cause the list_sort() warning. |
| Comment by nasf (Inactive) [ 23/Mar/18 ] |
|
Anyway, we need the stack trace to know who triggered the list_sort() warning. Have you got the stack trace for the list_sort() caller? Have you checked the RAM usage on the MDT when the list_sort() trouble happened? |
| Comment by Stephane Thiell [ 23/Mar/18 ] |
|
Hi nasf, Thanks for this useful information! We're indeed using ldiskfs. But this is currently low priority for us and as we're testing a patched kernel already for another important issue, we decided not to include any other change for now. Reproducibility, you know... When things get settled for us with 2.10.x, I'll patch the kernel to get more debugging from this warning as suggested by Andreas. Thanks again. Stephane |
| Comment by Andreas Dilger [ 13/Feb/19 ] |
|
Closing this old issue since there is no further information. Please re-open if more information becomes available. |