Details
Bug
Resolution: Fixed
Major
Lustre 2.10.0, Lustre 2.10.1, Lustre 2.11.0
Description
Running the (almost) latest version of b2_10 (see LU-9983 for details), I am seeing quite a few of these on the MDS console:
/scratch/logs/syslog/soak-8.log:Oct 1 22:37:25 soak-8 kernel: LustreError: 8097:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 944 actual 416.
/scratch/logs/syslog/soak-9.log:Oct 1 22:42:25 soak-9 kernel: LustreError: 2165:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 6 previous similar messages
/scratch/logs/syslog/soak-9.log:Oct 1 22:42:25 soak-9 kernel: LustreError: 2165:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 872 actual 416.
/scratch/logs/syslog/soak-10.log:Oct 1 22:42:26 soak-10 kernel: LustreError: 2401:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 10 previous similar messages
/scratch/logs/syslog/soak-10.log:Oct 1 22:42:26 soak-10 kernel: LustreError: 2401:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0002: expected 872 actual 416.
/scratch/logs/syslog/soak-10.log:Oct 1 22:42:26 soak-10 kernel: LustreError: 4181:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0002: expected 872 actual 416.
/scratch/logs/syslog/soak-10.log:Oct 1 22:44:04 soak-10 kernel: LustreError: 2351:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0002: expected 872 actual 416.
/scratch/logs/syslog/soak-9.log:Oct 1 22:44:04 soak-9 kernel: LustreError: 2351:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 848 actual 416.
/scratch/logs/syslog/soak-10.log:Oct 1 22:57:27 soak-10 kernel: LustreError: 4296:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 8 previous similar messages
/scratch/logs/syslog/soak-10.log:Oct 1 22:57:27 soak-10 kernel: LustreError: 4296:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0002: expected 872 actual 416.
/scratch/logs/syslog/soak-9.log:Oct 1 22:57:27 soak-9 kernel: LustreError: 2329:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 9 previous similar messages
/scratch/logs/syslog/soak-9.log:Oct 1 22:57:27 soak-9 kernel: LustreError: 2329:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 800 actual 416.
/scratch/logs/syslog/soak-9.log:Oct 1 22:59:06 soak-9 kernel: LustreError: 2357:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 776 actual 416.
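To see which targets are affected and how large the mismatch gets, the messages above can be tallied with a short script. This is only a minimal sketch that assumes the syslog line format shown in this report; the names MISMATCH and summarize are hypothetical and not part of any Lustre tooling.

```python
import re
from collections import defaultdict

# Matches the size-mismatch message printed by mdt_lvbo_fill(), for example:
# "...(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 872 actual 416."
# "Skipped N previous similar messages" lines do not match, so the real
# number of occurrences is higher than the count reported here.
MISMATCH = re.compile(
    r"mdt_lvbo_fill\(\)\)\s+(?P<target>\S+): "
    r"expected (?P<expected>\d+) actual (?P<actual>\d+)"
)


def summarize(log_text):
    """Return per-target statistics for the mismatch messages in log_text."""
    stats = defaultdict(lambda: {"count": 0, "max_expected": 0, "actual": set()})
    # finditer() scans the whole text, so it also works when several
    # log entries end up on one physical line.
    for m in MISMATCH.finditer(log_text):
        entry = stats[m.group("target")]
        entry["count"] += 1
        entry["max_expected"] = max(entry["max_expected"], int(m.group("expected")))
        entry["actual"].add(int(m.group("actual")))
    return dict(stats)


if __name__ == "__main__":
    sample = (
        "Oct 1 22:42:25 soak-9 kernel: LustreError: 2165:0:"
        "(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 872 actual 416.\n"
        "Oct 1 22:44:04 soak-9 kernel: LustreError: 2351:0:"
        "(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0001: expected 848 actual 416.\n"
    )
    for target, entry in sorted(summarize(sample).items()):
        print(target, entry)
```

In the messages quoted here the actual size is always 416 while the expected size varies per file, which is consistent with a fixed reserved size being compared against differently sized layouts.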
I don't think we need to change ldlm_handle_enqueue() for this. This problem occurs in two cases:
1) mdt_max_mdsize is smaller than the layout size and the client packs the request with a buffer that is too small; in that case the request is resent with a bigger buffer. This is how the code in mdt_lvbo_fill() was originally intended to work. I think this case doesn't need to be fixed; it produces such messages only rarely, when mdt_max_mdsize is not synced between server and client.
2) mdt_max_mdsize is already big enough and the client knows it, but mdt_intent_layout() packs the reply buffer with a smaller size. This has nothing to do with max_mdsize on the client and server; the wrong size is packed because it uses the current EA size of the file, which is about to be updated to the new EA, so the packed size is wrong from the beginning in most cases. This is exactly the case that produced a lot of messages in the log, because it happens every time the new EA is bigger than the packed size.
The patch solves case 2) by setting the reply size to max_mdsize if the layout is going to be updated and shrinking it later. This is better than intercepting the problem in ldlm_handle_enqueue0() and expanding the buffer, because expanding is a more expensive operation than shrinking: shrinking is already part of every reply processing, while expanding is an exception for rare cases.
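The trade-off can be illustrated with a toy model. The sketch below is not Lustre code: ReplyBuffer, pack_current_size, pack_max_then_shrink and the MAX_MDSIZE value are all hypothetical, and the model only counts how often a buffer has to grow (standing in for the expensive expand or resend path) versus shrink (the cheap path that every reply already takes).

```python
MAX_MDSIZE = 4096  # hypothetical stand-in for the server's max_mdsize


class ReplyBuffer:
    """Toy reply buffer that records how it was resized."""

    def __init__(self, reserved):
        self.reserved = reserved
        self.grows = 0
        self.shrinks = 0

    def fill(self, layout_size):
        """Fit the buffer to layout_size and count the kind of resize."""
        if layout_size > self.reserved:
            # Analogous to the "expected N actual M" condition:
            # the reserved space is too small and must be expanded.
            self.reserved = layout_size
            self.grows += 1
        elif layout_size < self.reserved:
            # Trimming unused space, as done for every reply.
            self.reserved = layout_size
            self.shrinks += 1
        return self.reserved


def pack_current_size(current_ea_size, new_ea_size):
    """Model of case 2 before the patch: reserve only the current EA size."""
    buf = ReplyBuffer(current_ea_size)
    buf.fill(new_ea_size)
    return buf


def pack_max_then_shrink(new_ea_size, max_mdsize=MAX_MDSIZE):
    """Model of the patched behaviour: reserve max_mdsize, shrink afterwards."""
    buf = ReplyBuffer(max_mdsize)
    buf.fill(new_ea_size)
    return buf


if __name__ == "__main__":
    # Sizes taken from the log above: 416 bytes reserved, 872 bytes needed.
    before = pack_current_size(416, 872)
    after = pack_max_then_shrink(872)
    print("before patch: grows =", before.grows, "shrinks =", before.shrinks)
    print("after patch:  grows =", after.grows, "shrinks =", after.shrinks)
```

In this model the pre-patch path needs a grow whenever the new EA is larger than the current one, while the patched path only ever shrinks as long as the layout fits within max_mdsize; a layout larger than max_mdsize still triggers a grow, which corresponds to case 1).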