Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.14.0, Lustre 2.15.0
-
3
Description
Removing widely overstriped files from an ldiskfs MDT reserves an excessive number of transaction credits. This can be seen in the MDS console logs:
Lustre: DEBUG MARKER: == sanity test 130g: FIEMAP (overstripe file) ========
Lustre: 25401:0:(osd_handler.c:1934:osd_trans_start()) lustre-MDT0000: credits 54595 > trans_max 2592
Lustre: 25401:0:(osd_handler.c:1863:osd_trans_dump_creds()) create: 800/6400/0, destroy: 1/4/0
Lustre: 25401:0:(osd_handler.c:1870:osd_trans_dump_creds()) attr_set: 3/3/0, xattr_set: 804/148/0
Lustre: 25401:0:(osd_handler.c:1880:osd_trans_dump_creds()) write: 4001/34410/0, punch: 0/0/0, quota 6/6/0
Lustre: 25401:0:(osd_handler.c:1887:osd_trans_dump_creds()) insert: 801/13616/0, delete: 2/5/0
Lustre: 25401:0:(osd_handler.c:1894:osd_trans_dump_creds()) ref_add: 1/1/0, ref_del: 2/2/0
Pid: 25401, comm: mdt00_004 3.10.0-1160.36.2.el7_lustre.x86_64 #1 SMP Tue Aug 3 23:03:31 UTC 2021
Call Trace:
 libcfs_call_trace+0x90/0xf0 [libcfs]
 libcfs_debug_dumpstack+0x26/0x30 [libcfs]
 osd_trans_start+0x4bb/0x4e0 [osd_ldiskfs]
 top_trans_start+0x702/0x940 [ptlrpc]
 lod_trans_start+0x34/0x40 [lod]
 mdd_trans_start+0x1a/0x20 [mdd]
 mdd_unlink+0x4ee/0xae0 [mdd]
 mdo_unlink+0x1b/0x1d [mdt]
 mdt_reint_unlink+0xb64/0x1890 [mdt]
 mdt_reint_rec+0x83/0x210 [mdt]
 mdt_reint_internal+0x720/0xaf0 [mdt]
 mdt_reint+0x67/0x140 [mdt]
 tgt_request_handle+0x7ea/0x1750 [ptlrpc]
 ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
 ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
Lustre: 25401:0:(osd_internal.h:1325:osd_trans_exec_op()) lustre-MDT0000: opcode 7: before 2589 < left 34410, rollback = 7
and
Lustre: DEBUG MARKER: == sanity test 27Cd: test maximum stripe count ========
Lustre: 12686:0:(osd_handler.c:1934:osd_trans_start()) lustre-MDT0003: credits 136195 > trans_max 2592
Lustre: 12686:0:(osd_handler.c:1863:osd_trans_dump_creds()) create: 2000/16000/0, destroy: 1/4/0
Lustre: 12686:0:(osd_handler.c:1870:osd_trans_dump_creds()) attr_set: 3/3/0, xattr_set: 2004/148/0
Lustre: 12686:0:(osd_handler.c:1880:osd_trans_dump_creds()) write: 10001/86010/0, punch: 0/0/0, quota 6/6/0
Lustre: 12686:0:(osd_handler.c:1887:osd_trans_dump_creds()) insert: 2001/34016/0, delete: 2/5/0
Lustre: 12686:0:(osd_handler.c:1894:osd_trans_dump_creds()) ref_add: 1/1/0, ref_del: 2/2/0
Pid: 12686, comm: mdt00_000 3.10.0-1160.36.2.el7_lustre.x86_64 #1 SMP Tue Aug 3 23:03:31 UTC 2021
Call Trace:
 libcfs_call_trace+0x90/0xf0 [libcfs]
 libcfs_debug_dumpstack+0x26/0x30 [libcfs]
 osd_trans_start+0x4bb/0x4e0 [osd_ldiskfs]
 top_trans_start+0x702/0x940 [ptlrpc]
 lod_trans_start+0x34/0x40 [lod]
 mdd_trans_start+0x1a/0x20 [mdd]
 mdd_unlink+0x4ee/0xae0 [mdd]
 mdo_unlink+0x1b/0x1d [mdt]
 mdt_reint_unlink+0xb64/0x1890 [mdt]
 mdt_reint_rec+0x83/0x210 [mdt]
 mdt_reint_internal+0x720/0xaf0 [mdt]
 mdt_reint+0x67/0x140 [mdt]
 tgt_request_handle+0x7ea/0x1750 [ptlrpc]
 ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
 ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
and similarly in sanity test_130e, sanity-pfl test_0b, test_1c, always during unlink.
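To see which operation dominates the reservation, the per-operation fields from the 130g dump can be summed. The sketch below is only a rough cross-check: it assumes the three numbers after each operation name are "declared ops / declared credits / used", which is an interpretation not stated in this ticket, supported only by the second fields adding up to the reported total of 54595. The names parse_creds and total_credits are made up for illustration.

```python
import re

# Per-operation fields copied from the sanity test 130g console log.
LOG_130G = """\
create: 800/6400/0, destroy: 1/4/0
attr_set: 3/3/0, xattr_set: 804/148/0
write: 4001/34410/0, punch: 0/0/0, quota 6/6/0
insert: 801/13616/0, delete: 2/5/0
ref_add: 1/1/0, ref_del: 2/2/0
"""

# Matches "name: a/b/c" and also "quota a/b/c", which has no colon.
FIELD = re.compile(r"(\w+):?\s+(\d+)/(\d+)/(\d+)")


def parse_creds(text):
    """Return {operation: (a, b, c)} for every a/b/c triple in the text."""
    return {
        m.group(1): tuple(int(x) for x in m.group(2, 3, 4))
        for m in FIELD.finditer(text)
    }


def total_credits(creds):
    """Sum the second field, assumed here to be the declared credits."""
    return sum(fields[1] for fields in creds.values())


creds = parse_creds(LOG_130G)
total = total_credits(creds)
largest = max(creds, key=lambda op: creds[op][1])

print(f"total credits: {total}")
print(f"largest: {largest} = {creds[largest][1]} credits")
```

Under that reading, the write declarations account for well over half of the total in this dump, with insert and create making up most of the rest.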
The two examples shown are trying to reserve a whopping 213MiB and 532MiB of journal space, respectively (54595 and 136195 credits, assuming one 4KiB journal block per credit). Since the maximum xattr size for an overstriped file is 64KiB, this is clearly excessive.
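The journal-space figures can be reproduced from the credit counts in the logs. This is a minimal sketch that assumes each credit corresponds to one 4KiB journal block; the block size is not stated in the logs, and credits_to_mib is a hypothetical helper name.

```python
BLOCK_SIZE = 4096  # assumption: one credit reserves one 4KiB journal block


def credits_to_mib(credits, block_size=BLOCK_SIZE):
    """Convert a transaction credit count to MiB of journal space."""
    return credits * block_size / (1024 * 1024)


# Credit counts taken from the osd_trans_start() messages in the logs.
for name, credits in (("130g", 54595), ("27Cd", 136195), ("trans_max", 2592)):
    print(f"{name}: {credits} credits = {credits_to_mib(credits):.1f} MiB")
```

With the same assumption, the trans_max value of 2592 credits works out to roughly 10MiB, so both reservations are many times larger than the limit reported in the warning.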