[LU-14916] Interop: sanity-pfl test 0b fails with 'Create /mnt/lustre/d0b.sanity-pfl/f0b.sanity-pfl succeeded' Created: 06/Aug/21 Updated: 25/Oct/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | interop | ||
| Environment: |
2.13.0 clients with >= 2.14.50.130 servers |
||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
sanity-pfl test 0b started failing on 07 March 2021 for 2.13.0 clients with 2.14.50.130 servers; see https://testing.whamcloud.com/test_sets/4bb01bda-d0c3-4e30-8cf5-1b72e5dbff7f. A recent failure at https://testing.whamcloud.com/test_sets/944ecbd7-e7f4-46de-b435-e52f50afca20 shows the following in the MDS (vm4) console log:
[ 283.886793] Lustre: DEBUG MARKER: == sanity-pfl test 0b: Verify comp stripe count limits =============================================== 15:58:00 (1622735880)
[ 284.136777] Lustre: DEBUG MARKER: dumpe2fs -h /dev/mapper/mds1_flakey 2>&1 |
[ 284.136777] grep -E -q '(ea_inode|large_xattr)'
[ 284.504759] Lustre: 11026:0:(osd_handler.c:1938:osd_trans_start()) lustre-MDT0000: credits 12995 > trans_max 2592
[ 284.506763] Lustre: 11026:0:(osd_handler.c:1867:osd_trans_dump_creds()) create: 200/800/0, destroy: 1/4/0
[ 284.508514] Lustre: 11026:0:(osd_handler.c:1874:osd_trans_dump_creds()) attr_set: 3/3/0, xattr_set: 204/148/0
[ 284.510309] Lustre: 11026:0:(osd_handler.c:1884:osd_trans_dump_creds()) write: 1001/8610/0, punch: 0/0/0, quota 6/6/0
[ 284.512274] Lustre: 11026:0:(osd_handler.c:1891:osd_trans_dump_creds()) insert: 201/3416/0, delete: 2/5/0
[ 284.514033] Lustre: 11026:0:(osd_handler.c:1898:osd_trans_dump_creds()) ref_add: 1/1/0, ref_del: 2/2/0
[ 284.515721] Pid: 11026, comm: mdt00_000 4.18.0-240.22.1.el8_lustre.x86_64 #1 SMP Sun Apr 11 04:35:52 UTC 2021
[ 284.517504] Call Trace TBD:
[ 284.518265] [<0>] libcfs_call_trace+0x6f/0x90 [libcfs]
[ 284.519281] [<0>] osd_trans_start+0x50c/0x530 [osd_ldiskfs]
[ 284.520707] [<0>] top_trans_start+0x423/0x940 [ptlrpc]
[ 284.521741] [<0>] mdd_unlink+0x495/0xb20 [mdd]
[ 284.522703] [<0>] mdt_reint_unlink+0xb09/0x12a0 [mdt]
[ 284.523656] [<0>] mdt_reint_rec+0x11f/0x250 [mdt]
[ 284.524528] [<0>] mdt_reint_internal+0x498/0x780 [mdt]
[ 284.525480] [<0>] mdt_reint+0x5e/0x100 [mdt]
[ 284.526315] [<0>] tgt_request_handle+0xc78/0x1910 [ptlrpc]
[ 284.527355] [<0>] ptlrpc_server_handle_request+0x31a/0xba0 [ptlrpc]
[ 284.528533] [<0>] ptlrpc_main+0xba2/0x14a0 [ptlrpc]
[ 284.529462] [<0>] kthread+0x112/0x130
[ 284.530166] [<0>] ret_from_fork+0x35/0x40
[ 284.536315] Lustre: 11026:0:(osd_internal.h:1304:osd_trans_exec_op()) lustre-MDT0000: opcode 7: before 2593 < left 8610, rollback = 7
[ 284.822666] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity-pfl test_0b: @@@@@@ FAIL: Create \/mnt\/lustre\/d0b.sanity-pfl\/f0b.sanity-pfl succeeded
[ 285.114266] Lustre: DEBUG MARKER: sanity-pfl test_0b: @@@@@@ FAIL: Create /mnt/lustre/d0b.sanity-pfl/f0b.sanity-pfl succeeded |
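As a cross-check on the console output above, the following Python sketch parses the osd_trans_dump_creds() lines and adds up the middle number of each triple. It assumes, without confirmation from this ticket, that each triple reads as operations/credits/used; the helper name `parse_credits` and the constants are illustrative only and are not part of Lustre. Under that assumption the per-operation credits add up to the reported 12995, with write, insert and create accounting for most of it.

```python
import re

# Credit lines copied from the osd_trans_dump_creds() messages in the MDS log.
LOG = """\
create: 200/800/0, destroy: 1/4/0
attr_set: 3/3/0, xattr_set: 204/148/0
write: 1001/8610/0, punch: 0/0/0, quota 6/6/0
insert: 201/3416/0, delete: 2/5/0
ref_add: 1/1/0, ref_del: 2/2/0
"""

REPORTED_CREDITS = 12995  # "credits" in the osd_trans_start() message
TRANS_MAX = 2592          # "trans_max" in the same message

# The colon is optional because the log prints "quota 6/6/0" without one.
TRIPLE = re.compile(r"(\w+):?\s+(\d+)/(\d+)/(\d+)")


def parse_credits(text):
    """Return {operation: (ops, credits, used)}.

    The meaning of the three fields is an assumption, not taken from the ticket.
    """
    return {
        m.group(1): tuple(int(v) for v in m.group(2, 3, 4))
        for m in TRIPLE.finditer(text)
    }


credits = parse_credits(LOG)
total = sum(cred for _ops, cred, _used in credits.values())

print(f"sum of per-operation credits: {total}")
print(f"matches reported credits:     {total == REPORTED_CREDITS}")
print(f"exceeds trans_max:            {total > TRANS_MAX}")

# Show the three largest contributors to the declared transaction size.
top = sorted(credits.items(), key=lambda kv: kv[1][1], reverse=True)[:3]
for name, (ops, cred, _used) in top:
    print(f"{name:>8}: {ops} ops, {cred} credits")
```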
| Comments |
| Comment by Andreas Dilger [ 30/Oct/21 ] |
|
I added debugging to the test to print out the resulting layout:
lcm_layout_gen: 2
lcm_mirror_count: 1
lcm_entry_count: 2
lcme_id: 1
lcme_mirror_id: 0
lcme_flags: init
lcme_extent.e_start: 0
lcme_extent.e_end: 1048576
lmm_stripe_count: 720
lmm_stripe_size: 1048576
lmm_pattern: raid0,overstriped
lmm_layout_gen: 0
lmm_stripe_offset: 3
lmm_objects:
[720 objects]
lcme_id: 2
lcme_mirror_id: 0
lcme_flags: 0
lcme_extent.e_start: 1048576
lcme_extent.e_end: EOF
lmm_stripe_count: 2000
lmm_stripe_size: 1048576
lmm_pattern: raid0,overstriped
lmm_layout_gen: 0
lmm_stripe_offset: -1
It isn't clear why the first component only got 720 objects when 2000 were requested. Thinking about it more, though, it shouldn't be possible to have more than one stripe per LOV_MIN_STRIPE_SIZE of component extent, so a 1MB component should allow at most 16 x 64KB stripes, since any additional stripes are just a waste of space. That doesn't help understand or fix this bug, but it does expose a related issue. |
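The stripe limit argued for in this comment is simple arithmetic, sketched below in Python. The 64KB value for LOV_MIN_STRIPE_SIZE is the one quoted in the comment; the function `max_useful_stripes` is a hypothetical helper for illustration and does not correspond to any existing Lustre check.

```python
LOV_MIN_STRIPE_SIZE = 64 * 1024  # 64KB, as quoted in the comment above


def max_useful_stripes(e_start, e_end, min_stripe_size=LOV_MIN_STRIPE_SIZE):
    """Upper bound on stripes that can hold data in the extent [e_start, e_end).

    Hypothetical helper: each stripe needs at least min_stripe_size bytes of
    the extent, so any stripes beyond this bound could never be written to.
    """
    return max(1, (e_end - e_start) // min_stripe_size)


# First component from the debug output: extent 0..1048576.
limit = max_useful_stripes(0, 1048576)
allocated = 720   # lmm_stripe_count reported for component 1
requested = 2000  # stripe count requested by the test

print(f"useful stripe limit for a 1MB component: {limit}")
print(f"allocated stripes beyond the limit:      {allocated - limit}")
print(f"requested stripes beyond the limit:      {requested - limit}")
```

The second component extends to EOF, so this bound does not constrain it in the same way.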
| Comment by Andreas Dilger [ 30/Oct/21 ] |
|
One possibility is that the OSTs have run out of objects, shrinking the file layout, but there weren't any signs of this in the layout (it was round robin across OSTs 0-6 for the whole file). |
| Comment by Andreas Dilger [ 01/Mar/22 ] |
|
This interop issue was introduced by patch https://review.whamcloud.com/40895 " |