[LU-11239] sanity-lfsck test 36a fails with 'Fail to split mirror' Created: 13/Aug/18 Updated: 21/Nov/19 Resolved: 09/Sep/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.2, Lustre 2.12.3 |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | James Nunez (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity-lfsck test_36a started failing on August 6, 2018 with Lustre version 2.11.53.52 build #3774. Note, sanity-lfsck test 36a landed to master with build #3774. In the test_log for these failures, for example at https://testing.whamcloud.com/test_sets/0e30317a-9ad2-11e8-a9f7-52540065bddc, we see the following: …
/mnt/lustre/d36a.sanity-lfsck/f0
  lcm_layout_gen:    10
  lcm_mirror_count:  3
  lcm_entry_count:   6
    lcme_id:             65537
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x9b6:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x768:0x0] }

    lcme_id:             65538
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x78e:0x0] }

    lcme_id:             131075
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   2097152
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x769:0x0] }
      - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x788:0x0] }

    lcme_id:             131076
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 2097152
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x9bc:0x0] }

    lcme_id:             196613
    lcme_mirror_id:      3
    lcme_flags:          init,stale
    lcme_extent.e_start: 0
    lcme_extent.e_end:   3145728
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x789:0x0] }
      - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x9b7:0x0] }

    lcme_id:             196614
    lcme_mirror_id:      3
    lcme_flags:          init,stale
    lcme_extent.e_start: 3145728
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x76e:0x0] }
Inject failure, to simulate the case of missing one mirror in LOV
CMD: trevis-7vm4 /usr/sbin/lctl set_param fail_loc=0x1616
fail_loc=0x1616
error: lfs mirror split: setting 'stale' is not supported
sanity-lfsck test_36a: @@@@@@ FAIL: (12) Fail to split 1st mirror from /mnt/lustre/d36a.sanity-lfsck/f0
Comparing sanity-lfsck test 36a sessions that pass testing with those that don’t, the ones that fail have components with “init,stale” flags, while the ones that pass have no “stale” flags. More logs for this failure are at |
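For triage, the stale components above can be listed programmatically. Below is a minimal sketch using the public lustreapi layout API (llapi_layout_get_by_path() and the llapi_layout_comp_* accessors; this assumes current lustreapi headers behave as described, link with -llustreapi) that walks a file's composite layout and reports which components carry the stale flag, which is what distinguishes the failing runs from the passing ones:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <lustre/lustreapi.h>

int main(int argc, char **argv)
{
	struct llapi_layout *layout;
	uint32_t id, mirror_id, flags;
	int rc;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return EXIT_FAILURE;
	}

	layout = llapi_layout_get_by_path(argv[1], 0);
	if (layout == NULL) {
		perror("llapi_layout_get_by_path");
		return EXIT_FAILURE;
	}

	/* Walk each component of the composite layout and check flags. */
	rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_FIRST);
	while (rc == 0) {
		llapi_layout_comp_id_get(layout, &id);
		llapi_layout_mirror_id_get(layout, &mirror_id);
		llapi_layout_comp_flags_get(layout, &flags);
		printf("component %u (mirror %u): %s\n", id, mirror_id,
		       (flags & LCME_FL_STALE) ? "stale" : "ok");
		rc = llapi_layout_comp_use(layout, LLAPI_LAYOUT_COMP_USE_NEXT);
	}

	llapi_layout_free(layout);
	return EXIT_SUCCESS;
}

Run against /mnt/lustre/d36a.sanity-lfsck/f0 from the dump above, this would be expected to flag components 196613 and 196614 (mirror 3) as stale.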
| Comments |
| Comment by Peter Jones [ 25/Sep/18 ] |
|
Lai This test was added with https://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=36ba989752c62cc76b06089373fcd6cec6da9008 . Should we revert this change or is it just a faulty test that we should add to the ALWAYS_EXCEPT list until it is corrected? Peter |
| Comment by Lai Siyao [ 26/Sep/18 ] |
|
IMO it's a test script problem. Bobi is more familiar with the related code; maybe he knows more. |
| Comment by Peter Jones [ 06/Oct/18 ] |
|
Bobi Any suggestions? Peter |
| Comment by Gerrit Updater [ 10/Oct/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/33330 |
| Comment by Andreas Dilger [ 19/Oct/18 ] |
|
Can you please explain why we need to call lfs mirror resync multiple times on a file to make it correct? If this is a problem for regular users, then it could lead to data loss: they may think the file is sync'd to all of the mirrors (e.g. a changelog watcher that calls "lfs mirror resync" on each file once), when in fact some mirrors are not up to date. If there is a bug in how resync is working, then that should be fixed in lfs, and not in the test script. If this is really a bug in the test script, please explain why, so that we are sure that there is not going to be data loss for the users (and we can land the existing patch and get it out of the way for 2.12). |
| Comment by Zhenyu Xu [ 19/Oct/18 ] |
|
I don't know why resync does not successfully sync some components in this case. lfs mirror resync was designed not to report failure if some components are not synced for some reason, but I don't see why test_36a() could fail the resync. |
| Comment by Zhenyu Xu [ 25/Oct/18 ] |
|
https://testing.whamcloud.com/test_logs/85810774-d76b-11e8-82f2-52540065bddc/download (client log)

00000008:00000001:1.0:1540371611.978254:0:13766:0:(osc_request.c:1770:osc_brw_fini_request()) Process leaving via out (rc=18446744073709551588 : -28 : 0xffffffffffffffe4)
00000008:00000001:1.0:1540371611.978256:0:13766:0:(osc_request.c:1905:osc_brw_fini_request()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
00000008:00000002:1.0:1540371611.978257:0:13766:0:(osc_request.c:2025:brw_interpret()) request ffff893865d41800 aa ffff893865d41970 rc -28

And the OST log:

00000001:00000001:0.0:1540371612.134157:0:12237:0:(osd_io.c:1810:osd_declare_write()) Process leaving (rc=0 : 0 : 0)
00080000:00000001:0.0:1540371612.134160:0:12237:0:(osd_handler.c:1916:osd_trans_start()) Process leaving (rc=0 : 0 : 0)
00000001:00000002:0.0:1540371612.134161:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [0] == -28
00000001:00000002:0.0:1540371612.134164:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [1] == -28
...
00000001:00000002:0.0:1540371612.134355:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [254] == -28
00000001:00000002:0.0:1540371612.134356:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [255] == -28
00000001:00000001:0.0:1540371612.134357:0:12237:0:(osd_io.c:330:osd_do_bio()) Process entered
00000001:00000001:0.0:1540371612.134358:0:12237:0:(osd_io.c:462:osd_do_bio()) Process leaving (rc=0 : 0 : 0)
00000001:00000001:0.0:1540371612.134359:0:12237:0:(osd_io.c:1484:osd_write_commit()) Process leaving (rc=0 : 0 : 0) |
| Comment by Andreas Dilger [ 25/Oct/18 ] |
|
Running out of space is an acceptable reason to fail resync, so long as some error is printed for the user. However, this shouldn't happen during this particular test. Is the test file too large, or is there some leak of space? |
| Comment by Zhenyu Xu [ 25/Oct/18 ] |
|
It has something to do with grant. The OST reports that it has 62,066,688 bytes free and 54,898,688 bytes available; minus the space already granted to the client, it has 897,024 bytes left for the incoming write, which is not enough for the requested write size (1,073,152).

00002000:00000020:0.0:1540371612.048313:0:12237:0:(tgt_grant.c:413:tgt_grant_statfs()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 free: 62066688 avail: 54898688
00002000:00000020:0.0:1540371612.048315:0:12237:0:(tgt_grant.c:477:tgt_grant_space_left()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 avail 54898688 left 897024 unstable 0 tot_grant 53999552 pending 0
00002000:00000001:0.0:1540371612.048316:0:12237:0:(tgt_grant.c:479:tgt_grant_space_left()) Process leaving (rc=897024 : 897024 : db000)
00002000:00000001:0.0:1540371612.048317:0:12237:0:(tgt_grant.c:504:tgt_grant_incoming()) Process entered
00002000:00000020:0.0:1540371612.048318:0:12237:0:(tgt_grant.c:520:tgt_grant_incoming()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 reports grant 43601920 dropped 0, local 46772224
00002000:00000001:0.0:1540371612.048319:0:12237:0:(tgt_grant.c:567:tgt_grant_incoming()) Process leaving
00002000:00000001:0.0:1540371612.048320:0:12237:0:(tgt_grant.c:705:tgt_grant_check()) Process entered
00002000:00000020:0.0:1540371612.048320:0:12237:0:(tgt_grant.c:812:tgt_grant_check()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 idx 0 no space for 1073152
00002000:00000020:0.0:1540371612.048322:0:12237:0:(tgt_grant.c:832:tgt_grant_check()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 granted: 0 ungranted: 0 grant: 46772224 dirty: 3170304
00002000:00000001:0.0:1540371612.048323:0:12237:0:(tgt_grant.c:855:tgt_grant_check()) Process leaving |
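As a sanity check, the numbers in the log are self-consistent: 54,898,688 available minus 53,999,552 granted leaves 899,136, which rounds down to 897,024 on a 4 KiB block boundary (matching "left 897024" and rc 0xdb000 above), and that is less than the 1,073,152-byte write. A minimal standalone sketch of that arithmetic (illustrative names only, not the actual tgt_grant.c code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Values copied from the tgt_grant_space_left() line above. */
	uint64_t avail     = 54898688;	/* bytes the OST has available */
	uint64_t tot_grant = 53999552;	/* bytes already granted to clients */
	uint64_t pending   = 0;
	uint64_t unstable  = 0;
	uint64_t blockbits = 12;	/* assuming 4 KiB blocks */
	uint64_t want      = 1073152;	/* incoming resync write size */

	/* Space the server may still promise, rounded down to a block. */
	uint64_t left = avail - tot_grant - pending - unstable;
	left = (left >> blockbits) << blockbits;

	printf("left %llu, want %llu -> %s\n",
	       (unsigned long long)left, (unsigned long long)want,
	       left < want ? "no space (-ENOSPC)" : "ok");
	return 0;
}

This prints "left 897024, want 1073152 -> no space (-ENOSPC)", reproducing the "no space for 1073152" rejection in the log.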
| Comment by Andreas Dilger [ 26/Oct/18 ] |
|
Is that because the mirror resync writes do not use the grant already held by the client? It doesn't make sense that this would fail on a test system, because there are only 2 clients that might even be holding grant. |
| Comment by Zhenyu Xu [ 26/Oct/18 ] |
|
mirror resync uses direct IO, which I think doesn't consume grant on the client, so the OST finds that (avail - grants_assigned_to_the_client) < requested_direct_write_size. |
| Comment by Andreas Dilger [ 26/Oct/18 ] |
|
I know we've gone back and forth on O_DIRECT writes consuming grant in the past, but IMHO we should consume grant from the client if it is available, and return grant from the server if available. However, we shouldn't need grant for O_DIRECT writes if there is none on the client. This would allow the best of both worlds: clients that have grant would consume it during O_DIRECT writes so that they do not run out of space on the OST, and even if space is becoming short on an OST and grant is restricted, the client can still submit large O_DIRECT writes without grant if there is any free space on the OST. |
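A sketch of the decision logic described above, under the assumption that it can be reduced to a single admission check; none of these names exist in Lustre, this only illustrates the proposed policy:

#include <errno.h>
#include <stdio.h>

/* Sketch only: encode "consume grant if held, otherwise admit the
 * O_DIRECT write whenever the OST still has free space". */
static int odirect_grant_check(unsigned long *client_grant,
			       unsigned long write_bytes,
			       unsigned long ost_space_left)
{
	if (*client_grant >= write_bytes) {
		/* Client holds enough grant: consume it, as a buffered
		 * write would, so grant accounting stays consistent. */
		*client_grant -= write_bytes;
		return 0;
	}

	/* Little or no grant held: still allow the O_DIRECT write as
	 * long as the OST actually has free space for it. */
	if (ost_space_left >= write_bytes)
		return 0;

	return -ENOSPC;
}

int main(void)
{
	unsigned long grant = 0;	/* client holds no grant */
	unsigned long avail = 54898688;	/* free space from the log above */

	/* Under this policy the 1,073,152-byte resync write would be
	 * admitted, since the OST still has free space. */
	printf("rc = %d\n", odirect_grant_check(&grant, 1073152, avail));
	return 0;
}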
| Comment by Andreas Dilger [ 31/Oct/18 ] |
|
As a starting point, we should add error messages to "lfs mirror resync" to print the -ENOSPC errors during resync, and always return an error to the shell if any resync fails (though it shouldn't stop the resync if an error is hit, just save it until the end). We shouldn't have to dig through Lustre debug logs to find that out, and users shouldn't have to do that either. |
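The behavior requested here is the usual report-and-continue error aggregation pattern. A minimal sketch (resync_one() is a hypothetical stand-in for resyncing a single mirror component, not an lfs or llapi function; here component 2 is hard-wired to fail with -ENOSPC just to exercise the error path):

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for resyncing one mirror component. */
static int resync_one(int comp)
{
	return comp == 2 ? -ENOSPC : 0;
}

static int resync_all(int ncomps)
{
	int rc = 0;

	for (int i = 0; i < ncomps; i++) {
		int rc2 = resync_one(i);

		if (rc2 < 0) {
			/* Print the error right away... */
			fprintf(stderr, "resync of component %d failed: %s\n",
				i, strerror(-rc2));
			/* ...but keep resyncing the rest, remembering the
			 * first failure for the final exit status so that
			 * scripts can detect a partial resync. */
			if (rc == 0)
				rc = rc2;
		}
	}
	return rc;
}

int main(void)
{
	return resync_all(6) ? 1 : 0;
}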
| Comment by Andreas Dilger [ 31/Oct/18 ] |
|
Jian, can you please work on fixing the lfs mirror resync error handling as described above. Bobijam, can you please take a look at fixing O_DIRECT writes to consume grant (if available). |
| Comment by Jian Yu [ 31/Oct/18 ] |
|
Sure, Andreas, will do. |
| Comment by Zhenyu Xu [ 01/Nov/18 ] |
|
Yes, working on it, and will revive/revise a patch (the consume-grant-for-sync-write patch of |
| Comment by Gerrit Updater [ 01/Nov/18 ] |
|
Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/33537 |
| Comment by Zhenyu Xu [ 02/Nov/18 ] |
|
it seems that the debug patch upon |
| Comment by Jian Yu [ 28/Aug/19 ] |
|
+1 on Lustre b2_12 branch: https://testing.whamcloud.com/test_sets/4f773b5e-c96d-11e9-97d5-52540065bddc |
| Comment by Gerrit Updater [ 07/Sep/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33537/ |
| Comment by Patrick Farrell (Inactive) [ 09/Sep/19 ] |
|
Patch landed. |
| Comment by Gerrit Updater [ 01/Oct/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36341 |
| Comment by Gerrit Updater [ 21/Nov/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36341/ |