[LU-11239] sanity-lfsck test 36a fails with 'Fail to split mirror' Created: 13/Aug/18  Updated: 21/Nov/19  Resolved: 09/Sep/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.2, Lustre 2.12.3
Fix Version/s: Lustre 2.13.0, Lustre 2.12.4

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: Zhenyu Xu
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-11111 crash doing LFSCK: orph_index_insert(... Resolved
is related to LU-4664 sync write should consume grant on cl... Resolved
is related to LU-12757 sanity-lfsck test 36a fails with '(N)... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity-lfsck test_36a started failing on August 6, 2018 with Lustre version 2.11.53.52 build #3774. Note, sanity-lfsck test 36a landed to master with build #3774.

In the test_log for these failures, for example at https://testing.whamcloud.com/test_sets/0e30317a-9ad2-11e8-a9f7-52540065bddc, we see the following:

…
/mnt/lustre/d36a.sanity-lfsck/f0
  lcm_layout_gen:    10
  lcm_mirror_count:  3
  lcm_entry_count:   6
    lcme_id:             65537
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   1048576
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x9b6:0x0] }
      - 1: { l_ost_idx: 1, l_fid: [0x100010000:0x768:0x0] }

    lcme_id:             65538
    lcme_mirror_id:      1
    lcme_flags:          init
    lcme_extent.e_start: 1048576
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x78e:0x0] }

    lcme_id:             131075
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   2097152
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x769:0x0] }
      - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x788:0x0] }

    lcme_id:             131076
    lcme_mirror_id:      2
    lcme_flags:          init
    lcme_extent.e_start: 2097152
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x9bc:0x0] }

    lcme_id:             196613
    lcme_mirror_id:      3
    lcme_flags:          init,stale
    lcme_extent.e_start: 0
    lcme_extent.e_end:   3145728
      lmm_stripe_count:  2
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x789:0x0] }
      - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x9b7:0x0] }

    lcme_id:             196614
    lcme_mirror_id:      3
    lcme_flags:          init,stale
    lcme_extent.e_start: 3145728
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 1
      lmm_objects:
      - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x76e:0x0] }
…
Inject failure, to simulate the case of missing one mirror in LOV
CMD: trevis-7vm4 /usr/sbin/lctl set_param fail_loc=0x1616
fail_loc=0x1616
error: lfs mirror split: setting 'stale' is not supported
 sanity-lfsck test_36a: @@@@@@ FAIL: (12) Fail to split 1st mirror from /mnt/lustre/d36a.sanity-lfsck/f0 

In comparing sanity-lfsck test 36a that pass testing and those that don’t, the ones that fail have components with “init,stale” flags and the tests that pass don’t have “stale flags

More logs for this failure are at
https://testing.whamcloud.com/test_sets/1131f45c-99d4-11e8-a9f7-52540065bddc
https://testing.whamcloud.com/test_sets/6481868c-9ab4-11e8-a9f7-52540065bddc



 Comments   
Comment by Peter Jones [ 25/Sep/18 ]

Lai

This test was added with https://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=36ba989752c62cc76b06089373fcd6cec6da9008 . Should we revert this change or is it just a faulty test that we should add to the ALWAYS_EXCEPT list until it is corrected?

Peter

Comment by Lai Siyao [ 26/Sep/18 ]

IMO it's test script problem. Bobi is more familiar with related code, maybe he knows more.

Comment by Peter Jones [ 06/Oct/18 ]

Bobi

Any suggestions?

Peter

Comment by Gerrit Updater [ 10/Oct/18 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/33330
Subject: LU-11239 test: sanity-lfsck test_36a fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c19a05a5c896ced8dfe962df409cb0892f9e3950

Comment by Andreas Dilger [ 19/Oct/18 ]

Can you please explain why we need to call lfs mirror resync multiple times on a file to make it correct? If this is a problem for regular users, then it could lead to data loss if they think the file is sync'd to all of the mirrors (e.g. changelog watcher that is calling "lfs mirror resync" on each file once), but in fact there are mirrors that are not uptodate.

If there is a bug in how resync is working, then that should be fixed in lfs, and not in the test script. If this is really a bug in the test script, please explain why, so that we are sure that there is not going to be data loss for the users (and we can land the existing patch and get it out of the way for 2.12).

Comment by Zhenyu Xu [ 19/Oct/18 ]

Don't know why this case why resync does not successfully sync some components. And lfs mirror resync was designed to not report failure if some components are not sync-ed for some reason, but I don't see why the test_36a() could fail the resync.

Comment by Zhenyu Xu [ 25/Oct/18 ]

https://testing.whamcloud.com/test_logs/85810774-d76b-11e8-82f2-52540065bddc/download (client log)
shows that -28 is from write

00000008:00000001:1.0:1540371611.978254:0:13766:0:(osc_request.c:1770:osc_brw_fini_request()) Process leaving via out (rc=18446744073709551588 : -28 : 0xffffffffffffffe4)
00000008:00000001:1.0:1540371611.978256:0:13766:0:(osc_request.c:1905:osc_brw_fini_request()) Process leaving (rc=18446744073709551588 : -28 : ffffffffffffffe4)
00000008:00000002:1.0:1540371611.978257:0:13766:0:(osc_request.c:2025:brw_interpret()) request ffff893865d41800 aa ffff893865d41970 rc -28

And the OST log
https://testing.whamcloud.com/test_logs/8580f360-d76b-11e8-82f2-52540065bddc/download (ost log)
shows

00000001:00000001:0.0:1540371612.134157:0:12237:0:(osd_io.c:1810:osd_declare_write()) Process leaving (rc=0 : 0 : 0)
00080000:00000001:0.0:1540371612.134160:0:12237:0:(osd_handler.c:1916:osd_trans_start()) Process leaving (rc=0 : 0 : 0)
00000001:00000002:0.0:1540371612.134161:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [0] == -28
00000001:00000002:0.0:1540371612.134164:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [1] == -28
...
00000001:00000002:0.0:1540371612.134355:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [254] == -28
00000001:00000002:0.0:1540371612.134356:0:12237:0:(osd_io.c:1415:osd_write_commit()) Skipping [255] == -28
00000001:00000001:0.0:1540371612.134357:0:12237:0:(osd_io.c:330:osd_do_bio()) Process entered
00000001:00000001:0.0:1540371612.134358:0:12237:0:(osd_io.c:462:osd_do_bio()) Process leaving (rc=0 : 0 : 0)
00000001:00000001:0.0:1540371612.134359:0:12237:0:(osd_io.c:1484:osd_write_commit()) Process leaving (rc=0 : 0 : 0)
Comment by Andreas Dilger [ 25/Oct/18 ]

Running out of space is an acceptable reason to fail resync, so long as some error is printed for the user.

However, this shouldn't happen during this particular test. Is the test file too large, or is there some leak of space?

Comment by Zhenyu Xu [ 25/Oct/18 ]

It has something about grant. OST reports that it has 62,066,688 bytes free, and 54,898,688 bytes available, minus the space already granted to the client, it has 897,024 left for the incoming write, and it's not enough for the requested write size (1,073,152)

00002000:00000020:0.0:1540371612.048313:0:12237:0:(tgt_grant.c:413:tgt_grant_statfs()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 free: 62066688 avail: 54898688
00002000:00000020:0.0:1540371612.048315:0:12237:0:(tgt_grant.c:477:tgt_grant_space_left()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 avail 54898688 left 897024 unstable 0 tot_grant 53999552 pending 0
00002000:00000001:0.0:1540371612.048316:0:12237:0:(tgt_grant.c:479:tgt_grant_space_left()) Process leaving (rc=897024 : 897024 : db000)
00002000:00000001:0.0:1540371612.048317:0:12237:0:(tgt_grant.c:504:tgt_grant_incoming()) Process entered
00002000:00000020:0.0:1540371612.048318:0:12237:0:(tgt_grant.c:520:tgt_grant_incoming()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 reports grant 43601920 dropped 0, local 46772224
00002000:00000001:0.0:1540371612.048319:0:12237:0:(tgt_grant.c:567:tgt_grant_incoming()) Process leaving
00002000:00000001:0.0:1540371612.048320:0:12237:0:(tgt_grant.c:705:tgt_grant_check()) Process entered
00002000:00000020:0.0:1540371612.048320:0:12237:0:(tgt_grant.c:812:tgt_grant_check()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 idx 0 no space for 1073152
00002000:00000020:0.0:1540371612.048322:0:12237:0:(tgt_grant.c:832:tgt_grant_check()) lustre-OST0002: cli 31207193-e3e7-03d0-da8d-e086caad4da3/ffff9f173946e000 granted: 0 ungranted: 0 grant: 46772224 dirty: 3170304
00002000:00000001:0.0:1540371612.048323:0:12237:0:(tgt_grant.c:855:tgt_grant_check()) Process leaving
Comment by Andreas Dilger [ 26/Oct/18 ]

Is that because the mirror resync writes do not use the grant already held by the client? It doesn't make sense that this would fail on a test system, because there are only 2 clients that might even be holding grant.

Comment by Zhenyu Xu [ 26/Oct/18 ]

mirror resync uses direct IO which I think don't consider grant on the client, and OST find the (avail - grants_assigned_to_the_client) < request_direct_write_size.

Comment by Andreas Dilger [ 26/Oct/18 ]

I know we've gone back and forth on O_DIRECT writes consuming grant in the past, but IMHO we should consume grant from the client if it is available, and return grant from the server if available. However, we shouldn't need grant for O_DIRECT writes if there is none on the client.

This would allow the best of both worlds - clients that have grant would consume it during O_DIRECT writes so that they do not run out of space on the OST. Even if space is becoming short on an OST and grant is restricted then the client can still submit large O_DIRECT writes without grant if there is any free space on the OST.

Comment by Andreas Dilger [ 31/Oct/18 ]

As a starting point, we should add error messages to "lfs mirror resync" to print the -ENOSPC errors during resync, and always return an error to the shell if any resync fails (though it shouldn't stop resync of an error is hit, just save it until the end). We shouldn't have to dig through Lustre debug logs to find that out, and users shouldn't have to do that either.

Comment by Andreas Dilger [ 31/Oct/18 ]

Jian, can you please work on fixing the lfs mirror resync error handling as described above.

Bobijam, can you please take a look at fixing O_DIRECT writes to consume grant (if available).

Comment by Jian Yu [ 31/Oct/18 ]

Sure, Andreas, will do.

Comment by Zhenyu Xu [ 01/Nov/18 ]

Yes, working on it, and revive/revise a patch (consume grant for sync write patch of LU-4664) and testing whether that could work on this issue.

Comment by Gerrit Updater [ 01/Nov/18 ]

Bobi Jam (bobijam@hotmail.com) uploaded a new patch: https://review.whamcloud.com/33537
Subject: LU-11239 lfs: fix mirror resync error handling
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d2ab5bfd9d3a5710177ea8f07c2182178dab1d2f

Comment by Zhenyu Xu [ 02/Nov/18 ]

it seems that the debug patch upon LU-4664 patch shows that the grant for sync write does fix the -ENOSPC error during resync in the sanity_lfsck test_36a case.

Comment by Jian Yu [ 28/Aug/19 ]

+1 on Lustre b2_12 branch: https://testing.whamcloud.com/test_sets/4f773b5e-c96d-11e9-97d5-52540065bddc

Comment by Gerrit Updater [ 07/Sep/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33537/
Subject: LU-11239 lfs: fix mirror resync error handling
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0f670d1ca9dd5af697bfbf3b95a301c61a8b4447

Comment by Patrick Farrell (Inactive) [ 09/Sep/19 ]

Patch landed.

Comment by Gerrit Updater [ 01/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36341
Subject: LU-11239 lfs: fix mirror resync error handling
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 3d2705121a7bedb2ebbaefae752258806d868065

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36341/
Subject: LU-11239 lfs: fix mirror resync error handling
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 195ffb9e46709c74c34cbfc2a86378ad86d2062f

Generated at Sat Feb 10 02:42:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.