[LU-11596] sanity test_42d/test_42e: FAIL: failed: client:143294464 server: 143110144 Created: 01/Nov/18  Updated: 21/Mar/23  Resolved: 18/Jan/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: Jian Yu Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: arm, ppc, rhel8
Environment:

Lustre Build: https://build.whamcloud.com/job/lustre-master/3811
Distro/Arch: RHEL7.5/aarch64 (client), RHEL7.5/x86_64 (server)


Issue Links:
Duplicate
duplicates LU-10033 MOFED4.1 sanity test_42d: failed: cli... Resolved
duplicates LU-10571 sanity test_42d: failed: client:11023... Resolved
is duplicated by LU-14630 sanity test_42e: failed grant check Resolved
Related
is related to LU-12255 sanity-dom test 42e fails with ‘test... Open
is related to LU-13838 sanity test_64e: Grants Mismatch on ARM Resolved
is related to LU-10300 Can the Lustre 2.10.x clients support... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test 42d failed as follows:

Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.lustre-*.cur_*grant_bytes
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param osc.lustre-*.cur_*_bytes
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144. 
Lustre: DEBUG MARKER: sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144.

Dmesg on OSS node:

Lustre: DEBUG MARKER: == sanity test 42d: test complete truncate of file with cached dirty data ============================ 01:34:08 (1540517648)
Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 	    fail_val=0 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param  obdfilter.lustre-OST*.{tot_granted,tot_pending,grant_precreate}
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param obdfilter.lustre-OST*.tot* obdfilter.lustre-OST*.grant_*
Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144. 
Lustre: DEBUG MARKER: sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144.

Maloo report: https://testing.whamcloud.com/test_sets/88bbf5c2-d9d0-11e8-b46b-52540065bddc

sanity tests 63a, 63b, 64a, 64b, 64c also failed with the same issue.



 Comments   
Comment by Gerrit Updater [ 14/Nov/18 ]

Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33652
Subject: LU-11596 tests: disable sanity tests 42d,63[a,b],64[a-c] for ARM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f03165045efacafb69204d460c3409ab91f00a3a

Comment by Gerrit Updater [ 21/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33652/
Subject: LU-11596 tests: disable several sanity sub-tests for ARM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ef38bdfd2b7e672e1886d5c9874e4749ce2ac258

Comment by Andreas Dilger [ 27/Nov/18 ]

It looks like this is one of the few test failures that seems to legitimately be caused by the code running on ARM 64KB PAGE_SIZE. I suspect that there is somewhere in the client (OSC) or server (OFD) grant code that is assuming PAGE_SIZE = 4096, and is adding/subtracting 4096 bytes instead of PAGE_SIZE. In the case reported in the bug title, the difference between the grant values is 184320 bytes, or 45x 4096 bytes.

Given that this is a 100% failure it should be fairly straight forward to debug. Note, however, that it is possible the grant has become out-of-sync in some earlier test, and these tests are failing because they are the only ones in $GRANT_CHECK_LIST. It may be useful to modify test-framework.c::check_grant() to allow setting an environment variable (e.g. GRANT_CHECK_ALWAYS=yes) to check grant for every subtest to catch the actual source of the problem.

Comment by Jian Yu [ 27/Nov/18 ]

Thank you Andreas for the suggestions. Let me look into the grant codes.

Comment by Sergey Cheremencev [ 04/Dec/18 ]

Faced the same problem on arm session for another sanity tests - 63a, 63b, 64a, 64c
https://testing.whamcloud.com/test_sets/f33166da-ecdb-11e8-adf2-52540065bddc

Comment by Andreas Dilger [ 05/Dec/18 ]

Faced the same problem on arm session for another sanity tests - 63a, 63b, 64a, 64c
https://testing.whamcloud.com/test_sets/f33166da-ecdb-11e8-adf2-52540065bddc

Sergey,
if you are still hitting these failures on ARM, then it means your patch is not updated to the latest master that includes the above patch to skip these tests. You could rebase your patch to get these fixes, but it is not critical for review-ldiskfs-arm because that test session is optional and does not affect the Maloo +/-1 result.

 

Comment by Andreas Dilger [ 06/Feb/19 ]

It looks like there are also failures on sanity test_64d on ARM, so that should be added to the skip list:
https://testing.whamcloud.com/test_sets/48e911d6-2a08-11e9-b901-52540065bddc

Comment by Gerrit Updater [ 15/Mar/19 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34432
Subject: LU-11596 tests: skip sanity test_64d for ARM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 74d6824790616f3a997783b65d3c5e0af4a1566e

Comment by Gerrit Updater [ 01/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34432/
Subject: LU-11596 tests: skip sanity test_64d for ARM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 946764287309a274fa805d930e4f1126d4a66570

Comment by Gerrit Updater [ 06/Feb/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37463
Subject: LU-11596 tests: skip sanity tests for PPC
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1916b688570501112977b7062c3affcf13a6aaaa

Comment by Gerrit Updater [ 11/Mar/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37876
Subject: LU-11596 tests: wait for grant to stop changing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fd1d4176367a33eccd2f0715de8e0abf4fcd4996

Comment by Gerrit Updater [ 31/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37876/
Subject: LU-11596 tests: wait for grant to stop changing
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 75ebfa0b924a41d15aea026dc9a29bec39d1419a

Comment by Gerrit Updater [ 25/Nov/20 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/40758
Subject: LU-11596 tests: re-enable sanity grant test for ARM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b11e62e18bbdb56102f872c8c3aeefb2faa0e799

Comment by Xinliang Liu [ 30/Aug/21 ]

Andreas Dilger's guess was right. I can make the test pass on arm CentOS8. When changing the write amount to page size align:

@@ -5091,7 +5091,7 @@ test_42a() {
        sync; sleep 1; sync # just to be safe
        BEFOREWRITES=`count_ost_writes`
         lctl get_param -n osc.*[oO][sS][cC][_-]*.cur_grant_bytes | grep "[0-9]"
-        dd if=/dev/zero of=$DIR/f42a bs=1024 count=100
+        dd if=/dev/zero of=$DIR/f42a bs=$PAGE_SIZE count=2

And looking into the client end grant calculation code, it likely related to this part code:

 830 int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 831                       int sent, int rc)
 832 {
...
 865         } else if (blocksize < PAGE_SIZE &&
 866                    last_count != PAGE_SIZE) {
 867                 /* For short writes we shouldn't count parts of pages that
 868                  * span a whole chunk on the OST side, or our accounting goes
 869                  * wrong.  Should match the code in filter_grant_check. */
 870                 int offset = last_off & ~PAGE_MASK;
 871                 int count = last_count + (offset & (blocksize - 1));
 872                 int end = (offset + last_count) & (blocksize - 1);
 873                 if (end)
 874                         count += blocksize - end;
 875
 876                 lost_grant = PAGE_SIZE - count;
 877         }
 878         if (ext->oe_grants > 0)
 879                 osc_free_grant(cli, nr_pages, lost_grant, ext->oe_grants);

This part code will never go into on x86_64 as page size and block size are always 4k. But this part code looks doesn't match the sever end related code of tgt_grant_check() on Arm RHEL/CentOS, the blocksize seems always 4k, but  page size is 64k on Arm RHEL/CentOS:

 645 static inline u64 tgt_grant_rnb_size(struct obd_export *exp,
 646                                      struct lu_target *lut,
 647                                      struct niobuf_remote *rnb)
 648 {
...
 659
 660         /* The network buffer might span several blocks, align it on block
 661          * boundaries */
 662         bytes  = rnb->rnb_offset & (blksize - 1);
 663         bytes += rnb->rnb_len;
 664         end    = bytes & (blksize - 1);
 665         if (end)
 666                 bytes += blksize - end;

 

But I have no idea how to fix now. Please give me some help, thanks.

Comment by Andreas Dilger [ 30/Aug/21 ]

xinliang, thanks for looking into this. Strictly speaking, the grant is related to the amount of space consumed in the back-end filesystem, which is usually in 4KB multiples (blocksize). The reason that PAGE_SIZE is involved is because the client VM cannot track cached dirty data of less than a full page. As a result, the 64KB clients will need to send 64KB writes, even though only a byte of data was modified in the page. This will consume more of the client grant to dirty the page, but if there is still free space in the OST it will receive replacement grant in the RPC reply.

Should match the code in filter_grant_check.

The filter_grant_check() function doesn't exist anymore, and was replaced by tgt_grant_check(), as you mentioned. If you are updating this code, please also fix this comment.

It looks like the server code is more focussed on handling the case of request size < blocksize (ZFS), compared to request size > blocksize (the common case). Where this might be going wrong is writes at the end of the file, since osc_extent_finish() is not consuming a whole page worth of grant if the end of file is 4KB * N less than the full PAGE_SIZE?

Comment by Xinliang Liu [ 13/Sep/21 ]

Hi Andreas Dilger, thanks for the explanation about grant, will update the comment when send the fixed patch.

After looking into the code and git log, I found that starting from this commit (bd1e41672c LU-2049 grant: add support for OBD_CONNECT_GRANT_PARAM), this OSC part code,

 830 int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext,
 831                       int sent, int rc)
 832 {
...
 865         } else if (blocksize < PAGE_SIZE &&
 866                    last_count != PAGE_SIZE) {
 867                 /* For short writes we shouldn't count parts of pages that
 868                  * span a whole chunk on the OST side, or our accounting goes
 869                  * wrong.  Should match the code in filter_grant_check. */
 870                 int offset = last_off & ~PAGE_MASK;
 871                 int count = last_count + (offset & (blocksize - 1));
 872                 int end = (offset + last_count) & (blocksize - 1);
 873                 if (end)
 874                         count += blocksize - end;
 875
 876                 lost_grant = PAGE_SIZE - count;
 877         }
 878         if (ext->oe_grants > 0)
 879                 osc_free_grant(cli, nr_pages, lost_grant, ext->oe_grants);

begins to mismatch the code of OST tgt_grant_check

708 static void tgt_grant_check(const struct lu_env *env, struct obd_export *exp,
 709                             struct obdo *oa, struct niobuf_remote *rnb,
 710                             int niocount, u64 *left)
 711 {
...
 716         unsigned long            ungranted = 0;
 717         unsigned long            granted = 0;
...
 740         } else if (exp_grant_param_supp(exp) && oa->o_grant_used > 0) {
 741                 /* Client supports the new grant parameters and is telling us
 742                  * how much grant space it consumed for this bulk write.
 743                  * Although all rnbs are supposed to have the OBD_BRW_FROM_GRANT
 744                  * flag set, we will scan the rnb list and looks for non-cache
 745                  * I/O in case it changes in the future */
 746                 if (ted->ted_grant >= oa->o_grant_used) {
 747                         /* skip grant accounting for rnbs with
 748                          * OBD_BRW_FROM_GRANT and just used grant consumption
 749                          * claimed in the request */
 750                         granted = oa->o_grant_used;
 751                         skip = true;
 752                 } else {
 ...
 773         }
774
 775         for (i = 0; i < niocount; i++) {
 776                 int bytes;
 777
 778                 if ((rnb[i].rnb_flags & OBD_BRW_FROM_GRANT)) {
 779                         if (skip) {
 780                                 rnb[i].rnb_flags |= OBD_BRW_GRANTED;
 781                                 continue;
 782                         }
 ...
 829         }
 830
 831         /* record in o_grant_used the actual space reserved for the I/O, will be
 832          * used later in tgt_grant_commmit() */
 833         oa->o_grant_used = granted + ungranted;
 ...
 872 }

 

So when client PAGE_SIZE is 64KB bigger than backend file system block size 4KB (Ldiskfs),

The matched thing should be:

If both server and client support OBD_CONNECT_GRANT_PARAM, the lost_grant in osc_extent_finish should be 0, because server will use grant consumption claimed  by the client. 

And note that, in this case, lost_grant = (client claim o_grant_used) - (OST actual+ o_grant_used)  = 0.

Comment by Gerrit Updater [ 18/Jan/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/40758/
Subject: LU-11596 osc: Fix and re-enable sanity grant test for ARM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7d3edce0650f0b66bdd280373a54a16cb28b8469

Generated at Sat Feb 10 02:45:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.