[LU-11596] sanity test_42d/test_42e: FAIL: failed: client:143294464 server: 143110144 Created: 01/Nov/18 Updated: 21/Mar/23 Resolved: 18/Jan/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5, Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jian Yu | Assignee: | James A Simmons |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | arm, ppc, rhel8 | ||
| Environment: |
Lustre Build: https://build.whamcloud.com/job/lustre-master/3811 |
||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||||||||||||||
| Description |
|
sanity test 42d failed as follows: Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.lustre-*.cur_*grant_bytes Lustre: DEBUG MARKER: /usr/sbin/lctl get_param osc.lustre-*.cur_*_bytes Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144. Lustre: DEBUG MARKER: sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144. Dmesg on OSS node: Lustre: DEBUG MARKER: == sanity test 42d: test complete truncate of file with cached dirty data ============================ 01:34:08 (1540517648)
Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param obdfilter.lustre-OST*.{tot_granted,tot_pending,grant_precreate}
Lustre: DEBUG MARKER: /usr/sbin/lctl get_param obdfilter.lustre-OST*.tot* obdfilter.lustre-OST*.grant_*
Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144.
Lustre: DEBUG MARKER: sanity test_42d: @@@@@@ FAIL: failed: client:143294464 server: 143110144.
Maloo report: https://testing.whamcloud.com/test_sets/88bbf5c2-d9d0-11e8-b46b-52540065bddc sanity tests 63a, 63b, 64a, 64b, 64c also failed with the same issue. |
| Comments |
| Comment by Gerrit Updater [ 14/Nov/18 ] |
|
Jian Yu (yujian@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33652 |
| Comment by Gerrit Updater [ 21/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33652/ |
| Comment by Andreas Dilger [ 27/Nov/18 ] |
|
It looks like this is one of the few test failures that seems to legitimately be caused by the code running on ARM 64KB PAGE_SIZE. I suspect that there is somewhere in the client (OSC) or server (OFD) grant code that is assuming PAGE_SIZE = 4096, and is adding/subtracting 4096 bytes instead of PAGE_SIZE. In the case reported in the bug title, the difference between the grant values is 184320 bytes, or 45x 4096 bytes. Given that this is a 100% failure it should be fairly straight forward to debug. Note, however, that it is possible the grant has become out-of-sync in some earlier test, and these tests are failing because they are the only ones in $GRANT_CHECK_LIST. It may be useful to modify test-framework.c::check_grant() to allow setting an environment variable (e.g. GRANT_CHECK_ALWAYS=yes) to check grant for every subtest to catch the actual source of the problem. |
| Comment by Jian Yu [ 27/Nov/18 ] |
|
Thank you Andreas for the suggestions. Let me look into the grant codes. |
| Comment by Sergey Cheremencev [ 04/Dec/18 ] |
|
Faced the same problem on arm session for another sanity tests - 63a, 63b, 64a, 64c |
| Comment by Andreas Dilger [ 05/Dec/18 ] |
Sergey,
|
| Comment by Andreas Dilger [ 06/Feb/19 ] |
|
It looks like there are also failures on sanity test_64d on ARM, so that should be added to the skip list: |
| Comment by Gerrit Updater [ 15/Mar/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34432 |
| Comment by Gerrit Updater [ 01/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34432/ |
| Comment by Gerrit Updater [ 06/Feb/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37463 |
| Comment by Gerrit Updater [ 11/Mar/20 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37876 |
| Comment by Gerrit Updater [ 31/Mar/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37876/ |
| Comment by Gerrit Updater [ 25/Nov/20 ] |
|
James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/40758 |
| Comment by Xinliang Liu [ 30/Aug/21 ] |
|
Andreas Dilger's guess was right. I can make the test pass on arm CentOS8. When changing the write amount to page size align:
@@ -5091,7 +5091,7 @@ test_42a() {
sync; sleep 1; sync # just to be safe
BEFOREWRITES=`count_ost_writes`
lctl get_param -n osc.*[oO][sS][cC][_-]*.cur_grant_bytes | grep "[0-9]"
- dd if=/dev/zero of=$DIR/f42a bs=1024 count=100
+ dd if=/dev/zero of=$DIR/f42a bs=$PAGE_SIZE count=2
And looking into the client end grant calculation code, it likely related to this part code: 830 int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext, 831 int sent, int rc) 832 { ... 865 } else if (blocksize < PAGE_SIZE && 866 last_count != PAGE_SIZE) { 867 /* For short writes we shouldn't count parts of pages that 868 * span a whole chunk on the OST side, or our accounting goes 869 * wrong. Should match the code in filter_grant_check. */ 870 int offset = last_off & ~PAGE_MASK; 871 int count = last_count + (offset & (blocksize - 1)); 872 int end = (offset + last_count) & (blocksize - 1); 873 if (end) 874 count += blocksize - end; 875 876 lost_grant = PAGE_SIZE - count; 877 } 878 if (ext->oe_grants > 0) 879 osc_free_grant(cli, nr_pages, lost_grant, ext->oe_grants); This part code will never go into on x86_64 as page size and block size are always 4k. But this part code looks doesn't match the sever end related code of tgt_grant_check() on Arm RHEL/CentOS, the blocksize seems always 4k, but page size is 64k on Arm RHEL/CentOS: 645 static inline u64 tgt_grant_rnb_size(struct obd_export *exp, 646 struct lu_target *lut, 647 struct niobuf_remote *rnb) 648 { ... 659 660 /* The network buffer might span several blocks, align it on block 661 * boundaries */ 662 bytes = rnb->rnb_offset & (blksize - 1); 663 bytes += rnb->rnb_len; 664 end = bytes & (blksize - 1); 665 if (end) 666 bytes += blksize - end;
But I have no idea how to fix now. Please give me some help, thanks. |
| Comment by Andreas Dilger [ 30/Aug/21 ] |
|
xinliang, thanks for looking into this. Strictly speaking, the grant is related to the amount of space consumed in the back-end filesystem, which is usually in 4KB multiples (blocksize). The reason that PAGE_SIZE is involved is because the client VM cannot track cached dirty data of less than a full page. As a result, the 64KB clients will need to send 64KB writes, even though only a byte of data was modified in the page. This will consume more of the client grant to dirty the page, but if there is still free space in the OST it will receive replacement grant in the RPC reply.
The filter_grant_check() function doesn't exist anymore, and was replaced by tgt_grant_check(), as you mentioned. If you are updating this code, please also fix this comment. It looks like the server code is more focussed on handling the case of request size < blocksize (ZFS), compared to request size > blocksize (the common case). Where this might be going wrong is writes at the end of the file, since osc_extent_finish() is not consuming a whole page worth of grant if the end of file is 4KB * N less than the full PAGE_SIZE? |
| Comment by Xinliang Liu [ 13/Sep/21 ] |
|
Hi Andreas Dilger, thanks for the explanation about grant, will update the comment when send the fixed patch. After looking into the code and git log, I found that starting from this commit (bd1e41672c 830 int osc_extent_finish(const struct lu_env *env, struct osc_extent *ext, 831 int sent, int rc) 832 { ... 865 } else if (blocksize < PAGE_SIZE && 866 last_count != PAGE_SIZE) { 867 /* For short writes we shouldn't count parts of pages that 868 * span a whole chunk on the OST side, or our accounting goes 869 * wrong. Should match the code in filter_grant_check. */ 870 int offset = last_off & ~PAGE_MASK; 871 int count = last_count + (offset & (blocksize - 1)); 872 int end = (offset + last_count) & (blocksize - 1); 873 if (end) 874 count += blocksize - end; 875 876 lost_grant = PAGE_SIZE - count; 877 } 878 if (ext->oe_grants > 0) 879 osc_free_grant(cli, nr_pages, lost_grant, ext->oe_grants); begins to mismatch the code of OST tgt_grant_check 708 static void tgt_grant_check(const struct lu_env *env, struct obd_export *exp, 709 struct obdo *oa, struct niobuf_remote *rnb, 710 int niocount, u64 *left) 711 { ... 716 unsigned long ungranted = 0; 717 unsigned long granted = 0; ... 740 } else if (exp_grant_param_supp(exp) && oa->o_grant_used > 0) { 741 /* Client supports the new grant parameters and is telling us 742 * how much grant space it consumed for this bulk write. 743 * Although all rnbs are supposed to have the OBD_BRW_FROM_GRANT 744 * flag set, we will scan the rnb list and looks for non-cache 745 * I/O in case it changes in the future */ 746 if (ted->ted_grant >= oa->o_grant_used) { 747 /* skip grant accounting for rnbs with 748 * OBD_BRW_FROM_GRANT and just used grant consumption 749 * claimed in the request */ 750 granted = oa->o_grant_used; 751 skip = true; 752 } else { ... 773 } 774 775 for (i = 0; i < niocount; i++) { 776 int bytes; 777 778 if ((rnb[i].rnb_flags & OBD_BRW_FROM_GRANT)) { 779 if (skip) { 780 rnb[i].rnb_flags |= OBD_BRW_GRANTED; 781 continue; 782 } ... 829 } 830 831 /* record in o_grant_used the actual space reserved for the I/O, will be 832 * used later in tgt_grant_commmit() */ 833 oa->o_grant_used = granted + ungranted; ... 872 }
So when client PAGE_SIZE is 64KB bigger than backend file system block size 4KB (Ldiskfs), The matched thing should be: If both server and client support OBD_CONNECT_GRANT_PARAM, the lost_grant in osc_extent_finish should be 0, because server will use grant consumption claimed by the client. And note that, in this case, lost_grant = (client claim o_grant_used) - (OST actual+ o_grant_used) = 0. |
| Comment by Gerrit Updater [ 18/Jan/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/40758/ |