[LU-8135] sanity test_101g fails with 'not all RPCs are 16 MiB BRW rpcs' Created: 12/May/16 Updated: 06/Nov/18 Resolved: 21/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: |
autotest review-zfs |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity test 101g fails with 'not all RPCs are 16 MiB BRW rpcs'. In the test log on the client, we can see the 'File too large' error returned when dd tries to write to the file system:

== sanity test 101g: Big bulk(4/16 MiB) readahead ==================================================== 11:11:42 (1462903902)
CMD: onyx-50vm8 /usr/sbin/lctl get_param obdfilter.*.brw_size | while read s; do echo ost1 \$s; done
CMD: onyx-50vm8 /usr/sbin/lctl get_param obdfilter.*.brw_size | while read s; do echo ost2 \$s; done
CMD: onyx-50vm8 /usr/sbin/lctl set_param -n obdfilter.lustre-OST*.brw_size=16M osd-*.lustre-OST*.brw_size=16M 2>&1
remount client to enable large RPC size
CMD: onyx-50vm1.onyx.hpdd.intel.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client onyx-50vm1.onyx.hpdd.intel.com /mnt/lustre (opts:)
CMD: onyx-50vm1.onyx.hpdd.intel.com lsof -t /mnt/lustre
CMD: onyx-50vm1.onyx.hpdd.intel.com umount /mnt/lustre 2>&1
Starting client: onyx-50vm1.onyx.hpdd.intel.com: -o user_xattr,flock onyx-50vm7@tcp:/lustre /mnt/lustre
CMD: onyx-50vm1.onyx.hpdd.intel.com mkdir -p /mnt/lustre
CMD: onyx-50vm1.onyx.hpdd.intel.com mount -t lustre -o user_xattr,flock onyx-50vm7@tcp:/lustre /mnt/lustre
dd: error writing '/mnt/lustre/f101g.sanity': File too large
2+0 records in
1+0 records out
16777216 bytes (17 MB) copied, 0.978339 s, 17.1 MB/s
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00599718 s, 0.0 kB/s
0 RPCs
 sanity test_101g: @@@@@@ FAIL: not all RPCs are 16 MiB BRW rpcs

We've had three occurrences of this failure since the test was introduced by http://review.whamcloud.com/19368 a little over a week ago and, so far, it has only been seen on review-zfs-part-1:
2016-05-09 - https://testing.hpdd.intel.com/test_sets/145be8f2-1661-11e6-b5f1-5254006e85c2 |
| Comments |
| Comment by Peter Jones [ 12/May/16 ] |
|
Gu Zheng, could you please advise on this failure? Thanks. Peter |
| Comment by Li Xi (Inactive) [ 13/May/16 ] |
|
It seems the following code returns -EFBIG:

	LASSERT(obj->oo_sa_hdl != NULL);
	LASSERT(oh->ot_tx != NULL);
	dmu_tx_hold_sa(oh->ot_tx, obj->oo_sa_hdl, 0);
	if (oh->ot_tx->tx_err != 0)
		GOTO(out, rc = -oh->ot_tx->tx_err);

00080000:00000001:0.0:1462903904.567380:0:13254:0:(osd_object.c:861:osd_declare_attr_set()) Process entered
00080000:00000001:0.0:1462903904.567381:0:13254:0:(osd_object.c:877:osd_declare_attr_set()) Process leaving via out (rc=18446744073709551589 : -27 : 0xffffffffffffffe5)
00080000:00000001:0.0:1462903904.567382:0:13254:0:(osd_object.c:918:osd_declare_attr_set()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
00002000:00000001:0.0:1462903904.567383:0:13254:0:(ofd_io.c:1079:ofd_commitrw_write()) Process leaving via out_stop (rc=18446744073709551589 : -27 : 0xffffffffffffffe5)
00080000:00000001:0.0:1462903904.567384:0:13254:0:(osd_handler.c:282:osd_trans_stop()) Process entered
00040000:00000001:0.0:1462903904.567386:0:13254:0:(qsd_handler.c:1073:qsd_op_end()) Process entered
00040000:00000001:0.0:1462903904.567387:0:13254:0:(qsd_handler.c:1101:qsd_op_end()) Process leaving
00080000:00000010:0.0:1462903904.567387:0:13254:0:(osd_handler.c:296:osd_trans_stop()) kfreed 'oh': 384 at ffff880021666a00.
00080000:00000001:0.0:1462903904.567388:0:13254:0:(osd_handler.c:297:osd_trans_stop()) Process leaving (rc=0 : 0 : 0)
00000020:00000001:0.0:1462903904.567949:0:13254:0:(lustre_fid.h:740:fid_flatten32()) Process leaving (rc=252707909 : 252707909 : f100445)
00000020:00000001:0.0:1462903904.567951:0:13254:0:(lustre_fid.h:740:fid_flatten32()) Process leaving (rc=252707909 : 252707909 : f100445)
00000020:00000002:0.0:1462903904.567952:0:13254:0:(lu_object.c:162:lu_object_put()) Add ffff8800461c7178 to site lru. hash: ffff8800494dc900, bkt: ffff88004ced5118, lru_len: 2
00002000:00000001:0.0:1462903904.567954:0:13254:0:(ofd_grant.c:1268:ofd_grant_commit()) Process entered
00002000:00000001:0.0:1462903904.567955:0:13254:0:(ofd_grant.c:1323:ofd_grant_commit()) Process leaving
00002000:00000001:0.0:1462903904.567956:0:13254:0:(ofd_io.c:1138:ofd_commitrw_write()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
00002000:00000001:0.0:1462903904.567957:0:13254:0:(ofd_io.c:1255:ofd_commitrw()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5)
00000020:00000001:0.0:1462903904.567958:0:13254:0:(obd_class.h:1129:obd_commitrw()) Process leaving (rc=18446744073709551589 : -27 : ffffffffffffffe5) |
| Comment by Jinshan Xiong (Inactive) [ 13/May/16 ] |
|
Alex may provide some feedback on this issue. |
| Comment by Alex Zhuravlev [ 16/May/16 ] |
|
this is because of the limit on a single tx:

	if (txh->txh_space_towrite + txh->txh_space_tooverwrite > 2 * DMU_MAX_ACCESS)
		err = SET_ERROR(EFBIG);

	#define DMU_MAX_ACCESS (64 * 1024 * 1024) /* 64MB */

ZFS is known for huge overestimation. I guess we hit this with large writes. |
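A minimal Python model of the single-tx check quoted above may make the failure mode concrete (this is an illustrative sketch, not the ZFS source; the real check lives in ZFS's dmu_tx code, and the byte counts below are made-up example estimates):

```python
# Illustrative model of the ZFS single-transaction size limit.
# A reservation estimate above 2 * DMU_MAX_ACCESS fails with EFBIG,
# which the client ultimately sees as dd's "File too large".
EFBIG = 27
DMU_MAX_ACCESS = 64 * 1024 * 1024  # 64 MB, as in the quoted #define

def tx_space_check(space_towrite, space_tooverwrite):
    """Return -EFBIG when the estimated tx reservation exceeds the limit."""
    if space_towrite + space_tooverwrite > 2 * DMU_MAX_ACCESS:
        return -EFBIG
    return 0

print(tx_space_check(100 << 20, 0))         # 100 MB estimate: 0 (ok)
print(tx_space_check(100 << 20, 40 << 20))  # 140 MB estimate: -27 (EFBIG)
```

The key point is that the check is against ZFS's (often heavily inflated) space *estimate*, not the actual bytes written.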
| Comment by Jinshan Xiong (Inactive) [ 16/May/16 ] |
|
I remember that Matt once mentioned it's fairly safe to reduce spa_asize_inflation, which defaults to 24, to a more reasonable value; we could then check whether that fixes this problem. spa_asize_inflation is a module parameter that can be set on the fly through '/sys/module/zfs/parameters/spa_asize_inflation'. |
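A small sketch of setting such a module parameter on the fly (the path argument is kept generic so the helper can be exercised anywhere; the value 6 below is a hypothetical example, not a recommendation from this ticket):

```python
# Hedged sketch: read/write a sysfs module parameter on the fly.
# On a live system the target would be
# /sys/module/zfs/parameters/spa_asize_inflation (requires root and
# the zfs module loaded); here the path is a plain function argument.
def set_module_param(path, value):
    with open(path, "w") as f:
        f.write(str(value))

def get_module_param(path):
    with open(path) as f:
        return int(f.read().strip())

# Example (hypothetical value, run as root):
# set_module_param("/sys/module/zfs/parameters/spa_asize_inflation", 6)
```

Equivalently, from a shell: `echo <value> > /sys/module/zfs/parameters/spa_asize_inflation`.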
| Comment by Jinshan Xiong (Inactive) [ 06/Sep/16 ] |
|
After taking a further look, I realized that this ticket reveals a severe problem in ZFS space reservation, especially when a large block size is used. In the current client implementation, the client can send discontiguous pages in the same RPC for optimal throughput. However, if the pages in one RPC land in different blocks in the underlying ZFS, it will need to reserve a humongous amount of space in the transaction. For ZFS with a 1MB block size and a 16MB RPC size, it would need to reserve 4GB of space in one trans in the worst case. In comparison, current ZFS reports an error when a trans size is over 64MB. This potential issue exists even with a 1MB RPC size and a 1MB block size in ZFS. I'm going to create a test case to demonstrate the problem. To fix this problem, I would propose a new parameter sent from the OFD to the client that limits the maximum number of chunks in the same RPC. |
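The worst-case arithmetic behind those numbers can be checked directly (a sketch assuming 4 KiB pages, where every page in the RPC dirties a separate ZFS block and is therefore charged a whole-block overwrite):

```python
# Worst-case tx reservation: each 4 KiB page in an RPC lands in a
# different 1 MiB ZFS block, so each page is charged a full block.
PAGE  = 4 * 1024          # 4 KiB page
BLOCK = 1 * 1024 * 1024   # 1 MiB ZFS block size
RPC   = 16 * 1024 * 1024  # 16 MiB BRW RPC

pages = RPC // PAGE            # 4096 pages per 16 MiB RPC
worst_case = pages * BLOCK     # 4096 blocks * 1 MiB = 4 GiB
print(worst_case // (1024 ** 3))  # -> 4 (GiB), far past the 64 MB tx limit

# Even a 1 MiB RPC of scattered pages can be charged 256 blocks:
small_rpc_worst = (1024 * 1024 // PAGE) * BLOCK
print(small_rpc_worst // (1024 ** 2))  # -> 256 (MiB)
```

This is why capping the number of discontiguous chunks per RPC bounds the reservation regardless of RPC size.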
| Comment by Patrick Farrell (Inactive) [ 07/Sep/16 ] |
|
Jinshan - since you asked me to take a look... Nasty problem; an interesting side effect of the ZFS design. Your solution sounds good to me: it should solve the problem while still letting the more common case (one or just a few chunks) benefit from the large RPC size. I'll be curious to see the exact patch - I'm not sure how the server would communicate this limit to the client. |
| Comment by Gerrit Updater [ 07/Sep/16 ] |
|
Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: http://review.whamcloud.com/22369 |
| Comment by Gerrit Updater [ 21/Sep/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22369/ |
| Comment by Peter Jones [ 21/Sep/16 ] |
|
Landed for 2.9 |