[LU-5361] sanity test_77i: algo set to crc32 instead of adler Created: 16/Jul/14 Updated: 22/May/18 Resolved: 14/Mar/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 14946 | ||||||||
| Description |
|
This issue was created by maloo for John Hammond <john.hammond@intel.com> This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/307abe62-0cd8-11e4-8ce3-5254006e85c2. The sub-test test_77i failed with the following error:
Info required for matching: sanity 77i |
| Comments |
| Comment by John Hammond [ 16/Jul/14 ] |
|
Seems like a bogus test to me. There's no guarantee that adler will be faster than crc32 all the time. |
| Comment by James A Simmons [ 16/Jul/14 ] |
|
Completely agree. On systems with hardware accelerated crc32 it will beat alder32 hands down. |
| Comment by John Hammond [ 16/Jul/14 ] |
|
Note also that we have recently changed the crypto speed test code. See LU-5063 and |
| Comment by Andreas Dilger [ 16/Jul/14 ] |
|
When the test was originally written, the "adler" algorithm was always available since the availability had nothing to do with performance. The change to make only a subset of algorithms available to the client was done in 2.2.58 so that clients would not choose the locally-fastest algorithm at the expense of selecting a slow algorithm on the server. That said, this test should be exercising the client checksum selection if OBD_CONNECT_CKSUM is not available, in which case it should select adler: test_77i() { # bug 13805
#define OBD_FAIL_OSC_CONNECT_CKSUM 0x40b
lctl set_param fail_loc=0x40b
remount_client $MOUNT
lctl set_param fail_loc=0
for VALUE in `lctl get_param osc.*osc-[^mM]*.checksum_type`; do
PARAM=`echo ${VALUE[0]} | cut -d "=" -f1`
algo=`lctl get_param -n $PARAM | sed 's/.*\[\(.*\)\].*/\1/g'`
[ "$algo" = "adler" ] || error "algo set to $algo instead of adler"
done
remount_client $MOUNT
}
run_test 77i "client not supporting OSD_CONNECT_CKSUM"
if (!OBD_FAIL_CHECK(OBD_FAIL_OSC_CONNECT_CKSUM)) { /* OBD_CONNECT_CKSUM should always be set, even if checksums are * disabled by default, because it can still be enabled on the * fly via /proc. As a consequence, we still need to come to an * agreement on the supported algorithms at connect time */ data->ocd_connect_flags |= OBD_CONNECT_CKSUM; if (OBD_FAIL_CHECK(OBD_FAIL_OSC_CKSUM_ADLER_ONLY)) data->ocd_cksum_types = OBD_CKSUM_ADLER; else data->ocd_cksum_types = cksum_types_supported_client(); } if (ocd->ocd_connect_flags & OBD_CONNECT_CKSUM) { /* We sent to the server ocd_cksum_types with bits set * for algorithms we understand. The server masked off * the checksum types it doesn't support */ : : } else { /* The server does not support OBD_CONNECT_CKSUM. * Enforce ADLER for backward compatibility*/ cli->cl_supp_cksum_types = OBD_CKSUM_ADLER; } though the old comments and debug message in ofd_parse_connect_data() would have you think otherwise (they were not updated as part of http://review.whamcloud.com/3098 when the fallback was changed from crc32 to adler), but that should not actually have any effect on operation: } else { /* This client does not support OBD_CONNECT_CKSUM * fall back to CRC32 */ CDEBUG(D_RPCTRACE, "%s: cli %s does not support " "OBD_CONNECT_CKSUM, CRC32 will be used\n", exp->exp_obd->obd_name, obd_export_nid2str(exp)); } So this test shouldn't have anything to do with the relative speed of the algorithms. It may be that the test is racy and the fail_loc is cleared before all of the OSC connections have been made? It might make sense to move the fail_loc=0 line to just before the second remount to avoid this problem. |
| Comment by Andreas Dilger [ 16/Jul/14 ] |
|
When fixing this, the test_77i() code could be cleaned up a bit. It uses cut -d= when it could be using lctl get_param -N to just print the parameter names. It should also use $(...) for subshells instead of `...`, and it should print out the OSC device on which the failure(s) happened. Actually, the whole check could be simplified to: lctl get_param osc.*osc-[^mM]*.checksum_type | grep -v '\[adler\]' && error "checksum algorithm not set to adler" |
| Comment by Andreas Dilger [ 16/Jul/14 ] |
|
Note also that this test failure appears to have been around a long time and is unlikely to be (directly) related to the checksum algorithm changes.
I also see several cases of this test failing if there is an error reading from /proc, but I think those predate the 6205 patch, which prints out the actual checksum algorithm used and doesn't just fail if '[adler]' is not found. |
| Comment by James Nunez (Inactive) [ 16/Jul/15 ] |
|
Another instance of this error: |
| Comment by nasf (Inactive) [ 16/Oct/15 ] |
|
Another failure instance on master: |
| Comment by Bruno Faccini (Inactive) [ 19/Oct/15 ] |
|
+1 again on master : https://testing.hpdd.intel.com/test_sets/aa999490-764d-11e5-86b7-5254006e85c2 |
| Comment by James Nunez (Inactive) [ 30/Oct/15 ] |
|
Another failure on master at https://testing.hpdd.intel.com/test_sets/1654529c-7e46-11e5-965a-5254006e85c2 |
| Comment by Emoly Liu [ 02/Nov/15 ] |
|
Another failure at https://testing.hpdd.intel.com/test_sets/1654529c-7e46-11e5-965a-5254006e85c2 |
| Comment by Andreas Dilger [ 28/Jun/16 ] |
|
The algorithm used depends on the CPU hardware support. It may be that CRC32 is a lot faster than Adler, so Adler is not available. That probably needs a fix to the test. |
| Comment by Gerrit Updater [ 08/Nov/16 ] |
|
Ashish Purkar (ashish.purkar@seagate.com) uploaded a new patch: http://review.whamcloud.com/23644 |
| Comment by Jian Yu [ 05/Dec/16 ] |
|
One more failure instance on master branch: https://testing.hpdd.intel.com/test_sets/a136d632-b36a-11e6-a0b8-5254006e85c2 |
| Comment by Gerrit Updater [ 03/Mar/17 ] |
|
Abrarahmed Momin (kais_abrar@yahoo.co.in) uploaded a new patch: https://review.whamcloud.com/25745 |
| Comment by Gerrit Updater [ 14/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23644/ |
| Comment by Peter Jones [ 14/Mar/17 ] |
|
Landed for 2.10 |