Details
Type: Bug
Resolution: Unresolved
Priority: Minor
None
Lustre 2.14.0
Environment: PPC clients
Severity: 3
Description
sanity test 77k fails 100% of the time in PPC client testing with 'test_77k returned 3'. In a recent failure at https://testing.whamcloud.com/test_sets/84995f3c-492f-11ea-b69a-52540065bddc, the suite_log has no explicit errors, but it looks like the checksums parameter is either not updated at all or not updated within the 90-second wait:
remount client, checksum should be 0
CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
CMD: trevis-77vm3.trevis.whamcloud.com lsof -t /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com umount /mnt/lustre 2>&1
Starting client: trevis-77vm3.trevis.whamcloud.com: -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
CMD: trevis-37vm12 /usr/sbin/lctl set_param -P osc.lustre*.checksums=1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Waiting 90 secs for update
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
…
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Update not seen after 90s: wanted '1' got '0'
remount client, checksum should be 1
Later in the test we see a similar issue in the opposite direction: checksums is now 1, but the test wants to change it to 0:
remount client with option nochecksum, checksum should be 0
10.9.3.144@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
…
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Update not seen after 90s: wanted '0' got '1'
test_77k returned 3
FAIL 77k (188s)
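Both excerpts end the same way: after the parameter change, the client re-reads osc.lustre*.checksums repeatedly and gives up with "Update not seen after 90s". The bash sketch below illustrates that kind of poll-with-timeout check so the failure condition is explicit: the value read on the client never matched the expected value within the wait window. The function name poll_until, its one-second interval, and its return codes are hypothetical; the messages only mimic the log lines above, and this is not the test framework's actual helper.

```bash
#!/usr/bin/env bash

# Hypothetical helper: poll_until EXPECTED MAX_SECS CMD [ARGS...]
# Runs CMD once per second until its output equals EXPECTED,
# or until MAX_SECS seconds have been waited. Returns 0 on match, 1 on timeout.
poll_until() {
    local expected=$1 max=$2
    shift 2
    local waited=0 result

    while true; do
        # Capture the current value (trailing newline is stripped)
        result=$("$@")
        if [ "$result" = "$expected" ]; then
            echo "Updated after ${waited}s: wanted '$expected' got '$result'"
            return 0
        fi
        if [ "$waited" -ge "$max" ]; then
            # Same shape of message as the failures in the logs above
            echo "Update not seen after ${max}s: wanted '$expected' got '$result'"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

On a Lustre client, a manual check equivalent to the first failure would be something like `poll_until 1 90 sh -c "lctl get_param -n 'osc.lustre*.checksums' | head -n1"` after running the `set_param -P` command on the server; quoting the pattern leaves the wildcard matching to lctl rather than the shell.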
Logs for some of the other sanity test 77k failures are at:
https://testing.whamcloud.com/test_sets/938aedee-33fe-11ea-bb75-52540065bddc
https://testing.whamcloud.com/test_sets/6683367e-e757-11e9-b62b-52540065bddc