[LU-13222] sanity test 77k fails with 'test_77k returned 3' Created: 07/Feb/20 Updated: 07/Feb/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | ppc | ||
| Environment: | PPC clients |
| Severity: | 3 |
| Description |
|
sanity test 77k fails with 'test_77k returned 3' 100% of the time in PPC client testing. Looking at a recent failure at https://testing.whamcloud.com/test_sets/84995f3c-492f-11ea-b69a-52540065bddc, the suite_log has no explicit errors, but it looks like the checksums parameter is either never updated or not updated within the 90-second wait:

remount client, checksum should be 0
CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
CMD: trevis-77vm3.trevis.whamcloud.com lsof -t /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com umount /mnt/lustre 2>&1
Starting client: trevis-77vm3.trevis.whamcloud.com: -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
CMD: trevis-37vm12 /usr/sbin/lctl set_param -P osc.lustre*.checksums=1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Waiting 90 secs for update
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
…
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Update not seen after 90s: wanted '1' got '0'
remount client, checksum should be 1

Then later, we see a similar issue.
It looks like the checksums parameter is now 1, but we want to change it to 0:

remount client with option nochecksum, checksum should be 0
10.9.3.144@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
…
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Update not seen after 90s: wanted '0' got '1'
test_77k returned 3
FAIL 77k (188s)

Logs for some of the sanity test 77k failures are at |