[LU-13222] sanity test 77k fails with 'test_77k returned 3' Created: 07/Feb/20  Updated: 07/Feb/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: ppc
Environment:

PPC clients


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test 77k fails with 'test_77k returned 3' 100% of the time in PPC client testing. Looking at a recent failure at https://testing.whamcloud.com/test_sets/84995f3c-492f-11ea-b69a-52540065bddc, the suite_log doesn’t have explicit errors, but it looks like the checksums parameter is either not being updated at all or is updated only after the 90-second wait has expired

remount client, checksum should be 0
CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
CMD: trevis-77vm3.trevis.whamcloud.com lsof -t /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com umount  /mnt/lustre 2>&1
Starting client: trevis-77vm3.trevis.whamcloud.com:  -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com mkdir -p /mnt/lustre
CMD: trevis-77vm3.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
CMD: trevis-37vm12 /usr/sbin/lctl set_param -P osc.lustre*.checksums=1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Waiting 90 secs for update
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
…
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Update not seen after 90s: wanted '1' got '0'
remount client, checksum should be 1

Later, we see a similar issue in the opposite direction: the checksums parameter now appears to be 1, but the test expects it to change to 0

remount client with option nochecksum, checksum should be 0
10.9.3.144@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
…
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
Update not seen after 90s: wanted '0' got '1'
test_77k returned 3
FAIL 77k (188s)
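
Both excerpts show the same pattern: the test reads osc.lustre*.checksums repeatedly and gives up after 90 seconds. A minimal sketch of such a polling loop follows, for anyone reproducing the check by hand on a PPC client. The function name poll_for_value and its arguments are hypothetical and are not taken from test-framework.sh; the return value 3 is chosen only to mirror the 'returned 3' in the log and is an assumption about where that code originates.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual Lustre test-framework code:
# run a command repeatedly until its output equals the wanted value
# or the timeout expires.
poll_for_value() {
    local cmd=$1             # command whose output is checked
    local want=$2            # expected output
    local max_wait=${3:-90}  # seconds before giving up
    local interval=${4:-1}   # seconds between attempts (use a positive value)
    local waited=0
    local got

    while true; do
        got=$(eval "$cmd")
        if [ "$got" = "$want" ]; then
            echo "Updated after ${waited}s: wanted '$want' got '$got'"
            return 0
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            echo "Update not seen after ${max_wait}s: wanted '$want' got '$got'"
            return 3         # assumed to correspond to 'test_77k returned 3'
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

On a client this could be invoked as poll_for_value "lctl get_param -n osc.lustre*.checksums | head -n1" 1 90. Raising the third argument well beyond 90 would help distinguish a parameter that is never updated from one that is updated late.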

Logs for some of the sanity test 77k failures are at
https://testing.whamcloud.com/test_sets/938aedee-33fe-11ea-bb75-52540065bddc
https://testing.whamcloud.com/test_sets/6683367e-e757-11e9-b62b-52540065bddc


Generated at Sat Feb 10 02:59:26 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.