Lustre / LU-13222

sanity test 77k fails with 'test_77k returned 3'


Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.14.0
    • PPC clients
    • 3

    Description

      sanity test 77k fails with 'test_77k returned 3' 100% of the time in PPC client testing. In a recent failure at https://testing.whamcloud.com/test_sets/84995f3c-492f-11ea-b69a-52540065bddc, the suite_log has no explicit errors, but the client's osc checksums parameter appears either not to be updated at all or not to be updated within the 90-second wait:

      remount client, checksum should be 0
      CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
      Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
      CMD: trevis-77vm3.trevis.whamcloud.com lsof -t /mnt/lustre
      CMD: trevis-77vm3.trevis.whamcloud.com umount  /mnt/lustre 2>&1
      Starting client: trevis-77vm3.trevis.whamcloud.com:  -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
      CMD: trevis-77vm3.trevis.whamcloud.com mkdir -p /mnt/lustre
      CMD: trevis-77vm3.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-37vm12@tcp:/lustre /mnt/lustre
      CMD: trevis-37vm12 /usr/sbin/lctl set_param -P osc.lustre*.checksums=1
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      Waiting 90 secs for update
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      …
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      Update not seen after 90s: wanted '1' got '0'
      remount client, checksum should be 1
      

      Later in the test we see a similar issue in the other direction: checksums is now 1, but the test expects it to change to 0 after remounting with the nochecksum option:

      remount client with option nochecksum, checksum should be 0
      10.9.3.144@tcp:/lustre /mnt/lustre lustre rw,flock,user_xattr,lazystatfs 0 0
      CMD: trevis-77vm3.trevis.whamcloud.com grep -c /mnt/lustre' ' /proc/mounts
      Stopping client trevis-77vm3.trevis.whamcloud.com /mnt/lustre (opts:)
      …
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      CMD: trevis-77vm3.trevis.whamcloud.com /usr/sbin/lctl get_param -n osc.lustre*.checksums | head -n1
      Update not seen after 90s: wanted '0' got '1'
      test_77k returned 3
      FAIL 77k (188s)
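      Both failures come from the test polling the client parameter until it matches the expected value or the 90-second limit expires. The Python sketch below models that polling loop under stated assumptions; it is not the Lustre test-framework code (which is shell), and the names `wait_update` and `read_checksums` and the 1-second poll interval are hypothetical choices for illustration. Only the `lctl get_param -n osc.lustre*.checksums` command is taken from the log above.

```python
import subprocess
import time
from typing import Callable, Tuple


def read_checksums() -> str:
    """Hypothetical helper mirroring:
    lctl get_param -n osc.lustre*.checksums | head -n1
    lctl expands the osc.lustre* pattern itself, so no shell is needed."""
    out = subprocess.run(
        ["lctl", "get_param", "-n", "osc.lustre*.checksums"],
        capture_output=True, text=True, check=True,
    ).stdout
    lines = out.splitlines()
    return lines[0] if lines else ""  # keep only the first OSC's value


def wait_update(get_value: Callable[[], str], wanted: str,
                max_wait: float = 90.0, interval: float = 1.0,
                sleep: Callable[[float], None] = time.sleep,
                now: Callable[[], float] = time.monotonic) -> Tuple[bool, str]:
    """Poll get_value() until it equals `wanted` or `max_wait` seconds elapse.

    Returns (True, value) on success, or (False, last_value_seen) on timeout,
    the case reported as "Update not seen after 90s: wanted 'X' got 'Y'".
    The sleep/now parameters are injectable so the loop can be tested
    without real delays.
    """
    start = now()
    value = get_value().strip()
    while value != wanted:
        if now() - start >= max_wait:
            return False, value  # timed out; report the last value seen
        sleep(interval)
        value = get_value().strip()
    return True, value
```

      On a client, `wait_update(read_checksums, "1")` in this model would end with `(False, "0")` for the first failure above and `wait_update(read_checksums, "0")` with `(False, "1")` for the second, i.e. the value never changes during the whole wait rather than changing late.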
      

      Logs for other sanity test 77k failures are at:
      https://testing.whamcloud.com/test_sets/938aedee-33fe-11ea-bb75-52540065bddc
      https://testing.whamcloud.com/test_sets/6683367e-e757-11e9-b62b-52540065bddc


          People

            wc-triage WC Triage
            jamesanunez James Nunez (Inactive)