Lustre / LU-12865

sanity test 160f fails with ‘mds1: User cl6 not registered’

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.14.0, Lustre 2.12.6
    • Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4, Lustre 2.12.5
    • None
    • 3

    Description

      sanity test_160f fails with ‘mds1: User cl6 not registered’. So far this year, there have been 52 sanity test_160f failures with this error; 36 of those failures were on ARM clients.

      Looking at the suite_log for a recent failure, https://testing.whamcloud.com/test_sets/8bedad40-ebd5-11e9-b62b-52540065bddc, we see that user cl6 is registered and that the changelog can be cleared for that user before the error occurs:

      == sanity test 160f: changelog garbage collect (timestamped users) =================================== 20:27:47 (1570566467)
      CMD: trevis-49vm2 /usr/sbin/lctl get_param mdd.lustre-MDT0000.changelog_mask -n
      CMD: trevis-49vm2 /usr/sbin/lctl set_param mdd.lustre-MDT0000.changelog_mask=+hsm
      mdd.lustre-MDT0000.changelog_mask=+hsm
      CMD: trevis-49vm2 /usr/sbin/lctl --device lustre-MDT0000 changelog_register -n
      CMD: trevis-49vm3 /usr/sbin/lctl get_param mdd.lustre-MDT0001.changelog_mask -n
      CMD: trevis-49vm3 /usr/sbin/lctl set_param mdd.lustre-MDT0001.changelog_mask=+hsm
      mdd.lustre-MDT0001.changelog_mask=+hsm
      CMD: trevis-49vm3 /usr/sbin/lctl --device lustre-MDT0001 changelog_register -n
      CMD: trevis-49vm2 /usr/sbin/lctl get_param mdd.lustre-MDT0002.changelog_mask -n
      CMD: trevis-49vm2 /usr/sbin/lctl set_param mdd.lustre-MDT0002.changelog_mask=+hsm
      mdd.lustre-MDT0002.changelog_mask=+hsm
      CMD: trevis-49vm2 /usr/sbin/lctl --device lustre-MDT0002 changelog_register -n
      CMD: trevis-49vm3 /usr/sbin/lctl get_param mdd.lustre-MDT0003.changelog_mask -n
      CMD: trevis-49vm3 /usr/sbin/lctl set_param mdd.lustre-MDT0003.changelog_mask=+hsm
      mdd.lustre-MDT0003.changelog_mask=+hsm
      CMD: trevis-49vm3 /usr/sbin/lctl --device lustre-MDT0003 changelog_register -n
      Registered 4 changelog users: 'cl6 cl6 cl6 cl6'
      …
      mds1: verifying user cl6 clear:  19 + 2 == 21
      CMD: trevis-49vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users
      lustre-MDT0001: clear the changelog for cl6 to record #10
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users
      mds2: verifying user cl6 clear:  8 + 2 == 10
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0001.changelog_users
      CMD: trevis-49vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.changelog_users
      CMD: trevis-49vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.changelog_users
      lustre-MDT0002: clear the changelog for cl6 to record #2
      CMD: trevis-49vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.changelog_users
      mds3: verifying user cl6 clear:  0 + 2 == 2
      CMD: trevis-49vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0002.changelog_users
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users
      lustre-MDT0003: clear the changelog for cl6 to record #2
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users
      mds4: verifying user cl6 clear:  0 + 2 == 2
      CMD: trevis-49vm3 /usr/sbin/lctl get_param -n mdd.lustre-MDT0003.changelog_users
      total: 8 create in 0.02 seconds: 453.39 ops/second
      CMD: trevis-49vm2 ps -e -o comm= | grep chlg_gc_thread
      pdsh@trevis-79vm17: trevis-49vm2: ssh exited with exit code 1
      CMD: trevis-49vm2 ps -e -o comm= | grep chlg_gc_thread
      pdsh@trevis-79vm17: trevis-49vm2: ssh exited with exit code 1
      CMD: trevis-49vm3 ps -e -o comm= | grep chlg_gc_thread
      pdsh@trevis-79vm17: trevis-49vm3: ssh exited with exit code 1
      CMD: trevis-49vm3 ps -e -o comm= | grep chlg_gc_thread
      pdsh@trevis-79vm17: trevis-49vm3: ssh exited with exit code 1
      CMD: trevis-49vm2 /usr/sbin/lctl get_param -n mdd.lustre-MDT0000.changelog_users
       sanity test_160f: @@@@@@ FAIL: mds1: User cl6 not registered 
      

      There is no indication of a problem in any of the console logs.
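
The final check in the log can be repeated by hand. The following is a minimal sketch, not the code from sanity.sh. It assumes that `lctl get_param -n mdd.<MDT>.changelog_users` prints one registered user per line with the user ID (for example cl6) in the first column. The function name `changelog_user_registered` and the sample text are hypothetical and are used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the user ID given as $1 appears in the
# first column of changelog_users text read from stdin.
# Assumption: one registered user per line, with the ID in the first field.
changelog_user_registered() {
    local user="$1"
    # Exact match on the first field, so "cl6" does not match "cl60".
    awk -v u="$user" '$1 == u { found = 1 } END { exit !found }'
}

# Made-up sample input for illustration. On a real MDS the text would come from
#   lctl get_param -n mdd.lustre-MDT0000.changelog_users
sample='current index: 21
ID    index (idle seconds)
cl6   21 (3)'

if printf '%s\n' "$sample" | changelog_user_registered cl6; then
    echo "cl6 registered"
else
    echo "cl6 not registered"
fi
```

Running the same check against the live parameter immediately after the failure would show whether the user entry is really gone from MDT0000 or whether the test read the list at a bad moment.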

      Logs for other failures are at:
      https://testing.whamcloud.com/test_sets/ecc72250-ccfd-11e9-a25b-52540065bddc
      https://testing.whamcloud.com/test_sets/8da05bea-cff3-11e9-9fc9-52540065bddc

            People

              adilger Andreas Dilger
              jamesanunez James Nunez (Inactive)