sanity test 160j fails with 'read changelog failed'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0
    • Environment: PPC clients
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      sanity test_160j fails with 'read changelog failed' for PPC client testing 100% of the time.

      Looking at a recent failure at https://testing.whamcloud.com/test_sets/d3720002-4a27-11ea-b69a-52540065bddc, the actual error is a problem with the input to cat:

      Registered 1 changelog users: 'cl3'
      total: 2 create in 0.00 seconds: 1052.66 ops/second
      cat: -: Invalid argument
       sanity test_160j: @@@@@@ FAIL: read changelog failed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6121:error()
        = /usr/lib64/lustre/tests/sanity.sh:14350:test_160j()
      

      The code that is failing in sanity test 160j is:

      14341         # read changelog
      14342         cat <&4 >/dev/null || error "read changelog failed"
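
To make the failing line easier to read, here is a minimal sketch of the same pattern: open a file descriptor, read it to the end with cat, and report a failure if cat returns non-zero. A temporary file stands in for whatever fd 4 refers to in the real test (an assumption; the test's own open call is not quoted in this ticket), and `read_changelog_demo` is a hypothetical name. Because cat reads from stdin here, it names its input "-", which is why the log shows `cat: -: Invalid argument`.

```shell
# Sketch only: a temporary file stands in for the real input behind fd 4.
read_changelog_demo() {
    tmpfile=$(mktemp) || return 1
    printf 'record 1\nrecord 2\n' > "$tmpfile"

    # Open fd 4 for reading (the real test is assumed to do this earlier).
    exec 4<"$tmpfile"

    # Same shape as the failing line: cat reads stdin (fd 4) and discards it.
    # Since the input is stdin, cat labels it "-" in any error message.
    if cat <&4 >/dev/null; then
        status="read ok"
    else
        status="read changelog failed"
    fi

    exec 4<&-           # close the descriptor again
    rm -f "$tmpfile"
    echo "$status"
}

read_changelog_demo
```

With an ordinary file the read succeeds; in the failing runs the read on fd 4 itself returned an error, so cat exited non-zero and the `|| error` branch fired.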
      

      Looking at the client1 (vm12) console log, we see:

      [ 5314.374481] Lustre: DEBUG MARKER: == sanity test 160j: client can be umounted while its chanangelog is being used ===================== 01:24:59 (1581125099)
      [ 5314.494530] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre2
      [ 5314.506580] Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-10vm12@tcp:/lustre /mnt/lustre2
      [ 5314.555637] Lustre: Mounted lustre-client
      [ 5315.555507] Lustre: 10940:0:(llog_cat.c:808:llog_cat_process_common()) lustre-MDT0000-mdc-c0000000b5687800: invalid record in catalog [0x5:0x0:0xa]:0: rc = -22
      [ 5315.555690] LustreError: 10940:0:(mdc_changelog.c:295:chlg_load()) lustre-MDT0000-mdc-c0000000b5687800: fail to process llog: rc = -22
      [ 5315.600825] Lustre: Unmounted lustre-client
      [ 5315.777197] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_160j: @@@@@@ FAIL: read changelog failed 
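
The `rc = -22` in the console log and the `Invalid argument` printed by cat appear to be the same error seen from two sides: on Linux, errno 22 is EINVAL, whose message is "Invalid argument". The helper below is a hypothetical lookup (the name `rc_to_errno` and the short table are illustrative, not part of the Lustre test framework) for decoding a few common negative return codes when reading such logs.

```shell
# Hypothetical helper: translate a kernel-style return code (e.g. -22)
# into its Linux errno name and message. Only a few codes are listed.
rc_to_errno() {
    case "${1#-}" in            # strip the leading minus sign, if any
        2)  echo "ENOENT (No such file or directory)" ;;
        5)  echo "EIO (Input/output error)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown rc: $1" ;;
    esac
}

rc_to_errno -22
```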
      

      sanity test 160j has failed on PPC clients ever since it first landed on 27 SEPT 2019.

      Logs for more PPC client sanity test 160j failures are at
      https://testing.whamcloud.com/test_sets/717d4832-1dba-11ea-80b4-52540065bddc
      https://testing.whamcloud.com/test_sets/5e7bd63a-f7af-11e9-b62b-52540065bddc

        Activity

          [LU-13232] sanity test 160j fails with 'read changelog failed'
          pjones Peter Jones added a comment -

          Landed for 2.14

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37550/
          Subject: LU-13232 tests: add stack_trap to clean up sanity 160j
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 4891c873184ad2fc3e90abc769456166998cace3

          gerrit Gerrit Updater added a comment -

          James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37550
          Subject: LU-13232 tests: add stack_trap to clean up snaity 160j
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 9bfbb05901029b637a9b12262eeff181b8d70348

          adilger Andreas Dilger added a comment -

          Looking at the test itself, this is pretty clear:

                   # umount the first lustre mount
                   umount $MOUNT

          The test should use stack_trap calls to undo the various changes it makes (remounting the client, unmounting client2, closing the file descriptors, etc.) rather than doing this manually at the end of the test. I suspect that with a simple patch to clean up after this failure, many of the subsequent failures will also go away.
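
The cleanup approach suggested in this comment can be sketched in plain shell. This is not the Lustre `stack_trap` implementation; `push_cleanup`, `run_cleanups` and `demo_test` are hypothetical names, and the echo commands stand in for the real mount, unmount and close operations. The idea is that each change registers its own undo step as soon as it is made, and an EXIT trap runs those steps in reverse order even when the test aborts early.

```shell
# Hypothetical stand-in for a stack_trap-style helper; not Lustre's code.
CLEANUP_STACK=""

push_cleanup() {
    # Prepend, so the most recently registered command runs first (LIFO).
    CLEANUP_STACK="$1; $CLEANUP_STACK"
}

run_cleanups() {
    # Commands are joined with ';', so one failing cleanup does not
    # prevent the remaining ones from running.
    eval "$CLEANUP_STACK"
    CLEANUP_STACK=""
}

demo_test() {
    (
        # The subshell keeps the trap and the early exit local to the test.
        trap run_cleanups EXIT

        echo "mount client2";  push_cleanup 'echo "cleanup: umount client2"'
        echo "open fds";       push_cleanup 'echo "cleanup: close fds"'
        echo "umount client1"; push_cleanup 'echo "cleanup: remount client1"'

        # Simulate the failure: the test exits before any manual cleanup
        # placed at the end of the test body could run.
        echo "FAIL: read changelog failed"
        exit 1
    )
}

demo_test || echo "demo_test exited with status $?"
```

Even though the simulated test exits right after the failure message, the three cleanup lines still run, starting with the remount of the first client, so later tests would find a mounted file system.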

          adilger Andreas Dilger added a comment -

          This looks like it may be the root cause of many later failures. This test unmounts the client, then fails (likely because of unexpected output), and never remounts the client. All of the later failures occur because there is no Lustre client mounted.

          == sanity test 160j: client can be umounted  while its chanangelog is being used
          CMD: trevis-77vm7.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-10vm12@tcp:/lustre /mnt/lustre2
          :
          cat: -: Invalid argument
          sanity test_160j: @@@@@@ FAIL: read changelog failed
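
The knock-on effect described in this comment (later tests failing because no client is mounted) could be detected with a small guard such as the sketch below. `is_mounted` is a hypothetical helper, not a test-framework function; it scans a mounts table in /proc/mounts format. The sample table and the `/mnt/lustre` path for the first mount are illustrative assumptions.

```shell
# Hypothetical guard: succeed if MOUNTPOINT appears in a mounts table.
# The table defaults to /proc/mounts; a file can be passed for testing.
is_mounted() {
    mnt=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 of each line is the mount point.
    awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts_file"
}

# Illustrative sample table: only the second client mount is present.
sample=$(mktemp)
echo "trevis-10vm12@tcp:/lustre /mnt/lustre2 lustre rw,flock,user_xattr 0 0" > "$sample"

for m in /mnt/lustre /mnt/lustre2; do
    if is_mounted "$m" "$sample"; then
        echo "$m is mounted"
    else
        echo "$m is NOT mounted"
    fi
done

rm -f "$sample"
```

A check like this before each test would make the missing mount visible at the point where it happened, rather than as a series of unrelated-looking failures afterwards.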

          People

            Assignee: jamesanunez James Nunez (Inactive)
            Reporter: jamesanunez James Nunez (Inactive)