[LU-13751] sanity test_160j: FAIL: read changelog failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/737a1583-79d3-4fbc-a7a8-97e2c2a459e2

      test_160j failed with the following error:

      cat: -: Cannot send after transport endpoint shutdown
       sanity test_160j: @@@@@@ FAIL: read changelog failed
      

      Console log on client:

      Lustre: DEBUG MARKER: mount -t lustre -o user_xattr,flock trevis-22vm4@tcp:/lustre /mnt/lustre2
      Lustre: Mounted lustre-client
      Lustre: 656:0:(llog_cat.c:834:llog_cat_process_common()) lustre-MDT0000-mdc-ffff8b40363b1000: can't find llog handle [0x51f:0x1:0x0]:0: rc = -108
      LustreError: 656:0:(mdc_changelog.c:335:chlg_load()) lustre-MDT0000-mdc-ffff8b40363b1000: fail to process llog: rc = -108
      Lustre: Unmounted lustre-client
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity test_160j: @@@@@@ FAIL: read changelog failed 
      

      More failure instances on master branch:
      https://testing.whamcloud.com/test_sets/91dbe253-024d-439c-8d6b-e025071d97a7
      https://testing.whamcloud.com/test_sets/d9e11ade-9442-4759-b61e-18635b197bea
      https://testing.whamcloud.com/test_sets/b21e9658-39e1-4e46-b916-84ba852b553c

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_160j - read changelog failed

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.14

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41317/
            Subject: LU-13751 tests: remove read of changelog sanity 160j
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 388e8ed199b47337616beba573cb595343e71cca

            gerrit Gerrit Updater added a comment -

            James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41317
            Subject: LU-13751 tests: remove error on changelog read 160j
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bf523dd41fd60ec0f585b898869567328394aeca

            pjones Peter Jones added a comment -

            Thanks Mike. James will look into making that change.

            tappro Mikhail Pershin added a comment -

            I tend to think this is a test script issue. The llog read from the client sends an RPC to the server, and ptlrpc_wait_queue() may end up with -ESHUTDOWN on umount, so I'd say that is OK. Test 160j was originally added in the context of LU-11626 to check that there is no LBUG due to a missing obd device, so the correctness of this test is not about 'the changelog must be readable after umount' but about the server not hitting an LBUG during that. Therefore I think we should just treat that error as a valid case during the test.

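The suggestion above, accepting a failed changelog read during umount instead of failing the test on it, could be sketched in shell roughly as follows. This is only an illustration under stated assumptions, not the patch that landed: `read_tolerant` and `fake_changelog_read` are hypothetical names, and the stand-in reader merely imitates the `cat` error from the description (ESHUTDOWN, which is errno 108 on Linux and matches `rc = -108` in the console log) so that the snippet runs without a Lustre mount.

```shell
#!/bin/bash
# Hypothetical helper (not part of sanity.sh): run a reader command and
# report a failure without turning it into a test error.
read_tolerant() {
	local rc=0

	# Discard the reader's output; only its exit status matters here.
	"$@" >/dev/null 2>&1 || rc=$?
	if [ "$rc" -ne 0 ]; then
		echo "read failed with rc=$rc, tolerated during umount"
	else
		echo "read succeeded"
	fi
	# Always succeed: a failed read is a valid outcome in this scenario.
	return 0
}

# Stand-in for reading the changelog after umount (assumption: the real
# read may fail with ESHUTDOWN once the client is unmounted).
fake_changelog_read() {
	echo "cat: -: Cannot send after transport endpoint shutdown" >&2
	return 1
}

read_tolerant fake_changelog_read
read_tolerant true
```

In a real test the reader would be the actual changelog read, and whether the server hit an LBUG would have to be checked separately; that check is outside this sketch.
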
            jhammond John Hammond added a comment -

            The first failure I could find with this error message was https://testing.whamcloud.com/sub_tests/57f6774c-e2bb-11e9-9874-52540065bddc

            pjones Peter Jones added a comment -

            Mike

            Could you please investigate?

            Thanks

            Peter

            yujian Jian Yu added a comment -

            The failure occurred 9 times in the past week.

            yujian Jian Yu added a comment -

            This failure occurred 8 times in the past two weeks. It is affecting patch testing on the master branch.

            People

              Assignee: jamesanunez James Nunez (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 8
