LU-12570: sanity test 134a crash with SSK in use

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels: None
    • Severity: 3

    Description

      When running sanity with shared-secret key (SSK) enabled, test 134a always crashes like this:

      LustreError: 32759:0:(ofd_internal.h:412:ofd_info()) ASSERTION( info ) failed:
      LustreError: 32759:0:(ofd_internal.h:412:ofd_info()) LBUG
      Pid: 32759, comm: mdt00_008 3.10.0-7.6-debug #1 SMP Fri Jul 12 02:40:17 EDT 2019
      Call Trace:
      [<ffffffffa01928cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [<ffffffffa019297c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [<ffffffffa0e94dca>] ofd_exit+0x0/0x236 [ofd]
      [<ffffffffa0e9397b>] ofd_lvbo_update+0xd2b/0xe30 [ofd]
      [<ffffffffa05ec99c>] ldlm_handle_ast_error+0x45c/0x820 [ptlrpc]
      [<ffffffffa05ee6ea>] ldlm_cb_interpret+0x19a/0x700 [ptlrpc]
      [<ffffffffa0608071>] ptlrpc_check_set.part.23+0x491/0x1e00 [ptlrpc]
      [<ffffffffa0609a3b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      [<ffffffffa0609ddc>] ptlrpc_set_wait+0x31c/0x790 [ptlrpc]
      [<ffffffffa05c7e35>] ldlm_run_ast_work+0xd5/0x380 [ptlrpc]
      [<ffffffffa05fe8c5>] ldlm_reclaim_full+0x425/0x7a0 [ptlrpc]
      [<ffffffffa05f0338>] ldlm_handle_enqueue0+0x138/0x15d0 [ptlrpc]
      [<ffffffffa0676b42>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [<ffffffffa067ef85>] tgt_request_handle+0x985/0x1630 [ptlrpc]
      [<ffffffffa0622568>] ptlrpc_server_handle_request+0x258/0xb00 [ptlrpc]
      [<ffffffffa062670a>] ptlrpc_main+0xcba/0x2500 [ptlrpc]
      [<ffffffff810b4ed4>] kthread+0xe4/0xf0
      [<ffffffff817c8c5d>] ret_from_fork_nospec_begin+0x7/0x21
      [<ffffffffffffffff>] 0xffffffffffffffff
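Note on the trace: the crashing thread is an MDT service thread (comm mdt00_008), yet ldlm_reclaim_full() leads it into ofd_lvbo_update(), which is OFD code. A plausible reading, consistent with the LCT_DT_THREAD discussion in the comments, is that ofd_info() looks up per-thread data that only exists in environments set up with the DT tag. The following is a toy model in C of that failure mode, not Lustre source: every toy_* name and the tag values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy context tags, named after the Lustre ones; the values are made up. */
enum toy_tag {
	TOY_LCT_MD_THREAD = 1 << 0,
	TOY_LCT_DT_THREAD = 1 << 1,
};

/* Toy per-thread environment: the tags it was created with, plus one slot
 * standing in for the OFD per-thread info. */
struct toy_env {
	unsigned int tags;
	void *ofd_info_slot;
};

/* Dummy storage that the slot points at when it is populated. */
static int toy_ofd_storage;

/* Populate only the slots whose key tag matches the thread's tags.
 * Assumption: the OFD key is tagged DT-only. */
static void toy_env_init(struct toy_env *env, unsigned int tags)
{
	env->tags = tags;
	env->ofd_info_slot = (tags & TOY_LCT_DT_THREAD) ? &toy_ofd_storage : NULL;
}

/* Stand-in for the key lookup: NULL if the slot was never populated. */
static void *toy_ofd_info_lookup(const struct toy_env *env)
{
	return env->ofd_info_slot;
}

/* Stand-in for ofd_info(): asserts that the lookup succeeded. */
static void *toy_ofd_info(const struct toy_env *env)
{
	void *info = toy_ofd_info_lookup(env);

	assert(info != NULL);
	return info;
}
```

In this model, an environment created with only the MD tag makes toy_ofd_info() trip its assertion, while one created with both the MD and DT tags does not.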
      

          Activity

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36312/
            Subject: LU-12570 mdt: request env for DT threads
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: dee23bcfb821e339ca3b3df2fb08fa946f7d09a1

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36312
            Subject: LU-12570 mdt: request env for DT threads
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: a561e4848e6e03a380974ed74ee10d41a3c25a98

            pjones Peter Jones added a comment -

            Landed for 2.13

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36179/
            Subject: LU-12570 mdt: request env for DT threads
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1f94d5eb2be4e921e909d8f18523dcab91bb6531

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36179
            Subject: LU-12570 mdt: request env for DT threads
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2b586c40f22ed9e096478d9caf634c42b3fff224

            bzzz Alex Zhuravlev added a comment -

            Sorry for the late response. Adding LCT_DT_THREAD is not quite enough: the problem is that the client is trying to cancel extent locks by sending them to the MDT.
            IIRC, Alexey Lyashkov reported this in another ticket; I will try to find that one.

            sebastien Sebastien Buisson added a comment -

            Hi,

            I never hit this crash myself, and I have many examples of sanity test 134a passing with SSK enabled, for instance all custom-103 sessions triggered from https://review.whamcloud.com/34380 (the latest one was run with the patch rebased on August 1st).

            I am not sure this is an issue with SSK. I am wondering whether the crash you experience still occurs with the modification suggested by Alex (LCT_DT_THREAD).

            adilger Andreas Dilger added a comment -

            Alex, is this just a matter of adding LCT_DT_THREAD to mds_start_ptlrpc_service() when setting up the threads? What is the impact/overhead of doing this (if any)? Would it be better to limit ldlm_reclaim_ns() to only clean up locks in the same namespace type, as mentioned in LU-12592?
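
The namespace-type restriction raised in this comment could look roughly like the sketch below. It is a hypothetical toy model in C, not the real ldlm_reclaim_ns(): the toy_* types, fields, and the function signature are assumptions made for illustration only.

```c
#include <stddef.h>

/* Toy namespace types; the real lock manager has more than two. */
enum toy_ns_type { TOY_NS_MDT, TOY_NS_OST };

/* Toy namespace: its type and how many unused locks it currently holds. */
struct toy_ns {
	enum toy_ns_type type;
	int unused_locks;
};

/* Reclaim up to `want` unused locks, touching only namespaces whose type
 * matches the calling service. Returns the number of locks reclaimed. */
static int toy_reclaim(struct toy_ns *all, size_t n,
		       enum toy_ns_type caller_type, int want)
{
	int done = 0;

	for (size_t i = 0; i < n && done < want; i++) {
		/* Skip namespaces that belong to a different target type, so an
		 * MDT thread never runs callbacks written for OST threads. */
		if (all[i].type != caller_type)
			continue;

		int take = all[i].unused_locks;

		if (take > want - done)
			take = want - done;
		all[i].unused_locks -= take;
		done += take;
	}
	return done;
}
```

With this filter, a reclaim triggered from an MDT thread leaves OST namespaces untouched, at the cost of not relieving lock pressure that originates in the other namespace type.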
            green Oleg Drokin added a comment -

            Sure thing.
            master: http://testing.linuxhacker.ru:3333/lustre-reports/1455/testresults/sanity2-ldiskfs-DNE-SSK-centos7_x86_64-centos7_x86_64/
            b2_12: http://testing.linuxhacker.ru:3333/lustre-reports/1460/testresults/sanity-ldiskfs-DNE-SSK-centos7_x86_64-centos7_x86_64/

            bzzz Alex Zhuravlev added a comment -

            It would be great to have a log/dump for this case.

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: green Oleg Drokin
              Votes: 0
              Watchers: 7
