
sanity test 134a crash with SSK in use

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.13.0, Lustre 2.12.3
    • None
    • 3

    Description

      When running sanity with shared key enabled, it always crashes in test 134a like this:

      LustreError: 32759:0:(ofd_internal.h:412:ofd_info()) ASSERTION( info ) failed:
      LustreError: 32759:0:(ofd_internal.h:412:ofd_info()) LBUG
      Pid: 32759, comm: mdt00_008 3.10.0-7.6-debug #1 SMP Fri Jul 12 02:40:17 EDT 2019
      Call Trace:
      [<ffffffffa01928cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [<ffffffffa019297c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [<ffffffffa0e94dca>] ofd_exit+0x0/0x236 [ofd]
      [<ffffffffa0e9397b>] ofd_lvbo_update+0xd2b/0xe30 [ofd]
      [<ffffffffa05ec99c>] ldlm_handle_ast_error+0x45c/0x820 [ptlrpc]
      [<ffffffffa05ee6ea>] ldlm_cb_interpret+0x19a/0x700 [ptlrpc]
      [<ffffffffa0608071>] ptlrpc_check_set.part.23+0x491/0x1e00 [ptlrpc]
      [<ffffffffa0609a3b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
      [<ffffffffa0609ddc>] ptlrpc_set_wait+0x31c/0x790 [ptlrpc]
      [<ffffffffa05c7e35>] ldlm_run_ast_work+0xd5/0x380 [ptlrpc]
      [<ffffffffa05fe8c5>] ldlm_reclaim_full+0x425/0x7a0 [ptlrpc]
      [<ffffffffa05f0338>] ldlm_handle_enqueue0+0x138/0x15d0 [ptlrpc]
      [<ffffffffa0676b42>] tgt_enqueue+0x62/0x210 [ptlrpc]
      [<ffffffffa067ef85>] tgt_request_handle+0x985/0x1630 [ptlrpc]
      [<ffffffffa0622568>] ptlrpc_server_handle_request+0x258/0xb00 [ptlrpc]
      [<ffffffffa062670a>] ptlrpc_main+0xcba/0x2500 [ptlrpc]
      [<ffffffff810b4ed4>] kthread+0xe4/0xf0
      [<ffffffff817c8c5d>] ret_from_fork_nospec_begin+0x7/0x21
      [<ffffffffffffffff>] 0xffffffffffffffff
      

          Activity

            [LU-12570] sanity test 134a crash with SSK in use

            sebastien Sebastien Buisson added a comment -

            Hi,

            I never hit this crash myself, and I have many examples of sanity test 134a passing with SSK enabled, for instance all custom-103 sessions triggered from https://review.whamcloud.com/34380 (the latest one was run with the patch rebased on August 1st).

            I am not sure it is an issue with SSK. I am wondering if the crash you experience still occurs with the modification suggested by Alex (LCT_DT_THREAD).

            adilger Andreas Dilger added a comment -

            Alex, is this just a matter of adding LCT_DT_THREAD to mds_start_ptlrpc_service() setting up the threads? What is the impact/overhead of doing this (if any)? Would it be better to limit ldlm_reclaim_ns() to only clean up locks in the same namespace type as mentioned in LU-12592?

            green Oleg Drokin added a comment -

            sure thing.
            master: http://testing.linuxhacker.ru:3333/lustre-reports/1455/testresults/sanity2-ldiskfs-DNE-SSK-centos7_x86_64-centos7_x86_64/
            b2_12: http://testing.linuxhacker.ru:3333/lustre-reports/1460/testresults/sanity-ldiskfs-DNE-SSK-centos7_x86_64-centos7_x86_64/

            bzzz Alex Zhuravlev added a comment -

            It would be great to have a log/dump for this case.

            bzzz Alex Zhuravlev added a comment - edited

            static int mds_start_ptlrpc_service(struct mds_device *m)
            ...
            			.tc_ctx_tags		= LCT_MD_THREAD,
            }

            i.e. LCT_DT_THREAD is missing?

            bzzz Alex Zhuravlev added a comment -

            It's the MDT thread that seems to be missing LCT_DT_THREAD?

            People

              bzzz Alex Zhuravlev
              green Oleg Drokin