[LU-16648] sanity test_27M: crashed in lod_statfs_and_check()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor

    Description

      This issue was created by maloo for Serguei Smirnov <ssmirnov@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4c631a9e-c248-43ee-b43b-10a84e86d158

      test_27M failed with the following error:

      onyx-75vm4 crashed during sanity test_27M
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/93105 - 4.18.0-372.32.1.el8_6.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/93105 - 4.18.0-372.32.1.el8_lustre.x86_64

      BUG: unable to handle kernel paging request at 0000000000080000
      Oops: 0000 [#1] SMP PTI
      CPU: 1 PID: 10173 Comm: mdt00_002 4.18.0-425.10.1.el8_lustre.x86_64 #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      RIP: 0010:lod_statfs_and_check+0x87/0x5e0 [lod]
      Call Trace:
        lod_qos_prep_create+0xfe0/0x1320 [lod]
        lod_prepare_create+0x231/0x320 [lod]
        lod_declare_striped_create+0x291/0x920 [lod]
        lod_declare_create+0x27c/0x530 [lod]
        mdd_declare_create_object_internal+0xcd/0x370 [mdd]
        mdd_declare_create_object.isra.37+0x49/0x880 [mdd]
        mdd_declare_create+0x72/0x490 [mdd]
        mdd_create+0x8a2/0x1a30 [mdd]
        mdt_reint_open+0x2d20/0x3180 [mdt]
        mdt_reint_rec+0x11f/0x270 [mdt]
        mdt_reint_internal+0x4d3/0x7f0 [mdt]
        mdt_intent_open+0x13b/0x420 [mdt]
        mdt_intent_opc+0x12c/0xbf0 [mdt]
        mdt_intent_policy+0x20b/0x3a0 [mdt]
        ldlm_lock_enqueue+0x47f/0xb20 [ptlrpc]
        ldlm_handle_enqueue0+0x634/0x1760 [ptlrpc]
        tgt_enqueue+0xa4/0x220 [ptlrpc]
        tgt_request_handle+0xcc3/0x1920 [ptlrpc]
        ptlrpc_server_handle_request+0x31d/0xbc0 [ptlrpc]
        ptlrpc_main+0xc52/0x1510 [ptlrpc]
        kthread+0x10b/0x130
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_27M - onyx-75vm4 crashed during sanity test_27M

          Activity

            adilger Andreas Dilger added a comment - It looks like this is fixed by the patch in LU-16872.

            adilger Andreas Dilger added a comment - Excluding the above single failure with ASSERTION(tgt), the first similar failure was in patch https://review.whamcloud.com/45822 "LU-14692 osp: deprecate IDIF sequence for MDT0000" on 2023-03-15, before that patch landed on 2023-03-21. After that, other patches started hitting the failure, so it is very possible that this patch was the origin of this crash.

            There was also patch https://review.whamcloud.com/50074 "LU-16501 lod: add qos_ost_weights to debugfs" that landed on 2023-03-21 and touched code in lustre/lod, but that doesn't explain 45822 crashing before that patch landed.

            adilger Andreas Dilger added a comment - I've also seen one failure with "lod_statfs_and_check()) ASSERTION( tgt ) failed" (master, 6 days ago): https://testing.whamcloud.com/test_sets/4c631a9e-c248-43ee-b43b-10a84e86d158

            adilger Andreas Dilger added a comment - I don't think this is exactly the same as LU-16014, since the actual RIP pointer is in a different function (lod_statfs_and_check+0x87 vs. lod_qos_prep_create+0xe96).
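The comparison in the comment above (same call path, different faulting function) can be checked mechanically when triaging similar crash reports. The following is a minimal sketch, not part of Lustre or Maloo tooling: the helper names `parse_symbol` and `same_fault_function` are hypothetical, and the two input strings are copied from the Oops in this ticket and from the LU-16014 signature quoted in the comment.

```python
import re

# Matches kernel symbol references such as
# "lod_statfs_and_check+0x87/0x5e0 [lod]" or the shorter
# "lod_qos_prep_create+0xe96" form quoted in the comment.
SYMBOL_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"(?:/0x(?P<size>[0-9a-fA-F]+))?"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_symbol(text):
    """Return (function, offset, module) for the first symbol+offset in text.

    module is None when the text carries no "[module]" suffix.
    """
    m = SYMBOL_RE.search(text)
    if m is None:
        raise ValueError(f"no symbol+offset found in {text!r}")
    return m.group("func"), int(m.group("offset"), 16), m.group("module")


def same_fault_function(a, b):
    """Hypothetical triage rule: compare only the faulting function names.

    Offsets are ignored because they can shift between kernel/module builds.
    """
    return parse_symbol(a)[0] == parse_symbol(b)[0]


# RIP line from this ticket's Oops, and the LU-16014 signature from the comment.
this_crash = "RIP: 0010:lod_statfs_and_check+0x87/0x5e0 [lod]"
lu_16014 = "lod_qos_prep_create+0xe96"

print(parse_symbol(this_crash))
print(same_fault_function(this_crash, lu_16014))
```

Comparing function names rather than raw offsets is a deliberate simplification; two crashes in the same function may still have different root causes, so a match here is only a hint that the tickets could be duplicates.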
            mdiep Minh Diep added a comment - Is this the same as LU-16014?

            People

              Assignee: vkuznetsov Vitaliy Kuznetsov
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 3
