[LU-16014] sanity test_27M: crash in lod_qos_prep_create()

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.16.0
    • Labels: None
    • Severity: 3

    Description

      I noticed a recurring crash like this one in boilpot testing:

      Lustre: DEBUG MARKER: == sanity test 27M: test O_APPEND striping ====== 21:09:25 (1657760965)
      BUG: unable to handle kernel paging request at ffff8801466bccb0
      IP: [<ffffffffa13f68f6>] lod_qos_prep_create+0xe96/0x1ab0 [lod]
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      CPU: 3 PID: 2694 Comm: mdt01_002 Kdump: loaded  3.10.0-7.9-debug #2
      Hardware name: Red Hat KVM, BIOS 1.15.0-1.module_el8.6.0+1087+b42c8331 04/01/2014
      Call Trace:
       lod_prepare_create+0x23b/0x320 [lod]
       lod_declare_striped_create+0xf8/0xa50 [lod]
       lod_declare_create+0x1f5/0x600 [lod]
       mdd_declare_create_object_internal+0xd3/0x3b0 [mdd]
       mdd_declare_create_object.isra.35+0x51/0xb60 [mdd]
       mdd_declare_create+0x66/0x480 [mdd]
       mdd_create+0x9a9/0x1d30 [mdd]
       mdt_reint_open+0x2004/0x2c10 [mdt]
       mdt_reint_rec+0x87/0x240 [mdt]
       mdt_reint_internal+0x76c/0xb50 [mdt]
       mdt_intent_open+0x93/0x480 [mdt]
       mdt_intent_opc+0x1dd/0xc10 [mdt]
       mdt_intent_policy+0x1a1/0x360 [mdt]
       ldlm_lock_enqueue+0x3c2/0xb40 [ptlrpc]
       ldlm_handle_enqueue0+0x8c6/0x1780 [ptlrpc]
       tgt_enqueue+0x64/0x240 [ptlrpc]
       tgt_request_handle+0x93a/0x19c0 [ptlrpc]
       ptlrpc_server_handle_request+0x250/0xc30 [ptlrpc]
       ptlrpc_main+0xbd9/0x15f0 [ptlrpc]
       kthread+0xe4/0xf0
      

      I think this was introduced by the LU-15727 patch: https://review.whamcloud.com/47014

      It was first hit on June 20th, and for some reason it has become much more frequent in the past few days.

      The very first crash (with vmcore and everything):

      http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-98-2022-06-20-10:50:27/

      The most recent crash, with vmcore, from current master-next:

      http://testing.linuxhacker.ru/lustre-reports/external/crashes/boilpot-bigmem-28-2022-07-12-03:11:17/
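For context, test_27M ("test O_APPEND striping") exercises files created with O_APPEND, which the MDS can stripe differently from the directory default. The C sketch below shows only the client-side operation involved, an open-create with O_APPEND; per the trace above, the MDS handles that request through mdt_intent_open and mdt_reint_open down to lod_qos_prep_create. It is not the sanity.sh test itself (that is a shell script), and it assumes, hypothetically, that the path lives on a Lustre mount whose parent directory has a default composite layout and that append striping is configured on the MDS. The helper name create_append_file() is made up for illustration. On an ordinary local filesystem it simply creates a file and appends to it.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open-create a file with O_APPEND, write one line,
 * and close it. On a Lustre client (assumed setup), an open with O_CREAT
 * is sent to the MDS as an intent open; the crash trace shows the MDS
 * create path that such a request takes.
 * Returns 0 on success, -1 on any failure.
 */
int create_append_file(const char *path)
{
    const char msg[] = "appended\n";
    size_t len = strlen(msg);
    int fd = open(path, O_CREAT | O_WRONLY | O_APPEND, 0644);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    /* With O_APPEND, every write lands at the current end of file. */
    if (write(fd, msg, len) != (ssize_t)len) {
        perror("write");
        close(fd);
        return -1;
    }
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

Calling it twice on the same path leaves a file of 18 bytes, since each 9-byte write is appended at the end of the file rather than overwriting it.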

          Activity

            adilger Andreas Dilger added a comment - Fixed via LU-16872

            adilger Andreas Dilger added a comment - There is a patch, https://review.whamcloud.com/51559 "LU-16872 lod: reset llc_ostlist when using O_APPEND stripes", that should fix this.

            adilger Andreas Dilger added a comment - This bug was introduced by patch https://review.whamcloud.com/47014 "LU-15727 lod: honor append_pool with default composite layouts", which landed on master on 2022-07-11.
            bzzz Alex Zhuravlev added a comment - +1 on master: https://testing.whamcloud.com/test_sets/128b97d8-62fa-4844-a7f0-cd325ca58198
            adilger Andreas Dilger added a comment - +5 on master in the past 4 weeks, all ldiskfs review sessions (DNE and non-DNE, one aarch64): https://testing.whamcloud.com/test_sets/c4f8dd3c-7514-40f9-84a1-34c6ccb54ae3 https://testing.whamcloud.com/test_sets/01ab0985-4735-42b2-aa62-e3589414d6be https://testing.whamcloud.com/test_sets/7d0da5b9-281a-4d16-b738-947c50d04d64 https://testing.whamcloud.com/test_sets/b3157654-0dd5-4626-85e4-7d6046e1a28f https://testing.whamcloud.com/test_sets/1aa360a8-e42a-4dae-b598-e857f23bc06c
            mdiep Minh Diep added a comment - +1 on master https://testing.whamcloud.com/test_sessions/e5e1efcb-2b2d-44ac-9573-69ffd456b050
            cfaber Colin Faber added a comment - Hi green, if you remove LU-15727, are you still seeing the issue?

            People

              Assignee: WC Triage (wc-triage)
              Reporter: Oleg Drokin (green)
              Votes: 0
              Watchers: 9
