Lustre / LU-16593

BUG: unable to handle kernel NULL pointer in mdd_changelog_recalc_mask

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000160 in conf-sanity / 76c
      
      Trace:
      
      PID: 317366  TASK: ffffa2bfa2c219c0  CPU: 0   COMMAND: "lctl"
       #0 [ffffa2bfa2c27c68] panic at ffffffffaf0b9786
          /tmp/kernel/kernel/panic.c: 299
       #1 [ffffa2bfa2c27d00] no_context at ffffffffaf0a9563
          /tmp/kernel/arch/x86/mm/fault.c: 799
       #2 [ffffa2bfa2c27d50] page_fault at ffffffffaf600f0e
          /tmp/kernel/arch/x86/entry/entry_64.S: 1220
          [exception RIP: mdd_changelog_recalc_mask+212]
          RIP: ffffffffc0ce40a4  RSP: ffffa2bfa2c27e00  RFLAGS: 00010286
          RAX: ffffffffffffffff  RBX: ffffa2bf73145e00  RCX: 0000000000000000
          RDX: 0000000000000a2f  RSI: 0000000000000000  RDI: ffffa2bf765ec950
          RBP: ffffa2bf6a67ec00   R8: ffffa2bfa2c27c28   R9: 0000000000000a6c
          R10: 0000000000000000  R11: ffffa2bf8cd05a6b  R12: ffffa2bf765ec950
          R13: ffffa2bfa2c27e40  R14: ffffa2bf8d508000  R15: ffffa2bfa2c27f10
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
          /home/lustre/master-mine/lustre/mdd/mdd_device.c: 1889
       #3 [ffffa2bfa2c27e30] mdd_changelog_mask_seq_write at ffffffffc0d098e2 [mdd]
          /home/lustre/master-mine/lustre/mdd/mdd_lproc.c: 174
       #4 [ffffa2bfa2c27ea0] full_proxy_write at ffffffffaf2e6a7b
          /tmp/kernel/fs/debugfs/file.c: 230
       #5 [ffffa2bfa2c27ed8] vfs_write at ffffffffaf1cffc9
          /tmp/kernel/fs/read_write.c: 550
       #6 [ffffa2bfa2c27f08] ksys_write at ffffffffaf1d021d
          /tmp/kernel/fs/read_write.c: 599
       #7 [ffffa2bfa2c27f38] do_syscall_64 at ffffffffaf001893
          /tmp/kernel/arch/x86/entry/common.c: 302
       #8 [ffffa2bfa2c27f50] entry_SYSCALL_64_after_hwframe at ffffffffaf600099
          /tmp/kernel/arch/x86/entry/entry_64.S: 151
          RIP: 00007f8dd57e5915  RSP: 00007ffd59725268  RFLAGS: 00000246
          RAX: ffffffffffffffda  RBX: 00000000006822f0  RCX: 00007f8dd57e5915
          RDX: 0000000000000006  RSI: 00007ffd59728f32  RDI: 0000000000000003
          RBP: 0000000000000001   R8: 000000000000fc00   R9: 0000000000000001
          R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000003
          R13: 000000000068a301  R14: 0000000000682440  R15: 00007ffd5972740c
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
      /home/lustre/master-mine/lustre/mdd/mdd_device.c: 1889
      0xffffffffc0ce40a0 <mdd_changelog_recalc_mask+208>:	mov    0x30(%rbx),%rsi
      0xffffffffc0ce40a4 <mdd_changelog_recalc_mask+212>:	mov    0x160(%rsi),%rax
      
      crash> p *(struct llog_ctxt *)0xffffa2bf73145e00
      $1 = {
        loc_idx = 14,
        loc_obd = 0xffffa2bf765ec188,
        loc_olg = 0xffffa2bf765ec868,
        loc_exp = 0xffffa2bf6308f800,
        loc_imp = 0x0,
        loc_logops = 0xffffffffc0d470c0 <changelog_orig_logops>,
        loc_handle = 0x0,
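
      The dump and the disassembly appear to line up: RBX holds the ctxt address (ffffa2bf73145e00), the first instruction loads the pointer at offset 0x30 from it, and the second reads offset 0x160 from that pointer, which matches the faulting address when the pointer is NULL. A minimal sketch to check the arithmetic, assuming the fields sit in the order the dump prints them, with no hidden members in between, on an LP64 target such as x86_64. `mock_llog_ctxt` and the helper names are hypothetical and are not the real Lustre definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock: members in the order printed by the crash dump.
 * Pointers are void * because only their size matters for the layout. */
struct mock_llog_ctxt {
    int   loc_idx;      /* 4 bytes, then 4 bytes of padding on LP64 */
    void *loc_obd;
    void *loc_olg;
    void *loc_exp;
    void *loc_imp;
    void *loc_logops;
    void *loc_handle;   /* printed as 0x0 in the dump */
};

/* Offset read by: mov 0x30(%rbx),%rsi */
static size_t mock_handle_offset(void)
{
    return offsetof(struct mock_llog_ctxt, loc_handle);
}

/* Address touched by: mov 0x160(%rsi),%rax for a given handle value */
static uintptr_t mock_fault_address(uintptr_t handle)
{
    return handle + 0x160;
}

/* Compare the mock against the numbers in the trace (LP64 only). */
static void mock_check_against_trace(void)
{
    assert(mock_handle_offset() == 0x30);
    assert(mock_fault_address(0) == 0x160);
}
```

      If the assumed layout holds, the NULL `loc_handle` is the pointer being dereferenced at mdd_device.c:1889.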
      

      Just before the panic, another process (mount) started to initialize the changelog:

      00000004:00000080:1.0:1677245239.833125:0:317339:0:(mdd_device.c:548:mdd_changelog_llog_init()) changelog starting index=0
      00000040:00000001:1.0:1677245239.833125:0:317339:0:(llog_obd.c:150:llog_setup()) Process entered
      00000040:00000010:1.0:1677245239.833127:0:317339:0:(llog_obd.c:44:llog_new_ctxt()) kmalloced '(ctxt)': 264 at 00000000ccbe3063.
      00000020:00000040:1.0:1677245239.833128:0:317339:0:(genops.c:970:class_export_get()) GET export 00000000c0fe929f refcount=3
      00000040:00000001:1.0:1677245239.833128:0:317339:0:(llog_osd.c:1919:llog_osd_setup()) Process entered
      00000040:00000040:1.0:1677245239.833129:0:317339:0:(lustre_log.h:395:llog_ctxt_get()) GETting ctxt 00000000ccbe3063 : new refcount 2
      00000020:00000001:1.0:1677245239.833129:0:317339:0:(local_storage.c:842:local_oid_storage_init()) Process entered
      00000020:00000001:1.0:1677245239.833130:0:317339:0:(local_storage.c:147:ls_device_get()) Process entered
      00000020:00000001:1.0:1677245239.833130:0:317339:0:(local_storage.c:152:ls_device_get()) Process leaving via out_ls (rc=18446641542280932352 : -102531428619264 : 0xffffa2bf8a9e6c00)
      

      So this is obviously a race, though I don't quite understand how lctl was able to start before the MDS mount completed.
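
      The pattern can be sketched in user space as follows. This is only an illustration of the ordering problem, not a proposed patch: all names (`demo_ctxt`, `demo_handle`, `demo_recalc_mask`, `demo_finish_setup`) are hypothetical, and the choice of -ENODEV is an assumption. The sketch is single-threaded, so it shows the missing NULL check but not the locking or memory ordering that real concurrent code would also need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the log handle and its context. */
struct demo_handle {
    unsigned int mask;
};

struct demo_ctxt {
    struct demo_handle *handle; /* NULL until setup has finished */
};

/* Writer path: models a tunable write that can arrive at any time,
 * including before setup has attached a handle to the context. */
static int demo_recalc_mask(struct demo_ctxt *ctxt, unsigned int *out)
{
    if (ctxt == NULL || ctxt->handle == NULL)
        return -ENODEV; /* context visible, but not yet initialized */

    *out = ctxt->handle->mask;
    return 0;
}

/* Setup path: models the mount attaching the handle late. */
static void demo_finish_setup(struct demo_ctxt *ctxt, struct demo_handle *h)
{
    assert(h != NULL);
    ctxt->handle = h;
}
```

      Calling `demo_recalc_mask()` before `demo_finish_setup()` returns an error instead of dereferencing NULL; after setup it returns the stored mask.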

          People

            wc-triage WC Triage
            bzzz Alex Zhuravlev