Lustre / LU-4451

Kernel Oops with NFS reexport using mainline 3.12 client

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.2
    • Component/s: None
    • Severity: 4
    • Issue ID: 12201

    Description

      Jan 7 18:44:47 fltpu-login kernel: [31468.631107] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      Jan 7 18:44:47 fltpu-login kernel: [31468.639140] IP: [<ffffffffa0b9c23d>] ll_sai_unplug+0x1d/0x470 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.645869] PGD 0
      Jan 7 18:44:47 fltpu-login kernel: [31468.647983] Oops: 0000 [#1] SMP
      Jan 7 18:44:47 fltpu-login kernel: [31468.651330] Modules linked in: lmv(C) fld(C) mgc(C) lustre(C) lov(C) osc(C) mdc(C) fid(C) ptlrpc(C) obdclass(C) lvfs(C) zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) ksocklnd(C) ko2iblnd(C) lnet(C) sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic crc32 crc32c_intel libcfs(C) sg ib_umad ib_ipoib xfs libcrc32c nfsd exportfs ch st hid_generic usbhid ehci_pci ehci_hcd psmouse uhci_hcd lpc_ich mfd_core aacraid usbcore usb_common i7core_edac igb edac_core i2c_algo_bit ib_qib ixgbe mdio rdma_ucm rdma_cm ib_cm acpi_cpufreq iw_cm ib_sa ib_mad ib_addr processor ipv6 ib_uverbs ib_core qla2xxx blcr(O) scsi_transport_fc blcr_imports(O) scsi_tgt dm_mod
      Jan 7 18:44:47 fltpu-login kernel: [31468.712442] CPU: 2 PID: 15987 Comm: nfsd Tainted: P C O 3.12.5-ql-generic-15 #1
      Jan 7 18:44:47 fltpu-login kernel: [31468.720706] Hardware name: Supermicro X8DTN/X8DTN, BIOS 080015 05/04/2009
      Jan 7 18:44:47 fltpu-login kernel: [31468.727726] task: ffff8800bac99cc0 ti: ffff88019a610000 task.ti: ffff88019a610000
      Jan 7 18:44:47 fltpu-login kernel: [31468.735354] RIP: 0010:[<ffffffffa0b9c23d>] [<ffffffffa0b9c23d>] ll_sai_unplug+0x1d/0x470 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.744483] RSP: 0018:ffff88019a611718 EFLAGS: 00010282
      Jan 7 18:44:47 fltpu-login kernel: [31468.749945] RAX: 000000005a5a5a5a RBX: ffff8800802fa000 RCX: ffff8800802fa060
      Jan 7 18:44:47 fltpu-login kernel: [31468.757177] RDX: ffff8800802fa060 RSI: ffff8800816ab580 RDI: ffff8800802fa000
      Jan 7 18:44:47 fltpu-login kernel: [31468.764405] RBP: ffff88019a611798 R08: ffff88019a610000 R09: 0000000000000211
      Jan 7 18:44:47 fltpu-login kernel: [31468.771737] R10: 0000000000000000 R11: 0140000000000000 R12: ffff8800816ab580
      Jan 7 18:44:47 fltpu-login kernel: [31468.779009] R13: 0000000000000000 R14: ffff88019a611848 R15: ffff8800802fa058
      Jan 7 18:44:47 fltpu-login kernel: [31468.786254] FS: 0000000000000000(0000) GS:ffff8801b9c80000(0000) knlGS:0000000000000000
      Jan 7 18:44:47 fltpu-login kernel: [31468.794586] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      Jan 7 18:44:47 fltpu-login kernel: [31468.800429] CR2: 0000000000000028 CR3: 000000000169e000 CR4: 00000000000007e0
      Jan 7 18:44:47 fltpu-login kernel: [31468.807677] Stack:
      Jan 7 18:44:47 fltpu-login kernel: [31468.809763] 0000000000000000 dead000000200200 00000001002f9aab ffff8801b7134000
      Jan 7 18:44:47 fltpu-login kernel: [31468.817515] ffffffff8104c950 ffff8800bac99cc0 ffff88019a611768 ffffffff8104dd0a
      Jan 7 18:44:47 fltpu-login kernel: [31468.825162] 0000000000000000 0000000000000282 ffff88019a611798 ffff8800816ab580
      Jan 7 18:44:47 fltpu-login kernel: [31468.832806] Call Trace:
      Jan 7 18:44:47 fltpu-login kernel: [31468.835341] [<ffffffff8104c950>] ? usleep_range+0x40/0x40
      Jan 7 18:44:47 fltpu-login kernel: [31468.840959] [<ffffffff8104dd0a>] ? recalc_sigpending+0x1a/0x50
      Jan 7 18:44:47 fltpu-login kernel: [31468.846969] [<ffffffffa0b9ff03>] do_statahead_enter+0x183/0x13a0 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.854006] [<ffffffffa093b8bd>] ? ldlm_res_hop_get_locked+0xd/0x10 [ptlrpc]
      Jan 7 18:44:47 fltpu-login kernel: [31468.861226] [<ffffffff810aeaad>] ? from_kgid+0xd/0x10
      Jan 7 18:44:47 fltpu-login kernel: [31468.866452] [<ffffffffa0986495>] ? get_my_ctx+0x55/0x120 [ptlrpc]
      Jan 7 18:44:47 fltpu-login kernel: [31468.872786] [<ffffffff8106c070>] ? try_to_wake_up+0x290/0x290
      Jan 7 18:44:47 fltpu-login kernel: [31468.878734] [<ffffffffa0b8c0d2>] ll_lookup_it+0x552/0x970 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.885130] [<ffffffffa0b8acdb>] ? ll_iget+0x13b/0x280 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.891316] [<ffffffffa0c21444>] ? lmv_get_lustre_md+0xf4/0x290 [lmv]
      Jan 7 18:44:47 fltpu-login kernel: [31468.897920] [<ffffffffa0c21787>] ? lmv_free_lustre_md+0x1a7/0x4a0 [lmv]
      Jan 7 18:44:47 fltpu-login kernel: [31468.904777] [<ffffffffa0b519f2>] ? ll_dcompare+0x42/0x100 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.911148] [<ffffffffa0b8d21a>] ll_lookup_nd+0x7a/0x170 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31468.917413] [<ffffffff81140fa8>] lookup_real+0x18/0x50
      Jan 7 18:44:47 fltpu-login kernel: [31468.922782] [<ffffffff81141bd3>] __lookup_hash+0x33/0x40
      Jan 7 18:44:47 fltpu-login kernel: [31468.928260] [<ffffffff81147246>] lookup_one_len+0xc6/0x120
      Jan 7 18:44:47 fltpu-login kernel: [31468.933928] [<ffffffffa03fe880>] encode_entryplus_baggage+0x70/0x160 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.941157] [<ffffffffa03fecf5>] encode_entry.isra.11+0x2c5/0x300 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.948028] [<ffffffffa04000e0>] ? nfs3svc_encode_entry+0x10/0x10 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.954962] [<ffffffffa04000ef>] nfs3svc_encode_entry_plus+0xf/0x20 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.962102] [<ffffffffa03f5a37>] nfsd_readdir+0x177/0x270 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.968278] [<ffffffff8148d34c>] ? cache_check+0x5c/0x330
      Jan 7 18:44:47 fltpu-login kernel: [31468.973912] [<ffffffffa03f3550>] ? _get_posix_acl+0x60/0x60 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.980260] [<ffffffffa03fcfac>] nfsd3_proc_readdirplus+0xac/0x1b0 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.987298] [<ffffffffa03efca1>] nfsd_dispatch+0xa1/0x1b0 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31468.993474] [<ffffffff81481d3f>] svc_process_common+0x2ef/0x5a0
      Jan 7 18:44:47 fltpu-login kernel: [31468.999555] [<ffffffff8148233f>] svc_process+0xff/0x150
      Jan 7 18:44:47 fltpu-login kernel: [31469.004949] [<ffffffffa03ef6ef>] nfsd+0xbf/0x130 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31469.010341] [<ffffffffa03ef630>] ? nfsd_destroy+0x80/0x80 [nfsd]
      Jan 7 18:44:47 fltpu-login kernel: [31469.016520] [<ffffffff8106013b>] kthread+0xbb/0xc0
      Jan 7 18:44:47 fltpu-login kernel: [31469.021498] [<ffffffff81060080>] ? kthread_freezable_should_stop+0x70/0x70
      Jan 7 18:44:47 fltpu-login kernel: [31469.028541] [<ffffffff8150a8bc>] ret_from_fork+0x7c/0xb0
      Jan 7 18:44:47 fltpu-login kernel: [31469.034023] [<ffffffff81060080>] ? kthread_freezable_should_stop+0x70/0x70
      Jan 7 18:44:47 fltpu-login kernel: [31469.041063] Code: c7 00 7f bc a0 e8 84 f8 96 ff 0f 1f 40 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb 48 83 ec 58 4c 8b 2f 48 85 f6 <49> 8b 45 28 48 8b 80 f8 02 00 00 4c 8b 78 18 0f 84 26 03 00 00
      Jan 7 18:44:47 fltpu-login kernel: [31469.061700] RIP [<ffffffffa0b9c23d>] ll_sai_unplug+0x1d/0x470 [lustre]
      Jan 7 18:44:47 fltpu-login kernel: [31469.068471] RSP <ffff88019a611718>
      Jan 7 18:44:47 fltpu-login kernel: [31469.072091] CR2: 0000000000000028

      Attachments

      Issue Links

      Activity


            rfehren Roland Fehrenbacher added a comment -

            Please note that we'd need a patch against the in-kernel code of vanilla 3.12.8, not against Lustre master.
            laisiyao Lai Siyao added a comment -

            Hmm, I'll rebase it to the latest master code later.

            rfehren Roland Fehrenbacher added a comment -

            Sorry for the late reply. Been busy with other stuff...

            The patch in http://review.whamcloud.com/#/c/6392 fails miserably when trying to apply
            to the mainline 3.12 client code (see below). Do you have a patch that works?

            patch -l -p1 < ./0001-LU-3270-statahead-statahead-thread-wait-for-RPCs-to.patch
            patching file lustre/include/obd.h
            Hunk #1 succeeded at 1098 (offset -40 lines).
            patching file lustre/llite/dcache.c
            Hunk #1 FAILED at 376.
            1 out of 1 hunk FAILED -- saving rejects to file lustre/llite/dcache.c.rej
            patching file lustre/llite/file.c
            Hunk #1 succeeded at 321 (offset -49 lines).
            Hunk #2 FAILED at 541.
            Hunk #3 succeeded at 664 (offset -47 lines).
            1 out of 3 hunks FAILED -- saving rejects to file lustre/llite/file.c.rej
            patching file lustre/llite/llite_internal.h
            Hunk #1 FAILED at 141.
            Hunk #2 FAILED at 183.
            Hunk #3 succeeded at 240 (offset -4 lines).
            Hunk #4 FAILED at 511.
            Hunk #5 FAILED at 1252.
            Hunk #6 FAILED at 1284.
            Hunk #7 succeeded at 1248 (offset -64 lines).
            Hunk #8 FAILED at 1319.
            6 out of 8 hunks FAILED -- saving rejects to file lustre/llite/llite_internal.h.rej
            patching file lustre/llite/llite_lib.c
            Hunk #1 FAILED at 137.
            Hunk #2 FAILED at 719.
            Hunk #3 FAILED at 740.
            Hunk #4 succeeded at 926 (offset -29 lines).
            3 out of 4 hunks FAILED -- saving rejects to file lustre/llite/llite_lib.c.rej
            patching file lustre/llite/statahead.c
            Hunk #1 FAILED at 64.
            Hunk #2 FAILED at 212.
            Hunk #3 succeeded at 244 with fuzz 2 (offset -1 lines).
            Hunk #4 succeeded at 303 (offset 23 lines).
            Hunk #5 FAILED at 299.
            Hunk #6 succeeded at 360 with fuzz 1 (offset 23 lines).
            Hunk #7 succeeded at 378 with fuzz 1 (offset 23 lines).
            Hunk #8 succeeded at 418 (offset 23 lines).
            Hunk #9 succeeded at 440 (offset 23 lines).
            Hunk #10 FAILED at 441.
            Hunk #11 FAILED at 476.
            Hunk #12 FAILED at 524.
            Hunk #13 FAILED at 599.
            Hunk #14 FAILED at 616.
            Hunk #15 FAILED at 678.
            Hunk #16 succeeded at 792 (offset 9 lines).
            Hunk #17 succeeded at 814 (offset 9 lines).
            Hunk #18 FAILED at 930.
            Hunk #19 FAILED at 1002.
            Hunk #20 FAILED at 1036.
            Hunk #21 succeeded at 1070 (offset 3 lines).
            Hunk #22 succeeded at 1142 (offset 3 lines).
            Hunk #23 FAILED at 1195.
            Hunk #24 succeeded at 1252 with fuzz 1 (offset 2 lines).
            Hunk #25 FAILED at 1487.
            14 out of 25 hunks FAILED -- saving rejects to file lustre/llite/statahead.c.rej

            laisiyao Lai Siyao added a comment -

            I did commit it to the master branch; you should be able to use `git fetch http://review.whamcloud.com/fs/lustre-release refs/changes/92/6392/21 && git cherry-pick FETCH_HEAD` to cherry-pick it onto your branch, e.g. 2.4.2.

            rfehren Roland Fehrenbacher added a comment -

            I can't find this commit in my git master clone. Where did you commit it?
            laisiyao Lai Siyao added a comment -

            http://review.whamcloud.com/#/c/6392/ should fix this; could you apply this patch and verify?

            jlevi Jodi Levi (Inactive) added a comment -

            Lai,
            Could you please have a look and comment on this one?
            Thank you!

            rfehren Roland Fehrenbacher added a comment -

            The Oops occurs after a couple of seconds when doing a "rm -r" on a large directory.

            People

              Assignee: laisiyao Lai Siyao
              Reporter: rfehren Roland Fehrenbacher
              Votes: 0
              Watchers: 7
