Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.10.6
    • None
    • 3
    • 9223372036854775807

    Description

      We have seen this at least 4 times.

      [5426573.363714] Lustre: Mounted nbp16-client
      [5428374.627398] general protection fault: 0000 [#1] 
      [5428374.627407] Lustre: Unmounted nbp14-client
      [5428374.636811] SMP 
      [5428374.639106] 
      [5428374.639307] Modules linked in: vtsspp(OEN) sep5(OEN) socperf3(OEN) pax(OEN) osc(OEN) mgc(OEN) lustre(OEN) lmv(OEN) fld(OEN) mdc(OEN) fid(OEN) lov(OEN) ko2iblnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) libcfs(OEN) beegfs(OEN) rdma_ucm(OEX) ib_ucm(OEX) rdma_cm(OEX) iw_cm(OEX) configfs(E) ib_ipoib(OEX) inet_lro(E) ib_cm(OEX) ib_uverbs(OEX) ib_umad(OEX) mlx4_ib(OEX) ib_core(OEX) mlx4_core(OEX) devlink(E) mlx_compat(OEX) iscsi_ibft(E) iscsi_boot_sysfs(E) msr(E) joydev(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) ipmi_ssif(E) iTCO_wdt(E) iTCO_vendor_support(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) mgag200(E) ablk_helper(E) cryptd(E) ttm(E)
      [5428374.711255]  acpi_cpufreq(E) drm_kms_helper(E) pcspkr(E) drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) lpc_ich(E) mei_me(E) i2c_i801(E) mfd_core(E) mei(E) ioatdma(E) shpchp(E) ipmi_si(E) wmi(E) ipmi_devintf(E) ipmi_msghandler(E) processor(E) button(E) tcp_bic(EN) hwperf(OEX) numatools(OEX) xpmem(OEX) gru(OEX) xvma(OEX) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) autofs4(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) bridge(E) stp(E) llc(E) hid_generic(E) usbhid(E) ahci(E) libahci(E) ehci_pci(E) libata(E) ehci_hcd(E) igb(E) i2c_algo_bit(E) dca(E) ptp(E) scsi_mod(E) usbcore(E) pps_core(E) usb_common(E) af_packet(E) crc32c_intel(E) fjes(E) [last unloaded: socperf2_0]
      [5428374.776352] Supported: No, Unsupported modules are loaded
      [5428374.782187] CPU: 23 PID: 85345 Comm: umount Tainted: G           OE   NX 4.4.162-94.72.1.20181113-nasa #1
      [5428374.792175] Hardware name: SGI.COM ICE-XIP113/X9DRT-Dakota, BIOS DA0E2016 02/01/2016
      [5428374.800341] task: ffff88026ade1000 ti: ffff88026ade4000 task.ti: ffff88026ade4000
      [5428374.808253] RIP: 0010:[<ffffffffa07a47dd>]  [<ffffffffa07a47dd>] mdc_changelog_cdev_finish+0x3d/0x1b1 [mdc]
      [5428374.818437] RSP: 0018:ffff88026ade7b68  EFLAGS: 00010286
      [5428374.824175] RAX: 5a5a5a5a5a5a4b62 RBX: ffff88040e20e008 RCX: ffff88037b826fb0
      [5428374.831741] RDX: 5a5a5a5a5a5a5a5a RSI: ffff88037b826f40 RDI: ffff88040e20e008
      [5428374.839306] RBP: 0000000000000000 R08: 0000000000000c3a R09: 0000000000000000
      [5428374.846863] R10: 0000000000000000 R11: ffff8807c8d833c6 R12: 0000000000000000
      [5428374.854421] R13: ffff88040e20e048 R14: ffff880d1635f000 R15: ffff880cf81e6b60
      [5428374.861978] FS:  00007ffff7fd1880(0000) GS:ffff88085fb40000(0000) knlGS:0000000000000000
      [5428374.870489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [5428374.876661] CR2: 00007ffff7ff6000 CR3: 0000000371dbe000 CR4: 0000000000160670
      [5428374.884227] Stack:
      [5428374.886678]  ffff88040e20e008 0000000000000000 ffffffffa07904fa ffff88040e20e008
      [5428374.894566]  0000000000000000 0000000000000000 ffffffffa0b8bc9c ffff88026ade7bf8
      [5428374.902452]  ffffffffa0a6afb7 ffff880200000010 ffff88026ade7c08 ffff88026ade7bc8
      [5428374.910337] Call Trace:
      [5428374.913246]  [<ffffffffa07904fa>] mdc_precleanup+0x2a/0x3f0 [mdc]
      [5428374.919816]  [<ffffffffa0b8bc9c>] class_cleanup+0x26c/0xc40 [obdclass]
      [5428374.926811]  [<ffffffffa0b8e5ba>] class_process_config+0x190a/0x2360 [obdclass]
      [5428374.934582]  [<ffffffffa0b8f1ba>] class_manual_cleanup+0x1aa/0x6a0 [obdclass]
      [5428374.942177]  [<ffffffffa0f6f341>] ll_put_super+0x111/0x9f0 [lustre]
      [5428374.948881]  [<ffffffff81212a1c>] generic_shutdown_super+0x6c/0xf0
      [5428374.955497]  [<ffffffff81212aae>] kill_anon_super+0xe/0x20
      [5428374.961416]  [<ffffffff8121236f>] deactivate_locked_super+0x3f/0x70
      [5428374.968117]  [<ffffffff8122da1b>] cleanup_mnt+0x3b/0x80
      [5428374.973775]  [<ffffffff8109f718>] task_work_run+0x78/0x90
      [5428374.979609]  [<ffffffff8107d3cf>] exit_to_usermode_loop+0x91/0xc2
      [5428374.986136]  [<ffffffff81003ae5>] syscall_return_slowpath+0x85/0xa0
      [5428374.992837]  [<ffffffff8161dfec>] int_ret_from_sys_call+0x8/0x6d
      [5428375.002321] DWARF2 unwinder stuck at int_ret_from_sys_call+0x8/0x6d
      [5428375.009019] 
      [5428375.010951] Leftover inexact backtrace:
                       
      [5428375.017130] Code: 3d 90 21 7b a0 48 8d b0 78 ff ff ff 0f 84 d0 00 00 00 48 8b 56 70 48 8d 4e 70 48 39 d1 48 8d 82 08 f1 ff ff 75 1c e9 9d 00 00 00 <48> 8b 90 f8 0e 00 00 48 39 d1 48 8d 82 08 f1 ff ff 0f 84 86 00 
      [5428375.037514] RIP  [<ffffffffa07a47dd>] mdc_changelog_cdev_finish+0x3d/0x1b1 [mdc]
      [5428375.045359]  RSP <ffff88026ade7b68>
      

          Activity

            [LU-12566] GPF when umounting client

            Reopening this ticket, since I think there is a simpler fix that could be used in the short term.

            adilger Andreas Dilger added a comment

            It looks like mdc_changelog_cdev_finish() is walking the chlg_registered_dev and cl_chg_dev_linkage lists without a lock, which is racy with other threads adding/removing entries from the list:

            void mdc_changelog_cdev_finish(struct obd_device *obd)
            {
                    struct chlg_registered_dev *dev = chlg_registered_dev_find_by_obd(obd);
                    ENTRY;
            
                    mutex_lock(&chlg_registered_dev_lock);
                    list_del_init(&obd->u.cli.cl_chg_dev_linkage);
                    kref_put(&dev->ced_refs, chlg_dev_clear);
                    mutex_unlock(&chlg_registered_dev_lock);
                    EXIT;
            }
            

            A simple fix for this particular issue might be to move the fetching of dev under chlg_registered_dev_lock. It would also be a good idea to add LASSERT(mutex_is_locked(&chlg_registered_dev_lock)) to the start of chlg_registered_dev_find_by_obd() and the other lookup helpers that walk these lists, to avoid such bugs in the future.

            This doesn't fix the larger bug in LU-11626 but may at least avoid the particular issue being seen here, and be simple enough to backport to older releases.
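
            For reference, the reordering suggested above can be modelled in userspace. This is a minimal sketch under stated assumptions, not the Lustre patch: a pthread mutex stands in for chlg_registered_dev_lock, and struct registered_dev, registry_find_by_obd(), registry_finish() and the registry_locked flag are hypothetical names invented for the illustration. Only the idea (look up dev inside the critical section, and assert in the lookup helper that the lock is held) comes from the comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel objects named in the comment. */
struct registered_dev {
        int refs;                      /* models ced_refs */
        const void *obd;               /* the single client tracked in this model */
        struct registered_dev *next;   /* models the global registry list */
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static bool registry_locked;           /* crude analogue of mutex_is_locked() */
static struct registered_dev *registry_head;

static void registry_lock_take(void)
{
        pthread_mutex_lock(&registry_lock);
        registry_locked = true;
}

static void registry_lock_drop(void)
{
        registry_locked = false;
        pthread_mutex_unlock(&registry_lock);
}

/* Lookup helper: refuses to walk the list unless the lock is held. */
static struct registered_dev *registry_find_by_obd(const void *obd)
{
        assert(registry_locked);       /* models the suggested LASSERT */
        for (struct registered_dev *d = registry_head; d != NULL; d = d->next)
                if (d->obd == obd)
                        return d;
        return NULL;
}

/* Teardown: the lookup and the reference drop share one critical section. */
static int registry_finish(const void *obd)
{
        int remaining = -1;            /* -1 means obd was not registered */

        registry_lock_take();
        struct registered_dev *dev = registry_find_by_obd(obd);
        if (dev != NULL)
                remaining = --dev->refs;
        registry_lock_drop();
        return remaining;
}
```

            With this shape, a caller that forgets to take the lock before calling the lookup helper trips the assertion immediately instead of occasionally walking a list that another thread is modifying.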

            adilger Andreas Dilger added a comment

            How many filesystems are being mounted on this client, and what is the total number of MDTs being mounted? I'm wondering if this relates to LU-12506, which causes the client mount to fail if there are "too many" (in the range 56-64) different MDC devices registered on a client, and then there is some kind of bug during the unmount?

            Or do you have something like automount active, with filesystems mounting/unmounting regularly (maybe with subdirectory mounts?), so that there is a race where the same filesystem is mounted multiple times and unmounted?

            adilger Andreas Dilger added a comment

            Duplicate of LU-11626

            pjones Peter Jones added a comment

            Okay, I'm looking into it. Please follow along under ticket LU-11626.

            simmonsja James A Simmons added a comment

            No, it was never seen before, so no one looked into fixing it.

            simmonsja James A Simmons added a comment

            Ah, yeah, James, I suspect you are correct. That matches up nicely with the unmount here.

            Did Neil ever get anywhere with that one?

            pfarrell Patrick Farrell (Inactive) added a comment

            It's worth noting that I am utterly unwilling to try to figure out how the nested loop walking translates to assembly, given the compiler translations, but...

            Looking at the access:

            0xef8(%rax),%rdx 

            It's at an offset of 0xef8, which is 3832 bytes, so it must be into the OBD struct (the other one is way too small).

            So this is likely u.cli.cl_chg_dev_linkage for a particular OBD struct (that's about the right distance into the OBD struct, based on eyeballing it; if I had the dump and modules I could check). But it's presumably the linkage of a different OBD struct.
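
            The arithmetic can be cross-checked against the oops in the description. The sketch below uses only the instruction bytes marked "<48> 8b 90 f8 0e 00 00" in the Code: line and the RAX/RDX values from the register dump; the helper names are hypothetical, and the canonical-address check assumes 48-bit virtual addresses (4-level paging). It shows that RAX is exactly RDX minus 0xef8, as list_entry()-style pointer arithmetic would produce, and that the address the load then touches is non-canonical, which is consistent with a general protection fault rather than a page fault.

```c
#include <assert.h>
#include <stdint.h>

/* Faulting instruction bytes, copied from the Code: line of the oops. */
static const uint8_t fault_insn[7] = { 0x48, 0x8b, 0x90, 0xf8, 0x0e, 0x00, 0x00 };

/*
 * Bytes 0..2 are REX.W, opcode 0x8b (mov r64, r/m64) and ModRM 0x90
 * (mod=10, reg=rdx, rm=rax); bytes 3..6 are a little-endian disp32.
 */
static uint32_t insn_disp32(const uint8_t *insn)
{
        return (uint32_t)insn[3] | (uint32_t)insn[4] << 8 |
               (uint32_t)insn[5] << 16 | (uint32_t)insn[6] << 24;
}

/* list_entry()/container_of() arithmetic: link address minus member offset. */
static uint64_t entry_from_link(uint64_t link_addr, uint32_t member_offset)
{
        return link_addr - member_offset;
}

/* Canonical on x86-64 with 48-bit addresses: bits 63..47 all equal. */
static int is_canonical48(uint64_t addr)
{
        uint64_t top = addr >> 47;

        return top == 0 || top == 0x1ffff;
}
```

            Here entry_from_link(RDX, 0xef8) reproduces the RAX value from the dump, so the list pointer that was followed held the repeating 0x5a byte pattern. Reading that pattern as poisoned (already freed) memory is an assumption, but it fits the unlocked list walk discussed in this ticket.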

            pfarrell Patrick Farrell (Inactive) added a comment

            This might be https://jira.whamcloud.com/browse/LU-11626

            simmonsja James A Simmons added a comment

            Since you provided the disassembly and the obd_struct, I thought I'd add...

            Looking at the disassembly, the crash occurs before the mutex is taken.  (This is a bit tricky to figure out because of all the jumps, but it's relatively easy to see if you just assume all of the conditional jumps starting at function entry are not taken - which would be valid, as they're conditionals - and then you can see a pretty straightforward path, running through 0xffffffffa07a47dd and then on to calling mutex_lock.)

            That means it happened in:
            chlg_registered_dev_find_by_obd

            (It doesn't appear in the trace because it has been inlined into mdc_changelog_cdev_finish.)

            Which means dumping the obd struct isn't interesting, because the bad pointer wasn't in there.  (You can confirm this by checking the one pointer in the obd struct we access - u.cli.cl_chg_dev_linkage.  It's fine.)

            That means the interesting items are those accessed by chlg_registered_dev_find_by_obd, like:

            chlg_registered_devices
            

            But really, it's the lists in there that are interesting.
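
            As an illustration of what inspecting those lists would be looking for, here is a minimal userspace sketch. It assumes that a stale entry shows up as a link filled with the 0x5a byte seen in RAX/RDX; struct list_node, link_is_poisoned() and scan_list() are hypothetical names, not Lustre or crash-utility APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POISON_BYTE 0x5a   /* assumption: the byte pattern seen in RAX/RDX */

struct list_node {
        struct list_node *next, *prev;
};

/* Compare the raw bits of n->next with the poison pattern, without
 * ever dereferencing the (possibly bogus) pointer value. */
static int link_is_poisoned(const struct list_node *n)
{
        uintptr_t raw, poison;

        memcpy(&raw, &n->next, sizeof(raw));
        memset(&poison, POISON_BYTE, sizeof(poison));
        return raw == poison;
}

/*
 * Walk a circular list starting at head. Returns the number of entries
 * if the walk gets back to head, or -1 if a poisoned link is found or
 * the walk exceeds max_steps.
 */
static int scan_list(struct list_node *head, int max_steps)
{
        struct list_node *pos = head;
        int count = 0;

        for (int i = 0; i < max_steps; i++) {
                if (link_is_poisoned(pos))
                        return -1;
                pos = pos->next;
                if (pos == head)
                        return count;
                count++;
        }
        return -1;
}
```

            The step bound keeps the walk from looping forever on a corrupted list, and checking each link before following it is what distinguishes this from the plain list walk that faulted.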

            pfarrell Patrick Farrell (Inactive) added a comment

            Mahmoud,

            It would be very helpful if you could provide the actual crash dump, modules, and vmlinux as well.

            pfarrell Patrick Farrell (Inactive) added a comment (edited)

            People

              hongchao.zhang Hongchao Zhang
              mhanafi Mahmoud Hanafi