[LU-12566] GPF when umounting client - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
Affects Version/s: Lustre 2.10.6
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

We have seen this at least 4 time.

5426573.363714] Lustre: Mounted nbp16-client
[5428374.627398] general protection fault: 0000 [#1] 
[5428374.627407] Lustre: Unmounted nbp14-client
[5428374.636811] SMP 
[5428374.639106] 
5428374.639307] Modules linked in: vtsspp(OEN) sep5(OEN) socperf3(OEN) pax(OEN) osc(OEN) mgc(OEN) lustre(OEN) lmv(OEN) fld(OEN) mdc(OEN) fid(OEN) lov(OEN) ko2iblnd(OEN) ptlrpc(OEN) obdclass(OEN) lnet(OEN) libcfs(OEN) beegfs(OEN) rdma_ucm(OEX) ib_ucm(OEX) rdma_cm(OEX) iw_cm(OEX) configfs(E) ib_ipoib(OEX) inet_lro(E) ib_cm(OEX) ib_uverbs(OEX) ib_umad(OEX) mlx4_ib(OEX) ib_core(OEX) mlx4_core(OEX) devlink(E) mlx_compat(OEX) iscsi_ibft(E) iscsi_boot_sysfs(E) msr(E) joydev(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) drbg(E) ansi_cprng(E) ipmi_ssif(E) iTCO_wdt(E) iTCO_vendor_support(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) mgag200(E) ablk_helper(E) cryptd(E) ttm(E)
[5428374.711255]  acpi_cpufreq(E) drm_kms_helper(E) pcspkr(E) drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) lpc_ich(E) mei_me(E) i2c_i801(E) mfd_core(E) mei(E) ioatdma(E) shpchp(E) ipmi_si(E) wmi(E) ipmi_devintf(E) ipmi_msghandler(E) processor(E) button(E) tcp_bic(EN) hwperf(OEX) numatools(OEX) xpmem(OEX) gru(OEX) xvma(OEX) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) autofs4(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) bridge(E) stp(E) llc(E) hid_generic(E) usbhid(E) ahci(E) libahci(E) ehci_pci(E) libata(E) ehci_hcd(E) igb(E) i2c_algo_bit(E) dca(E) ptp(E) scsi_mod(E) usbcore(E) pps_core(E) usb_common(E) af_packet(E) crc32c_intel(E) fjes(E) [last unloaded: socperf2_0]
[5428374.776352] Supported: No, Unsupported modules are loaded
[5428374.782187] CPU: 23 PID: 85345 Comm: umount Tainted: G           OE   NX 4.4.162-94.72.1.20181113-nasa #1
[5428374.792175] Hardware name: SGI.COM ICE-XIP113/X9DRT-Dakota, BIOS DA0E2016 02/01/2016
[5428374.800341] task: ffff88026ade1000 ti: ffff88026ade4000 task.ti: ffff88026ade4000
[5428374.808253] RIP: 0010:[<ffffffffa07a47dd>]  [<ffffffffa07a47dd>] mdc_changelog_cdev_finish+0x3d/0x1b1 [mdc]
[5428374.818437] RSP: 0018:ffff88026ade7b68  EFLAGS: 00010286
[5428374.824175] RAX: 5a5a5a5a5a5a4b62 RBX: ffff88040e20e008 RCX: ffff88037b826fb0
[5428374.831741] RDX: 5a5a5a5a5a5a5a5a RSI: ffff88037b826f40 RDI: ffff88040e20e008
[5428374.839306] RBP: 0000000000000000 R08: 0000000000000c3a R09: 0000000000000000
[5428374.846863] R10: 0000000000000000 R11: ffff8807c8d833c6 R12: 0000000000000000
[5428374.854421] R13: ffff88040e20e048 R14: ffff880d1635f000 R15: ffff880cf81e6b60
[5428374.861978] FS:  00007ffff7fd1880(0000) GS:ffff88085fb40000(0000) knlGS:0000000000000000
[5428374.870489] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5428374.876661] CR2: 00007ffff7ff6000 CR3: 0000000371dbe000 CR4: 0000000000160670
[5428374.884227] Stack:
[5428374.886678]  ffff88040e20e008 0000000000000000 ffffffffa07904fa ffff88040e20e008
[5428374.894566]  0000000000000000 0000000000000000 ffffffffa0b8bc9c ffff88026ade7bf8
[5428374.902452]  ffffffffa0a6afb7 ffff880200000010 ffff88026ade7c08 ffff88026ade7bc8
[5428374.910337] Call Trace:
[5428374.913246]  [<ffffffffa07904fa>] mdc_precleanup+0x2a/0x3f0 [mdc]
[5428374.919816]  [<ffffffffa0b8bc9c>] class_cleanup+0x26c/0xc40 [obdclass]
[5428374.926811]  [<ffffffffa0b8e5ba>] class_process_config+0x190a/0x2360 [obdclass]
[5428374.934582]  [<ffffffffa0b8f1ba>] class_manual_cleanup+0x1aa/0x6a0 [obdclass]
[5428374.942177]  [<ffffffffa0f6f341>] ll_put_super+0x111/0x9f0 [lustre]
[5428374.948881]  [<ffffffff81212a1c>] generic_shutdown_super+0x6c/0xf0
[5428374.955497]  [<ffffffff81212aae>] kill_anon_super+0xe/0x20
[5428374.961416]  [<ffffffff8121236f>] deactivate_locked_super+0x3f/0x70
[5428374.968117]  [<ffffffff8122da1b>] cleanup_mnt+0x3b/0x80
[5428374.973775]  [<ffffffff8109f718>] task_work_run+0x78/0x90
[5428374.979609]  [<ffffffff8107d3cf>] exit_to_usermode_loop+0x91/0xc2
[5428374.986136]  [<ffffffff81003ae5>] syscall_return_slowpath+0x85/0xa0
[5428374.992837]  [<ffffffff8161dfec>] int_ret_from_sys_call+0x8/0x6d
[5428375.002321] DWARF2 unwinder stuck at int_ret_from_sys_call+0x8/0x6d
[5428375.009019] 
[5428375.010951] Leftover inexact backtrace:
                 
[5428375.017130] Code: 3d 90 21 7b a0 48 8d b0 78 ff ff ff 0f 84 d0 00 00 00 48 8b 56 70 48 8d 4e 70 48 39 d1 48 8d 82 08 f1 ff ff 75 1c e9 9d 00 00 00 <48> 8b 90 f8 0e 00 00 48 39 d1 48 8d 82 08 f1 ff ff 0f 84 86 00 
[5428375.037514] RIP  [<ffffffffa07a47dd>] mdc_changelog_cdev_finish+0x3d/0x1b1 [mdc]
[5428375.045359]  RSP <ffff88026ade7b68>

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mdc_changelog_cdev_finish.dis
7 kB
18/Jul/19 8:45 PM
obd_struct.out
24 kB
18/Jul/19 8:45 PM

Issue Links

duplicates

LU-11626 mdc: obd might go away while referenced by code in mdc_changelog

Resolved

Activity

[LU-12566] GPF when umounting client

Andreas Dilger added a comment - 01/Aug/19 8:40 PM

Reopen this ticket, since I think there is a simpler fix that could be used for the short term.

Andreas Dilger added a comment - 01/Aug/19 8:40 PM Reopen this ticket, since I think there is a simpler fix that could be used for the short term.

Andreas Dilger added a comment - 01/Aug/19 8:40 PM

It looks like mdc_changelog_cdev_finish() is walking the chlg_registered_dev and cl_chg_dev_linkage lists without a lock, which is racy with other threads adding/removing entries from the list:

void mdc_changelog_cdev_finish(struct obd_device *obd)
{
        struct chlg_registered_dev *dev = chlg_registered_dev_find_by_obd(obd);
        ENTRY;

        mutex_lock(&chlg_registered_dev_lock);
        list_del_init(&obd->u.cli.cl_chg_dev_linkage);
        kref_put(&dev->ced_refs, chlg_dev_clear);
        mutex_unlock(&chlg_registered_dev_lock);
        EXIT;
}

A simple fix for this particular issue might be to move the fetching of dev under chlg_registered_dev_lock. It would also be a good idea to add LASSERT(mutex_is_locked(&chlg_registered_dev_lock)) to the start of chlg_registered_dev_find_by_obd() and chlg_registered_dev_find_by_obd() to avoid such bugs in the future.

This doesn't fix the larger bug in ~~LU-11626~~ but may at least avoid the particular issue being seen here, and be simple enough to backport to older releases.

Andreas Dilger added a comment - 01/Aug/19 8:40 PM It looks like mdc_changelog_cdev_finish() is walking the chlg_registered_dev and cl_chg_dev_linkage lists without a lock, which is racy with other threads adding/removing entries from the list: void mdc_changelog_cdev_finish(struct obd_device *obd) { struct chlg_registered_dev *dev = chlg_registered_dev_find_by_obd(obd); ENTRY; mutex_lock(&chlg_registered_dev_lock); list_del_init(&obd->u.cli.cl_chg_dev_linkage); kref_put(&dev->ced_refs, chlg_dev_clear); mutex_unlock(&chlg_registered_dev_lock); EXIT; } A simple fix for this particular issue might be to move the fetching of dev under chlg_registered_dev_lock . It would also be a good idea to add LASSERT(mutex_is_locked(&chlg_registered_dev_lock)) to the start of chlg_registered_dev_find_by_obd() and chlg_registered_dev_find_by_obd() to avoid such bugs in the future. This doesn't fix the larger bug in LU-11626 but may at least avoid the particular issue being seen here, and be simple enough to backport to older releases.

Andreas Dilger added a comment - 01/Aug/19 8:30 PM

How many filesystems are being mounted on this client, and what is the total number of MDTs being mounted? I'm wondering if this relates to ~~LU-12506~~, which causes the client mount to fail if there are "too many" (in the range 56-64) different MDC devices registered on a client, and then there is some kind of bug during the unmount?

Or do you have something like automount active and there are filesystems mounting/unmounting regularly (maybe with subdirectory mounts?) and there is a race where the same filesystem is mounted multiple times and unmounted?

Andreas Dilger added a comment - 01/Aug/19 8:30 PM How many filesystems are being mounted on this client, and what is the total number of MDTs being mounted? I'm wondering if this relates to LU-12506 , which causes the client mount to fail if there are "too many" (in the range 56-64) different MDC devices registered on a client, and then there is some kind of bug during the unmount? Or do you have something like automount active and there are filesystems mounting/unmounting regularly (maybe with subdirectory mounts?) and there is a race where the same filesystem is mounted multiple times and unmounted?

Peter Jones added a comment - 19/Jul/19 5:16 PM

Duplicate of ~~LU-11626~~

Peter Jones added a comment - 19/Jul/19 5:16 PM Duplicate of LU-11626

James A Simmons added a comment - 19/Jul/19 4:59 PM

Okay I'm looking into it. Please follow under ticket ~~LU-11626~~

James A Simmons added a comment - 19/Jul/19 4:59 PM Okay I'm looking into it. Please follow under ticket LU-11626

James A Simmons added a comment - 18/Jul/19 9:29 PM

No, never was seen before so no one looked to fixed it

James A Simmons added a comment - 18/Jul/19 9:29 PM No, never was seen before so no one looked to fixed it

Patrick Farrell (Inactive) added a comment - 18/Jul/19 9:19 PM

Ah, yeah, James, I suspect you are correct - That matches up nicely with the unmount here.

Did Neil ever get anywhere with that one?

Patrick Farrell (Inactive) added a comment - 18/Jul/19 9:19 PM Ah, yeah, James, I suspect you are correct - That matches up nicely with the unmount here. Did Neil ever get anywhere with that one?

Patrick Farrell (Inactive) added a comment - 18/Jul/19 9:19 PM

It's worth noting that I am utterly unwilling to try to figure out how the nested loop walking translates to assembly, given the compiler translations, but...

Looking at the access:

0xef8(%rax),%rdx

It's at an offset of 0xef8, which is 3832, which must be in to the OBD struct (the other one is way too small).

So this is likely u.cli.cl_chg_dev_linkage for a particular OBD struct (That's about the right distance in to the OBD struct, based on eyeballing it. If I had the dump & modules I could check). But it's presumably the linkage of a different OBD struct.

Patrick Farrell (Inactive) added a comment - 18/Jul/19 9:19 PM It's worth noting that I am utterly unwilling to try to figure out how the nested loop walking translates to assembly, given the compiler translations, but... Looking at the access: 0xef8(%rax),%rdx It's at an offset of 0xef8, which is 3832, which must be in to the OBD struct (the other one is way too small). So this is likely u.cli.cl_chg_dev_linkage for a particular OBD struct (That's about the right distance in to the OBD struct, based on eyeballing it. If I had the dump & modules I could check). But it's presumably the linkage of a different OBD struct.

James A Simmons added a comment - 18/Jul/19 9:14 PM

This might be https://jira.whamcloud.com/browse/LU-11626

James A Simmons added a comment - 18/Jul/19 9:14 PM This might be https://jira.whamcloud.com/browse/LU-11626

Patrick Farrell (Inactive) added a comment - 18/Jul/19 9:10 PM

Since you provided the disassembly and the obd_struct, I thought I'd add...

Looking at the disassembly, the crash occurs before the mutex is taken. (This is a bit tricky to figure out because of all the jumps, but it's relatively easy to see if you just assume all of the conditional jumps starting at function entry are not taken - which would be valid, as they're conditionals - and then you can see a pretty straightforward path, running through 0xffffffffa07a47dd and then on to calling mutex_lock.)

That means it happened in:
chlg_registered_dev_find_by_obd

(Which doesn't appear because it's been inlined in to mdc_changelog_cdev_finish)

Which means dumping the obd struct isn't interesting, because the null pointer wasn't in there. (You can confirm this by checking the one pointer in the obd struct we access - u.cli.cl_chg_dev_linkage. It's fine.)

That means the interesting items are those accessed by chlg_registered_dev_find_by_obd, like:

chlg_registered_devices

But really, it's the lists in there that are interesting.

Patrick Farrell (Inactive) added a comment - 18/Jul/19 9:10 PM Since you provided the disassembly and the obd_struct, I thought I'd add... Looking at the disassembly, the crash occurs before the mutex is taken. (This is a bit tricky to figure out because of all the jumps, but it's relatively easy to see if you just assume all of the conditional jumps starting at function entry are not taken - which would be valid, as they're conditionals - and then you can see a pretty straightforward path, running through 0xffffffffa07a47dd and then on to calling mutex_lock.) That means it happened in: chlg_registered_dev_find_by_obd (Which doesn't appear because it's been inlined in to mdc_changelog_cdev_finish) Which means dumping the obd struct isn't interesting, because the null pointer wasn't in there. (You can confirm this by checking the one pointer in the obd struct we access - u.cli.cl_chg_dev_linkage. It's fine.) That means the interesting items are those accessed by chlg_registered_dev_find_by_obd, like: chlg_registered_devices But really, it's the lists in there that are interesting.

Patrick Farrell (Inactive) added a comment - 18/Jul/19 8:59 PM - edited

Mahmoud,

It would be very helpful if you provide the actual crash dump, modules, and vmlinux as well.

Patrick Farrell (Inactive) added a comment - 18/Jul/19 8:59 PM - edited Mahmoud, It would be very helpful if you provide the actual crash dump, modules, and vmlinux as well.

People

Assignee:: Hongchao Zhang

Reporter:: Mahmoud Hanafi

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 18/Jul/19 8:46 PM

Updated:: 12/Sep/19 4:08 AM

Resolved:: 15/Aug/19 1:29 PM