[LU-15843] Crash when umount mdt targets lnet with llstat running. Created: 11/May/22 Updated: 13/May/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Etienne Aujames | Assignee: | Etienne Aujames |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
This issue was discovered while analyzing the root cause of LU-15759.

Reproducer:
# llstat -i 5 mds.MDS.mdt.stats > /dev/null &
# umount -at lustre
(Crash ....)

Crash:
[151292.415792] Lustre: server umount lustrefs-MDT0001 complete
[151292.491875] LustreError: 11-0: lustrefs-MDT0001-osp-MDT0000: operation mds_disconnect to node 0@lo failed: rc = -107
[151295.103507] general protection fault: 0000 [#1] SMP
[151295.104283] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) osc(OE) fid(OE) fld(OE) ksocklnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) dm_flakey mbcache jbd2 rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc dm_snapshot dm_bufio iosf_mbi ppdev crc32_pclmul snd_intel8x0 snd_ac97_codec ac97_bus snd_seq ghash_clmulni_intel snd_seq_device snd_pcm aesni_intel lrw gf128mul glue_helper ablk_helper cryptd snd_timer sg snd pcspkr i2c_piix4 soundcore parport_pc parport video ip_tables xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common
[151295.110133] crc32c_intel ahci serio_raw libahci ata_piix drm e1000 libata drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
[151295.111650] CPU: 1 PID: 6578 Comm: llstat Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1
[151295.113155] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[151295.113907] task: ffff931a5a4c9080 ti: ffff9319bc80c000 task.ti: ffff9319bc80c000
[151295.114663] RIP: 0010:[<ffffffffc0a129a1>] [<ffffffffc0a129a1>] lprocfs_stats_collect+0xc1/0x140 [obdclass]
[151295.116235] RSP: 0018:ffff9319bc80fdd8 EFLAGS: 00010202
[151295.117017] RAX: 0000000000004669 RBX: ffff9319bc80fe10 RCX: 0000000000000006
[151295.117778] RDX: dead000000000100 RSI: dead000000000100 RDI: 0000000000000006
[151295.118573] RBP: ffff9319bc80fe00 R08: 0000000000000000 R09: 0000000000000000
[151295.119742] R10: 0000000000000000 R11: ffff9319bc80fc56 R12: ffff931a46694800
[151295.120652] R13: 0000000000000000 R14: ffff931a57bc7000 R15: ffff93199d70d0c0
[151295.121387] FS: 00007fe897599740(0000) GS:ffff931a9fc80000(0000) knlGS:0000000000000000
[151295.122150] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[151295.122888] CR2: 0000000001a11b24 CR3: 0000000069326000 CR4: 00000000000606e0
[151295.123637] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[151295.124336] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[151295.125016] Call Trace:
[151295.125709] [<ffffffffc0a13308>] lprocfs_stats_seq_show+0x48/0x140 [obdclass]
[151295.126379] [<ffffffff82476d10>] seq_read+0x130/0x450
[151295.127033] [<ffffffff8244e3ff>] vfs_read+0x9f/0x170
[151295.127647] [<ffffffff8244f27f>] SyS_read+0x7f/0xf0
[151295.128248] [<ffffffff829aaed5>] ? system_call_after_swapgs+0xa2/0x13a
[151295.128839] [<ffffffff829aaf92>] system_call_fastpath+0x25/0x2a
[151295.129439] [<ffffffff829aaed5>] ? system_call_after_swapgs+0xa2/0x13a
[151295.130033] Code: c1 e0 03 0f 1f 80 00 00 00 00 48 63 d1 49 83 7c d4 20 00 74 48 4c 89 c2 49 03 54 fc 20 41 f6 44 24 04 02 4a 8d 34 0a 48 0f 45 d6 <48> 8b 32 48 01 33 48 8b 72 20 48 01 73 20 48 8b 72 08 48 3b 73
[151295.131885] RIP [<ffffffffc0a129a1>] lprocfs_stats_collect+0xc1/0x140 [obdclass]
[151295.132514] RSP <ffff9319bc80fdd8> |
| Comments |
| Comment by Andreas Dilger [ 11/May/22 ] |
|
Does the patch from LU-15759 fix this problem also? |
| Comment by Etienne Aujames [ 13/May/22 ] |
|
This is not the same issue as LU-15759. The problem here is that the "stats"/obd structure is freed at umount time. If a user still has an open handle on the debugfs inode after umount, the debugfs fops can access already freed memory. The issue occurs only with debugfs files; procfs stats do not cause a crash. |
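
To make the lifetime problem concrete, here is a minimal user-space sketch of one general way to avoid this kind of use-after-free: the open handle takes its own reference on the stats object, so "umount" only drops the target's reference and the memory stays valid until the last handle is closed. This is an illustration under assumptions, not Lustre code and not necessarily the fix chosen for this ticket. All `demo_*` names are hypothetical, and the plain integer refcount is not thread-safe (a kernel implementation would need atomic reference counting).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-target stats object (not a Lustre type). */
struct demo_stats {
	int      refcount;  /* one ref for the mounted target, one per open handle */
	uint64_t counter;   /* the value a reader wants to collect */
};

/* Hypothetical stand-in for an open debugfs file handle. */
struct demo_handle {
	struct demo_stats *stats;
};

/* Number of stats objects currently allocated, so frees can be observed. */
static int demo_live_objects;

/* Allocate the stats object with the target's initial reference. */
static struct demo_stats *demo_stats_alloc(uint64_t value)
{
	struct demo_stats *s = calloc(1, sizeof(*s));

	if (s == NULL)
		return NULL;
	s->refcount = 1;
	s->counter = value;
	demo_live_objects++;
	return s;
}

/* Drop one reference; free the object only when the last one is gone. */
static void demo_stats_put(struct demo_stats *s)
{
	assert(s->refcount > 0);
	if (--s->refcount == 0) {
		demo_live_objects--;
		free(s);
	}
}

/* "open": the handle pins the stats object with its own reference. */
static struct demo_handle demo_open(struct demo_stats *s)
{
	struct demo_handle h = { .stats = s };

	s->refcount++;
	return h;
}

/* "read": safe for as long as the handle still holds its reference. */
static uint64_t demo_read(const struct demo_handle *h)
{
	return h->stats->counter;
}

/* "close": release the handle's reference. */
static void demo_close(struct demo_handle *h)
{
	demo_stats_put(h->stats);
	h->stats = NULL;
}

/* "umount": the target drops only its own reference. */
static void demo_umount(struct demo_stats *s)
{
	demo_stats_put(s);
}
```

Calling `demo_open()`, then `demo_umount()`, then `demo_read()` mirrors the reproducer's ordering (llstat keeps the file open across the umount). In this sketch the read still sees valid memory, and the object is freed only by the final `demo_close()`.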