[LU-12010] qsd_upd_thread use after free on server shutdown Created: 25/Feb/19 Updated: 25/Feb/19 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Oleg Drokin | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This crash happens at times on server shutdown for me:

[170816.014563] Lustre: server umount lustre-MDT0000 complete
[170820.769998] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[170820.770106] Lustre: lustre-MDT0001-osp-MDT0002: Connection to lustre-MDT0001 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[170820.774258] Lustre: Skipped 3 previous similar messages
[170820.775154] Lustre: lustre-MDT0001: Not available for connect from 0@lo (stopping)
[170820.775163] Lustre: Skipped 4 previous similar messages
[170820.788095] LustreError: Skipped 2 previous similar messages
[170821.231402] LustreError: 26613:0:(client.c:1175:ptlrpc_import_delay_req()) @@@ IMP_CLOSED req@ffff8802eab93b40 x1626148381397504/t0(0) o41->lustre-MDT0002-osp-MDT0001@0@lo:24/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
[170821.235073] LustreError: 26613:0:(client.c:1175:ptlrpc_import_delay_req()) Skipped 1 previous similar message
[170822.774970] Lustre: 26620:0:(client.c:2134:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550816084/real 1550816084] req@ffff8802fa0f7b40 x1626148381397248/t0(0) o400->MGC192.168.123.169@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1550816091 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[170822.781474] LustreError: 166-1: MGC192.168.123.169@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
[170823.045059] BUG: unable to handle kernel paging request at ffff8802b8f5ddcc
[170823.045059] IP: [<ffffffff813fb685>] do_raw_spin_lock+0x5/0xa0
[170823.045059] PGD 241b067 PUD 33ebfa067 PMD 33ea32067 PTE 80000002b8f5d060
[170823.055931] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[170823.055931] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod loop zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr i2c_piix4 virtio_balloon virtio_console ip_tables rpcsec_gss_krb5 ata_generic drm_kms_helper pata_acpi ttm crct10dif_pclmul drm ata_piix drm_panel_orientation_quirks crct10dif_common serio_raw crc32c_intel i2c_core libata virtio_blk floppy [last unloaded: libcfs]
[170823.055931]
[170823.055931] CPU: 7 PID: 27695 Comm: lquota_wb_lustr Kdump: loaded Tainted: P OE ------------ 3.10.0-7.6-debug #1
[170823.055931] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[170823.055931] task: ffff8802c3b30c40 ti: ffff8802aa504000 task.ti: ffff8802aa504000
[170823.055931] RIP: 0010:[<ffffffff813fb685>] [<ffffffff813fb685>] do_raw_spin_lock+0x5/0xa0
[170823.055931] RSP: 0018:ffff8802aa507d70 EFLAGS: 00010002
[170823.055931] RAX: 0000000000000246 RBX: 0000000000000246 RCX: 0000000000000000
[170823.055931] RDX: ffff8802aa507fd8 RSI: 0000000000000003 RDI: ffff8802b8f5ddc8
[170823.055931] RBP: ffff8802aa507d78 R08: 0000000000000010 R09: ffff8802daa98700
[170823.055931] R10: 0000000000000000 R11: 000000000000000f R12: ffff8802b8f5dc00
[170823.055931] R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
[170823.055931] FS: 0000000000000000(0000) GS:ffff88033dbc0000(0000) knlGS:0000000000000000
[170823.055931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[170823.055931] CR2: ffff8802b8f5ddcc CR3: 0000000001c10000 CR4: 00000000001607e0
[170823.055931] Call Trace:
[170823.055931]  [<ffffffff817b9e30>] _raw_spin_lock_irqsave+0x30/0x40
[170823.055931]  [<ffffffff810c2db3>] __wake_up+0x23/0x50
[170823.055931]  [<ffffffffa0acfbb2>] qsd_upd_thread+0xb02/0xc90 [lquota]
[170823.055931]  [<ffffffff817b6cd0>] ? __schedule+0x410/0xa00
[170823.055931]  [<ffffffff810caae0>] ? wake_up_state+0x20/0x20
[170823.055931]  [<ffffffffa0acf0b0>] ? qsd_upd_add+0xf0/0xf0 [lquota]
[170823.055931]  [<ffffffff810b4ed4>] kthread+0xe4/0xf0
[170823.055931]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140
[170823.055931]  [<ffffffff817c4c5d>] ret_from_fork_nospec_begin+0x7/0x21
[170823.055931]  [<ffffffff810b4df0>] ? kthread_create_on_node+0x140/0x140

The crashing call (0x13be0) is in qsd_upd_thread (/home/green/git/lustre-release/lustre/quota/qsd_writeback.c:512):

507             qsd_start_reint_thread(qsd->qsd_type_array[qtype]);
508     }
509     lu_env_fini(env);
510     OBD_FREE_PTR(env);
511     thread_set_flags(thread, SVC_STOPPED);
512     wake_up(&thread->t_ctl_waitq);

So by the time we got to line 512, the thread structure had already been freed; this is clearly a race between the shutdown path freeing the thread and the final wake_up() in qsd_upd_thread(). |
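For reference, a minimal sketch of the suspected pattern (not the actual Lustre code; the demo_* names, the struct layout and the flag values are made up for illustration): the exiting kernel thread publishes its "stopped" flag and then calls wake_up() on a waitqueue embedded in a structure that the stopping side may free as soon as it observes that flag, so the final wake_up() can land on freed memory, which would match the __wake_up()/do_raw_spin_lock() oops above.

#include <linux/wait.h>
#include <linux/slab.h>

#define SVC_STOPPING 0x01
#define SVC_STOPPED  0x02

/* hypothetical stand-in for the per-thread control block */
struct demo_thread {
        wait_queue_head_t t_ctl_waitq;
        unsigned int      t_flags;
};

static int demo_upd_thread(void *arg)
{
        struct demo_thread *thread = arg;

        /* ... main loop runs until asked to stop ... */
        wait_event(thread->t_ctl_waitq, thread->t_flags & SVC_STOPPING);

        thread->t_flags = SVC_STOPPED;  /* stopper may observe this now ... */
        /*
         * ... and free @thread before the next line runs; wake_up() then
         * dereferences freed memory, a use after free inside __wake_up().
         */
        wake_up(&thread->t_ctl_waitq);
        return 0;
}

static void demo_stop(struct demo_thread *thread)
{
        thread->t_flags = SVC_STOPPING;
        wake_up(&thread->t_ctl_waitq);

        /*
         * Can return as soon as SVC_STOPPED is visible, racing with the
         * final wake_up() in demo_upd_thread().
         */
        wait_event(thread->t_ctl_waitq, thread->t_flags & SVC_STOPPED);
        kfree(thread);                  /* thread structure gone */
}

The usual way out of this pattern is to ensure nothing the exiting thread still touches is freed until the thread has actually returned, e.g. by having the stopping side wait via kthread_stop() (which only returns after the thread function exits) or by signalling completion through an object owned by the stopper rather than by the structure being torn down.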