[LU-11275] NULL pointer dereference in vvp_page_delete in sanity test 241a Created: 22/Aug/18  Updated: 04/Sep/18  Resolved: 04/Sep/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Critical
Reporter: Oleg Drokin Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Seeing this on master-next since Aug 9th or so.

[22965.482820] Lustre: DEBUG MARKER: == sanity test 241a: bio vs dio ====================================================================== 21:23:23 (1534901003)
[22975.702174] Lustre: 30392:0:(client.c:2126:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1534901006/real 1534901006]  req@ffff880051062c80 x1609459777477808/t0(0) o101->lustre-MDT0000-mdc-ffff8802e756c800@0@lo:12/10 lens 976/44648 e 0 to 1 dl 1534901013 ref 2 fl Rpc:XP/2/ffffffff rc 0/-1
[22975.714859] Lustre: lustre-MDT0000-mdc-ffff8802e756c800: Connection to lustre-MDT0000 (at 0@lo) was lost; in progress operations using this service will wait for recovery to complete
[22975.718534] Lustre: lustre-MDT0000: Client 883b4996-f586-bc4a-a668-d643202d495f (at 0@lo) reconnecting
[23034.421939] BUG: unable to handle kernel NULL pointer dereference at           (null)
[23034.422839] IP: [<ffffffffa1520714>] vvp_page_delete+0x14/0x140 [lustre]
[23034.422839] PGD 80000002aff7c067 PUD 284474067 PMD 0 
[23034.422839] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[23034.422839] Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE) ost(OE) mdt(OE) mdd(OE) mgs(OE) osd_ldiskfs(OE) ldiskfs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) dm_flakey dm_mod brd ext4 loop zfs(PO) zunicode(PO) zlua(PO) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) jbd2 mbcache crc_t10dif crct10dif_generic crct10dif_common ata_generic pata_acpi ttm drm_kms_helper ata_piix drm virtio_balloon libata i2c_piix4 serio_raw pcspkr virtio_console virtio_blk i2c_core floppy ip_tables rpcsec_gss_krb5 [last unloaded: libcfs]
[23034.422839] CPU: 9 PID: 31804 Comm: dd Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-7.5-debug #1
[23034.422839] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[23034.422839] task: ffff880270406240 ti: ffff8802adbd4000 task.ti: ffff8802adbd4000
[23034.422839] RIP: 0010:[<ffffffffa1520714>]  [<ffffffffa1520714>] vvp_page_delete+0x14/0x140 [lustre]
[23034.422839] RSP: 0018:ffff8802adbd78e0  EFLAGS: 00010286
[23034.422839] RAX: ffffea00084c67b8 RBX: ffff8802c6ad3e50 RCX: ffff8802c6ad3e00
[23034.422839] RDX: 0000000000000000 RSI: ffff8802c6ad3e50 RDI: ffff880082fe51f0
[23034.422839] RBP: ffff8802adbd78e0 R08: ffffffffa081dfd8 R09: 0000000000000000
[23034.422839] R10: 0000000000000000 R11: ffff8801e7418e00 R12: ffff8802c6ad3e28
[23034.422839] R13: ffff880082fe51f0 R14: ffff880082fe51f0 R15: 0000000000000000
[23034.422839] FS:  00007f6bda93a740(0000) GS:ffff88033dc40000(0000) knlGS:0000000000000000
[23034.422839] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[23034.422839] CR2: 0000000000000000 CR3: 00000002c3358000 CR4: 00000000000006e0
[23034.422839] Call Trace:
[23034.422839]  [<ffffffffa039a01d>] cl_page_delete0+0x7d/0x210 [obdclass]
[23034.422839]  [<ffffffffa039bf6e>] cl_page_alloc+0x15e/0x270 [obdclass]
[23034.422839]  [<ffffffffa039c107>] cl_page_find+0x87/0x290 [obdclass]
[23034.422839]  [<ffffffffa14d911d>] ll_dom_finish_open+0x59d/0x830 [lustre]
[23034.422839]  [<ffffffffa14f7b13>] ? ll_prep_inode+0x223/0xb80 [lustre]
[23034.422839]  [<ffffffffa05c62e0>] ? lustre_msg_buf_v2+0x1b0/0x1b0 [ptlrpc]
[23034.422839]  [<ffffffffa150541a>] ll_lookup_it_finish+0x51a/0xe70 [lustre]
[23034.422839]  [<ffffffffa0536c75>] ? lmv_intent_lock+0xd05/0x1970 [lmv]
[23034.422839]  [<ffffffff81117ca2>] ? from_kgid+0x12/0x20
[23034.422839]  [<ffffffffa1504b64>] ? ll_i2gids+0x24/0xb0 [lustre]
[23034.422839]  [<ffffffffa1504850>] ? ll_md_need_convert+0x1b0/0x1b0 [lustre]
[23034.422839]  [<ffffffffa150604f>] ll_lookup_it+0x2df/0xe00 [lustre]
[23034.422839]  [<ffffffffa1506ca7>] ll_atomic_open+0x137/0x11f0 [lustre]
[23034.422839]  [<ffffffff813ccd2b>] ? do_raw_spin_unlock+0x4b/0x90
[23034.422839]  [<ffffffff8177943e>] ? _raw_spin_unlock+0xe/0x20
[23034.422839]  [<ffffffff8121953b>] ? lookup_dcache+0x8b/0xb0
[23034.422839]  [<ffffffff8121e651>] do_last+0xa31/0x12c0
[23034.422839]  [<ffffffff81210100>] ? proc_nr_files+0x30/0x30
[23034.422839]  [<ffffffff8121efad>] path_openat+0xcd/0x6a0
[23034.422839]  [<ffffffff8106dc55>] ? __kernel_map_pages+0xc5/0xd0
[23034.422839]  [<ffffffff812209ad>] do_filp_open+0x4d/0xb0
[23034.422839]  [<ffffffff813ccd2b>] ? do_raw_spin_unlock+0x4b/0x90
[23034.422839]  [<ffffffff8177943e>] ? _raw_spin_unlock+0xe/0x20
[23034.422839]  [<ffffffff8122e303>] ? __alloc_fd+0xc3/0x170
[23034.422839]  [<ffffffff8120c917>] do_sys_open+0x137/0x240
[23034.422839]  [<ffffffff8178386f>] ? system_call_after_swapgs+0xbc/0x160
[23034.422839]  [<ffffffff8120ca3e>] SyS_open+0x1e/0x20
[23034.422839]  [<ffffffff81783929>] system_call_fastpath+0x16/0x1b
[23034.422839]  [<ffffffff8178387b>] ? system_call_after_swapgs+0xc8/0x160

Likely due to the recent landing of read during open?
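
To make the open-path involvement easier to see, the reliable frames of the trace above can be pulled out with a short script. This is only a sketch for reading the oops, not part of the fix: the names `TRACE`, `FRAME_RE` and `reliable_frames` are made up for illustration, and only a few lines of the trace are embedded. It assumes the usual kernel convention that frames marked `?` are addresses found on the stack which the unwinder could not confirm as part of the call chain.

```python
import re

# A few lines copied from the trace above; paste the full oops to analyze all of it.
TRACE = """\
[23034.422839]  [<ffffffffa039a01d>] cl_page_delete0+0x7d/0x210 [obdclass]
[23034.422839]  [<ffffffffa039bf6e>] cl_page_alloc+0x15e/0x270 [obdclass]
[23034.422839]  [<ffffffffa039c107>] cl_page_find+0x87/0x290 [obdclass]
[23034.422839]  [<ffffffffa14d911d>] ll_dom_finish_open+0x59d/0x830 [lustre]
[23034.422839]  [<ffffffffa14f7b13>] ? ll_prep_inode+0x223/0xb80 [lustre]
[23034.422839]  [<ffffffffa150541a>] ll_lookup_it_finish+0x51a/0xe70 [lustre]
"""

# Matches "[<addr>] [? ]symbol+0xoff/0xsize [module]"; the module part is optional
# because core kernel symbols are printed without one.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def reliable_frames(text):
    """Return (symbol, module) pairs for frames not marked with '?'."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("guess"):
            frames.append((m.group("sym"), m.group("mod") or "kernel"))
    return frames


if __name__ == "__main__":
    for sym, mod in reliable_frames(TRACE):
        print(f"{sym} [{mod}]")
```

Read innermost frame first, the filtered chain is cl_page_delete0, cl_page_alloc, cl_page_find, ll_dom_finish_open, ll_lookup_it_finish. The page is therefore being deleted from inside the DoM open path, which fits the guess above.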



 Comments   
Comment by Mikhail Pershin [ 28/Aug/18 ]

Was that test run as part of sanity-dom.sh, or in a normal sanity.sh run?

Comment by Gerrit Updater [ 28/Aug/18 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33087
Subject: LU-11275 llite: check truncate race for DOM pages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 80a9f3be319a82992f0f792e2708a616746b23c0

Comment by Oleg Drokin [ 28/Aug/18 ]

Yes, it was part of sanity-dom. I'll give your patch a try. Thanks.

Comment by Gerrit Updater [ 04/Sep/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33087/
Subject: LU-11275 llite: check truncate race for DOM pages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0f7d7b200b582ca4bd8e049f6634ac55f6a481b0

Comment by Peter Jones [ 04/Sep/18 ]

Landed for 2.12
