[LU-14021] NULL pointer dereference in _raw_write_lock
Created: 09/Oct/20
Updated: 10/Aug/22
Resolved: 29/Sep/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.3
Fix Version/s: Lustre 2.15.0

Type: Bug
Priority: Minor
Reporter: Wang Shilong (Inactive)
Assignee: Wang Shilong (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Related: is related to LU-13182 "MAP_POPULATE hangs with Linux 5.4" (Resolved)
Severity: 3

Description
[3532997.598295] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
[3532997.608183] IP: [<ffffffffb096a1cd>] _raw_write_lock+0xd/0x20
[3532997.615694] PGD 1b9cadc067 PUD 2dc3b28067 PMD 0
[3532997.621960] Oops: 0002 [#1] SMP
[3532997.626619] Modules linked in: tcp_diag udp_diag inet_diag mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) krm_oom_notify(OE) vtsspp(OE) sep5(OE) ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter socperf3(OE) xt_conntrack nf_nat nf_conntrack br_netfilter bridge(E) stp(E) llc(E) overlay(T) pax(OE) mptctl(E) mptbase(E) ib_isert(E) iscsi_target_mod(E) target_core_mod(E) ib_ucm(E) iTCO_wdt iTCO_vendor_support skx_edac intel_powerclamp coretemp(E) intel_rapl iosf_mbi kvm_intel kvm rpcrdma irqbypass crc32_pclmul ghash_clmulni_intel sunrpc(E) aesni_intel lrw gf128mul glue_helper ablk_helper cryptd opa_vnic rdma_ucm(E) ib_umad(E)
[3532997.711300]  ib_uverbs(E) ib_ipoib(OE) ib_iser(E) rdma_cm(E) iw_cm(E) libiscsi(E) scsi_transport_iscsi(E) ib_cm(E) mgag200 ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr drm drm_panel_orientation_quirks sg(E) lpc_ich i2c_i801 ipmi_si(E) ipmi_devintf(E) ipmi_msghandler(E) mei_me mei acpi_power_meter ip_tables xfs(E) libcrc32c(E) sd_mod(E) crc_t10dif(E) crct10dif_generic hfi1(OE) crct10dif_pclmul crct10dif_common(E) crc32c_intel rdmavt(OE) igb(E) ib_core(E) ptp(E) ahci(E) pps_core(E) i2c_algo_bit(E) dca(E) libahci(E) libata(E) nfit libnvdimm
[3532997.770513] CPU: 37 PID: 37932 Comm: sim Kdump: loaded Tainted: G           OE  ------------ T 3.10.0-957.el7.x86_64 #1
[3532997.783571] Hardware name: FUJITSU PRIMERGY CX2550 M5/D3853-B1, BIOS V1.0.0.0 R1.10.0 for D3853-B1x            09/11/2019
[3532997.796823] task: ffff9ae6157cb0c0 ti: ffff9ae3fab2c000 task.ti: ffff9ae3fab2c000
[3532997.806197] RIP: 0010:[<ffffffffb096a1cd>]  [<ffffffffb096a1cd>] _raw_write_lock+0xd/0x20
[3532997.816348] RSP: 0018:ffff9ae3fab2fbd8  EFLAGS: 00010206
[3532997.823314] RAX: ffff9ab98bca5760 RBX: ffff9ab98bca5760 RCX: 0000000000000000
[3532997.832284] RDX: ffff9add8a642a00 RSI: ffffffffc12cf520 RDI: 00000000000000cc
[3532997.841241] RBP: ffff9ae3fab2fbd8 R08: 000000000001f8e0 R09: ffffffffb0293b77
[3532997.850190] R10: fffffd8188334900 R11: fffffd81ee3b8800 R12: 00000000000000c8
[3532997.859131] R13: ffff9ab98bca57d8 R14: ffff9ae3fab2fca0 R15: ffff9ada42358988
[3532997.868063] FS:  00002b870336ba40(0000) GS:ffff9ae61da40000(0000) knlGS:0000000000000000
[3532997.878057] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3532997.885427] CR2: 00000000000000cc CR3: 0000002dfa87a000 CR4: 00000000007607e0
[3532997.894340] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3532997.903241] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3532997.912128] PKRU: 55555554
[3532997.916067] Call Trace:
[3532997.919717]  [<ffffffffc127ff72>] ll_cl_remove+0x42/0x70 [lustre]
[3532997.927445]  [<ffffffffc128ce55>] ll_fault+0x115/0x680 [lustre]
[3532997.934958]  [<ffffffffb03e41da>] __do_fault.isra.59+0x8a/0x100
[3532997.942450]  [<ffffffffb03e478c>] do_read_fault.isra.61+0x4c/0x1b0
[3532997.950230]  [<ffffffffb03e9134>] handle_pte_fault+0x2f4/0xd10
[3532997.957611]  [<ffffffffb02ac4ab>] ? recalc_sigpending+0x1b/0x70
[3532997.965077]  [<ffffffffb02ace21>] ? __set_task_blocked+0x41/0xa0
[3532997.972627]  [<ffffffffb03ebc6d>] handle_mm_fault+0x39d/0x9b0
[3532997.979859]  [<ffffffffb096f5e3>] __do_page_fault+0x203/0x500
[3532997.987074]  [<ffffffffb096f915>] do_page_fault+0x35/0x90
[3532997.993883]  [<ffffffffb096b758>] page_fault+0x28/0x30
[3532998.000398] Code: d2 b8 01 00 00 00 75 07 f0 83 47 04 01 30 c0 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 c7 04 <f0> ff 0f 74 05 e8 79 c7 c1 ff 5d c3 0f 1f 80 00 00 00 00 0f 1f
[3532998.023637] RIP  [<ffffffffb096a1cd>] _raw_write_lock+0xd/0x20
[3532998.030988]  RSP <ffff9ae3fab2fbd8>
[3532998.035664] CR2: 00000000000000cc
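
The register dump above can be cross-checked with a little arithmetic. The Python sketch below parses a subset of the registers copied from the trace and checks that the faulting address in CR2 equals RDI and lies exactly 4 bytes past the value in R12. The 4-byte adjustment comes from the bytes `48 83 c7 04` immediately before the faulting instruction in the Code line, which decode as `add $0x4,%rdi`. Treating R12 as the lock address, that is, a member at offset 0xc8 of a NULL structure pointer, is an inference from these values and is not stated by the trace. The names `parse_registers`, `analyze`, and `PAGE_SIZE` are illustrative only.

```python
import re

# Register lines copied from the oops above (timestamps stripped).
OOPS_REGISTERS = """\
RAX: ffff9ab98bca5760 RBX: ffff9ab98bca5760 RCX: 0000000000000000
RDX: ffff9add8a642a00 RSI: ffffffffc12cf520 RDI: 00000000000000cc
RBP: ffff9ae3fab2fbd8 R08: 000000000001f8e0 R09: ffffffffb0293b77
R10: fffffd8188334900 R11: fffffd81ee3b8800 R12: 00000000000000c8
CR2: 00000000000000cc CR3: 0000002dfa87a000 CR4: 00000000007607e0
"""

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual x86_64 default


def parse_registers(text):
    """Return a {name: integer value} map for every 'NAME: hex' pair."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]+)\b", text)
    return {name: int(value, 16) for name, value in pairs}


def analyze(regs):
    """Relate the faulting address to the registers that hold the lock pointer."""
    fault_addr = regs["CR2"]
    # `add $0x4,%rdi` runs just before the faulting `lock decl (%rdi)`,
    # so the pointer passed in was 4 bytes lower than the faulting address.
    lock_addr = fault_addr - 4
    return {
        "fault_addr": fault_addr,
        "lock_addr": lock_addr,
        "matches_rdi": regs["RDI"] == fault_addr,
        "matches_r12": regs["R12"] == lock_addr,
        # An address inside the first page suggests NULL plus a field offset.
        "null_based": lock_addr < PAGE_SIZE,
    }


if __name__ == "__main__":
    result = analyze(parse_registers(OOPS_REGISTERS))
    print(f"fault address: {result['fault_addr']:#x}")
    print(f"lock address:  {result['lock_addr']:#x}")
    print(f"consistent with NULL base pointer: {result['null_based']}")
```

On these values the lock address comes out as 0xc8, which matches R12 and lies well inside the first page. That is consistent with `ll_cl_remove` taking a write lock through a NULL structure pointer, and with the later patch subject about asserting on access to `ll_file_data`, although the trace alone does not identify the structure.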


Comments
Comment by Peter Jones [ 09/Oct/20 ]

Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40205
Subject: LU-14021 llite: add asserts when trying to access ll_file_data
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 59cb4a7ee297372d59e3d30596d3b51a98fc796f

Comment by Gerrit Updater [ 10/Aug/21 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/44558
Subject: LU-14021 llite: don't touch vma after filemap_fault
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: da267627a18ebe33560383e55cfcfb75ca27099f

Comment by Gerrit Updater [ 11/Sep/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44558/
Subject: LU-14021 llite: don't touch vma after filemap_fault
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0f5d3c4b954da2f6b880da243dacec52cb4011a6

Comment by Alexander Boyko [ 28/Sep/21 ]

I guess https://review.whamcloud.com/40205 is no longer needed, now that the LU-13182 fix and https://review.whamcloud.com/44558/ have landed.

Comment by Gerrit Updater [ 10/Aug/22 ]

"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48181
Subject: LU-14021 llite: don't touch vma after filemap_fault
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ade80ddcb667925a5195cb392e516605ac08803c

Comment by Etienne Aujames [ 10/Aug/22 ]

We have run into this issue at CEA on Bull 2.12.7 clients.

Generated at Sat Feb 10 03:06:10 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.