Lustre / LU-18024

sanity-lsnapshot test_1b: NULL pointer dereference in queue_work via qmt_lvbo_free

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.16.0
    • Fix Version/s: Lustre 2.16.0
    • Severity: 3

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c1d8852e-126c-4a30-af92-a8fa44082ee9

      test_1b failed with the following error:

      onyx-103vm4 crashed during sanity-lsnapshot test_1b
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4541 - 5.14.0-362.24.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-master/4541 - 5.14.0-362.24.1_lustre.el9.x86_64

      For about a month this has been a regular crash in sanity-lsnapshot test_1b. The traces differ somewhat, but they always end up in qmt_lvbo_free and then a NULL pointer dereference in __queue_work:

      [13893.061855] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-lsnapshot test 1b: mount snapshot without original filesystem mounted ========================================================== 08:20:07 \(1718785207\)
      [13893.273668] Lustre: DEBUG MARKER: == sanity-lsnapshot test 1b: mount snapshot without original filesystem mounted ========================================================== 08:20:07 (1718785207)
      [13893.439579] Lustre: DEBUG MARKER: /usr/sbin/lctl snapshot_create -F lustre -n lss_1b_0
      [13895.994830] Lustre: DEBUG MARKER: /usr/sbin/lctl snapshot_list -F lustre -n lss_1b_0 -d
      [13900.112891] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
      [13900.441082] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
      [13900.721496] BUG: kernel NULL pointer dereference, address: 0000000000000102
      [13900.722489] #PF: supervisor read access in kernel mode
      [13900.723149] #PF: error_code(0x0000) - not-present page
      [13900.723783] PGD 0 P4D 0 
      [13900.724150] Oops: 0000 [#1] PREEMPT SMP PTI
      [13900.724697] CPU: 0 PID: 225194 Comm: umount Kdump: loaded Tainted: P           OE     -------  ---  5.14.0-362.24.1_lustre.el9.x86_64 #1
      [13900.726105] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [13900.726808] RIP: 0010:__queue_work+0x20/0x370
      [13900.727396] Code: 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 41 89 fc 55 48 89 f5 53 48 83 ec 10 89 7c 24 04 <f6> 86 02 01 00 00 01 0f 85 ac 02 00 00 e8 fe c7 07 00 49 c7 c5 ac
      [13900.729479] RSP: 0018:ffffa5290a5e3938 EFLAGS: 00010082
      [13900.730140] RAX: ffffffffc1cd86b0 RBX: 0000000000000202 RCX: 0000000000000000
      [13900.730998] RDX: ffff990ab188e340 RSI: 0000000000000000 RDI: 0000000000002000
      [13900.731848] RBP: 0000000000000000 R08: ffff990aa7fda8b8 R09: ffffa5290a5e3940
      [13900.732701] R10: 0000000000000101 R11: 000000000000000f R12: 0000000000002000
      [13900.733574] R13: ffff990aa7fda82c R14: ffff990ab188e340 R15: 0000000000000000
      [13900.734429] FS:  00007f1bff822540(0000) GS:ffff990b3fc00000(0000) knlGS:0000000000000000
      [13900.735387] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [13900.736097] CR2: 0000000000000102 CR3: 00000000045f6003 CR4: 00000000001706f0
      [13900.736960] Call Trace:
      [13900.737318]  <TASK>
      [13900.737636]  ? show_trace_log_lvl+0x1c4/0x2df
      [13900.738212]  ? show_trace_log_lvl+0x1c4/0x2df
      [13900.738774]  ? queue_work_on+0x24/0x30
      [13900.739268]  ? __die_body.cold+0x8/0xd
      [13900.739765]  ? page_fault_oops+0x134/0x170
      [13900.740329]  ? kernelmode_fixup_or_oops+0x84/0x110
      [13900.740936]  ? exc_page_fault+0x62/0x150
      [13900.741474]  ? asm_exc_page_fault+0x22/0x30
      [13900.742034]  ? __pfx_qmt_lvbo_free+0x10/0x10 [lquota]
      [13900.742772]  ? __queue_work+0x20/0x370
      [13900.743272]  ? __wake_up_common_lock+0x91/0xd0
      [13900.743851]  queue_work_on+0x24/0x30
      [13900.744325]  qmt_lvbo_free+0xaf/0x160 [lquota]
      [13900.744929]  ldlm_resource_putref+0x18a/0x290 [ptlrpc]
      [13900.745721]  cfs_hash_for_each_relax+0x1ab/0x480 [libcfs]
      [13900.746468]  ? __pfx_ldlm_resource_clean+0x10/0x10 [ptlrpc]
      [13900.747268]  ? __pfx_ldlm_resource_clean+0x10/0x10 [ptlrpc]
      [13900.748069]  cfs_hash_for_each_nolock+0x12e/0x210 [libcfs]
      [13900.748755]  ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
      [13900.749514]  __ldlm_namespace_free+0x58/0x4f0 [ptlrpc]
      [13900.750288]  ldlm_namespace_free_prior+0x5a/0x1f0 [ptlrpc]
      [13900.751093]  mdt_fini+0xd6/0x570 [mdt]
      [13900.751631]  mdt_device_fini+0x2b/0xc0 [mdt]
      [13900.752224]  obd_precleanup+0x1e4/0x220 [obdclass]
      [13900.753213]  class_cleanup+0x2d5/0x600 [obdclass]
      [13900.753885]  class_process_config+0x10c0/0x1bc0 [obdclass]
      [13900.754627]  ? __kmalloc+0x19b/0x370
      [13900.755138]  class_manual_cleanup+0x439/0x7a0 [obdclass]
      [13900.755871]  server_put_super+0x7ee/0xa40 [ptlrpc]
      [13900.756604]  generic_shutdown_super+0x74/0x120
      [13900.757193]  kill_anon_super+0x14/0x30
      [13900.757681]  deactivate_locked_super+0x31/0xa0
      [13900.758272]  cleanup_mnt+0x100/0x160
      [13900.758775]  task_work_run+0x5c/0x90
      [13900.759257]  exit_to_user_mode_loop+0x122/0x130
      [13900.759854]  exit_to_user_mode_prepare+0xb6/0x100
      [13900.760450]  syscall_exit_to_user_mode+0x12/0x40
      [13900.761045]  do_syscall_64+0x69/0x90
      [13900.761515]  ? syscall_exit_to_user_mode+0x22/0x40
      [13900.762130]  ? do_syscall_64+0x69/0x90
      [13900.762619]  ? exc_page_fault+0x62/0x150
      [13900.763134]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lsnapshot test_1b - onyx-103vm4 crashed during sanity-lsnapshot test_1b

          Activity

            gerrit Gerrit Updater added a comment -

            "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/56661
            Subject: LU-18024 quota: relate qmt_lvbo_free_wq and QMT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4aa91be222f2c2fe9fc770da40ffcc8e82e4d09d

            hongchao.zhang Hongchao Zhang added a comment - - edited

            As per the stack trace, the issue should be caused by qmt_lvbo_free_wq being NULL:

            int qmt_lvbo_free(struct lu_device *ld, struct ldlm_resource *res)
            {
                    ...
                    if (res->lr_name.name[LUSTRE_RES_ID_QUOTA_SEQ_OFF] != 0) {
                            struct lquota_entry *lqe = res->lr_lvb_data;
            
                            queue_work(qmt_lvbo_free_wq, &lqe->lqe_work);
                    } else {
                    ...
            }
            
            static void __queue_work(int cpu, struct workqueue_struct *wq,
                                     struct work_struct *work)
            {
                    struct pool_workqueue *pwq;
                    struct worker_pool *last_pool;
                    struct list_head *worklist;
                    unsigned int work_flags;
                    unsigned int req_cpu = cpu;
            
                    /*
                     * While a work item is PENDING && off queue, a task trying to
                     * steal the PENDING will busy-loop waiting for it to either get
                     * queued or lose PENDING.  Grabbing PENDING and queueing should
                     * happen with IRQ disabled.
                     */
                    lockdep_assert_irqs_disabled();
            
            
                    /* if draining, only works from the same workqueue are allowed */
                    if (unlikely(wq->flags & __WQ_DRAINING) &&   <-------------------- wq->flags is at offset 0x100; with wq == NULL this test reads address 0x102, matching the reported fault address
                        WARN_ON_ONCE(!is_chained_work(wq)))
                            return;
                    rcu_read_lock();
                    ...
            }
            

            the "qmt_lvbo_free_wq" is defined as a static variable of the lquota module; it is presumably released while the snapshot
            mount is being unmounted, which causes this crash.
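
            Below is a minimal, hypothetical sketch (not the actual change in patch 56661) of the general direction the patch subject "relate qmt_lvbo_free_wq and QMT" suggests: have the device that queues the work own the workqueue and destroy it only in its own fini path, after all possible callers are done, instead of keeping it in a module-scoped static. All names in the sketch (qmt_dev_example, qde_free_wq, and the *_init/_fini helpers) are illustrative and do not exist in the Lustre tree.

            /*
             * Hypothetical sketch only -- not the LU-18024 patch itself.
             * Idea: the device that queues deferred lqe freeing owns the
             * workqueue and destroys it in its own fini path, after every
             * caller that might still queue work (e.g. namespace/lock
             * cleanup) has finished, so queue_work() never runs against a
             * destroyed workqueue.  All names below are made up.
             */
            #include <linux/errno.h>
            #include <linux/workqueue.h>

            struct qmt_dev_example {
                    struct workqueue_struct *qde_free_wq;   /* owned by this device */
            };

            static int qmt_dev_example_init(struct qmt_dev_example *dev)
            {
                    dev->qde_free_wq = alloc_workqueue("qde_free_wq", 0, 0);
                    if (!dev->qde_free_wq)
                            return -ENOMEM;
                    return 0;
            }

            /* All deferred freeing goes through the per-device workqueue. */
            static void qmt_dev_example_queue_free(struct qmt_dev_example *dev,
                                                   struct work_struct *work)
            {
                    queue_work(dev->qde_free_wq, work);
            }

            /*
             * Must run only after the last possible qmt_dev_example_queue_free()
             * caller is gone; destroy_workqueue() drains remaining work items.
             */
            static void qmt_dev_example_fini(struct qmt_dev_example *dev)
            {
                    if (dev->qde_free_wq) {
                            destroy_workqueue(dev->qde_free_wq);
                            dev->qde_free_wq = NULL;
                    }
            }

            With the workqueue owned by the device and torn down only after namespace/lock cleanup completes, queue_work() can no longer run against a workqueue that has already been destroyed.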

            lixi_wc Li Xi added a comment -

            hongchao.zhang Would you please take a look?

            yujian Jian Yu added a comment -

            The failure occurred consistently in the following test sessions:
            lustre-master-el9.4-x86_64-full-zfs-part-1
            lustre-master-el9.4-x86_64-full-dne-zfs-part-1
            lustre-master-el9.3-x86_64-full-dne-zfs-part-1
            lustre-master-el8.10-x86_64-full-dne-zfs-part-1
            lustre-master-el8.9-x86_64-full-dne-zfs-part-1
            lustre-master-el8.8-x86_64-full-dne-zfs-part-1
            lustre-master-el8.8-x86_64-full-zfs-dkms
            lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-client2-zfs
            lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-client2-zfs


            scherementsev Sergey Cheremencev added a comment -

            adilger, no, it doesn't look similar to anything I'm working on now. I'll take a look at this failure when I have a free cycle.

            yujian Jian Yu added a comment -

            sanity test 160a hit the same issue in lustre-master-el9.3-x86_64_lustre-b2_15-el8.8-x86_64-rolling-downgrade-client2-zfs test session:
            https://testing.whamcloud.com/test_sets/2f83380a-a194-4c4e-9c41-863ec70a9887


            adilger Andreas Dilger added a comment -

            scherementsev is this related to the other quota patch you are working on?

            yujian Jian Yu added a comment -

            +1 on master branch: https://testing.whamcloud.com/test_sets/9aee92d9-fcb7-4c4f-9396-6580b0e9a09b

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 10
