Details
- Type: Bug
- Resolution: Fixed
- Priority: Minor
- Affects Version: Lustre 2.16.0
Description
This issue was created by maloo for Oleg Drokin <green@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c1d8852e-126c-4a30-af92-a8fa44082ee9
test_1b failed with the following error:
onyx-103vm4 crashed during sanity-lsnapshot test_1b
Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4541 - 5.14.0-362.24.1.el9_3.x86_64
servers: https://build.whamcloud.com/job/lustre-master/4541 - 5.14.0-362.24.1_lustre.el9.x86_64
For about a month this has been a regular crash in sanity-lsnapshot test_1b; the traces differ somewhat, but they always end in qmt_lvbo_free() followed by a NULL pointer dereference in __queue_work():
[13893.061855] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-lsnapshot test 1b: mount snapshot without original filesystem mounted ========================================================== 08:20:07 \(1718785207\)
[13893.273668] Lustre: DEBUG MARKER: == sanity-lsnapshot test 1b: mount snapshot without original filesystem mounted ========================================================== 08:20:07 (1718785207)
[13893.439579] Lustre: DEBUG MARKER: /usr/sbin/lctl snapshot_create -F lustre -n lss_1b_0
[13895.994830] Lustre: DEBUG MARKER: /usr/sbin/lctl snapshot_list -F lustre -n lss_1b_0 -d
[13900.112891] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[13900.441082] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
[13900.721496] BUG: kernel NULL pointer dereference, address: 0000000000000102
[13900.722489] #PF: supervisor read access in kernel mode
[13900.723149] #PF: error_code(0x0000) - not-present page
[13900.723783] PGD 0 P4D 0
[13900.724150] Oops: 0000 [#1] PREEMPT SMP PTI
[13900.724697] CPU: 0 PID: 225194 Comm: umount Kdump: loaded Tainted: P OE ------- --- 5.14.0-362.24.1_lustre.el9.x86_64 #1
[13900.726105] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13900.726808] RIP: 0010:__queue_work+0x20/0x370
[13900.727396] Code: 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 41 89 fc 55 48 89 f5 53 48 83 ec 10 89 7c 24 04 <f6> 86 02 01 00 00 01 0f 85 ac 02 00 00 e8 fe c7 07 00 49 c7 c5 ac
[13900.729479] RSP: 0018:ffffa5290a5e3938 EFLAGS: 00010082
[13900.730140] RAX: ffffffffc1cd86b0 RBX: 0000000000000202 RCX: 0000000000000000
[13900.730998] RDX: ffff990ab188e340 RSI: 0000000000000000 RDI: 0000000000002000
[13900.731848] RBP: 0000000000000000 R08: ffff990aa7fda8b8 R09: ffffa5290a5e3940
[13900.732701] R10: 0000000000000101 R11: 000000000000000f R12: 0000000000002000
[13900.733574] R13: ffff990aa7fda82c R14: ffff990ab188e340 R15: 0000000000000000
[13900.734429] FS: 00007f1bff822540(0000) GS:ffff990b3fc00000(0000) knlGS:0000000000000000
[13900.735387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13900.736097] CR2: 0000000000000102 CR3: 00000000045f6003 CR4: 00000000001706f0
[13900.736960] Call Trace:
[13900.737318] <TASK>
[13900.737636] ? show_trace_log_lvl+0x1c4/0x2df
[13900.738212] ? show_trace_log_lvl+0x1c4/0x2df
[13900.738774] ? queue_work_on+0x24/0x30
[13900.739268] ? __die_body.cold+0x8/0xd
[13900.739765] ? page_fault_oops+0x134/0x170
[13900.740329] ? kernelmode_fixup_or_oops+0x84/0x110
[13900.740936] ? exc_page_fault+0x62/0x150
[13900.741474] ? asm_exc_page_fault+0x22/0x30
[13900.742034] ? __pfx_qmt_lvbo_free+0x10/0x10 [lquota]
[13900.742772] ? __queue_work+0x20/0x370
[13900.743272] ? __wake_up_common_lock+0x91/0xd0
[13900.743851] queue_work_on+0x24/0x30
[13900.744325] qmt_lvbo_free+0xaf/0x160 [lquota]
[13900.744929] ldlm_resource_putref+0x18a/0x290 [ptlrpc]
[13900.745721] cfs_hash_for_each_relax+0x1ab/0x480 [libcfs]
[13900.746468] ? __pfx_ldlm_resource_clean+0x10/0x10 [ptlrpc]
[13900.747268] ? __pfx_ldlm_resource_clean+0x10/0x10 [ptlrpc]
[13900.748069] cfs_hash_for_each_nolock+0x12e/0x210 [libcfs]
[13900.748755] ldlm_namespace_cleanup+0x2b/0xc0 [ptlrpc]
[13900.749514] __ldlm_namespace_free+0x58/0x4f0 [ptlrpc]
[13900.750288] ldlm_namespace_free_prior+0x5a/0x1f0 [ptlrpc]
[13900.751093] mdt_fini+0xd6/0x570 [mdt]
[13900.751631] mdt_device_fini+0x2b/0xc0 [mdt]
[13900.752224] obd_precleanup+0x1e4/0x220 [obdclass]
[13900.753213] class_cleanup+0x2d5/0x600 [obdclass]
[13900.753885] class_process_config+0x10c0/0x1bc0 [obdclass]
[13900.754627] ? __kmalloc+0x19b/0x370
[13900.755138] class_manual_cleanup+0x439/0x7a0 [obdclass]
[13900.755871] server_put_super+0x7ee/0xa40 [ptlrpc]
[13900.756604] generic_shutdown_super+0x74/0x120
[13900.757193] kill_anon_super+0x14/0x30
[13900.757681] deactivate_locked_super+0x31/0xa0
[13900.758272] cleanup_mnt+0x100/0x160
[13900.758775] task_work_run+0x5c/0x90
[13900.759257] exit_to_user_mode_loop+0x122/0x130
[13900.759854] exit_to_user_mode_prepare+0xb6/0x100
[13900.760450] syscall_exit_to_user_mode+0x12/0x40
[13900.761045] do_syscall_64+0x69/0x90
[13900.761515] ? syscall_exit_to_user_mode+0x22/0x40
[13900.762130] ? do_syscall_64+0x69/0x90
[13900.762619] ? exc_page_fault+0x62/0x150
[13900.763134] entry_SYSCALL_64_after_hwframe+0x72/0xdc
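Decoding the oops: the faulting instruction in the Code: bytes is testb $0x1,0x102(%rsi), and RSI is 0, so __queue_work() was apparently handed a NULL workqueue pointer and faulted reading a flags byte at offset 0x102 (matching CR2 = 0x102). That is consistent with qmt_lvbo_free() queueing work on a workqueue that is already gone (or never set) while umount tears the MDT down. A minimal userspace sketch, using a hypothetical struct layout (not the real workqueue_struct), of why the fault address equals the field offset when the base pointer is NULL:

#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a struct whose "flags" field sits at
 * offset 0x102, like the field __queue_work() reads here. */
struct fake_wq {
	char pad[0x102];      /* fields before the flags byte */
	unsigned char flags;  /* at offset 0x102 */
};

int main(void)
{
	struct fake_wq *wq = NULL;

	printf("offsetof(flags) = %#zx\n", offsetof(struct fake_wq, flags));
	/* Dereferencing through the NULL pointer faults at the field's
	 * offset, i.e. at address 0x102 -- mirroring the oops line
	 * "BUG: kernel NULL pointer dereference, address: 0000000000000102". */
	return wq->flags & 1;
}

Running this prints the 0x102 offset and then segfaults at address 0x102, which is exactly the signature seen in the trace above.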
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-lsnapshot test_1b - onyx-103vm4 crashed during sanity-lsnapshot test_1b