[LU-16562] sanity test_408: aarch64 crash with NULL pointer dereference at virtual address 00000000000000a0 Created: 15/Feb/23 Updated: 25/Aug/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | arm |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite runs (both aarch64 clients):

test_408 failed with the following error:

onyx-91vm11 crashed during sanity test_408

[19747.669390] Lustre: DEBUG MARKER: == sanity test 408: drop_caches should not hang due to page leaks ========================================================== 21:17:51 (1676409471)
[19747.880492] Lustre: *** cfs_fail_loc=40a, val=0***
[19747.884133] LustreError: 631631:0:(osc_request.c:2756:osc_build_rpc()) prep_req failed: -22
[19747.890192] LustreError: 631631:0:(osc_cache.c:2199:osc_check_rpcs()) Read request failed with -22
[19749.278274] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[19749.282908] bash (1056941): drop_caches: 2
[19749.319912] Internal error: Oops: 96000005 [#1] SMP
[19749.356011] CPU: 0 PID: 1057184 Comm: ldlm_bl_06 Kdump: loaded 4.18.0-372.32.1.el8_6.aarch64 #1
[19749.365795] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[19749.371401] pstate: 80000005 (Nzcv daif -PAN -UAO)
[19749.375217] pc : ll_lock_cancel_bits+0x7b0/0xfa0 [lustre]
[19749.379986] lr : ll_lock_cancel_bits+0x54/0xfa0 [lustre]
[19749.449339] Process ldlm_bl_06 (pid: 1057184, stack limit = 0x00000000b232b25f)
[19749.455034] Call trace:
[19749.456943] ll_lock_cancel_bits+0x7b0/0xfa0 [lustre]
[19749.461172] ll_md_blocking_ast+0x1d0/0x410 [lustre]
[19749.465320] ldlm_cancel_callback+0x74/0x368 [ptlrpc]
[19749.470399] ldlm_cli_cancel_local+0x100/0x7a8 [ptlrpc]
[19749.474833] ldlm_cli_cancel_list_local+0x118/0x440 [ptlrpc]
[19749.479597] ldlm_bl_thread_main+0x920/0xc60 [ptlrpc]
[19749.483892] kthread+0x128/0x138

Test session details:

<<Please provide additional information about the failure here>>

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
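For triage, the faulting location can be extracted mechanically from the `pc :` line and the call trace. The following is only a sketch and is not part of the Maloo report. It assumes the usual kernel frame format `symbol+0xoffset/0xsize [module]`; `parse_frame` is a hypothetical helper name. The symbol and offset it returns are what one would normally hand to a debugger with matching debuginfo to find the source line.

```python
import re

# Matches kernel frames such as "ll_lock_cancel_bits+0x7b0/0xfa0 [lustre]".
# The trailing "[module]" part is optional (core kernel symbols have none).
FRAME_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frame(line):
    """Return (symbol, offset, size, module) for a frame line, or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return (
        m.group("symbol"),
        int(m.group("offset"), 16),  # offset of the instruction within the function
        int(m.group("size"), 16),    # total size of the function
        m.group("module"),           # None for symbols built into the kernel
    )


pc_line = "[19749.375217] pc : ll_lock_cancel_bits+0x7b0/0xfa0 [lustre]"
frame = parse_frame(pc_line)
print(frame)
```

Here the crash is at offset 0x7b0 into `ll_lock_cancel_bits`, a function of size 0xfa0 in the `lustre` module. The link register (`lr`) points back into the same function at offset 0x54.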
| Comments |
| Comment by Andreas Dilger [ 16/Feb/23 ] |
|
It looks like the first such failure was 2023-02-14. Patches landed in that timeframe are:

$ git log --oneline --after 2023-02-13 --before 2023-02-15 master
eed4d4c752 LU-16536 osp: don't cleanup ldlm in precleanup phase
1c8b40d5e4 LU-16493 tests: recovery-small/144b to wait longer
19c38f6c94 LU-16515 clio: Remove cl_page_size() *
1f034cf610 LU-16532 sec: session key bad keyring *
7fe7f4ca06 LU-16520 build: Move strscpy to libcfs common header
7fcef255d2 LU-16502 lutf: cleanup lutf_start.py, fix bugs
3cd0bb6968 LU-16502 lutf: fix bugs in bash scripts
9a72c073d3 LU-16494 fileset: check fileset for operations by fid *
a2de6af65d LU-16479 utils: Add option to manage degraded ZFS OST
90e1f2ee0c LU-16428 tests: cache is_project_quota_supported result
d2b633226e LU-16382 spec: use pkgconfig() as appropriate.
941d59e7b9 LU-16382 spec: Don't include Group: tags.
9cb4b10c87 LU-14224 misc: add firewalld service configuration
3c69d46e17 LU-14111 obdclass: count eviction per obd_device ?
511bf2f4cc LU-16501 tgt: skip free inodes in OST weights
51136f2dc6 LU-6142 lov: use list_for_each_entry in lov_obd.c *
c1936c9d29 LU-14918 osd: don't declare similar zfs writes twice
9e6225b2e7 LU-14918 osd: don't declare similar ldiskfs writes twice
f16c31ccd9 LU-16454 mdt: Add a per-MDT "max_mod_rpcs_in_flight"

Patches marked by '*' affect the client, and '?' might affect the client, so there aren't many candidate patches that could have triggered this. |
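The narrowing step in the comment above can be reproduced from the annotated listing. This is only an illustrative sketch: it assumes each '*' or '?' marker trails the commit it annotates, and `client_candidates` is a hypothetical helper, not a Lustre or Maloo tool. Only an excerpt of the listing is included.

```python
# Excerpt of the annotated `git log --oneline` output from the comment above.
ANNOTATED_LOG = """\
eed4d4c752 LU-16536 osp: don't cleanup ldlm in precleanup phase
19c38f6c94 LU-16515 clio: Remove cl_page_size() *
1f034cf610 LU-16532 sec: session key bad keyring *
7fe7f4ca06 LU-16520 build: Move strscpy to libcfs common header
9a72c073d3 LU-16494 fileset: check fileset for operations by fid *
3c69d46e17 LU-14111 obdclass: count eviction per obd_device ?
51136f2dc6 LU-6142 lov: use list_for_each_entry in lov_obd.c *
f16c31ccd9 LU-16454 mdt: Add a per-MDT "max_mod_rpcs_in_flight"
"""


def client_candidates(log_text):
    """Split annotated log lines into (definite, possible) commit hash lists."""
    definite, possible = [], []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        commit = line.split(maxsplit=1)[0]  # abbreviated hash is the first field
        if line.endswith(" *"):
            definite.append(commit)   # marked as affecting the client
        elif line.endswith(" ?"):
            possible.append(commit)   # marked as possibly affecting the client
    return definite, possible


definite, possible = client_candidates(ANNOTATED_LOG)
print("client:", definite)
print("maybe: ", possible)
```

That leaves four commits marked as affecting the client and one marked as possibly affecting it. These are the ones to revert or bisect first.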
| Comment by James A Simmons [ 25/Aug/23 ] |
|
Do you think patch https://review.whamcloud.com/c/fs/lustre-release/+/47086 resolved this? |