[LU-17012] conf-sanity test_113: crashed during conf-sanity test_113 Created: 03/Aug/23  Updated: 04/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16954 mount failed: File exists(cannot crea... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/7a49f99b-6593-423d-97ce-6bb8790f459d

test_113 failed with the following error:

[13531.690472] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[13531.692557] Oops: 0000 [#1] SMP PTI
[13531.693244] CPU: 1 PID: 1175920 Comm: llog_process_th 4.18.0-425.10.1.el8_lustre.x86_64 #1
[13531.695593] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[13531.696668] RIP: 0010:ls_device_get+0x1e3/0x3b0 [obdclass]
[13531.712774] Call Trace:
[13531.713318]  local_oid_storage_init+0xb8/0x16c0 [obdclass]
[13531.714401]  llog_osd_setup+0x9d/0x400 [obdclass]
[13531.715359]  llog_setup.part.6+0x146/0x840 [obdclass]
[13531.716373]  osp_sync_llog_init+0x1e0/0xb10 [osp]
[13531.718232]  osp_sync_init+0x262/0x770 [osp]
[13531.719065]  osp_init0.isra.19+0x1689/0x19f0 [osp]
[13531.720005]  osp_device_alloc+0xcb/0x180 [osp]
[13531.720878]  obd_setup+0x119/0x300 [obdclass]
[13531.721771]  class_setup+0x587/0x7a0 [obdclass]
[13531.722687]  class_process_config+0x1248/0x2160 [obdclass]
[13531.723771]  class_config_llog_handler+0x93b/0x12e0 [obdclass]
[13531.724916]  llog_process_thread+0xedf/0x1b60 [obdclass]
[13531.728786]  llog_process_thread_daemonize+0x9b/0xe0 [obdclass]
[13531.729936]  kthread+0x10b/0x130

Test session details:
clients: https://build.whamcloud.com/job/lustre-master/4447 - 5.15.0-52-generic
servers: https://build.whamcloud.com/job/lustre-master/4447 - 4.18.0-425.10.1.el8_lustre.x86_64

There have been 13 failures since 2023-07-08, but none in the two months before then.

$ git log --oneline --after 2023-07-06 --before 2023-07-09
51d62f2122fe LU-16637 llite: call truncate_inode_pages() in inode lock
0cb7ebf22304 LU-16927 tests: improve sanity-quota
629d6bca95f9 LU-8191 tests: convert functions to static
97df1cba957b LU-16925 osd-ldiskfs: Remove unused bio_integrity_enabled
1defc11dfa59 LU-16922 kernel: update RHEL 9.2 [5.14.0-284.18.1.el9_2]
a1d332f613ac LU-8191 mdt: convert functions to static
094ae18ed8a9 LU-16548 lnet: Fixing missing gnilnd define CURRENT_LND_VERSION
0d77e94b4793 LU-16723 parser: fix help hanging
46a9abf4330e LU-16890 obd: OBD_FREE_PRE() to ignore NULL pointers
9190af53287b LU-16899 gnilnd: Use libcfs_nidstr and fix typo
35017d0973bb LU-16898 osd-ldiskfs: do not return dr_error from past RPC
4fc3c208422e LU-16518 obd: fix style and clang error
acdc2c8bb7aa LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_GT
c1915c5f0dd8 LU-16846 nrs: Fix console messages
4ce452292fbe LU-16842 fsx: tolerate delete last non-stale mirror error
7ea4e0c7c534 LU-12019 build: Recognize Debian Kernel and set KMP dir
c2f548dacc5f LU-16805 llite: improve readpage debug
f5a75ea44db3 LU-16697 llite: Set BDI_CAP_* flags for lustre
b16c9333a008 LU-16691 ldiskfs: limit length of per-inode prealloc list
bba59b1287c9 LU-16651 llite: hold invalidate_lock when invalidate cache pages
3ef773db80fc LU-16594 build: get_random_u32_below, get_acl with dentry
e7cf1fc1f274 LU-13340 lustre: Support large nids in LCFG_ADD_UUID
7f1aa5b66b24 LU-16518 build: llvm/clang support
e3e91ea95fd9 LU-13343 gss: no sec flavor on loopback connection
530a302e10fc LU-12511 build: include firewalld files for native Linux client
aac625055e50 LU-12511 llite: use mapping_set_error instead of opencoded set_bit
84bb366642e8 LU-16847 ldiskfs: refactor t10 code.
9e5040a304a9 LU-16847 ldiskfs: do not copy ldiskfs_chunk_trans_blocks

None of these patches look particularly related to the crash. However, each of these crashes appears to be preceded by an earlier failure due to LU-16954, which is caused by the BDI_CAP patch (f5a75ea44db3 LU-16697).

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
conf-sanity test_113 - onyx-65vm4 crashed during conf-sanity test_113


Generated at Sat Feb 10 03:31:52 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.