Details
Bug
Resolution: Fixed
Minor
Description
This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/37653b0d-00b2-48d6-b8ce-0c09bb5f3f0a
test_136 failed with the following error:
trevis-99vm4 crashed during sanity test_136
Test session details:
clients: https://build.whamcloud.com/job/lustre-master-patchless/772 - 4.18.0-240.22.1.el8_3.x86_64
servers: https://build.whamcloud.com/job/lustre-master-patchless/772 - 4.18.0-240.22.1.el8_3.x86_64
It looks like this has been crashing for a long time, but only in full testing, because the test only runs with "SLOW=y" and is skipped otherwise. The test itself allocates and immediately deletes about 150k files in a loop.
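For illustration only, the access pattern described above can be sketched roughly as follows. This is not the actual sanity test_136 code: the function name churn_files, the NFILES and DIR variables, and the create-then-unlink-per-file reading of "allocating and immediately deleting" are all assumptions made for this sketch. Only the SLOW=y gating and the approximate file count come from the description.

```shell
#!/bin/bash
# Hypothetical sketch of the workload; NOT the real sanity test_136.
SLOW=${SLOW:-no}
NFILES=${NFILES:-150000}      # assumed value for "about 150k files"
DIR=${DIR:-$(mktemp -d)}      # assumed scratch directory (a Lustre mount in practice)

# Create each file and delete it right away, so the file count on disk
# stays near zero while the server still processes every create and unlink.
churn_files() {
    local dir=$1 count=$2 i
    for ((i = 0; i < count; i++)); do
        : > "$dir/f$i"        # create an empty file
        rm -f "$dir/f$i"      # and unlink it immediately
    done
}

if [[ "$SLOW" != "y" ]]; then
    echo "SKIP: only run with SLOW=y"
else
    churn_files "$DIR" "$NFILES"
fi
```

Run without SLOW=y, the sketch only prints the skip message, which mirrors why the crash went unnoticed outside full test sessions.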
The first failure is on 2023-04-22 with commit v2_15_55-90-g73a7b1c2a3, and it looks like the early test failures are all with ZFS hitting OOM:
https://testing.whamcloud.com/test_sets/37653b0d-00b2-48d6-b8ce-0c09bb5f3f0a
but by 2023-07-01 they are hitting a NULL pointer dereference (likely also an allocation failure before OOM) in the ZFS inode handling:
https://testing.whamcloud.com/test_sets/855d00e9-7906-465b-96cf-f7cfbc08c16a
[19856.663841] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[19856.673442] CPU: 1 PID: 851991 Comm: dp_sync_taskq 4.18.0-477.10.1.el8_lustre.x86_64 #1
[19856.675740] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[19856.676816] RIP: 0010:arc_write+0xf5/0x460 [zfs]
[19856.692605] Call Trace:
[19856.696922] dbuf_write+0x2ff/0x550 [zfs]
[19856.700667] dbuf_sync_leaf+0x137/0x660 [zfs]
[19856.703284] dbuf_sync_list+0xcf/0x120 [zfs]
[19856.704161] dbuf_sync_indirect+0xe2/0x170 [zfs]
[19856.705108] dbuf_sync_list+0xae/0x120 [zfs]
[19856.705992] dbuf_sync_indirect+0xe2/0x170 [zfs]
[19856.706935] dbuf_sync_list+0xae/0x120 [zfs]
[19856.707814] dnode_sync+0x365/0xa20 [zfs]
[19856.709421] sync_dnodes_task+0x71/0xa0 [zfs]
[19856.710341] taskq_thread+0x2e1/0x510 [spl]
[19856.712759] kthread+0x134/0x150
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_136 - trevis-99vm4 crashed during sanity test_136