[LU-13536] Lustre ZFS dnode kernel panic Created: 08/May/20 Updated: 20/Apr/21 Resolved: 21/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | SC Admin (Inactive) | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: |
Dell R740. CentOS 7.8. Kernel: 3.10.0-1127.el7.x86_64, lustre-2.12.4-1.el7.x86_64, zfs-0.7.13-1.el7.x86_64, spl-0.7.13-1.el7.x86_64 |
||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Hi Folks, We recently upgraded our Lustre ZFS servers at SUT and have been experiencing an issue with the ZFS filesystem crashing. Last week we upgraded from Lustre 2.10.5 (plus a dozen patches) & ZFS 0.7.9 to Lustre 2.12.4 & ZFS 0.7.13. Now if we import and mount our main ZFS/Lustre filesystem, resume Slurm jobs, and then start the Slurm partitions, we hit a kernel panic on the MDS shortly after the partitions come up:
May 8 20:12:37 warble2 kernel: VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed
May 8 20:12:37 warble2 kernel: PANIC at dnode.c:1635:dnode_setdirty()
May 8 20:12:37 warble2 kernel: Showing stack for process 45209
May 8 20:12:37 warble2 kernel: CPU: 7 PID: 45209 Comm: mdt01_123 Tainted: P OE ------------ 3.10.0-1127.el7.x86_64 #1
May 8 20:12:37 warble2 kernel: Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 2.5.4 01/13/2020
May 8 20:12:37 warble2 kernel: Call Trace:
May 8 20:12:37 warble2 kernel: [<ffffffff9077ff85>] dump_stack+0x19/0x1b
May 8 20:12:37 warble2 kernel: [<ffffffffc04d4f24>] spl_dumpstack+0x44/0x50 [spl]
May 8 20:12:37 warble2 kernel: [<ffffffffc04d4ff9>] spl_panic+0xc9/0x110 [spl]
May 8 20:12:37 warble2 kernel: [<ffffffff900c7780>] ? wake_up_atomic_t+0x30/0x30
May 8 20:12:37 warble2 kernel: [<ffffffffc0c21073>] ? dbuf_rele_and_unlock+0x283/0x4c0 [zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc04d0238>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
May 8 20:12:37 warble2 kernel: [<ffffffff90784002>] ? mutex_lock+0x12/0x2f
May 8 20:12:37 warble2 kernel: [<ffffffffc0c31a2c>] ? dmu_objset_userquota_get_ids+0x23c/0x440 [zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc0c40f39>] dnode_setdirty+0xe9/0xf0 [zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc0c4120c>] dnode_allocate+0x18c/0x230 [zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc0c2dd2b>] dmu_object_alloc_dnsize+0x34b/0x3e0 [zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc1630032>] __osd_object_create+0x82/0x170 [osd_zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc163027b>] osd_mksym+0x6b/0x110 [osd_zfs]
May 8 20:12:37 warble2 kernel: [<ffffffff907850c2>] ? down_write+0x12/0x3d
May 8 20:12:37 warble2 kernel: [<ffffffffc162b966>] osd_create+0x316/0xaf0 [osd_zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc18ed9c5>] lod_sub_create+0x1f5/0x480 [lod]
May 8 20:12:37 warble2 kernel: [<ffffffffc18de179>] lod_create+0x69/0x340 [lod]
May 8 20:12:37 warble2 kernel: [<ffffffffc1622690>] ? osd_trans_create+0x410/0x410 [osd_zfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc1958173>] mdd_create_object_internal+0xc3/0x300 [mdd]
May 8 20:12:37 warble2 kernel: [<ffffffffc194122b>] mdd_create_object+0x7b/0x820 [mdd]
May 8 20:12:37 warble2 kernel: [<ffffffffc194b7b8>] mdd_create+0xdd8/0x14a0 [mdd]
May 8 20:12:37 warble2 kernel: [<ffffffffc17d96d4>] mdt_create+0xb54/0x1090 [mdt]
May 8 20:12:37 warble2 kernel: [<ffffffffc119ae94>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
May 8 20:12:37 warble2 kernel: [<ffffffffc17d9d7b>] mdt_reint_create+0x16b/0x360 [mdt]
May 8 20:12:37 warble2 kernel: [<ffffffffc17dc963>] mdt_reint_rec+0x83/0x210 [mdt]
May 8 20:12:37 warble2 kernel: [<ffffffffc17b9273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
May 8 20:12:37 warble2 kernel: [<ffffffffc17c46e7>] mdt_reint+0x67/0x140 [mdt]
May 8 20:12:37 warble2 kernel: [<ffffffffc14af64a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
May 8 20:12:37 warble2 kernel: [<ffffffffc1488d91>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
May 8 20:12:37 warble2 kernel: [<ffffffffc07dcbde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
May 8 20:12:37 warble2 kernel: [<ffffffffc145447b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May 8 20:12:37 warble2 kernel: [<ffffffffc1451295>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
May 8 20:12:37 warble2 kernel: [<ffffffff900d3dc3>] ? __wake_up+0x13/0x20
May 8 20:12:37 warble2 kernel: [<ffffffffc1457de4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
May 8 20:12:37 warble2 kernel: [<ffffffffc14572b0>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
May 8 20:12:37 warble2 kernel: [<ffffffff900c6691>] kthread+0xd1/0xe0
May 8 20:12:37 warble2 kernel: [<ffffffff900c65c0>] ? insert_kthread_work+0x40/0x40
May 8 20:12:37 warble2 kernel: [<ffffffff90792d1d>] ret_from_fork_nospec_begin+0x7/0x21
May 8 20:12:37 warble2 kernel: [<ffffffff900c65c0>] ? insert_kthread_work+0x40/0x40
This issue has come up once last week, and twice tonight. We note there's a little bit of chatter over at https://github.com/openzfs/zfs/issues/8705 but no real feedback yet, and it's been open for some time now. Are there any recommendations from the experience of Lustre developers on how we might mitigate this particular problem? Right now we're cloning our server image to include ZFS 0.8.3 to see if that will help.
Cheers, Simon
|
| Comments |
| Comment by Andreas Dilger [ 08/May/20 ] |
|
Have you tried disabling the ZFS dnodesize=auto property? That may avoid the frequent crashes while this issue is investigated. |
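A minimal sketch of checking and changing that property with the standard ZFS tools, assuming the MDT dataset is named mdt0pool/mdt0 (a placeholder; substitute the actual dataset shown by "zfs list" on the MDS):

# check the current value (dnodesize=auto is the suspect setting)
zfs get dnodesize mdt0pool/mdt0

# pin it to a fixed value, or back to the pre-large-dnode default
zfs set dnodesize=legacy mdt0pool/mdt0    # or dnodesize=1k, 2k, ...

Note that the property only affects dnodes created after the change; existing objects keep whatever dnode size they were allocated with.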
| Comment by Peter Jones [ 08/May/20 ] |
|
Alex, can you please advise here? Thanks, Peter |
| Comment by SC Admin (Inactive) [ 09/May/20 ] |
|
Thanks Andreas, We have not changed the dnodesize setting on our datasets. If we were to do so, how would we best determine the most suitable fixed value? It looks like the most common dnsize values on our MDT are 512 & 1K, but that's just from skimming through a small selection - not the entire FS. The change to ZFS 0.8.3 went OK. No issues with importing and mounting ZFS & Lustre. Starting the partition in Slurm this time did not cause a crash. That's not to say it was the cause; it just happened to be one loose theory based on only a few occurrences.
Cheers, Simon |
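One rough way to answer the "most suitable fixed value" question above is to sample the dnode sizes zdb reports for the MDT dataset. A sketch, assuming the dataset is named mdt0pool/mdt0 (a placeholder) and a ZFS 0.7+ zdb whose object listing carries a dnsize column; check the header line and adjust the awk field number if your version's layout differs:

# dump the object listing and histogram the dnsize column (field 6 in the
# "Object lvl iblk dblk dsize dnsize lsize %full type" header)
zdb -dd mdt0pool/mdt0 | awk '$1 ~ /^[0-9]+$/ { n[$6]++ } END { for (s in n) print s, n[s] }'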
| Comment by Andreas Dilger [ 09/May/20 ] |
|
The original behavior (before dnodesize was an option) can be restored with dnodesize=512, which would be the de facto safest option to use. You could try a fixed dnodesize=1024 to still get better xattr performance, possibly avoiding whatever problem dnodesize=auto is causing, but it may be that =1024 is itself the problem. It is also entirely possible that 0.8.3 has fixed issues in this area that were not backported to 0.7.13. |
| Comment by Peter Jones [ 09/May/20 ] |
|
I could be mistaken but wouldn't moving to ZFS 0.8.3 require also switching to RHEL 8.x servers? If so, I would caution that this is still in the early stages of support so would need some careful testing against your workloads before considering it ready for production. Work is active in this area but, at the time of writing, this is still somewhat of a WIP. |
| Comment by Peter Jones [ 09/May/20 ] |
|
> The change to ZFS 0.8.3 went OK. No issues with importing and mounting ZFS & Lustre
Hmm, I was working backwards through my email, so I had not seen that when I posted the above. Clearly my memory is faulty on this occasion. |
| Comment by Peter Jones [ 18/Sep/20 ] |
|
scadmin, just checking in on this old ticket to confirm - is no news good news, and has the move to ZFS 0.8.x resolved this issue for you? |
| Comment by SC Admin (Inactive) [ 21/Sep/20 ] |
|
Thanks guys. Yes, it's possible the issue has now been resolved in the later ZFS release. That problem doesn't seem to have come back to bite us yet! Let's archive this case. Cheers, |
| Comment by Peter Jones [ 21/Sep/20 ] |
|
OK - thanks! |
| Comment by Aurelien Degremont (Inactive) [ 26/Jan/21 ] |
|
For reference, we could not reproduce this crash after applying these 2 patches from zfs-0.8.0-rc3 on top of zfs-0.7.13
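For anyone wanting to reproduce that build, the general shape would be to cherry-pick the two commits onto the zfs-0.7.13 tag and rebuild. A sketch only; <commit1> and <commit2> stand in for the two zfs-0.8.0-rc3 commits referenced above, which are not reproduced in this export:

git clone https://github.com/openzfs/zfs.git
cd zfs
git checkout zfs-0.7.13
git cherry-pick <commit1> <commit2>    # substitute the real commit hashes
sh autogen.sh && ./configure && make -s -j$(nproc)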
|
| Comment by Kaizaad Bilimorya [ 20/Apr/21 ] |
|
Thanks for finding those patches @degremoa @pjones. We are hitting this issue with Lustre 2.12.6, and I think there have been a few other reports. Anecdotally, I think we hit this bug when we have "badly behaving" jobs that heavily stress the MDS. I installed the "zfs-dkms-0.7.13.rpm" from the Whamcloud download site along with the other required Lustre software. What do you think about Whamcloud applying the above two patches and re-rolling this RPM specifically for Lustre? Thanks, -k |
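Until a re-rolled package exists, one possible stopgap is a local rebuild from the patched 0.7.13 tree sketched in the earlier comment; the rpm-utils and rpm-dkms targets are part of the standard ZFS build system, though the exact set of packages to install should be reviewed for your site:

# from the patched zfs-0.7.13 source tree
./configure
make -j$(nproc) rpm-utils rpm-dkms
yum localinstall ./*.rpm    # review the generated package list before installing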