[LU-13536] Lustre ZFS dnode kernel panic Created: 08/May/20  Updated: 20/Apr/21  Resolved: 21/Sep/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: SC Admin (Inactive) Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: None
Environment:

Dell R740. Centos 7.8. Kernel: 3.10.0-1127.el7.x86_64, lustre-2.12.4-1.el7.x86_64, zfs-0.7.13-1.el7.x86_64, spl-0.7.13-1.el7.x86_64


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hi Folks,

We recently upgraded our Lustre ZFS servers at SUT and have been experiencing issues with the ZFS filesystem crashing. Last week we upgraded from Lustre 2.10.5 (plus a dozen patches) & ZFS 0.7.9 to Lustre 2.12.4 & ZFS 0.7.13.

Now if we import and mount our main ZFS/Lustre filesystem, resume Slurm jobs, and move on to starting the Slurm partitions, we hit a kernel panic on the MDS shortly after the partitions are up:

 

May  8 20:12:37 warble2 kernel: VERIFY(dnode_add_ref(dn, (void *)(uintptr_t)tx->tx_txg)) failed
May  8 20:12:37 warble2 kernel: PANIC at dnode.c:1635:dnode_setdirty()
May  8 20:12:37 warble2 kernel: Showing stack for process 45209
May  8 20:12:37 warble2 kernel: CPU: 7 PID: 45209 Comm: mdt01_123 Tainted: P           OE  ------------   3.10.0-1127.el7.x86_64 #1
May  8 20:12:37 warble2 kernel: Hardware name: Dell Inc. PowerEdge R740/0JM3W2, BIOS 2.5.4 01/13/2020
May  8 20:12:37 warble2 kernel: Call Trace:
May  8 20:12:37 warble2 kernel: [<ffffffff9077ff85>] dump_stack+0x19/0x1b
May  8 20:12:37 warble2 kernel: [<ffffffffc04d4f24>] spl_dumpstack+0x44/0x50 [spl]
May  8 20:12:37 warble2 kernel: [<ffffffffc04d4ff9>] spl_panic+0xc9/0x110 [spl]
May  8 20:12:37 warble2 kernel: [<ffffffff900c7780>] ? wake_up_atomic_t+0x30/0x30
May  8 20:12:37 warble2 kernel: [<ffffffffc0c21073>] ? dbuf_rele_and_unlock+0x283/0x4c0 [zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc04d0238>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
May  8 20:12:37 warble2 kernel: [<ffffffff90784002>] ? mutex_lock+0x12/0x2f
May  8 20:12:37 warble2 kernel: [<ffffffffc0c31a2c>] ? dmu_objset_userquota_get_ids+0x23c/0x440 [zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc0c40f39>] dnode_setdirty+0xe9/0xf0 [zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc0c4120c>] dnode_allocate+0x18c/0x230 [zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc0c2dd2b>] dmu_object_alloc_dnsize+0x34b/0x3e0 [zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc1630032>] __osd_object_create+0x82/0x170 [osd_zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc163027b>] osd_mksym+0x6b/0x110 [osd_zfs]
May  8 20:12:37 warble2 kernel: [<ffffffff907850c2>] ? down_write+0x12/0x3d
May  8 20:12:37 warble2 kernel: [<ffffffffc162b966>] osd_create+0x316/0xaf0 [osd_zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc18ed9c5>] lod_sub_create+0x1f5/0x480 [lod]
May  8 20:12:37 warble2 kernel: [<ffffffffc18de179>] lod_create+0x69/0x340 [lod]
May  8 20:12:37 warble2 kernel: [<ffffffffc1622690>] ? osd_trans_create+0x410/0x410 [osd_zfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc1958173>] mdd_create_object_internal+0xc3/0x300 [mdd]
May  8 20:12:37 warble2 kernel: [<ffffffffc194122b>] mdd_create_object+0x7b/0x820 [mdd]
May  8 20:12:37 warble2 kernel: [<ffffffffc194b7b8>] mdd_create+0xdd8/0x14a0 [mdd]
May  8 20:12:37 warble2 kernel: [<ffffffffc17d96d4>] mdt_create+0xb54/0x1090 [mdt]
May  8 20:12:37 warble2 kernel: [<ffffffffc119ae94>] ? lprocfs_stats_lock+0x24/0xd0 [obdclass]
May  8 20:12:37 warble2 kernel: [<ffffffffc17d9d7b>] mdt_reint_create+0x16b/0x360 [mdt]
May  8 20:12:37 warble2 kernel: [<ffffffffc17dc963>] mdt_reint_rec+0x83/0x210 [mdt]
May  8 20:12:37 warble2 kernel: [<ffffffffc17b9273>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
May  8 20:12:37 warble2 kernel: [<ffffffffc17c46e7>] mdt_reint+0x67/0x140 [mdt]
May  8 20:12:37 warble2 kernel: [<ffffffffc14af64a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
May  8 20:12:37 warble2 kernel: [<ffffffffc1488d91>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc]
May  8 20:12:37 warble2 kernel: [<ffffffffc07dcbde>] ? ktime_get_real_seconds+0xe/0x10 [libcfs]
May  8 20:12:37 warble2 kernel: [<ffffffffc145447b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
May  8 20:12:37 warble2 kernel: [<ffffffffc1451295>] ? ptlrpc_wait_event+0xa5/0x360 [ptlrpc]
May  8 20:12:37 warble2 kernel: [<ffffffff900d3dc3>] ? __wake_up+0x13/0x20
May  8 20:12:37 warble2 kernel: [<ffffffffc1457de4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
May  8 20:12:37 warble2 kernel: [<ffffffffc14572b0>] ? ptlrpc_register_service+0xf80/0xf80 [ptlrpc]
May  8 20:12:37 warble2 kernel: [<ffffffff900c6691>] kthread+0xd1/0xe0
May  8 20:12:37 warble2 kernel: [<ffffffff900c65c0>] ? insert_kthread_work+0x40/0x40
May  8 20:12:37 warble2 kernel: [<ffffffff90792d1d>] ret_from_fork_nospec_begin+0x7/0x21
May  8 20:12:37 warble2 kernel: [<ffffffff900c65c0>] ? insert_kthread_work+0x40/0x40

This issue came up once last week and twice tonight. We note there's a little bit of chatter over at https://github.com/openzfs/zfs/issues/8705, but no real feedback yet, and it's been open for some time now. Are there any recommendations, from the experience of the Lustre developers, on how we might mitigate this particular problem?

Right now we're cloning our server image to include ZFS 0.8.3 to see if that will help.

 

Cheers,

Simon

 



 Comments   
Comment by Andreas Dilger [ 08/May/20 ]

Have you tried disabling the ZFS dnodesize=auto property? That may avoid the frequent crashes while this issue is investigated.
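
A minimal sketch of what that would look like, assuming a hypothetical MDT dataset name of mdtpool/mdt0 (the property only affects newly allocated dnodes; existing dnodes keep their current size):

# Check the current setting on the MDT dataset (pool/dataset name is a placeholder)
zfs get dnodesize mdtpool/mdt0

# Revert to the pre-large-dnode default for dnodes allocated from now on
zfs set dnodesize=legacy mdtpool/mdt0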

Comment by Peter Jones [ 08/May/20 ]

Alex

Can you please advise here?

Thanks

Peter

Comment by SC Admin (Inactive) [ 09/May/20 ]

Thanks Andreas,

We have not changed the dnodesize setting on our datasets. If we were to do so, how would we best determine the most suitable fixed value? It looks like the most common dnsize values on our MDT are 512 and 1K, but that's just from skimming a small selection, not the entire filesystem.
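
One way to turn that skim into a rough distribution, sketched with the same hypothetical dataset name (zdb's per-object listing includes a dnsize column on pools with the large_dnode feature; the exact column position may vary between zdb versions):

# Tally the dnsize column across all objects on the MDT dataset
zdb -dd mdtpool/mdt0 | awk '/^ *[0-9]+ /{print $6}' | sort | uniq -c | sort -rn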

The change to ZFS 0.8.3 went OK, with no issues importing and mounting ZFS and Lustre. Starting the partitions in Slurm this time did not cause a crash. That's not to say it was the cause; it was just one loose theory based on only a few occurrences.

 

Cheers,

Simon

Comment by Andreas Dilger [ 09/May/20 ]

The original behavior (before dnodesize was an option) can be had with dnodesize=legacy (fixed 512-byte dnodes), and that would be the de facto safest option to use. You could try a static dnodesize=1k to still give you better xattr performance, possibly avoiding whatever problem dnodesize=auto is causing, but it may be that 1 KiB dnodes are themselves the problem.
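
For completeness, a fixed size is set the same way as in the earlier sketch (dataset name again hypothetical); as with auto, only dnodes created after the change pick up the new size:

# Fixed 1 KiB dnodes for newly created objects on the MDT dataset
zfs set dnodesize=1k mdtpool/mdt0
zfs get dnodesize mdtpool/mdt0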

It is definitely also possible that 0.8.3 has fixed issues in this area that were not backported to 0.7.13.

Comment by Peter Jones [ 09/May/20 ]

I could be mistaken, but wouldn't moving to ZFS 0.8.3 also require switching to RHEL 8.x servers? If so, I would caution that this is still in the early stages of support, so it would need careful testing against your workloads before being considered ready for production. Work is active in this area but, at the time of writing, it is still somewhat of a work in progress.

Comment by Peter Jones [ 09/May/20 ]

> The change to ZFS 0.8.3 went OK. No issues with importing and mounting ZFS & Lustre

Hmm I was working backwards through my email so had not seen that when I posted the above. Clearly my memory is faulty on this occasion.

Comment by Peter Jones [ 18/Sep/20 ]

scadmin, just checking in on this old ticket to confirm: is no news good news, and has the move to ZFS 0.8.x resolved this issue for you?

Comment by SC Admin (Inactive) [ 21/Sep/20 ]

Thanks guys.

Yes, it's possible the issue has now been resolved in the later ZFS release. That problem doesn't seem to have come back to bite us yet! Let's archive this case.

Cheers,
simon

Comment by Peter Jones [ 21/Sep/20 ]

OK - thanks!

Comment by Aurelien Degremont (Inactive) [ 26/Jan/21 ]

For reference, we could not reproduce this crash after applying these two patches from zfs-0.8.0-rc3 on top of zfs-0.7.13:

  • 78e213946 Fix dnode_hold() freeing dnode behavior
  • 58769a4eb Don’t allow dnode allocation if dn_holds != 0
Comment by Kaizaad Bilimorya [ 20/Apr/21 ]

Thanks for finding those patches @degremoa

@pjones We are hitting this issue with Lustre 2.12.6 and I think there have been a few other reports. Anecdotally, I think we hit this bug when we have "badly behaving" jobs that heavily stress the MDS.

I installed the "zfs-dkms-0.7.13.rpm" from the  Whamcloud download site along with the other Lustre required software. What do you think about Whamcloud applying the above two patches and re-rolling this rpm specifically for Lustre?

thanks

-k
