[LU-15143] journal_dirty_metadata failed: handle type 0 started at line 1926, credits 9/0, errcode -28 Created: 21/Oct/21 Updated: 20/Nov/21 Resolved: 20/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrew Perepechko | Assignee: | Andrew Perepechko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
[17750.028365] LDISKFS-fs: ldiskfs_getblk:896: aborting transaction: error 28 in __ldiskfs_handle_dirty_metadata [17750.028367] LDISKFS-fs error (device md66): ldiskfs_getblk:896: inode #85: block 3107684352: comm mdt06_007: journal_dirty_metadata failed: handle type 0 started at line 1926, credits 9/0, errcode -28 [17750.640480] Aborting journal on device md66-8. [17750.640508] Kernel panic - not syncing: LDISKFS-fs (device md66): panic forced after error Lustre crashes with the following stack trace: [17750.640510] Call Trace: [17750.640512] [<ffffffffb4164e41>] dump_stack+0x19/0x1b [17750.640514] [<ffffffffb415e550>] panic+0xe8/0x21f [17750.640524] [<ffffffffc14b3c86>] ldiskfs_handle_error.part.191+0xa6/0xb0 [ldiskfs] [17750.640530] [<ffffffffc14b3ffa>] __ldiskfs_error_inode+0xaa/0x180 [ldiskfs] [17750.640535] [<ffffffffc1475f9a>] ? ldiskfs_journal_abort_handle.isra.5+0xaa/0xc0 [ldiskfs] [17750.640539] [<ffffffffc1476627>] __ldiskfs_handle_dirty_metadata+0x137/0x220 [ldiskfs] [17750.640545] [<ffffffffc149a891>] ldiskfs_getblk+0x131/0x200 [ldiskfs] [17750.640550] [<ffffffffc149a987>] ldiskfs_bread+0x27/0xc0 [ldiskfs] [17750.640558] [<ffffffffc153c370>] osd_write+0x4c0/0xc30 [osd_ldiskfs] [17750.640564] [<ffffffffc1493dcf>] ? ldiskfs_xattr_find_entry+0x3f/0x130 [ldiskfs] [17750.640582] [<ffffffffc0c1f3b6>] dt_record_write+0x36/0x120 [obdclass] [17750.640614] [<ffffffffc0f04cbe>] tgt_mk_reply_data+0x49e/0xaa0 [ptlrpc] [17750.640630] [<ffffffffc0bfc919>] ? lprocfs_counter_add+0xf9/0x160 [obdclass] [17750.640658] [<ffffffffc0f05fa2>] tgt_last_rcvd_update+0x7d2/0xa00 [ptlrpc] [17750.640659] [<ffffffffb3c1c412>] ? kmem_cache_alloc+0x1c2/0x1f0 [17750.640662] [<ffffffffc07037ec>] ? start_this_handle+0x3ac/0x430 [jbd2] [17750.640689] [<ffffffffc0f0a9c0>] tgt_txn_stop_cb+0x1a0/0x490 [ptlrpc] [17750.640706] [<ffffffffc0c20621>] dt_txn_hook_stop+0x81/0xd0 [obdclass] [17750.640712] [<ffffffffc152429d>] osd_trans_stop+0x18d/0x920 [osd_ldiskfs] [17750.640718] [<ffffffffc151db00>] ? osd_trans_start+0x130/0x4e0 [osd_ldiskfs] [17750.640729] [<ffffffffc12c3e8e>] mdt_empty_transno+0x11e/0x850 [mdt] [17750.640738] [<ffffffffc12c90be>] mdt_mfd_open+0x8de/0xe70 [mdt] [17750.640747] [<ffffffffc12c9d33>] mdt_finish_open+0x6e3/0x880 [mdt] [17750.640756] [<ffffffffc12ca1c3>] mdt_open_by_fid+0x2f3/0x4b0 [mdt] [17750.640765] [<ffffffffc12cb738>] mdt_reint_open+0x928/0x2630 [mdt] [17750.640782] [<ffffffffc0c334b8>] ? upcall_cache_get_entry+0x218/0x8b0 [obdclass] [17750.640797] [<ffffffffc0c384fe>] ? lu_ucred+0x1e/0x30 [obdclass] [17750.640806] [<ffffffffc12a9db5>] ? mdt_ucred+0x15/0x20 [mdt] [17750.640814] [<ffffffffc12aa651>] ? mdt_root_squash+0x21/0x430 [mdt] [17750.640823] [<ffffffffc12bef43>] mdt_reint_rec+0x83/0x210 [mdt] [17750.640831] [<ffffffffc12984b3>] mdt_reint_internal+0x703/0xae0 [mdt] [17750.640840] [<ffffffffc12a4f26>] ? mdt_intent_fixup_resent+0x36/0x220 [mdt] [17750.640848] [<ffffffffc12a51a3>] mdt_intent_open+0x93/0x430 [mdt] [17750.640876] [<ffffffffc0ea9437>] ? lustre_msg_buf+0x17/0x60 [ptlrpc] [17750.640884] [<ffffffffc129b6ba>] mdt_intent_opc+0x1ba/0xb50 [mdt] [17750.640911] [<ffffffffc0ead900>] ? lustre_swab_ldlm_policy_data+0x30/0x30 [ptlrpc] [17750.640919] [<ffffffffc12a5110>] ? mdt_intent_fixup_resent+0x220/0x220 [mdt] [17750.640928] [<ffffffffc12a37f4>] mdt_intent_policy+0x1a4/0x360 [mdt] [17750.640949] [<ffffffffc0e5b52a>] ldlm_lock_enqueue+0x3da/0xae0 [ptlrpc] [17750.640955] [<ffffffffc095d6c3>] ? cfs_hash_bd_add_locked+0x63/0x80 [libcfs] [17750.640961] [<ffffffffc0960e4e>] ? cfs_hash_add+0xbe/0x1a0 [libcfs] [17750.640985] [<ffffffffc0e83d57>] ldlm_handle_enqueue0+0xa87/0x1660 [ptlrpc] [17750.641010] [<ffffffffc0ead980>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc] [17750.641039] [<ffffffffc0f0de52>] tgt_enqueue+0x62/0x210 [ptlrpc] [17750.641067] [<ffffffffc0f12c6a>] tgt_request_handle+0x96a/0x1680 [ptlrpc] [17750.641095] [<ffffffffc0eede71>] ? ptlrpc_nrs_req_get_nolock0+0xd1/0x170 [ptlrpc] [17750.641100] [<ffffffffc0951c0e>] ? ktime_get_real_seconds+0xe/0x10 [libcfs] [17750.641126] [<ffffffffc0eb87b6>] ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc] [17750.641150] [<ffffffffc0eb91fe>] ? ptlrpc_wait_event+0xae/0x380 [ptlrpc] [17750.641151] [<ffffffffb3ad6b22>] ? default_wake_function+0x12/0x20 [17750.641152] [<ffffffffb3acbdeb>] ? __wake_up_common+0x5b/0x90 [17750.641176] [<ffffffffc0ebc895>] ptlrpc_main+0xc75/0x1c50 [ptlrpc] [17750.641200] [<ffffffffc0ebbc20>] ? ptlrpc_register_service+0x1010/0x1010 [ptlrpc] [17750.641201] [<ffffffffb3ac1f81>] kthread+0xd1/0xe0 [17750.641202] [<ffffffffb3ac1eb0>] ? insert_kthread_work+0x40/0x40 [17750.641204] [<ffffffffb4177c1d>] ret_from_fork_nospec_begin+0x7/0x21 [17750.641204] [<ffffffffb3ac1eb0>] ? insert_kthread_work+0x40/0x40 It seems that osd_declare_write() incorrectly calculates the necessary journal credits for /reply_data (32-byte append write) when extents are moved from the inode body to an external block: 1. user quota (properly accounted) The last three blocks are not properly accounted, only two blocks are in fact calculated via:
depth = inode != NULL ? ext_depth(inode) : 0;
depth = max(depth, 1) + 1; // we only get 2 here
credits = depth;
Eventually, only 9 credits are reserved but 10 credits are used. A patch will be uploaded shortly. |
| Comments |
| Comment by Gerrit Updater [ 21/Oct/21 ] |
|
"Andrew Perepechko <andrew.perepechko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45328 |
| Comment by Gerrit Updater [ 20/Nov/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45328/ |
| Comment by Peter Jones [ 20/Nov/21 ] |
|
Landed for 2.15 |