[LU-7581] Large EA: "ldiskfs_xattr_inode_iget: Backpointer from EA inode 2300579989 to parent invalid" Created: 18/Dec/15 Updated: 24/Aug/16 Resolved: 05/Jan/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Kalpak Shah (Inactive) | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ldiskfs | ||
| Issue Links: |
|
| Severity: | 1 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
MDT crashed with the stack trace below; the issue is related to large EAs.

[411763.959595] LDISKFS-fs error (device md66): ldiskfs_xattr_inode_iget:
[411763.959667] LDISKFS-fs error (device md66): ldiskfs_xattr_inode_iget: Backpointer from EA inode 2300579986 to parent invalid.
[411763.959670] Aborting journal on device md66-8.
[411763.959758] LustreError: 243183:0:(osd_handler.c:914:osd_trans_commit_cb()) transaction @0xffff8806f77781c0 commit error: 2
[411763.986296] Kernel panic - not syncing: LDISKFS-fs (device md66): panic forced after error
[411763.986297]
[411763.986299] Pid: 179095, comm: mdt01_092 Not tainted 2.6.32-431.17.1.x2.0.61.x86_64 #1
[411763.986300] Call Trace:
[411763.986311] [<ffffffff81524e6e>] ? panic+0xa7/0x16f
[411763.986324] [<ffffffffa10a98a8>] ? ldiskfs_commit_super+0x188/0x210 [ldiskfs]
[411763.986331] [<ffffffffa10a9eb4>] ? ldiskfs_handle_error+0xc4/0xd0 [ldiskfs]
[411763.986338] [<ffffffffa10aa252>] ? __ldiskfs_error+0x82/0x90 [ldiskfs]
[411763.986344] [<ffffffffa10b194f>] ? ldiskfs_xattr_inode_iget+0xbf/0x190 [ldiskfs]
[411763.986350] [<ffffffffa10b1c66>] ? ldiskfs_xattr_inode_get+0x26/0xf0 [ldiskfs]
[411763.986356] [<ffffffffa10b15ce>] ? ldiskfs_xattr_find_entry+0x3e/0x120 [ldiskfs]
[411763.986362] [<ffffffffa10b26bc>] ? ldiskfs_xattr_get+0x18c/0x330 [ldiskfs]
[411763.986368] [<ffffffffa10b492b>] ? ldiskfs_xattr_trusted_get+0x2b/0x30 [ldiskfs]
[411763.986372] [<ffffffff811ae467>] ? generic_getxattr+0x87/0x90
[411763.986384] [<ffffffffa1102a60>] ? osd_xattr_get+0x230/0x2e0 [osd_ldiskfs]
[411763.986392] [<ffffffffa11a657d>] ? lod_xattr_get+0x19d/0x4c0 [lod]
[411763.986403] [<ffffffffa0f5bce1>] ? mdd_xattr_get+0x111/0x400 [mdd]
[411763.986418] [<ffffffffa0faddd0>] ? mdt_big_xattr_get+0x110/0xa70 [mdt]
[411763.986424] [<ffffffffa11a3708>] ? lod_object_read_unlock+0x38/0xd0 [lod]
[411763.986430] [<ffffffffa0f52b38>] ? mdd_read_unlock+0x38/0xd0 [mdd]
[411763.986435] [<ffffffffa0f5bcef>] ? mdd_xattr_get+0x11f/0x400 [mdd]
[411763.986441] [<ffffffffa0f56b15>] ? mdd_la_get+0xb5/0x240 [mdd]
[411763.986451] [<ffffffffa0fae7fc>] ? mdt_attr_get_lov+0xcc/0x1e0 [mdt]
[411763.986460] [<ffffffffa0faeed6>] ? mdt_attr_get_complex+0x5c6/0xbb0 [mdt]
[411763.986469] [<ffffffffa0fafc1f>] ? mdt_getattr_internal+0x23f/0x13f0 [mdt]
[411763.986478] [<ffffffffa0fb3072>] ? mdt_getattr_name_lock+0xbf2/0x1aa0 [mdt]
[411763.986519] [<ffffffffa0a4a294>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc]
[411763.986528] [<ffffffffa0fb4442>] ? mdt_intent_getattr+0x292/0x470 [mdt]
[411763.986536] [<ffffffffa0fa3594>] ? mdt_intent_policy+0x494/0xce0 [mdt]
[411763.986557] [<ffffffffa09fc739>] ? ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
[411763.986580] [<ffffffffa0a28a2b>] ? ldlm_handle_enqueue0+0x51b/0x13b0 [ptlrpc]
[411763.986592] [<ffffffffa067e63e>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[411763.986625] [<ffffffffa0aaaac1>] ? tgt_enqueue+0x61/0x230 [ptlrpc]
[411763.986653] [<ffffffffa0aaa1ee>] ? tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[411763.986678] [<ffffffffa0a5a57a>] ? ptlrpc_main+0xd7a/0x1880 [ptlrpc]
[411763.986681] [<ffffffff8152557e>] ? thread_return+0x4e/0x760
[411763.986706] [<ffffffffa0a59800>] ? ptlrpc_main+0x0/0x1880 [ptlrpc]
[411763.986709] [<ffffffff8109ac66>] ? kthread+0x96/0xa0
[411763.986712] [<ffffffff8100c20a>] ? child_rip+0xa/0x20
[411763.986714] [<ffffffff8109abd0>] ? kthread+0x0/0xa0
[411763.986715] [<ffffffff8100c200>] ? child_rip+0x0/0x20

Zam investigated the issue and the details are below:

#define EXT4_INODE_GET_XTIME(xtime, inode, raw_inode) \
do { \
(inode)->xtime.tv_sec = (signed)le32_to_cpu((raw_inode)->xtime); \
if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) \
ext4_decode_extra_time(&(inode)->xtime, \
raw_inode->xtime ## _extra); \
} while (0)
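The EA inode keeps its parent's inode number (the backpointer) in one of the 32-bit on-disk time fields decoded by this macro. Because of the (signed) cast, the highest bit of that inode number, which we know is 1 (ino > 2G), gets sign-extended over the high 32 bits of the long tv_sec field. As a result, the check below compares the sign-extended value against the parent's unsigned i_ino and fails:

if (ea_inode->i_xattr_inode_parent != parent->i_ino ||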
ea_inode->i_generation != parent->i_generation) {
ext4_error(parent->i_sb, "Backpointer from EA inode %d "
"to parent invalid.", ea_ino);
*err = -EINVAL;
goto error;
}
Zam has been able to reproduce the issue and also has a fix for it, which he will post shortly. |
| Comments |
| Comment by Andreas Dilger [ 18/Dec/15 ] |
|
Looks like the fix would be to cast (__u32)ea_inode->i_xattr_inode_parent so the comparison is valid? |
| Comment by Gerrit Updater [ 18/Dec/15 ] |
|
Alexander Zarochentsev (alexander.zarochentsev@seagate.com) uploaded a new patch: http://review.whamcloud.com/17675 |
| Comment by Alexander Zarochentsev [ 18/Dec/15 ] |
|
adilger,
yes, that was the first fix, and it worked. Adding type conversions in place didn't look right to me, so I made macros to set/get the backpointers. |
| Comment by Gerrit Updater [ 04/Jan/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17675/ |
| Comment by Andreas Dilger [ 05/Jan/16 ] |
|
Patch was landed to master for 2.8.0. |
| Comment by Gerrit Updater [ 08/Jan/16 ] |
|
Jian Yu (jian.yu@intel.com) uploaded a new patch: http://review.whamcloud.com/17888 |
| Comment by Gerrit Updater [ 12/Feb/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/18436 |
| Comment by Gerrit Updater [ 25/Feb/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18436/ |