[LU-7325] wrong integer type used for inode number in LargeEA patch Created: 21/Oct/15  Updated: 01/Jul/16  Resolved: 30/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0, Lustre 2.5.4
Fix Version/s: Lustre 2.8.0

Type: Bug Priority: Blocker
Reporter: Alexander Zarochentsev Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None
Environment:

MDT with inode count > 2G


Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Hitting "LDISKFS-fs error (device md66): ldiskfs_xattr_inode_iget: error while reading EA inode -2147483347" on large MDT volumes with large_xattr feature enabled.

Recreation:

1. MDS should have more than 2G inodes
2. mdt fs should be created with large_xattr flag.
3. set inode_goal to get higher inode number allocated.
# echo 2147483947 > /sys/fs/ldiskfs/md66/inode_goal
3. create a file
4. start adding hard links to that file

eventually LdiskFS fails with:

[16767.564754] LDISKFS-fs (md66): mounted filesystem with ordered data mode. quota=on. Opts: 
[16773.660132] Lustre: 25226:0:(client.c:1934:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1445001285/real 1445001285]  req@ffff8810232b3080 x1515193580388392/
t0(0) o8->pinkfs-OST0001-osc-MDT0000@172.18.56.133@o2ib:28/4 lens 400/544 e 0 to 1 dl 1445001290 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[17278.157672] LDISKFS-fs error (device md66): ldiskfs_xattr_inode_iget: error while reading EA inode -2147483446
[17278.169053] Aborting journal on device md66-8.
[17278.187860] LDISKFS-fs (md66): Remounting filesystem read-only
[17278.203840] LustreError: 28195:0:(osd_io.c:1690:osd_ldiskfs_write_record()) journal_get_write_access() returned error -30
[17278.216264] LustreError: 28195:0:(osd_handler.c:1105:osd_trans_stop()) Failure in transaction hook: -30
[17278.226941] LustreError: 28195:0:(osd_handler.c:1114:osd_trans_stop()) Failure to stop transaction: -30
[17278.227001] LustreError: 28024:0:(osd_handler.c:914:osd_trans_commit_cb()) transaction @0xffff8807d0b97d80 commit error: 2
[17280.394367] LDISKFS-fs warning (device md66): kmmpd: kmmpd being stopped since filesystem has been remounted as readonly.


 Comments   
Comment by Alexander Zarochentsev [ 21/Oct/15 ]

The culprit was identified as a wrong integer type used in ext4-large-eas.patch :

+struct inode *ext4_xattr_inode_iget(struct inode *parent, int ea_ino, int *err)
+{
+       struct inode *ea_inode = NULL;
+
+       ea_inode = ext4_iget(parent->i_sb, ea_ino);

ext4_iget() is called with signed integer while ext4_iget() accepts unsigned long inode number.
Type conversion takes place and the sign bit of singed int gets extended to high 32 bits of unsigned long integer.

we have a patch but it needs some time to port to all Lustre 2.8 platforms.

Comment by Andreas Dilger [ 21/Oct/15 ]

Looks like this bug exists in several places: ext4_xattr_inode_get(), ext4_xattr_inode_iget(), ext4_xattr_inode_unlink(), ext4_xattr_inode_set(), and ext4_xattr_set_entry() (where it declares ea_ino before calling ext4_xattr_inode_set()).

It isn't clear whether the right type is __u32 or unsigned long. All the on-disk ext4 structures use __u32 for inode numbers, but the VFS API uses unsigned long and ext4 might conceivably use a 64-bit value in the future. I would accept either one. There is also the xia_inodes[] array using "unsigned int" instead of __u32 or "unsigned long". That itself isn't a problem today, but it may as well be made consistent with the rest of the code, as I don't think the memory usage is a concern (at most the list of large xattr inodes for a single inode, typically only the difference between a 32-byte and a 64-byte allocation).

We've made this a 2.8.0 blocker, so if it speeds things up you could just edit the ext4-large-eas.patch files in place to use __u32 or unsigned long for ea_ino (which is thankfully consistently named throughout the patch), and push the patch to Gerrit for build testing, rather than trying to round up all the different kernels to regenerate the patches. It appears that these functions are not referenced in other patches (even as context) so there is very little chance of this causing a compile failure, and you would find this out in an hour rather than spending a lot of time trying to track down and build each kernel.

Comment by Gerrit Updater [ 22/Oct/15 ]

Alexander Zarochentsev (alexander_zarochentsev@xyratex.com) uploaded a new patch: http://review.whamcloud.com/16913
Subject: LU-7325 ldiskfs: use correct types for inode num
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3b0057136e7883d507845cef2a5c12732175fb78

Comment by Alexander Zarochentsev [ 22/Oct/15 ]

patch http://review.whamcloud.com/16913

Comment by Gerrit Updater [ 30/Oct/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16913/
Subject: LU-7325 ldiskfs: use correct types for inode num
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d630e9f288e96d50791c4d0fd4414337c673d9ea

Comment by Joseph Gmitter (Inactive) [ 30/Oct/15 ]

Landed for 2.8

Generated at Sat Feb 10 02:07:55 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.