[LU-17380] b2_15 sanity test suite running gets stuck at test 64e Created: 20/Dec/23 Updated: 27/Dec/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Xinliang Liu | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
openEuler 22.03 LTS SP2 |
||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
When running the sanity test suite on openEuler 22.03 with the updated kernel 153.35.0.112.oe2203sp2 and the latest b2_15 Lustre (2.15.4 rc1), the run gets stuck at test 64e:

[openeuler@oe2203-test-02 ~]$ sudo RUNAS_ID="1000" ~/lustre/lustre-release/lustre/tests/auster -rvsk sanity --only 50-64
...
== sanity test 64b: check out-of-space detection on client ========================================================== 09:53:13 (1702979593)
STRIPECOUNT=2 ORIGFREE=551632 MAXFREE=800000
BEFORE dd started
lustre-OST0000 avl=273288 grnt=260362 diff=12926 limit=11131
lustre-OST0001 avl=278344 grnt=276135 diff=2209 limit=11131 FULL
lt-lfs setstripe: unable to open '/mnt/lustre/oosfile': Read-only file system (30)
Trace dump:
= /home/openeuler/lustre/lustre-release/lustre/tests/oos.sh:37:main()
oos: FAIL: test-framework exiting on error
sanity test_64b: @@@@@@ FAIL: oos.sh failed: 30
Trace dump:
= /home/openeuler/lustre/lustre-release/lustre/tests/test-framework.sh:6427:error()
= /home/openeuler/lustre/lustre-release/lustre/tests/sanity.sh:8816:test_64b()
= /home/openeuler/lustre/lustre-release/lustre/tests/test-framework.sh:6744:run_one()
= /home/openeuler/lustre/lustre-release/lustre/tests/test-framework.sh:6791:run_one_logged()
= /home/openeuler/lustre/lustre-release/lustre/tests/test-framework.sh:6617:run_test()
= /home/openeuler/lustre/lustre-release/lustre/tests/sanity.sh:8818:main()
Dumping lctl log to /tmp/test_logs/2023-12-19/094338/sanity.test_64b.*.1702979595.log
Dumping logs only on local client.
FAIL 64b (3s)

== sanity test 64c: verify grant shrink ================== 09:53:16 (1702979596)
osc.lustre-OST0000-osc-ffff3c548183a800.cur_grant_bytes=0
checking grant......
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 125056 8608 105212 8% /mnt/lustre[MDT:0] R
lustre-OST0000_UUID 313104 12656 273288 5% /mnt/lustre[OST:0]
lustre-OST0001_UUID 313104 7600 278344 3% /mnt/lustre[OST:1]
filesystem_summary: 626208 20256 551632 4% /mnt/lustre
pass grant check: client:286814208 server:286814208
PASS 64c (0s)

== sanity test 64d: check grant limit exceed ============= 09:53:16 (1702979596)
lt-lfs setstripe: unable to open '/mnt/lustre/f64d.sanity': Read-only file system (30)
dd: failed to open '/mnt/lustre/f64d.sanity': Read-only file system
/home/openeuler/lustre/lustre-release/lustre/tests/sanity.sh: line 8898: kill: (213825) - No such process
checking grant......
UUID 1K-blocks Used Available Use% Mounted on
lustre-MDT0000_UUID 125056 8608 105212 8% /mnt/lustre[MDT:0] R
lustre-OST0000_UUID 313104 12656 273288 5% /mnt/lustre[OST:0]
lustre-OST0001_UUID 313104 7600 278344 3% /mnt/lustre[OST:1]
filesystem_summary: 626208 20256 551632 4% /mnt/lustre
pass grant check: client:286814208 server:286814208
Waiting for MDT destroys to complete
PASS 64d (4s)

== sanity test 64e: check grant consumption (no grant allocation) ========================================================== 09:53:20 (1702979600)
debug=+cache
Stopping client oe2203-test-02 /mnt/lustre (opts:)
Starting client: oe2203-test-02: -o user_xattr,flock oe2203-test-02@tcp:/lustre /mnt/lustre
**Stuck here** |
| Comments |
| Comment by Xinliang Liu [ 20/Dec/23 ] |
|
I narrowed this down to the recently backported ext4 patch "ext4: add EA_INODE checking to ext4_iget()". The check it adds to ext4_iget() is:

+static const char *check_igot_inode(struct inode *inode, ext4_iget_flags flags)
+
+{
+	if (flags & EXT4_IGET_EA_INODE) {
+		if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+			return "missing EA_INODE flag";
+	} else {
+		if ((EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
+			return "unexpected EA_INODE flag";
+	}
+	if (is_bad_inode(inode) && !(flags & EXT4_IGET_BAD))
+		return "unexpected bad inode w/o EXT4_IGET_BAD";
+	return NULL;
+}

From the kernel log captured during the sanity run and the osd_ldiskfs_iget() definition below, it seems that with this ext4 patch applied, Lustre is no longer allowed to get an EA inode:

[113677.017540] LDISKFS-fs error (device dm-0): osd_iget:503: inode #289: comm OI_scrub: unexpected EA_INODE flag
[113677.020146] LDISKFS-fs error (device dm-0): osd_iget:503: inode #289: comm lfsck: unexpected EA_INODE flag
[113677.021426] Aborting journal on device dm-0-8.
[113677.023049] LDISKFS-fs (dm-0): Remounting filesystem read-only
[113677.174850] LustreError: 136912:0:(osd_handler.c:1790:osd_trans_commit_cb()) transaction @0x00000000ce666d28 commit error: 2
[113677.196990] LustreError: 209429:0:(scrub.c:243:scrub_file_store()) lustre-MDT0000: store scrub file: rc = -30

lustre/osd-ldiskfs/osd_internal.h
#ifdef HAVE_LDISKFS_IGET_WITH_FLAGS
# define osd_ldiskfs_iget(sb, ino) \
	ldiskfs_iget((sb), (ino), \
		     LDISKFS_IGET_HANDLE | LDISKFS_IGET_SPECIAL)
#else
# define osd_ldiskfs_iget(sb, ino) ldiskfs_iget((sb), (ino))
#endif
This is the root cause: osd_ldiskfs_iget() never passes the EA_INODE flag, so check_igot_inode() rejects any inode that has EXT4_EA_INODE_FL set. osd_iget() then fails, the ldiskfs filesystem is remounted read-only, and the tests that run afterwards break. Note that the patch "ext4: add EA_INODE checking to ext4_iget()" was introduced in kernel 6.4. Any kernel carrying this patch should have this issue, for example v6.4+, v6.1.33, v5.15.116, and openEuler 22.03 kernel 153.35.0.112.oe2203sp2.
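To make the failure path easier to follow, here is a minimal userspace sketch of the same decision logic. It is not kernel or Lustre code: the `model_` and `MODEL_` names, the struct, and the numeric flag values are assumptions made for illustration, and `is_bad` stands in for is_bad_inode(). It only shows that a flag set of HANDLE | SPECIAL, as used by osd_ldiskfs_iget(), ends up in the "unexpected EA_INODE flag" branch when the inode carries the EA inode flag.

```c
/* Userspace model of check_igot_inode(); not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the iget flags (illustrative values, hypothetical names). */
enum model_iget_flags {
	MODEL_IGET_NORMAL   = 0x0,
	MODEL_IGET_SPECIAL  = 0x1,
	MODEL_IGET_HANDLE   = 0x2,
	MODEL_IGET_BAD      = 0x4,
	MODEL_IGET_EA_INODE = 0x8,
};

/* Stand-in for EXT4_EA_INODE_FL in the inode's flags (illustrative value). */
#define MODEL_EA_INODE_FL 0x200000u

struct model_inode {
	unsigned int i_flags; /* models EXT4_I(inode)->i_flags */
	int is_bad;           /* models is_bad_inode(inode) */
};

/* Same branch structure as the check_igot_inode() quoted in this comment. */
static const char *model_check_igot_inode(const struct model_inode *inode,
					  unsigned int flags)
{
	assert(inode != NULL);
	if (flags & MODEL_IGET_EA_INODE) {
		if (!(inode->i_flags & MODEL_EA_INODE_FL))
			return "missing EA_INODE flag";
	} else {
		if (inode->i_flags & MODEL_EA_INODE_FL)
			return "unexpected EA_INODE flag";
	}
	if (inode->is_bad && !(flags & MODEL_IGET_BAD))
		return "unexpected bad inode w/o EXT4_IGET_BAD";
	return NULL; /* NULL means the inode passed the check */
}

/* Flags that the osd_ldiskfs_iget() macro passes: HANDLE | SPECIAL only. */
static unsigned int model_osd_flags(void)
{
	return MODEL_IGET_HANDLE | MODEL_IGET_SPECIAL;
}
```

Calling `model_check_igot_inode()` on an inode with `MODEL_EA_INODE_FL` set and `model_osd_flags()` as the flags returns the same message seen in the kernel log for inode #289, while passing `MODEL_IGET_EA_INODE` for that inode returns NULL.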
|
| Comment by Xinliang Liu [ 20/Dec/23 ] |
|
Maybe we should explicitly set the iget flags (if possible) when calling ldiskfs_iget(), as ext4 does now, so that check_igot_inode() works:

linux$ grep "= ext4_iget(" -rn fs/ext4/
fs/ext4/ioctl.c:383: inode_bl = ext4_iget(sb, EXT4_BOOT_LOADER_INO,
fs/ext4/resize.c:1720: inode = ext4_iget(sb, EXT4_RESIZE_INO, EXT4_IGET_SPECIAL);
fs/ext4/resize.c:2061: resize_inode = ext4_iget(sb, EXT4_RESIZE_INO,
fs/ext4/orphan.c:586: inode = ext4_iget(sb, orphan_ino, EXT4_IGET_SPECIAL);
fs/ext4/block_validity.c:160: inode = ext4_iget(sb, ino, EXT4_IGET_SPECIAL);
fs/ext4/super.c:1550: inode = ext4_iget(sb, ino, EXT4_IGET_HANDLE);
fs/ext4/super.c:5482: root = ext4_iget(sb, EXT4_ROOT_INO, EXT4_IGET_SPECIAL);
fs/ext4/super.c:5788: journal_inode = ext4_iget(sb, journal_inum, EXT4_IGET_SPECIAL);
fs/ext4/super.c:7063: qf_inode = ext4_iget(sb, qf_inums[type], EXT4_IGET_SPECIAL);
fs/ext4/namei.c:1855: inode = ext4_iget(dir->i_sb, ino, EXT4_IGET_NORMAL);
fs/ext4/ialloc.c:1394: inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1379: inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1386: old_parent = ext4_iget(sb, darg.parent_ino,
fs/ext4/fast_commit.c:1412: dir = ext4_iget(sb, darg->parent_ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1473: inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1534: inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1584: inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1638: inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1651: dir = ext4_iget(sb, darg.parent_ino, EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1738: inode = ext4_iget(sb, le32_to_cpu(fc_add_ex.fc_ino), EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1855: inode = ext4_iget(sb, le32_to_cpu(lrange.fc_ino), EXT4_IGET_NORMAL);
fs/ext4/fast_commit.c:1911: inode = ext4_iget(sb, state->fc_modified_inodes[i],
fs/ext4/xattr.c:440: inode = ext4_iget(parent->i_sb, ea_ino, EXT4_IGET_EA_INODE);
fs/ext4/xattr.c:1542: ea_inode = ext4_iget(inode->i_sb, ce->e_value, |
| Comment by Xinliang Liu [ 27/Dec/23 ] |
|
Alternatively, revert check_igot_inode() in the ldiskfs patch series if the code in osd_scrub.c cannot know which kind of inode it is getting, since an EA inode is really a kind of ext4-internal inode. |