[LU-4677] e2fsprogs compile error on ppc64: posix-types.h:48: error: conflicting types for 'umode_t' Created: 27/Feb/14  Updated: 22/Nov/18  Resolved: 22/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Jian Yu Assignee: Jian Yu
Resolution: Fixed Votes: 0
Labels: ppc
Environment:

Lustre Build: http://build.whamcloud.com/job/lustre-master/1909/
Distro/Arch: RHEL6.5/ppc64


Attachments: Text File build_e2fsprogs.ppc64.log     Text File build_e2fsprogs_rpm.ppc64.log    
Issue Links:
Related
is related to LU-5326 conflicting types for ‘umode_t’ on po... Closed
Severity: 3
Rank (Obsolete): 12838

 Description   

While compiling e2fsprogs against Lustre master branch on RHEL6.5/ppc64, I hit the following errors:

In file included from /root/lustre-release/build/lustre-2.5.56//lustre/include/lustre/lustre_user.h:51,
                 from /root/lustre-release/build/lustre-2.5.56//lustre/include/lustre/lustreapi.h:46,
                 from ../lib/ext2fs/lfsck.h:10,
                 from lfsck_common.c:29:
/root/lustre-release/build/lustre-2.5.56//libcfs/include/libcfs/posix/posix-types.h:48: error: conflicting types for 'umode_t'
/usr/include/asm/types.h:31: note: previous declaration of 'umode_t' was here
        CC util.c
In file included from /root/lustre-release/build/lustre-2.5.56//lustre/include/lustre/lustre_user.h:51,
                 from /root/lustre-release/build/lustre-2.5.56//lustre/include/lustre/lustreapi.h:46,
                 from ../lib/ext2fs/lfsck.h:10,
                 from lfsck.c:80:
/root/lustre-release/build/lustre-2.5.56//libcfs/include/libcfs/posix/posix-types.h:48: error: conflicting types for 'umode_t'
/usr/include/asm/types.h:31: note: previous declaration of 'umode_t' was here
make[3]: *** [lfsck_common.o] Error 1

The build log "build_e2fsprogs.ppc64.log" is attached.



 Comments   
Comment by Jian Yu [ 28/Feb/14 ]

Here is the patch for e2fsprogs master-lustre branch to check if type “umode_t” is declared in asm/types.h: http://review.whamcloud.com/9437

Comment by Jian Yu [ 28/Feb/14 ]

With patch #9437, e2fsprogs passed building on ppc64 node.

However, it failed at "Running e2fsprogs test suite..." stage:

Running e2fsprogs test suite...

/bin/cp ./mke2fs.conf.in mke2fs.conf
Creating test_one script...
Creating test_script...
cmp: d_loaddump.ver.tmp: No such file or directory
d_loaddump: debugfs load/dump test: failed
— d_loaddump/expect 2014-02-27 09:06:42.000000000 +0000
+++ d_loaddump.log 2014-02-28 14:45:47.251311888 +0000
@@ -7,12 +7,26 @@
e2fsck -yf -N test_filesys
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
+Directory inode 2, block #0, offset 24: directory corrupted
+Salvage? yes
+
Pass 3: Checking directory connectivity
+Unconnected directory inode 11 (/???)
+Connect to /lost+found? yes
+
+/lost+found not found. Create? yes
+
+Pass 3A: Optimizing directories
Pass 4: Checking reference counts
+Inode 11 ref count is 3, should be 2. Fix? yes
+
Pass 5: Checking group summary information
-test_filesys: 12/64 files (0.0% non-contiguous), 158/512 blocks
-Exit status is 0
+
+test_filesys: ***** FILE SYSTEM WAS MODIFIED *****
+test_filesys: 13/64 files (0.0% non-contiguous), 148/512 blocks
+Exit status is 1
debugfs -R ''dump test_data d_loaddump.ver.tmp'' test.img
+test_data: EXT2 directory corrupted
Exit status is 0
cmp d_loaddump.tmp d_loaddump.ver.tmp
-Exit status is 0
+Exit status is 2
-----------8<-----------

118 tests succeeded 37 tests failed
Tests failed: d_loaddump d_special_files f_badroot f_badsymlinks f_badtable f_dir_optimize f_dirdata f_dirdata_optimize f_dup4 f_dup_de f_dup_de2 f_dup_resize f_dupdot f_eofblocks f_expand f_ext_journal f_h_badnode f_h_badroot f_h_reindex f_illitable f_jchksum_remount f_lpf f_lpffile f_noroot f_pgsize_gt_blksize f_random_corruption f_recnect_bad f_reconnect f_rehash_dir f_uninit_bad_free_inodes f_uninit_blk_used_not_set f_uninit_set_inode_not_set f_zero_group f_zero_super m_no_opt s_basic_scan t_quota_1on
make[2]: *** [test_post] Error 1

The log of "build_e2fsprogs_rpm.ppc64.log" is attached.

Comment by Jinshan Xiong (Inactive) [ 03/Mar/14 ]

Hi Andreas & Alex,

Do you have any idea for the error mentioned by Yujian?

Jinshan

Comment by Andreas Dilger [ 03/Mar/14 ]

It is possible that this error is caused by the dirdata patch. I would suggest to use "git bisect" to try and isolate it quickly. Please ping me in Skype if you haven't used this before and want some help.

Comment by Jinshan Xiong (Inactive) [ 03/Mar/14 ]

I can work with Yujian on this - do you have a rough idea about what was the latest working version?

Comment by Andreas Dilger [ 04/Mar/14 ]

Note that we have probably never compiled the Lustre-patches e2fsprogs code on PPC. When I was suggesting to bisect the code, I was thinking about the patches that we apply on top of the base e2fsprogs. It is my guess that the base e2fsprogs (before the "Lustre version" patch will not have this problem.

Comment by Jian Yu [ 04/Mar/14 ]

It is my guess that the base e2fsprogs (before the "Lustre version" patch will not have this problem.

Yes, Andreas, I verified this.

Here are the commits on e2fsprogs master-lustre branch: http://git.whamcloud.com/?p=tools/e2fsprogs.git;a=shortlog;h=refs/heads/master-lustre
Since commit f6632828e4845041bce12f79d8f3aae095534bc6 (e2fsck: handle preallocation for large PAGE_SIZE), "make check" started failing on ppc64 node. And before that commit, "make check" passed.

Here are the details about which test failures on ppc64 node are introduced by which commits:

Failed tests Introduced by
f_eofblocks e2fsck: handle preallocation for large PAGE_SIZE
f_pgsize_gt_blksize tests: PAGE_SIZE larger than blocksize with hole
f_badsymlinks f_dup_de2 f_dup_de f_ext_journal f_h_badnode f_h_badroot f_h_reindex f_jchksum_remount f_rehash_dir f_uninit_bad_free_inodes f_uninit_blk_used_not_set f_uninit_set_inode_not_set f_zero_group f_zero_super e2fsck: add support for dir_data feature
f_dirdata tests: add basic tests for dir_data feature
f_dirdata_optimize f_dir_optimize LU-1774 tests: e2fsck -D does not change dirdata
f_random_corruption debugfs: dump "fid" and "lma" xattrs on inode stat
s_basic_scan tests: add basic test case for e2scan
d_loaddump d_special_files f_badroot f_badtable f_dup4 f_dupdot f_dup_resize f_expand f_illitable f_lpf f_lpffile f_noroot f_recnect_bad f_reconnect m_no_opt t_quota_1on LU-2462 e2fsprogs Consider DIRENT_LUFID flag in link_proc().

Could you and Jinshan please give me some instructions about how to make Lustre e2fsprogs codes pass building on ppc64 node without affecting the testing for LU-4665? Should I look into the above test failures and resolve them? Thanks a lot in advance.

Comment by Jinshan Xiong (Inactive) [ 04/Mar/14 ]

I;ve added Fanyong to the list because he can provide expertise in this topic.

Comment by Jinshan Xiong (Inactive) [ 05/Mar/14 ]

Fanyong will look into this.

Comment by nasf (Inactive) [ 05/Mar/14 ]

It should be related with bytes order in the structure of "ext2_dirent_entry" and "ext2_dirent_entry_2" on the PPC64 machine.

struct ext4_dir_entry {
        __le32  inode;                  /* Inode number */
        __le16  rec_len;                /* Directory entry length */
        __le16  name_len;               /* Name length */
        char    name[EXT4_NAME_LEN];    /* File name */
};

/*
 * The new version of the directory entry.  Since EXT4 structures are
 * stored in intel byte order, and the name_len field could never be
 * bigger than 255 chars, it's safe to reclaim the extra byte for the
 * file_type field.
 */
struct ext4_dir_entry_2 {
        __le32  inode;                  /* Inode number */
        __le16  rec_len;                /* Directory entry length */
        __u8    name_len;               /* Name length */
        __u8    file_type;
        char    name[EXT4_NAME_LEN];    /* File name */
};

In fact, the ext4_dir_entry_2::name_len and ext4_dir_entry_2::file_type should be exchanged on PPC64 as following:

struct ext2_dir_entry_2 {
        __le32  inode;                  /* Inode number */
        __le16  rec_len;                /* Directory entry length */
#ifndef WORDS_BIGENDIAN
        __u8    name_len;               /* Name length */
        __u8    file_type;
#else
        __u8    file_type;
        __u8    name_len;               /* Name length */
#endif
        char    name[EXT2_NAME_LEN];    /* File name */
};

Then we can access the ext2_dir_entry::name_len via ext2_dir_entry_2::name_len.

In fact, because the CPU bytes-order for PPC64 and the on-disk bytes-order are different, any value (not only ext2_dir_dirent, but also others, such as inode, super_block, and so on) which is larger than 1-byte and loaded from disk (write or disk) should be converted via lexx_to_cpu (or cpu_to_lexx). Unfortunately, we missed to do that at a lot of points in current implantation.

On the other hand, in current implementation, we assume the ext2_dir_entry_2::name_len is lower than ext2_dir_entry_2::file_type, so there are some logic like the following:

int     filetype = dirent->name_len >> 8;
...
dirent->name_len = ((dirent->name_len & 0xFF) | (dirdata | should_be) << 8);
...

To make the e2fsck workable on PPC64 correctly, all related codes should be well checked and re-considere. In theory, it is not complex, but it involves a lot of trivial things and I am not sure whether there will be other data structures to be handled. We must check carefully one by one. So it is not a simple work.

Comment by Jian Yu [ 06/Mar/14 ]

By using the following temporary fix suggested by nasf, the number of failed tests was reduced from 37 to 3 (f_eofblocks, f_pgsize_gt_blksize, f_dirdata):

diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index a99ab37..95233ba 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -785,11 +785,16 @@ struct ext2_dir_entry {
  * file_type field.
  */
 struct ext2_dir_entry_2 {
-       __u32   inode;                  /* Inode number */
-       __u16   rec_len;                /* Directory entry length */
-       __u8    name_len;               /* Name length */
-       __u8    file_type;
-       char    name[EXT2_NAME_LEN];    /* File name */
+        __u32   inode;                  /* Inode number */
+        __u16   rec_len;                /* Directory entry length */
+#ifndef WORDS_BIGENDIAN
+        __u8    name_len;               /* Name length */
+        __u8    file_type;
+#else
+        __u8    file_type;
+        __u8    name_len;               /* Name length */
+#endif
+        char    name[EXT2_NAME_LEN];    /* File name */
 };

After temporarily disabling the 3 failed tests, I successfully built e2fsprogs on the ppc64 node and got the rpm packages.
I'll see what errors I'll hit while testing Lustre server codes on the ppc64 node. In the meantime, nasf will continue digging into the failures and find out more proper way to resolve those failures. Thanks, nasf!

Comment by Andreas Dilger [ 08/Mar/14 ]

Looking at the e2fsprogs code, it seems that there is almost no code using ext2_dir_entry_2, it all uses ext2_dir_entry and then masks name_len and shifts down file_type. Unfortunately, in the e2fsprogs code, the in-memory representation of the directory entry is often already swabbed, so the change to ext2_dir_entry_2 would break for entries that are already swabbed.

I think it would be better to update the Lustre patches to use ext2_dir_entry and use the mask-and-shift method used by the rest of the code. Since I'm already updating the e2fsprogs patches to the latest maint release, I will do this part. I'm not sure what is wrong with the PAGE_SIZE patches or others since I don't know why they are failing.

Comment by nasf (Inactive) [ 08/Mar/14 ]

About the patch for PAGE_SIZE, I think the tests should be improved. Here are the files in the test image for f_eofblocks:

# ls -ail /mnt/mds1
total 75K
   2 drwxr-xr-x    3 root root 1.0K Apr 11  2012 ./
8199 drwxr-xr-x. 115 root root 4.0K Dec 31 01:17 ../
  12 -rw-r--r--    1 root root 1.0K Apr 11  2012 a1
  13 -rw-r--r--    1 root root 1.0K Apr 11  2012 a2
  14 -rw-r--r--    1 root root 1.0K Apr 11  2012 b1
  15 -rw-r--r--    1 root root 1.0K Apr 11  2012 b2
  16 -rw-r--r--    1 root root 2.0K Apr 11  2012 c1
  17 -rw-r--r--    1 root root 2.0K Apr 11  2012 c2
  18 -rw-r--r--    1 root root 2.0K Apr 11  2012 d1
  19 -rw-r--r--    1 root root 2.0K Apr 11  2012 d2
  20 -rw-r--r--    1 root root 2.0K Apr 11  2012 e1
  21 -rw-r--r--    1 root root 2.0K Apr 11  2012 e2
  22 -rw-r--r--    1 root root 1.0K Apr 11  2012 f1
  23 -rw-r--r--    1 root root 1.0K Apr 11  2012 f2
  24 -rw-r--r--    1 root root 4.0K Apr 11  2012 g1
  25 -rw-r--r--    1 root root 4.0K Apr 11  2012 g2
  26 -rw-r--r--    1 root root 4.0K Apr 11  2012 h1
  27 -rw-r--r--    1 root root 4.0K Apr 11  2012 h2
  28 -rw-r--r--    1 root root    0 Apr 11  2012 i1
  29 -rw-r--r--    1 root root    0 Apr 11  2012 i2
  30 -rw-r--r--    1 root root 2.0K Apr 11  2012 j1
  31 -rw-r--r--    1 root root 2.0K Apr 11  2012 j2
  11 drwx------    2 root root  12K Apr 11  2012 lost+found/

root@RHEL6:/tmp/
# stat /mnt/mds1/j1
  File: `/mnt/mds1/j1'
  Size: 2048      	Blocks: 8          IO Block: 1024   regular file
Device: 700h/1792d	Inode: 30          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-04-11 11:26:25.000000000 +0800
Modify: 2012-04-11 11:29:47.000000000 +0800
Change: 2012-04-11 11:29:47.000000000 +0800

root@RHEL6:/tmp/
# stat /mnt/mds1/j2
  File: `/mnt/mds1/j2'
  Size: 2048      	Blocks: 12         IO Block: 1024   regular file
Device: 700h/1792d	Inode: 31          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-04-11 11:26:30.000000000 +0800
Modify: 2012-04-11 11:29:50.000000000 +0800
Change: 2012-04-11 11:29:50.000000000 +0800

For the server with 4KB PAGE_SIZE, such as x86_64, the "j1" case is acceptable, but for other large PAGE_SIZE cases, such as 64KB on PPC64, "Blocks: 8" should be for "truncate" case, so the size for "j1" should be 512 * 8 = 4096, but not the showed 2048.

To make the tests to be workable for difference arch, the file "j1" should be removed from the test image.

Comment by Andreas Dilger [ 22/Nov/18 ]

This is fixed in the latest e2fsprogs release.

Generated at Sat Feb 10 01:44:53 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.