LU-136: test e2fsprogs-1.42.wc1 against 32TB+ ldiskfs filesystems

Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.1.0
    • Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Bugzilla ID: 16038
    • 4966

    Description

      In order for Lustre to use OSTs larger than 16TB, the e2fsprogs "master" branch needs to be tested against such large LUNs. The "master" branch has unreleased modifications that should allow mke2fs, e2fsck, and other tools to use LUNs over 16TB, but it has not been heavily tested at this point.

      Bruce, I believe we previously discussed a test plan for this work, using llverdev and llverfs. Please attach a document or comment here with details. The testing for 16TB LUNs is documented in https://bugzilla.lustre.org/show_bug.cgi?id=16038.

      After the local ldiskfs filesystem testing is complete, obdfilter-survey and full Lustre client testing are needed.
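
      A minimal sketch of the kind of local verification this implies, assuming a >16TB block device (the device path is illustrative, and llverdev/mke2fs options may differ between versions):

      # full write/read verification of the raw LUN; -p would run a much faster partial pass
      llverdev -v -l /dev/large_vg/ost_lv

      # format with the 64bit feature so block numbers beyond 2^32 are addressable
      mke2fs -t ext4 -b 4096 -O extents,uninit_bg,dir_nlink,huge_file,64bit,flex_bg /dev/large_vg/ost_lv

      # forced, read-only e2fsck of the freshly formatted filesystem, timed
      time e2fsck -fn /dev/large_vg/ost_lv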

      Attachments

        Activity

          yujian Jian Yu added a comment -

          Firstly, do you know why none of the large-LUN-inodes test results in Maloo include the test logs? That makes it hard to look at the results in the future if there is reason to do so. I wanted to see the e2fsck times for the many-inodes runs, but only have the one test result above to look at. Could you please file a separate TT- bug to fix whatever problem is preventing the logs for this test from being sent to Maloo?

          I have no idea about this issue. The syslog is displayed, but not the suite log or the test log. I just created TT-180 to ask John for help.

          Are the MDT and OST e2fsck runs in the same VM on the SFA10k, or is the MDT on a separate MDS node?

          The MDT and OST are in the same VM.

          Until TT-180 is fixed, please find attached the large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build273.log file with the test output of the inode creation + e2fsck test on the following builds:

          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/273/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/


          adilger Andreas Dilger added a comment -

          After running for about 120 hours, the inode creation and e2fsck tests passed on the 128TB Lustre filesystem.
          Please refer to the attached test output file: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build263.log

          Yu Jian, I looked at the log file and found some strange results.

          Firstly, do you know why none of the large-LUN-inodes test results in Maloo include the test logs? That makes it hard to look at the results in the future if there is reason to do so. I wanted to see the e2fsck times for the many-inodes runs, but only have the one test result above to look at. Could you please file a separate TT- bug to fix whatever problem is preventing the logs for this test from being sent to Maloo?

          Looking at the above log, it seems that the MDT (with 25 dirs of 5M files each) took only 7 minutes to run e2fsck, while the OST (with 32 dirs of 4M files each) took 3500 minutes (58 hours) to run. That doesn't make sense, and I wanted to compare this to the most recent large-LUN-inodes test result, which took 20h less time to run.

          Are the MDT and OST e2fsck runs in the same VM on the SFA10k, or is the MDT on a separate MDS node?
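
          A minimal sketch of how the two e2fsck times could be captured separately (the MDT device path is hypothetical, the OST path is taken from the format log further down, and -fn forces a read-only check):

          # on the node/VM hosting the MDT
          time e2fsck -fn /dev/mdt_device 2>&1 | tee /tmp/e2fsck-mdt.log

          # on the node/VM where the 128TB OST LUN is attached
          time e2fsck -fn /dev/large_vg/ost_lv 2>&1 | tee /tmp/e2fsck-ost.log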


          adilger Andreas Dilger added a comment -

          For the 1.41.90.wc4 e2fsprogs I've cherry-picked a couple of recent 64-bit fixes from upstream:

          commit bc526c65d2a4cf0c6c04e9ed4837d6dd7dbbf2b3
          Author: Theodore Ts'o <tytso@mit.edu>
          Date: Tue Jul 5 20:35:46 2011 -0400

          libext2fs: fix 64-bit support in ext2fs_bmap2()

          Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

          commit 24404aa340b274e077b2551fa7bdde5122d3eb43
          Author: Theodore Ts'o <tytso@mit.edu>
          Date: Tue Jul 5 20:02:27 2011 -0400

          libext2fs: fix 64-bit support in ext2fs_{read,write}_inode_full()

          This fixes a problem where reading or writing inodes located after the
          4GB boundary would fail.

          Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

          The first one is unlikely to affect most uses, but may hit in rare cases.
          The second one is only a problem on 32-bit machines, so is unlikely to affect Lustre users.
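
          A sketch of how fixes like these are typically applied, assuming a local clone of the e2fsprogs tree with the upstream repository added as a remote named "upstream" (the branch name is hypothetical; the hashes are the two commits listed above):

          git fetch upstream                                          # pull in the upstream commits
          git checkout 1.41.90.wc4-branch                             # hypothetical name of the local wc branch
          git cherry-pick bc526c65d2a4cf0c6c04e9ed4837d6dd7dbbf2b3    # ext2fs_bmap2() 64-bit fix
          git cherry-pick 24404aa340b274e077b2551fa7bdde5122d3eb43    # ext2fs_{read,write}_inode_full() 64-bit fix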

          I don't think there is anything left to do for this bug, so it can be closed.

          yujian Jian Yu added a comment - - edited

          After the issue is resolved, I'll complete the e2fsck part.

          OK, now the issue is resolved. The testing was restarted with the following builds:

          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/263/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/

          After running for about 120 hours, the inode creation and e2fsck tests passed on the 128TB Lustre filesystem.
          Please refer to the attached test output file: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build263.log

          yujian Jian Yu added a comment -

          Yu Jian, I looked through the inodes run, but I didn't see it running e2fsck on the large LUN. That should be added as part of the test script if it isn't there today. If the LUN with the 135M files still exists, can you please start an e2fsck on both the MDS and the OST?

          Sorry for the confusion, Andreas. The e2fsck part is in the test script. While running e2fsck on the OST after creating the 134M files, the following errors occurred on the virtual disks which were presented to the virtual machine:

          --------8<--------
          kernel: janusdrvr: WARNING: cpCompleteIoReq(): Req Context ID 0x0 completed with error status 0x7
          kernel: end_request: I/O error, dev sfa0066, sector 0
          kernel: Buffer I/O error on device sfa0066, logical block 0
          kernel: janusdrvr: WARNING: cpCompleteIoReq(): Req Context ID 0x1 completed with error status 0x7
          kernel: end_request: I/O error, dev sfa0066, sector 0
          kernel: Buffer I/O error on device sfa0066, logical block 0
          --------8<-------- 
          

          The same issue also occurred on disks presented to other virtual machines, and then all of the disks became invisible. I tried rebooting the virtual machine and reloading the disk driver, but that did not work. I think it is a hardware issue, so I removed the incomplete e2fsck part from the test result and only uploaded the completed inode creation part.

          After the issue is resolved, I'll complete the e2fsck part.


          adilger Andreas Dilger added a comment -

          Yu Jian, I looked through the inodes run, but I didn't see it running e2fsck on the large LUN. That should be added as part of the test script if it isn't there today. If the LUN with the 135M files still exists, can you please start an e2fsck on both the MDS and the OST?

          yujian Jian Yu added a comment -

          After running for about 53 hours, the test passed at Thu Aug 11 04:41:09 PDT 2011:
          https://maloo.whamcloud.com/test_sets/af225374-c72b-11e0-a7e2-52540025f9af

          The test log did not show up in the above Maloo report. Please find it in the attached large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.log.

          yujian Jian Yu added a comment - - edited

          The "large-LUN-inodes" testing is going to be started on the latest master branch...

          The inode creation testing on the 128TB Lustre filesystem against the master branch on CentOS5.6/x86_64 (kernel version: 2.6.18-238.19.1.el5_lustre.gd4ea36c) was started at Mon Aug 8 22:51:49 PDT 2011. About 134M inodes were to be created.

          The following builds were used:
          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/246/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/

          After running for about 53 hours, the test passed at Thu Aug 11 04:41:09 PDT 2011:
          https://maloo.whamcloud.com/test_sets/af225374-c72b-11e0-a7e2-52540025f9af

          Here is a short summary of the test result after running mdsrate with the "--create" option:

          # /opt/mpich/bin/mpirun  -np 25 -machinefile /tmp/mdsrate-create.machines /usr/lib64/lustre/tests/mdsrate --create --verbose --ndirs 25 --dirfmt '/mnt/lustre/mdsrate/dir%d' --nfiles 5360000 --filefmt 'file%%d'
          
          Rate: 694.17 eff 694.18 aggr 27.77 avg client creates/sec (total: 25 threads 134000000 creates 25 dirs 1 threads/dir 193035.50 secs)
          
          # lfs df -h /mnt/lustre
          UUID                       bytes        Used   Available Use% Mounted on
          largefs-MDT0000_UUID        1.5T       13.6G        1.4T   1% /mnt/lustre[MDT:0]
          largefs-OST0000_UUID      128.0T        3.6G      121.6T   0% /mnt/lustre[OST:0]
          
          filesystem summary:       128.0T        3.6G      121.6T   0% /mnt/lustre
          
          
          # lfs df -i /mnt/lustre
          UUID                      Inodes       IUsed       IFree IUse% Mounted on
          largefs-MDT0000_UUID  1073741824   134000062   939741762  12% /mnt/lustre[MDT:0]
          largefs-OST0000_UUID   134217728   134006837      210891 100% /mnt/lustre[OST:0]
          
          filesystem summary:   1073741824   134000062   939741762  12% /mnt/lustre
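
          As a quick sanity check of the mdsrate numbers above (totals taken from the "Rate:" line), the aggregate and per-client create rates can be recomputed:

          # 134,000,000 creates over 193035.50 seconds across 25 client threads
          awk 'BEGIN { total=134000000; secs=193035.50; clients=25;
                       printf "aggregate: %.2f  per-client: %.2f creates/sec\n",
                              total/secs, total/secs/clients }'
          # prints: aggregate: 694.17  per-client: 27.77 creates/sec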
          
          yujian Jian Yu added a comment -

          Now, the read operation is ongoing...

          Done.

          After running for about 21 days in total, the 128TB LUN full testing on CentOS5.6/x86_64 (kernel version: 2.6.18-238.12.1.el5_lustre.g5c1e9f9) passed on Lustre master build v2_0_65_0:
          https://maloo.whamcloud.com/test_sets/69c35618-bdd3-11e0-8bdf-52540025f9af

          The "large-LUN-inodes" testing is going to be started on the latest master branch...

          yujian Jian Yu added a comment -

          After running for about 12385 minutes (roughly 206 hours, or 8.5 days), the 128TB Lustre filesystem was successfully filled up by llverfs:

          # lfs df -h /mnt/lustre
          UUID                       bytes        Used   Available Use% Mounted on
          largefs-MDT0000_UUID        1.5T      499.3M        1.4T   0% /mnt/lustre[MDT:0]
          largefs-OST0000_UUID      128.0T      121.4T      120.0G 100% /mnt/lustre[OST:0]
          
          filesystem summary:       128.0T      121.4T      120.0G 100% /mnt/lustre
          
          # lfs df -i /mnt/lustre
          UUID                      Inodes       IUsed       IFree IUse% Mounted on
          largefs-MDT0000_UUID  1073741824       32099  1073709725   0% /mnt/lustre[MDT:0]
          largefs-OST0000_UUID   134217728       31191   134186537   0% /mnt/lustre[OST:0]
          
          filesystem summary:   1073741824       32099  1073709725   0% /mnt/lustre
          

          Now, the read operation is ongoing...

          yujian Jian Yu added a comment -

          After http://review.whamcloud.com/1071 and http://review.whamcloud.com/1073 were merged into the master branch, I proceeded with the 128TB LUN full testing on CentOS5.6/x86_64 (kernel version: 2.6.18-238.12.1.el5_lustre.g5c1e9f9). The testing was started at Sun Jul 10 23:56:02 PDT 2011.

          The following builds were used:
          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/199/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/

          There were no extra mkfs.lustre options specified when formatting the 128TB OST.

          ===================== format the OST /dev/large_vg/ost_lv =====================
          # time mkfs.lustre --reformat --fsname=largefs --ost --mgsnode=192.168.77.1@o2ib /dev/large_vg/ost_lv
          
             Permanent disk data:
          Target:     largefs-OSTffff
          Index:      unassigned
          Lustre FS:  largefs
          Mount type: ldiskfs
          Flags:      0x72
                        (OST needs_index first_time update )
          Persistent mount opts: errors=remount-ro,extents,mballoc
          Parameters: mgsnode=192.168.77.1@o2ib
          
          device size = 134217728MB
          formatting backing filesystem ldiskfs on /dev/large_vg/ost_lv
                  target name  largefs-OSTffff
                  4k blocks     34359738368
                  options        -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,huge_file,64bit,flex_bg -G 256 -E lazy_journal_init, -F
          mkfs_cmd = mke2fs -j -b 4096 -L largefs-OSTffff  -J size=400 -I 256 -i 1048576 -q -O extents,uninit_bg,dir_nlink,huge_file,64bit,flex_bg -G 256 -E lazy_journal_init, -F /dev/large_vg/ost_lv 34359738368
          Writing CONFIGS/mountdata
          
          real    0m44.489s
          user    0m6.669s
          sys     0m31.087s
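
          As an optional follow-up check (a sketch, not part of the recorded run) that the >16TB-capable layout actually took effect, the superblock of the formatted LUN can be inspected:

          dumpe2fs -h /dev/large_vg/ost_lv | grep -E 'Filesystem features|Block count|Inode count'
          # "Filesystem features" should list 64bit; "Block count" should match the 34359738368 4k blocks reported above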
          

          People

            Assignee: yujian Jian Yu
            Reporter: adilger Andreas Dilger
            Votes: 0
            Watchers: 2
