
[LU-136] test e2fsprogs-1.42.wc1 against 32TB+ ldiskfs filesystems

Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Lustre 2.1.0
    • Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Bugzilla: 16,038
    • 4966

    Description

      In order for Lustre to use OSTs larger than 16TB, the e2fsprogs "master" branch needs to be tested against such large LUNs. The "master" branch has unreleased modifications that should allow mke2fs, e2fsck, and other tools to use LUNs over 16TB, but it has not been heavily tested at this point.

      Bruce, I believe we previously discussed a test plan for this work, using llverdev and llverfs. Please attach a document or comment here with details. The testing for 16TB LUNs is documented in https://bugzilla.lustre.org/show_bug.cgi?id=16038.

      After the local ldiskfs filesystem testing is complete, obdfilter-survey and full Lustre client testing are needed.
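
      For reference, a minimal local ldiskfs verification sequence along those lines might look like the sketch below. The device name, mount point, and mke2fs options are placeholders and assumptions (based on the 64bit feature in the e2fsprogs master branch), not the agreed test plan:

      # llverdev -v -p /dev/sdb          # fast partial write/read pass over the raw LUN (destructive)
      # llverdev -v -l /dev/sdb          # full write/read pass; can take a long time on a 128TB LUN
      # mke2fs -t ext4 -O 64bit,huge_file -m 0 /dev/sdb
      # mount -t ldiskfs /dev/sdb /mnt/ostfs      # needs the Lustre ldiskfs module loaded
      # llverfs -v -p /mnt/ostfs         # partial filesystem-level pass
      # llverfs -v -l /mnt/ostfs         # full filesystem-level pass
      # umount /mnt/ostfs
      # e2fsck -f -v /dev/sdb            # check the >16TB filesystem with the new e2fsck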

      Attachments

        Activity

          yujian Jian Yu added a comment -

          Until TT-180 is fixed, please find the attached file large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build273.log, which contains the test output of the inode creation + e2fsck test on the following builds:

          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/273/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/

          TT-180 was just fixed.
          Here is the Maloo report for the above test result: https://maloo.whamcloud.com/test_sets/83e2174e-ddfb-11e0-9909-52540025f9af

          yujian Jian Yu added a comment -

          Firstly, do you know why none of the large-LUN-inodes test results in Maloo include the test logs? That makes it hard to look at the results in the future if there is reason to do so. I wanted to see the e2fsck times for the many-inodes runs, but only have the one test result above to look at. Could you please file a separate TT- bug to fix whatever problem is preventing the logs for this test from being sent to Maloo?

          I'm not sure what is causing this issue. The syslog shows up in Maloo, but not the suite log or the test log. I just created TT-180 to ask John for help.

          Are the MDT and OST e2fsck runs in the same VM on the SFA10k, or is the MDT on a separate MDS node?

          The MDT and OST are in the same VM.

          Until TT-180 is fixed, please find the attached file large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build273.log, which contains the test output of the inode creation + e2fsck test on the following builds:

          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/273/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/


          adilger Andreas Dilger added a comment -

          After running for about 120 hours, the inode creation and e2fsck tests passed on the 128TB Lustre filesystem.
          Please refer to the attached test output file: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build263.log

          Yu Jian, I'm looking at the log file and found some strange results.

          Firstly, do you know why none of the large-LUN-inodes test results in Maloo include the test logs? That makes it hard to look at the results in the future if there is reason to do so. I wanted to see the e2fsck times for the many-inodes runs, but only have the one test result above to look at. Could you please file a separate TT- bug to fix whatever problem is preventing the logs for this test from being sent to Maloo?

          Looking at the above log, it seems that the MDT (with 25 dirs of 5M files each) took only 7 minutes to run e2fsck, while the OST (with 32 dirs of 4M files each) took 3500 minutes (58 hours) to run. That doesn't make sense, and I wanted to compare this to the most recent large-LUN-inodes test result, which took about 20 hours less to run.

          Are the MDT and OST e2fsck runs in the same VM on the SFA10k, or is the MDT on a separate MDS node?
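
          For reference, a simple way to time the MDT and OST checks separately and get per-pass breakdowns (the device paths below are placeholders, not the actual SFA10K device names) would be something like:

          # time e2fsck -f -n -tt /dev/mapper/mdt_dev    # forced read-only check of the MDT; -tt prints per-pass timing
          # time e2fsck -f -n -tt /dev/mapper/ost_dev    # forced read-only check of the OST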

          adilger Andreas Dilger added a comment -

          For the 1.41.90.wc4 e2fsprogs I've cherry-picked a couple of recent 64-bit fixes from upstream:

          commit bc526c65d2a4cf0c6c04e9ed4837d6dd7dbbf2b3
          Author: Theodore Ts'o <tytso@mit.edu>
          Date: Tue Jul 5 20:35:46 2011 -0400

          libext2fs: fix 64-bit support in ext2fs_bmap2()

          Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

          commit 24404aa340b274e077b2551fa7bdde5122d3eb43
          Author: Theodore Ts'o <tytso@mit.edu>
          Date: Tue Jul 5 20:02:27 2011 -0400

          libext2fs: fix 64-bit support in ext2fs_{read,write}_inode_full()

          This fixes a problem where reading or writing inodes located after the
          4GB boundary would fail.

          Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

          The first one is unlikely to affect most uses, but may hit in rare cases.
          The second one is only a problem on 32-bit machines, so is unlikely to affect Lustre users.
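
          For reference, since cherry-picked commits get new hashes, a straightforward way to confirm that both fixes are present in a checked-out e2fsprogs tree is to search the commit messages rather than look for the upstream commit IDs:

          # git log --oneline --grep='fix 64-bit support in ext2fs_'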

          I don't think there is anything left to do for this bug, so it can be closed.

          yujian Jian Yu added a comment - - edited

          After the issue is resolved, I'll complete the e2fsck part.

          OK, the issue is now resolved. The testing has been restarted with the following builds:

          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/263/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/

          After running for about 120 hours, the inode creation and e2fsck tests passed on the 128TB Lustre filesystem.
          Please refer to the attached test output file: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build263.log

          yujian Jian Yu added a comment -

          Yu Jian, I looked through the inodes run, but I didn't see it running e2fsck on the large LUN. That should be added as part of the test script if it isn't there today. If the LUN with the 135M files still exists, can you please start an e2fsck on both the MDS and the OST?

          Sorry for the confusion, Andreas. The e2fsck part is in the test script. While running e2fsck on the OST after creating the 134M files, the following errors occurred on the virtual disks which were presented to the virtual machine:

          --------8<--------
          kernel: janusdrvr: WARNING: cpCompleteIoReq(): Req Context ID 0x0 completed with error status 0x7
          kernel: end_request: I/O error, dev sfa0066, sector 0
          kernel: Buffer I/O error on device sfa0066, logical block 0
          kernel: janusdrvr: WARNING: cpCompleteIoReq(): Req Context ID 0x1 completed with error status 0x7
          kernel: end_request: I/O error, dev sfa0066, sector 0
          kernel: Buffer I/O error on device sfa0066, logical block 0
          --------8<-------- 
          

          The same issue also occurred on disks presented to other virtual machines, and then all of the disks became invisible. I tried rebooting the virtual machine and reloading the disk driver, but that did not work. I think it is a hardware issue, so I removed the incomplete e2fsck part from the test result and uploaded only the complete inode creation part.

          After the issue is resolved, I'll complete the e2fsck part.


          adilger Andreas Dilger added a comment -

          Yu Jian, I looked through the inodes run, but I didn't see it running e2fsck on the large LUN. That should be added as part of the test script if it isn't there today. If the LUN with the 135M files still exists, can you please start an e2fsck on both the MDS and the OST?
          yujian Jian Yu added a comment -

          After running for about 53 hours, the test passed at Thu Aug 11 04:41:09 PDT 2011:
          https://maloo.whamcloud.com/test_sets/af225374-c72b-11e0-a7e2-52540025f9af

          The test log did not show up in the above Maloo report. Please find it in the attachment: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.log.

          yujian Jian Yu added a comment - - edited

          The "large-LUN-inodes" testing is going to be started on the latest master branch...

          The inode creation testing on the 128TB Lustre filesystem against the master branch on CentOS5.6/x86_64 (kernel version: 2.6.18-238.19.1.el5_lustre.gd4ea36c) was started at Mon Aug 8 22:51:49 PDT 2011. About 134M inodes were to be created.

          The following builds were used:
          Lustre build: http://newbuild.whamcloud.com/job/lustre-master/246/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
          e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/

          After running for about 53 hours, the test passed at Thu Aug 11 04:41:09 PDT 2011:
          https://maloo.whamcloud.com/test_sets/af225374-c72b-11e0-a7e2-52540025f9af

          Here is a short summary of the test result after running mdsrate with "--create" option:

          # /opt/mpich/bin/mpirun  -np 25 -machinefile /tmp/mdsrate-create.machines /usr/lib64/lustre/tests/mdsrate --create --verbose --ndirs 25 --dirfmt '/mnt/lustre/mdsrate/dir%d' --nfiles 5360000 --filefmt 'file%%d'
          
          Rate: 694.17 eff 694.18 aggr 27.77 avg client creates/sec (total: 25 threads 134000000 creates 25 dirs 1 threads/dir 193035.50 secs)
          
          # lfs df -h /mnt/lustre
          UUID                       bytes        Used   Available Use% Mounted on
          largefs-MDT0000_UUID        1.5T       13.6G        1.4T   1% /mnt/lustre[MDT:0]
          largefs-OST0000_UUID      128.0T        3.6G      121.6T   0% /mnt/lustre[OST:0]
          
          filesystem summary:       128.0T        3.6G      121.6T   0% /mnt/lustre
          
          
          # lfs df -i /mnt/lustre
          UUID                      Inodes       IUsed       IFree IUse% Mounted on
          largefs-MDT0000_UUID  1073741824   134000062   939741762  12% /mnt/lustre[MDT:0]
          largefs-OST0000_UUID   134217728   134006837      210891 100% /mnt/lustre[OST:0]
          
          filesystem summary:   1073741824   134000062   939741762  12% /mnt/lustre
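
          For reference, the numbers above are self-consistent: 25 directories of 5,360,000 files is 134,000,000 files, which almost exactly fills the OST inode table (the OST appears to have been formatted with one inode per 1MiB of its 128TiB, hence the 100% IUse%):

          # echo $((25 * 5360000))          # files created by mdsrate
          134000000
          # echo $((128 * 1024 * 1024))     # 128TiB at one inode per 1MiB
          134217728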
          
          yujian Jian Yu added a comment -

          Now, the read operation is ongoing...

          Done.

          After running for about 21 days in total, the 128TB LUN full testing on CentOS5.6/x86_64 (kernel version: 2.6.18-238.12.1.el5_lustre.g5c1e9f9) passed on Lustre master build v2_0_65_0:
          https://maloo.whamcloud.com/test_sets/69c35618-bdd3-11e0-8bdf-52540025f9af

          The "large-LUN-inodes" testing is going to be started on the latest master branch...


          People

            yujian Jian Yu
            adilger Andreas Dilger
            Votes: 0
            Watchers: 2
