Details
Type: Task
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.1.0, Lustre 1.8.6
Labels: None
16,038
4966
Description
In order for Lustre to use OSTs larger than 16TB, the e2fsprogs "master" branch needs to be tested against such large LUNs. The "master" branch has unreleased modifications that should allow mke2fs, e2fsck, and other tools to use LUNs over 16TB, but it has not been heavily tested at this point.
Bruce, I believe we previously discussed a test plan for this work, using llverdev and llverfs. Please attach a document or comment here with details. The testing for 16TB LUNs is documented in https://bugzilla.lustre.org/show_bug.cgi?id=16038.
After the local ldiskfs filesystem testing is complete, obdfilter-survey and full Lustre client testing are needed.
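As a rough sketch of the local ldiskfs test flow described above (illustration only; the device path, mount point, and exact mke2fs options are assumptions, not the agreed test plan):

# Hypothetical >16TB LUN and mount point; substitute the real ones
DEV=/dev/sdX
MNT=/mnt/ostfs

# Block-device verification with llverdev (-p = fast partial pass, -l = full long pass)
llverdev -v -p $DEV
llverdev -v -l $DEV

# Format with the e2fsprogs "master" tools, enabling 64-bit block numbers,
# then run a forced e2fsck on the empty filesystem
mke2fs -t ext4 -O 64bit,huge_file $DEV
e2fsck -fv $DEV

# Mount and repeat the verification through the filesystem with llverfs
mount -t ext4 $DEV $MNT
llverfs -v -p $MNT
llverfs -v -l $MNT

The obdfilter-survey and full Lustre client runs mentioned above would then follow against a formatted Lustre OST rather than a plain ext4 filesystem.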
Attachments
Activity
Firstly, do you know why none of the large-LUN-inodes test results in Maloo include the test logs? That makes it hard to look at the results in the future if there is a reason to do so. I wanted to see the e2fsck times for the many-inodes runs, but only have the one test result above to look at. Could you please file a separate TT- bug to fix whatever problem is preventing the logs for this test from being sent to Maloo?
I have no idea about this issue. The syslog is displayed, but not the suite log or the test log. I just created TT-180 to ask John for help.
Are the MDT and OST e2fsck runs in the same VM on the SFA10k, or is the MDT on a separate MDS node?
The MDT and OST are in the same VM.
Until TT-180 is fixed, please find the attached large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build273.log file for the test output of the inode creation + e2fsck test on the following builds:
Lustre build: http://newbuild.whamcloud.com/job/lustre-master/273/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/
After running for about 120 hours, the inode creation and e2fsck tests passed on the 128TB Lustre filesystem.
Please refer to the attached test output file: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build263.log
Yu Jian, I'm looking at the log file, and found some strange results.
Firstly, do you know why none of the large-LUN-inodes test results in Maloo include the test logs? That makes it hard to look at the results in the future if there is a reason to do so. I wanted to see the e2fsck times for the many-inodes runs, but only have the one test result above to look at. Could you please file a separate TT- bug to fix whatever problem is preventing the logs for this test from being sent to Maloo?
Looking at the above log, it seems that the MDT (with 25 dirs of 5M files each) took only 7 minutes to run e2fsck, while the OST (with 32 dirs of 4M files each) took 3500 minutes (58 hours) to run. That doesn't make sense, and I wanted to compare this to the most recent large-LUN-inodes test result, which took 20h less time to run.
Are the MDT and OST e2fsck runs in the same VM on the SFA10k, or is the MDT on a separate MDS node?
For the 1.41.90.wc4 e2fsprogs I've cherry-picked a couple of recent 64-bit fixes from upstream:
commit bc526c65d2a4cf0c6c04e9ed4837d6dd7dbbf2b3
Author: Theodore Ts'o <tytso@mit.edu>
Date: Tue Jul 5 20:35:46 2011 -0400
libext2fs: fix 64-bit support in ext2fs_bmap2()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
commit 24404aa340b274e077b2551fa7bdde5122d3eb43
Author: Theodore Ts'o <tytso@mit.edu>
Date: Tue Jul 5 20:02:27 2011 -0400
libext2fs: fix 64-bit support in ext2fs_{read,write}_inode_full()
This fixes a problem where reading or writing inodes located after the
4GB boundary would fail.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The first one is unlikely to affect most uses, but may hit in rare cases.
The second one is only a problem on 32-bit machines, so is unlikely to affect Lustre users.
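For reference, applying these picks to the Whamcloud e2fsprogs tree would look roughly like the following (the remote URL and local branch name are assumptions for illustration; only the commit IDs are taken from the commits quoted above):

git remote add upstream git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git   # assumed upstream location
git fetch upstream
git checkout <local 1.41.90.wc branch>                                       # hypothetical branch name
git cherry-pick bc526c65d2a4cf0c6c04e9ed4837d6dd7dbbf2b3
git cherry-pick 24404aa340b274e077b2551fa7bdde5122d3eb43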
I don't think there is anything left to do for this bug, so it can be closed.
After the issue is resolved, I'll complete the e2fsck part.
OK, now the issue is resolved. The testing was restarted with the following master builds:
Lustre build: http://newbuild.whamcloud.com/job/lustre-master/263/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/
After running for about 120 hours, the inode creation and e2fsck tests passed on the 128TB Lustre filesystem.
Please refer to the attached test output file: large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.build263.log
Yu Jian, I looked through the inodes run, but I didn't see it running e2fsck on the large LUN. That should be added as part of the test script if it isn't there today. If the LUN with the 135M files still exists, can you please start an e2fsck on both the MDS and the OST?
Sorry for the confusion, Andreas. The e2fsck part is in the test script. While running e2fsck on the OST after creating the 134M files, the following errors occurred on the virtual disks which were presented to the virtual machine:
--------8<--------
kernel: janusdrvr: WARNING: cpCompleteIoReq(): Req Context ID 0x0 completed with error status 0x7
kernel: end_request: I/O error, dev sfa0066, sector 0
kernel: Buffer I/O error on device sfa0066, logical block 0
kernel: janusdrvr: WARNING: cpCompleteIoReq(): Req Context ID 0x1 completed with error status 0x7
kernel: end_request: I/O error, dev sfa0066, sector 0
kernel: Buffer I/O error on device sfa0066, logical block 0
--------8<--------
The same issue also occurred on other disks presented to other virtual machines, and then all of the disks became invisible. I tried rebooting the virtual machine and reloading the disk driver, but that did not help. I think it is a hardware issue, so I removed the incomplete e2fsck part from the test result and just uploaded the complete inode creation part.
After the issue is resolved, I'll complete the e2fsck part.
Yu Jian, I looked through the inodes run, but I didn't see it running e2fsck on the large LUN. That should be added as part of the test script if it isn't there today. If the LUN with the 135M files still exists, can you please start an e2fsck on both the MDS and the OST?
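For the record, a forced read-only check of both targets would look something like this (device paths are hypothetical placeholders for the actual MDT and OST LUNs; "time" is only there to capture the e2fsck run time):

# Hypothetical device paths; substitute the real MDT/OST LUNs
time e2fsck -fn /dev/mdt_lun
time e2fsck -fn /dev/ost_lun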
After running for about 53 hours, the test passed at Thu Aug 11 04:41:09 PDT 2011:
https://maloo.whamcloud.com/test_sets/af225374-c72b-11e0-a7e2-52540025f9af
The test log did not show up in the above Maloo report. Please find it in the attachment - large-LUN-inodes.suite_log.ddn-sfa10000e-stack01.log.
The "large-LUN-inodes" testing is going to be started on the latest master branch...
The inode creation testing on the 128TB Lustre filesystem against the master branch on CentOS 5.6/x86_64 (kernel version: 2.6.18-238.19.1.el5_lustre.gd4ea36c) was started at Mon Aug 8 22:51:49 PDT 2011. About 134M inodes will be created.
The following builds were used:
Lustre build: http://newbuild.whamcloud.com/job/lustre-master/246/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/
e2fsprogs build: http://newbuild.whamcloud.com/job/e2fsprogs-master/42/arch=x86_64,distro=el5/
After running for about 53 hours, the test passed at Thu Aug 11 04:41:09 PDT 2011:
https://maloo.whamcloud.com/test_sets/af225374-c72b-11e0-a7e2-52540025f9af
Here is a short summary of the test result after running mdsrate with the "--create" option:
# /opt/mpich/bin/mpirun -np 25 -machinefile /tmp/mdsrate-create.machines /usr/lib64/lustre/tests/mdsrate --create --verbose --ndirs 25 --dirfmt '/mnt/lustre/mdsrate/dir%d' --nfiles 5360000 --filefmt 'file%%d'
Rate: 694.17 eff 694.18 aggr 27.77 avg client creates/sec (total: 25 threads 134000000 creates 25 dirs 1 threads/dir 193035.50 secs)

# lfs df -h /mnt/lustre
UUID                   bytes     Used  Available  Use%  Mounted on
largefs-MDT0000_UUID    1.5T    13.6G       1.4T    1%  /mnt/lustre[MDT:0]
largefs-OST0000_UUID  128.0T     3.6G     121.6T    0%  /mnt/lustre[OST:0]
filesystem summary:   128.0T     3.6G     121.6T    0%  /mnt/lustre

# lfs df -i /mnt/lustre
UUID                      Inodes      IUsed      IFree  IUse%  Mounted on
largefs-MDT0000_UUID  1073741824  134000062  939741762    12%  /mnt/lustre[MDT:0]
largefs-OST0000_UUID   134217728  134006837     210891   100%  /mnt/lustre[OST:0]
filesystem summary:   1073741824  134000062  939741762    12%  /mnt/lustre
Now, the read operation is ongoing...
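As a cross-check on the inode numbers above (my own arithmetic, not from the test log): the OST reports 134217728 inodes, i.e. 2^27, so a 128TiB (2^47-byte) OST works out to one inode per 2^47 / 2^27 = 2^20 bytes = 1MiB of OST space. Filling it with ~134M objects is therefore expected to drive the OST to 100% IUse%, while the MDT, formatted with 2^30 inodes, is only 134000062 / 1073741824 ≈ 12% used.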
Done.
After running for about 21 days in total, the full 128TB LUN testing on CentOS 5.6/x86_64 (kernel version: 2.6.18-238.12.1.el5_lustre.g5c1e9f9) passed on Lustre master build v2_0_65_0:
https://maloo.whamcloud.com/test_sets/69c35618-bdd3-11e0-8bdf-52540025f9af
The "large-LUN-inodes" testing is going to be started on the latest master branch...
TT-180 was just fixed.
Here is the Maloo report for the above test result: https://maloo.whamcloud.com/test_sets/83e2174e-ddfb-11e0-9909-52540025f9af