[LU-1658] Review consistently fails when running with 300 OSTs - wide stripe testing. Created: 23/Jul/12  Updated: 09/Oct/21  Resolved: 09/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Chris Gearing (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None

Attachments: File config.sh    
Issue Links:
Related
is related to LU-9846 Overstriping - more than stripe per O... Resolved
Severity: 3
Rank (Obsolete): 10456

 Description   

When running reviews with 300 OSTs, several tests reliably fail.

Sanity: test_27n (out of disk space, so probably a test issue rather than a Lustre issue)
Sanityn: test_34
Conf-sanity: test_46a
Sanity-quota: test_1
replay-single: times out

Results here:
https://maloo.whamcloud.com/test_sessions/21b8e000-d3bb-11e1-90f0-52540035b04c
https://maloo.whamcloud.com/test_sessions/360f11e4-d3ea-11e1-a98e-52540035b04c
https://maloo.whamcloud.com/test_sessions/e03561de-d3cf-11e1-90f0-52540035b04c
https://maloo.whamcloud.com/test_sessions/f94947d8-d449-11e1-a98e-52540035b04c

Note these results were found during my development testing of autotest, but they are so repeatable that I'm convinced they are real. If wide-stripe testing is not part of production autotest when this is debugged, please ask for my help (chris)



 Comments   
Comment by Peter Jones [ 23/Jul/12 ]

Yujian

could you please look into this one?

Thanks

Peter

Comment by Jian Yu [ 25/Jul/12 ]

Hi Chris,

From the test outputs and MDT debug logs, I found that the "large_xattr" option was not specified while formatting the MDT. After http://review.whamcloud.com/#change,2907 landed on the master branch, cfg/local.sh changed substantially, and the variable used to specify arguments for "--mkfsoptions" on the MDT is now MDS_FS_MKFS_OPTS. So, in autotest_config.sh, the following variable should be specified instead of MDSOPT:

MDS_FS_MKFS_OPTS="-O large_xattr"

The test session you sent to me by email had the "large_xattr" option specified but the journal size was too big:
https://maloo.whamcloud.com/test_sessions/13920dfa-d0f8-11e1-8d8f-52540035b04c

Comment by Chris Gearing (Inactive) [ 25/Jul/12 ]

I've fixed up the code so that it makes use of local.sh and ncli.sh from the Lustre sources themselves; much less is now defined by autotest.

I fear this may throw up new issues, but it is certainly a move forward.

Comment by Chris Gearing (Inactive) [ 26/Jul/12 ]

So these results use the local.sh and ncli.sh from the Lustre source itself. I've attached the config file used. I believe this means that if a mount option is incorrect, it needs to be changed in the source, i.e. autotest no longer produces the mount options.

The errors seem to be the same:

https://maloo.whamcloud.com/test_sessions/6aeba286-d73e-11e1-ab1c-52540035b04c

To be clear, autotest is not specifying any mount options; all mount options are within Lustre, and perhaps a change is required there. Perhaps mkfs_opts() needs to be updated.

Comment by Jian Yu [ 01/Aug/12 ]

The errors seem to be the same;

https://maloo.whamcloud.com/test_sessions/6aeba286-d73e-11e1-ab1c-52540035b04c

To be clear autotest is not specifying any mount options, all mount options are within Lustre and perhaps a change is required there, perhaps mkfs_opts() needs to be updated.

Hi Chris,

In the above test session, the "large_xattr" option was still not specified while formatting the MDT. Could you please specify the following variable in autotest_config.sh?

MDS_FS_MKFS_OPTS="-O large_xattr"

Thanks.

Comment by Chris Gearing (Inactive) [ 02/Aug/12 ]

This is surely a Lustre issue. If this flag is required, should mkfs_opts not add the flag? That is the point of mkfs_opts in the test framework.

local.cfg does not contain any MKFS_OPTS.

Comment by Jian Yu [ 03/Aug/12 ]

This is a lustre issue surely. If this flag is required then should mkfs_opts not add the flag. That is the point of mkfs_opts in the test framework.

Currently the large xattr feature (wide striping) is disabled by default in Lustre. To test this feature, the "-O large_xattr" option needs to be set on the MDT, either with --mkfsoptions at format time or afterwards via tune2fs.
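
The two ways of setting the option can be sketched as a dry run that only prints the relevant argument or command and executes nothing. MDT_DEV and its value are hypothetical and used for illustration only; the sketch also assumes the installed e2fsprogs recognizes the large_xattr feature.

```shell
#!/bin/sh
# Dry-run sketch: print, but do not execute, the two ways of enabling large_xattr.
# MDT_DEV is a hypothetical device path used only for illustration.
MDT_DEV=${MDT_DEV:-/dev/example_mdt}
FEATURE_OPT="-O large_xattr"

# Option 1: pass the feature at format time through --mkfsoptions.
format_time_arg="--mkfsoptions=\"${FEATURE_OPT}\""

# Option 2: enable the feature on an already formatted MDT with tune2fs.
tune2fs_cmd="tune2fs ${FEATURE_OPT} ${MDT_DEV}"

echo "format time:  ${format_time_arg}"
echo "after format: ${tune2fs_cmd}"
```

Printing the strings rather than running them keeps the sketch safe to try on any machine; on a real test node the device path would have to be replaced with the actual MDT device.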

local.cfg does not contain any MKFS_OPTS.

In cfg/local.sh:

# Arguments for "--mkfsoptions" shall be specified with these
# variables:
#
#   - <fstype>_MKFS_OPTS
#   - <facet_type>_FS_MKFS_OPTS
#
<~snip~>
MDS_FS_MKFS_OPTS=${MDS_FS_MKFS_OPTS:-}

Before we decide to enable the feature by default in Lustre, we have to explicitly specify MDS_FS_MKFS_OPTS="-O large_xattr" to test the feature.
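
As a minimal sketch of why setting the variable before the configuration is read is sufficient: the "${VAR:-}" expansion quoted from cfg/local.sh keeps any value the caller has already set and falls back to an empty string otherwise. The helper resolve_mds_mkfs_opts below is hypothetical and not part of the Lustre test framework; it only reproduces that one assignment.

```shell
#!/bin/sh
# Sketch: how a "${VAR:-}" default keeps a value already set by the caller.
# resolve_mds_mkfs_opts is a hypothetical helper, not a test-framework function.
resolve_mds_mkfs_opts() {
    # Same expansion as in cfg/local.sh: keep the caller's value, else empty.
    MDS_FS_MKFS_OPTS=${MDS_FS_MKFS_OPTS:-}
    printf '%s\n' "$MDS_FS_MKFS_OPTS"
}

# Variable unset: the default is empty, so large_xattr is not requested.
unset MDS_FS_MKFS_OPTS
default_opts=$(resolve_mds_mkfs_opts)

# Variable set beforehand (for example in autotest_config.sh): the value survives.
MDS_FS_MKFS_OPTS="-O large_xattr"
wide_opts=$(resolve_mds_mkfs_opts)

echo "default:     '${default_opts}'"
echo "wide stripe: '${wide_opts}'"
```

With the variable unset the result is empty, which matches the observation that the MDT was formatted without large_xattr.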

Comment by Peter Jones [ 09/Aug/12 ]

Lowering priority because the testing that triggered these failures is not rolled into production yet and so the issue will not affect release testing.

Comment by Jian Yu [ 04/Sep/12 ]

The next step for this ticket is to make autotest perform the wide stripe testing with MDS_FS_MKFS_OPTS="-O large_xattr" on a specific test cluster (so as not to disturb the normal review/full testing). We need Chris's help to do this. After new results are reported, I'll vet them.

Comment by Andreas Dilger [ 09/Oct/21 ]

Fixed as part of patch https://review.whamcloud.com/28425 "LU-9846 lod: Add overstriping support"
