
Review consistently fails when running with 300 OSTs - wide stripe testing.

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.3.0
    • None
    • 3
    • 10456

    Description

      When running reviews with 300 OSTs, several tests reliably fail.

      Sanity: test_27n (out of disk space, so probably a test issue rather than a Lustre issue)
      Sanityn: test_34
      Conf-sanity: test_46a
      Sanity-quota: test_1
      replay-single: times out

      Results here:
      https://maloo.whamcloud.com/test_sessions/21b8e000-d3bb-11e1-90f0-52540035b04c
      https://maloo.whamcloud.com/test_sessions/360f11e4-d3ea-11e1-a98e-52540035b04c
      https://maloo.whamcloud.com/test_sessions/e03561de-d3cf-11e1-90f0-52540035b04c
      https://maloo.whamcloud.com/test_sessions/f94947d8-d449-11e1-a98e-52540035b04c

      Note these results were found during my development testing of autotest, but they are so repeatable that I'm convinced they are real. If wide-stripe testing is not part of production autotest when this is debugged, please ask for my help (chris).

          Activity

            [LU-1658] Review consistently fails when running with 300 OSTs - wide stripe testing.

            adilger Andreas Dilger added a comment -

            Fixed as part of patch https://review.whamcloud.com/28425 "LU-9846 lod: Add overstriping support"
            yujian Jian Yu added a comment -

            The next step for this ticket is to make autotest perform the wide-stripe testing with MDS_FS_MKFS_OPTS="-O large_xattr" on a specific test cluster, so as not to disturb the normal review/full testing. We need Chris's help to do this. After new results are reported, I'll vet them.

            pjones Peter Jones added a comment -

            Lowering priority because the testing that triggered these failures is not rolled into production yet and so the issue will not affect release testing.

            yujian Jian Yu added a comment - - edited

            Chris wrote: "This is surely a Lustre issue. If this flag is required, then should mkfs_opts not add the flag? That is the point of mkfs_opts in the test framework."

            Currently the large xattr feature (wide striping) is disabled by default in Lustre. To test this feature, the "-O large_xattr" option needs to be set on the MDT, either with --mkfsoptions at format time or via tune2fs.
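
            The two ways of setting the option mentioned above can be sketched as a dry run that only prints the commands instead of executing them. The device path /dev/mdt_dev_example and the helper name print_large_xattr_cmds are hypothetical, and the other arguments that mkfs.lustre needs are deliberately left out, so treat this as an illustration rather than a tested procedure.

```shell
#!/bin/sh
# Dry run only: print the two candidate commands, do not execute them.
# MDT_DEV is a hypothetical placeholder, not a device from this ticket.
MDT_DEV=${MDT_DEV:-/dev/mdt_dev_example}

print_large_xattr_cmds() {
    # Option 1: pass the feature flag through --mkfsoptions at format time.
    # Other required mkfs.lustre arguments are omitted from this sketch.
    echo "mkfs.lustre --mdt --mkfsoptions='-O large_xattr' $MDT_DEV"
    # Option 2: set the feature on an already formatted MDT device.
    echo "tune2fs -O large_xattr $MDT_DEV"
}

print_large_xattr_cmds
```

            Printing the commands first makes it easy to review what would be run against the MDT before anything is formatted or modified.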

            Chris wrote: "local.cfg does not contain any MKFS_OPTS."

            In cfg/local.sh:

            # Arguments for "--mkfsoptions" shall be specified with these
            # variables:
            #
            #   - <fstype>_MKFS_OPTS
            #   - <facet_type>_FS_MKFS_OPTS
            #
            <~snip~>
            MDS_FS_MKFS_OPTS=${MDS_FS_MKFS_OPTS:-}
            

            Before we decide to enable the feature by default in Lustre, we have to explicitly specify MDS_FS_MKFS_OPTS="-O large_xattr" to test the feature.
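
            The effect of the default assignment quoted from cfg/local.sh can be sketched as follows: a value set before the assignment runs is kept, while an unset variable simply becomes an empty string. The helper apply_local_default is hypothetical and only mimics that single line; it is not part of the test framework.

```shell
#!/bin/sh
# Mimics the single assignment shown from cfg/local.sh (hypothetical helper).
apply_local_default() {
    MDS_FS_MKFS_OPTS=${MDS_FS_MKFS_OPTS:-}
}

# Case 1: nothing set beforehand, so the variable ends up empty.
unset MDS_FS_MKFS_OPTS
apply_local_default
echo "unset beforehand: '${MDS_FS_MKFS_OPTS}'"

# Case 2: set explicitly beforehand, so the value survives the assignment.
MDS_FS_MKFS_OPTS="-O large_xattr"
apply_local_default
echo "set beforehand:   '${MDS_FS_MKFS_OPTS}'"
```

            This is why the option has to be supplied by whoever drives the tests: the default in the configuration file leaves the variable empty unless it was set earlier.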


            chris Chris Gearing (Inactive) added a comment -

            This is surely a Lustre issue. If this flag is required, then should mkfs_opts not add the flag? That is the point of mkfs_opts in the test framework.

            local.cfg does not contain any MKFS_OPTS.
            yujian Jian Yu added a comment -

            Chris wrote: "The errors seem to be the same:

            https://maloo.whamcloud.com/test_sessions/6aeba286-d73e-11e1-ab1c-52540035b04c

            To be clear, autotest is not specifying any mount options; all mount options are within Lustre, and perhaps a change is required there. Perhaps mkfs_opts() needs to be updated."

            Hi Chris,

            In the above test session, the "large_xattr" option was still not specified while formatting the MDT. Could you please specify the following variable in autotest_config.sh?

            MDS_FS_MKFS_OPTS="-O large_xattr"
            

            Thanks.


            chris Chris Gearing (Inactive) added a comment -

            So these results use the local.sh and ncli.sh from the Lustre source itself. I've attached the config file used. I believe this means that if a mount option is incorrect, it needs to be changed in the source, i.e. autotest no longer produces the mount options.

            The errors seem to be the same:

            https://maloo.whamcloud.com/test_sessions/6aeba286-d73e-11e1-ab1c-52540035b04c

            To be clear, autotest is not specifying any mount options; all mount options are within Lustre, and perhaps a change is required there. Perhaps mkfs_opts() needs to be updated.

            chris Chris Gearing (Inactive) added a comment -

            I've fixed up the code so that it makes use of local.sh and ncli.sh from the Lustre sources themselves; much less is now defined by autotest.

            I fear this may throw up new issues, but it is certainly a move forwards.
            yujian Jian Yu added a comment -

            Hi Chris,

            From the test outputs and MDT debug logs, I found that the "large_xattr" option was not specified while formatting the MDT. After http://review.whamcloud.com/#change,2907 landed on the master branch, cfg/local.sh changed a lot, and the variable used to specify arguments for "--mkfsoptions" on the MDT is now MDS_FS_MKFS_OPTS. So, in autotest_config.sh, the following variable should be specified instead of MDSOPT:

            MDS_FS_MKFS_OPTS="-O large_xattr"
            

            The test session you sent to me by email had the "large_xattr" option specified but the journal size was too big:
            https://maloo.whamcloud.com/test_sessions/13920dfa-d0f8-11e1-8d8f-52540035b04c
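
            A pre-flight check along the following lines could catch the case described above, where the option is supplied through MDSOPT but not through MDS_FS_MKFS_OPTS. This is only a sketch: has_large_xattr and check_wide_stripe_config are hypothetical helpers, not functions from autotest or the Lustre test framework, and the check does a plain substring match on the variable contents.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of autotest or the Lustre test framework.

# Returns 0 when the given option string mentions the large_xattr feature.
has_large_xattr() {
    case "$1" in
        *large_xattr*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reports whether the MDT mkfs options request large_xattr via the
# variable that cfg/local.sh reads (MDS_FS_MKFS_OPTS).
check_wide_stripe_config() {
    if has_large_xattr "${MDS_FS_MKFS_OPTS:-}"; then
        echo "ok: MDS_FS_MKFS_OPTS requests large_xattr"
    elif has_large_xattr "${MDSOPT:-}"; then
        echo "warning: large_xattr is only in MDSOPT; set MDS_FS_MKFS_OPTS instead"
        return 1
    else
        echo "warning: large_xattr is not requested for the MDT"
        return 1
    fi
}

MDS_FS_MKFS_OPTS="-O large_xattr"
check_wide_stripe_config
```

            Running such a check before formatting would flag a misconfigured session early, instead of discovering the missing option later from the MDT debug logs.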

            pjones Peter Jones added a comment -

            Yujian

            could you please look into this one?

            Thanks

            Peter


            People

              wc-triage WC Triage
              chris Chris Gearing (Inactive)