[LU-4109] sanity test_57b failure: 'MDC before 15214812 != after 15214800'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.2
    • Affects Version/s: Lustre 2.5.0
    • 3
    • 11065

    Description

      While testing HSM features, acceptance-small was run with test results at: https://maloo.whamcloud.com/test_sessions/4574cfe8-35e1-11e3-b051-52540035b04c

      From the test_log:

      == sanity test 57b: default LOV EAs are stored inside large inodes ===== 13:06:10 (1381781170)
      mcreating 100 files
      total: 100 creates in 0.17 seconds: 572.16 creates/second
      Filesystem           1K-blocks      Used Available Use% Mounted on
      mds@o2ib:/scratch     25088052    942228  22886476   4% /lustre/scratch
      opening files to create objects/EAs
      Filesystem           1K-blocks      Used Available Use% Mounted on
      mds@o2ib:/scratch     25088052    942228  22886720   4% /lustre/scratch
       sanity test_57b: @@@@@@ FAIL: MDC before 15214812 != after 15214800 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
        = /usr/lib64/lustre/tests/sanity.sh:4670:test_57b()
        = /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
        = /usr/lib64/lustre/tests/sanity.sh:4678:main()
      

          Activity

            jamesanunez James Nunez (Inactive) added a comment - http://review.whamcloud.com/#/c/9652/ landed to b2_5.
            jamesanunez James Nunez (Inactive) added a comment - Patch for b2_5 at http://review.whamcloud.com/#/c/9652/

            jamesanunez James Nunez (Inactive) added a comment - I'm seeing a similar (same?) error again in 2.5.1-RC3: "sanity test_57b: @@@@@@ FAIL: MDC before 46667332 != after 46667320". The results are at https://maloo.whamcloud.com/test_sessions/f6c5b8e6-aac8-11e3-a41c-52540035b04c. I'm running on the OpenSFS cluster with a setup similar to the one originally reported. It looks like this patch was not backported to b2_5.
            bfaccini Bruno Faccini (Inactive) added a comment (edited) - Patch has landed.

            bfaccini Bruno Faccini (Inactive) added a comment - A patch raising the threshold used to detect out-of-inode LOV EA allocation from 8 to 16 is at http://review.whamcloud.com/8156.
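
To make the effect of that threshold change concrete, here is a minimal sketch of the comparison, applied to the two MDC values from the failure message in this ticket. The helper name `free_space_dropped` is hypothetical, and the exact form of the check in sanity.sh is an assumption; the ticket only shows the resulting error message.

```shell
# Hypothetical helper, not the code in sanity.sh.
# $1 = free space before, $2 = free space after, $3 = allowed margin.
# Succeeds (exit 0) when free space dropped by more than the margin.
free_space_dropped() {
    [ "$2" -lt $(( $1 - $3 )) ]
}

# Values taken from the failure message reported in this ticket.
before=15214812
after=15214800

for margin in 8 16; do
    if free_space_dropped "$before" "$after" "$margin"; then
        echo "margin $margin: FAIL (dropped $((before - after)))"
    else
        echo "margin $margin: ok (dropped $((before - after)))"
    fi
done
```

Under these assumptions, the observed drop of 12 trips a margin of 8 but not a margin of 16.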

            adilger Andreas Dilger added a comment - What if the xattrs are stored more efficiently than 1 per 4kB (e.g. MDT formatted with 1kB blocksize)? I don't mind having a larger margin, but I think the test could start to fail silently if the error margin is too large. I think 32-64kB is large enough to avoid future problems, and smaller than the 400kB that would be seen with 100 4kB blocks or even 400 1kB blocks.
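
The bound on the margin can be sketched as follows, assuming each of the 100 files would need at least one extra block if its LOV EA did not fit in the inode. The helper names are hypothetical, and the 128 kB margin is included only to illustrate a margin large enough to hide the problem on a 1 kB blocksize MDT.

```shell
# Hypothetical helper: kB consumed if every file needs one extra block.
# $1 = number of files, $2 = block size in kB.
worst_case_kb() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: a margin is usable only if it is smaller than
# the worst-case drop. $1 = margin in kB, $2 = files, $3 = block size in kB.
margin_detects_failure() {
    [ "$1" -lt "$(worst_case_kb "$2" "$3")" ]
}

for block_kb in 1 4; do
    for margin_kb in 16 64 128; do
        if margin_detects_failure "$margin_kb" 100 "$block_kb"; then
            echo "${block_kb}kB blocks, ${margin_kb}kB margin: out-of-inode EAs still detected"
        else
            echo "${block_kb}kB blocks, ${margin_kb}kB margin: test would pass silently"
        fi
    done
done
```

With one block per file, 100 files consume 400 kB on 4 kB blocks and 100 kB on 1 kB blocks, so margins in the 32-64 kB range stay below both figures.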

            bfaccini Bruno Faccini (Inactive) added a comment - OK, it was also my understanding that if the LOV EA cannot fit into the MDT inode, at least one additional block per file is needed. In that case, why not do the reverse test and simply detect whether the number of free blocks decreased by 100+ blocks?
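
A rough sketch of that reverse test is shown below. It converts the drop in free space to whole blocks and compares it with the number of files created. The helper name `blocks_consumed` is hypothetical, and both the kB unit of the counters and the 4 kB block size are assumptions, not values confirmed in this ticket.

```shell
# Hypothetical helper: whole blocks consumed between two free-space readings.
# $1 = free kB before, $2 = free kB after, $3 = block size in kB (assumed).
blocks_consumed() {
    echo $(( ($1 - $2) / $3 ))
}

nfiles=100
# MDC values from the failure message in this ticket; 4 kB blocks assumed.
consumed=$(blocks_consumed 15214812 15214800 4)

if [ "$consumed" -ge "$nfiles" ]; then
    echo "FAIL: $consumed blocks consumed for $nfiles files"
else
    echo "ok: only $consumed blocks consumed for $nfiles files"
fi
```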

            adilger Andreas Dilger added a comment - Bruno, test_57b is trying to detect if the LOV EA is too large to fit into the MDT inode for some reason (e.g. formatted with too-small inodes, default LOV EA becomes too large, etc). If 100 files are created and 100 blocks are allocated then the test fails. In this case, only a few blocks are allocated (e.g. llog files, ChangeLog, etc), and the test shouldn't fail, so increasing the margin a small amount is fine.

            jamesanunez James Nunez (Inactive) added a comment - Bruno, The file system was set up and run with pretty much all default settings. I think the striping for this set of tests was a single stripe with a single OSS with two OSTs. So, default striping, but not wide. James

            bfaccini Bruno Faccini (Inactive) added a comment - Andreas, I would like to better understand what sanity/test_57b() tries to detect. What could cause this? Wide default striping vs. the inode size set at mke2fs? If yes, could we anticipate it and compute an appropriate margin? If no, what did I miss or misunderstand? James, do you remember the values of these parameters during your failing test?

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: jamesanunez James Nunez (Inactive)