[LU-12404] conf-sanity test 69 fails with 'create file after reformat'

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels: None
    • Severity: 3

    Description

      conf-sanity test_69 fails with 'create file after reformat'

      The client test log from the failure at https://testing.whamcloud.com/test_sets/efc1357e-8895-11e9-8c65-52540065bddc shows the following error:

      Starting client: trevis-18vm4.trevis.whamcloud.com:  -o user_xattr,flock trevis-18vm11@tcp:/lustre /mnt/lustre
      CMD: trevis-18vm4.trevis.whamcloud.com mkdir -p /mnt/lustre
      CMD: trevis-18vm4.trevis.whamcloud.com mount -t lustre -o user_xattr,flock trevis-18vm11@tcp:/lustre /mnt/lustre
      touch: cannot touch '/mnt/lustre/d69.conf-sanity/f69.conf-sanity-last': No space left on device
       conf-sanity test_69: @@@@@@ FAIL: create file after reformat 
      

      This looks like LU-8158, but here it is happening on non-SLES clients.

      The OST (vm6) console log shows:

      [37858.976026] Lustre: DEBUG MARKER: /usr/sbin/lctl mark osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 3 sec
      [37859.170354] Lustre: DEBUG MARKER: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 3 sec
      [37862.106450] LustreError: 30205:0:(ofd_dev.c:1709:ofd_create_hdl()) lustre-OST0000: unable to precreate: rc = -28
      [37879.498553] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  conf-sanity test_69: @@@@@@ FAIL: create file after reformat 
      [37879.685068] Lustre: DEBUG MARKER: conf-sanity test_69: @@@@@@ FAIL: create file after reformat
      

      A different ofd_create_hdl() error is seen in LU-8158, but the root cause could be the same.

      We have been seeing this test fail with this ofd_create_hdl() error since 2019-05-27, starting with Lustre version 2.12.53.62. Here are links to a few of the failed test sessions:
      https://testing.whamcloud.com/test_sets/e3585f0a-8148-11e9-a028-52540065bddc
      https://testing.whamcloud.com/test_sets/b1e108b8-8135-11e9-b8e0-52540065bddc
      https://testing.whamcloud.com/test_sets/5cf8cb50-83de-11e9-a028-52540065bddc
      https://testing.whamcloud.com/test_sets/6bd12340-83d2-11e9-af1f-52540065bddc

          Activity

            Patrick Farrell (Inactive) added a comment - Fixed under LU-11760.

            Sergey Cheremencev added a comment - Suggest leaving this open and reverting https://review.whamcloud.com/#/c/33833/ under LU-12404, while the new patch is landed under LU-11760.

            Andreas Dilger added a comment - I put a more detailed comment on how to fix this in LU-11760. Maybe I should have left that ticket closed, and we should track the fix here?
            Andreas Dilger added a comment - edited

            I suspect that this problem is caused by patch https://review.whamcloud.com/33833, which was committed on 2019-05-25 and affects exactly the number of objects created after reformat that test_69() verifies:

            commit d07d9c5ed0aa1d6614944c7d1e0ca55cba301dc4
            Author:     Sergey Cheremencev <c17829@cray.com>
            AuthorDate: Fri Aug 24 17:03:45 2018 +0300
            Commit:     Oleg Drokin <green@whamcloud.com>
            CommitDate: Sat May 25 04:55:51 2019 +0000
            
            LU-11760 ofd: formatted OST recognition change
                
                Modern system is fast enough to create above
                100 000(5 * OST_MAX_PRECREATE) objects during commit interval.
                Increase the difference between MDS last_used ID
                and OST LAST_ID to 500 000 to avoid gaps after OST failover.
            

            The problem is that if the OST filesystem does not have enough free inodes to store an extra 500k objects at recovery time, and the OST has previously created more objects than this, then the OST will run out of space during this test.

            People

              Assignee: Sergey Cheremencev
              Reporter: James Nunez (Inactive)
              Votes: 0
              Watchers: 4
