
Failure on test suite sanity test_39l: atime is not updated

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • None
    • Affects Version/s: Lustre 2.6.0, Lustre 2.9.0, Lustre 2.10.0, Lustre 2.11.0
    • None
    • Environment: server: lustre-master build # 1937; client: SLES11 SP3
    • Severity: 3
    • 13108

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/5b796592-aa08-11e3-b4b1-52540035b04c.

      The sub-test test_39l failed with the following error:

      atime is not updated from future: 1426094124, 1394558125<atime<1394558125

      == sanity test 39l: directory atime update =========================================================== 10:15:24 (1394558124)
      CMD: shadow-8vm12 lctl get_param -n mdd.*MDT0000*.atime_diff
       sanity test_39l: @@@@@@ FAIL: atime is not updated from future: 1426094124, 1394558125<atime<1394558125 
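
      The future atime in the failure message is exactly one year (31,536,000 seconds) ahead of the test start time shown in the log. The check is of the following general form (a minimal sketch, not the actual sanity.sh test_39l code; $DIR and $tdir are the usual Lustre test-framework names): set the directory atime into the future, access the directory, and verify the atime comes back into the window of that access.

      # illustrative sketch only, not the real sanity.sh test_39l
      mkdir -p $DIR/$tdir
      touch -a -d "@$(( $(date +%s) + 365 * 86400 ))" $DIR/$tdir  # set atime one year into the future
      start=$(date +%s)
      ls $DIR/$tdir > /dev/null                                   # access the directory
      end=$(date +%s)
      atime=$(stat -c %X $DIR/$tdir)
      [ $atime -ge $start -a $atime -le $end ] ||
          echo "FAIL: atime is not updated from future: $atime, $start<atime<$end"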
      

      Attachments

        Issue Links

          Activity

            [LU-4765] Failure on test suite sanity test_39l: atime is not updated

            adilger Andreas Dilger added a comment - Close as duplicate of LU-1739.
            bogl Bob Glossman (Inactive) added a comment (edited) - another on master: https://testing.hpdd.intel.com/test_sets/6bb96e0e-15b0-11e7-8920-5254006e85c2
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/3c35c808-08ae-11e7-9053-5254006e85c2

            bogl Bob Glossman (Inactive) added a comment -

            Wondering if the recent rash of these failures has something to do with the mod landed this year, "LU-8041 mdd: increasing only atime update on close". I note this includes changes to test 39l.
            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/3f9ace90-c8ca-11e6-8911-5254006e85c2

            adilger Andreas Dilger added a comment -

            In general, I think 30-day results are less useful than, say, 7-day results, because a problem may have happened many times three weeks ago but was fixed and hasn't happened at all in the past week. If you are reporting on a problem that is still happening regularly (i.e. it also happened 8 times in the past week) then please include that information in the summary.

            For example, the sanityn test_77[a-e] failures appear even in the 7-day results, but not at all in the 3-day results since the problem was resolved earlier in the week.

            standan Saurabh Tandan (Inactive) added a comment -

            This issue has occurred around 44 times in the past 30 days.
            sarah Sarah Liu added a comment -

            failure seen in interop testing between 2.9 server and 2.8 client:
            server: master/#3385 RHEL7.2
            client: 2.8.0 RHEL6.7

            https://testing.hpdd.intel.com/test_sets/a9ca904c-2d62-11e6-80b9-5254006e85c2

            bogl Bob Glossman (Inactive) added a comment - another one seen on sles11sp3: https://testing.hpdd.intel.com/test_sets/14d6386e-8c7e-11e4-b81b-5254006e85c2

            adilger Andreas Dilger added a comment -

            What your test was missing is that the MDS does not write the atime update to disk unless the atime is at least 60s old. The atime is updated on the MDT at close time only when the atime change exceeds the mdd.*.atime_diff tunable, so that this doesn't cause numerous updates at close time when the same file is opened by thousands of clients.

            mds# lctl get_param mdd.*.atime_diff
            mdd.myth-MDT0000.atime_diff=60
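
            As a side note, the threshold is a tunable, so the write-out delay can be lowered temporarily on the MDS for testing; the sketch below assumes a value of 0 means any newer atime is written out at close.

            mds# lctl set_param mdd.*.atime_diff=0     # assumed: any newer atime is then written out at close
            client$ cat /myth/tmp/f9 > /dev/null       # access that should now update the on-disk atime
            mds# lctl set_param mdd.*.atime_diff=60    # restore the default shown above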
            

            In Jinshan's example, the atime happened to be only 9s old (07:26:39 vs. 07:26:48) so it was not updated on the MDT on disk. Testing this on an old file I have on my filesystem (which you can see has already had the atime updated due to previous accesses):

            $ stat /myth/tmp/f9
              File: `/myth/tmp/f9'
            :
            Access: 2014-06-23 12:59:48.000000000 -0600
            Modify: 2013-05-28 22:04:47.000000000 -0600
            Change: 2013-11-27 15:26:07.000000000 -0700
            $ cat /myth/tmp/f9 > /dev/null
            $ stat /myth/tmp/f9
            :
            Access: 2014-09-15 12:38:12.000000000 -0600
            $ sudo lctl set_param ldlm.namespaces.*.lru_size=clear
            ldlm.namespaces.MGC192.168.20.1@tcp.lru_size=clear
            ldlm.namespaces.myth-MDT0000-mdc-ffff8800377fc000.lru_size=clear
            ldlm.namespaces.myth-OST0000-osc-ffff8800377fc000.lru_size=clear
            ldlm.namespaces.myth-OST0001-osc-ffff8800377fc000.lru_size=clear
            ldlm.namespaces.myth-OST0002-osc-ffff8800377fc000.lru_size=clear
            ldlm.namespaces.myth-OST0003-osc-ffff8800377fc000.lru_size=clear
            ldlm.namespaces.myth-OST0004-osc-ffff8800377fc000.lru_size=clear
            $ stat /myth/tmp/f9
            :
            Access: 2014-09-15 12:38:12.000000000 -0600
            Modify: 2013-05-28 22:04:47.000000000 -0600
            Change: 2013-11-27 15:26:07.000000000 -0700
            

            I also verified with debugfs that this inode's atime was updated on disk on the MDT.
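
            A check of that sort looks roughly like the following (the MDT device name is hypothetical, and it assumes the ldiskfs namespace is visible under ROOT/):

            mds# debugfs -R "stat ROOT/tmp/f9" /dev/mapper/mdt0   # debugfs opens read-only by default and prints the inode's atime/mtime/ctime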

            The original problem reported in this bug is quite different. In newer kernels, setting the atime into the past (e.g. after accessing a file with "tar" for backup and trying to reset the atime to the original value) does not affect filesystems mounted using the "relatime" option. This bug has not been fixed in upstream kernels yet, and may never be.
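
            The pattern in question is roughly the following (the path is illustrative); on a relatime mount with such a kernel, the final step may simply not take effect:

            client$ old_atime=$(stat -c %X /mnt/lustre/somefile)    # save the original atime
            client$ cat /mnt/lustre/somefile > /dev/null            # the backup read bumps the atime
            client$ touch -a -d "@$old_atime" /mnt/lustre/somefile  # try to reset the atime into the past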

            PS: setting "lru_size=0" doesn't necessarily clear the DLM locks from the client, but rather disables the static DLM LRU maximum and enables dynamic LRU sizing. Use "lctl set_param ldlm.namespaces.*.lru_size=clear" to clear the DLM lock cache on the client.
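
            That is, the two settings behave quite differently:

            client$ lctl set_param ldlm.namespaces.*.lru_size=0      # switch to dynamic LRU sizing; cached locks are kept
            client$ lctl set_param ldlm.namespaces.*.lru_size=clear  # actually drop the cached DLM locks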


            People

              Assignee: Bob Glossman (Inactive)
              Reporter: Maloo
              Votes: 0
              Watchers: 9
