[LU-1783] sanity test_39l failed: atime is not updated

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: Lustre 2.4.0
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Labels: None
    • Environment: server: lustre-master-tag2.2.93 RHEL6; client: lustre-master-tag2.2.93 SLES11
    • Severity: 3
    • Bugzilla ID: 24198
    • 4424

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/ed029fec-ec3b-11e1-ba25-52540035b04c.

      The sub-test test_39l failed with the following error:

      atime is not updated from future: 1377057515, should be 1345521515<atime<1345521515

      20:58:37:== sanity test 39l: directory atime update =========================================================== 20:58:35 (1345521515)
      20:58:37:CMD: client-23vm3 lctl get_param -n mdd.*.atime_diff
      20:58:38:flink
      20:58:38:flink2
      20:58:38:fopen
      20:58:38:frename2
      20:58:38:funlink
      20:58:38: sanity test_39l: @@@@@@ FAIL: atime is not updated from future: 1377057515, should be 1345521515<atime<1345521515 
      20:58:38:  Trace dump:
      20:58:38:  = /usr/lib64/lustre/tests/test-framework.sh:3614:error_noexit()
      20:58:38:  = /usr/lib64/lustre/tests/test-framework.sh:3636:error()
      20:58:38:  = /usr/lib64/lustre/tests/sanity.sh:2503:test_39l()
      20:58:38:  = /usr/lib64/lustre/tests/test-framework.sh:3869:run_one()
      20:58:38:  = /usr/lib64/lustre/tests/test-framework.sh:3898:run_one_logged()
      20:58:38:  = /usr/lib64/lustre/tests/test-framework.sh:3772:run_test()
      20:58:38:  = /usr/lib64/lustre/tests/sanity.sh:2529:main()
      20:58:38:Dumping lctl log to /logdir/test_logs/2012-08-20/lustre-master-el6-x86_64-sles11-x86_64__787__-7fad9a6fd1d8/sanity.test_39l.*.1345521516.log
      20:58:38:CMD: client-23vm1,client-23vm2,client-23vm3,client-23vm4 /usr/sbin/lctl dk > /logdir/test_logs/2012-08-20/lustre-master-el6-x86_64-sles11-x86_64__787__-7fad9a6fd1d8/sanity.test_39l.debug_log.\$(hostname -s).1345521516.log;
      20:58:38:         dmesg > /logdir/test_logs/2012-08-20/lustre-master-el6-x86_64-sles11-x86_64__787__-7fad9a6fd1d8/sanity.test_39l.dmesg.\$(hostname -s).1345521516.log
      20:58:44:FAIL 39l (8s)
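
      For context, the test pushes the directory atime a year into the future and then expects a later access to pull it back into the window between the test start time and "now"; the failure above shows the future value (1377057515) surviving. Below is a minimal userspace illustration of setting and reading back a future atime (this is not the actual sanity.sh test_39l, which is a shell script; the path and values are only for demonstration):

        /* Set a directory's atime one year ahead with utimensat(), then read it
         * back with stat().  On a strictatime mount a subsequent access should
         * pull the atime back to "now"; under plain relatime the future value
         * can stick, which is what test_39l trips over. */
        #include <stdio.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <time.h>

        int main(int argc, char **argv)
        {
                const char *dir = argc > 1 ? argv[1] : ".";
                struct timespec times[2];
                struct stat st;

                times[0].tv_sec = time(NULL) + 365 * 24 * 3600;  /* atime: one year ahead */
                times[0].tv_nsec = 0;
                times[1].tv_sec = 0;
                times[1].tv_nsec = UTIME_OMIT;                   /* leave mtime untouched */

                if (utimensat(AT_FDCWD, dir, times, 0) != 0) {
                        perror("utimensat");
                        return 1;
                }
                if (stat(dir, &st) != 0) {
                        perror("stat");
                        return 1;
                }
                printf("atime now: %ld (current time: %ld)\n",
                       (long)st.st_atime, (long)time(NULL));
                return 0;
        }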
      


          Activity


            adilger Andreas Dilger added a comment -

            I tried to submit an updated patch upstream (http://www.spinics.net/lists/linux-fsdevel/msg60515.html), but it wasn't accepted due to minor complaints about the patch description and implementation. I didn't get around to resubmitting it at the time, and I suspect there will be general apathy about changing the behaviour of a patch that landed 7 years ago.

            It probably makes more sense to remove the forced MS_STRICTATIME in mount.lustre and change the MDD to default to MAX_ATIME_DIFF=24*3600 to match the kernel relatime behaviour (which will improve normal filesystem performance due to reduced atime updates), and then change sanity.sh test_39l to remove the check that an atime set in the future is reset on normal access, since it seems people don't care about this very much.
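
            A minimal sketch of the kind of server-side filtering MAX_ATIME_DIFF implies, assuming a hypothetical helper name and plain userspace types (the real logic lives in Lustre's MDD layer and is tunable via the mdd.*.atime_diff parameter queried in the log above):

                /* Hypothetical sketch, not the actual MDD code: only persist a new
                 * atime if it moved forward by more than atime_diff seconds.  With a
                 * default of 24*3600 this approximates the kernel relatime behaviour. */
                #include <stdbool.h>
                #include <time.h>

                #define MAX_ATIME_DIFF (24 * 3600)  /* proposed default: one day */

                static bool mdd_atime_needs_update(time_t stored_atime, time_t new_atime,
                                                   time_t atime_diff)
                {
                        /* Small forward movements are ignored to avoid constant MDT writes. */
                        if (new_atime > stored_atime && new_atime - stored_atime < atime_diff)
                                return false;
                        /* Larger jumps, or an atime moved backward by an explicit setattr,
                         * are worth writing to disk. */
                        return new_atime != stored_atime;
                }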

            rread Robert Read added a comment -

            It's been a while since this has been updated. Has the fix landed upstream now?

            adilger Andreas Dilger added a comment - edited

            Seems that this was originally fixed for RHEL6 with https://bugzilla.lustre.org/show_bug.cgi?id=24198 and is now hit on SLES11SP2 as well. The problem is that the kernel defaults to MS_RELATIME, and this causes test_39l to fail to reset an atime in the future.

            Forcing MS_STRICTATIME on all Lustre clients is undesirable, since it adds to the overhead on the client, but since atime handling is done in the VFS we need an upstream patch.

            It looks like Yang Sheng sent a patch upstream for this in http://lkml.org/lkml/2011/1/4/88 22 months ago, which got generally favorable reviews but had a small typo in the comment. That patch needs to be rebased to the latest kernel, have the typo fixed, and preferably have the "future atime check" wrapped in unlikely(), since this will indeed be very unlikely to happen.

            Andrew Morton also requested that the patch have a better commit comment:

            Relatime should update the inode atime if it is more than a day in the
            future.  The original problem seen was a tarball that had a bad atime,
            but could also happen if someone fat-fingers a "touch".  The future
            atime will never be fixed.  Before the relatime patch, the future atime
            would be updated back to the current time on the next access.
            
            Only update the atime if it is more than one day in the future.  That
            avoids thrashing the atime if the clocks on clients of a network fs are
            only slightly out of sync, but still allows fixing bogus atimes.
            

            Please also CC stable@vger.kernel.org so that the patch is more likely to be backported to vendor kernels.
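
            For reference, a rough sketch of the decision being discussed, modelled on the kernel's relatime_need_update() helper in fs/inode.c but written with a hypothetical name and plain userspace types; the last check is the proposed "future atime" addition, wrapped in unlikely() as suggested above. This is an illustration, not the actual submitted patch:

                #include <stdbool.h>
                #include <time.h>

                #define ONE_DAY (24 * 60 * 60)
                #define unlikely(x) __builtin_expect(!!(x), 0)

                static bool relatime_need_update_sketch(time_t atime, time_t mtime,
                                                        time_t ctime_sec, time_t now)
                {
                        /* mtime or ctime newer than atime: the atime is stale, update it. */
                        if (mtime >= atime || ctime_sec >= atime)
                                return true;
                        /* atime more than a day old: update so "recently used" stays visible. */
                        if (now - atime >= ONE_DAY)
                                return true;
                        /* Proposed addition: an atime more than a day in the future is bogus
                         * (bad tarball, fat-fingered "touch") and should be pulled back. */
                        if (unlikely(atime - now >= ONE_DAY))
                                return true;
                        /* Otherwise skip the update and save a metadata write. */
                        return false;
                }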

            bogl Bob Glossman (Inactive) added a comment - edited

            I think this problem in SLES may be a build problem in mount.lustre. If I add an explicit #include of <linux/fs.h> to the source, additional conditional options get compiled into the SLES version of mount.lustre. Now a mount with no options has relatime not set, and all the extra options defined in the man page are recognized and actually work.

            I will submit a patch with this one-line fix and remove the skip of test_39l.
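
            A sketch of the failure mode being described, with illustrative names rather than the actual mount.lustre source: if <linux/fs.h> is not included, the MS_RELATIME/MS_STRICTATIME macros can be left undefined on SLES, so the conditionally compiled option handling silently drops out and the options are rejected or ignored:

                #include <stdio.h>
                #include <string.h>
                #include <linux/fs.h>   /* the missing include: defines MS_RELATIME, MS_STRICTATIME */

                /* Illustrative option parser: without the include above, both #ifdef
                 * blocks compile away and every atime-related option falls through
                 * to "unrecognized", matching the behaviour seen on SLES. */
                static int parse_atime_opt(const char *opt, unsigned long *flags)
                {
                #ifdef MS_STRICTATIME
                        if (strcmp(opt, "strictatime") == 0) {
                                *flags |= MS_STRICTATIME;
                                return 0;
                        }
                #endif
                #ifdef MS_RELATIME
                        if (strcmp(opt, "relatime") == 0) {
                                *flags |= MS_RELATIME;
                                return 0;
                        }
                #endif
                        fprintf(stderr, "unrecognized option: %s\n", opt);
                        return -1;
                }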


            bogl Bob Glossman (Inactive) added a comment -

            Andreas, it turns out you are correct. In SuSE it looks like a Lustre mount always mounts with relatime. This option shows up only in 'cat /proc/mounts', not in mount -v.

            In addition, in spite of being listed in the mount(8) man page, none of the command line options atime, strictatime, or {no}relatime have any effect on the mount. Some aren't even accepted and cause the mount command to fail with "Invalid argument"; others are silently accepted but have no effect, and cat /proc/mounts still shows the Lustre mount has the relatime option set.

            All of those options appear to really work as advertised in el6.


            bogl Bob Glossman (Inactive) added a comment -

            I will revisit to make sure, but I think I tried all permutations of the explicit mount options {no}atime and {no}relatime. Mounting with -o relatime showed that option in mount -v; it didn't show in a default mount. None of the options made the atime flush as it does on the el6 client.


            adilger Andreas Dilger added a comment -

            Bob, is it possible that SLES clients mount with "relatime" enabled by default? That is a mount option that is somewhere between "atime enabled" and "noatime". With "relatime" the atime will only be updated if the file is accessed and the atime is older than the "mtime" on the directory. This could be checked by explicitly mounting the SLES client with the "atime" option.


            bogl Bob Glossman (Inactive) added a comment -

            Test revision to skip the subtest for all SLES11 SPs:
            http://review.whamcloud.com/4605


            bogl Bob Glossman (Inactive) added a comment -

            Another failure, this one on SLES11 SP2:
            https://maloo.whamcloud.com/test_sets/405b4216-2fcb-11e2-866f-52540035b04c

            We will have to extend the test fix to skip this test for SP2 as well, or else a kernel expert will need to dig in to see how SLES differs from RHEL with respect to atime updates.

            sarah Sarah Liu added a comment -

            Another failure on lustre-master build #1011:
            https://maloo.whamcloud.com/test_sets/d0072b9c-2534-11e2-9e7c-52540035b04c

            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 9
