LU-12026: verify that MDS stores atime/mtime/ctime during LSOM update

Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.12.0, Lustre 2.13.0

    Description

      In order to make direct inode scanning on the MDT useful, in addition to storing the file size/blocks via LSOM on the MDT, we also need to store the atime/mtime/ctime on the MDT inodes when the LSOM attributes are updated.

      Currently the atime is already lazily updated on the MDS (at close time), but I'm not sure if the final mtime/ctime are sent to the MDS at close time, nor whether they are updated on the MDT inode by the MDS. If this is not being done, then any MDT-only scanning will be broken.
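
      A minimal way to check this by hand, assuming an ldiskfs MDT (the device and mount paths below are examples matching the test output in the comments), is to write a file without truncating it and then compare the client-visible timestamps against what debugfs reports for the MDT inode:

        # Write without O_TRUNC so the truncate path cannot update the MDT
        # timestamps as a side effect; only the close-time update remains.
        dd if=/dev/zero of=/mnt/lustre/test bs=1k count=2 conv=notrunc

        # Timestamps as seen by the client:
        stat /mnt/lustre/test

        # Timestamps actually stored in the MDT inode (run on the MDS):
        debugfs -c -R 'stat ROOT/test' /dev/mapper/mds1_flakey

      If the mtime/ctime shown by debugfs lag behind the values the client reports, the close path is not propagating them to the MDT.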


          Activity


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36869/
            Subject: LU-12026 mdt: MDS stores atime|mtime|ctime during close
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: b67e307f2d332c2ab1643aaa04c19c024a37b22a


            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36869
            Subject: LU-12026 mdt: MDS stores atime|mtime|ctime during close
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: bc85324ec7d8793683005124fd8612d2f56d7082


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36286/
            Subject: LU-12026 mdt: MDS stores atime|mtime|ctime during close
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d2f7cb7934a0b38fa9503e8257f2b70ed656c11d


            gerrit Gerrit Updater added a comment -

            Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/36286
            Subject: LU-12026 mdt: MDS stores atime|mtime|ctime during close
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 051195a24dd851e4412710bdf8a2e6ee22e914bf

            qian_wc Qian Yingjin added a comment -

            Yes, you're right!

             

            After further testing, I found that in some test cases, the timestamps are not being updated on MDT.

            In the previous tests, the timestamps were updated only because the command "dd if=/dev/zero of=/mnt/lustre/test bs=1k count=2" truncates the file when opening it.

            After adding "conv=notrunc", the client and MDT timestamps differ:

             

            [root@qian tests]# dd if=/dev/zero of=/mnt/lustre/test bs=1k count=2 conv=notrunc
            2+0 records in
            2+0 records out
            2048 bytes (2.0 kB) copied, 0.00368011 s, 557 kB/s
            [root@qian tests]# stat /mnt/lustre/test 
              File: '/mnt/lustre/test'
              Size: 2048      	Blocks: 8          IO Block: 4194304 regular file
            Device: 2c54f966h/743766374d	Inode: 144115205272502273  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Context: unconfined_u:object_r:unlabeled_t:s0
            Access: 2019-09-24 10:59:52.000000000 +0800
            Modify: 2019-09-24 21:28:32.000000000 +0800
            Change: 2019-09-24 21:28:32.000000000 +0800
             Birth: -
            [root@qian tests]# debugfs -c -R 'stat ROOT/test' /dev/mapper/mds1_flakey 
            debugfs 1.45.2.wc1 (27-May-2019)
            /dev/mapper/mds1_flakey: catastrophic mode - not reading inode or group bitmaps
            Inode: 162   Type: regular    Mode:  0644   Flags: 0x0
            Generation: 667952766    Version: 0x00000001:00000001
            User:     0   Group:     0   Project:     0   Size: 0
            File ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5d8994b8:00000000 -- Tue Sep 24 11:59:52 2019
             atime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
             mtime: 0x5d8994b8:00000000 -- Tue Sep 24 11:59:52 2019
            crtime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 00 00 00 00 00 00 00 00 01 04 00 00 02 00 00 00 01 00 00 00
             00 00 00 00 
              lma: fid=[0x200000401:0x1:0x0] compat=0 incompat=0
              trusted.lov (56)
              security.selinux (37) = "unconfined_u:object_r:unlabeled_t:s0\000"
              trusted.link (46)
              trusted.som (24) = 04 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 08 00 00 00
             00 00 00 00 
            BLOCKS:
            
            

            I will patch the llite and MDT code to update the mtime and ctime accordingly.

             

            Regards,

            Qian


            adilger Andreas Dilger added a comment -

            Qian, I'm happy for now that we have confirmed the timestamp attributes are being updated on the MDT inodes. It isn't really clear to me where this is being done, though. Looking at the mdt_mfd_close() code:

                    /* Update atime on close only. */
                    if ((open_flags & MDS_FMODE_EXEC || open_flags & MDS_FMODE_READ ||
                         open_flags & MDS_FMODE_WRITE) && (ma->ma_valid & MA_INODE) &&
                        (ma->ma_attr.la_valid & LA_ATIME)) {
                            /* Set the atime only. */
                            ma->ma_valid = MA_INODE;
                            ma->ma_attr.la_valid = LA_ATIME;
                            rc = mo_attr_set(info->mti_env, next, ma);
                    }
            

            it seems that this is only updating atime but not mtime and ctime.

            For now it seems we are doing the right thing, but I'm not yet convinced that we are doing the right thing all the time.

            qian_wc Qian Yingjin added a comment -

            Hi Andreas,

            I did a simple manual test on my local system that verifies the inode timestamps are updated:

            /dev/mapper/mds1_flakey        125368      1956    112176   2% /mnt/lustre-mds1
            /dev/mapper/ost1_flakey        325368     13512    284696   5% /mnt/lustre-ost1
            /dev/mapper/ost2_flakey        325368     13508    284700   5% /mnt/lustre-ost2
            192.168.150.128@tcp:/lustre    650736     27020    569396   5% /mnt/lustre
            
            [root@qian tests]# dd if=/dev/zero of=/mnt/lustre/test bs=1k count=1
            1+0 records in
            1+0 records out
            1024 bytes (1.0 kB) copied, 0.000959122 s, 1.1 MB/s
            
            [root@qian tests]# sleep 5
            
            [root@qian tests]# stat /mnt/lustre/test 
              File: '/mnt/lustre/test'
              Size: 1024      	Blocks: 8          IO Block: 4194304 regular file
            Device: 2c54f966h/743766374d	Inode: 144115205272502273  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Context: unconfined_u:object_r:unlabeled_t:s0
            Access: 2019-09-24 10:59:52.000000000 +0800
            Modify: 2019-09-24 10:59:52.000000000 +0800
            Change: 2019-09-24 10:59:52.000000000 +0800
             Birth: -
            
            [root@qian tests]# debugfs -c -R 'stat ROOT/test' /dev/mapper/mds1_flakey 
            debugfs 1.45.2.wc1 (27-May-2019)
            /dev/mapper/mds1_flakey: catastrophic mode - not reading inode or group bitmaps
            Inode: 162   Type: regular    Mode:  0644   Flags: 0x0
            Generation: 667952766    Version: 0x00000001:00000001
            User:     0   Group:     0   Project:     0   Size: 0
            File ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
             atime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
             mtime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
            crtime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 00 00 00 00 00 00 00 00 01 04 00 00 02 00 00 00 01 00 00 00
             00 00 00 00 
              lma: fid=[0x200000401:0x1:0x0] compat=0 incompat=0
              trusted.lov (56)
              security.selinux (37) = "unconfined_u:object_r:unlabeled_t:s0\000"
              trusted.link (46)
              trusted.som (24) = 04 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 08 00 00 00
             00 00 00 00 
            BLOCKS:
            
            [root@qian tests]# dd if=/dev/zero of=/mnt/lustre/test bs=1k count=1
            1+0 records in
            1+0 records out
            1024 bytes (1.0 kB) copied, 0.000835408 s, 1.2 MB/s
            [root@qian tests]# stat /mnt/lustre/test 
              File: '/mnt/lustre/test'
              Size: 1024      	Blocks: 1          IO Block: 4194304 regular file
            Device: 2c54f966h/743766374d	Inode: 144115205272502273  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Context: unconfined_u:object_r:unlabeled_t:s0
            Access: 2019-09-24 10:59:52.000000000 +0800
            Modify: 2019-09-24 11:53:23.000000000 +0800
            Change: 2019-09-24 11:53:23.000000000 +0800
             Birth: -
            [root@qian tests]# debugfs -c -R 'stat ROOT/test' /dev/mapper/mds1_flakey 
            debugfs 1.45.2.wc1 (27-May-2019)
            /dev/mapper/mds1_flakey: catastrophic mode - not reading inode or group bitmaps
            Inode: 162   Type: regular    Mode:  0644   Flags: 0x0
            Generation: 667952766    Version: 0x00000001:00000001
            User:     0   Group:     0   Project:     0   Size: 0
            File ACL: 0
            Links: 1   Blockcount: 0
            Fragment:  Address: 0    Number: 0    Size: 0
             ctime: 0x5d899333:00000000 -- Tue Sep 24 11:53:23 2019
             atime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
             mtime: 0x5d899333:00000000 -- Tue Sep 24 11:53:23 2019
            crtime: 0x5d8986a8:9963a170 -- Tue Sep 24 10:59:52 2019
            Size of extra inode fields: 32
            Extended attributes:
              trusted.lma (24) = 00 00 00 00 00 00 00 00 01 04 00 00 02 00 00 00 01 00 00 00
             00 00 00 00 
              lma: fid=[0x200000401:0x1:0x0] compat=0 incompat=0
              trusted.lov (56)
              security.selinux (37) = "unconfined_u:object_r:unlabeled_t:s0\000"
              trusted.link (46)
              trusted.som (24) = 04 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 
            BLOCKS:
            
            
            

            Should I write a test script in sanity.sh to verify it later?

             

            Regards,

            Qian

            qian_wc Qian Yingjin added a comment -

            Sure, I will work on it soon. Sorry for the late reply.


            adilger Andreas Dilger added a comment -

            Yingjin, can you please at least look at the MDT code and/or run a quick manual test to verify that the mtime and ctime are updated on the MDT inode when the file is closed after a write? It would need a test that creates/writes a file, sleeps for 5s, then writes it again, and then checks the MDT with debugfs to see whether the inode timestamps are updated.

            If this is not working for 2.13 then we need to make a patch and backport it to 2.12/EXA5, otherwise MDT scanning will not work properly. If it is working properly then adding a test is less critical.


            adilger Andreas Dilger added a comment -

            It would also be useful to add a sanity test case for this, to verify that it is working properly (e.g. by checking the "lfs find" RPC count to verify that it didn't do a glimpse on the file when scanning for -mtime or -size).
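
            A rough sketch of what such a sanity.sh test might look like follows. The test number is arbitrary, and the assumption that client glimpse RPCs appear as a "glimpse" counter in osc.*.stats would need to be checked against the statistics actually exported by the version under test:

            test_900() { # test number is a placeholder for this sketch
                    local file=$DIR/$tfile

                    dd if=/dev/zero of=$file bs=1k count=2 conv=notrunc ||
                            error "writing $file failed"
                    sync

                    # drop cached OST locks so the scan cannot be satisfied
                    # from attributes already cached on the client
                    cancel_lru_locks osc

                    # snapshot the client glimpse count before the scan
                    # (counter name is an assumption, see the note above)
                    local before=$($LCTL get_param -n osc.*.stats |
                            awk '/glimpse/ { sum += $2 } END { print sum + 0 }')

                    $LFS find $file -mtime -1 -size +1k > /dev/null ||
                            error "lfs find failed"

                    local after=$($LCTL get_param -n osc.*.stats |
                            awk '/glimpse/ { sum += $2 } END { print sum + 0 }')

                    (( after == before )) ||
                            error "lfs find sent $((after - before)) glimpse RPC(s)"
            }
            run_test 900 "lfs find uses MDT-stored LSOM data without glimpse RPCs"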


            People

              Assignee: qian_wc Qian Yingjin
              Reporter: adilger Andreas Dilger
              Votes: 0
              Watchers: 6
