Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.11.0
    • Affects Version/s: Lustre 2.11.0
    • 9223372036854775807

    Description

      This task is to verify that clients running an older version of Lustre behave correctly when they access mirrored files. There should be no system crash or any other problem that prevents old clients from accessing plain and PFL files. However, it is acceptable for old clients to see I/O errors when they try to access mirrored files.

      Andreas once mentioned that when a mirrored file is accessed by an old client, the MDS should be able to construct a fake PFL layout from one of the mirrors so that old clients can still read the data.
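
      As a rough outline, the interop check amounts to the following (a minimal sketch only; the mount point, file name and mirror count are placeholders, and the lfs mirror commands match the ones exercised in the comments below):

      #!/bin/bash
      # Run the "new" steps on a 2.11 client and the "old" steps on a pre-FLR
      # (e.g. 2.10 or 2.9) client mounting the same filesystem.
      MNT=/mnt/lustre            # assumed mount point
      FILE=$MNT/flr-interop      # assumed test file name

      # New (FLR-aware) client: create a mirrored file and populate it.
      lfs mirror create -N2 "$FILE"
      echo foo > "$FILE"
      lfs mirror resync "$FILE"

      # Old client: reads and writes may fail with I/O errors, which is
      # acceptable, but the node must not crash and plain/PFL files must
      # remain accessible.
      cat "$FILE"          || echo "read denied (acceptable for old clients)"
      echo bar >> "$FILE"  || echo "write denied (acceptable for old clients)"

      # New client again: the file must still be consistent and resyncable.
      cat "$FILE"
      lfs mirror resync "$FILE"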

    Activity

            [LU-10286] Verify the behaviors when mirrored files are being accessed by old clients
            yujian Jian Yu added a comment -

            I set up a Lustre filesystem with the following interop configuration on 4 test nodes:

            Client1: onyx-22vm3 (2.10.3 RC1)
            Client2: onyx-22vm5 (2.10.57)
            MDS: onyx-22vm1 (2.10.57)
            OSS: onyx-22vm2 (2.10.57)
            

            On 2.10.57 Client2:

            [root@onyx-22vm5 tests]# lfs mirror create -N -o 1 -N -o 2 -N -o 3 /mnt/lustre/file1
            [root@onyx-22vm5 tests]# stat /mnt/lustre/file1
              File: ‘/mnt/lustre/file1’
              Size: 0               Blocks: 0          IO Block: 4194304 regular empty file
            Device: 2c54f966h/743766374d    Inode: 144115205272502273  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2018-01-21 05:30:15.000000000 +0000
            Modify: 2018-01-21 05:30:15.000000000 +0000
            Change: 2018-01-21 05:30:15.000000000 +0000
             Birth: -
            [root@onyx-22vm5 tests]# cat /mnt/lustre/file1
            

            Then on 2.10.3 Client1:

            [root@onyx-22vm3 ~]# stat /mnt/lustre/file1
              File: ‘/mnt/lustre/file1’
              Size: 0               Blocks: 0          IO Block: 4194304 regular empty file
            Device: 2c54f966h/743766374d    Inode: 144115205272502273  Links: 1
            Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
            Access: 2018-01-21 05:30:15.000000000 +0000
            Modify: 2018-01-21 05:30:15.000000000 +0000
            Change: 2018-01-21 05:30:15.000000000 +0000
             Birth: -
            [root@onyx-22vm3 ~]# cat /mnt/lustre/file1
            

            Then on 2.10.57 Client2:

            [root@onyx-22vm5 tests]# echo foo > /mnt/lustre/file1
            [root@onyx-22vm5 tests]# lfs mirror resync /mnt/lustre/file1
            [root@onyx-22vm5 tests]# cat /mnt/lustre/file1
            foo
            

            Then on 2.10.3 Client1:

            [root@onyx-22vm3 ~]# cat /mnt/lustre/file1
            foo
            [root@onyx-22vm3 ~]# echo goo >> /mnt/lustre/file1
            [root@onyx-22vm3 ~]# cat /mnt/lustre/file1
            foo
            goo
            

            Then on 2.10.57 Client2:

            [root@onyx-22vm5 tests]# cat /mnt/lustre/file1
            foo
            [root@onyx-22vm5 tests]# lfs mirror resync /mnt/lustre/file1
            lfs mirror resync: '/mnt/lustre/file1' file state error: ro.
            

            2.10.3 Client1 wrote "goo" into the mirrored file /mnt/lustre/file1, but 2.10.57 Client2 did not see the update, and "lfs mirror resync" refused to run because the file state was still "ro" (the old client's write did not mark the other mirrors stale).

            Then on 2.10.57 Client2:

            [root@onyx-22vm5 tests]# echo hoo >> /mnt/lustre/file1
            [root@onyx-22vm5 tests]# lfs mirror resync /mnt/lustre/file1
            [root@onyx-22vm5 tests]# cat /mnt/lustre/file1
            foo
            goo
            hoo
            

            The file data were updated after writing new data and re-syncing.

            Then on 2.10.3 Client1:

            [root@onyx-22vm3 ~]# cat /mnt/lustre/file1
            foo
            goo
            hoo
            

            The file data were correct on 2.10.3 Client1.
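
            To see which mirror actually went stale after a write like the one above, the per-component flags can be inspected (a hedged aside: the lcm_flags/lcme_flags field names below are as printed by a 2.11-era "lfs getstripe -v" and may differ in other releases):

            # Dump the full composite layout; lcm_flags carries the FLR file
            # state ("ro", "wp" or "sp") and a mirror invalidated by a write
            # should show "stale" in its component's lcme_flags.
            lfs getstripe -v /mnt/lustre/file1 | grep -E 'lcm_flags|lcme_id|lcme_flags'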

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/30957
            Subject: LU-10286 mdt: deny 2.10 clients to open mirrored files
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0f735407d3582c3b6b4e5c943ecbf57a32d43a73

            jay Jinshan Xiong (Inactive) added a comment -

            Now I recall more details. Since 2.10 clients don't verify overlapping extents, they would access mirrored files like normal PFL files, which means they could use any component for I/O. So you're right, we need to define the behavior when mirrored files are being accessed by old clients.

            I also looked at the option of returning a fake layout to 2.10 clients, but the problem was that there are too many places where a layout could be packed and sent to clients. Returning a fake layout would require changing all of that code.

            For read access, a 2.11 MDS could return a single mirror to 2.10 clients, and if that becomes stale then the MDS would cancel the layout lock and the 2.10 client should get a new layout with the non-STALE mirror?

            Yes, I was thinking about the case where read I/O and a mirror becoming stale happen at the same time, so that the read would still return stale data. However, that's probably okay since it could also happen for 2.11 clients.

            Do we need an OBD_CONNECT_MIRROR or _FLR flag so the MDS can detect which clients work properly? That is easy to do now, much harder to do later.

            Let's add this flag, and if a client that doesn't have it tries to open a mirrored file, return an error. This seems like the simplest solution. We can come back with a better solution if necessary.
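
            A quick way to tell whether a given client negotiated FLR support with the MDS is to look at the import's connect flags (a hedged sketch: the exact flag name printed, e.g. "flr", depends on how the connect flag ends up being named and on the client version):

            # List the flags this client negotiated with the MDT; an FLR-capable
            # client is expected to include the mirror/FLR flag in this list,
            # while a 2.10 client will not.
            lctl get_param mdc.*.import | grep connect_flags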

            adilger Andreas Dilger added a comment -

            It's not a question of "whether we should support it", but rather that users will do this whether we tell them to or not. Either it should "work", or there needs to be some mechanism that prevents 2.10 clients from incorrectly accessing these files.

            For read access, a 2.11 MDS could return a single mirror to 2.10 clients, and if that becomes stale then the MDS would cancel the layout lock and the 2.10 client should get a new layout with the non-STALE mirror?

            Similarly, a 2.10 client opening the file for write would just mark all but one mirror STALE right away. Not the best for performance, but at least correct.

            Do we need an OBD_CONNECT_MIRROR or _FLR flag so the MDS can detect which clients work properly? That is easy to do now, much harder to do later.

            jay Jinshan Xiong (Inactive) added a comment -

            I thought about this and understood your expectation clearly. Let me explain it a little bit (I did this before, but it was on a Skype channel).

            In your case, there would be a cluster with mixed 2.11 and 2.10 clients, because obviously mirrored files can only be created by 2.11 clients. If write were supported for 2.10 clients (writing only to the first mirror without marking the other mirrors stale), then the affected files would be really messed up, because reads by different 2.11 clients could return different versions of the data.

            Read support by returning a fake layout would have problems too. After the file has been written by a 2.11 client, the layout cached on the 2.10 client would be marked stale, but the 2.10 client would have no idea about it, so reads would return stale data. Users would see that as a bug.

            As you can see, we would make a huge effort and still end up with a defective solution. I would rather not support it, because only 2.10 clients will be affected (clients prior to 2.10 do not even understand PFL), so it's probably not a big deal.

            adilger Andreas Dilger added a comment -

            As described in the original request, testing also needs to be done with 2.10 clients, for both read and write operations. I expect 2.10 clients may be able to read FLR files, but will not write to them properly, possibly writing to the first mirror and not marking the other mirrors stale on the MDS.

            jgmitter Joseph Gmitter (Inactive) added a comment -

            I have captured LU-10535 to address the error handling improvement, so that we can resolve this ticket now that the testing is complete and track the improvement separately.

            adilger Andreas Dilger added a comment -

            It probably makes sense to improve these error messages by consolidating them to at most one message per unknown magic, or similar. It probably isn't useful to dump the long hex string to the console.

            sarah Sarah Liu added a comment -

            I have a system configured with 2.11 servers, one 2.11 client, and one 2.9.0 client.
            1. On the 2.11 client, create 1 PFL file, 1 FLR file with a plain layout, and 1 FLR file with a composite layout.
            2. On the 2.9 client, I got these messages when trying to access these files with "ls -al":

            [root@onyx-77 lustre]# ls
            foo-ext  foo-flr  foo-pfl  foo-plain-2.9
            [root@onyx-77 lustre]# ls -al
            [329391.090438] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) unrecognized lsm_magic 0bd60bd0
            [329391.102999] LustreError: 57728:0:(lov_internal.h:100:lsm_op_find()) Skipped 3 previous similar messages
            [329391.115668] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) bad disk LOV MAGIC: 0x0BD60BD0; dumping LMM (size=552):
            [329391.130044] LustreError: 57728:0:(lov_pack.c:213:lov_verify_lmm()) Skipped 3 previous similar messages
            [329391.142376] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) FF0BFF0B2802000003000000010005000200000000000000000000000000000001000100100000000000000000000000FFFFFFFFFFFFFFFF10010000380000000000000000000000000000000000000001000200100000000000000000000000000010000000000048010000380000000000000000000000000000000000000002000200000000000000100000000000FFFFFFFFFFFFFFFFFF0100003800000000000000000000000000000000000000010003001000000000000000000000000000100000000000FF010000380000000000000000000000000000000000000002000300000000000000100000000000FFFFFFFFFFFFFFFFFF0100003800000000000000000000000000000000000000FF0BFF0B01000000030000000000000001040000020000000000100001000000040000000000000000000000000000000000000000000000FF0BFF0B01000000030000000000000001040000020000000000100001000000040000000000000000000000000000000000000001000000FF0BFF0B0100000003000000000000000104000002000000000010000200FFFF0000000000000000000000000000000000000000FFFFFFFFFF0BFF0B0100000003000000000000000104000002000000000010[329391.251564] LustreError: 57728:0:(lov_pack.c:222:lov_verify_lmm()) Skipped 3 previous similar messages
            [329391.266288] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Failure to initialize cl object [0x200000401:0x3:0x0]: -22
            [329391.283577] LustreError: 57728:0:(lcommon_cl.c:181:cl_file_inode_init()) Skipped 3 previous similar messages
            [329391.296622] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) new_inode -fatal: rc -22
            [329391.307933] LustreError: 57728:0:(llite_lib.c:2300:ll_prep_inode()) Skipped 1 previous similar message
            ls: cannot access foo-ext: Invalid argument
            ls: cannot access foo-pfl: Invalid argument
            ls: cannot access foo-flr: Invalid argument
            total 8
            drwxr-xr-x  3 root root 4096 Dec 22 15:56 .
            drwxr-xr-x. 3 root root 4096 Dec 18 20:52 ..
            -?????????? ? ?    ?       ?            ? foo-ext
            -?????????? ? ?    ?       ?            ? foo-flr
            -?????????? ? ?    ?       ?            ? foo-pfl
            -rw-r--r--  1 root root    0 Dec 22 15:56 foo-plain-2.9
            [root@onyx-77 lustre]# 
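
            The unrecognized magic 0x0BD60BD0 above is the composite-layout magic (LOV_MAGIC_COMP_V1), which a 2.9 client does not know about, so the -22 (EINVAL) failures are expected. To double-check which layout magics a given client's headers define (a hedged sketch: the header lives under /usr/include/lustre/ or /usr/include/linux/lustre/ depending on the release):

            # A 2.9 client's headers define LOV_MAGIC_V1/V3 but no composite
            # (LOV_MAGIC_COMP_V1) or mirror-related magics.
            grep -n 'LOV_MAGIC' /usr/include/lustre/lustre_user.h \
                /usr/include/linux/lustre/lustre_user.h 2>/dev/null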
            

            There are a few different cases that are of interest here (a rough test sketch follows the list):

            • 2.9 or earlier clients, which do not understand PFL at all. I expect they will get EIO or a similar error when trying to access an FLR file, since they don't understand composite files.
            • 2.10 clients + 2.11 MDS. The client understands composite layouts, but not FLR, and the MDS is FLR-aware. If a non-FLR client opens the file for write, the MDS could mark all but one mirror as STALE and allow the client to access the remaining (non-STALE) component. Alternately, it could deny such clients write access, but allow read access.
            • 2.10 clients + 2.10 MDS. Both client and MDS understand composite layouts, but not FLR. What happens if the client tries to read the file? Success or error (both are OK, though it would be better if such a client could at least read an FLR file)? What happens if the client writes the file? If this is allowed, it would result in the mirrors becoming out of sync, but not marked STALE.
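
            A rough helper for exercising these cases from each client generation (a sketch only; the file path is a placeholder, the file is assumed to have been created mirrored by a 2.11 client, and the "acceptable" outcomes follow the expectations listed above):

            #!/bin/bash
            # Run on a 2.9, 2.10 and 2.11 client in turn against the same file.
            FILE=${1:-/mnt/lustre/flr-interop}

            if cat "$FILE" > /dev/null 2>&1; then
                echo "read:  OK"
            else
                echo "read:  failed (acceptable for pre-FLR clients, must not crash)"
            fi

            if echo probe >> "$FILE" 2>/dev/null; then
                echo "write: OK - on a 2.11 node, verify the other mirrors were"
                echo "       marked stale: lfs getstripe -v $FILE | grep lcme_flags"
            else
                echo "write: failed (acceptable for pre-FLR clients)"
            fi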

            People

              sarah Sarah Liu
              jay Jinshan Xiong (Inactive)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: