Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-3279

Interop 2.3.0 <-> 2.4 failure on test suite lustre-rsync-test test_7: Failure in replication; differences found

Details

    • 3
    • 8116

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/5584e96e-b5de-11e2-9d08-52540035b04c.

      The sub-test test_7 failed with the following error:

      Failure in replication; differences found.

      Info required for matching: lustre-rsync-test 7

      Attachments

        Issue Links

          Activity

            [LU-3279] Interop 2.3.0 <-> 2.4 failure on test suite lustre-rsync-test test_7: Failure in replication; differences found
            pjones Peter Jones added a comment -

            Andreas

            All the present patches in flight have landed. Is any further work required or can this ticket be marked as resolved?

            Peter

            pjones Peter Jones added a comment - Andreas All the present patches in flight have landed. Is any further work required or can this ticket be marked as resolved? Peter

            I've pushed http://review.whamcloud.com/6338 as a follow-on patch for master. This changes the comment for CL_LAYOUT, and also changes the output string for this record type to match the 5-character convention used by other names.

            I pushed http://review.whamcloud.com/6335 as a combined patch for b2_1.

            adilger Andreas Dilger added a comment - I've pushed http://review.whamcloud.com/6338 as a follow-on patch for master. This changes the comment for CL_LAYOUT, and also changes the output string for this record type to match the 5-character convention used by other names. I pushed http://review.whamcloud.com/6335 as a combined patch for b2_1.

            We have concerns about the comment that has been added to lustre/lustre_user.h:650 though. As Jinshan Xiong noted, stating that CL_LAYOUT doesn't reflect any actual data change is misleading.

            My thought is that the content of the file is not changing, either in the case of file release and file migrate. In the case of file release, the content is still in the archive. Even if it is not in Lustre, any application accessing that file would get the same data back after it is restored from the archive.

            There is of course some concern that a poorly-written layout swap could change the content of the file (e.g. source and target file do not contain the same data), but I can't see any reason to do that. Only the file owner could do this migrate, and they could just as easily write some other data directly into the file if they want.

            In any case, I'll remove the comment from the CL_LAYOUT description.

            adilger Andreas Dilger added a comment - We have concerns about the comment that has been added to lustre/lustre_user.h:650 though. As Jinshan Xiong noted, stating that CL_LAYOUT doesn't reflect any actual data change is misleading. My thought is that the content of the file is not changing, either in the case of file release and file migrate. In the case of file release, the content is still in the archive. Even if it is not in Lustre, any application accessing that file would get the same data back after it is restored from the archive. There is of course some concern that a poorly-written layout swap could change the content of the file (e.g. source and target file do not contain the same data), but I can't see any reason to do that. Only the file owner could do this migrate, and they could just as easily write some other data directly into the file if they want. In any case, I'll remove the comment from the CL_LAYOUT description.
            sarah Sarah Liu added a comment -

            interop between 2.1.5 client and 2.4 server also hit this issue in tag-2.3.65 testing:

            https://maloo.whamcloud.com/test_sets/3e989b54-b9a5-11e2-875f-52540035b04c

            sarah Sarah Liu added a comment - interop between 2.1.5 client and 2.4 server also hit this issue in tag-2.3.65 testing: https://maloo.whamcloud.com/test_sets/3e989b54-b9a5-11e2-875f-52540035b04c

            Thanks for the heads up. The change doesn't affect our tools.

            We have concerns about the comment that has been added to lustre/lustre_user.h:650 though. As Jinshan Xiong noted, stating that CL_LAYOUT doesn't reflect any actual data change is misleading.

            hdoreau Henri Doreau (Inactive) added a comment - Thanks for the heads up. The change doesn't affect our tools. We have concerns about the comment that has been added to lustre/lustre_user.h:650 though. As Jinshan Xiong noted, stating that CL_LAYOUT doesn't reflect any actual data change is misleading.

            I appreciate the heads up! . AFAIK, we do not use any tools external to the Lustre tree which make use of changelogs except robinhood. Can one of the robinhood developers comment on this? I'm not familiar with its internals to know if it would affect it or not..

            prakash Prakash Surya (Inactive) added a comment - I appreciate the heads up! . AFAIK, we do not use any tools external to the Lustre tree which make use of changelogs except robinhood. Can one of the robinhood developers comment on this? I'm not familiar with its internals to know if it would affect it or not..
            green Oleg Drokin added a comment -

            Just a heads up to CEA and LLNL:

            Current patch in review changes LAYOUT changelog record type to an incompatible value. Do you guys currently monitor it from your tools, would you be affected by the change?

            green Oleg Drokin added a comment - Just a heads up to CEA and LLNL: Current patch in review changes LAYOUT changelog record type to an incompatible value. Do you guys currently monitor it from your tools, would you be affected by the change?

            Jinshan - that depends on what records are being filtered out. 275120128 = 0x10660000 = CHANGELOG_REC, and 20 = CL_LAYOUT. The CL_LAYOUT records do not affect the actual data in the file, which is what lustre_rsync cares about.

            To be honest, I don't think the kernel should actually care about what is in the ChangeLog, it should only be the consumer that needs to check what the records are. changelog_show_cb() should IMHO be modified to remove the CL_LAST check, and just pass all of the records on to the consumer. Otherwise, we will have all sorts of "compatibility" problems here for no real benefit.

            If necessary, we might consider to split up the cr_type field to contain a few bits of information, like:

            #define CL_FLAG_OPTIONAL 0xff000000 /* does not affect data content/layout, can be skipped */
            

            For now, to keep things simple, I've pushed http://review.whamcloud.com/6308 to fix this interop problem. It moves CL_LAYOUT in the place of CL_IOCTL (which was never used, and was IMHO a pointless record). This avoids the problem of CL_LAYOUT > CL_LAST(2.3). It also removes the check for cr_type > CL_LAST, and just passes the records up to the caller, and the caller can decide what to do with the records. It does not do anything like CL_FLAG_OPTIONAL, but we might consider that in the future.

            adilger Andreas Dilger added a comment - Jinshan - that depends on what records are being filtered out. 275120128 = 0x10660000 = CHANGELOG_REC, and 20 = CL_LAYOUT. The CL_LAYOUT records do not affect the actual data in the file, which is what lustre_rsync cares about. To be honest, I don't think the kernel should actually care about what is in the ChangeLog, it should only be the consumer that needs to check what the records are. changelog_show_cb() should IMHO be modified to remove the CL_LAST check, and just pass all of the records on to the consumer. Otherwise, we will have all sorts of "compatibility" problems here for no real benefit. If necessary, we might consider to split up the cr_type field to contain a few bits of information, like: #define CL_FLAG_OPTIONAL 0xff000000 /* does not affect data content/layout, can be skipped */ For now, to keep things simple, I've pushed http://review.whamcloud.com/6308 to fix this interop problem. It moves CL_LAYOUT in the place of CL_IOCTL (which was never used, and was IMHO a pointless record). This avoids the problem of CL_LAYOUT > CL_LAST(2.3). It also removes the check for cr_type > CL_LAST, and just passes the records up to the caller, and the caller can decide what to do with the records. It does not do anything like CL_FLAG_OPTIONAL, but we might consider that in the future.

            I don't think to filter out the unknown record is an option because that will make rsync not work practically.

            jay Jinshan Xiong (Inactive) added a comment - I don't think to filter out the unknown record is an option because that will make rsync not work practically.

            It seems there is nothing else we can do instead of requiring the customers to use uptodate client to do rsync.

            jay Jinshan Xiong (Inactive) added a comment - It seems there is nothing else we can do instead of requiring the customers to use uptodate client to do rsync.

            People

              tappro Mikhail Pershin
              maloo Maloo
              Votes:
              0 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: