runtests test_1: FAIL: old and new files are different: rc=22

Details

    • Bug
    • Resolution: Unresolved
    • Blocker

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run:
      https://testing.whamcloud.com/test_sets/72dfd1b1-7efa-4dcc-a366-043549333756
      https://testing.whamcloud.com/test_sets/1dde1cd4-24fd-4ae3-ab20-d7261788833f

      test_1 failed with the following error:

      CMD: onyx-106vm11 /usr/sbin/lctl set_param -P lod.*.mdt_hash=crush
      comparing 520 previously copied files
      Files /etc/yum.repos.d/redhat.repo and /mnt/lustre/d1.runtests//etc/yum.repos.d/redhat.repo differ
      Files /etc/pki/entitlement/2519028287967039457.pem and /mnt/lustre/d1.runtests//etc/pki/entitlement/2519028287967039457.pem differ
       runtests test_1: @@@@@@ FAIL: old and new files are different: rc=22 
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master-next/784 - 5.14.0-362.18.1.el9_3.x86_64
      servers: https://build.whamcloud.com/job/lustre-master-next/784 - 5.14.0-362.18.1_lustre.el9.x86_64

      This failed for the first time with this error on 2024-04-24 in two separate test runs, one on an unlanded patch and one a "full" test run on master. Strangely, both failures were reported on the same two files. There don't appear to be any Lustre console errors immediately before this failure (there are a few earlier, when the filesystem is remounted in the test).

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      runtests test_1 - old and new files are different: rc=22

          Activity

            [LU-17777] runtests test_1: FAIL: old and new files are different: rc=22
            ys Yang Sheng added a comment -

            Can we copy /etc to somewhere on the local FS first, then copy that to Lustre and compare?
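
A minimal sketch of this suggestion, assuming GNU or BSD `cp`, `diff` and `mktemp`. The function name `snapshot_copy_compare` and its arguments are hypothetical and are not taken from the existing runtests script. It freezes the source in a local snapshot, copies the snapshot to the target, and compares the target against the snapshot, so a later change to the live source cannot show up as a mismatch.

```shell
#!/bin/bash
# Hypothetical helper, not part of the real runtests script.
# Usage: snapshot_copy_compare SRC DST
# Returns 0 if DST matches the snapshot taken from SRC, 1 otherwise.
snapshot_copy_compare() {
	local src="$1" dst="$2" snap rc=0

	snap=$(mktemp -d) || return 1
	# Freeze the source contents on the local filesystem first.
	cp -a "$src/." "$snap/" || { rm -rf "$snap"; return 1; }
	# Copy the frozen snapshot (not the live source) and compare against it.
	mkdir -p "$dst" &&
		cp -a "$snap/." "$dst/" &&
		diff -r "$snap" "$dst" || rc=1
	rm -rf "$snap"
	return $rc
}
```

For example, `snapshot_copy_compare /etc /mnt/lustre/d1.runtests/etc` would compare the Lustre copy with the snapshot rather than with the live /etc. The cost is one extra local copy of the source tree per run.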

            emoly.liu Emoly Liu added a comment - +1 on master: https://testing.whamcloud.com/test_sets/435ee982-749c-4a7e-871b-c187cc35013b
            nangelinas Nikitas Angelinas added a comment - +1 on master: https://testing.whamcloud.com/test_sets/d0abfffa-0bcc-4624-b540-5f733c47b357
            mdiep Minh Diep added a comment -

            Yes, I think we should skip some directories that change.


            adilger Andreas Dilger added a comment -

            Something is updating /etc/yum.repos.d/redhat.repo and /etc/pki/entitlement/xxxx.pem during the test run. Not very often, but enough to cause some failures. If we don't know what that is, I guess we need a patch to make the runtests script more robust.
            mdiep Minh Diep added a comment -

            I don't think the RHEL license check changes anything in /etc.


            colmstea Charlie Olmstead added a comment -

            mdiep or mkvardakov would be better suited to answer this question.

            adilger Andreas Dilger added a comment -

            +1 on b2_15:
            https://testing.whamcloud.com/test_sets/6ffef452-6051-4199-90f0-cec559c8aaf6

            colmstea, is there some RHEL license manager running that is randomly updating these files in /etc while the test is running? We can make the test more robust by having it re-check and exclude files in /etc that have been modified since the test was started, but that is not retroactive for all of the branches and interop tests, so it would be nice if whatever is triggering this was turned off, or done during provisioning instead of while the test is running.
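
The re-check idea could look roughly like the sketch below. It is an illustration only: `compare_unchanged` and the marker file are hypothetical names, not existing runtests code. It assumes a marker file is created with `touch` just before the copy starts, and that the shell's `[ -nt ]` test is available (it is in bash and dash). A mismatch is ignored when the source file is newer than the marker or has disappeared, and reported otherwise. File names containing newlines are not handled.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual runtests code.
# Usage: compare_unchanged SRC DST MARKER
# MARKER is a file created (with touch) just before SRC was copied to DST.
# Prints mismatches that cannot be explained by the source changing after
# MARKER was created; returns 1 if any were found, 0 otherwise.
compare_unchanged() {
	local src="$1" dst="$2" marker="$3" bad

	# Walk the copy rather than the source, so files that were removed
	# from the source after the copy are still visited.
	bad=$(find "$dst" -type f | while IFS= read -r copy; do
		rel=${copy#"$dst"/}
		if [ ! -e "$src/$rel" ]; then
			echo "skipping $rel: removed from source" >&2
		elif cmp -s "$src/$rel" "$copy"; then
			:	# contents match, nothing to report
		elif [ "$src/$rel" -nt "$marker" ]; then
			echo "skipping $rel: modified after test start" >&2
		else
			echo "$src/$rel and $copy differ"
		fi
	done)

	[ -z "$bad" ] && return 0
	echo "$bad"
	return 1
}
```

This relies on mtime, so a tool that rewrites a file while preserving its old timestamp would still be reported as a mismatch.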
            adilger Andreas Dilger added a comment - - edited

            Unfortunately this was hit again on another test run, with the same files causing issues:
            https://testing.whamcloud.com/test_sets/abfde0a8-37fa-4801-8f76-3d75d434d243

            Files /etc/pki/consumer/cert.pem and /mnt/lustre/d1.runtests//etc/pki/consumer/cert.pem differ
            diff: /etc/pki/entitlement/3741564328631791613.pem: No such file or directory
            diff: /etc/pki/entitlement/3741564328631791613-key.pem: No such file or directory
             runtests test_1: @@@@@@ FAIL: old and new files are different: rc=22 
            

            but it hit during interop testing with a 2.12.9 client, so that eliminates any changes on master clients. It appears the clients were running RHEL, so it seems like something that RHEL is doing itself.

            We can partly work around this by excluding those files from the copy list, but that will only fix new clients and not old ones having issues with interop testing. It would likely be better to stop whatever process is doing this from running inside the VM.
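
As a rough illustration of the exclusion workaround, the sketch below builds a file list that leaves out the paths reported in this ticket. The function name `list_stable_files` is hypothetical, and how the real runtests script builds its copy list is not shown in this ticket, so this is not a drop-in patch. It assumes a `find` that supports `-path` and `-prune`.

```shell
#!/bin/bash
# Hypothetical helper, not part of the real runtests script.
# Usage: list_stable_files SRC
# Lists regular files under SRC (e.g. /etc), skipping the
# subscription-related paths that were seen changing in this ticket.
list_stable_files() {
	local src="$1"

	find "$src" \
		\( -path "$src/pki/entitlement" -o -path "$src/pki/consumer" \) \
		-prune -o \
		-type f ! -path "$src/yum.repos.d/redhat.repo" -print
}
```

For example, `list_stable_files /etc` would omit /etc/pki/entitlement, /etc/pki/consumer and /etc/yum.repos.d/redhat.repo from the list. As noted above, this only helps clients that carry the change.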

            pjones Peter Jones added a comment -

            If I understand correctly, this is no longer happening


            adilger Andreas Dilger added a comment -

            It might not have been an "rpm update" but rather something to do with enabling a "genuine RHEL" license on this newly-installed system so that it can run. If this starts failing with any frequency, we can exclude those files, and/or re-check after failure whether the affected files have been modified since the start of the test, but I don't want to make the code more complex if this only ever happened the one time when we were messing with RHEL licenses...

            People

              wc-triage WC Triage
              maloo Maloo