
sanity-pfl tests triggering “not SEL magic on SEL file”

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0
    • Lustre 2.13.0
    • None
    • 3

    Description

      Looking at the MDS console logs when running sanity-pfl, we are seeing messages like:

      [13646.271068] Lustre: 21851:0:(lod_lov.c:1358:lod_parse_striping()) lustre-MDT0001-mdtlov: not SEL magic on SEL file [0x240014454:0x77a:0x0]: bd30bd0
      [13646.274907] Lustre: 21851:0:(lod_lov.c:1358:lod_parse_striping()) lustre-MDT0001-mdtlov: not SEL magic on SEL file [0x240014454:0x77a:0x0]: bd30bd0
      

      We are seeing this for sanity-pfl tests 19e, 20b, 20c, and 20d, and again when cleaning up after sanity-pfl.
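
The last field of each message is the magic value that failed the check. As a minimal sketch, the hypothetical helper `trailing_magic()` below (not part of Lustre) pulls that field out of a console line so it can be compared against the layout magics. It assumes the value always follows the final ": " in the line and is printed as bare hex, as in the two lines quoted above. Under that assumption `bd30bd0` parses to 0x0BD30BD0, which is believed to be the value of LOV_MAGIC_V3; check `lustre_user.h` for the authoritative definition.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not a Lustre function.
 * Returns the hex value that follows the last ": " in a console line,
 * or 0 if the line contains no ": " separator.
 */
static uint32_t trailing_magic(const char *line)
{
	const char *sep = NULL;

	/* Walk every ": " occurrence and remember the last one. */
	for (const char *p = strstr(line, ": "); p != NULL; p = strstr(p + 2, ": "))
		sep = p;

	if (sep == NULL)
		return 0;

	/* The log prints the magic as bare hex without a 0x prefix. */
	return (uint32_t)strtoul(sep + 2, NULL, 16);
}
```

Calling `trailing_magic()` on either quoted line gives the same value, since the two lines differ only in their timestamps.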

      Examples of this message in the MDS (vm4 and vm5) console logs are at
      https://testing.whamcloud.com/test_sets/fedacd8c-c944-11e9-90ad-52540065bddc
      https://testing.whamcloud.com/test_sets/b04ecaf6-c953-11e9-97d5-52540065bddc

          Activity

            [LU-12712] sanity-pfl tests triggering “not SEL magic on SEL file”
            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36351/
            Subject: LU-12712 lod: fix warning message for non-SEL file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 544fd725dc5773aa8fdd931e04eb51d658ee1686

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36351
            Subject: LU-12712 lod: fix warning message for non-SEL file
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ecfc198003dba32d24805f2daa3fe8aa4d4706cb

            adilger Andreas Dilger added a comment -

            Looking at this more closely, it appears that there is just a bug in how the check is done for the error message, since it is checking the magic of the component (which is always LOV_MAGIC_V3) instead of the magic for the file (which should be LOV_MAGIC_SEL).

            That said, it isn't totally clear what can/should be done with this error message, if it should legitimately be hit in the field, as opposed to (IMHO) being spuriously printed because of a defect in the logic. Clearly it is a discrepancy in the on-disk format, but it doesn't appear to be harmful. However, it would be better if there was an LFSCK check for this and a repair, since no administrator would be able to fix this short of deleting the file, and it will just be console noise.
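
The difference between checking the component magic and checking the file magic can be illustrated with a small standalone C sketch. This is not the code from `lod_parse_striping()` or from the patch. The structures, the `sel_flagged` field, and the function names are hypothetical simplifications, and the numeric magic values are assumed (0x0BD30BD0 matches the `bd30bd0` printed in the console log; the SEL value should be checked against `lustre_user.h`).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed magic values; see lustre_user.h for the real definitions. */
#define SKETCH_LOV_MAGIC_V3  0x0BD30BD0u /* per-component striping magic */
#define SKETCH_LOV_MAGIC_SEL 0x0BD80BD0u /* file-level magic for an SEL layout */

/* Hypothetical, simplified view of one layout component. */
struct sketch_comp {
	uint32_t magic;       /* magic of this component's own striping data */
	bool     sel_flagged; /* true if the component marks the file as SEL */
};

/* Hypothetical, simplified view of a parsed composite layout. */
struct sketch_layout {
	uint32_t                  file_magic; /* magic of the file-level header */
	const struct sketch_comp *comps;
	int                       comp_count;
};

/*
 * Check as the comment describes the defect: it compares the component
 * magic, which is always V3, so it fires for any SEL-flagged component.
 */
static bool warns_component_magic(const struct sketch_layout *lo)
{
	for (int i = 0; i < lo->comp_count; i++)
		if (lo->comps[i].sel_flagged &&
		    lo->comps[i].magic != SKETCH_LOV_MAGIC_SEL)
			return true;
	return false;
}

/*
 * Check against the file-level magic instead: it fires only when an
 * SEL-flagged component sits in a file whose header is not SEL.
 */
static bool warns_file_magic(const struct sketch_layout *lo)
{
	for (int i = 0; i < lo->comp_count; i++)
		if (lo->comps[i].sel_flagged &&
		    lo->file_magic != SKETCH_LOV_MAGIC_SEL)
			return true;
	return false;
}
```

For a well-formed SEL file in this model (file magic SEL, component magics V3), `warns_component_magic()` returns true while `warns_file_magic()` returns false, which mirrors the spurious console message described above.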
            pjones Peter Jones added a comment -

            vitaly_fertman, Cory reported on the LWG call that you considered this to be a lower-priority issue. As such, is it OK for this test to be added to the always-except list until it can be fixed properly in a future release?


            pfarrell Patrick Farrell (Inactive) added a comment - edited

            Having taken a look at this, these are all of the sanity-pfl tests which have SEL & replay in them.

            The issue seems to be this:
            https://review.whamcloud.com/#/c/35144/8/lustre/lod/lod_object.c@1551

            I'm hoping Vitaly can comment on why that line is there, because I can't see why it's there at all.

            Also, as noted there, I think once this warning is fixed, we should make it return an error or assert or something - It is an on disk format discrepancy, and the warning clearly isn't enough, because I believe it's been triggering ever since it was added.

            People

              adilger Andreas Dilger
              jamesanunez James Nunez (Inactive)