[LU-12712] sanity-pfl tests triggering “not SEL magic on SEL file” Created: 28/Aug/19 Updated: 15/Jan/20 Resolved: 09/Oct/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0 |
| Fix Version/s: | Lustre 2.13.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Andreas Dilger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Description |
|
Looking at the console log for MDSs when running sanity-pfl, we are seeing messages like:

[13646.271068] Lustre: 21851:0:(lod_lov.c:1358:lod_parse_striping()) lustre-MDT0001-mdtlov: not SEL magic on SEL file [0x240014454:0x77a:0x0]: bd30bd0
[13646.274907] Lustre: 21851:0:(lod_lov.c:1358:lod_parse_striping()) lustre-MDT0001-mdtlov: not SEL magic on SEL file [0x240014454:0x77a:0x0]: bd30bd0

We are seeing this for sanity-pfl tests 19e, 20b, 20c and 20d, and when cleaning up after sanity-pfl. Examples of this message in the MDS (vm4 and vm5) console logs are at |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 28/Aug/19 ] |
|
Having taken a look at this, these are all of the sanity-pfl tests which have SEL & replay in them. The issue seems to be this:

I'm hoping Vitaly can comment on why that line is there, because I can't see why it's there at all. Also, as noted there, I think once this warning is fixed, we should make it return an error or assert or something. It is an on-disk format discrepancy, and the warning clearly isn't enough, because I believe it has been triggering ever since it was added. |
| Comment by Peter Jones [ 21/Sep/19 ] |
|
vitaly_fertman Cory reported on the LWG call that you considered this to be a lower priority issue. As such, is it OK for this test to be added to the always except list until it can be fixed properly in a future release? |
| Comment by Andreas Dilger [ 01/Oct/19 ] |
|
Looking at this more closely, it appears that there is just a bug in how the check is done for the error message, since it is checking the magic of the component (which is always LOV_MAGIC_V3) instead of the magic for the file (which should be LOV_MAGIC_SEL). That said, it isn't totally clear what can/should be done with this error message if it is legitimately hit in the field, as opposed to (IMHO) being spuriously printed because of a defect in the logic. Clearly it is a discrepancy in the on-disk format, but it doesn't appear to be harmful. However, it would be better if there were an LFSCK check and repair for this, since no administrator would be able to fix it short of deleting the file, and otherwise it will just be console noise. |
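The difference between the two checks described above can be illustrated with a minimal, self-contained sketch. This is not the code in lod_parse_striping() nor the change in the uploaded patch; the struct and function names (sketch_component, sketch_layout, warn_buggy, warn_fixed) are hypothetical, and the magic values are assumed from Lustre's user headers (0x0BD30BD0 is consistent with the "bd30bd0" printed in the console message).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed magic values; LOV_MAGIC_V3 matches "bd30bd0" in the console log. */
#define LOV_MAGIC_V3      0x0BD30BD0U
#define LOV_MAGIC_COMP_V1 0x0BD60BD0U
#define LOV_MAGIC_SEL     0x0BD80BD0U

/* Hypothetical, simplified stand-ins for the on-disk layout structures. */
struct sketch_component {
	uint32_t magic;     /* per-component magic, e.g. LOV_MAGIC_V3 */
	bool     extension; /* true if this is an SEL extension component */
};

struct sketch_layout {
	uint32_t magic;     /* file-level (composite layout) magic */
	size_t   count;     /* number of components */
	const struct sketch_component *comps;
};

/* True if any component marks the file as an SEL file. */
static bool layout_has_extension(const struct sketch_layout *lo)
{
	for (size_t i = 0; i < lo->count; i++)
		if (lo->comps[i].extension)
			return true;
	return false;
}

/* Spurious form: compares the component's own magic against LOV_MAGIC_SEL.
 * Components carry LOV_MAGIC_V3, so this warns on every valid SEL file. */
static bool warn_buggy(const struct sketch_layout *lo)
{
	for (size_t i = 0; i < lo->count; i++)
		if (lo->comps[i].extension &&
		    lo->comps[i].magic != LOV_MAGIC_SEL)
			return true;
	return false;
}

/* Intended form: compares the file-level magic, so it warns only when an
 * SEL file really lacks the SEL magic on disk. */
static bool warn_fixed(const struct sketch_layout *lo)
{
	return layout_has_extension(lo) && lo->magic != LOV_MAGIC_SEL;
}
```

With a valid SEL file (file magic LOV_MAGIC_SEL, components LOV_MAGIC_V3), warn_buggy() fires and warn_fixed() stays quiet; with a file whose top-level magic is LOV_MAGIC_COMP_V1 but which contains an extension component, warn_fixed() reports the genuine discrepancy.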
| Comment by Gerrit Updater [ 01/Oct/19 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36351 |
| Comment by Gerrit Updater [ 09/Oct/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36351/ |
| Comment by Peter Jones [ 09/Oct/19 ] |
|
Landed for 2.13 |