Details
- Bug
- Resolution: Unresolved
- Major
- None
- Lustre 2.10.5
- None
- 3
- 9223372036854775807
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/7c77283e-9882-11e8-b0aa-52540065bddc
test_0b failed with the following error:
trevis-9vm10 crashed during replay-ost-single test_0b
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-ost-single test_0b - trevis-9vm10 crashed during replay-ost-single test_0b
Issue Links
- duplicates
  - LU-10573 mdt_destroy_export()) ASSERTION( list_empty(&exp->u.eu_mdt_data.med_open_head) ) failed (Open)
  - LU-9806 tgt_client_free()) ASSERTION( lut && lut->lut_client_bitmap ) failed (Resolved)
  - LU-10806 Hard crash when mounting DNE MDT (Resolved)
There is a second issue happening here. The LFSCK check at MDT mount is detecting that the MDT was restored from backup and then running a full scrub on the filesystem.
This causes the MDT to be mounted read-only because an unused inode was accessed:
It isn't clear whether there is a race between OI Scrub running and an access to a file that was deleted, or whether something is accessing a stale inode via the OI and unlinking it while OI Scrub is processing it. This might happen because LFSCK runs without full locking on the files, to avoid blocking other MDS threads that are accessing the filesystem.
If this is happening on a regular basis, it would potentially be useful to change this error message to include the parent name/inode so that we can see what kind of file it is (Lustre internal or part of the namespace). Failing that, running "debugfs -c -R 'ncheck <inode_number>' /dev/mdtdev" on the MDT filesystem after the failure would report the pathname, so long as we have access to it before it is reformatted (maybe as part of the test script).
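The debugfs lookup suggested above could be wrapped in a small helper for the test script, roughly as sketched here. The function name `inode_to_path` and the `DEBUGFS` override variable are hypothetical and not part of the Lustre test framework. `/dev/mdtdev` is the placeholder device from the comment above, and the inode number would have to be taken from the error message on the MDS console.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework): ask debugfs
# which pathname(s) refer to a given inode on the MDT backing device.
# DEBUGFS may be overridden, e.g. to point at a specific e2fsprogs build.
DEBUGFS=${DEBUGFS:-debugfs}

inode_to_path() {
    local dev=$1 ino=$2

    # Accept only a plain decimal inode number.
    case $ino in
        ''|*[!0-9]*)
            echo "invalid inode number: '$ino'" >&2
            return 1
            ;;
    esac

    # -c opens the device in catastrophic mode (read-only, bitmaps not loaded),
    # -R runs a single debugfs request and exits.
    "$DEBUGFS" -c -R "ncheck $ino" "$dev"
}

# Example call (run on the MDS before the device is reformatted):
#   inode_to_path /dev/mdtdev 12345
```

Because `-c` opens the device read-only, the lookup should not modify the MDT. It still has to run before the test framework reformats the target.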