[LU-14714] allow starting with only MGS config log if local llog write fails Created: 27/May/21  Updated: 12/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 2
Labels: None


 Description   

It should be possible to mount an MDT or OST using only the MGS config llog when the local OSD filesystem is full. Currently the mount fails with -28 (-ENOSPC) when it cannot write a local copy of the config llog. As a result, the MDT/OST cannot be mounted with Lustre to free up space in a consistent manner; the only option is to mount it as type ldiskfs and clean up by hand, which requires expert knowledge of the on-disk filesystem structure.

mds02 kernel: LDISKFS-fs (dm-6): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
mds02 kernel: LustreError: 5826:0:(osd_io.c:2172:osd_ldiskfs_write_record()) dm-6: error reading offset 0 (block 0, size 8192, offs 0), credits 29/29: rc = -28
mds02 kernel: LustreError: 5826:0:(llog.c:1419:llog_backup()) MGC10.10.1.17@o2ib: failed to backup log lfs1-MDT0001: rc = -28
mds02 kernel: LustreError: 5826:0:(mgc_request.c:1883:mgc_llog_local_copy()) MGC10.1.1.17@o2ib: failed to copy remote log lfs1-MDT0001: rc = -28
mds02 kernel: LustreError: 5989:0:(osp_sync.c:1524:osp_sync_init()) lfs1-OST0001-osc-MDT0001: can't initialize llog: rc = -28
mds02 kernel: LustreError: 5989:0:(obd_config.c:559:class_setup()) setup lfs1-OST0001-osc-MDT0001 failed (-28)
mds02 kernel: LustreError: 5989:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.10.1.17@o2ib: cfg command failed: rc = -28
mds02 kernel: Lustre:    cmd=cf003 0:lfs1-OST0001-osc-MDT0001  1:lfs1-OST0001_UUID  2:10.10.1.19@o2ib  \x0a
mds02 kernel: LustreError: 15c-8: MGC10.10.1.17@o2ib: The configuration from log 'lfs1-MDT0001' failed (-28). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
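
The requested behavior can be sketched as a small decision function. This is only an illustration of the policy described above, not Lustre code: the enum, the function name pick_cfg_source(), and the mgs_reachable flag are all hypothetical, and the real change would have to be made in the MGC/llog paths shown in the log.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: which config log the target should process at mount. */
enum cfg_source {
	CFG_SRC_LOCAL,   /* local copy was written successfully */
	CFG_SRC_REMOTE,  /* local write hit ENOSPC: use the MGS log only */
	CFG_SRC_NONE,    /* any other failure: abort the mount */
};

/*
 * rc_local_copy is the (negative errno) result of trying to write the
 * local copy of the config llog. Only -ENOSPC is treated as recoverable,
 * and only if the MGS copy can still be read.
 */
enum cfg_source pick_cfg_source(int rc_local_copy, bool mgs_reachable)
{
	if (rc_local_copy == 0)
		return CFG_SRC_LOCAL;
	if (rc_local_copy == -ENOSPC && mgs_reachable)
		return CFG_SRC_REMOTE;
	return CFG_SRC_NONE;
}
```

With this policy, the rc = -28 seen above would select CFG_SRC_REMOTE instead of failing the mount, while other errors (for example -EIO) would still abort.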


 Comments   
Comment by Peggy Gazzola [ 11/Aug/23 ]

This problem was recently hit on one of our in-house test systems for an OST that had free space available, but no free inodes.  The OST was recovered by removing some unused precreated objects.

Just wondering whether there's any pending activity on this issue.
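
For reference, the "free space but no free inodes" case can be told apart from a plain full disk with statvfs(3), since both conditions surface as -ENOSPC. The sketch below is a generic POSIX example for a locally mounted filesystem, not Lustre code; the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <sys/statvfs.h>

/* Hypothetical report type: which resource has run out. */
struct space_report {
	bool blocks_exhausted;  /* no blocks available to unprivileged users */
	bool inodes_exhausted;  /* no free inodes left */
};

/* Classify an already-filled statvfs result. */
struct space_report classify_space(const struct statvfs *sv)
{
	struct space_report r;

	r.blocks_exhausted = (sv->f_bavail == 0);
	/* f_files == 0 means the filesystem does not report inode counts. */
	r.inodes_exhausted = (sv->f_files != 0 && sv->f_ffree == 0);
	return r;
}

/* Query the filesystem containing path; returns 0 on success, -1 on error. */
int check_mount_space(const char *path, struct space_report *out)
{
	struct statvfs sv;

	if (statvfs(path, &sv) != 0)
		return -1;
	*out = classify_space(&sv);
	return 0;
}
```

In the situation described in this comment, blocks_exhausted would be false and inodes_exhausted would be true.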

Comment by Andreas Dilger [ 12/Aug/23 ]

Hello Peggy, long time no see...

I'm not aware of anyone on our side working in this area. It definitely makes sense to allow this to work. We've also discussed at times having one or more "emergency objects" that can be deleted if there is no space in the filesystem at startup (also important for ZFS) and then recreated once some space is available.
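
The "emergency object" idea can be illustrated with plain POSIX calls: preallocate a reserve file while space is available, delete it when startup hits ENOSPC, and recreate it later. This is a rough sketch under stated assumptions, not a proposed implementation: reserve_create() and reserve_release() are hypothetical names, and in Lustre the object would be created through the OSD layer rather than with open()/posix_fallocate().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create the reserve file and preallocate size bytes for it.
 * Returns 0 on success or a negative errno value.
 */
int reserve_create(const char *path, off_t size)
{
	int fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0600);
	int rc;

	if (fd < 0)
		return -errno;

	/* posix_fallocate() returns the error number directly, not via errno. */
	rc = posix_fallocate(fd, 0, size);
	close(fd);
	if (rc != 0) {
		unlink(path);  /* do not leave a partial reserve behind */
		return -rc;
	}
	return 0;
}

/*
 * Delete the reserve file to free its space. A missing file is not
 * an error, so the call is safe to repeat.
 */
int reserve_release(const char *path)
{
	if (unlink(path) != 0 && errno != ENOENT)
		return -errno;
	return 0;
}
```

At startup, a failed write with -ENOSPC would trigger reserve_release(), and reserve_create() would be called again once cleanup has freed enough space.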
