Details
-
Improvement
-
Resolution: Not a Bug
-
Major
-
None
-
Lustre 2.10.0
-
None
Description
I am splitting this from LU-684, which will concentrate on the test-framework side of things.
Currently the osd interface declares a dt_ro (osd_ro) operation to mark a device read-only for failover purposes.
In the zfs case this is achieved internally, but for ldiskfs we are carrying a special patch to do it.
Now, it's desirable to do away with the patch.
In the past James Simmons came up with the approach of using the make_request fault location, which was rejected because it also makes reads fail: https://review.whamcloud.com/#/c/6651/3
Since then, apparently no better method has appeared for injecting a similar fault that fails only writes.
So I wonder if we can play some smart tricks to overcome this?
The things I have in mind are:
1. We can probably forcefully remount everything read-only (and can also patch ldiskfs not to issue any writes). To prevent any journal updates from happening, we can also abort the journal, which I think causes all writes there to stop as well.
- This might still let some cached writes reach the disk when flushed by pdflush; perhaps we can preemptively truncate those?
2. We might attempt to play some games on the actual underlying block device to render it read-only, such as hijacking the ->make_request_fn() function pointer, but I guess that's kind of shady.
Any other good ideas I am missing?
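To make idea 2 concrete, here is a minimal userspace sketch of interposing on a submit function pointer so that only writes fail while reads still pass through. Everything in it (struct fake_queue, struct fake_bio, queue_init(), queue_set_readonly(), and the choice of -EROFS as the error) is a hypothetical stand-in for illustration. It is not the kernel block-layer API and not the approach from the review linked above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for block-layer objects. */
enum io_dir { IO_READ, IO_WRITE };

struct fake_bio {
	enum io_dir dir;
};

struct fake_queue;
typedef int (*submit_fn)(struct fake_queue *q, struct fake_bio *bio);

struct fake_queue {
	submit_fn submit;        /* models the ->make_request_fn() pointer */
	submit_fn orig_submit;   /* original handler saved by the hijack */
	bool read_only;
	unsigned long completed; /* number of I/Os that reached the "disk" */
};

/* Models the device's normal submission path. */
int real_submit(struct fake_queue *q, struct fake_bio *bio)
{
	(void)bio;
	q->completed++;
	return 0;
}

/* Interposed handler: fail writes, forward everything else unchanged. */
int ro_submit(struct fake_queue *q, struct fake_bio *bio)
{
	if (q->read_only && bio->dir == IO_WRITE)
		return -EROFS; /* assumed error code for this sketch */
	return q->orig_submit(q, bio);
}

void queue_init(struct fake_queue *q)
{
	q->submit = real_submit;
	q->orig_submit = NULL;
	q->read_only = false;
	q->completed = 0;
}

/* Swap in the interposed handler once and mark the queue read-only. */
void queue_set_readonly(struct fake_queue *q)
{
	if (q->submit != ro_submit) {
		q->orig_submit = q->submit;
		q->submit = ro_submit;
	}
	q->read_only = true;
}
```

In this model, a write submitted through q.submit() after queue_set_readonly() returns -EROFS without touching the counter, while a read still completes. A real implementation would also have to deal with I/O already in flight and with restoring the original pointer, neither of which this sketch attempts.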
The dt_ro() method is only called from ioctl(OBD_IOC_SET_READONLY), which is itself only called from the lctl readonly command. The osd_ro() method for osd-ldiskfs only returns -EOPNOTSUPP if HAVE_DEV_SET_RDONLY is not set.
This shouldn't be used for anything other than testing - normal unmount should be used for soft failover, and STONITH should be used for hard failover.
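The call path described above can be modeled roughly as follows. This is only a sketch: struct fake_osd_device, fake_osd_ro() and fake_set_readonly_ioctl() are hypothetical names, and the branch taken when HAVE_DEV_SET_RDONLY is defined is a placeholder rather than the actual osd-ldiskfs code.

```c
#include <errno.h>

/* Hypothetical stand-in for the osd device; not the real Lustre structure. */
struct fake_osd_device {
	int od_read_only;
};

/* Models osd_ro(): supported only when the kernel patch is present. */
int fake_osd_ro(struct fake_osd_device *osd)
{
#ifdef HAVE_DEV_SET_RDONLY
	/* Placeholder: the real code would mark the block device read-only. */
	osd->od_read_only = 1;
	return 0;
#else
	(void)osd;
	return -EOPNOTSUPP;
#endif
}

/* Models the OBD_IOC_SET_READONLY handler, the only caller of dt_ro(). */
int fake_set_readonly_ioctl(struct fake_osd_device *osd)
{
	return fake_osd_ro(osd);
}
```

Built without HAVE_DEV_SET_RDONLY defined, the model returns -EOPNOTSUPP and leaves the device state untouched, which is the behavior the comment above describes for lctl readonly on an unpatched kernel.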