Usually, when an agent sends several RESTORE requests for the same FID, the MDT processes only the first. When the 2nd arrives, the MDT doesn't add a new action because it sees the first one in the llog. But there is a window in which the 1st RESTORE action has not yet been written to the llog when the 2nd arrives at the MDT:
int mdt_hsm_add_actions(struct mdt_thread_info *mti,
			struct hsm_action_list *hal, __u64 *compound_id)
{
	...
	rc = hsm_find_compatible(mti->mti_env, mdt, hal);
	...
	/* test result of hsm_find_compatible()
	 * if request redundant or cancel of nothing
	 * do not record
	 */
	/* redundant case */
	if (hai->hai_action != HSMA_CANCEL && hai->hai_cookie != 0)
		continue;
	...
	/* take LAYOUT lock so that accessing the layout will
	 * be blocked until the restore is finished */
	mdt_lock_reg_init(&crh->crh_lh, LCK_EX);
	rc = mdt_object_lock(mti, obj, &crh->crh_lh,
	...
	/* record request */
	rc = mdt_agent_record_add(mti->mti_env, mdt, *compound_id,
				  archive_id, flags, hai);
Even if the MDT doesn't find a compatible request in the llog, it tries to take the LAYOUT lock, and this lock is already held by the 1st RESTORE request.
Normally the 2nd RESTORE request can take the LAYOUT lock only AFTER the 1st RESTORE action has finished. In that case the 2nd request finds that the object is already RESTORED and does nothing.
But ldlm_resource_clean(), called from umount, breaks this ordering, and the 2nd request may add the same action to the llog.
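To make the interleaving concrete, here is the window reconstructed from the description above and Bruno's summary below (T1/T2 are the two MDT request handler threads, U is the umount path):

	T1: hsm_find_compatible() finds no record for the FID
	T1: takes the EX LAYOUT lock
	T2: hsm_find_compatible() also finds nothing (T1's record is not in the llog yet)
	T2: blocks on the LAYOUT lock held by T1
	T1: mdt_agent_record_add() writes the 1st RESTORE record
	U:  ldlm_resource_clean() cancels T1's LAYOUT lock during umount
	T2: is granted the LAYOUT lock and records a 2nd RESTORE for the same FID
	next mount: replaying the layout locks for both recorded restores hangs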
Gerrit Updater added a comment - John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28441/
Subject: LU-9266 hsm: don't add request when cdt is stopped
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: d488337c04b52392e11a784617d902e1f12c7cba
Sergey Cheremencev added a comment - Thanks for the feedback.
Do I correctly understand that the conditions required to fall into this situation are very unlikely and racy?
Yes, at first look it is very unlikely. But on the other hand, a Seagate customer faced this problem, so I guess it is not so unlikely on systems with high HSM activity.
I mean, 2 MDT request handler threads handling 2 restore requests for the same FID/file, and a concurrent MDT umount/stop, leading the 1st restore request to be granted the layout lock and to record a 1st restore action in the llog; but this lock has been canceled as part of the umount process, allowing the 2nd restore request to be granted it and thus add a 2nd restore action to the llog, finally leading to a hang during the next MDT mount/start when replaying the layout locks for all recorded restores.
Am I right?
Correct.
And if yes, why don't you fix this very specific case in hsm_restore_cb() by finding/discarding (EALREADY?) any duplicates?
Because it is easier not to add new requests when the cdt is stopped than to parse the llog later during the mount. Furthermore, mdt_hsm_add_actions() has the same condition (cdt->cdt_state == CDT_STOPPED) at the beginning, so ideally we shouldn't serve any requests when the coordinator is stopped.
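As a rough illustration of the chosen approach (not the merged diff; see https://review.whamcloud.com/28441/ for that), the idea is to refuse to record new HSM actions once the coordinator has been stopped. The struct layout below is simplified and the helper name is hypothetical; only cdt_state and CDT_STOPPED come from the discussion above:

	#include <errno.h>

	/* simplified stand-ins for the MDT coordinator state;
	 * the real definitions live in the Lustre tree */
	enum cdt_states { CDT_STOPPED = 0, CDT_RUNNING = 1 };

	struct coordinator {
		enum cdt_states cdt_state;
	};

	/* hypothetical helper: checked before mdt_agent_record_add() so
	 * that a request which re-acquired the LAYOUT lock after
	 * ldlm_resource_clean() cannot add a duplicate RESTORE record
	 * while umount is in progress */
	static int cdt_may_record(const struct coordinator *cdt)
	{
		if (cdt->cdt_state == CDT_STOPPED)
			return -EAGAIN;
		return 0;
	}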
Bruno Faccini (Inactive) added a comment - Sergei, Hongchao,
Do I correctly understand that the conditions required to fall into this situation are very unlikely and racy?
I mean, 2 MDT request handler threads handling 2 restore requests for the same FID/file, and a concurrent MDT umount/stop, leading the 1st restore request to be granted the layout lock and to record a 1st restore action in the llog; but this lock has been canceled as part of the umount process, allowing the 2nd restore request to be granted it and thus add a 2nd restore action to the llog, finally leading to a hang during the next MDT mount/start when replaying the layout locks for all recorded restores.
Am I right?
And if yes, why don't you fix this very specific case in hsm_restore_cb() by finding/discarding (EALREADY?) any duplicates? We may want to add a hashing mechanism for the cdt_restore_handle structs rather than browsing crh_list, in case we need to handle a huge number of restores.
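To illustrate what Bruno's alternative might look like (illustrative only; this is not the approach that landed), a duplicate check over the restore handles could return -EALREADY so that a second RESTORE record for the same FID is discarded during replay. The list layout here is a simplification of the upstream crh_list, and crh_find_duplicate() is a hypothetical helper:

	#include <errno.h>
	#include <string.h>

	struct lu_fid {
		unsigned long long f_seq;
		unsigned int f_oid;
		unsigned int f_ver;
	};

	/* simplified: upstream keeps these on a kernel list (crh_list) */
	struct cdt_restore_handle {
		struct cdt_restore_handle *crh_next;
		struct lu_fid crh_fid;
	};

	/* hypothetical helper for hsm_restore_cb(): return -EALREADY when
	 * a restore handle for this FID already exists, so a duplicate
	 * llog record can be skipped instead of replaying its layout lock */
	static int crh_find_duplicate(const struct cdt_restore_handle *list,
				      const struct lu_fid *fid)
	{
		const struct cdt_restore_handle *crh;

		for (crh = list; crh != NULL; crh = crh->crh_next)
			if (memcmp(&crh->crh_fid, fid, sizeof(*fid)) == 0)
				return -EALREADY;
		return 0;
	}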
[{"id":-1,"name":"My open issues","jql":"assignee = currentUser() AND resolution = Unresolved order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":true},{"id":-2,"name":"Reported by me","jql":"reporter = currentUser() order by created DESC","isSystem":true,"sharePermissions":[],"requiresLogin":true},{"id":-4,"name":"All issues","jql":"order by created DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-5,"name":"Open issues","jql":"resolution = Unresolved order by priority DESC,updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-9,"name":"Done issues","jql":"statusCategory = Done order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-3,"name":"Viewed recently","jql":"issuekey in issueHistory() order by lastViewed DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-6,"name":"Created recently","jql":"created >= -1w order by created DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-7,"name":"Resolved recently","jql":"resolutiondate >= -1w order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false},{"id":-8,"name":"Updated recently","jql":"updated >= -1w order by updated DESC","isSystem":true,"sharePermissions":[],"requiresLogin":false}]