[LU-4594] evict client by nid not implemented in MDT Created: 06/Feb/14  Updated: 24/Nov/14  Resolved: 20/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Sergey Cheremencev Assignee: Cliff White (Inactive)
Resolution: Won't Fix Votes: 0
Labels: patch

Severity: 3
Rank (Obsolete): 12547

 Description   

In lprocfs_mdt_wr_evict_client(), evicting a client by nid is reported as not implemented:

 
static int lprocfs_mdt_wr_evict_client(struct file *file, const char *buffer,
                                       unsigned long count, void *data)
{
...
        /* anything not prefixed with "nid:" falls through to the
         * generic evict-by-uuid handler */
        if (strncmp(tmpbuf, "nid:", 4) != 0) {
                count = lprocfs_wr_evict_client(file, buffer, count, data);
                goto out;
        }

        /* "nid:..." input is rejected outright */
        CERROR("NOT implement evict client by nid %s\n", tmpbuf);

...
}
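
For reference, the generic handler that the non-"nid:" case falls through to dispatches on the prefix itself and only ever touches this server's own exports; roughly (condensed from ptlrpc/lproc_ptlrpc.c, buffer handling trimmed):

int lprocfs_wr_evict_client(struct file *file, const char *buffer,
                            unsigned long count, void *data)
{
        struct obd_device *obd = data;
...
        if (strncmp(tmpbuf, "nid:", 4) == 0)
                /* local eviction of every export from that nid */
                obd_export_evict_by_nid(obd, tmpbuf + 4);
        else if (strncmp(tmpbuf, "uuid:", 5) == 0)
                obd_export_evict_by_uuid(obd, tmpbuf + 5);
        else
                obd_export_evict_by_uuid(obd, tmpbuf);
...
}

So the generic code can already evict by nid locally; the MDT wrapper intercepts "nid:" before it ever gets there.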

Is there a reason why evict_client by nid is still denied?

        if (strncmp(tmpbuf, "nid:", 4) != 0) {

As I see it, the check above dates from the time when evict_by_nid was not yet supported.



 Comments   
Comment by Oleg Drokin [ 07/Feb/14 ]

I think the functionality still lives in mds code and has not been migrated?

Comment by Sergey Cheremencev [ 11/Feb/14 ]

Hello Oleg

As I see it, that functionality is available in the following places in proc:

[root@192 tests]# find /proc/fs/lustre/ -name evict_client
/proc/fs/lustre/obdfilter/lustre-OST0001/evict_client
/proc/fs/lustre/obdfilter/lustre-OST0000/evict_client
/proc/fs/lustre/mdt/lustre-MDT0000/evict_client
/proc/fs/lustre/mgs/MGS/evict_client

I've added a patch with a simple test. Hope it will be useful: http://review.whamcloud.com/#/c/9202/

Comment by Oleg Drokin [ 13/Feb/14 ]

The thing with evict client by nid on the MDT is that it must send updates to all the servers, not just act locally.

That was the functionality in 1.x, and it needs to work the same way in 2.x as well; otherwise we have a change in expected behavior. That's why it was disabled on the MDT, I presume.

Comment by Sergey Cheremencev [ 14/Feb/14 ]

evict client by uuid does the same thing as evict client by nid:

...
class_fail_export(doomed_exp);      
class_export_put(doomed_exp);

As I understand it, evict client by uuid also doesn't send updates to all servers.
Possibly both functions (evict by nid and by uuid) should be either denied or allowed?
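
For reference, the whole local path looks roughly like this (condensed from obdclass/genops.c; locking and logging trimmed), and there is no RPC to any other server in it:

int obd_export_evict_by_uuid(struct obd_device *obd, const char *uuid)
{
        struct obd_export *doomed_exp;
        struct obd_uuid doomed_uuid;

        obd_str2uuid(&doomed_uuid, uuid);
        /* look the export up in this server's own uuid hash only */
        doomed_exp = cfs_hash_lookup(obd->obd_uuid_hash, &doomed_uuid);
        if (doomed_exp == NULL)
                return -ENODEV;

        class_fail_export(doomed_exp);  /* disconnect and clean up */
        class_export_put(doomed_exp);
        return 0;
}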

Thanks

Comment by Oleg Drokin [ 19/Feb/14 ]

You need to check the original b1_8 code:

static int lprocfs_mds_wr_evict_client(struct file *file, const char *buffer,
                                       unsigned long count, void *data)
{
        struct obd_device *obd = data;
        struct mds_obd *mds = &obd->u.mds;
        char tmpbuf[sizeof(struct obd_uuid)];
        struct ptlrpc_request_set *set;
        int rc;

        sscanf(buffer, "%40s", tmpbuf);

        if (strncmp(tmpbuf, "nid:", 4) != 0)
                return lprocfs_wr_evict_client(file, buffer, count, data);

        set = ptlrpc_prep_set();
        if (!set)
                return -ENOMEM;

        if (obd->u.mds.mds_evict_ost_nids) {
                rc = obd_set_info_async(mds->mds_lov_exp, sizeof(KEY_EVICT_BY_NID),
                                        KEY_EVICT_BY_NID, strlen(tmpbuf + 4) + 1,
                                        tmpbuf + 4, set);
                if (rc)
                        CERROR("Failed to evict nid %s from OSTs: rc %d\n",
                               tmpbuf + 4, rc);
                ptlrpc_check_set(set);
        }

        /* See the comments in function lprocfs_wr_evict_client()
         * in ptlrpc/lproc_ptlrpc.c for details. - jay */
        class_incref(obd);
        LPROCFS_EXIT();

        obd_export_evict_by_nid(obd, tmpbuf + 4);

        rc = ptlrpc_set_wait(set);
        if (rc)
                CERROR("Failed to evict nid %s from OSTs: rc %d\n", tmpbuf + 4,
                       rc);

        LPROCFS_ENTRY();
        class_decref(obd);

        ptlrpc_set_destroy(set);
        return count;
}

This is the functionality that needs to be restored.
It might be better to move this to the mgs as a more "central" and logical location too. We'll need to update the documentation and possibly add some compatibility hooks for older code.

Comment by Oleg Drokin [ 19/Feb/14 ]

btw, if you plan to add this, make sure to properly use copy_from_user to avoid the bugs present in this code. Thanks.
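
For example, something along these lines (a minimal sketch; the buffer size is illustrative):

        char tmpbuf[sizeof(struct obd_uuid)];

        if (count >= sizeof(tmpbuf))
                return -EINVAL;
        /* never sscanf() the user-space pointer directly, as the b1_8
         * code above does; stage the data in a kernel buffer first */
        if (copy_from_user(tmpbuf, buffer, count))
                return -EFAULT;
        tmpbuf[count] = '\0';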

Comment by Oleg Drokin [ 19/Feb/14 ]

oh, to answer your other question: there's a different uuid as seen by each server for every client, I think; that's why evict by uuid is a totally local operation.

Comment by Cliff White (Inactive) [ 11/Jul/14 ]

Sergey, will you be updating this patch to address Oleg's concerns?

Comment by Sergey Cheremencev [ 15/Jul/14 ]

Yes, I will update it.

Comment by Sergey Cheremencev [ 25/Jul/14 ]

Hi, Oleg.
Thanks for your answers.

I tried to restore the functionality from b1_8 that you mentioned above and ran into several problems.
I put this code into mgs/lproc_mgs.c, but the mgs self_export doesn't have a set_info_async operation:

obd->obd_self_export->exp_obd->obd_type->typ_dt_ops->o_set_info_async == NULL 
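
That is, the obd_set_info_async() wrapper dispatches through the type's method table, roughly (condensed from lustre/include/obd_class.h with the checking macros expanded):

static inline int obd_set_info_async(const struct lu_env *env,
                                     struct obd_export *exp, __u32 keylen,
                                     void *key, __u32 vallen, void *val,
                                     struct ptlrpc_request_set *set)
{
        struct obd_device *obd = exp->exp_obd;

        /* an obd type with no o_set_info_async cannot be a relay target */
        if (obd->obd_type->typ_dt_ops->o_set_info_async == NULL)
                return -EOPNOTSUPP;

        return obd->obd_type->typ_dt_ops->o_set_info_async(env, exp, keylen,
                                                           key, vallen, val,
                                                           set);
}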

The mdt export has this operation, but you suggested using the mgs as a more "central" location.

Did I miss something?

Comment by Cliff White (Inactive) [ 09/Oct/14 ]

Sergey - the patch needs to be updated; is it possible for you to do this?

Comment by Sergey Cheremencev [ 09/Oct/14 ]

I need answers from Oleg to understand how the patch should be updated.
In any case, the current patch is not suitable. I think it could be abandoned.

Comment by Cliff White (Inactive) [ 20/Nov/14 ]

We have abandoned the patch and are closing this issue. Please reopen if you wish.

Comment by Oleg Drokin [ 24/Nov/14 ]

basically what is needed is this:

a proc handler that would get the nid list, then send set_info_async RPCs to the MDSes and OSTs to relay this information, and also act on this information locally as well.

Whether you choose to implement the MGS side of things as a set_info handler (with a corresponding implementation) or by some other means is less important, though a set_info handler might be better just for uniformity reasons.
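
In code terms, a hypothetical sketch of that flow, modelled on the b1_8 handler quoted earlier (mgs_evict_client_nid() and the single tgt_exp parameter are invented for illustration; a real handler would iterate over every MDT and OST the MGS knows about):

static int mgs_evict_client_nid(struct obd_device *obd,
                                struct obd_export *tgt_exp, const char *nid)
{
        struct ptlrpc_request_set *set;
        int rc;

        set = ptlrpc_prep_set();
        if (set == NULL)
                return -ENOMEM;

        /* relay the nid to a server target as a set_info_async RPC */
        rc = obd_set_info_async(NULL, tgt_exp, sizeof(KEY_EVICT_BY_NID),
                                KEY_EVICT_BY_NID, strlen(nid) + 1,
                                (void *)nid, set);
        if (rc)
                CERROR("failed to relay eviction of nid %s: rc %d\n",
                       nid, rc);

        /* also evict any matching exports connected to this server */
        obd_export_evict_by_nid(obd, nid);

        /* wait for the relayed evictions to complete */
        rc = ptlrpc_set_wait(set);
        ptlrpc_set_destroy(set);
        return rc;
}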
