Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8880

sanity test_1: fails w/DNE @@@ faked source MDT

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.9.0
    • None
    • 3
    • 9223372036854775807

    Description

      The `/usr/bin/lfs setdirstripe -i1 -c2 /mnt/lustre/d1.sanity` command issued by sanity.sh test_1 hangs when mdt2mdt connections are secured with the krb5p security flavor. This test has failed with SSK flavors skpi and ski; other flavors have not yet been tested.

      == MDS2 logs ==

      [  341.920887] Lustre: DEBUG MARKER: == sanity test 1: mkdir; remkdir; rmdir ============================================================== 09:13:33 (1480515213)
      [  341.954824] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d368ed00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/0/ffffffff rc 0/-1
      [  343.088324] Lustre: 2602:0:(sec_gss.c:2063:gss_svc_handle_init()) create svc ctx ffff8802051cc440: user from 172.18.0.18@tcp authenticated as mds
      [  343.088328] Lustre: 2602:0:(sec_gss.c:2063:gss_svc_handle_init()) Skipped 9 previous similar messages
      [  343.088346] Lustre: 2602:0:(sec_gss.c:371:gss_cli_ctx_uptodate()) server installed reverse ctx ffff8802051a6900 idx 0x35ebf0324b78e726, expiry 1480599087(+83873s)
      [  343.088348] Lustre: 2602:0:(sec_gss.c:371:gss_cli_ctx_uptodate()) Skipped 9 previous similar messages
      [  352.954000] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880203a13300 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  363.953599] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880204ecc600 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  374.953137] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880205722850 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  382.074673] LNet: Service thread pid 3152 was inactive for 40.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [  382.074679] Pid: 3152, comm: mdt00_003
      [  382.074681] 
      Call Trace:
      [  382.074692]  [<ffffffff8163bd69>] schedule+0x29/0x70
      [  382.074697]  [<ffffffff816399c5>] schedule_timeout+0x175/0x2d0
      [  382.074704]  [<ffffffff8108bff0>] ? process_timeout+0x0/0x10
      [  382.074723]  [<ffffffffa0642ef1>] ? cfs_block_sigsinv+0x71/0xa0 [libcfs]
      [  382.074777]  [<ffffffffa0970fa0>] ? ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
      [  382.074820]  [<ffffffffa0971791>] ldlm_completion_ast+0x5a1/0x910 [ptlrpc]
      [  382.074860]  [<ffffffffa095e730>] ? ldlm_resource_add_lock+0x70/0x1b0 [ptlrpc]
      [  382.074867]  [<ffffffff810b8940>] ? default_wake_function+0x0/0x20
      [  382.074910]  [<ffffffffa0973448>] ldlm_cli_enqueue_fini+0x938/0xdb0 [ptlrpc]
      [  382.074952]  [<ffffffffa09763b2>] ldlm_cli_enqueue+0x6c2/0x810 [ptlrpc]
      [  382.074994]  [<ffffffffa09711f0>] ? ldlm_completion_ast+0x0/0x910 [ptlrpc]
      [  382.075012]  [<ffffffffa0cfda80>] ? mdt_remote_blocking_ast+0x0/0x590 [mdt]
      [  382.075032]  [<ffffffffa0eafc82>] osp_md_object_lock+0x172/0x2e0 [osp]
      [  382.075054]  [<ffffffffa0e56c51>] ? mdd_la_get+0x81/0x1a0 [mdd]
      [  382.075077]  [<ffffffffa0decb20>] lod_object_lock+0xf0/0x950 [lod]
      [  382.075093]  [<ffffffffa06524be>] ? cfs_hash_bd_add_locked+0x1e/0x80 [libcfs]
      [  382.075105]  [<ffffffffa0652869>] ? cfs_hash_spin_unlock+0x9/0x10 [libcfs]
      [  382.075171]  [<ffffffffa077fdb7>] ? lu_object_find_try+0x207/0x2b0 [obdclass]
      [  382.075177]  [<ffffffffa0e5410b>] mdd_object_lock+0x3b/0xd0 [mdd]
      [  382.075186]  [<ffffffffa0d0ca72>] mdt_remote_object_lock+0x1e2/0x710 [mdt]
      [  382.075202]  [<ffffffffa077ff0c>] ? lu_object_find_at+0xac/0xe0 [obdclass]
      [  382.075228]  [<ffffffffa09a3bc7>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
      [  382.075237]  [<ffffffffa0d0dfc6>] mdt_object_lock_internal+0x166/0x300 [mdt]
      [  382.075244]  [<ffffffffa0d0e174>] mdt_object_lock+0x14/0x20 [mdt]
      [  382.075254]  [<ffffffffa0d246a1>] mdt_md_create+0x421/0xba0 [mdt]
      [  382.075268]  [<ffffffffa075d838>] ? lprocfs_counter_add+0x128/0x1a0 [obdclass]
      [  382.075285]  [<ffffffffa0793ef1>] ? lprocfs_job_stats_log+0xd1/0x630 [obdclass]
      [  382.075294]  [<ffffffffa0d24f8b>] mdt_reint_create+0x16b/0x350 [mdt]
      [  382.075302]  [<ffffffffa0d26490>] mdt_reint_rec+0x80/0x210 [mdt]
      [  382.075309]  [<ffffffffa0d08db2>] mdt_reint_internal+0x5b2/0x9b0 [mdt]
      [  382.075317]  [<ffffffffa0d141e7>] mdt_reint+0x67/0x140 [mdt]
      [  382.075351]  [<ffffffffa0a05225>] tgt_request_handle+0x915/0x1320 [ptlrpc]
      [  382.075378]  [<ffffffffa09b11ab>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
      [  382.075404]  [<ffffffffa09aed68>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [  382.075407]  [<ffffffff810b8952>] ? default_wake_function+0x12/0x20
      [  382.075409]  [<ffffffff810af0b8>] ? __wake_up_common+0x58/0x90
      [  382.075434]  [<ffffffffa09b5260>] ptlrpc_main+0xaa0/0x1de0 [ptlrpc]
      [  382.075458]  [<ffffffffa09b47c0>] ? ptlrpc_main+0x0/0x1de0 [ptlrpc]
      [  382.075461]  [<ffffffff810a5b8f>] kthread+0xcf/0xe0
      [  382.075463]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
      [  382.075466]  [<ffffffff81646c98>] ret_from_fork+0x58/0x90
      [  382.075469]  [<ffffffff810a5ac0>] ? kthread+0x0/0xe0
      
      [  382.075472] LustreError: dumping log to /tmp/lustre-log.1480515253.3152
      [  385.952974] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880205375800 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  396.952564] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff880204d3db00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  418.952006] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d72c8050 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  418.952150] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 1 previous similar message
      [  451.951069] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d04f0f00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  451.951177] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 2 previous similar messages
      [  517.949644] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d0514600 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  517.949761] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 5 previous similar messages
      [  641.950918] Lustre: lustre-MDT0000-osp-MDT0001: Connection to lustre-MDT0000 (at 172.18.0.18@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [  641.950949] LustreError: 3152:0:(ldlm_request.c:148:ldlm_expired_completion_wait()) ### lock timed out (enqueued at 1480515213, 300s ago), entering recovery for lustre-MDT0000_UUID@172.18.0.18@tcp ns: lustre-MDT0000-osp-MDT0001 lock: ffff880211f20e00/0xd408ec1632ce7580 lrc: 4/0,1 mode: --/EX res: [0x200000007:0x1:0x0].0x0 bits 0x2 rrc: 1 type: IBT flags: 0x1000001000000 nid: local remote: 0xf8f327625a9c3943 expref: -99 pid: 3152 timeout: 0 lvb_type: 0
      [  641.951489] Lustre: lustre-MDT0000-osp-MDT0001: Connection restored to 172.18.0.18@tcp (at 172.18.0.18@tcp)
      [  641.951493] Lustre: Skipped 3 previous similar messages
      [  652.950923] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8802051a8900 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  652.951025] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 12 previous similar messages
      [  916.946544] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) @@@ faked source MDT  req@ffff8800d04d3c00 x1552432553273552/t0(0) o105-><?>@<?>:0/0 lens 336/0 e 0 to 0 dl 0 ref 1 fl New:/2/ffffffff rc 0/-1
      [  916.946651] LustreError: 2582:0:(sec.c:2010:sptlrpc_svc_check_from()) Skipped 23 previous similar messages
      [  936.436828] Lustre: 2596:0:(service.c:1331:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply
        req@ffff8800d368ea00 x1552432529149200/t0(0) o36->c91cef1e-2de0-ca74-2459-2c580f0f0023@172.18.0.22@tcp:-1/-1 lens 672/568 e 24 to 0 dl 1480515813 ref 2 fl Interpret:/0/0 rc 0/0
      [  942.942067] Lustre: lustre-MDT0001: Client c91cef1e-2de0-ca74-2459-2c580f0f0023 (at 172.18.0.22@tcp) reconnecting
      [  942.942090] Lustre: lustre-MDT0001: Connection restored to c91cef1e-2de0-ca74-2459-2c580f0f0023 (at 172.18.0.22@tcp)
      

      The test was executed with:

      export SEC=krb5p; ./auster -f multinode -rsv -d /import/results/krb5p sanity --only 1
      

      Attachments

        Issue Links

          Activity

            [LU-8880] sanity test_1: fails w/DNE @@@ faked source MDT
            pjones Peter Jones added a comment -

            Landed for 2.10

            pjones Peter Jones added a comment - Landed for 2.10

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24236/
            Subject: LU-8880 gss: fix GSS support for DNE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8b7a8056ef50c1a27e9ade1f0523beef3a5fe393

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24236/ Subject: LU-8880 gss: fix GSS support for DNE Project: fs/lustre-release Branch: master Current Patch Set: Commit: 8b7a8056ef50c1a27e9ade1f0523beef3a5fe393

            Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/24236
            Subject: LU-8880 gss: fix GSS support for DNE
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 4728fb8df9bd85db7df65a648f000454b1a5b3d5

            gerrit Gerrit Updater added a comment - Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/24236 Subject: LU-8880 gss: fix GSS support for DNE Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: 4728fb8df9bd85db7df65a648f000454b1a5b3d5

            yes, looks good to me. Thanks.

            di.wang Di Wang (Inactive) added a comment - yes, looks good to me. Thanks.

            Hi,

            Your proposal to use cl_sp_to field of client_obd is interesting. I have tested the following code for import_to_gss_svc(), it works:

            static inline
            __u32 import_to_gss_svc(struct obd_import *imp)
            {
            	int cl_sp_to = 0xFF;
            
            	if (imp->imp_obd)
            		cl_sp_to = imp->imp_obd->u.cli.cl_sp_to;
            
            	switch (cl_sp_to) {
            	case LUSTRE_SP_MDT:
            		return LUSTRE_GSS_TGT_MDS;
            	case LUSTRE_SP_OST:
            		return LUSTRE_GSS_TGT_OSS;
            	case LUSTRE_SP_MGC:
            	case LUSTRE_SP_MGS:
            		return LUSTRE_GSS_TGT_MGS;
            	case LUSTRE_SP_CLI:
            	case LUSTRE_SP_ANY:
            	default:
            		return 0;
            	}
            }
            

            But the sensitive question was more about the real meaning of gsc_usr_{mdc,oss,root} entries. After some investigation, I think we could also use cl_sp_to in gas_svc_upcall_install_rvs_ctx(). I tested the following modification, it avoids setdirstripe hanging:

            @@ -833,12 +836,18 @@ int gss_svc_upcall_install_rvs_ctx(struct obd_import *imp,
                     }
                     rsci.h.expiry_time = (time_t) ctx_expiry;
             
            -        if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0)
            -                rsci.ctx.gsc_usr_mds = 1;
            -        else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0)
            -                rsci.ctx.gsc_usr_oss = 1;
            -        else
            -                rsci.ctx.gsc_usr_root = 1;
            +	switch (imp->imp_obd->u.cli.cl_sp_to) {
            +	case LUSTRE_SP_MDT:
            +		rsci.ctx.gsc_usr_mds = 1;
            +		break;
            +	case LUSTRE_SP_OST:
            +		rsci.ctx.gsc_usr_oss = 1;
            +		break;
            +	case LUSTRE_SP_CLI:
            +		rsci.ctx.gsc_usr_root = 1;
            +	default:
            +		break;
            +	}
             
                     rscp = rsc_update(&rsci, rscp);
                     if (rscp == NULL)
            

            What do you think?
            Thanks,
            Sebastien.

            sbuisson Sebastien Buisson (Inactive) added a comment - - edited Hi, Your proposal to use cl_sp_to field of client_obd is interesting. I have tested the following code for import_to_gss_svc(), it works: static inline __u32 import_to_gss_svc(struct obd_import *imp) { int cl_sp_to = 0xFF; if (imp->imp_obd) cl_sp_to = imp->imp_obd->u.cli.cl_sp_to; switch (cl_sp_to) { case LUSTRE_SP_MDT: return LUSTRE_GSS_TGT_MDS; case LUSTRE_SP_OST: return LUSTRE_GSS_TGT_OSS; case LUSTRE_SP_MGC: case LUSTRE_SP_MGS: return LUSTRE_GSS_TGT_MGS; case LUSTRE_SP_CLI: case LUSTRE_SP_ANY: default : return 0; } } But the sensitive question was more about the real meaning of gsc_usr_{mdc,oss,root} entries. After some investigation, I think we could also use cl_sp_to in gas_svc_upcall_install_rvs_ctx(). I tested the following modification, it avoids setdirstripe hanging: @@ -833,12 +836,18 @@ int gss_svc_upcall_install_rvs_ctx(struct obd_import *imp, } rsci.h.expiry_time = (time_t) ctx_expiry; - if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0) - rsci.ctx.gsc_usr_mds = 1; - else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0) - rsci.ctx.gsc_usr_oss = 1; - else - rsci.ctx.gsc_usr_root = 1; + switch (imp->imp_obd->u.cli.cl_sp_to) { + case LUSTRE_SP_MDT: + rsci.ctx.gsc_usr_mds = 1; + break ; + case LUSTRE_SP_OST: + rsci.ctx.gsc_usr_oss = 1; + break ; + case LUSTRE_SP_CLI: + rsci.ctx.gsc_usr_root = 1; + default : + break ; + } rscp = rsc_update(&rsci, rscp); if (rscp == NULL) What do you think? Thanks, Sebastien.

            I saw there is already cl_sp_to in client_obd. Any reason you can not use that?

            If it has to get the type from the obd name, then you probably should analyze a bit more, in case "OST" is included in some crazy fsname. Please check those examples in obd_mount.c, like server_name_is_ost() ...

            di.wang Di Wang (Inactive) added a comment - I saw there is already cl_sp_to in client_obd. Any reason you can not use that? If it has to get the type from the obd name, then you probably should analyze a bit more, in case "OST" is included in some crazy fsname. Please check those examples in obd_mount.c, like server_name_is_ost() ...

            Di or Alex, can you please comment on Sebastien's proposal.

            adilger Andreas Dilger added a comment - Di or Alex, can you please comment on Sebastien's proposal.

            Hi,

            Coincidentally I also worked on this issue on Monday. I ended up with a patch similar to what Jeremy proposes for gss_svc_upcall_install_rvs_ctx(), but I do not feel well about it.

            To me the semantic of gsc_usr_mds, gsc_usr_oss and gsc_usr_root is not very clear, but if we look at how it is used in gas_svc_accept(), maybe we can infer this:
            gsc_usr_mds: device is authenticated as part of MDT
            gsc_usr_oss: device is authenticated as part of OST
            gsc_usr_root: device is authenticated as part of client, or MGT?

            It would tend to be confirmed by the code in sptlrpc_svc_check_from().
            In this case, it means we should not set gsc_usr_mds to 1 for an MDC obd, neither gsc_usr_oss to 1 for an OSC obd...

            On a related note, in order to have Kerberos authentication work with DNE, it is necessary to modify import_to_gss_svc() so that it makes a difference between OSP for MDT-to-MDT and OSP for MDT-to-OST.

            diff --git a/lustre/ptlrpc/gss/gss_internal.h b/lustre/ptlrpc/gss/gss_internal.h
            index 690704e..fbfb7f0 100644
            --- a/lustre/ptlrpc/gss/gss_internal.h                                                                                                 +++ b/lustre/ptlrpc/gss/gss_internal.h                                                                                                 @@ -126,9 +126,14 @@ __u32 import_to_gss_svc(struct obd_import 
                    if (!strcmp(name, LUSTRE_MDC_NAME) ||
                        !strcmp(name, LUSTRE_LWP_NAME))
                            return LUSTRE_GSS_TGT_MDS;
            -       if (!strcmp(name, LUSTRE_OSC_NAME) ||
            -           !strcmp(name, LUSTRE_OSP_NAME))
            +       if (!strcmp(name, LUSTRE_OSC_NAME))
                            return LUSTRE_GSS_TGT_OSS;
            +       if (!strcmp(name, LUSTRE_OSP_NAME)) {
            +               if (strstr(imp->imp_obd->obd_name, "OST") == NULL)
            +                       return LUSTRE_GSS_TGT_MDS;
            +               else
            +                       return LUSTRE_GSS_TGT_OSS;
            +       }
                    return 0;
             }
            
            

            Thanks,
            Sebastien.

            sbuisson Sebastien Buisson (Inactive) added a comment - - edited Hi, Coincidentally I also worked on this issue on Monday. I ended up with a patch similar to what Jeremy proposes for gss_svc_upcall_install_rvs_ctx(), but I do not feel well about it. To me the semantic of gsc_usr_mds, gsc_usr_oss and gsc_usr_root is not very clear, but if we look at how it is used in gas_svc_accept(), maybe we can infer this: gsc_usr_mds: device is authenticated as part of MDT gsc_usr_oss: device is authenticated as part of OST gsc_usr_root: device is authenticated as part of client, or MGT? It would tend to be confirmed by the code in sptlrpc_svc_check_from(). In this case, it means we should not set gsc_usr_mds to 1 for an MDC obd, neither gsc_usr_oss to 1 for an OSC obd... On a related note, in order to have Kerberos authentication work with DNE, it is necessary to modify import_to_gss_svc() so that it makes a difference between OSP for MDT-to-MDT and OSP for MDT-to-OST. diff --git a/lustre/ptlrpc/gss/gss_internal.h b/lustre/ptlrpc/gss/gss_internal.h index 690704e..fbfb7f0 100644 --- a/lustre/ptlrpc/gss/gss_internal.h +++ b/lustre/ptlrpc/gss/gss_internal.h @@ -126,9 +126,14 @@ __u32 import_to_gss_svc(struct obd_import if (!strcmp(name, LUSTRE_MDC_NAME) || !strcmp(name, LUSTRE_LWP_NAME)) return LUSTRE_GSS_TGT_MDS; - if (!strcmp(name, LUSTRE_OSC_NAME) || - !strcmp(name, LUSTRE_OSP_NAME)) + if (!strcmp(name, LUSTRE_OSC_NAME)) return LUSTRE_GSS_TGT_OSS; + if (!strcmp(name, LUSTRE_OSP_NAME)) { + if (strstr(imp->imp_obd->obd_name, "OST" ) == NULL) + return LUSTRE_GSS_TGT_MDS; + else + return LUSTRE_GSS_TGT_OSS; + } return 0; } Thanks, Sebastien.

            This appears to be because of targets of type osp for MDT to MDT communication (eg. lustre-MDT0001-osp-MDT0000). I don't really know anything about DNE but I always though MDT to MDT would be using lwp. Because this is of obd_type osp the following code is not working correctly in gss_svc_upcall_install_rvs_ctx():

             842         if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0)
             843                 rsci.ctx.gsc_usr_mds = 1;
             844         else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0)
             845                 rsci.ctx.gsc_usr_oss = 1;
             846         else
             847                 rsci.ctx.gsc_usr_root = 1;
            
            

            The gsc_usr_mds value sets rq_auth_user_mds in gss_svc_accept() which is checked during the unwrap with sptlrpc_svc_check_from(). If the gsc_usr_mds was set but the source claims to be from an MDT you get the faked source messages.

            Something like the following patch fixes it but I'm not sure that's the right thing because I just don't know enough about DNE. Andreas can you provide some input here before I post a patch.

            --- a/lustre/ptlrpc/gss/gss_svc_upcall.c
            +++ b/lustre/ptlrpc/gss/gss_svc_upcall.c
            @@ -839,9 +839,12 @@ int gss_svc_upcall_install_rvs_ctx(struct obd_import *imp,
                     }
                     rsci.h.expiry_time = (time_t) ctx_expiry;
             
            -        if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0)
            +        if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0 ||
            +            strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_LWP_NAME) == 0 ||
            +            !strstr(imp->imp_obd->obd_name, "-osp-MDT"))
                             rsci.ctx.gsc_usr_mds = 1;
            -        else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0)
            +        else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0 ||
            +                strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSP_NAME) == 0)
                             rsci.ctx.gsc_usr_oss = 1;
                     else
                             rsci.ctx.gsc_usr_root = 1;
            
            
            jfilizetti Jeremy Filizetti added a comment - This appears to be because of targets of type osp for MDT to MDT communication (eg. lustre-MDT0001-osp-MDT0000). I don't really know anything about DNE but I always though MDT to MDT would be using lwp. Because this is of obd_type osp the following code is not working correctly in gss_svc_upcall_install_rvs_ctx(): 842 if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0) 843 rsci.ctx.gsc_usr_mds = 1; 844 else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0) 845 rsci.ctx.gsc_usr_oss = 1; 846 else 847 rsci.ctx.gsc_usr_root = 1; The gsc_usr_mds value sets rq_auth_user_mds in gss_svc_accept() which is checked during the unwrap with sptlrpc_svc_check_from(). If the gsc_usr_mds was set but the source claims to be from an MDT you get the faked source messages. Something like the following patch fixes it but I'm not sure that's the right thing because I just don't know enough about DNE. Andreas can you provide some input here before I post a patch. --- a/lustre/ptlrpc/gss/gss_svc_upcall.c +++ b/lustre/ptlrpc/gss/gss_svc_upcall.c @@ -839,9 +839,12 @@ int gss_svc_upcall_install_rvs_ctx(struct obd_import *imp, } rsci.h.expiry_time = (time_t) ctx_expiry; - if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0) + if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_MDC_NAME) == 0 || + strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_LWP_NAME) == 0 || + !strstr(imp->imp_obd->obd_name, "-osp-MDT")) rsci.ctx.gsc_usr_mds = 1; - else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0) + else if (strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSC_NAME) == 0 || + strcmp(imp->imp_obd->obd_type->typ_name, LUSTRE_OSP_NAME) == 0) rsci.ctx.gsc_usr_oss = 1; else rsci.ctx.gsc_usr_root = 1;

            There was a similar issue fixed a while ago in patch http://review.whamcloud.com/14040 "LU-3778 sptlrpc: OSP and LWP don't know sptlrpc" that was fixed by Sebastien.

            adilger Andreas Dilger added a comment - There was a similar issue fixed a while ago in patch http://review.whamcloud.com/14040 " LU-3778 sptlrpc: OSP and LWP don't know sptlrpc" that was fixed by Sebastien.

            People

              jfilizetti Jeremy Filizetti
              nblavend Nathan Lavender (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: