
[LU-7524] sanity-hsm_404 test failed: LustreError: 7253:0:(client.c:771:__ptlrpc_request_alloc()) ASSERTION( (unsigned long)imp > 0x1000 ) failed: (null)

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0
    • Environment: 4-node dne_singlemds configuration, 1 MDS / 1 OSS / 2 Clients
      Release: 2.6.32_431.29.2.el6.x86_64
      git hash: 2d11035
      Server: 2.7.63
      Client: 2.7.63
    • Severity: 3

    Description

      • dmesg
        <4>Lustre: DEBUG MARKER: == sanity-hsm test 404: Inactive MDT does not block requests for active MDTs == 09:51:24 (1449481884)
        <4>Lustre: setting import lustre-MDT0001_UUID INACTIVE by administrator request
        <6>format at client.c:771:__ptlrpc_request_alloc doesn't end in newline
        <0>LustreError: 7253:0:(client.c:771:__ptlrpc_request_alloc()) ASSERTION( (unsigned long)imp > 0x1000 ) failed: (null)
        <0>LustreError: 7253:0:(client.c:771:__ptlrpc_request_alloc()) LBUG
        <4>Pid: 7253, comm: lhsmtool_posix
        <4>
        <4>Call Trace:
        <4> [<ffffffffa02f3875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
        <4> [<ffffffffa02f3e77>] lbug_with_loc+0x47/0xb0 [libcfs]
        <4> [<ffffffffa05eec88>] ptlrpc_request_alloc_internal+0x2e8/0x360 [ptlrpc]
        <4> [<ffffffffa05eed23>] ptlrpc_request_alloc+0x13/0x20 [ptlrpc]
        <4> [<ffffffffa05eed53>] ptlrpc_request_alloc_pack+0x23/0x60 [ptlrpc]
        <4> [<ffffffffa022d5c9>] fld_client_rpc+0x1c9/0x510 [fld]
        <4> [<ffffffffa022dbd1>] fld_client_lookup+0x2c1/0x470 [fld]
        <4> [<ffffffffa07a82c1>] lmv_fld_lookup+0xf1/0x440 [lmv]
        <4> [<ffffffffa07a2eba>] lmv_iocontrol+0x11fa/0x3230 [lmv]
        <4> [<ffffffffa02f327b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
        <4> [<ffffffffa02ff083>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
        <4> [<ffffffff8116fa6c>] ? __kmalloc+0x20c/0x220
        <4> [<ffffffffa088d98b>] ll_fid2path+0x3fb/0x870 [lustre]
        <4> [<ffffffffa08730fc>] ll_dir_ioctl+0x135c/0x7440 [lustre]
        <4> [<ffffffffa08c56ec>] ? ll_authorize_statahead+0x2c/0xc0 [lustre]
        <4> [<ffffffffa088a7e3>] ? ll_file_open+0x5b3/0xca0 [lustre]
        <4> [<ffffffffa05ee020>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
        <4> [<ffffffffa088387d>] ? __ll_inode_revalidate+0x1bd/0xc60 [lustre]
        <4> [<ffffffff811961b3>] ? generic_permission+0x23/0xb0
        <4> [<ffffffffa086db40>] ? ll_dir_open+0x0/0xf0 [lustre]
        <4> [<ffffffffa086db40>] ? ll_dir_open+0x0/0xf0 [lustre]
        <4> [<ffffffff81185f6f>] ? __dentry_open+0x23f/0x360
        <4> [<ffffffff81227e9f>] ? security_inode_permission+0x1f/0x30
        <4> [<ffffffff811861a4>] ? nameidata_to_filp+0x54/0x70
        <4> [<ffffffff8119be8a>] ? do_filp_open+0x6ea/0xd20
        <4> [<ffffffff8104fa68>] ? flush_tlb_others_ipi+0x128/0x130
        <4> [<ffffffff8119e4e2>] vfs_ioctl+0x22/0xa0
        <4> [<ffffffff8119e684>] do_vfs_ioctl+0x84/0x580
        <4> [<ffffffff81196946>] ? final_putname+0x26/0x50
        <4> [<ffffffff8119ec01>] sys_ioctl+0x81/0xa0
        <4> [<ffffffff810e1bfe>] ? __audit_syscall_exit+0x25e/0x290
        <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
        
      • The failure is reproducible.

      Attachments

        1. 404.lctl.tgz
          495 kB
          parinay v kondekar
        2. vmcore-dmesg.txt
          26 kB
          parinay v kondekar

        Activity

          pjones Peter Jones made changes -
          Link Original: This issue is related to LDEV-335 [ LDEV-335 ]
          pjones Peter Jones made changes -
          Link New: This issue is related to LDEV-341 [ LDEV-341 ]
          adilger Andreas Dilger made changes -
          Link New: This issue is related to LDEV-335 [ LDEV-335 ]
          adilger Andreas Dilger made changes -
          Fix Version/s New: Lustre 2.9.0 [ 11891 ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17683/
          Subject: LU-7524 fld: fld_clientlookup retries next target
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 8d4ef45e078010d98f5c4f786e551d487f3e6e18


          noopur.maheshwari Noopur Maheshwari (Inactive) added a comment -

          This is regarding the "lu_env *env" code in lmv_fld_lookup() and fld_client_lookup().

          The function lmv_fld_lookup() calls fld_client_lookup(), passing NULL for the lu_env argument:

          fld_client_lookup(&lmv->lmv_fld, fid_seq(fid), mds,
                            LU_SEQ_RANGE_MDT, NULL);
          

          And in fld_client_lookup(), the LASSERT(env != NULL) check below can never hold, since the only caller passes NULL. The assertion therefore fires whenever target->ft_srv != NULL:

          #ifdef HAVE_SERVER_SUPPORT
          	if (target->ft_srv != NULL) {
          		LASSERT(env != NULL);
          		rc = fld_server_lookup(env, target->ft_srv, seq, &res);
          	}
          #endif /* HAVE_SERVER_SUPPORT */
          

          Was the env initialisation missed here?

          In the old code (please see commit d2d56f38da01001c92a09afc6b52b5acbd9bc13c), fld_client_lookup() was called from both lmv_fld_lookup() and cmm_fld_lookup().
          Today fld_client_lookup() is called only from lmv_fld_lookup(), but env is always NULL in both the old code and the current code.

          Should the prototype of fld_client_lookup() be changed to drop "lu_env *env" and to remove all env-related code from fld_client_lookup()?

          Our use case: the server build is installed on Lustre clients, and we need the ability to start Lustre servers on client nodes. Hence the HAVE_SERVER_SUPPORT part of the code must be built and executed on clients.

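          A minimal sketch of what such an env initialisation could look like, assuming the lu_env_init()/lu_env_fini() helpers and the fld_client_lookup() call shown above; LCT_MD_THREAD is an assumed tag choice, and this is illustrative only (the fix that was actually merged addresses the target-list traversal instead, see below):

          struct lu_env env;
          int rc;

          /* Initialise a context before the lookup; the tag choice
           * here is an assumption, not taken from any patch. */
          rc = lu_env_init(&env, LCT_MD_THREAD);
          if (rc != 0)
                  return rc;

          rc = fld_client_lookup(&lmv->lmv_fld, fid_seq(fid), mds,
                                 LU_SEQ_RANGE_MDT, &env);

          lu_env_fini(&env);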

          noopur.maheshwari Noopur Maheshwari (Inactive) added a comment -

          With 2 MDTs (MDT0000 and MDT0001), sanity-hsm test_404 deactivates the last MDT (MDT0001) in the list of export targets (linked via target->ft_chain).
          fld_client_lookup() retries another target if the remote target is inactive; this retry behaviour was introduced in http://review.whamcloud.com/#/c/14313/
          While getting the next export target from the list using:

          target = list_entry(target->ft_chain.next, struct lu_fld_target, ft_chain);
          

          &target->ft_chain is now the last entry in the list, so target->ft_chain.next is the head of the list.
          The list_entry() macro maps a list_head pointer back into a pointer to the structure that contains it; applied here, it turns the head of the list into a supposed containing structure (lu_fld_target).
          But the head of the list is a bare list_head with no lu_fld_target around it, so the "target" fabricated from it carries no valid data and, in particular, no obd device.
          This corrupted export target, fabricated from the head of the list, is what triggers the assertion.

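          A standalone illustration of why the pointer fabricated from the list head is garbage: list_entry() is container_of(), i.e. it subtracts the member offset from the given pointer, so applied to the bare head it computes an address where no lu_fld_target was ever stored. The demo struct below is a hypothetical stand-in, not the real Lustre definition:

          #include <stddef.h>

          struct list_head { struct list_head *next, *prev; };

          /* list_entry() == container_of(): back up by the member offset. */
          #define list_entry(ptr, type, member) \
                  ((type *)((char *)(ptr) - offsetof(type, member)))

          struct fld_target_demo {
                  void             *ft_exp;   /* stand-in for the export */
                  struct list_head  ft_chain;
          };

          /* The head is embedded in the client structure, not in any target. */
          static struct list_head head = { &head, &head };

          int main(void)
          {
                  /* With the cursor on the last entry, .next wraps to the
                   * head, so this fabricates a "target" from memory that
                   * never held one; ft_exp there is garbage. */
                  struct fld_target_demo *bogus =
                          list_entry(head.next, struct fld_target_demo, ft_chain);

                  return bogus == NULL;    /* always 0; keeps 'bogus' used */
          }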
          The fix: when fld_client_lookup() retries another target and the next entry in the export target list is the head of the list (&fld->lcf_targets), skip over the head and take the entry after it (target->ft_chain.next->next); otherwise take the next entry directly (target->ft_chain.next).

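          A sketch of that advance under the described fix, using only the names quoted above (&fld->lcf_targets, target->ft_chain, struct lu_fld_target); the authoritative version is the merged patch http://review.whamcloud.com/17683:

          /* Advance to the next export target, skipping the bare list
           * head: applying list_entry() to the head would fabricate a
           * target with no obd device and trip the assertion. */
          if (target->ft_chain.next == &fld->lcf_targets)
                  target = list_entry(target->ft_chain.next->next,
                                      struct lu_fld_target, ft_chain);
          else
                  target = list_entry(target->ft_chain.next,
                                      struct lu_fld_target, ft_chain);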
          pjones Peter Jones made changes -
          Labels New: patch

          gerrit Gerrit Updater added a comment -

          Noopur Maheshwari (noopur.maheshwari@seagate.com) uploaded a new patch: http://review.whamcloud.com/17683
          Subject: LU-7524 fld: fld_clientlookup retries next target
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 105678859fdb40fca74493de70cc971299147ce4

          parinay parinay v kondekar (Inactive) created issue -

          People

            Assignee: wc-triage WC Triage
            Reporter: parinay parinay v kondekar (Inactive)
            Votes: 0
            Watchers: 5
