[LU-7524] sanity-hsm_404 test failed: >LustreError: 7253:0:(client.c:771:__ptlrpc_request_alloc()) ASSERTION( (unsigned long)imp > 0x1000 ) failed: (null) Created: 07/Dec/15  Updated: 19/Sep/16  Resolved: 14/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: parinay v kondekar (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Configuration : 4 node dne_singlemds . 1 MDS / 1 OSS / 2 Clients
Release
2.6.32_431.29.2.el6.x86_64
git hash - 2d11035
Server 2.7.63
Client 2.7.63


Attachments: File 404.lctl.tgz     Text File vmcore-dmesg.txt    
Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
  • dmesg
    <4>Lustre: DEBUG MARKER: == sanity-hsm test 404: Inactive MDT does not block requests for active MDTs == 09:51:24 (1449481884)
    <4>Lustre: setting import lustre-MDT0001_UUID INACTIVE by administrator request
    <6>format at client.c:771:__ptlrpc_request_alloc doesn't end in newline
    <0>LustreError: 7253:0:(client.c:771:__ptlrpc_request_alloc()) ASSERTION( (unsigned long)imp > 0x1000 ) failed: (null)
    <0>LustreError: 7253:0:(client.c:771:__ptlrpc_request_alloc()) LBUG
    <4>Pid: 7253, comm: lhsmtool_posix
    <4>
    <4>Call Trace:
    <4> [<ffffffffa02f3875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
    <4> [<ffffffffa02f3e77>] lbug_with_loc+0x47/0xb0 [libcfs]
    <4> [<ffffffffa05eec88>] ptlrpc_request_alloc_internal+0x2e8/0x360 [ptlrpc]
    <4> [<ffffffffa05eed23>] ptlrpc_request_alloc+0x13/0x20 [ptlrpc]
    <4> [<ffffffffa05eed53>] ptlrpc_request_alloc_pack+0x23/0x60 [ptlrpc]
    <4> [<ffffffffa022d5c9>] fld_client_rpc+0x1c9/0x510 [fld]
    <4> [<ffffffffa022dbd1>] fld_client_lookup+0x2c1/0x470 [fld]
    <4> [<ffffffffa07a82c1>] lmv_fld_lookup+0xf1/0x440 [lmv]
    <4> [<ffffffffa07a2eba>] lmv_iocontrol+0x11fa/0x3230 [lmv]
    <4> [<ffffffffa02f327b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
    <4> [<ffffffffa02ff083>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
    <4> [<ffffffff8116fa6c>] ? __kmalloc+0x20c/0x220
    <4> [<ffffffffa088d98b>] ll_fid2path+0x3fb/0x870 [lustre]
    <4> [<ffffffffa08730fc>] ll_dir_ioctl+0x135c/0x7440 [lustre]
    <4> [<ffffffffa08c56ec>] ? ll_authorize_statahead+0x2c/0xc0 [lustre]
    <4> [<ffffffffa088a7e3>] ? ll_file_open+0x5b3/0xca0 [lustre]
    <4> [<ffffffffa05ee020>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
    <4> [<ffffffffa088387d>] ? __ll_inode_revalidate+0x1bd/0xc60 [lustre]
    <4> [<ffffffff811961b3>] ? generic_permission+0x23/0xb0
    <4> [<ffffffffa086db40>] ? ll_dir_open+0x0/0xf0 [lustre]
    <4> [<ffffffffa086db40>] ? ll_dir_open+0x0/0xf0 [lustre]
    <4> [<ffffffff81185f6f>] ? __dentry_open+0x23f/0x360
    <4> [<ffffffff81227e9f>] ? security_inode_permission+0x1f/0x30
    <4> [<ffffffff811861a4>] ? nameidata_to_filp+0x54/0x70
    <4> [<ffffffff8119be8a>] ? do_filp_open+0x6ea/0xd20
    <4> [<ffffffff8104fa68>] ? flush_tlb_others_ipi+0x128/0x130
    <4> [<ffffffff8119e4e2>] vfs_ioctl+0x22/0xa0
    <4> [<ffffffff8119e684>] do_vfs_ioctl+0x84/0x580
    <4> [<ffffffff81196946>] ? final_putname+0x26/0x50
    <4> [<ffffffff8119ec01>] sys_ioctl+0x81/0xa0
    <4> [<ffffffff810e1bfe>] ? __audit_syscall_exit+0x25e/0x290
    <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
    
  • reproducible.


 Comments   
Comment by Gerrit Updater [ 21/Dec/15 ]

Noopur Maheshwari (noopur.maheshwari@seagate.com) uploaded a new patch: http://review.whamcloud.com/17683
Subject: LU-7524 fld: fld_clientlookup retries next target
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 105678859fdb40fca74493de70cc971299147ce4

Comment by Noopur Maheshwari (Inactive) [ 18/Jan/16 ]

With 2 MDTs (MDT0000 and MDT0001), sanity-hsm test_404 deactivates the last MDT (MDT0001) in the list of export targets (linked through target->ft_chain).
fld_client_lookup() retries with another target if the remote target is inactive. This retry was introduced in http://review.whamcloud.com/#/c/14313/
The next export target is taken from the list using:

target = list_entry(target->ft_chain.next, struct lu_fld_target, ft_chain);

&(target->ft_chain) is the last entry in the list, so its next pointer (target->ft_chain.next) is the head of the list.
The list_entry macro maps a list_head pointer back to a pointer to the structure that contains it. Applied here, it treats the head of the list as if it were embedded in a struct lu_fld_target.
Since the head of the list is not embedded in any lu_fld_target, the resulting target pointer does not refer to valid target data. Hence, an export target with no valid obd device data is produced.
This bogus export target (derived from the head of the list) triggers the assertion.
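
To illustrate the failure mode described above, here is a minimal user-space sketch. It is not Lustre code: struct demo_target, init_list_head() and next_of_last_is_real_target() are hypothetical names, and the list primitives are simplified stand-ins for the kernel ones. It only shows that stepping past the last entry with list_entry yields a pointer that is not one of the real targets.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list primitives. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical, trimmed-down target: not the real struct lu_fld_target. */
struct demo_target {
	struct list_head ft_chain;
	int ft_idx;
};

/*
 * Build a two-target list, step past the last target the way the
 * unpatched retry did, and report whether the result is a real target.
 * Returns 1 if the pointer is one of the real targets, 0 otherwise.
 */
int next_of_last_is_real_target(void)
{
	struct list_head targets;	/* list head: no target around it */
	struct demo_target mdt0 = { .ft_idx = 0 };
	struct demo_target mdt1 = { .ft_idx = 1 };
	struct demo_target *next;

	init_list_head(&targets);
	list_add_tail(&mdt0.ft_chain, &targets);
	list_add_tail(&mdt1.ft_chain, &targets);

	/* mdt1 is last, so its next pointer is the head itself. */
	assert(mdt1.ft_chain.next == &targets);

	next = list_entry(mdt1.ft_chain.next, struct demo_target, ft_chain);

	/* Only compare the address; dereferencing it would be invalid. */
	return next == &mdt0 || next == &mdt1;
}
```

In this sketch the function returns 0: the pointer produced from the head matches neither target, which mirrors how the real code ended up with a target whose fields were never initialised as target data.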

The fix: when fld_client_lookup() retries with another target and the next entry in the export target list is the head of the list (&fld->lcf_targets), skip to the entry after the head (target->ft_chain.next->next) and retrieve the target from it. Otherwise, retrieve the next target entry (target->ft_chain.next).
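
The wrap-around step in the fix can be sketched as follows. This is an illustration under assumptions, not the merged patch: next_target(), idx_after() and struct demo_target are hypothetical names, and the list primitives are simplified user-space stand-ins for the kernel ones.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list primitives. */
struct list_head {
	struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical, trimmed-down target: not the real struct lu_fld_target. */
struct demo_target {
	struct list_head ft_chain;
	int ft_idx;
};

/*
 * Return the target that follows cur, skipping the list head when cur
 * is the last entry. cur must be on the list, so the list is non-empty.
 */
static struct demo_target *next_target(struct list_head *head,
				       struct demo_target *cur)
{
	struct list_head *pos = cur->ft_chain.next;

	assert(head->next != head);	/* precondition: list is not empty */
	if (pos == head)		/* cur was last: step over the head */
		pos = pos->next;
	return list_entry(pos, struct demo_target, ft_chain);
}

/* Build a two-target list and return the index that follows 'start'. */
int idx_after(int start)
{
	struct list_head targets;
	struct demo_target mdt[2] = { { .ft_idx = 0 }, { .ft_idx = 1 } };

	init_list_head(&targets);
	list_add_tail(&mdt[0].ft_chain, &targets);
	list_add_tail(&mdt[1].ft_chain, &targets);

	return next_target(&targets, &mdt[start])->ft_idx;
}
```

With two targets, idx_after(1) wraps around to target 0 instead of landing on the head, and idx_after(0) simply advances to target 1. With a single target the helper returns that same target again, so a caller that retries still needs its own termination condition.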

Comment by Noopur Maheshwari (Inactive) [ 29/Jan/16 ]

This is regarding the "lu_env *env" code in lmv_fld_lookup() and fld_client_lookup()

The function lmv_fld_lookup() calls fld_client_lookup() by passing NULL for lu_env.

fld_client_lookup(&lmv->lmv_fld, fid_seq(fid), mds,
                               LU_SEQ_RANGE_MDT, NULL);

fld_client_lookup() then asserts env != NULL, which is always false for this caller.
So the assertion will always fire if target->ft_srv != NULL.

#ifdef HAVE_SERVER_SUPPORT
	if (target->ft_srv != NULL) {
		LASSERT(env != NULL);
		rc = fld_server_lookup(env, target->ft_srv, seq, &res);
	}
#endif /* HAVE_SERVER_SUPPORT */

Is the env initialisation missed here?

In the old code (please see d2d56f38da01001c92a09afc6b52b5acbd9bc13c), fld_client_lookup was called from lmv_fld_lookup and from cmm_fld_lookup.
Today fld_client_lookup is called only from lmv_fld_lookup, but env is always set to NULL in both the old code and the current code.

Should the prototype of fld_client_lookup be changed, removing "lu_env *env" and cleaning up all code connected with "env" in fld_client_lookup?

Our use case: the server build is installed on a Lustre client, and we need to be able to start Lustre servers on clients. Hence, the HAVE_SERVER_SUPPORT part of the code needs to be executed on the client.

Comment by Gerrit Updater [ 14/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17683/
Subject: LU-7524 fld: fld_clientlookup retries next target
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8d4ef45e078010d98f5c4f786e551d487f3e6e18
