[LU-7313] sanity-hsm test_404 test failed: LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) ASSERTION( env != ((void *)0) ) failed Created: 19/Oct/15  Updated: 07/Dec/15  Resolved: 07/Dec/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: parinay v kondekar (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

4-node setup (MDS / OSS / 2 clients), DNE, single MDS node (both MDTs on fre0301)
MDSCOUNT=2 OSTCOUNT=2


Attachments: File 404.lctl.tgz     Text File dmesg.txt     File lustre.log     Text File vmcore    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Server 2.7.61
Client 2.7.61

404.console.fre0304.log
LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) ASSERTION( env != ((void *)0) ) failed: 
LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) LBUG

stdout.log
 sanity-hsm test_404: @@@@@@ FAIL: request on 0x200000405:0x1:0x0 is not SUCCEED on mds1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4784:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4815:error()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:719:wait_request_state()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:4495:test_404()
  = /usr/lib64/lustre/tests/test-framework.sh:5062:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5099:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4916:run_test()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:4505:main()
Dumping lctl log to /tmp/test_logs/1445245656/sanity-hsm.test_404.*.1445245912.log
fre0304: open /proc/sys/lnet/dump_kernel failed: No such file or directory
fre0304: open(dump_kernel) failed: No such file or directory
fre0304: Warning: Permanently added 'fre0303,192.168.103.3' (RSA) to the list of known hosts.
fre0301: Warning: Permanently added 'fre0303,192.168.103.3' (RSA) to the list of known hosts.
fre0302: Warning: Permanently added 'fre0303,192.168.103.3' (RSA) to the list of known hosts.
FAIL 404 (227s)
sanity-hsm: FAIL: test_404 request on 0x200000405:0x1:0x0 is not SUCCEED on mds1
Stopping clients: fre0303,fre0304 /mnt/lustre2 (opts:)
Stopping client fre0303 /mnt/lustre2 opts:


stderr.log
Using TIMEOUT=20
running as uid/gid/euid/egid 500/500/500/500, groups:
 [touch] [/mnt/lustre/d0_runas_test/f22269]
excepting tests: 34 35 36
pdsh@fre0303: fre0304: ssh exited with exit code 1
== sanity-hsm test complete, duration 257 sec == 09:11:53 (1445245913)
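
For context on the assertion quoted above: "ASSERTION( env != ((void *)0) ) failed" is the text the libcfs LASSERT() macro emits when fld_client_lookup() is entered with a NULL lu_env (the extra macro level expands NULL to ((void *)0) before the condition is stringified). The following is only a minimal, compilable sketch of that pattern, not the actual Lustre source; lbug_with_loc() and the assertion macros are stand-ins, and fld_client_lookup_sketch() omits the real function's arguments.

#include <stdio.h>
#include <stdlib.h>

struct lu_env;                          /* opaque here; fully defined inside Lustre */

/* stand-in for libcfs lbug_with_loc(): report the failure and stop, as LBUG does */
static void lbug_with_loc(const char *msg)
{
        fprintf(stderr, "LustreError: %s\nLustreError: LBUG\n", msg);
        abort();
}

/* stand-ins for the libcfs assertion macros: the two-level expansion means
 * NULL in the condition becomes ((void *)0) before '#cond' stringifies it,
 * which is why the console shows "env != ((void *)0)" */
#define LASSERTF(cond, msg)                                                   \
        do {                                                                  \
                if (!(cond))                                                  \
                        lbug_with_loc("ASSERTION( " #cond " ) failed: " msg); \
        } while (0)
#define LASSERT(cond) LASSERTF(cond, "")

/* hypothetical, trimmed-down lookup entry point; the real fld_client_lookup()
 * also takes the fld client, the FID sequence, the mds index and flags */
static int fld_client_lookup_sketch(const struct lu_env *env)
{
        LASSERT(env != NULL);   /* trips with the same message shape when env is NULL */
        /* ... sequence-to-MDT lookup would follow ... */
        return 0;
}

int main(void)
{
        return fld_client_lookup_sketch(NULL);  /* aborts after printing the assertion */
}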





 Comments   
Comment by Andreas Dilger [ 21/Oct/15 ]

Could you please provide the stack trace for the failing thread? I don't think we can use the vmcore unless we have the exact kernel build and modules available, and it isn't mentioned whether you are using our build, or which kernel/distro/arch this is.

Are you doing anything different in your testing or configuration to trigger this? We haven't hit anything similar in our testing.

Comment by parinay v kondekar (Inactive) [ 26/Oct/15 ]

My bad, apologies for the incomplete info.

Here are the details

491 <4>Lustre: DEBUG MARKER: == sanity-hsm test 404: Inactive MDT does not block requests for active MDTs == 09:08:05 (1445245685)
492 <4>Lustre: setting import lustre-MDT0001_UUID INACTIVE by administrator request
493 <4>Lustre: Skipped 1 previous similar message
494 <0>LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) ASSERTION( env != ((void *)0) ) failed:
495 <0>LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) LBUG
496 <4>Pid: 11377, comm: lhsmtool_posix
497 <4>
498 <4>Call Trace:
499 <4> [<ffffffffa02f3875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
500 <4> [<ffffffffa02f3e77>] lbug_with_loc+0x47/0xb0 [libcfs]
501 <4> [<ffffffffa08abd8b>] fld_client_lookup+0x47b/0x4e0 [fld]
502 <4> [<ffffffffa08df2e1>] lmv_fld_lookup+0xf1/0x440 [lmv]
503 <4> [<ffffffffa08d9eda>] lmv_iocontrol+0x11fa/0x3230 [lmv]
504 <4> [<ffffffffa02f327b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
505 <4> [<ffffffffa02ff523>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
506 <4> [<ffffffff8116fe9c>] ? __kmalloc+0x20c/0x220
507 <4> [<ffffffffa09ce9bb>] ll_fid2path+0x3fb/0x870 [lustre]
508 <4> [<ffffffffa09b40fc>] ll_dir_ioctl+0x135c/0x7440 [lustre]
509 <4> [<ffffffffa0a0666c>] ? ll_authorize_statahead+0x2c/0xc0 [lustre]
510 <4> [<ffffffffa09cb823>] ? ll_file_open+0x5b3/0xca0 [lustre]
511 <4> [<ffffffffa063d740>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
512 <4> [<ffffffffa09c48bd>] ? __ll_inode_revalidate+0x1bd/0xc60 [lustre]
513 <4> [<ffffffff81196643>] ? generic_permission+0x23/0xb0
514 <4> [<ffffffffa09aeb40>] ? ll_dir_open+0x0/0xf0 [lustre]
515 <4> [<ffffffffa09aeb40>] ? ll_dir_open+0x0/0xf0 [lustre]
516 <4> [<ffffffff8118639f>] ? __dentry_open+0x23f/0x360
517 <4> [<ffffffff812284cf>] ? security_inode_permission+0x1f/0x30
518 <4> [<ffffffff811865d4>] ? nameidata_to_filp+0x54/0x70
519 <4> [<ffffffff8119c31a>] ? do_filp_open+0x6ea/0xd20
520 <4> [<ffffffff8104fa68>] ? flush_tlb_others_ipi+0x128/0x130
521 <4> [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0
522 <4> [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580
523 <4> [<ffffffff81196dd6>] ? final_putname+0x26/0x50
524 <4> [<ffffffff8119f091>] sys_ioctl+0x81/0xa0
525 <4> [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290
526 <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
527 <4>

Lustre: Build Version: 2.7.61-gef63c03-PRISTINE-2.6.32-431.29.2.el6
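
The trace shows the copytool (lhsmtool_posix) entering the kernel through the fid2path ioctl (ll_fid2path -> lmv_iocontrol -> lmv_fld_lookup -> fld_client_lookup), i.e. the assertion fires while a FID is being resolved to a path with the lustre-MDT0001 import deactivated. For anyone who wants to poke at that path outside the full sanity-hsm run, the same ioctl can be driven from a small standalone program through liblustreapi's llapi_fid2path(); the sketch below is only illustrative (the mount point and FID are taken from this ticket's logs, and the file name fid2path_repro.c is made up), not part of the test suite.

/* build: cc -o fid2path_repro fid2path_repro.c -llustreapi */
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <lustre/lustreapi.h>

int main(void)
{
        const char *mnt = "/mnt/lustre";          /* client mount point from the cmd line */
        const char *fid = "0x200000405:0x1:0x0";  /* FID from the test_404 failure message */
        char path[PATH_MAX];
        long long recno = -1;                     /* changelog record hint, -1 = none */
        int linkno = 0;                           /* which hard link to resolve */
        int rc;

        /* issues the same fid2path ioctl the copytool uses while archiving */
        rc = llapi_fid2path(mnt, fid, path, sizeof(path), &recno, &linkno);
        if (rc < 0) {
                fprintf(stderr, "fid2path %s: %s\n", fid, strerror(-rc));
                return 1;
        }
        printf("%s -> %s\n", fid, path);
        return 0;
}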

Comment by parinay v kondekar (Inactive) [ 26/Oct/15 ]
  • > Are you doing anything different in your testing or configuration to trigger this? We haven't hit anything similar in our testing.
    It's a 4-node, DNE setup.
  • cmd line
SLOW=YES NAME=ncli  NETTYPE=tcp mds1_HOST=fre0301 MDSDEV1=/dev/vdb mds_HOST=fre0301 MDSDEV=/dev/vdb mds2_HOST=fre0301 MDSDEV2=/dev/vdc MDSCOUNT=2 ost1_HOST=fre0302 OSTDEV1=/dev/vdb ost2_HOST=fre0302 OSTDEV2=/dev/vdc OSTCOUNT=2 CLIENTS=fre0303 RCLIENTS="fre0304"  DIR=/mnt/lustre PDSH="/usr/bin/pdsh -R ssh -S -w " ONLY=404 MDS_MOUNT_OPTS="-o rw,user_xattr" OST_MOUNT_OPTS="-o user_xattr" MDSSIZE=0 OSTSIZE=0 ENABLE_QUOTA="yes" MDSJOURNALSIZE="22" MAXFREE="1400000" mdtest_nFiles="50000" mdtest_iteration="5"  SHARED_DIRECTORY="/shared/fremont/test-results/xperior-custom/914//quad3-quartet-1/shared-dir//sanity-hsm"  /usr/lib64/lustre/tests/sanity-hsm.sh 2>     /var/log/xperior/test_stderr.166789.log 1>  /var/log/xperior/test_stdout.166789.log

Hope this helps. Let me know if anything more is needed.

Comment by parinay v kondekar (Inactive) [ 07/Dec/15 ]

Just realized that the client was not running the "patchless" client RPMs. A re-test (MULTIRUN=10) with the patchless client RPMs on the client passed the test. The issue can be closed.

Thanks

Comment by parinay v kondekar (Inactive) [ 07/Dec/15 ]

Sorry, I ran the wrong test; a sanity-hsm/test_404 rerun is in progress. Please ignore the earlier comment. Thanks

Comment by parinay v kondekar (Inactive) [ 07/Dec/15 ]

It was observed during the re-run that, with the clients running the patchless client RPMs, the assertion "ASSERTION( env != ((void *)0) ) failed" is not reproducible. The issue can be closed.

Thanks
