[LU-7313] sanity-hsm test_404 test failed: LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) ASSERTION( env != ((void *)0) ) failed Created: 19/Oct/15 Updated: 07/Dec/15 Resolved: 07/Dec/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | parinay v kondekar (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | 4-node setup (MDS / OSS / 2 clients), DNE, single MDS |
| Attachments: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Server 2.7.61

404.console.fre0304.log:
LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) ASSERTION( env != ((void *)0) ) failed:
LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) LBUG

stdout.log:
sanity-hsm test_404: @@@@@@ FAIL: request on 0x200000405:0x1:0x0 is not SUCCEED on mds1
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4784:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4815:error()
= /usr/lib64/lustre/tests/sanity-hsm.sh:719:wait_request_state()
= /usr/lib64/lustre/tests/sanity-hsm.sh:4495:test_404()
= /usr/lib64/lustre/tests/test-framework.sh:5062:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5099:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4916:run_test()
= /usr/lib64/lustre/tests/sanity-hsm.sh:4505:main()
Dumping lctl log to /tmp/test_logs/1445245656/sanity-hsm.test_404.*.1445245912.log
fre0304: open /proc/sys/lnet/dump_kernel failed: No such file or directory
fre0304: open(dump_kernel) failed: No such file or directory
fre0304: Warning: Permanently added 'fre0303,192.168.103.3' (RSA) to the list of known hosts.
fre0301: Warning: Permanently added 'fre0303,192.168.103.3' (RSA) to the list of known hosts.
fre0302: Warning: Permanently added 'fre0303,192.168.103.3' (RSA) to the list of known hosts.
FAIL 404 (227s)
sanity-hsm: FAIL: test_404 request on 0x200000405:0x1:0x0 is not SUCCEED on mds1
Stopping clients: fre0303,fre0304 /mnt/lustre2 (opts:)
Stopping client fre0303 /mnt/lustre2 opts:

stderr.log:
Using TIMEOUT=20
running as uid/gid/euid/egid 500/500/500/500, groups: [touch] [/mnt/lustre/d0_runas_test/f22269]
excepting tests: 34 35 36
pdsh@fre0303: fre0304: ssh exited with exit code 1
== sanity-hsm test complete, duration 257 sec == 09:11:53 (1445245913) |
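For context on the failure mode: fld_client_lookup() expects a non-NULL lu_env context to be threaded down from the ioctl entry point, and the LASSERT on env fires (and LBUGs) when a caller fails to provide one. Below is a minimal user-space model of that pattern, not Lustre code; the struct and function names merely echo the real ones, and assert() stands in for LASSERT():

    /* Minimal model of the env-threading pattern (plain C, not Lustre code). */
    #include <assert.h>
    #include <stdio.h>

    struct lu_env { int refcheck; };   /* stand-in for the real lu_env */

    /* Lowest layer: insists on a valid env, like fld_client_lookup() */
    static int fld_lookup(const struct lu_env *env, unsigned long long seq)
    {
            assert(env != NULL);       /* models LASSERT(env != NULL) */
            printf("lookup of seq %#llx succeeded\n", seq);
            return 0;
    }

    /* Middle layer: just forwards env, like lmv_fld_lookup() */
    static int lmv_lookup(const struct lu_env *env, unsigned long long seq)
    {
            return fld_lookup(env, seq);
    }

    /* ioctl-style entry point: responsible for providing env */
    static int fid2path(int have_env, unsigned long long seq)
    {
            struct lu_env env = { .refcheck = 0 };
            return lmv_lookup(have_env ? &env : NULL, seq);
    }

    int main(void)
    {
            fid2path(1, 0x200000405ULL);   /* correct caller: passes */
            fid2path(0, 0x200000405ULL);   /* buggy caller: aborts, modeling the LBUG */
            return 0;
    }

Running this prints the first lookup and then aborts on the assert, mirroring how a single caller that skips the env allocation brings down the whole path.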
| Comments |
| Comment by Andreas Dilger [ 21/Oct/15 ] |
|
Could you please provide the stack trace for the failing thread? I don't think we can use the vmcore unless we have the exact kernel build and modules available, but you haven't mentioned whether you are using our build, or which kernel/distro/arch this is. Are you doing anything different in your testing or configuration to trigger this? We haven't hit anything similar in our testing. |
| Comment by parinay v kondekar (Inactive) [ 26/Oct/15 ] |
|
My bad, apologies for the incomplete info. Here are the details:

<4>Lustre: DEBUG MARKER: == sanity-hsm test 404: Inactive MDT does not block requests for active MDTs == 09:08:05 (1445245685)
<4>Lustre: setting import lustre-MDT0001_UUID INACTIVE by administrator request
<4>Lustre: Skipped 1 previous similar message
<0>LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) ASSERTION( env != ((void *)0) ) failed:
<0>LustreError: 11377:0:(fld_request.c:489:fld_client_lookup()) LBUG
<4>Pid: 11377, comm: lhsmtool_posix
<4>
<4>Call Trace:
<4> [<ffffffffa02f3875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa02f3e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa08abd8b>] fld_client_lookup+0x47b/0x4e0 [fld]
<4> [<ffffffffa08df2e1>] lmv_fld_lookup+0xf1/0x440 [lmv]
<4> [<ffffffffa08d9eda>] lmv_iocontrol+0x11fa/0x3230 [lmv]
<4> [<ffffffffa02f327b>] ? cfs_set_ptldebug_header+0x2b/0xc0 [libcfs]
<4> [<ffffffffa02ff523>] ? libcfs_debug_vmsg2+0x5e3/0xbe0 [libcfs]
<4> [<ffffffff8116fe9c>] ? __kmalloc+0x20c/0x220
<4> [<ffffffffa09ce9bb>] ll_fid2path+0x3fb/0x870 [lustre]
<4> [<ffffffffa09b40fc>] ll_dir_ioctl+0x135c/0x7440 [lustre]
<4> [<ffffffffa0a0666c>] ? ll_authorize_statahead+0x2c/0xc0 [lustre]
<4> [<ffffffffa09cb823>] ? ll_file_open+0x5b3/0xca0 [lustre]
<4> [<ffffffffa063d740>] ? ptlrpc_req_finished+0x10/0x20 [ptlrpc]
<4> [<ffffffffa09c48bd>] ? __ll_inode_revalidate+0x1bd/0xc60 [lustre]
<4> [<ffffffff81196643>] ? generic_permission+0x23/0xb0
<4> [<ffffffffa09aeb40>] ? ll_dir_open+0x0/0xf0 [lustre]
<4> [<ffffffffa09aeb40>] ? ll_dir_open+0x0/0xf0 [lustre]
<4> [<ffffffff8118639f>] ? __dentry_open+0x23f/0x360
<4> [<ffffffff812284cf>] ? security_inode_permission+0x1f/0x30
<4> [<ffffffff811865d4>] ? nameidata_to_filp+0x54/0x70
<4> [<ffffffff8119c31a>] ? do_filp_open+0x6ea/0xd20
<4> [<ffffffff8104fa68>] ? flush_tlb_others_ipi+0x128/0x130
<4> [<ffffffff8119e972>] vfs_ioctl+0x22/0xa0
<4> [<ffffffff8119eb14>] do_vfs_ioctl+0x84/0x580
<4> [<ffffffff81196dd6>] ? final_putname+0x26/0x50
<4> [<ffffffff8119f091>] sys_ioctl+0x81/0xa0
<4> [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4> Lustre: Build Version: 2.7.61-gef63c03-PRISTINE-2.6.32-431.29.2.el6 |
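The trace shows the assertion is reached from user space: lhsmtool_posix issues a fid2path ioctl, which travels sys_ioctl → ll_dir_ioctl() → ll_fid2path() → lmv_iocontrol() → lmv_fld_lookup() → fld_client_lookup(). If a reproducer outside the HSM copytool is wanted, something like the following sketch should exercise the same client path; the llapi_fid2path() signature and the bracketed FID-string form are assumed from 2.7-era liblustreapi headers, and the mount point and FID are taken from this ticket's logs:

    /* fid2path.c - drive the same client path the copytool hit.
     * Build (assuming liblustreapi is installed):
     *   gcc fid2path.c -o fid2path -llustreapi */
    #include <limits.h>
    #include <stdio.h>
    #include <lustre/lustreapi.h>

    int main(void)
    {
            char path[PATH_MAX];
            long long recno = -1;   /* start from the first record */
            int linkno = 0;         /* first hard link */

            /* FID of the file the failing request was operating on */
            int rc = llapi_fid2path("/mnt/lustre", "[0x200000405:0x1:0x0]",
                                    path, sizeof(path), &recno, &linkno);
            if (rc != 0)
                    fprintf(stderr, "fid2path failed: rc = %d\n", rc);
            else
                    printf("FID resolves to: %s\n", path);
            return rc;
    }

On a healthy client this simply prints the path; in this ticket, the equivalent ioctl issued by lhsmtool_posix while MDT0001 was administratively inactive is what reached the failing lookup.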
| Comment by parinay v kondekar (Inactive) [ 26/Oct/15 ] |
The test was invoked as follows:

SLOW=YES NAME=ncli NETTYPE=tcp mds1_HOST=fre0301 MDSDEV1=/dev/vdb mds_HOST=fre0301 MDSDEV=/dev/vdb \
mds2_HOST=fre0301 MDSDEV2=/dev/vdc MDSCOUNT=2 ost1_HOST=fre0302 OSTDEV1=/dev/vdb ost2_HOST=fre0302 \
OSTDEV2=/dev/vdc OSTCOUNT=2 CLIENTS=fre0303 RCLIENTS="fre0304" DIR=/mnt/lustre \
PDSH="/usr/bin/pdsh -R ssh -S -w " ONLY=404 MDS_MOUNT_OPTS="-o rw,user_xattr" \
OST_MOUNT_OPTS="-o user_xattr" MDSSIZE=0 OSTSIZE=0 ENABLE_QUOTA="yes" MDSJOURNALSIZE="22" \
MAXFREE="1400000" mdtest_nFiles="50000" mdtest_iteration="5" \
SHARED_DIRECTORY="/shared/fremont/test-results/xperior-custom/914//quad3-quartet-1/shared-dir//sanity-hsm" \
/usr/lib64/lustre/tests/sanity-hsm.sh 2> /var/log/xperior/test_stderr.166789.log 1> /var/log/xperior/test_stdout.166789.log

Hope this helps. Let me know if anything more is needed. |
| Comment by parinay v kondekar (Inactive) [ 07/Dec/15 ] |
|
Just realized that the client was not running the "patchless" client RPMs. A re-test (MULTIRUN=10) with patchless client RPMs on the client passed the test. The issue can be closed. Thanks. |
| Comment by parinay v kondekar (Inactive) [ 07/Dec/15 ] |
|
Sorry, I ran the wrong test. A sanity-hsm/test_404 rerun is in progress; please ignore the earlier comment. Thanks. |
| Comment by parinay v kondekar (Inactive) [ 07/Dec/15 ] |
|
It was observed during the re-run that, with the clients running patchless client RPMs, the assertion "ASSERTION( env != ((void *)0) ) failed" is not reproducible. The issue can be closed. Thanks. |