[LU-7617] replay-single test_70b hangs ASSERTION( lo_header->loh_attr & LOHA_EXISTS ) in rename Created: 29/Dec/15 Updated: 22/Dec/21 Resolved: 22/Dec/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0, Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
autotest review-dne-part-2 |
||
| Issue Links: |
|
||||||||||||||||||||
| Severity: | 3 | ||||||||||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||||||||||
| Description |
|
replay-single test_70b hangs with the following error from the test_log shadow-23vm1: rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY
shadow-23vm1: Trace dump:
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:4767:error_noexit()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:4798:error()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:5860:_wait_import_state()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:5882:wait_import_state()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:5891:wait_import_state_mount()
shadow-23vm1: = rpc.sh:20:main()
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1451294686
shadow-23vm5: rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY
shadow-23vm5: Trace dump:
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:4767:error_noexit()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:4798:error()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:5860:_wait_import_state()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:5882:wait_import_state()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:5891:wait_import_state_mount()
shadow-23vm5: = rpc.sh:20:main()
shadow-23vm5: CMD: shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm5.shadow.whamcloud.com,shadow-23vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1451294686
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com uname -n
shadow-23vm1: CMD: shadow-23vm3 uname -n
shadow-23vm1: Dumping lctl log to /tmp/test_logs/1451294686/rpc..*.1451296167.log
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 /usr/sbin/lctl dk > /tmp/test_logs/1451294686/rpc..debug_log.\$(hostname -s).1451296167.log;
shadow-23vm1: dmesg > /tmp/test_logs/1451294686/rpc..dmesg.\$(hostname -s).1451296167.log
shadow-23vm1: shadow-23vm4: open /proc/sys/lnet/dump_kernel failed: No such file or directory
shadow-23vm1: shadow-23vm4: open(dump_kernel) failed: No such file or directory
shadow-23vm5: CMD: shadow-23vm3 uname -n
shadow-23vm5: Dumping lctl log to /tmp/test_logs/1451294686/rpc..*.1451296168.log
shadow-23vm5: CMD: shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm5.shadow.whamcloud.com,shadow-23vm8 /usr/sbin/lctl dk > /tmp/test_logs/1451294686/rpc..debug_log.\$(hostname -s).1451296168.log;
shadow-23vm5: dmesg > /tmp/test_logs/1451294686/rpc..dmesg.\$(hostname -s).1451296168.log
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 rsync -az /tmp/test_logs/1451294686/rpc..*.1451296167.log shadow-23vm1.shadow.whamcloud.com:/tmp/test_logs/1451294686
shadow-23vm5: shadow-23vm4: open /proc/sys/lnet/dump_kernel failed: No such file or directory
shadow-23vm5: shadow-23vm4: open(dump_kernel) failed: No such file or directory
shadow-23vm5: CMD: shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm5.shadow.whamcloud.com,shadow-23vm8 rsync -az /tmp/test_logs/1451294686/rpc..*.1451296168.log shadow-23vm5.shadow.whamcloud.com:/tmp/test_logs/1451294686
replay-single test_70b: @@@@@@ FAIL: import is not in FULL state
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4767:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4798:error()
= /usr/lib64/lustre/tests/test-framework.sh:6114:wait_clients_import_state()
= /usr/lib64/lustre/tests/test-framework.sh:2621:fail()
= /usr/lib64/lustre/tests/replay-single.sh:2088:test_70b()
= /usr/lib64/lustre/tests/test-framework.sh:5045:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5082:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4947:run_test()
= /usr/lib64/lustre/tests/replay-single.sh:2099:main()
Dumping lctl log to /logdir/test_logs/2015-12-28/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__36516__-70048159267740-004235/replay-single.test_70b.*.1451296171.log
CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 /usr/sbin/lctl dk > /logdir/test_logs/2015-12-28/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__36516__-70048159267740-004235/replay-single.test_70b.debug_log.\$(hostname -s).1451296171.log;
dmesg > /logdir/test_logs/2015-12-28/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__36516__-70048159267740-004235/replay-single.test_70b.dmesg.\$(hostname -s).1451296171.log
shadow-23vm4: open /proc/sys/lnet/dump_kernel failed: No such file or directory
shadow-23vm4: open(dump_kernel) failed: No such file or directory
From the console on MDS0, we see the following stack trace: 02:24:52:LustreError: 8467:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed: 02:24:52:LustreError: 8467:0:(lu_object.h:862:lu_object_attr()) LBUG 02:24:52:Pid: 8467, comm: mdt00_000 02:24:52: 02:24:52:Call Trace: 02:24:52: [<ffffffffa0467875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] 02:24:52: [<ffffffffa0467e77>] lbug_with_loc+0x47/0xb0 [libcfs] 02:24:52: [<ffffffffa0fb3ed3>] mdd_rename+0x1713/0x1dc0 [mdd] 02:24:52: [<ffffffffa0fa2ecc>] ? __mdd_lookup+0x28c/0x450 [mdd] 02:24:52: [<ffffffffa0e66c45>] mdt_reint_rename_internal+0x1305/0x1a60 [mdt] 02:24:52: [<ffffffffa075da76>] ? ldlm_lock_enqueue+0x2d6/0x970 [ptlrpc] 02:24:52: [<ffffffffa0e6759d>] mdt_reint_rename_or_migrate+0x1fd/0x7e0 [mdt] 02:24:52: [<ffffffffa077d7a0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc] 02:24:52: [<ffffffffa077f110>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc] 02:24:52: [<ffffffffa0e5bdaf>] ? ucred_set_jobid+0x5f/0x70 [mdt] 02:24:52: [<ffffffffa0e67bb3>] mdt_reint_rename+0x13/0x20 [mdt] 02:24:52: [<ffffffffa0e5fa5d>] mdt_reint_rec+0x5d/0x200 [mdt] 02:24:52: [<ffffffffa0e4b84b>] mdt_reint_internal+0x62b/0xb80 [mdt] 02:24:52: [<ffffffffa0e4c23b>] mdt_reint+0x6b/0x120 [mdt] 02:24:52: [<ffffffffa08174cc>] tgt_request_handle+0x8ec/0x1470 [ptlrpc] 02:24:52: [<ffffffffa07beb41>] ptlrpc_main+0xe41/0x1910 [ptlrpc] 02:24:52: [<ffffffffa07bdd00>] ? ptlrpc_main+0x0/0x1910 [ptlrpc] 02:24:52: [<ffffffff810a0fce>] kthread+0x9e/0xc0 02:24:52: [<ffffffff8100c28a>] child_rip+0xa/0x20 02:24:52: [<ffffffff810a0f30>] ? kthread+0x0/0xc0 02:24:52: [<ffffffff8100c280>] ? child_rip+0x0/0x20 02:24:52: 02:24:52:Kernel panic - not syncing: LBUG 02:24:52:Pid: 8467, comm: mdt00_000 Not tainted 2.6.32-573.8.1.el6_lustre.gbd3d354.x86_64 #1 We’ve seen this failure about eight times since early October. Logs for this failure are at: There are several tickets open for replay-single test 70b failing with the same assertion, but the stack trace for those include mdd_migrate and were seen running racer. Two examples are |
| Comments |
| Comment by James Nunez (Inactive) [ 31/Dec/15 ] |
|
Another failure on master: |
| Comment by Andreas Dilger [ 04/Jan/16 ] |
|
There are a number of different failures for replay-single 70b, which may be related if there is some kind of random memory corruption during this test - |