Details
- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Minor
- Fix Version/s: None
- Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
- Labels: None
- Environment: autotest review-dne-part-2
- Severity: 3
Description
replay-single test_70b hangs with the following error in the test_log:
shadow-23vm1: rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY
shadow-23vm1: Trace dump:
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:4767:error_noexit()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:4798:error()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:5860:_wait_import_state()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:5882:wait_import_state()
shadow-23vm1: = /usr/lib64/lustre/tests/test-framework.sh:5891:wait_import_state_mount()
shadow-23vm1: = rpc.sh:20:main()
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1451294686
shadow-23vm5: rpc : @@@@@@ FAIL: can't put import for mdc.lustre-MDT0001-mdc-*.mds_server_uuid into FULL state after 1475 sec, have REPLAY
shadow-23vm5: Trace dump:
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:4767:error_noexit()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:4798:error()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:5860:_wait_import_state()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:5882:wait_import_state()
shadow-23vm5: = /usr/lib64/lustre/tests/test-framework.sh:5891:wait_import_state_mount()
shadow-23vm5: = rpc.sh:20:main()
shadow-23vm5: CMD: shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm5.shadow.whamcloud.com,shadow-23vm8 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:./../utils:/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/openmpi/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh check_logdir /tmp/test_logs/1451294686
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com uname -n
shadow-23vm1: CMD: shadow-23vm3 uname -n
shadow-23vm1: Dumping lctl log to /tmp/test_logs/1451294686/rpc..*.1451296167.log
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 /usr/sbin/lctl dk > /tmp/test_logs/1451294686/rpc..debug_log.\$(hostname -s).1451296167.log;
shadow-23vm1: dmesg > /tmp/test_logs/1451294686/rpc..dmesg.\$(hostname -s).1451296167.log
shadow-23vm1: shadow-23vm4: open /proc/sys/lnet/dump_kernel failed: No such file or directory
shadow-23vm1: shadow-23vm4: open(dump_kernel) failed: No such file or directory
shadow-23vm5: CMD: shadow-23vm3 uname -n
shadow-23vm5: Dumping lctl log to /tmp/test_logs/1451294686/rpc..*.1451296168.log
shadow-23vm5: CMD: shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm5.shadow.whamcloud.com,shadow-23vm8 /usr/sbin/lctl dk > /tmp/test_logs/1451294686/rpc..debug_log.\$(hostname -s).1451296168.log;
shadow-23vm5: dmesg > /tmp/test_logs/1451294686/rpc..dmesg.\$(hostname -s).1451296168.log
shadow-23vm1: CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 rsync -az /tmp/test_logs/1451294686/rpc..*.1451296167.log shadow-23vm1.shadow.whamcloud.com:/tmp/test_logs/1451294686
shadow-23vm5: shadow-23vm4: open /proc/sys/lnet/dump_kernel failed: No such file or directory
shadow-23vm5: shadow-23vm4: open(dump_kernel) failed: No such file or directory
shadow-23vm5: CMD: shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm5.shadow.whamcloud.com,shadow-23vm8 rsync -az /tmp/test_logs/1451294686/rpc..*.1451296168.log shadow-23vm5.shadow.whamcloud.com:/tmp/test_logs/1451294686
replay-single test_70b: @@@@@@ FAIL: import is not in FULL state
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4767:error_noexit()
= /usr/lib64/lustre/tests/test-framework.sh:4798:error()
= /usr/lib64/lustre/tests/test-framework.sh:6114:wait_clients_import_state()
= /usr/lib64/lustre/tests/test-framework.sh:2621:fail()
= /usr/lib64/lustre/tests/replay-single.sh:2088:test_70b()
= /usr/lib64/lustre/tests/test-framework.sh:5045:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5082:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4947:run_test()
= /usr/lib64/lustre/tests/replay-single.sh:2099:main()
Dumping lctl log to /logdir/test_logs/2015-12-28/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__36516__-70048159267740-004235/replay-single.test_70b.*.1451296171.log
CMD: shadow-23vm1.shadow.whamcloud.com,shadow-23vm3,shadow-23vm4,shadow-23vm5,shadow-23vm8 /usr/sbin/lctl dk > /logdir/test_logs/2015-12-28/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__36516__-70048159267740-004235/replay-single.test_70b.debug_log.\$(hostname -s).1451296171.log; dmesg > /logdir/test_logs/2015-12-28/lustre-reviews-el6_7-x86_64--review-dne-part-2--1_7_1__36516__-70048159267740-004235/replay-single.test_70b.dmesg.\$(hostname -s).1451296171.log
shadow-23vm4: open /proc/sys/lnet/dump_kernel failed: No such file or directory
shadow-23vm4: open(dump_kernel) failed: No such file or directory
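For reference, here is a minimal sketch (not part of the test framework) of how the stuck import state can be checked by hand on a client, polling the same condition that wait_import_state_mount() waits on. It assumes lctl is on the PATH and uses the MDT0001 MDC and 1475-second timeout from the failure message; adjust both for other setups.

    # Poll the client-side MDC import until it reaches FULL or we give up.
    # The "state:" line of the import parameter reports FULL, REPLAY, DISCONN, etc.
    deadline=$((SECONDS + 1475))
    while (( SECONDS < deadline )); do
        state=$(lctl get_param -n 'mdc.lustre-MDT0001-mdc-*.import' 2>/dev/null |
                awk '/^ *state:/ {print $2; exit}')
        [[ $state == FULL ]] && { echo "import is FULL"; break; }
        echo "import state: ${state:-unknown}, waiting..."
        sleep 5
    done

In these failures the import never leaves REPLAY, which is consistent with the MDS LBUG and panic shown below.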
From the console on MDS0, we see the following stack trace:
02:24:52:LustreError: 8467:0:(lu_object.h:862:lu_object_attr()) ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed:
02:24:52:LustreError: 8467:0:(lu_object.h:862:lu_object_attr()) LBUG
02:24:52:Pid: 8467, comm: mdt00_000
02:24:52:
02:24:52:Call Trace:
02:24:52: [<ffffffffa0467875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
02:24:52: [<ffffffffa0467e77>] lbug_with_loc+0x47/0xb0 [libcfs]
02:24:52: [<ffffffffa0fb3ed3>] mdd_rename+0x1713/0x1dc0 [mdd]
02:24:52: [<ffffffffa0fa2ecc>] ? __mdd_lookup+0x28c/0x450 [mdd]
02:24:52: [<ffffffffa0e66c45>] mdt_reint_rename_internal+0x1305/0x1a60 [mdt]
02:24:52: [<ffffffffa075da76>] ? ldlm_lock_enqueue+0x2d6/0x970 [ptlrpc]
02:24:52: [<ffffffffa0e6759d>] mdt_reint_rename_or_migrate+0x1fd/0x7e0 [mdt]
02:24:52: [<ffffffffa077d7a0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
02:24:52: [<ffffffffa077f110>] ? ldlm_completion_ast+0x0/0x9b0 [ptlrpc]
02:24:52: [<ffffffffa0e5bdaf>] ? ucred_set_jobid+0x5f/0x70 [mdt]
02:24:52: [<ffffffffa0e67bb3>] mdt_reint_rename+0x13/0x20 [mdt]
02:24:52: [<ffffffffa0e5fa5d>] mdt_reint_rec+0x5d/0x200 [mdt]
02:24:52: [<ffffffffa0e4b84b>] mdt_reint_internal+0x62b/0xb80 [mdt]
02:24:52: [<ffffffffa0e4c23b>] mdt_reint+0x6b/0x120 [mdt]
02:24:52: [<ffffffffa08174cc>] tgt_request_handle+0x8ec/0x1470 [ptlrpc]
02:24:52: [<ffffffffa07beb41>] ptlrpc_main+0xe41/0x1910 [ptlrpc]
02:24:52: [<ffffffffa07bdd00>] ? ptlrpc_main+0x0/0x1910 [ptlrpc]
02:24:52: [<ffffffff810a0fce>] kthread+0x9e/0xc0
02:24:52: [<ffffffff8100c28a>] child_rip+0xa/0x20
02:24:52: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
02:24:52: [<ffffffff8100c280>] ? child_rip+0x0/0x20
02:24:52:
02:24:52:Kernel panic - not syncing: LBUG
02:24:52:Pid: 8467, comm: mdt00_000 Not tainted 2.6.32-573.8.1.el6_lustre.gbd3d354.x86_64 #1
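Because the MDS panics on this LBUG, clients stay in REPLAY until recovery completes on a restarted MDS. As a hedged aside (not something the test runs), the server-side recovery state for the affected target can be inspected with a standard lctl parameter; the target name below is simply the one from the failure message:

    # On the MDS after restart: prints status (RECOVERING/COMPLETE),
    # recovery timing, and connected/evicted client counts.
    lctl get_param mdt.lustre-MDT0001.recovery_status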
We’ve seen this failure eight times since late October 2015. Logs for these failures are at:
2015-10-23 18:45:53 - https://testing.hpdd.intel.com/test_sets/f4909c1c-7a10-11e5-b618-5254006e85c2
2015-10-24 09:06:03 - https://testing.hpdd.intel.com/test_sets/2e3667b2-7a75-11e5-b618-5254006e85c2
2015-11-27 08:05:42 - https://testing.hpdd.intel.com/test_sets/77835c1e-9522-11e5-bdeb-5254006e85c2
2015-11-29 04:56:08 - https://testing.hpdd.intel.com/test_sets/139e4958-96a2-11e5-aba1-5254006e85c2
2015-12-10 13:56:36 - https://testing.hpdd.intel.com/test_sets/ccf059d4-9f7e-11e5-9f74-5254006e85c2
2015-12-14 03:15:43 - https://testing.hpdd.intel.com/test_sets/0dd3b23a-a258-11e5-afd0-5254006e85c2
2015-12-20 06:59:17 - https://testing.hpdd.intel.com/test_sets/5517883e-a72f-11e5-a6c1-5254006e85c2
2015-12-28 08:43:43 - https://testing.hpdd.intel.com/test_sets/21213400-ad81-11e5-bf32-5254006e85c2
There are several open tickets for replay-single test 70b failing with the same assertion, but the stack traces in those tickets include mdd_migrate and the failures were seen while running racer. Two examples are LU-5163 and LU-6963.
Attachments
Issue Links
- is related to:
  - LU-7145 mdd_object_type() uses need audit (Open)
  - LU-6844 replay-single test 70b failure: 'rundbench load on * failed!' (Resolved)
  - LU-8004 replay-single test_70b: ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0 ) failed (Closed)
  - LU-6963 racer test_1: lu_object_attr() ASSERTION(lo_header->loh_attr & LOHA_EXISTS) failed in rename (Open)