Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.2.0
- None
- LLNL Hyperion
- 3
- 4755
Description
System crashes during fsstress testing
-------
2011-12-15 10:34:18 Lustre: 8786:0:(cmm_object.c:689:cml_rename_warn()) cml_rename failed for mdo_rename, should revoke: [mo_po [0x200000402:0x13:0x0]] [mo_pn [0x200000402:0x13:0x0]] [lf [0x200000402:0x13e8e:0x0]] [sname fstest_60e48179a98fb92fa7071967001e58b6] [mo_t [0x200000402:0x13e97:0x0]] [tname fstest_88b31a7611fd2bb789377f1366b2ce5b] [err -39]
2011-12-15 10:34:18 Lustre: 8786:0:(cmm_object.c:689:cml_rename_warn()) Skipped 14 previous similar messages
2011-12-15 10:41:34 Lustre: 8787:0:(cmm_object.c:689:cml_rename_warn()) cml_rename failed for mdo_rename, should revoke: [mo_po [0x200000408:0x16:0x0]] [mo_pn [0x200000408:0x16:0x0]] [lf [0x200000408:0x13bf3:0x0]] [sname fstest_4a6a5646e7b447858249641047b14d4a] [mo_t [0x200000408:0x13bfa:0x0]] [tname fstest_568da43f014966bf12ba58244b28bb2f] [err -39]
2011-12-15 10:50:18 Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
2011-12-15 10:50:18 Lustre: 8761:0:(service.c:1186:ptlrpc_at_check_timed()) earlyQ=0 reqQ=0 recA=29, svcEst=19, delay=0(jiff)
2011-12-15 10:50:43 Lustre: mdt: This server is not able to keep up with request traffic (cpu-bound).
2011-12-15 10:50:43 Lustre: 8994:0:(service.c:1186:ptlrpc_at_check_timed()) earlyQ=1 reqQ=0 recA=1, svcEst=31, delay=0(jiff)
2011-12-15 10:50:43 Lustre: 8994:0:(service.c:983:ptlrpc_at_send_early_reply()) @@@ Already past deadline (18s), not sending early reply. Consider increasing at_early_margin (5)? req@ffff8801a63ac050 x1388238484166088/t0(0) o401->LOV_OSC_UUID@192.168.127.61@o2ib1:0/0 lens 4096/0 e 0 to 0 dl 1323975025 ref 2 fl Interpret:/0/ffffffff rc 0/-1
2011-12-15 10:50:43 Lustre: 8981:0:(service.c:1732:ptlrpc_server_handle_request()) @@@ Request x1388238484166088 took longer than estimated (6:18s); client may timeout. req@ffff8801a63ac050 x1388238484166088/t0(0) o401->LOV_OSC_UUID@192.168.127.61@o2ib1:0/0 lens 4096/192 e 0 to 0 dl 1323975025 ref 1 fl Complete:/0/0 rc 0/0
2011-12-15 10:50:43 Lustre: Skipped 1 previous similar message
2011-12-15 10:52:33 LustreError: 8763:0:(lustre_idl.h:766:lu_fid_eq()) ASSERTION(fid_is_igif(f0) || fid_ver(f0) == 0) failed: [0x5a5a5a5a5a5a5a5a:0x5a5a5a5a:0x5a5a5a5a]
2011-12-15 10:52:33 LustreError: 8763:0:(lustre_idl.h:766:lu_fid_eq()) LBUG
2011-12-15 10:52:33 Pid: 8763, comm: mdt_46
2011-12-15 10:52:33
2011-12-15 10:52:33 Call Trace:
2011-12-15 10:52:33 [<ffffffffa03e3855>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
2011-12-15 10:52:33 [<ffffffffa03e3e95>] lbug_with_loc+0x75/0xe0 [libcfs]
2011-12-15 10:52:33 [<ffffffffa09d8b5a>] mdd_attr_set_internal+0x30a/0x310 [mdd]
2011-12-15 10:52:33 [<ffffffffa09d8eb5>] mdd_attr_check_set_internal+0x355/0x390 [mdd]
2011-12-15 10:52:33 [<ffffffffa09d635d>] ? mdd_la_get+0xad/0xb0 [mdd]
2011-12-15 10:52:33 [<ffffffffa09d96f9>] mdd_attr_check_set_internal_locked+0x69/0x180 [mdd]
2011-12-15 10:52:33 [<ffffffffa09fefd0>] ? md_capainfo+0x20/0x30 [mdd]
2011-12-15 10:52:33 [<ffffffffa09f2b56>] ? mdd_object_capa+0x16/0x190 [mdd]
2011-12-15 10:52:33 [<ffffffffa09fc345>] mdd_rename+0x1865/0x2220 [mdd]
2011-12-15 10:52:33 [<ffffffffa03f2bcf>] ? cfs_hash_bd_from_key+0x3f/0xc0 [libcfs]
2011-12-15 10:52:33 [<ffffffffa0ac5239>] ? cmm_mode_get+0x109/0x320 [cmm]
2011-12-15 10:52:33 [<ffffffffa0ac5d5a>] cml_rename+0x33a/0xbb0 [cmm]
2011-12-15 10:52:33 [<ffffffffa03f2fe7>] ? cfs_hash_bd_get+0x37/0x90 [libcfs]
2011-12-15 10:52:33 [<ffffffffa0ac54bd>] ? cmm_is_subdir+0x6d/0x2f0 [cmm]
2011-12-15 10:52:33 [<ffffffffa04c55e2>] ? lu_object_put+0x92/0x210 [obdclass]
2011-12-15 10:52:33 [<ffffffffa0a50276>] mdt_reint_rename+0x1f96/0x23e0 [mdt]
2011-12-15 10:52:33 [<ffffffffa03f993b>] ? upcall_cache_get_entry+0x28b/0xa14 [libcfs]
2011-12-15 10:52:33 [<ffffffffa0a4882f>] ? mdt_rename_unpack+0x44f/0x6a0 [mdt]
2011-12-15 10:52:33 [<ffffffffa09ff006>] ? md_ucred+0x26/0x60 [mdd]
2011-12-15 10:52:33 [<ffffffffa0a48abf>] mdt_reint_rec+0x3f/0x100 [mdt]
2011-12-15 10:52:33 [<ffffffffa05bbf94>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa0a40f64>] mdt_reint_internal+0x6d4/0x9f0 [mdt]
2011-12-15 10:52:33 [<ffffffffa0a36a86>] ? mdt_reint_opcode+0x96/0x160 [mdt]
2011-12-15 10:52:33 [<ffffffffa0a412cc>] mdt_reint+0x4c/0x120 [mdt]
2011-12-15 10:52:33 [<ffffffffa05bba68>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa0a33955>] mdt_handle_common+0x8d5/0x1810 [mdt]
2011-12-15 10:52:33 [<ffffffffa05b96f4>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa0a34965>] mdt_regular_handle+0x15/0x20 [mdt]
2011-12-15 10:52:33 [<ffffffffa05ca39e>] ptlrpc_main+0xb8e/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa05c9810>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffff8100c1ca>] child_rip+0xa/0x20
2011-12-15 10:52:33 [<ffffffffa05c9810>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa05c9810>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
2011-12-15 10:52:33
2011-12-15 10:52:33 Kernel panic - not syncing: LBUG
2011-12-15 10:52:33 Pid: 8763, comm: mdt_46 Tainted: G ---------------- T 2.6.32-131.6.1.el6_lustre.x86_64 #1
2011-12-15 10:52:33 Call Trace:
2011-12-15 10:52:33 [<ffffffff814da878>] ? panic+0x78/0x143
2011-12-15 10:52:33 [<ffffffffa03e3eeb>] ? lbug_with_loc+0xcb/0xe0 [libcfs]
2011-12-15 10:52:33 [<ffffffffa09d8b5a>] ? mdd_attr_set_internal+0x30a/0x310 [mdd]
2011-12-15 10:52:33 [<ffffffffa09d8eb5>] ? mdd_attr_check_set_internal+0x355/0x390 [mdd]
2011-12-15 10:52:33 [<ffffffffa09d635d>] ? mdd_la_get+0xad/0xb0 [mdd]
2011-12-15 10:52:33 [<ffffffffa09d96f9>] ? mdd_attr_check_set_internal_locked+0x69/0x180 [mdd]
2011-12-15 10:52:33 [<ffffffffa09fefd0>] ? md_capainfo+0x20/0x30 [mdd]
2011-12-15 10:52:33 [<ffffffffa09f2b56>] ? mdd_object_capa+0x16/0x190 [mdd]
2011-12-15 10:52:33 [<ffffffffa09fc345>] ? mdd_rename+0x1865/0x2220 [mdd]
2011-12-15 10:52:33 [<ffffffffa03f2bcf>] ? cfs_hash_bd_from_key+0x3f/0xc0 [libcfs]
2011-12-15 10:52:33 [<ffffffffa0ac5239>] ? cmm_mode_get+0x109/0x320 [cmm]
2011-12-15 10:52:33 [<ffffffffa0ac5d5a>] ? cml_rename+0x33a/0xbb0 [cmm]
2011-12-15 10:52:33 [<ffffffffa03f2fe7>] ? cfs_hash_bd_get+0x37/0x90 [libcfs]
2011-12-15 10:52:33 [<ffffffffa0ac54bd>] ? cmm_is_subdir+0x6d/0x2f0 [cmm]
2011-12-15 10:52:33 [<ffffffffa04c55e2>] ? lu_object_put+0x92/0x210 [obdclass]
2011-12-15 10:52:33 [<ffffffffa0a50276>] ? mdt_reint_rename+0x1f96/0x23e0 [mdt]
2011-12-15 10:52:33 [<ffffffffa03f993b>] ? upcall_cache_get_entry+0x28b/0xa14 [libcfs]
2011-12-15 10:52:33 [<ffffffffa0a4882f>] ? mdt_rename_unpack+0x44f/0x6a0 [mdt]
2011-12-15 10:52:33 [<ffffffffa09ff006>] ? md_ucred+0x26/0x60 [mdd]
2011-12-15 10:52:33 [<ffffffffa0a48abf>] ? mdt_reint_rec+0x3f/0x100 [mdt]
2011-12-15 10:52:33 [<ffffffffa05bbf94>] ? lustre_msg_get_flags+0x34/0xa0 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa0a40f64>] ? mdt_reint_internal+0x6d4/0x9f0 [mdt]
2011-12-15 10:52:33 [<ffffffffa0a36a86>] ? mdt_reint_opcode+0x96/0x160 [mdt]
2011-12-15 10:52:33 [<ffffffffa0a412cc>] ? mdt_reint+0x4c/0x120 [mdt]
2011-12-15 10:52:33 [<ffffffffa05bba68>] ? lustre_msg_check_version+0xc8/0xe0 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa0a33955>] ? mdt_handle_common+0x8d5/0x1810 [mdt]
2011-12-15 10:52:33 [<ffffffffa05b96f4>] ? lustre_msg_get_opc+0x94/0x100 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa0a34965>] ? mdt_regular_handle+0x15/0x20 [mdt]
2011-12-15 10:52:33 [<ffffffffa05ca39e>] ? ptlrpc_main+0xb8e/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa05c9810>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20
2011-12-15 10:52:33 [<ffffffffa05c9810>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffffa05c9810>] ? ptlrpc_main+0x0/0x1900 [ptlrpc]
2011-12-15 10:52:33 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
2011-12-15 10:52:33 Initializing cgroup subsys cpuset
2011-12-15 10:52:33 Initializing cgroup subsys cpu
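Note on the failed assertion: the FID printed by lu_fid_eq() is [0x5a5a5a5a5a5a5a5a:0x5a5a5a5a:0x5a5a5a5a], so every byte of it is 0x5a. As far as I know, 0x5a is the fill byte Lustre writes over memory when it is freed, which would point at mdd_rename touching an object that had already been released. That reading is an assumption and is not confirmed anywhere in the log. A minimal Python sketch for spotting such FIDs in a console log follows; the helper names (parse_fids, is_byte_fill, looks_poisoned) are hypothetical and not part of any Lustre tool.

```python
import re

# Matches a FID printed as [seq:oid:ver], e.g. [0x200000402:0x13:0x0].
FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")


def parse_fids(line):
    """Return every (seq, oid, ver) triple found in a log line, as ints."""
    return [tuple(int(part, 16) for part in m.groups())
            for m in FID_RE.finditer(line)]


def is_byte_fill(value, fill=0x5A):
    """True if value is non-zero and every one of its bytes equals fill."""
    if value == 0:
        return False
    while value:
        if value & 0xFF != fill:
            return False
        value >>= 8
    return True


def looks_poisoned(fid, fill=0x5A):
    """True if all three FID fields consist only of the fill byte.

    Assumption: 0x5a is the freed-memory fill pattern; adjust `fill`
    if the build under test uses a different byte.
    """
    return all(is_byte_fill(part, fill) for part in fid)


# Two lines shortened from the log above.
assert_line = ("LustreError: 8763:0:(lustre_idl.h:766:lu_fid_eq()) "
               "ASSERTION(fid_is_igif(f0) || fid_ver(f0) == 0) failed: "
               "[0x5a5a5a5a5a5a5a5a:0x5a5a5a5a:0x5a5a5a5a]")
rename_line = "cml_rename failed for mdo_rename: [mo_po [0x200000402:0x13:0x0]]"

for line in (assert_line, rename_line):
    for fid in parse_fids(line):
        status = "poisoned" if looks_poisoned(fid) else "ok"
        print(status, [hex(part) for part in fid])
```

Running the same check over the whole console log would show whether any FID before the LBUG already carried the fill pattern.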