[LU-2991] (osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 2 Created: 19/Mar/13 Updated: 15/Apr/13 Resolved: 15/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Minh Diep | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | MB | ||
| Environment: | |||
| Severity: | 3 |
| Rank (Obsolete): | 7290 |
| Description |
|
1 MDS, 2 OSSes with 150 OSTs each.

rm -rf /mnt/lustre/dir*

LustreError: 3627:0:(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 2
Call Trace:
LustreError: dumping log to /tmp/lustre-log.1363711711.3627 |
| Comments |
| Comment by Andreas Dilger [ 20/Mar/13 ] |
|
Looks like the transaction credit debugging caught a case where we didn't declare any credits for the rollback or something. The http://review.whamcloud.com/5698 change will disable this accounting for 2.3.90 and beyond, but I suspect we are missing some error handling code? |
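As background for the discussion below, here is a minimal, self-contained sketch of the declare/exec accounting pattern this assertion enforces: every executed operation, including its rollback slot, must have been declared beforehand. This is not the actual Lustre source; the struct name, array size, and helper signatures are assumptions for illustration only.

#include <assert.h>

#define OSD_OT_MAX 16   /* hypothetical number of operation types */

/* hypothetical container for the oti_* counters mentioned in the assertion */
struct osd_thandle_info {
        unsigned char oti_declare_ops[OSD_OT_MAX];    /* forward declares */
        unsigned char oti_declare_ops_rb[OSD_OT_MAX]; /* rollback declares */
        int           oti_rollback;
};

static void osd_trans_declare_op(struct osd_thandle_info *oti, int op, int rb)
{
        /* note: an unsigned char counter silently wraps after 255 declares */
        oti->oti_declare_ops[op]++;
        oti->oti_declare_ops_rb[rb]++;
}

static void osd_trans_exec_op(struct osd_thandle_info *oti, int op, int rb)
{
        if (oti->oti_rollback) {
                /* the reported check: a rollback op with no declare left */
                assert(oti->oti_declare_ops_rb[rb] > 0);
                oti->oti_declare_ops_rb[rb]--;
        } else {
                assert(oti->oti_declare_ops[op] > 0);
                oti->oti_declare_ops[op]--;
        }
}

int main(void)
{
        struct osd_thandle_info oti = { .oti_rollback = 0 };

        osd_trans_declare_op(&oti, 0, 2);  /* declared, so the exec below is fine */
        osd_trans_exec_op(&oti, 0, 2);
        /* skipping the declare, or wrapping the uchar counter back to 0,
         * would make the assert fire exactly like the reported ASSERTION */
        return 0;
}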
| Comment by Bruno Faccini (Inactive) [ 22/Mar/13 ] |
|
Minh, |
| Comment by Minh Diep [ 22/Mar/13 ] |
|
I ran this in our lab, creating OSTs on LVM. I'll see if I can get the crash dump. I'll ping you offline to see if you can access the system. |
| Comment by Bruno Faccini (Inactive) [ 22/Mar/13 ] |
|
Both the transaction credits counters declaration and the Assert come from Patch/commit 28fc8fcc8bfe092c9a1a8c192ab6fe22d92820e7 for |
| Comment by Alex Zhuravlev [ 22/Mar/13 ] |
|
with wide striping - yes. |
| Comment by Bruno Faccini (Inactive) [ 25/Mar/13 ] |
|
Thanks Alex, and I assume this may have occurred with the 150 OSTs Minh was using! I changed the size of the transaction credits counters from uchar to ushort within patch http://review.whamcloud.com/5830 for master, which I just pushed to Gerrit. Minh, if there are no build failures (I still expect side-effects during build+run when changing the size!!), can you give it a try?? |
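To illustrate why the counter width matters here, a small standalone demo of how an 8-bit counter wraps with 150 OSTs while a 16-bit one does not. The "2 declares of the same type per OST" multiplier is purely an illustrative assumption, not a claim about the real credit accounting.

#include <stdio.h>

int main(void)
{
        unsigned char  ops_uchar  = 0;
        unsigned short ops_ushort = 0;
        int nosts = 150, per_ost = 2;   /* per_ost is an assumed multiplier */

        for (int i = 0; i < nosts * per_ost; i++) {
                ops_uchar++;            /* 300 increments: wraps to 44 */
                ops_ushort++;           /* stays at 300 */
        }
        printf("uchar=%u ushort=%u\n", ops_uchar, ops_ushort);
        return 0;
}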
| Comment by Bruno Faccini (Inactive) [ 26/Mar/13 ] |
|
Humm, tests of the build for my patch failed in sanity/test_24v with the same LBUG:

04:13:52:Lustre: DEBUG MARKER: == sanity test 24v: list directory with large files (handle hash collision, bug: 17560) == 04:04:38 (1364209478)
04:13:52:LustreError: 11418:0:(osd_internal.h:909:osd_trans_exec_op()) ASSERTION( oti->oti_declare_ops_rb[rb] > 0 ) failed: rb = 5
04:13:52:LustreError: 11418:0:(osd_internal.h:909:osd_trans_exec_op()) LBUG
04:13:52:Pid: 11418, comm: mdt00_000
04:13:52:
04:13:53:Call Trace:
04:13:53: [<ffffffffa04d7895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
04:13:53: [<ffffffffa04d7e97>] lbug_with_loc+0x47/0xb0 [libcfs]
04:13:53: [<ffffffffa0d44360>] osd_object_ref_del+0x2d0/0x310 [osd_ldiskfs]
04:13:53: [<ffffffffa0f157fb>] lod_ref_del+0x3b/0xd0 [lod]
04:13:53: [<ffffffffa0c6acd5>] mdo_ref_del+0x35/0xc0 [mdd]
04:13:54: [<ffffffffa0c72446>] mdd_unlink+0x6c6/0xe20 [mdd]
04:13:54: [<ffffffffa0e61a18>] mdo_unlink+0x18/0x50 [mdt]
04:13:54: [<ffffffffa0e64cd9>] mdt_reint_unlink+0x739/0xfd0 [mdt]
04:13:54: [<ffffffffa0e616d1>] mdt_reint_rec+0x41/0xe0 [mdt]
04:13:54: [<ffffffffa0e5ad53>] mdt_reint_internal+0x4e3/0x7d0 [mdt]
04:13:55: [<ffffffffa0e5b084>] mdt_reint+0x44/0xe0 [mdt]
04:13:55: [<ffffffffa0e49078>] mdt_handle_common+0x648/0x1660 [mdt]
04:13:56: [<ffffffffa0e85125>] mds_regular_handle+0x15/0x20 [mdt]
04:13:56: [<ffffffffa07f91dc>] ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
04:13:56: [<ffffffffa04d85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
04:13:57: [<ffffffffa07f0819>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
04:13:57: [<ffffffff81052223>] ? __wake_up+0x53/0x70
04:13:57: [<ffffffffa07fa725>] ptlrpc_main+0xb75/0x1870 [ptlrpc]
04:13:58: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
04:13:58: [<ffffffff8100c0ca>] child_rip+0xa/0x20
04:13:58: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
04:13:58: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
04:13:58: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
04:13:58:
04:13:59:Kernel panic - not syncing: LBUG
04:13:59:Pid: 11418, comm: mdt00_000 Not tainted 2.6.32-279.19.1.el6_lustre.gc4681d8.x86_64 #1
04:13:59:Call Trace:
04:14:00: [<ffffffff814e9811>] ? panic+0xa0/0x168
04:14:00: [<ffffffffa04d7eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
04:14:00: [<ffffffffa0d44360>] ? osd_object_ref_del+0x2d0/0x310 [osd_ldiskfs]
04:14:00: [<ffffffffa0f157fb>] ? lod_ref_del+0x3b/0xd0 [lod]
04:14:00: [<ffffffffa0c6acd5>] ? mdo_ref_del+0x35/0xc0 [mdd]
04:14:01: [<ffffffffa0c72446>] ? mdd_unlink+0x6c6/0xe20 [mdd]
04:14:01: [<ffffffffa0e61a18>] ? mdo_unlink+0x18/0x50 [mdt]
04:14:01: [<ffffffffa0e64cd9>] ? mdt_reint_unlink+0x739/0xfd0 [mdt]
04:14:01: [<ffffffffa0e616d1>] ? mdt_reint_rec+0x41/0xe0 [mdt]
04:14:01: [<ffffffffa0e5ad53>] ? mdt_reint_internal+0x4e3/0x7d0 [mdt]
04:14:01: [<ffffffffa0e5b084>] ? mdt_reint+0x44/0xe0 [mdt]
04:14:02: [<ffffffffa0e49078>] ? mdt_handle_common+0x648/0x1660 [mdt]
04:14:02: [<ffffffffa0e85125>] ? mds_regular_handle+0x15/0x20 [mdt]
04:14:02: [<ffffffffa07f91dc>] ? ptlrpc_server_handle_request+0x41c/0xdf0 [ptlrpc]
04:14:02: [<ffffffffa04d85de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
04:14:02: [<ffffffffa07f0819>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
04:14:03: [<ffffffff81052223>] ? __wake_up+0x53/0x70
04:14:03: [<ffffffffa07fa725>] ? ptlrpc_main+0xb75/0x1870 [ptlrpc]
04:14:03: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
04:14:03: [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
04:14:03: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
04:14:03: [<ffffffffa07f9bb0>] ? ptlrpc_main+0x0/0x1870 [ptlrpc]
04:14:03: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Unless my very small change of the counter's size introduced a regression, I am afraid that recently landed patches from |
| Comment by Bruno Faccini (Inactive) [ 26/Mar/13 ] |
|
I am trying to reproduce on a single Toro node with loop devices, but no success/LBUG so far ...

export OSTCOUNT=150
/usr/lib64/lustre/tests/llmount.sh
while true ; do
for i in $(seq 1 300); do
mkdir /mnt/lustre/dir$i
lfs setstripe -c 150 /mnt/lustre/dir$i
touch /mnt/lustre/dir$i/file
lfs getstripe /mnt/lustre/dir$i
done
rm -rf /mnt/lustre/dir*
done
Maybe I need more nodes/VMs to trigger it?? |
| Comment by Minh Diep [ 26/Mar/13 ] |
|
Yes, I think you might need more than one node to reproduce it. Did auto-test provide a crash dump? |
| Comment by Bruno Faccini (Inactive) [ 26/Mar/13 ] |
|
Yes, according to the MDS console log, but then how/where can I find it? |
| Comment by Minh Diep [ 26/Mar/13 ] |
|
I hit this during your patch test. I am not sure if it's related, likely not. I will reproduce this bug on master and collect the crash dump.

-----------
Pid: 674, comm: mdt00_003 Not tainted 2.6.32-279.19.1.el6_lustre.gc4681d8.x86_64 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH

I will file a separate bug if I hit this on master. |
| Comment by Bruno Faccini (Inactive) [ 27/Mar/13 ] |
|
According to the stack-trace, I don't think it is related, but as I already said, with the item size change some weird consequences can show up ... Also, Chris confirmed that auto-test-generated crash-dumps are currently not saved; it is on their to-do list. Thus I rely on your testing to reproduce the LBUG; in parallel I will try to build a configuration with more nodes/VMs, like the one from the failed auto-test. |
| Comment by Minh Diep [ 27/Mar/13 ] |
|
I have reproduced it on master and got the debug log with debug=-1. I also took a crash dump, but since this is an LBUG I am not sure the crash dump is useful. |
| Comment by Bruno Faccini (Inactive) [ 28/Mar/13 ] |
|
The crash-dump is not really useful in this case since it can only confirm that oti->oti_declare_ops_rb[OSD_OT_XATTR_SET] = 0, which caused the Assert to trigger. But the lustre-logs are much more interesting since they confirm that the thread that triggered the Assert was gathering transaction credits stats for a work-load where wide-striping was used, each time near the 256 (in fact 253 + ??) "fatal" value. So I still suspect an overflow due to the uchar size ... Minh, I know that for the crash-dump you were running with a genuine master build, but was that also the case for the full debug=-1 trace? In fact, I would like to know whether you got such an Assert/LBUG when running with my patch?? |
| Comment by Alex Zhuravlev [ 28/Mar/13 ] |
|
To check the theory about the uchar size, you could put LASSERT(current_value < 255); just before the increment? |
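A minimal sketch of the guard Alex suggests, using plain assert() and a hypothetical helper name (oti_declare_inc): check that the 8-bit counter has not yet reached UCHAR_MAX before incrementing, so the wrap is caught at the overflow point rather than later as a spurious "missing declare".

#include <assert.h>
#include <limits.h>

static void oti_declare_inc(unsigned char *counter)
{
        /* fire at the point of overflow instead of later, when the wrapped
         * counter reads back as 0 and looks like a missing declare */
        assert(*counter < UCHAR_MAX);
        (*counter)++;
}

int main(void)
{
        unsigned char ops = 0;

        for (int i = 0; i < 255; i++)
                oti_declare_inc(&ops);   /* 255 declares are fine ... */
        /* oti_declare_inc(&ops);           ... the 256th would assert */
        return 0;
}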
| Comment by Bruno Faccini (Inactive) [ 28/Mar/13 ] |
|
You are right Alex, I thought I could find it by counting the relevant traces in the full debug log from Minh, but it seems we are actually missing those traces ... Minh, I just submitted http://review.whamcloud.com/5869 to trigger an assertion if we reach UCHAR_MAX; can you give it a try and expose it to your reproducer+configuration?? |
| Comment by Bruno Faccini (Inactive) [ 29/Mar/13 ] |
|
Ok, the fact that we can reach UCHAR_MAX and overflow the oti_declare_ops counter has definitely been proven. Changing the size of both oti_declare_ops/oti_declare_ops_rb from uchar to ushort, as per the 1st patch/change 5830, seems to allow us to go further, but then we finally trigger the same Assert, now because oti_rollback is true while oti_declare_ops_rb is zero!! ... It seems that we missed something in the rollback logic ... So we continue to debug by adding traces/prints to see how the counters progress and find where things break. |
| Comment by Alex Zhuravlev [ 29/Mar/13 ] |
|
Notice the checks in the rollback path are not quite correct. In short, from the ldiskfs/zfs point of view, there is no point in declaring rollback changes: if you have an insert for a name declared, then you'll be able to remove that name as part of the rollback. This is why we plan to just disable these checks in production (until a better mechanism to track changes is implemented). |
| Comment by Minh Diep [ 01/Apr/13 ] |
|
Alex, can you confirm that if we disable these checks in production, this ASSERTION will not happen? |
| Comment by Alex Zhuravlev [ 01/Apr/13 ] |
|
The assertions in OSD - yes. But I think we still need to understand the root cause of the kernel BUG at fs/jbd2/transaction.c:1033! |
| Comment by Bruno Faccini (Inactive) [ 01/Apr/13 ] |
|
Minh, yes, undefining the compile-time OSD_TRACK_DECLARES will fully disable the concerned transaction credits counters feature/code. |
| Comment by Minh Diep [ 01/Apr/13 ] |
|
Disabling it by taking out the define OSD_TRACK_DECLARES results in a compile error:

/home/minh.diep/build/lu2991/lustre/osd-ldiskfs/osd_handler.c: In function ‘osd_trans_create’:

Bruno, no bug was filed for the kernel BUG and I haven't hit it again. |
| Comment by Bruno Faccini (Inactive) [ 01/Apr/13 ] |
|
Humm, in the osd_trans_create() function the oti_declare_ops[_rb,_cred] and rollback fields memset()s/initializations need to be put under #ifdef OSD_TRACK_DECLARES. Same for the last_credits/last_printed variable declarations in osd_trans_start(). Then we can run your tests again and see if we trigger the Oops in jbd2 again. |
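A sketch of the kind of guard Bruno describes, assuming a simplified osd_thread_info layout (the field names follow the thread; the actual patch may differ): when OSD_TRACK_DECLARES is not defined the tracking fields do not exist, so every reference to them, including the initializations, must be compiled out as well.

#include <string.h>

#define OSD_TRACK_DECLARES              /* comment out to build without tracking */
#define OSD_OT_MAX 16                   /* hypothetical */

struct osd_thread_info {
        void *oti_handle;               /* other, non-tracking state (placeholder) */
#ifdef OSD_TRACK_DECLARES
        unsigned short oti_declare_ops[OSD_OT_MAX];
        unsigned short oti_declare_ops_rb[OSD_OT_MAX];
        unsigned short oti_declare_ops_cred[OSD_OT_MAX];
        int            oti_rollback;
#endif
};

static void osd_trans_create_init(struct osd_thread_info *oti)
{
#ifdef OSD_TRACK_DECLARES
        /* only touch the tracking fields when the feature is compiled in */
        memset(oti->oti_declare_ops, 0, sizeof(oti->oti_declare_ops));
        memset(oti->oti_declare_ops_rb, 0, sizeof(oti->oti_declare_ops_rb));
        memset(oti->oti_declare_ops_cred, 0, sizeof(oti->oti_declare_ops_cred));
        oti->oti_rollback = 0;
#else
        (void)oti;                      /* nothing to initialise when disabled */
#endif
}

int main(void)
{
        struct osd_thread_info oti;

        osd_trans_create_init(&oti);
        return 0;
}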
| Comment by Minh Diep [ 02/Apr/13 ] |
|
Ok, I fixed the compile issue and have been running the test for a couple of hours now. No ASSERTION. I will try to reproduce the Oops, but in the meantime this bug's priority can be lowered. |
| Comment by Bruno Faccini (Inactive) [ 03/Apr/13 ] |
|
Patch-set #3 submitted, since patch-set #1 did not correctly initialize/memset() the newly sized fields. If it still shows the same wrong behavior we will definitely revert to unsetting/undefining OSD_TRACK_DECLARES. |
| Comment by Bruno Faccini (Inactive) [ 05/Apr/13 ] |
|
I had to re-base patch #6 due to auto-test failures like racer/test_1 [LustreError: 31880:0:(file.c:930:ll_file_io_generic()) ASSERTION( io->u.ci_rw.crw_count == count ) failed: 21504 != 4194304], which has been fixed with |
| Comment by Bruno Faccini (Inactive) [ 07/Apr/13 ] |
|
Wow, too bad: now patch-set #6 experienced 2 auto-test failures (sanity/test_56w linked to ). So, patch-set #7 was just submitted after a rebase ... |
| Comment by Bruno Faccini (Inactive) [ 08/Apr/13 ] |
|
Again 4 auto-test failures, different from the preceding ones! But all are in the review-zfs group, with at least 2 of them being known ZFS perf issues ( |