[LU-2193] lvbo_init failed for resource XXX: rc -2, after recovery Created: 10/May/12 Updated: 15/Oct/13 Resolved: 06/May/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0, Lustre 2.4.1, Lustre 2.5.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Brian Behlendorf | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB | ||
| Attachments: |
|
| Severity: | 3 |
| Epic: | server |
| Project: | Orion |
| Rank (Obsolete): | 3014 |
| Description |
|
We are seeing a lot (>30,000) of the following errors following recovery on our OSTs. This is only observed when the OSTs are under load when they are restarted. I believe this is the same as issue

Lustre: Lustre: Build Version: orion-2_2_49_57_1-44chaos-44chaos--PRISTINE-2.6.32-220.13.1.2chaos.ch5.x86_64
LustreError: 11-0: MGC10.1.1.211@o2ib9: Communicating with 10.1.1.211@o2ib9, operation llog_origin_handle_create failed with -2
LustreError: 5497:0:(mgc_request.c:250:do_config_log_add()) failed processing sptlrpc log: -2
Lustre: lcz-OST0000: Mounted
Lustre: lcz-OST0000: Will be in recovery for at least 5:00, or until 60 clients reconnect.
Lustre: lcz-OST0000: Recovery over after 1:12, of 60 clients 60 recovered and 0 were evicted.
LustreError: 5669:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 25345259: rc -2
LustreError: 5663:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 25352185: rc -2
LustreError: 5669:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 2 previous similar messages
LustreError: 5670:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 25353836: rc -2
LustreError: 5670:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 168 previous similar messages
LustreError: 5665:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 25418903: rc -2
LustreError: 5665:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 561 previous similar messages
LustreError: 5659:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 25627504: rc -2
LustreError: 5659:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 2043 previous similar messages
LustreError: 5673:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 25772020: rc -2
LustreError: 5673:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 4424 previous similar messages
LustreError: 5662:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 26456381: rc -2
LustreError: 5662:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 10386 previous similar messages
LustreError: 5672:0:(ldlm_resource.c:1096:ldlm_resource_get()) lvbo_init failed for resource 28049681: rc -2
LustreError: 5672:0:(ldlm_resource.c:1096:ldlm_resource_get()) Skipped 23629 previous similar messages
|
| Comments |
| Comment by Andreas Dilger [ 11/May/12 ] |
|
Brian, can you confirm that the objects in question actually exist in the underlying OST filesystem under O/0/d{oid % 32}/{oid}? This could be a result of the objects being deleted somehow (e.g. broken orphan recovery or precreate), or problems with OI lookup. |
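As an aside, here is a minimal user-space sketch of the object path layout referenced above (O/<seq>/d<oid % 32>/<oid> on the ldiskfs backend); the helper name is made up for illustration and this is not actual Lustre code:

#include <stdio.h>

/* Hypothetical helper: build the on-disk path of an OST object under the
 * O/<seq>/d<oid % 32>/<oid> layout mentioned above. */
static void ost_object_path(char *buf, size_t len,
                            unsigned long long seq, unsigned long long oid)
{
        /* objects are hashed into 32 subdirectories by oid % 32 */
        snprintf(buf, len, "O/%llu/d%llu/%llu", seq, oid % 32, oid);
}

int main(void)
{
        char path[64];

        /* object id 25345259 from the console log in the description */
        ost_object_path(path, sizeof(path), 0, 25345259ULL);
        printf("%s\n", path);   /* prints O/0/d11/25345259 */
        return 0;
}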
| Comment by Mikhail Pershin [ 11/May/12 ] |
|
Can you also get a call trace for those errors? This may just be a harmless attempt to destroy objects that were already deleted. OFD takes a local LDLM lock for object destroy, so this may not even come from a client but from destroys issued by the MDS; in that case it is harmless, at least. |
| Comment by Brian Behlendorf [ 14/May/12 ] |
|
I'll attempt to recreate the errors under a more controlled test workload and grab the needed logs. |
| Comment by Andreas Dilger [ 15/Oct/12 ] |
|
I hit this in sanity.sh test_17m() with single-node testing (dual-core x86_64):

Lustre: DEBUG MARKER: == sanity test 17m: run e2fsck against MDT which contains short/long symlink ========================= 15:54:56 (1350338096)
Lustre: server umount testfs-MDT0000 complete
Lustre: Skipped 3 previous similar messages
LDISKFS-fs (dm-9): mounted filesystem with ordered data mode. quota=on. Opts:
LustreError: 166-1: MGC192.168.20.154@tcp: Connection to MGS (at 0@lo) was lost; in progress operations using this service will fail
Lustre: MGC192.168.20.154@tcp: Reactivating import
Lustre: Found index 0 for testfs-MDT0000, updating log
Lustre: Skipped 3 previous similar messages
Lustre: Modifying parameter testfs-MDT0000-mdtlov.lov.stripesize in log testfs-MDT0000
Lustre: Skipped 9 previous similar messages
Lustre: testfs-MDT0000: used disk, loading
LustreError: 21318:0:(sec_config.c:1024:sptlrpc_target_local_copy_conf()) missing llog context
Lustre: 21318:0:(mdt_lproc.c:418:lprocfs_wr_identity_upcall()) testfs-MDT0000: identity upcall set to /usr/src/lustre-head/lustre/utils/l_getidentity
LustreError: 10244:0:(ldlm_resource.c:1103:ldlm_resource_get()) lvbo_init failed for resource 2236: rc -2
LustreError: 10244:0:(ldlm_resource.c:1103:ldlm_resource_get()) Skipped 107 previous similar messages

I also have the debug log for the test run that I'm attaching to the bug. |
| Comment by Andreas Dilger [ 16/Oct/12 ] |
|
It seems that the OFD/OSD code could do the object lookup before the DLM lock is taken, and possibly avoid a bunch of overhead? I don't think the ordering of the object lookup vs. locking is important.

ost_destroy()) Process entered
ofd_destroy()) Process entered
ofd_destroy_by_fid()) Process entered
ldlm_cli_enqueue_local()) Process entered
ldlm_resource_new()) slab-alloced 'res': 320 at ffff880068e1ac80.
ofd_lvbo_init()) Process entered
ofd_lvbo_init()) kmalloced 'lvb': 40 at ffff88005c0dad00.
ofd_object_find()) Process entered
ofd_object_init()) object init, fid = [0x100000000:0xaf3:0x0]
osd_compat_objid_lookup()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
osd_fid_lookup()) Process leaving via out (rc=0 : 0 : 0x0)
lu_object_alloc()) Process leaving (rc=18446612134214173264 : -131939495378352 : ffff8800713f0250)
ofd_object_find()) Process leaving (rc=18446612134214173176 : -131939495378440 : ffff8800713f01f8)
ofd_attr_get()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
ofd_lvbo_init()) Process leaving via out_put (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
fid_flatten32()) Process leaving (rc=4278192627 : 4278192627 : ff0009f3)
ldlm_resource_get()) lvbo_init failed for resource 2803: rc -2
ofd_object_find()) Process leaving (rc=18446612134214173176 : -131939495378440 : ffff8800713f01f8)
ofd_object_destroy()) Process leaving via unlock (rc=18446744073709551614 : -2 : 0xfffffffffffffffe)
fid_flatten32()) Process leaving (rc=4278192627 : 4278192627 : ff0009f3)
ofd_destroy_by_fid()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
ofd_destroy()) destroying non-existent object 2803
ofd_destroy()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
obd_destroy()) Process leaving (rc=18446744073709551614 : -2 : fffffffffffffffe)
ost_destroy()) Process leaving (rc=0 : 0 : 0)
ost_handle()) Process leaving

It also appears that ofd_destroy->ofd_destroy_by_fid->ldlm_cli_enqueue_local->ldlm->ofd_lvbo_init() is creating and destroying an lu_env for every call... Is it possible to just pass the lu_env down via the lvbo_init code? |
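To make the suggested reordering concrete, here is a toy user-space model (not the real OFD code; object_exists() and destroy_by_id() are made-up stand-ins) of looking the object up first and only enqueueing the local DLM lock, and hence running lvbo_init, when the object actually exists:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for an object existence check (something like ofd_object_find()
 * in the trace above); here we simply pretend odd ids were already unlinked. */
static bool object_exists(unsigned long long oid)
{
        return oid % 2 == 0;
}

static int destroy_by_id(unsigned long long oid)
{
        /* look the object up first; skip the lock enqueue (and lvbo_init)
         * entirely for objects that are already gone */
        if (!object_exists(oid))
                return -ENOENT;

        /* ...enqueue the local DLM lock and destroy the object here... */
        printf("destroyed object %llu\n", oid);
        return 0;
}

int main(void)
{
        printf("rc = %d\n", destroy_by_id(2803)); /* id from the trace above, rc = -2 */
        printf("rc = %d\n", destroy_by_id(2804));
        return 0;
}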
| Comment by Andreas Dilger [ 16/Oct/12 ] |
|
Untested patch: http://review.whamcloud.com/4276 |
| Comment by Alex Zhuravlev [ 16/Oct/12 ] |
|
> Is it possible to just pass the lu_env down via the lvbo_init code?

Yes, and there are other places in LDLM where we want to pass the env in as well. |
| Comment by Andreas Dilger [ 09/Nov/12 ] |
|
Minor patch to improve lvbo_init() error message to print full FID: http://review.whamcloud.com/4501 |
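For illustration, a minimal sketch of the kind of message change this refers to, printing the resource name in hex rather than a single flattened decimal id; the struct and exact format below are assumptions for this sketch, not the actual patch, which uses Lustre's internal types:

#include <stdio.h>

/* Illustrative stand-in for the LDLM resource id (the real struct ldlm_res_id
 * differs); only the first two name fields are printed here. */
struct res_name {
        unsigned long long name[4];
};

static void report_lvbo_failure(const char *obd, const struct res_name *res, int rc)
{
        /* old style printed a single flattened id, e.g. "resource 32306";
         * printing the name fields in hex matches the 03/May/13 log below */
        fprintf(stderr, "%s: lvbo_init failed for resource 0x%llx:0x%llx: rc = %d\n",
                obd, res->name[0], res->name[1], rc);
}

int main(void)
{
        struct res_name res = { .name = { 0x7e32, 0x0, 0, 0 } };

        report_lvbo_failure("testfs-OST0001", &res, -2);
        return 0;
}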
| Comment by Hiroya Nozaki [ 06/Dec/12 ] |
|
Hi, Andreas. We at Fujitsu have almost the same issue, apart from one difference, so I'm now planning to port this patch (http://review.whamcloud.com/4276) to b1_8. Our customer got the message below, the one described here, on an OSS:

14761:0:(ldlm_resource.c:1011:ldlm_resource_add()) filter-fefs-OST0113_UUID: lvbo_init failed for resource 332822: rc -2

Then, after a while, umount hit an LBUG in ldlm_lock_cancel() on a client node because of an extra refcount on ldlm_lock->l_readers of the extent lock, and the messages showed the same res_name[0] as the OSS message:

5538:0:(ldlm_lock.c:1606:ldlm_lock_cancel()) ### lock still has references ns: cx-OST0113-osc-ffff883fd57f4800 lock: ffff881dc8ddb000/0xead40fdb3d64db52 lrc: 4/1,0 mode: PR/PR res: 332822/0 rrc: 2 type: EXT [0->18446744073
5538:0:(ldlm_lock.c:1607:ldlm_lock_cancel()) LBUG

crash> bt
PID: 5538 TASK: ffff883577ac6080 CPU: 8 COMMAND: "umount"
#0 [ffff883db6139640] machine_kexec at ffffffff810310db
#1 [ffff883db61396a0] crash_kexec at ffffffff810b63b2
#2 [ffff883db6139770] panic at ffffffff814dac2f
#3 [ffff883db61397f0] lbug_with_loc at ffffffffa04ba453
#4 [ffff883db6139870] ldlm_cli_cancel_local at ffffffffa0697651
#5 [ffff883db61398d0] ldlm_cli_cancel at ffffffffa0699d23
#6 [ffff883db6139950] cleanup_resource at ffffffffa0681e83
#7 [ffff883db61399d0] ldlm_namespace_cleanup at ffffffffa068218f
#8 [ffff883db6139a60] __ldlm_namespace_free at ffffffffa068239b
#9 [ffff883db6139b40] ldlm_namespace_free_prior at ffffffffa06829bc
#10 [ffff883db6139b80] client_disconnect_export at ffffffffa0687766
#11 [ffff883db6139c00] osc_disconnect at ffffffffa07a6df4
#12 [ffff883db6139c60] lov_putref at ffffffffa0829f21
#13 [ffff883db6139d30] ll_put_super at ffffffffa08bbf0d
#14 [ffff883db6139e30] generic_shutdown_super at ffffffff81174adb
#15 [ffff883db6139e50] kill_anon_super at ffffffff81174bc6
#16 [ffff883db6139e70] lustre_kill_super at ffffffffa05dc588
#17 [ffff883db6139e90] deactivate_super at ffffffff81175c40
#18 [ffff883db6139eb0] mntput_no_expire at ffffffff8119113f
#19 [ffff883db6139ee0] sys_umount at ffffffff8119156b
#20 [ffff883db6139f80] tracesys at ffffffff8100b387 (via system_call)
RIP: 00007fdce2a7b717 RSP: 00007fff1f6b5728 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffffffff8100b387 RCX: ffffffffffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fdce3c11a10
RBP: 00007fdce3c119d0 R8: 00007fdce3c11a30 R9: 0000000000000000
R10: 00007fff1f6b54b0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00007fdce3c11a70
ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
|
| Comment by Alex Zhuravlev [ 06/Dec/12 ] |
|
no, that patch is not supposed to fix this LBUG. |
| Comment by Hiroya Nozaki [ 06/Dec/12 ] |
|
Hi, Alex. Thank you for the quick reply. Hmm, but anyway, it seems I'd better read the Lustre source code more thoroughly. |
| Comment by Andreas Dilger [ 06/Dec/12 ] |
|
Sorry, the patches here will not fix this problem. My first patch was only for a bug in the master branch, and the second patch only improves the error message. |
| Comment by Hiroya Nozaki [ 07/Dec/12 ] |
|
Hi, Andreas. I am sorry to have caused trouble, and thank you !!! |
| Comment by Andreas Dilger [ 03/May/13 ] |
|
It seems that the initial patch to do the object lookup before destroy has slowed this message down but not removed it entirely, or maybe I just don't see as many messages under my test load. It appears that the lvbo_init objects are NOT in the range of the just-cleaned-up orphan objects, which is what I would expect from the original debugging:

Lustre: testfs-OST0000: deleting orphan objects from 0x0:28913 to 28929
Lustre: testfs-OST0001: deleting orphan objects from 0x0:29944 to 29985
:
:
Lustre: testfs-OST0001: deleting orphan objects from 0x0:55389 to 57441
Lustre: testfs-OST0000: deleting orphan objects from 0x0:54397 to 54977
LustreError: 6979:0:ldlm_resource_get()) testfs-OST0001: lvbo_init failed for resource 0x7e32:0x0: rc = -2
LustreError: 6999:0:ldlm_resource_get()) testfs-OST0001: lvbo_init failed for resource 0x80c4:0x0: rc = -2
LustreError: 6995:0:ldlm_resource_get()) testfs-OST0000: lvbo_init failed for resource 0x998d:0x0: rc = -2

(gdb) p 0x7e32
$5 = 32306
(gdb) p 0x80c4
$6 = 32964
(gdb) p 0x998d
$7 = 39309

There are of course messages being skipped in these console logs, but it seems like a disjoint set of objects. |
| Comment by Zhenyu Xu [ 06/May/13 ] |
|
The lvbo_init still happens in ofd_orphans_destroy(). |
| Comment by Zhenyu Xu [ 06/May/13 ] |
|
Patch: http://review.whamcloud.com/6266

Commit message:

LU-2193 ofd: check object exists before orphan destroy

MDS replay of object destroys after recovery can encounter non-existing objects; we need to check that an orphan object exists, and skip it if not, before attempting further futile actions. |
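As a rough illustration of the fix described in the commit message (made-up names, not the real ofd_orphans_destroy() code): when replaying orphan destroys after recovery, skip object ids that no longer exist instead of enqueueing a lock (and triggering lvbo_init) for each of them:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for an object existence check (the details of the real OFD check
 * differ); here every third id is pretended to be already missing. */
static bool object_exists(unsigned long long oid)
{
        return oid % 3 != 0;
}

static void destroy_orphans(unsigned long long first, unsigned long long last)
{
        for (unsigned long long oid = first; oid <= last; oid++) {
                if (!object_exists(oid))
                        continue;       /* already gone: nothing to lock or destroy */
                printf("destroying orphan %llu\n", oid);
        }
}

int main(void)
{
        /* range taken from one of the "deleting orphan objects" console lines above */
        destroy_orphans(28913, 28929);
        return 0;
}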
| Comment by Andreas Dilger [ 06/May/13 ] |
|
Looks like my earlier patch was close but not quite enough to fix the real issue, and I forgot to follow up after it was tested and landed. I think a real problem here is that everyone runs the tests in Gerrit and not enough people look at the console output anymore. I was just looking through the console logs from an acceptance-small test run on my local system on Friday, and there are a lot of extra messages that could be quieted (I submitted http://review.whamcloud.com/6264 to quiet some already), and issues like this are visible among the noise. I don't think we need to read the logs for every successful test, but I think it makes sense to look through them on occasion. I'm not sure if it is practical, but I'm wondering if it would make sense to have test pass/fail also depend on "expect" output for both stdout and the console error messages? |
| Comment by Brian Behlendorf [ 06/May/13 ] |
|
> I'm not sure if it is practical, but I'm wondering if it would make sense to have test pass/fail also depend on "expect" output for both the stdout and console error messages?

I think that's an excellent idea. I currently do something similar with the ZFS builds by automatically flagging any tests which generate console messages with a warning. It's helped me catch several issues immediately which might otherwise have slipped through. For Lustre you'll probably need something more sophisticated to filter out the expected messages, or better yet to go on a crusade silencing those warnings. |
| Comment by Andreas Dilger [ 06/May/13 ] |
|
The difficulty I foresee is related to regexp matching of FID and NID strings in the output messages. I don't know what kind of regexp matching is in expect, since I've only ever used it with e2fsprogs, where it just does literal output matching, and there is a sed preprocessor script that strips out all of the test-unique lines (e.g. e2fsprogs version, format time, UUID, etc). |
| Comment by Andreas Dilger [ 06/May/13 ] |
|
Patch landed for 2.4.0. |