Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.0.0
- None
- 3
- 4751
Description
We hit this general protection fault on a client after some trouble with
the MDS:
crash> bt
PID: 6696 TASK: ffff88184352e700 CPU: 7 COMMAND: "robinhood"
#0 [ffff8810f44fbc10] machine_kexec at ffffffff8102e77b
#1 [ffff8810f44fbc70] crash_kexec at ffffffff810a6cd8
#2 [ffff8810f44fbd40] oops_end at ffffffff8146aad0
#3 [ffff8810f44fbd70] die at ffffffff8101021b
#4 [ffff8810f44fbda0] do_general_protection at ffffffff8146a642
#5 [ffff8810f44fbdd0] general_protection at ffffffff81469e15
[exception RIP: llog_cat_put+53]
RIP: ffffffffa09f4f65 RSP: ffff8810f44fbe80 RFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8816756d7600 RCX: 000000000000001d
RDX: 000000000000001d RSI: ffffffff81675340 RDI: 5a5a5a5a5a5a59fa
RBP: ffff8810f44fbec0 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8820517c90c0
R13: 00000000ffffffe2 R14: ffffffffffffffe2 R15: 0000000040086694
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff8810f44fbec8] mdc_changelog_send_thread at ffffffffa0c8572f
#7 [ffff8810f44fbf48] kernel_thread at ffffffff8100d1aa
[exception RIP: child_rip]
RIP: ffffffff8100d1a0 RSP: ffff8810f44fbf58 RFLAGS: 00000200
RAX: 0000000000000000 RBX: ffff8818436c80f8 RCX: 0000000000000001
RDX: 0000000000000500 RSI: ffff8816756d7600 RDI: ffffffffa0c85490
RBP: ffff8816a7ddfa58 R8: 0000000000000246 R9: 0000000000000000
R10: ffff8818436a6800 R11: 0000000000000000 R12: ffff880019b772c0
R13: ffff8816756d7600 R14: 0000000000000000 R15: 0000000040086694
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
bt: WARNING: possibly bogus exception frame
with this error message logged in dmesg just before the crash:
LustreError: 6696:0:(mdc_request.c:1256:mdc_changelog_send_thread()) llog_create() failed -30
After some investigation, I found that the handle used in llog_cat_put was
created in the following call path:
mdc_changelog_send_thread
  llog_create
    lop->lop_create = llog_client_create
      llog_client_create
where the way the llog_handle is returned on error doesn't look compatible
with the error handling done in mdc_changelog_send_thread (llog_create
doesn't do anything with the llog_handle itself).
In llog_client_create, 'handle' is assigned the return value of llog_alloc_handle,
which (if not NULL) is immediately stored in the output parameter '*res' (res
points to llh in mdc_changelog_send_thread). Then, if something goes wrong, the
code jumps to err_free, where 'handle' is freed and poisoned but *res is left
untouched. llog_client_create therefore returns a non-zero value and, in *res,
a pointer to a freed and poisoned area.
In mdc_changelog_send_thread, the error label (out) relies on the assumption
that if llh is not NULL, it is valid and should be passed to llog_cat_put.
That assumption doesn't hold here, since the handle has already been freed and
poisoned in llog_client_create during the earlier error handling.
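To make the failure mode concrete, here is a minimal userspace sketch of the pattern described above. It is not the Lustre code: struct toy_handle, toy_alloc_handle, toy_free_handle, toy_create_buggy and toy_caller_would_put are hypothetical stand-ins, the 0x5a poison byte is an assumption chosen to match the RDI value in the backtrace, and a single static slot replaces the real allocator so that looking at "freed" memory stays well defined.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct llog_handle. */
struct toy_handle {
    unsigned char payload[16];
};

/* One static slot instead of a heap, so that inspecting "freed" memory
 * in this sketch does not invoke undefined behaviour. */
static struct toy_handle toy_slot;

static struct toy_handle *toy_alloc_handle(void)
{
    memset(&toy_slot, 0, sizeof(toy_slot));
    return &toy_slot;
}

/* Mimics a free that poisons the released memory (0x5a is an assumption). */
static void toy_free_handle(struct toy_handle *h)
{
    memset(h, 0x5a, sizeof(*h));
}

/* Buggy shape: *res is published before the operation is known to succeed. */
static int toy_create_buggy(struct toy_handle **res, int simulate_failure)
{
    struct toy_handle *handle = toy_alloc_handle();

    if (handle == NULL)
        return -ENOMEM;
    *res = handle;                 /* published too early */

    if (simulate_failure) {
        toy_free_handle(handle);   /* freed and poisoned ... */
        return -EROFS;             /* ... but *res still points at it */
    }
    return 0;
}

/* Caller-side check as described above: after an error, a non-NULL
 * handle is taken to mean "valid, release it". Returns 1 when the
 * caller would go on to release a handle that is already gone. */
static int toy_caller_would_put(int simulate_failure)
{
    struct toy_handle *llh = NULL;
    int rc = toy_create_buggy(&llh, simulate_failure);

    return rc != 0 && llh != NULL;
}
```

With simulate_failure set, toy_caller_would_put() returns 1: the caller sees an error together with a non-NULL handle whose contents are all 0x5a. Any pointer read out of that handle would then look like the 5a5a5a5a5a5a59fa value in RDI above.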
A simple fix could be to move the '*res = handle' assignment in
llog_client_create to just before the EXIT macro, so that *res stays
untouched unless the function completes successfully.
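The shape of that fix can be sketched in plain userspace C as follows. This is only an illustration of the "publish the output pointer on success only" idea, not a patch against llog_client_create: struct toy_handle and toy_create_fixed are hypothetical names, and malloc/free stand in for the real allocation and err_free path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct llog_handle. */
struct toy_handle {
    int id;
};

/* Fixed shape: the caller's pointer is written only once nothing can
 * fail any more, so an error return never leaves *res dangling. */
static int toy_create_fixed(struct toy_handle **res, int simulate_failure)
{
    struct toy_handle *handle = malloc(sizeof(*handle));

    if (handle == NULL)
        return -ENOMEM;
    handle->id = 1;

    if (simulate_failure) {
        free(handle);      /* error path: release the local handle ... */
        return -EROFS;     /* ... and leave *res exactly as it was */
    }

    *res = handle;         /* publish only on success */
    return 0;
}
```

A caller that initialises llh to NULL and checks "llh != NULL" in its error path then behaves correctly: on failure llh is still NULL and nothing is released twice.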