[LU-980] incompatible error handling Created: 10/Jan/12  Updated: 18/Apr/23  Resolved: 29/Mar/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: Lustre 2.2.0, Lustre 2.1.2

Type: Bug Priority: Major
Reporter: Alexandre Louvet Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4751

 Description   

We got this general protection fault on a client after some trouble with
the MDS:

crash> bt
PID: 6696 TASK: ffff88184352e700 CPU: 7 COMMAND: "robinhood"
#0 [ffff8810f44fbc10] machine_kexec at ffffffff8102e77b
#1 [ffff8810f44fbc70] crash_kexec at ffffffff810a6cd8
#2 [ffff8810f44fbd40] oops_end at ffffffff8146aad0
#3 [ffff8810f44fbd70] die at ffffffff8101021b
#4 [ffff8810f44fbda0] do_general_protection at ffffffff8146a642
#5 [ffff8810f44fbdd0] general_protection at ffffffff81469e15
[exception RIP: llog_cat_put+53]
RIP: ffffffffa09f4f65 RSP: ffff8810f44fbe80 RFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff8816756d7600 RCX: 000000000000001d
RDX: 000000000000001d RSI: ffffffff81675340 RDI: 5a5a5a5a5a5a59fa
RBP: ffff8810f44fbec0 R8: 0000000000000000 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8820517c90c0
R13: 00000000ffffffe2 R14: ffffffffffffffe2 R15: 0000000040086694
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff8810f44fbec8] mdc_changelog_send_thread at ffffffffa0c8572f
#7 [ffff8810f44fbf48] kernel_thread at ffffffff8100d1aa
[exception RIP: child_rip]
RIP: ffffffff8100d1a0 RSP: ffff8810f44fbf58 RFLAGS: 00000200
RAX: 0000000000000000 RBX: ffff8818436c80f8 RCX: 0000000000000001
RDX: 0000000000000500 RSI: ffff8816756d7600 RDI: ffffffffa0c85490
RBP: ffff8816a7ddfa58 R8: 0000000000000246 R9: 0000000000000000
R10: ffff8818436a6800 R11: 0000000000000000 R12: ffff880019b772c0
R13: ffff8816756d7600 R14: 0000000000000000 R15: 0000000040086694
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
bt: WARNING: possibly bogus exception frame

with the following error message logged just before it in dmesg:
LustreError: 6696:0:(mdc_request.c:1256:mdc_changelog_send_thread()) llog_create() failed -30

After some investigation, I found that the handle used in llog_cat_put was
created along the following call path:
mdc_changelog_send_thread
  -> llog_create
    -> lop->lop_create, i.e. llog_client_create

The way llog_client_create returns the llog_handle on error is not
compatible with the error handling done in mdc_changelog_send_thread
(llog_create itself does nothing with the llog_handle).
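To illustrate why the generic layer cannot protect the caller, here is a minimal C sketch of an operations-table dispatch, modeled loosely on the call path above. All names (struct log_ops, struct log_ctxt, log_create, failing_backend_create, stale_handle) are hypothetical and are not the actual Lustre API. The wrapper only forwards the backend's return code and never touches *res, so whatever the backend leaves there on failure reaches the caller as-is.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handle type; its contents do not matter for this sketch. */
struct log_handle {
        int id;
};

/* Hypothetical per-context operations table (cf. lop->lop_create). */
struct log_ops {
        int (*create)(struct log_handle **res);
};

struct log_ctxt {
        const struct log_ops *ops;
};

/* Generic entry point: dispatches to the backend and forwards its return
 * code. It never inspects or resets *res, so a value the backend leaves
 * there on failure reaches the caller unchanged. */
static int log_create(struct log_ctxt *ctxt, struct log_handle **res)
{
        assert(res != NULL);
        if (ctxt == NULL || ctxt->ops == NULL || ctxt->ops->create == NULL)
                return -EOPNOTSUPP;
        return ctxt->ops->create(res);
}

/* Fake backend that publishes a handle and then reports a failure,
 * mimicking the ordering problem described in this report. */
static struct log_handle stale_handle;

static int failing_backend_create(struct log_handle **res)
{
        *res = &stale_handle;
        return -EROFS;
}

static const struct log_ops failing_ops = { failing_backend_create };
```

Calling log_create with a context whose ops point at failing_ops returns -EROFS, yet the caller's handle pointer has been changed to a non-NULL value.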

In llog_client_create, 'handle' is assigned the return value of llog_alloc_handle,
which (if not NULL) is immediately stored in the output parameter '*res' (a pointer
to llh in mdc_changelog_send_thread). If something goes wrong afterwards, the code
jumps to err_free, where 'handle' is freed and poisoned, but *res is left untouched.
As a result, llog_client_create returns a non-zero value together with a pointer
(in *res) to a freed and poisoned area.
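A minimal, self-contained C sketch of the flawed ordering described above. The names (struct log_handle, log_handle_alloc, log_handle_free, client_create_buggy) are hypothetical stand-ins, not Lustre's code. The one-slot allocator that poisons freed memory with 0x5a is an assumption made so the stale pointer can be inspected without undefined behavior. The remote_rc parameter simulates the server reply; -EROFS is -30 on Linux, matching the dmesg message.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for struct llog_handle. */
struct log_handle {
        void *ctxt;
        int   id;
};

/* One-slot toy allocator. Freeing poisons the slot with 0x5a bytes and marks
 * it unused, but the memory stays valid, so a stale pointer reads poison
 * instead of faulting (a simplification of a debug/poisoning allocator). */
static struct log_handle slot;
static int slot_in_use;
static int dummy_ctxt;

static struct log_handle *log_handle_alloc(void)
{
        if (slot_in_use)
                return NULL;
        memset(&slot, 0, sizeof(slot));
        slot_in_use = 1;
        return &slot;
}

static void log_handle_free(struct log_handle *h)
{
        assert(h == &slot && slot_in_use);
        memset(h, 0x5a, sizeof(*h));    /* poison on free */
        slot_in_use = 0;
}

/* Flawed ordering: *res is published before the remote step can fail. */
static int client_create_buggy(struct log_handle **res, int remote_rc)
{
        struct log_handle *handle = log_handle_alloc();
        int rc;

        if (handle == NULL)
                return -ENOMEM;
        *res = handle;                  /* published too early */

        rc = remote_rc;                 /* stands in for the RPC result */
        if (rc != 0)
                goto err_free;

        handle->ctxt = &dummy_ctxt;
        return 0;

err_free:
        log_handle_free(handle);        /* *res still points here */
        return rc;
}
```

After a failed call, the caller's pointer is non-NULL and the memory it points to is filled with 0x5a bytes, which is exactly what the caller's cleanup path then trips over.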

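In mdc_changelog_send_thread, the error path (label 'out') relies on the
assumption that if llh is not NULL, it is valid and should be released by
llog_cat_put. That does not hold here, because the handle has already been
freed and poisoned by llog_client_create during its own error handling.
(The RDI value 5a5a5a5a5a5a59fa in the backtrace is consistent with a
dereference of such poisoned memory.)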

A simple fix would be to move the '*res = handle' assignment in
llog_client_create to just before the EXIT macro, so that *res stays
untouched unless the function completes successfully.
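A sketch of the proposed ordering, together with the caller pattern it protects. Here client_create_fixed, log_put and send_thread_body are hypothetical names standing in for llog_client_create, llog_cat_put and mdc_changelog_send_thread; this illustrates the idea under those assumptions and is not the patch that was eventually merged.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct llog_handle. */
struct log_handle {
        int id;
};

static int puts_done;                   /* counts release calls for the demo */

/* Proposed ordering: *res is written only once nothing can fail any more.
 * remote_rc simulates the server reply. */
static int client_create_fixed(struct log_handle **res, int remote_rc)
{
        struct log_handle *handle = calloc(1, sizeof(*handle));
        int rc;

        if (handle == NULL)
                return -ENOMEM;

        rc = remote_rc;
        if (rc != 0)
                goto err_free;

        handle->id = 1;
        *res = handle;                  /* publish on success only */
        return 0;

err_free:
        free(handle);                   /* *res was never touched */
        return rc;
}

/* Hypothetical release routine standing in for llog_cat_put(). */
static void log_put(struct log_handle *h)
{
        puts_done++;
        free(h);
}

/* Caller pattern from the report: start with NULL and let the common exit
 * label release the handle only if one was actually returned. */
static int send_thread_body(int remote_rc)
{
        struct log_handle *llh = NULL;
        int rc;

        rc = client_create_fixed(&llh, remote_rc);
        if (rc != 0)
                goto out;

        /* ... normal processing would use llh here ... */
out:
        if (llh != NULL)
                log_put(llh);
        return rc;
}
```

With *res published only on success, a failed create leaves llh NULL and the shared 'out' label skips the release, while a successful create is released exactly once.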



 Comments   
Comment by Peter Jones [ 10/Jan/12 ]

Hongchao

Can you please look into this one?

Thanks

Peter

Comment by Hongchao Zhang [ 12/Jan/12 ]

The patch is tracked at http://review.whamcloud.com/#change,1958

Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,ofa #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » i686,server,el6,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,client,el5,ofa #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » i686,server,el5,ofa #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » i686,client,el6,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » i686,server,el5,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » i686,client,el5,inkernel #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 03/Feb/12 ]

Integrated in lustre-master » i686,client,el5,ofa #452
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = SUCCESS
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,server,el6,ofa #480
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = FAILURE
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » x86_64,client,el6,ofa #480
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = FAILURE
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Build Master (Inactive) [ 17/Feb/12 ]

Integrated in lustre-master » i686,client,el6,ofa #480
LU-980 llog: cleanup return value in llog_client_create (Revision e9396abf4199c5877026677c344d25555371fad2)

Result = ABORTED
Oleg Drokin : e9396abf4199c5877026677c344d25555371fad2
Files :

  • lustre/ptlrpc/llog_client.c
Comment by Peter Jones [ 29/Mar/12 ]

Landed for 2.2

Comment by Bob Glossman (Inactive) [ 16/May/12 ]

http://review.whamcloud.com/#change,2806
Backport to b2_1

Comment by Gerrit Updater [ 03/Apr/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50511
Subject: LU-980 mount: improve mount/unmount messages
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 40638bcfcc4e77dbbaea96f08f5cd78205ff1a34

Comment by Gerrit Updater [ 18/Apr/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50511/
Subject: LU-980 mount: improve mount/unmount messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cba8c65b384f92d269944042047f7a58d2808530

Generated at Sat Feb 10 01:12:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.