Details
Technical task
Resolution: Duplicate
Major
None
Lustre 2.1.1
None
9747
Description
Hit the following LBUG:
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) lock@ffff8802615024d8[4 2 0 0 0 00000005] P(0):[0, 18446744073709551615]@[0x16b9ac4cbe:0xc:0x0] {
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) vvp@ffff88025946e9e8:
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) lov@ffff880431e80cf8: 2
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) 0 0: ---
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) 1 1: lock@ffff8803c465faf8[1 3 0 1 1 00000000] R(1):[0, 18446744073709551615]@[0x100a80000:0x1b3b212:0x0] {
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) lovsub@ffff8801f6d645a0: [1 ffff880431e80cf8 P(0):[0, 18446744073709551615]@[0x16b9ac4cbe:0xc:0x0]]
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) osc@ffff8803c4694b50: ffff88010cd766c0 00101001 0xe12a56a3ad7ca7fa 3 ffff880428397e48 size: 0 mtime: 1335700029 atime: 1335700029 ctime: 1335700029 blocks: 0
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) } lock@ffff8803c465faf8
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one())
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) } lock@ffff8802615024d8
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:381:lovsub_lock_delete_one()) Delete CLS_HELD lock
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:383:lovsub_lock_delete_one()) Impossible state: 2
2012-04-29 07:43:52 LustreError: 83833:0:(lovsub_lock.c:384:lovsub_lock_delete_one()) LBUG
Yes, the original code can trigger a fake OOM because ll_fault() returned the wrong error code, so I guess the OOM you have seen should go away once LU-1299 is applied.
From what I can see in the log, it looks very much like a glimpse of the file size was interrupted by a signal. Since you mention it was hit during normal file system usage such as ls, there must be another path that leads to the same back trace, because ls does not issue signals as far as I know.
Anyway, please apply this patch and try to reproduce the problem again; that way we can get more information and move forward.
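To illustrate the fake-OOM point above, here is a minimal user-space sketch of how a fault handler's error mapping can make an unrelated failure look like memory exhaustion. It is not the LU-1299 patch and not the real ll_fault() code: the enum values, the function names, and the choice of how to report -EINTR are all assumptions made for illustration only.

```c
#include <errno.h>

/* Hypothetical stand-ins for fault result codes; NOT the kernel's values. */
enum sketch_fault {
    SKETCH_FAULT_OOM    = 1, /* caller treats this as memory exhaustion */
    SKETCH_FAULT_SIGBUS = 2, /* caller delivers SIGBUS to the task */
    SKETCH_FAULT_NOPAGE = 3  /* nothing mapped; the access is simply retried */
};

/*
 * Buggy shape (assumed): every failure is reported as out-of-memory,
 * so an interrupted operation (-EINTR) is indistinguishable from -ENOMEM.
 */
static enum sketch_fault sketch_fault_code_buggy(int rc)
{
    (void)rc;
    return SKETCH_FAULT_OOM;
}

/*
 * Corrected shape (assumed): only a real allocation failure maps to OOM.
 * Mapping -EINTR to NOPAGE is an illustrative choice, not a statement
 * about what the actual patch does.
 */
static enum sketch_fault sketch_fault_code_fixed(int rc)
{
    switch (rc) {
    case -ENOMEM:
        return SKETCH_FAULT_OOM;
    case -EINTR:
        return SKETCH_FAULT_NOPAGE;
    default:
        return SKETCH_FAULT_SIGBUS;
    }
}
```

With the buggy mapping, a glimpse interrupted by a signal (rc == -EINTR) would be reported as SKETCH_FAULT_OOM, which is the kind of fake OOM described above; with the corrected mapping, only -ENOMEM is.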