Lustre / LU-3669

(mdt_xattr.c:178:mdt_getxattr()) ASSERTION( lu_object_assert_exists(&info->mti_object->mot_obj) ) failed:

Details


    Description

      Using today's master 2.4.52-72-g643e0ae with the extra racer operations from http://review.whamcloud.com/#/c/5936/ I see LDLM hangs and LBUGs in
      the getxattr handler.

      # MOUNT_2=y llmount.sh
      # sh ./lustre/tests/racer.sh
      
      LNet: Service thread pid 16149 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 16149, comm: mdt01_004
      
      # cat /proc/16149/stack
      [<ffffffffa04ed641>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [<ffffffffa07f912d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
      [<ffffffffa07f8870>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
      [<ffffffffa0cd3c0b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
      [<ffffffffa0cd4484>] mdt_object_lock+0x14/0x20 [mdt]
      [<ffffffffa0cd4520>] mdt_intent_getxattr+0x90/0x160 [mdt]
      [<ffffffffa0cd111e>] mdt_intent_policy+0x3ae/0x770 [mdt]
      [<ffffffffa07d9471>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
      [<ffffffffa080219f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
      [<ffffffffa0cd15e6>] mdt_enqueue+0x46/0xe0 [mdt]
      [<ffffffffa0cd7db7>] mdt_handle_common+0x647/0x16d0 [mdt]
      [<ffffffffa0d12d25>] mds_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa0831fd8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
      [<ffffffffa083335d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
      [<ffffffff81096936>] kthread+0x96/0xa0
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      LustreError: 11483:0:(mdt_xattr.c:178:mdt_getxattr()) ASSERTION( lu_object_assert_exists(&info->mti_object->mot_obj) ) failed: 
      LustreError: 11483:0:(mdt_xattr.c:178:mdt_getxattr()) LBUG
      Pid: 11483, comm: mdt00_005
      
      crash> bt
      PID: 11483  TASK: ffff8801f2651540  CPU: 0   COMMAND: "mdt00_005"
       #0 [ffff8801feb478f8] machine_kexec at ffffffff81035c9b
       #1 [ffff8801feb47958] crash_kexec at ffffffff810c0d22
       #2 [ffff8801feb47a28] panic at ffffffff8150eeff
       #3 [ffff8801feb47aa8] lbug_with_loc at ffffffffa0c08eeb [libcfs]
       #4 [ffff8801feb47ac8] mdt_getxattr at ffffffffa06ad35f [mdt]
       #5 [ffff8801feb47b68] mdt_intent_getxattr at ffffffffa068a52e [mdt]
       #6 [ffff8801feb47bb8] mdt_intent_policy at ffffffffa068711e [mdt]
       #7 [ffff8801feb47bf8] ldlm_lock_enqueue at ffffffffa0e6e471 [ptlrpc]
       #8 [ffff8801feb47c58] ldlm_handle_enqueue0 at ffffffffa0e9719f [ptlrpc]
       #9 [ffff8801feb47cc8] mdt_enqueue at ffffffffa06875e6 [mdt]
      #10 [ffff8801feb47ce8] mdt_handle_common at ffffffffa068ddf7 [mdt]
      #11 [ffff8801feb47d38] mds_regular_handle at ffffffffa06c8ca5 [mdt]
      #12 [ffff8801feb47d48] ptlrpc_server_handle_request at ffffffffa0ec6fd8 [ptlrpc]
      #13 [ffff8801feb47e48] ptlrpc_main at ffffffffa0ec835d [ptlrpc]
      #14 [ffff8801feb47ee8] kthread at ffffffff81096936
      #15 [ffff8801feb47f48] kernel_thread at ffffffff8100c0ca
      

      We can't keep adding IBITs without defining some ordering or hierarchy for when we use them.

      00010000:00010000:6.0:1375198422.885704:0:5685:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x200000401:0x8:0x0].0 (ffff8801eceee480) refcount = 8
      00010000:00010000:6.0:1375198422.885707:0:5685:0:(ldlm_resource.c:1451:ldlm_resource_dump()) Granted locks (in reverse order):
      00010000:00010000:6.0:1375198422.885708:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801e9cc9d00/0x132cecfff5acfc2f lrc: 3/0,0 mode: PW/PW res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8 type: IBT flags: 0x60200000000020 nid: 0@lo remote: 0x132cecfff5acfc05 expref: 30 pid: 3274 timeout: 4294866401 lvb_type: 0
      00010000:00010000:6.0:1375198422.885715:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eeae4d40/0x132cecfff5acf6e1 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x9 rrc: 8 type: IBT flags: 0x40200000000000 nid: 0@lo remote: 0x132cecfff5acf6c5 expref: 30 pid: 5252 timeout: 0 lvb_type: 0
      00010000:00010000:6.0:1375198422.885721:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801ec5246c0/0x132cecfff5acebf8 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x9 rrc: 8 type: IBT flags: 0x40200000000000 nid: 0@lo remote: 0x132cecfff5acebdc expref: 39 pid: 3274 timeout: 0 lvb_type: 0
      00010000:00010000:6.0:1375198422.885727:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eb589280/0x132cecfff5ace354 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x8 rrc: 8 type: IBT flags: 0x40000000000000 nid: 0@lo remote: 0x132cecfff5ace34d expref: 30 pid: 3810 timeout: 0 lvb_type: 3
      00010000:00010000:6.0:1375198422.885734:0:5685:0:(ldlm_resource.c:1469:ldlm_resource_dump()) Waiting locks:
      00010000:00010000:6.0:1375198422.885736:0:5685:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eea356c0/0x132cecfff5ad0e42 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 5306 timeout: 0 lvb_type: 0
      00010000:00010000:6.0:1375198422.885742:0:5685:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801e95e8d00/0x132cecfff5ad0f1b lrc: 3/0,1 mode: --/EX res: [0x200000401:0x8:0x0].0 bits 0x2 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 3274 timeout: 0 lvb_type: 0
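For reference, the "bits" masks in the dump above can be decoded against the MDS inode lock bits: the granted PW lock holds 0x20 (XATTR), the CR locks hold 0x9 (LOOKUP|LAYOUT) and 0x8 (LAYOUT), and the waiters want 0x20 (XATTR) and 0x2 (UPDATE). A minimal userspace sketch of the decoding, assuming the bit assignments below match lustre_idl.h of this vintage (verify against your tree):

```c
#include <string.h>

/* Assumed MDS inode lock bit values (check lustre_idl.h in your tree). */
#define MDS_INODELOCK_LOOKUP 0x01
#define MDS_INODELOCK_UPDATE 0x02
#define MDS_INODELOCK_OPEN   0x04
#define MDS_INODELOCK_LAYOUT 0x08
#define MDS_INODELOCK_PERM   0x10
#define MDS_INODELOCK_XATTR  0x20

/* Render an ldlm_resource_dump() "bits" mask as symbolic names. */
static void ibits_to_str(unsigned int mask, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } tbl[] = {
		{ MDS_INODELOCK_LOOKUP, "LOOKUP" },
		{ MDS_INODELOCK_UPDATE, "UPDATE" },
		{ MDS_INODELOCK_OPEN,   "OPEN"   },
		{ MDS_INODELOCK_LAYOUT, "LAYOUT" },
		{ MDS_INODELOCK_PERM,   "PERM"   },
		{ MDS_INODELOCK_XATTR,  "XATTR"  },
	};
	size_t i;

	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (!(mask & tbl[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
}
```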
      

          Activity


            Both patches landed to master.

            jhammond John Hammond added a comment

            With change 7208 landing to master, can this ticket be closed or reduced from blocker?

            jlevi Jodi Levi (Inactive) added a comment
            adilger Andreas Dilger added a comment - http://review.whamcloud.com/7208

            Panda, I think your proposal to separate ACL xattrs from xattr cache is a reasonable one.

            The other thing that would reduce lock contention would be to only grant the PR lock if the lock is under contention, as is done in mdt_getattr_name_lock() for the UPDATE lock. This allows clients to cache readdir data and negative dentries for static directories, but does not cause lock ping-pong when the directory is being modified:

                                    /* If the file has not been changed for some time, we
                                     * return not only a LOOKUP lock, but also an UPDATE
                                     * lock and this might save us RPC on later STAT. For
                                     * directories, it also let negative dentry starts
                                     * working for this dir. */
                                    if (ma->ma_valid & MA_INODE &&
                                        ma->ma_attr.la_valid & LA_CTIME &&
                                        info->mti_mdt->mdt_namespace->ns_ctime_age_limit +
                                            ma->ma_attr.la_ctime < cfs_time_current_sec())
                                            child_bits |= MDS_INODELOCK_UPDATE;
            

            Doing something similar for the xattr cache would allow e.g. Samba servers to have a writethrough cache, normal clients to have a read cache for most files, and no cache for files that are being modified frequently (which is no worse than we have today).

            NB - if you change code near here, please s/dentry starts/dentry cachestart/ in the above comment.
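A conditional grant for the xattr case might follow the same shape as the quoted heuristic. The sketch below is purely illustrative: `xattr_bit_grantable()` is an invented name, and only the "ctime plus `ns_ctime_age_limit` already in the past" test is taken from the quoted code:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical predicate mirroring the quoted UPDATE-lock heuristic for
 * an XATTR bit: grant the cache-enabling bit only when the inode has
 * not been changed for at least `age_limit` seconds. */
static bool xattr_bit_grantable(time_t la_ctime, time_t age_limit, time_t now)
{
	return age_limit + la_ctime < now;
}
```

With such a gate, frequently modified files would simply never hand out the cache-enabling bit, so there is no lock ping-pong to revoke.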

            adilger Andreas Dilger added a comment

            We've discussed the issue with our team, and we think the following might be a solution:

            1. UPDATE locking should probably be removed from the MDT setxattr path, because it does not seem to be protecting any piece of data [needs a bit more investigation]
            2. POSIX ACL ACCESS data should be filtered out from the xattr cache.
            3. getxattr and setxattr on POSIX ACLs should use the old (non-xattrcache) code path.
            4. listxattr should use the xattr cache and also additionally list XATTR_NAME_ACL_ACCESS ("system.posix_acl_access") if it finds that lli_posix_acl is initialized.

            This splits xattr cache and ACL cache and so splits XATTR and LOOKUP locking.
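Steps 2 and 4 above could be modeled in userspace roughly as follows. This is only a sketch of the idea, not the client code: `build_xattr_list()` and the `have_posix_acl` flag are invented names (the flag stands in for a non-NULL `lli_posix_acl`), and the buffer layout follows the NUL-separated format of listxattr(2):

```c
#include <string.h>

#define XATTR_NAME_ACL_ACCESS "system.posix_acl_access"

/* Hypothetical model of the split: the xattr cache holds every name
 * except the ACL one; listxattr replays the cache and appends
 * XATTR_NAME_ACL_ACCESS when an ACL is cached separately. */
static size_t build_xattr_list(const char *const *cached, size_t ncached,
			       int have_posix_acl, char *buf, size_t buflen)
{
	size_t off = 0, i, n;

	for (i = 0; i < ncached; i++) {
		n = strlen(cached[i]) + 1;
		if (off + n > buflen)
			return (size_t)-1;
		memcpy(buf + off, cached[i], n); /* NUL-separated names */
		off += n;
	}
	if (have_posix_acl) {
		n = strlen(XATTR_NAME_ACL_ACCESS) + 1;
		if (off + n > buflen)
			return (size_t)-1;
		memcpy(buf + off, XATTR_NAME_ACL_ACCESS, n);
		off += n;
	}
	return off;
}
```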

            panda Andrew Perepechko added a comment

            The hang reproduces within seconds of starting the two loops.

            OK, I see.

            I pulled your patch for the crash and it seems OK.

            Great! Thanks a lot for the feedback.

            Eventually the setattr gets a PW XATTR lock, the unlink enqueues a blocked EX FULL lock, and the setattr enqueues a blocked EX UPDATE lock. Sorry I didn't mention it earlier, I only thought of it this morning.

            Hm, right. It looks like nested LDLM locking is often a headache.

            panda Andrew Perepechko added a comment
            jhammond John Hammond added a comment -

            Hi Andrew, I was just updating the ticket when I saw your comment.

            > It reproduces within seconds of starting the two loops.

            Sorry, I misread your question. The hang reproduces within seconds of starting the two loops. For the crash, use http://review.whamcloud.com/#/c/5936/ and run racer. Four times out of five it will hang; the other time it will crash. I pulled your patch for the crash and it seems OK.

            > John, is there a chance that 7148 will be landed or should locking be redesigned anyway?

            Not sure. I'm working to get reviewers on it.

            I'm still concerned that 7148 + 7191 are not enough. Consider the following case:

            # MOUNT_2=y llmount.sh
            # cd /mnt/lustre
            # while true; do touch f0; rm f0; done &
            # cd /mnt/lustre2
            # while true; do setfattr -n user.ATTR -v LULZ f0; done &
            

            Eventually the setattr gets a PW XATTR lock, the unlink enqueues a blocked EX FULL lock, and the setattr enqueues a blocked EX UPDATE lock. Sorry I didn't mention it earlier, I only thought of it this morning.
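The blocking chain follows from LDLM lock-mode compatibility: EX is compatible only with NL, so both EX enqueues must wait behind the granted PW lock. A sketch after the LCK_COMPAT_* relationships in lustre_dlm.h; the mode bit values here are assumptions, so verify against your tree:

```c
/* Assumed LDLM mode bit values and compatibility sets, modeled on the
 * LCK_COMPAT_* definitions in lustre_dlm.h. */
enum ldlm_mode {
	LCK_EX = 0x01, LCK_PW = 0x02, LCK_PR = 0x04,
	LCK_CW = 0x08, LCK_CR = 0x10, LCK_NL = 0x20,
};

#define LCK_COMPAT_EX  LCK_NL
#define LCK_COMPAT_PW  (LCK_COMPAT_EX | LCK_CR)
#define LCK_COMPAT_PR  (LCK_COMPAT_PW | LCK_PR)
#define LCK_COMPAT_CW  (LCK_COMPAT_PW | LCK_CW)
#define LCK_COMPAT_CR  (LCK_COMPAT_CW | LCK_PR | LCK_PW)
#define LCK_COMPAT_NL  (LCK_COMPAT_CR | LCK_EX)

/* Non-zero if a lock of mode `req` can be granted alongside `granted`. */
static int lockmode_compat(enum ldlm_mode granted, enum ldlm_mode req)
{
	static const int compat[] = {
		[LCK_EX] = LCK_COMPAT_EX, [LCK_PW] = LCK_COMPAT_PW,
		[LCK_PR] = LCK_COMPAT_PR, [LCK_CW] = LCK_COMPAT_CW,
		[LCK_CR] = LCK_COMPAT_CR, [LCK_NL] = LCK_COMPAT_NL,
	};
	return compat[req] & granted;
}
```

For IBITS locks a conflict additionally requires overlapping inode bits, which is why the EX FULL request (all bits) conflicts with the granted PW XATTR lock.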


            Using http://review.whamcloud.com/#/c/7148/ and http://review.whamcloud.com/#/c/7191/ seems to make racer.sh pass (no traces of eviction or crashes for a few runs).

            John, is there a chance that 7148 will be landed or should locking be redesigned anyway?

            panda Andrew Perepechko added a comment
            panda Andrew Perepechko added a comment - http://review.whamcloud.com/#/c/7191/1 for the crash part

            Never mind, I was able to reproduce it on my hardware.

            panda Andrew Perepechko added a comment

            People

              jhammond John Hammond
              Votes: 0
              Watchers: 7