Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.5.0
- 3
- 9457
Description
Using today's master 2.4.52-72-g643e0ae with the extra racer operations from http://review.whamcloud.com/#/c/5936/ I see LDLM hangs and LBUGs in the getxattr handler.
# MOUNT_2=y llmount.sh
# sh ./lustre/tests/racer.sh

LNet: Service thread pid 16149 was inactive for 40.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 16149, comm: mdt01_004

# cat /proc/16149/stack
[<ffffffffa04ed641>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[<ffffffffa07f912d>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
[<ffffffffa07f8870>] ldlm_cli_enqueue_local+0x1f0/0x5c0 [ptlrpc]
[<ffffffffa0cd3c0b>] mdt_object_lock0+0x33b/0xaf0 [mdt]
[<ffffffffa0cd4484>] mdt_object_lock+0x14/0x20 [mdt]
[<ffffffffa0cd4520>] mdt_intent_getxattr+0x90/0x160 [mdt]
[<ffffffffa0cd111e>] mdt_intent_policy+0x3ae/0x770 [mdt]
[<ffffffffa07d9471>] ldlm_lock_enqueue+0x361/0x8c0 [ptlrpc]
[<ffffffffa080219f>] ldlm_handle_enqueue0+0x4ef/0x10b0 [ptlrpc]
[<ffffffffa0cd15e6>] mdt_enqueue+0x46/0xe0 [mdt]
[<ffffffffa0cd7db7>] mdt_handle_common+0x647/0x16d0 [mdt]
[<ffffffffa0d12d25>] mds_regular_handle+0x15/0x20 [mdt]
[<ffffffffa0831fd8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa083335d>] ptlrpc_main+0xabd/0x1700 [ptlrpc]
[<ffffffff81096936>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
LustreError: 11483:0:(mdt_xattr.c:178:mdt_getxattr()) ASSERTION( lu_object_assert_exists(&info->mti_object->mot_obj) ) failed:
LustreError: 11483:0:(mdt_xattr.c:178:mdt_getxattr()) LBUG
Pid: 11483, comm: mdt00_005

crash> bt
PID: 11483 TASK: ffff8801f2651540 CPU: 0 COMMAND: "mdt00_005"
#0 [ffff8801feb478f8] machine_kexec at ffffffff81035c9b
#1 [ffff8801feb47958] crash_kexec at ffffffff810c0d22
#2 [ffff8801feb47a28] panic at ffffffff8150eeff
#3 [ffff8801feb47aa8] lbug_with_loc at ffffffffa0c08eeb [libcfs]
#4 [ffff8801feb47ac8] mdt_getxattr at ffffffffa06ad35f [mdt]
#5 [ffff8801feb47b68] mdt_intent_getxattr at ffffffffa068a52e [mdt]
#6 [ffff8801feb47bb8] mdt_intent_policy at ffffffffa068711e [mdt]
#7 [ffff8801feb47bf8] ldlm_lock_enqueue at ffffffffa0e6e471 [ptlrpc]
#8 [ffff8801feb47c58] ldlm_handle_enqueue0 at ffffffffa0e9719f [ptlrpc]
#9 [ffff8801feb47cc8] mdt_enqueue at ffffffffa06875e6 [mdt]
#10 [ffff8801feb47ce8] mdt_handle_common at ffffffffa068ddf7 [mdt]
#11 [ffff8801feb47d38] mds_regular_handle at ffffffffa06c8ca5 [mdt]
#12 [ffff8801feb47d48] ptlrpc_server_handle_request at ffffffffa0ec6fd8 [ptlrpc]
#13 [ffff8801feb47e48] ptlrpc_main at ffffffffa0ec835d [ptlrpc]
#14 [ffff8801feb47ee8] kthread at ffffffff81096936
#15 [ffff8801feb47f48] kernel_thread at ffffffff8100c0ca
We can't keep adding IBITs without defining some ordering or hierarchy for when we use them.
00010000:00010000:6.0:1375198422.885704:0:5685:0:(ldlm_resource.c:1448:ldlm_resource_dump()) --- Resource: [0x200000401:0x8:0x0].0 (ffff8801eceee480) refcount = 8
00010000:00010000:6.0:1375198422.885707:0:5685:0:(ldlm_resource.c:1451:ldlm_resource_dump()) Granted locks (in reverse order):
00010000:00010000:6.0:1375198422.885708:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801e9cc9d00/0x132cecfff5acfc2f lrc: 3/0,0 mode: PW/PW res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8 type: IBT flags: 0x60200000000020 nid: 0@lo remote: 0x132cecfff5acfc05 expref: 30 pid: 3274 timeout: 4294866401 lvb_type: 0
00010000:00010000:6.0:1375198422.885715:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eeae4d40/0x132cecfff5acf6e1 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x9 rrc: 8 type: IBT flags: 0x40200000000000 nid: 0@lo remote: 0x132cecfff5acf6c5 expref: 30 pid: 5252 timeout: 0 lvb_type: 0
00010000:00010000:6.0:1375198422.885721:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801ec5246c0/0x132cecfff5acebf8 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x9 rrc: 8 type: IBT flags: 0x40200000000000 nid: 0@lo remote: 0x132cecfff5acebdc expref: 39 pid: 3274 timeout: 0 lvb_type: 0
00010000:00010000:6.0:1375198422.885727:0:5685:0:(ldlm_resource.c:1454:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eb589280/0x132cecfff5ace354 lrc: 2/0,0 mode: CR/CR res: [0x200000401:0x8:0x0].0 bits 0x8 rrc: 8 type: IBT flags: 0x40000000000000 nid: 0@lo remote: 0x132cecfff5ace34d expref: 30 pid: 3810 timeout: 0 lvb_type: 3
00010000:00010000:6.0:1375198422.885734:0:5685:0:(ldlm_resource.c:1469:ldlm_resource_dump()) Waiting locks:
00010000:00010000:6.0:1375198422.885736:0:5685:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801eea356c0/0x132cecfff5ad0e42 lrc: 3/1,0 mode: --/PR res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 5306 timeout: 0 lvb_type: 0
00010000:00010000:6.0:1375198422.885742:0:5685:0:(ldlm_resource.c:1471:ldlm_resource_dump()) ### ### ns: mdt-lustre-MDT0000_UUID lock: ffff8801e95e8d00/0x132cecfff5ad0f1b lrc: 3/0,1 mode: --/EX res: [0x200000401:0x8:0x0].0 bits 0x2 rrc: 8 type: IBT flags: 0x40210000000000 nid: local remote: 0x0 expref: -99 pid: 3274 timeout: 0 lvb_type: 0
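To make the resource dump above easier to read, here is a minimal Python sketch that pulls the granted/requested mode and the inodebits out of each lock line, names the bits, and applies a simplified conflict rule: two inodebits locks conflict only if their bit masks overlap and their modes are incompatible. The bit names and the mode compatibility table are written from my recollection of Lustre's lustre_idl.h and lustre_dlm.h for this era and should be checked against your tree. The helper names (parse_locks, decode_bits, conflicts) are hypothetical and not part of Lustre, and the real server policy also considers queue order, which this sketch ignores.

```python
import re

# Inodebits names (assumed from lustre_idl.h around 2.5; verify locally).
IBITS = {
    0x01: "LOOKUP",
    0x02: "UPDATE",
    0x04: "OPEN",
    0x08: "LAYOUT",
    0x10: "PERM",
    0x20: "XATTR",
}

# For each requested mode, the set of granted modes it is compatible with
# (assumed from the LCK_COMPAT_* definitions in lustre_dlm.h).
COMPAT = {
    "EX": {"NL"},
    "PW": {"NL", "CR"},
    "PR": {"NL", "CR", "PR"},
    "CW": {"NL", "CR", "CW"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
}

# Matches the "mode: G/R res: [...] bits 0xN" part of a dumped lock line.
LOCK_RE = re.compile(r"mode: (\S+)/(\S+) res: \S+ bits (0x[0-9a-fA-F]+)")


def parse_locks(text):
    """Return a list of (granted_mode, requested_mode, bits) tuples."""
    return [(g, r, int(b, 16)) for g, r, b in LOCK_RE.findall(text)]


def decode_bits(bits):
    """Translate an inodebits mask into a list of bit names."""
    return [name for bit, name in sorted(IBITS.items()) if bits & bit]


def conflicts(req_mode, req_bits, granted_mode, granted_bits):
    """Simplified rule: overlapping bits and incompatible modes."""
    return bool(req_bits & granted_bits) and granted_mode not in COMPAT[req_mode]


if __name__ == "__main__":
    sample = ("mode: PW/PW res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8\n"
              "mode: --/PR res: [0x200000401:0x8:0x0].0 bits 0x20 rrc: 8\n")
    for granted, requested, bits in parse_locks(sample):
        print(granted, requested, hex(bits), decode_bits(bits))
```

Under these assumptions, the waiting PR request on bits 0x20 conflicts with the granted PW lock on the same bit, which matches the waiting getxattr service thread in the stack trace.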
Issue Links
- is duplicated by
  - LU-3795 replay-single test_58b test_58c: trusted.foo: No such attribute (Resolved)
  - LU-3730 sanity-hsm test_3 Error: 'could not create file' (Resolved)
  - LU-3746 Test failure on test suite lustre-rsync-test, subtest test_1 (Resolved)
  - LU-3911 Failure on test suite sanityn test_14a (Resolved)
- is related to
  - LU-3713 Failure on test suite sanity test_17k: get_xattr_names: llistxattr failed (Resolved)
  - LU-3938 Test failure on test suite conf-sanity, test_61: getxattr failed: -34 (Resolved)
- is related to
  - LU-2869 extended attribute cache for Lustre (Resolved)