Details
- Bug
- Resolution: Unresolved
- Medium
- None
- Lustre 2.14.0
- None
- 3
Description
I have a thread on my server that is spinning in a tight loop trying to get an xattr from an inode, possibly trusted.link, in order to get the parent FID:
1697:0:(mdd_dir.c:228:mdd_parent_fid()) Process entered
1697:0:(lu_object.c:2491:lu_buf_alloc()) kmalloced '(buf->lb_buf)': 4096 at 0000000027350465.
1697:0:(lod_object.c:1575:lod_xattr_get()) Process entered
1697:0:(lod_object.c:1680:lod_xattr_get()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
1697:0:(lu_object.c:2479:lu_buf_free()) kfreed 'buf->lb_buf': 4096 at 0000000027350465.
1697:0:(mdd_dir.c:242:mdd_parent_fid()) Process leaving via lookup (rc=18446744073709551555 : -61 : 0xffffffffffffffc3)
1697:0:(mdd_dir.c:88:__mdd_lookup()) Process entered
1697:0:(mdd_permission.c:259:__mdd_permission_internal()) Process entered
1697:0:(mdd_permission.c:262:__mdd_permission_internal()) Process leaving (rc=0 : 0 : 0)
1697:0:(osd_handler.c:8034:osd_index_ea_lookup()) Process entered
1697:0:(osd_handler.c:8050:osd_index_ea_lookup()) Process leaving via out (rc=1 : 1 : 0x1)
1697:0:(osd_handler.c:8061:osd_index_ea_lookup()) Process leaving (rc=1 : 1 : 1)
1697:0:(mdd_dir.c:112:__mdd_lookup()) Process leaving (rc=0 : 0 : 0)
1697:0:(mdd_dir.c:259:mdd_parent_fid()) Process leaving (rc=0 : 0 : 0)
1697:0:(lu_object.c:224:lu_object_put()) Add 000000003a2f90b6/00000000c3552857 to site lru. bkt: 0000000091bd31a9
1697:0:(lu_object.c:816:lu_object_find_at()) Process entered
1697:0:(lu_object.c:855:lu_object_find_at()) Process leaving (rc=18446617571478768680 : -126502230782936 : ffff8cf267788c28)
1697:0:(mdd_dir.c:228:mdd_parent_fid()) Process entered
1697:0:(lu_object.c:2491:lu_buf_alloc()) kmalloced '(buf->lb_buf)': 4096 at 0000000027350465.
1697:0:(lod_object.c:1575:lod_xattr_get()) Process entered
1697:0:(lod_object.c:1680:lod_xattr_get()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
1697:0:(lu_object.c:2479:lu_buf_free()) kfreed 'buf->lb_buf': 4096 at 0000000027350465.
1697:0:(mdd_dir.c:242:mdd_parent_fid()) Process leaving via lookup (rc=18446744073709551555 : -61 : 0xffffffffffffffc3)
1697:0:(mdd_dir.c:88:__mdd_lookup()) Process entered
1697:0:(mdd_permission.c:259:__mdd_permission_internal()) Process entered
1697:0:(mdd_permission.c:262:__mdd_permission_internal()) Process leaving (rc=0 : 0 : 0)
1697:0:(osd_handler.c:8034:osd_index_ea_lookup()) Process entered
1697:0:(osd_handler.c:8050:osd_index_ea_lookup()) Process leaving via out (rc=1 : 1 : 0x1)
1697:0:(osd_handler.c:8061:osd_index_ea_lookup()) Process leaving (rc=1 : 1 : 1)
1697:0:(mdd_dir.c:112:__mdd_lookup()) Process leaving (rc=0 : 0 : 0)
1697:0:(mdd_dir.c:259:mdd_parent_fid()) Process leaving (rc=0 : 0 : 0)
1697:0:(lu_object.c:224:lu_object_put()) Add 000000003a2f90b6/00000000c3552857 to site lru. bkt: 0000000091bd31a9
1697:0:(lu_object.c:816:lu_object_find_at()) Process entered
1697:0:(lu_object.c:855:lu_object_find_at()) Process leaving (rc=18446617571478768680 : -126502230782936 : ffff8cf267788c28)
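For anyone reading the log without the source at hand: each "Process leaving" line prints the return code three ways (unsigned decimal, signed decimal, hex). The minimal C sketch below shows that 18446744073709551555 reinterpreted as a signed 64-bit value is -61, which on Linux is -ENODATA, i.e. the requested xattr is not present on the inode. The helper name `rc_to_signed` is hypothetical and is not a Lustre function.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Lustre): reinterpret the unsigned
 * 64-bit rc value printed in the debug log as a signed 64-bit value.
 * memcpy is used so the conversion does not rely on
 * implementation-defined unsigned-to-signed casts.
 */
static int64_t rc_to_signed(uint64_t raw)
{
    int64_t rc;

    memcpy(&rc, &raw, sizeof(rc));
    return rc;
}
```

Calling `rc_to_signed(18446744073709551555ULL)` yields -61, matching the middle field of the log line. On Linux, errno 61 is ENODATA, which is what an xattr read returns when the attribute does not exist.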
The stack trace is shown below, but it doesn't provide much information about the request being handled. The thread is not handling new requests; it appears to be stuck in a loop processing a single request:
kernel: task:mdt_io00_002 state:R running task stack:0 pid:1697 ppid:2 flags:0x80004080
kernel: Call Trace:
kernel: ? libcfs_log_return+0x1e/0x30 [libcfs]
kernel: ? __kmalloc+0x246/0x250
kernel: ? __mdd_lookup.isra.21+0x286/0x370 [mdd]
kernel: ? mdd_parent_fid+0x1a3/0x410 [mdd]
kernel: ? mdd_is_subdir+0x27d/0x3b0 [mdd]
kernel: ? mdt_reint_rename+0x53a/0x1d20 [mdt]
kernel: ? sptlrpc_svc_alloc_rs+0x62/0x330 [ptlrpc]
kernel: ? lustre_msg_check_version+0x30/0xf0 [ptlrpc]
kernel: ? mdt_root_squash+0x1e/0x410 [mdt]
kernel: ? mdt_reint_rec+0x127/0x260 [mdt]
kernel: ? mdt_reint_internal+0x4ac/0x7a0 [mdt]
kernel: ? mdt_reint+0x5e/0x100 [mdt]
kernel: ? tgt_request_handle+0xc9c/0x1970 [ptlrpc]
kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
kernel: ? ptlrpc_update_export_timer+0x3d/0x520 [ptlrpc]
kernel: ? ptlrpc_server_handle_request+0x346/0xc10 [ptlrpc]
kernel: ? lprocfs_counter_add+0x10e/0x180 [obdclass]
kernel: ? ptlrpc_main+0xb45/0x13a0 [ptlrpc]
It looks like the thread is stuck looping forever in mdd_is_subdir()->mdd_is_parent() from the while(1) loop?
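To illustrate the failure mode being suspected, here is a hypothetical toy model of an ancestor walk, written outside Lustre and not taken from the mdd code. It assumes a directory tree stored as an array of parent indices, and shows how a `while` loop that follows parent links never terminates if a link points back into the chain, unless the walk is bounded. The names `toy_is_ancestor` and `TOY_NO_PARENT` are invented for this sketch. In the debug log above, lu_object_find_at() returns the same pointer (ffff8cf267788c28) in both iterations, which would be consistent with the walk revisiting the same object, though that alone does not prove it.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_NO_PARENT (-1)   /* hypothetical marker: node is the root */

/*
 * Toy model only (not Lustre code). parent[i] holds the index of node
 * i's parent, or TOY_NO_PARENT for the root. All indices are assumed
 * to be valid, i.e. in [0, n).
 *
 * Returns 1 if `ancestor` is on the parent chain of `node` (including
 * node itself), 0 if the root is reached first, and -ELOOP if more
 * than n nodes are visited, which can only happen if the chain
 * contains a cycle.
 */
static int toy_is_ancestor(const int *parent, size_t n, int node, int ancestor)
{
    size_t steps = 0;
    int cur = node;

    while (cur != TOY_NO_PARENT) {
        if (cur == ancestor)
            return 1;
        /* An acyclic chain visits at most n nodes; bail out beyond that. */
        if (++steps > n)
            return -ELOOP;
        cur = parent[cur];
    }
    return 0;
}
```

Without the step bound, a node whose parent link resolves to itself (or to one of its own descendants) would keep this loop spinning, which is the behaviour the debug log suggests.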
It seems similar to the symptoms in LU-12800, but that was reported fixed in commit v2_12_58-150-ga38c587cbf (and I confirmed that the fix is still present).
Issue Links
- is related to LU-12800 mdd_is_parent() goes into infinite loop (Resolved)