  1. Lustre
  2. LU-19531

MDS thread spinning on lod_xattr_get()

Details

    • Bug
    • Resolution: Unresolved
    • Medium
    • None
    • Lustre 2.14.0
    • None
    • 3
    • 9223372036854775807

    Description

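      I have a thread on my MDS that is spinning in a tight loop trying to get an xattr from an inode, possibly trusted.link in order to find the parent FID. On every pass lod_xattr_get() returns -61 (-ENODATA), mdd_parent_fid() falls back to a lookup, and the lookup succeeds: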

      1697:0:(mdd_dir.c:228:mdd_parent_fid()) Process entered
      1697:0:(lu_object.c:2491:lu_buf_alloc()) kmalloced '(buf->lb_buf)': 4096 at 0000000027350465.
      1697:0:(lod_object.c:1575:lod_xattr_get()) Process entered
      1697:0:(lod_object.c:1680:lod_xattr_get()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
      1697:0:(lu_object.c:2479:lu_buf_free()) kfreed 'buf->lb_buf': 4096 at 0000000027350465.
      1697:0:(mdd_dir.c:242:mdd_parent_fid()) Process leaving via lookup (rc=18446744073709551555 : -61 : 0xffffffffffffffc3)
      1697:0:(mdd_dir.c:88:__mdd_lookup()) Process entered
      1697:0:(mdd_permission.c:259:__mdd_permission_internal()) Process entered
      1697:0:(mdd_permission.c:262:__mdd_permission_internal()) Process leaving (rc=0 : 0 : 0)
      1697:0:(osd_handler.c:8034:osd_index_ea_lookup()) Process entered
      1697:0:(osd_handler.c:8050:osd_index_ea_lookup()) Process leaving via out (rc=1 : 1 : 0x1)
      1697:0:(osd_handler.c:8061:osd_index_ea_lookup()) Process leaving (rc=1 : 1 : 1)
      1697:0:(mdd_dir.c:112:__mdd_lookup()) Process leaving (rc=0 : 0 : 0)
      1697:0:(mdd_dir.c:259:mdd_parent_fid()) Process leaving (rc=0 : 0 : 0)
      1697:0:(lu_object.c:224:lu_object_put()) Add 000000003a2f90b6/00000000c3552857 to site lru. bkt: 0000000091bd31a9
      1697:0:(lu_object.c:816:lu_object_find_at()) Process entered
      1697:0:(lu_object.c:855:lu_object_find_at()) Process leaving (rc=18446617571478768680 : -126502230782936 : ffff8cf267788c28)
      1697:0:(mdd_dir.c:228:mdd_parent_fid()) Process entered
      1697:0:(lu_object.c:2491:lu_buf_alloc()) kmalloced '(buf->lb_buf)': 4096 at 0000000027350465.
      1697:0:(lod_object.c:1575:lod_xattr_get()) Process entered
      1697:0:(lod_object.c:1680:lod_xattr_get()) Process leaving (rc=18446744073709551555 : -61 : ffffffffffffffc3)
      1697:0:(lu_object.c:2479:lu_buf_free()) kfreed 'buf->lb_buf': 4096 at 0000000027350465.
      1697:0:(mdd_dir.c:242:mdd_parent_fid()) Process leaving via lookup (rc=18446744073709551555 : -61 : 0xffffffffffffffc3)
      1697:0:(mdd_dir.c:88:__mdd_lookup()) Process entered
      1697:0:(mdd_permission.c:259:__mdd_permission_internal()) Process entered
      1697:0:(mdd_permission.c:262:__mdd_permission_internal()) Process leaving (rc=0 : 0 : 0)
      1697:0:(osd_handler.c:8034:osd_index_ea_lookup()) Process entered
      1697:0:(osd_handler.c:8050:osd_index_ea_lookup()) Process leaving via out (rc=1 : 1 : 0x1)
      1697:0:(osd_handler.c:8061:osd_index_ea_lookup()) Process leaving (rc=1 : 1 : 1)
      1697:0:(mdd_dir.c:112:__mdd_lookup()) Process leaving (rc=0 : 0 : 0)
      1697:0:(mdd_dir.c:259:mdd_parent_fid()) Process leaving (rc=0 : 0 : 0)
      1697:0:(lu_object.c:224:lu_object_put()) Add 000000003a2f90b6/00000000c3552857 to site lru. bkt: 0000000091bd31a9
      1697:0:(lu_object.c:816:lu_object_find_at()) Process entered
      1697:0:(lu_object.c:855:lu_object_find_at()) Process leaving (rc=18446617571478768680 : -126502230782936 : ffff8cf267788c28)
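
The rc fields in the trace above print the same value three ways: as an unsigned 64-bit integer, as a signed integer, and in hex. A minimal sketch of that conversion follows, for reading such lines by hand or in a script. The helper names and the one-entry errno table are illustrative assumptions, not part of Lustre; the table hard-codes the Linux value 61 = ENODATA rather than using a platform-dependent lookup.

```python
import re

# Linux errno number seen in the trace. Hard-coded on purpose: errno numbering
# differs between platforms, so a lookup on the analysis host could mislead.
LINUX_ERRNO_NAMES = {61: "ENODATA"}

# Matches the first (unsigned decimal) field of "(rc=<unsigned> : <signed> : <hex>)".
RC_PATTERN = re.compile(r"\(rc=(\d+) : ")


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def decode_rc(line):
    """Return (signed_rc, errno_name_or_None), or None if the line has no rc field."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    signed = to_signed64(int(match.group(1)))
    name = LINUX_ERRNO_NAMES.get(-signed) if signed < 0 else None
    return signed, name
```

Applied to the lod_xattr_get() exit line this gives -61 with the name ENODATA. The large value returned by lu_object_find_at() also decodes to a negative number, but its hex form (ffff8cf267788c28) looks like a kernel pointer rather than an error code, so the sketch deliberately leaves it unnamed.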
      

      The stack trace is below, but it doesn't provide much information about the request being handled. The thread is not picking up new requests; it appears to be stuck in a loop processing a single request:

      kernel: task:mdt_io00_002    state:R  running task     stack:0     pid:1697  ppid:2      flags:0x80004080
      kernel: Call Trace:
      kernel: ? libcfs_log_return+0x1e/0x30 [libcfs]
      kernel: ? __kmalloc+0x246/0x250
      kernel: ? __mdd_lookup.isra.21+0x286/0x370 [mdd]
      kernel: ? mdd_parent_fid+0x1a3/0x410 [mdd]
      kernel: ? mdd_is_subdir+0x27d/0x3b0 [mdd]
      kernel: ? mdt_reint_rename+0x53a/0x1d20 [mdt]
      kernel: ? sptlrpc_svc_alloc_rs+0x62/0x330 [ptlrpc]
      kernel: ? lustre_msg_check_version+0x30/0xf0 [ptlrpc]
      kernel: ? mdt_root_squash+0x1e/0x410 [mdt]
      kernel: ? mdt_reint_rec+0x127/0x260 [mdt]
      kernel: ? mdt_reint_internal+0x4ac/0x7a0 [mdt]
      kernel: ? mdt_reint+0x5e/0x100 [mdt]
      kernel: ? tgt_request_handle+0xc9c/0x1970 [ptlrpc]
      kernel: ? ptlrpc_nrs_req_get_nolock0+0xff/0x1f0 [ptlrpc]
      kernel: ? ptlrpc_update_export_timer+0x3d/0x520 [ptlrpc]
      kernel: ? ptlrpc_server_handle_request+0x346/0xc10 [ptlrpc]
      kernel: ? lprocfs_counter_add+0x10e/0x180 [obdclass]
      kernel: ? ptlrpc_main+0xb45/0x13a0 [ptlrpc]
      

      It looks like the thread is stuck looping forever in the while(1) loop in mdd_is_subdir()->mdd_is_parent()?
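
To illustrate how an unbounded parent walk can spin, here is a simplified model, not the actual mdd_is_parent() code. It assumes the walk repeatedly resolves the parent of the current object until it reaches the target or the root, and that the parent chain loops back on itself, which this ticket does not establish. The function name, the dictionary of parents, and the visited-set guard are all hypothetical.

```python
def is_subdir(parent_of, start, target, root):
    """Walk the parent chain from `start`, as a model of an ancestor check.

    parent_of maps each (hypothetical) FID to its parent FID.
    Returns True if `target` is an ancestor of `start`, False if the walk
    reaches `root` first. Raises ValueError if the chain revisits a FID;
    without that guard the while loop below would never terminate.
    """
    seen = set()
    current = start
    while True:
        if current in seen:
            raise ValueError(f"parent chain loops at {current}")
        seen.add(current)
        parent = parent_of[current]
        if parent == target:
            return True
        if parent == root:
            return False
        current = parent
```

In this model a well-formed tree always terminates at the root, while an entry whose parent resolves to itself or to one of its own descendants never does. The identical object and pointer values in consecutive iterations of the debug log are consistent with that kind of revisit, but they do not prove it.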

      The symptoms seem similar to LU-12800, but that was reported fixed in commit v2_12_58-150-ga38c587cbf, and I confirmed that the fix is still present.

            People

              Assignee: Andreas Dilger (adilger)
              Reporter: Andreas Dilger (adilger)