Lustre LU-8509: drop_caches hangs in cl_inode_fini()


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.9.0
    • Affects Version: Lustre 2.8.0
    • Severity: 3

    Description

      Running Lustre 2.8.0_0.0.llnlpreview.18 on the clients (see lustre-release-fe-llnl), we regularly see the /etc/slurm/prolog script hang when it triggers drop_caches. The script runs before each job to clear out any cache left over from previous jobs.

      In particular it hangs here:

      #  Flush slab cache entries
      echo 2 >/proc/sys/vm/drop_caches
      
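      Until the hang itself is fixed, one way to keep the prolog from blocking forever is to run the write in the background and stop waiting after a deadline. The following is a minimal sketch, not the actual prolog: the helper name run_with_deadline, the return code 124, and the 60-second value in the usage note are all assumptions made for illustration. It does not unstick the hung task; it only lets the caller move on and report the problem.

```shell
#!/bin/bash
# Hypothetical helper (not part of the real prolog): run a command in the
# background and give up waiting after DEADLINE seconds.
run_with_deadline() {
    local deadline=$1; shift
    "$@" &
    local pid=$!
    local waited=0
    # kill -0 only tests whether the process still exists; no signal is sent.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "pid $pid still running after ${deadline}s" >&2
            # Reading /proc/<pid>/stack usually needs root; ignore failures.
            cat "/proc/$pid/stack" >&2 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The command finished in time: return its exit status.
    wait "$pid"
}
```

      Usage, assuming a 60-second limit is acceptable: run_with_deadline 60 sh -c 'echo 2 >/proc/sys/vm/drop_caches'. If the deadline passes, the background process is left behind, still stuck in the kernel.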

      And this is the backtrace for where it is getting stuck:

      crash> bt -xs 1386
      PID: 1386   TASK: ffff88201b0a5080  CPU: 10  COMMAND: "prolog"
       #0 [ffff882011bd3af8] __schedule+0x295 at ffffffff81651975
       #1 [ffff882011bd3b60] schedule+0x29 at ffffffff81652049
       #2 [ffff882011bd3b70] cl_inode_fini+0x1ac at ffffffffa0c6b3ac [lustre]
       #3 [ffff882011bd3c10] ll_clear_inode+0x21c at ffffffffa0c377ec [lustre]
       #4 [ffff882011bd3c38] ll_delete_inode+0x58 at ffffffffa0c39048 [lustre]
       #5 [ffff882011bd3c60] evict+0xa7 at ffffffff81204077
       #6 [ffff882011bd3c88] dispose_list+0x3e at ffffffff8120417e
       #7 [ffff882011bd3cb0] prune_icache_sb+0x163 at ffffffff81205113
       #8 [ffff882011bd3d18] prune_super+0x143 at ffffffff811ea343
       #9 [ffff882011bd3d50] shrink_slab+0x175 at ffffffff81183a25
      #10 [ffff882011bd3e08] drop_caches_sysctl_handler+0x283 at ffffffff8124a743
      #11 [ffff882011bd3e90] proc_sys_call_handler+0xd3 at ffffffff81260f03
      #12 [ffff882011bd3ee8] proc_sys_write+0x14 at ffffffff81260f34
      #13 [ffff882011bd3ef8] vfs_write+0xbd at ffffffff811e7bfd
      #14 [ffff882011bd3f38] sys_write+0x7f at ffffffff811e869f
      #15 [ffff882011bd3f80] system_call_fastpath+0x16 at ffffffff8165d709
          RIP: 00007ffff76d3500  RSP: 00007fffffffe180  RFLAGS: 00010206
          RAX: 0000000000000001  RBX: ffffffff8165d709  RCX: 0000000000000400
          RDX: 0000000000000002  RSI: 00007ffff7ff8000  RDI: 0000000000000001
          RBP: 00007ffff7ff8000   R8: 000000000000000a   R9: 00007ffff7fbd740
          R10: 00007fffffffe670  R11: 0000000000000246  R12: 0000000000000001
          R13: 0000000000000002  R14: 00007ffff79a7400  R15: 0000000000000002
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
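      On a live node, without a crash session, a rough first check for this kind of hang is to list tasks in uninterruptible sleep together with the kernel symbol they are waiting in. This is a sketch under the assumption that the stuck prolog shows up in state D, which the backtrace above does not by itself confirm. The function names filter_d_state and list_d_state are made up for illustration.

```shell
#!/bin/bash
# Keep the header line plus any row whose STAT column starts with "D"
# (uninterruptible sleep). Hypothetical helper name.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Print PID, state, wait channel (up to 32 characters), and command name
# for every task, then keep only the D-state rows.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

      Running list_d_state on an affected client may show the prolog process with a Lustre-related wait channel; for the full call chain, /proc/<pid>/stack (as root) or the crash backtrace above is still needed.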

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Christopher Morrone (morrone, inactive)