
LU-9193: Multiple hangs observed with many open/getattr


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.7
    • Affects Version/s: Lustre 2.7.0, Lustre 2.5.3, Lustre 2.8.0, Lustre 2.9.0
    • Labels: None
    • Environment: CentOS 7.2
      CentOS 6.[7-8]
      SELinux enforcing
    • Severity: 3

    Description

      Tested (reproduced) on 2.5, 2.7, 2.8 and 2.9.

      MPI job on 300 nodes, 2/3 open and 1/3 stat on the same file => hang. (The MDS server threads are idle and the load is close to 0; the threads are waiting for a lock, but no thread holds an active lock. After a long time, 15-30 min, the threads become responsive again and resume operations normally.)
      Same job with only stat => no problem.
      Same job with only open => no problem.

      Some of the logs were similar to LU-5497 and LU-4579, but the patches from those tickets did not fix the issue.
      If all of the job's clients are evicted manually, Lustre recovers and returns to a normal state.
      Lustre 2.7.2 with the patch from LU-5781 (solve a race for LRU lock cancel) was also tested.

      So far the only thing that prevents the issue is disabling SELinux for Lustre, here by dropping the fs_use_xattr rule for Lustre from the policy:

      # cat policy-noxattr-lustre.patch
      --- serefpolicy-3.13.1/policy/modules/kernel/filesystem.te.orig	2016-08-02 19:56:29.997519918 +0000
      +++ serefpolicy-3.13.1/policy/modules/kernel/filesystem.te	2016-08-02 19:57:10.124519918 +0000
      @@ -32,7 +32,8 @@ fs_use_xattr gfs2 gen_context(system_u:o
       fs_use_xattr gpfs gen_context(system_u:object_r:fs_t,s0);
       fs_use_xattr jffs2 gen_context(system_u:object_r:fs_t,s0);
       fs_use_xattr jfs gen_context(system_u:object_r:fs_t,s0);
      -fs_use_xattr lustre gen_context(system_u:object_r:fs_t,s0);
      +# Lustre is not supported Selinux correctly
      +#fs_use_xattr lustre gen_context(system_u:object_r:fs_t,s0);
       fs_use_xattr ocfs2 gen_context(system_u:object_r:fs_t,s0);
       fs_use_xattr overlay gen_context(system_u:object_r:fs_t,s0);
       fs_use_xattr xfs gen_context(system_u:object_r:fs_t,s0);
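
      A quicker way to verify the same workaround, without rebuilding the refpolicy, would be to put SELinux into permissive mode on the clients before rerunning the job. This is only a sketch for verification, not what was done above:

      # Put the clients in permissive mode, rerun the reproducer, then restore enforcing.
      clush -bw vm[3-130] 'setenforce 0 && getenforce'   # expect "Permissive"
      # ... rerun the reproducer below ...
      clush -bw vm[3-130] 'setenforce 1'                  # back to enforcing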

      Reproducer (127 clients: vm3 to vm130; vm0 is the MDS, vm1 and vm2 are OSSs):
      mkdir /lustre/testfs/testuser/testdir; sleep 4; clush -bw vm[3-130] 'seq 0 1000 | xargs -P 7 -I{} sh -c "(({}%3==0)) && touch /lustre/testfs/testuser/testdir/foo$(hostname -s | tr -d vm) || stat /lustre/testfs/testuser/testdir > /dev/null"'
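
      The same per-client loop can also be run by hand on a single client to sanity-check the open/stat mix (a minimal sketch using the paths from the command above):

      # Single-client version of the reproducer loop (same test directory as above).
      DIR=/lustre/testfs/testuser/testdir
      seq 0 1000 | xargs -P 7 -I{} sh -c \
          "(({}%3==0)) && touch $DIR/foo$(hostname -s | tr -d vm) || stat $DIR > /dev/null"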

      Tested disabling statahead. No impact.
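
      (For reference, statahead is normally disabled through the llite tunable; the exact commands used here were not recorded in the ticket, but a sketch would be:)

      # Disable statahead on all clients and confirm the setting.
      clush -bw vm[3-130] 'lctl set_param llite.*.statahead_max=0'
      clush -bw vm[3-130] 'lctl get_param llite.*.statahead_max'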

      Traces of the stuck processes on the MDS all look the same (could be related to DDN-366):
      8631 TASK: ffff880732202280 CPU:
      [ffff88071587f760] __schedule at ffffffff8163b6cd
      [ffff88071587f7c8] schedule at ffffffff8163bd69
      [ffff88071587f7d8] schedule_timeout at ffffffff816399c5
      [ffff88071587f880] ldlm_completion_ast at ffffffffa08b7fe1
      [ffff88071587f920] ldlm_cli_enqueue_local at ffffffffa08b9c20
      [ffff88071587f9b8] mdt_object_local_lock at ffffffffa0f3c6d2
      [ffff88071587fa60] mdt_object_lock_internal at ffffffffa0f3cffb
      [ffff88071587faa0] mdt_getattr_name_lock at ffffffffa0f3ddf6
      [ffff88071587fb28] mdt_intent_getattr at ffffffffa0f3f3f0
      [ffff88071587fb68] mdt_intent_policy at ffffffffa0f42e7c
      [ffff88071587fbd0] ldlm_lock_enqueue at ffffffffa089f1f7
      [ffff88071587fc28] ldlm_handle_enqueue0 at ffffffffa08c4fb2
      [ffff88071587fcb8] tgt_enqueue at ffffffffa09493f2
      [ffff88071587fcd8] tgt_request_handle at ffffffffa094ddf5
      [ffff88071587fd20] ptlrpc_server_handle_request at ffffffffa08f87cb
      [ffff88071587fde8] ptlrpc_main at ffffffffa08fc0f0
      [ffff88071587fec8] kthread at ffffffff810a5b8f
      [ffff88071587ff50] ret_from_fork at ffffffff81646c98

      I do not have the client traces yet, but the last call on the clients is mdc_enqueue.

      Crash dump analysis has started (in progress). So far nothing obvious points to SELinux; the first analysis leads to ldlm_handle_enqueue0() and ldlm_lock_enqueue(). We see some processes idle for 280s; the system is not entirely stuck but very, very slow (we had to crash the VM to get a proper dump, because it was very hard to use crash while the threads were not 100% stuck).
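
      For anyone repeating the analysis, the blocked MDS threads can be pulled out of the dump with the crash utility along these lines (a sketch; the vmlinux/vmcore paths are placeholders):

      # Open the dump and list the threads stuck in uninterruptible sleep.
      crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/<date>/vmcore
      crash> foreach UN bt      # backtraces of all tasks in uninterruptible (UN) state
      crash> bt 8631            # the mdt thread shown above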

      Attachments

        1. logs_050717.tar.gz
          89.55 MB
          Jean-Baptiste Riaux
        2. logs-9193-patchset11-tests.tar.gz
          8.46 MB
          Jean-Baptiste Riaux
        3. LU-9193.tar.gz
          16.73 MB
          Jean-Baptiste Riaux
        4. LU-9193-patchset8.tar.gz
          71.05 MB
          Jean-Baptiste Riaux
        5. lustre-logs.tar.gz
          12.65 MB
          Jean-Baptiste Riaux
        6. lustre-logs-210617.tar.gz
          14.31 MB
          Jean-Baptiste Riaux
        7. lustre-LU9193-240817.tar.gz
          79.36 MB
          Jean-Baptiste Riaux
        8. vm0.tar.gz
          13.93 MB
          Jean-Baptiste Riaux
        9. vm105.tar.gz
          134 kB
          Jean-Baptiste Riaux
        10. vm62.tar.gz
          133 kB
          Jean-Baptiste Riaux

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: riauxjb Jean-Baptiste Riaux (Inactive)
              Votes: 0
              Watchers: 23
