
rm cause MDS to complain hung tasks and disconnecting clients

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.4.3
    • Linux puma-mds-10-6.local 2.6.32-358.23.2.el6_lustre.x86_64 #1 SMP Thu Dec 19 19:57:45 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
    • 2
    • 14881

    Description

      A client was running "rm" to remove a couple of million files when the MDS system load shot up to 30 and the kernel started dumping traces complaining about hung tasks. See the attached output from "dmesg".

      I would think this is a normal workload for a dual-Westmere CPU / 24GB RAM system with bonded Myricom 10Gbps.

      We have been seeing this happen more frequently in 2.4.3 than when we were at 1.8.7.

      Any suggestions?

      thanks,
      Haisong

      Attachments

        1. dmesg_log
          56 kB
        2. dmesg.3369
          450 kB
        3. lustre-log.tgz
          1.65 MB

        Issue Links

          Activity

            [LU-5333] rm cause MDS to complain hung tasks and disconnecting clients
            niu Niu Yawei (Inactive) made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones made changes -
            End date New: 06/Feb/15
            Start date New: 11/Jul/14
            niu Niu Yawei (Inactive) added a comment - b2_5 port: http://review.whamcloud.com/#/c/13464/
            pjones Peter Jones added a comment -

            Haisong

            To be clear LU-5726 is targeted to be fixed in the 2.7 release but is not fixed yet. Your interest in this issue will raise the priority on this work and Niu will look at the possibilities/options to backport a fix to 2.5.x as part of this effort.

            Regards

            Peter


            haisong Haisong Cai (Inactive) added a comment -

            Hi Yawei,

            LU-5726 indicates the issue is fixed in 2.7.0.
            Could you comment on whether the fix can be back-ported into earlier versions, specifically 2.5.*?

            thanks,
            Haisong

            niu Niu Yawei (Inactive) added a comment -

            I think this could be related to LU-5726, and LU-5503 looks like another instance of the same problem.
            niu Niu Yawei (Inactive) made changes -
            Link New: This issue is related to LU-5503 [ LU-5503 ]
            niu Niu Yawei (Inactive) made changes -
            Link New: This issue is related to LU-5726 [ LU-5726 ]

            haisong Haisong Cai (Inactive) added a comment -

            Correction: the server is running 2.4.2, not 2.4.3.
            haisong Haisong Cai (Inactive) made changes -
            Attachment New: dmesg.3369 [ 16016 ]
            Attachment New: lustre-log.tgz [ 16017 ]

            People

              niu Niu Yawei (Inactive)
              haisong Haisong Cai (Inactive)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: