Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.2
    • Labels: None
    • Environment: Linux dolphin-mds-9-2.local 2.6.32-358.23.2.el6_lustre.x86_64 #1 SMP Thu Dec 19 19:57:45 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
    • Severity: 3
    • Rank (Obsolete): 14744

    Description

      The MDS becomes sluggish and logs the following errors:

      Jul 1 14:21:13 dolphin-mds-9-2 kernel: LustreError: 13372:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40602062209, new: 40602062208 replay: 0. see LU-617.
      Jul 1 14:45:04 dolphin-mds-9-2 kernel: LustreError: 3640:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40602786157, new: 40602786156 replay: 0. see LU-617.
      Jul 1 15:02:12 dolphin-mds-9-2 kernel: LustreError: 19741:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40603454105, new: 40603454104 replay: 0. see LU-617.
      Jul 1 15:03:03 dolphin-mds-9-2 kernel: LustreError: 6134:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40603487989, new: 40603487988 replay: 0. see LU-617.
      Jul 1 15:03:08 dolphin-mds-9-2 kernel: LustreError: 13372:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40603492527, new: 40603492526 replay: 0. see LU-617.
      Jul 1 15:19:00 dolphin-mds-9-2 kernel: LustreError: 19741:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40604082957, new: 40604082956 replay: 0. see LU-617.
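
The log lines above share a fixed layout, so the two transaction numbers can be pulled out and compared mechanically. The following is a minimal sketch, not part of Lustre: the regular expression, the function name `parse_transno_errors`, and the field labels are assumptions made for illustration, based only on the message text quoted in this ticket.

```python
import re

# Pattern for the LustreError lines quoted in this ticket.
# The group names (pid, on_disk, new, replay) are our own labels.
TRANSNO_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:\(mdt_recovery\.c:\d+:mdt_last_rcvd_update\(\)\) "
    r"Trying to overwrite bigger transno:on-disk: (?P<on_disk>\d+), "
    r"new: (?P<new>\d+) replay: (?P<replay>\d+)"
)


def parse_transno_errors(lines):
    """Return one dict of integer fields per matching log line."""
    records = []
    for line in lines:
        match = TRANSNO_RE.search(line)
        if match:
            records.append({key: int(value) for key, value in match.groupdict().items()})
    return records


# Two of the lines from the description, copied verbatim.
sample = [
    "Jul 1 14:21:13 dolphin-mds-9-2 kernel: LustreError: 13372:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40602062209, new: 40602062208 replay: 0. see LU-617.",
    "Jul 1 14:45:04 dolphin-mds-9-2 kernel: LustreError: 3640:0:(mdt_recovery.c:418:mdt_last_rcvd_update()) Trying to overwrite bigger transno:on-disk: 40602786157, new: 40602786156 replay: 0. see LU-617.",
]

for record in parse_transno_errors(sample):
    # Gap between the transno already on disk and the one being written.
    print(record["pid"], record["on_disk"] - record["new"])
```

In each of the six quoted messages the on-disk transno is exactly one greater than the new one, and `replay` is 0.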

          Activity

            [LU-5283] LU-617 reappears in 2.4.2

            Upgrading legacy clients to a later version (2.1.2 or above) can fix the problem.

            niu Niu Yawei (Inactive) added a comment

            LU-617 was fixed in 2.1.2. I suggest you upgrade your clients to 2.1.2 or later.

            niu Niu Yawei (Inactive) added a comment
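
The upgrade advice comes down to comparing each client version against 2.1.2. A minimal sketch of that comparison follows, assuming plain dotted numeric version strings; the helper names `parse_version` and `needs_upgrade` are hypothetical, and the two sample versions are the ones reported elsewhere in this thread.

```python
# Version in which LU-617 was fixed, per the comment in this ticket.
FIXED_IN = (2, 1, 2)


def parse_version(text):
    """Turn a dotted version such as '1.8.9' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(version, fixed_in=FIXED_IN):
    """True when the client version predates the release carrying the fix."""
    return parse_version(version) < fixed_in


# Client versions mentioned by the reporter in this thread.
for client_version in ("1.8.9", "2.5.3"):
    status = "upgrade" if needs_upgrade(client_version) else "ok"
    print(client_version, status)
```

Tuple comparison is used so that versions compare numerically field by field rather than as strings.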

            Hi Yawei,

            Yes, we are running both 1.8.9 (a few of them) and 2.5.3 clients.

            Thanks,
            Haisong

            haisong Haisong Cai (Inactive) added a comment

            Are any clients running a version older than 2.5.*?

            niu Niu Yawei (Inactive) added a comment

            = No particular workload was running, but the file system was in
            production when the bug appeared.
            We did notice that the MDS system load was relatively high, hovering
            between 10 and 30.

            = We decided to fail over to the standby MDS when the file system
            became sluggish. After about 7 hours, the standby MDS crashed. No
            LU-617 messages were logged.

            = This morning we brought the MDS back after the crash, and in the
            first hour it logged 5 instances of LU-617.

            = Another thing to note is that our clients are running the 2.5.*
            client. Not for any particular reason; the admin of the cluster
            thought it was the latest release.

            Thanks,
            Haisong

            haisong Haisong Cai (Inactive) added a comment

            Is there a particular workload being run that triggers this problem?

            adilger Andreas Dilger added a comment
            pjones Peter Jones added a comment -

            Niu

            Could you please advise?

            Thanks

            Peter


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: haisong Haisong Cai (Inactive)
              Votes: 0
              Watchers: 5
