Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.10.3
    • Labels: None
    • Environment: 3.10.0-693.2.2.el7_lustre.pl1.x86_64
    • Severity: 3

    Description

      2.10 LTS continues to cause us some trouble, and this time it's the MDT that has been quite unstable lately. A typical problem seems to start with this: 

      [52968.460453] Lustre: 5897:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [53955.181780] Lustre: 8034:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [53955.182270] Lustre: 8034:0:(mdd_device.c:1597:mdd_changelog_clear()) Skipped 2 previous similar messages
      [54269.697741] Lustre: 5911:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [54362.202025] Lustre: 7920:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [54362.202511] Lustre: 7920:0:(mdd_device.c:1597:mdd_changelog_clear()) Skipped 1 previous similar message
      [54808.564380] perf: interrupt took too long (2579 > 2500), lowering kernel.perf_event_max_sample_rate to 77000
      [56283.046677] Lustre: 7958:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [56283.047150] Lustre: 7958:0:(mdd_device.c:1597:mdd_changelog_clear()) Skipped 14 previous similar messages
      [56867.725405] Lustre: 5397:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [60578.888437] Lustre: 8014:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [60579.796894] Lustre: 8078:0:(mdd_device.c:1597:mdd_changelog_clear()) oak-MDD0000: Failure to clear the changelog for user 1: -22
      [60579.797374] Lustre: 8078:0:(mdd_device.c:1597:mdd_changelog_clear()) Skipped 2 previous similar messages
      [62145.590529] LNet: Service thread pid 157540 was inactive for 200.33s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      [62145.591250] Pid: 157540, comm: mdt_rdpg01_050
      [62145.591486] 
                     Call Trace:
      [62145.591949]  [<ffffffff816a94e9>] schedule+0x29/0x70
      [62145.592200]  [<ffffffff816a6ff9>] schedule_timeout+0x239/0x2c0
      [62145.592444]  [<ffffffff810c9ef9>] ? select_task_rq_fair+0x549/0x700
      [62145.592708]  [<ffffffff816a989d>] wait_for_completion+0xfd/0x140
      [62145.592971]  [<ffffffff810c4810>] ? default_wake_function+0x0/0x20
      [62145.593223]  [<ffffffff810b07ea>] kthread_create_on_node+0xaa/0x140
      [62145.593472]  [<ffffffffc12f0ac0>] ? mdd_chlg_garbage_collect+0x0/0x640 [mdd]
      [62145.593760]  [<ffffffffc091346a>] ? llog_add+0x7a/0x1a0 [obdclass]
      [62145.594025]  [<ffffffff810ea8ca>] ? __getnstimeofday64+0x3a/0xd0
      [62145.594272]  [<ffffffffc12f7ac4>] mdd_changelog_store+0x194/0x540 [mdd]
      [62145.594537]  [<ffffffffc130532d>] mdd_changelog_data_store_by_fid+0xed/0x1a0 [mdd]
      [62145.595020]  [<ffffffffc130644b>] mdd_changelog_data_store+0x15b/0x210 [mdd]
      [62145.595267]  [<ffffffffc130779b>] mdd_close+0x25b/0xcf0 [mdd]
      [62145.595540]  [<ffffffffc11c9073>] mdt_mfd_close+0x353/0x610 [mdt]
      [62145.595790]  [<ffffffffc11ce651>] mdt_close_internal+0x121/0x220 [mdt]
      [62145.596051]  [<ffffffffc11ce970>] mdt_close+0x220/0x780 [mdt]
      [62145.596352]  [<ffffffffc0bebda5>] tgt_request_handle+0x925/0x1370 [ptlrpc]
      [62145.596633]  [<ffffffffc0b94b16>] ptlrpc_server_handle_request+0x236/0xa90 [ptlrpc]
      [62145.597138]  [<ffffffffc0b91148>] ? ptlrpc_wait_event+0x98/0x340 [ptlrpc]
      [62145.597385]  [<ffffffff810c4822>] ? default_wake_function+0x12/0x20
      [62145.597651]  [<ffffffff810ba588>] ? __wake_up_common+0x58/0x90
      [62145.597927]  [<ffffffffc0b98252>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [62145.598176]  [<ffffffff81029557>] ? __switch_to+0xd7/0x510
      [62145.598420]  [<ffffffff816a8f00>] ? __schedule+0x2f0/0x8b0
      [62145.598695]  [<ffffffffc0b977c0>] ? ptlrpc_main+0x0/0x1e40 [ptlrpc]
      [62145.598955]  [<ffffffff810b098f>] kthread+0xcf/0xe0
      [62145.604690]  [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      [62145.604932]  [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
      [62145.605175]  [<ffffffff810b08c0>] ? kthread+0x0/0xe0
      
      

      ... eventually leading to a loss of connectivity, though the server doesn't actually crash. So I took a crash dump while the MDS was unresponsive; it is available at the following links:

       

      vmcore-oak-md1-s2-2018-02-16-11:02:21.gz:

      https://stanford.box.com/s/n20wj34xhkmmltiav427l3btdz9qjwvn

      vmlinux-3.10.0-693.2.2.el7_lustre.pl1.x86_64.gz:

      https://stanford.box.com/s/r6u6xjzmsgys2kzcq26562bqo6p45xcp

      Also attaching the "foreach bt" output and memory usage information.

      Thanks!

      Stephane
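
      For context, the -22 (EINVAL) failures above come from mdd_changelog_clear(). The registered changelog users and the record each has cleared up to can be inspected on the MDS; a minimal sketch, assuming standard lctl/lfs tools and the oak target from the logs (the exact cause of EINVAL, e.g. a bad user id or record number, would need confirmation from the code):

```shell
# List registered changelog users (cl1, cl2, ...) and the record number
# each has cleared up to (run on the MDS; device name from the logs above)
lctl get_param mdd.oak-MDD0000.changelog_users

# Clear consumed records for a user:
#   lfs changelog_clear <mdtdev> <userid> <endrec>
# endrec 0 means "clear everything seen so far"; an invalid user id or
# record number is the kind of request mdd_changelog_clear() rejects
# with -22 (EINVAL)
lfs changelog_clear oak-MDT0000 cl1 0
```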

      Attachments

      Issue Links

      Activity

            [LU-10680] MDT becoming unresponsive in 2.10.3
            pjones Peter Jones added a comment -

            Landed for 2.12


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31376/
            Subject: LU-10680 mdd: create gc thread when no current transaction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 31fef6845e8be39ae8dfd242714a84001925b7ad

            adilger Andreas Dilger added a comment -

            Moving over to 2.12 now that the GC feature is no longer enabled by default, and the failing tests have been disabled. The remaining patch "LU-10680 mdd: create gc thread when no current transaction" for this ticket can land when it is ready, but it no longer blocks the 2.11 release.

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31552/
            Subject: LU-10680 mdd: disable changelog garbage collection by default
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 57719d0c8833b8a8a8b6516d3bd4f2ad2b226285

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/31552
            Subject: LU-10680 mdd: disable changelog garbage collection by default
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 62e4519de3dcb588b88311950e8c6b7fefcca6a8

            sthiell Stephane Thiell added a comment -

            Hey Bruno,

            Thanks for the update! And yes, this happened with vm.zone_reclaim_mode=0 on the MDS.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Hello Stephane,
            Just one more comment, to explain why you were able to trigger the 2nd problem/deadlock.
            From LU-10709 I understand that you are also running with vm.zone_reclaim_mode=0 on the MDS, so, judging from the mm code in this area, that appears to be why kthreadd ran memory reclaim, with shrinkers invoked, while creating the new kthread, instead of grabbing new memory/pages from the preferred/current node+zone free-list.

            Hopefully this specific case is handled and fixed by my 2nd fix for this ticket.
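
            The NUMA reclaim setting discussed above can be checked and pinned as follows (a minimal sketch; the sysctl.d file name is only an example, not from this ticket):

```shell
# 0 = prefer allocating from other NUMA nodes over reclaiming pages
# from the local node (the setting in effect on this MDS)
sysctl vm.zone_reclaim_mode

# Set it at runtime
sysctl -w vm.zone_reclaim_mode=0

# Persist across reboots (file name is just an example)
echo 'vm.zone_reclaim_mode = 0' > /etc/sysctl.d/99-vm-reclaim.conf
sysctl -p /etc/sysctl.d/99-vm-reclaim.conf
```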

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31347/
            Subject: LU-10680 mdd: fix run_gc_task uninitialized
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 25cdbbb475b8a42e3a1f594898d2ac1406b466f6

            gerrit Gerrit Updater added a comment -

            Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: https://review.whamcloud.com/31376
            Subject: LU-10680 mdd: create gc thread when no current transaction
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2985625ebbb404c01f88ada84427d5b4386fbfa4

            sthiell Stephane Thiell added a comment -

            Thanks Bruno! Hope you'll find a way to fix this race condition.

            bfaccini Bruno Faccini (Inactive) added a comment -

            Thinking more about it, it seems to me that kthreadd should be excluded from running the memory-reclaim mechanism in its own context. In that case it would be better to either do nothing (and leave the next thread that tries to allocate memory to hit the same conditions/threshold and reclaim in its own context), or to simply wake up kswapd to do the job asynchronously.

            But anyway, we need to protect against such possible cases, and only request creation of the changelog garbage-collection thread outside of any current journal transaction/handle.
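
            The ordering constraint described above can be sketched as follows. This is illustrative C-style pseudocode only, not the actual LU-10680 patch; every name except kthread_run() is invented for the sketch:

```c
/* Illustrative pseudocode -- not the actual Lustre code.
 *
 * Deadlock pattern: kthread_run() hands the creation off to kthreadd
 * and waits for completion; if kthreadd enters memory reclaim, the
 * shrinkers it runs may wait on the very journal transaction this
 * thread is still holding, so nobody makes progress.
 */

/* BAD: spawning the GC thread while a transaction handle is open */
handle = journal_start(...);
record_changelog_entry(handle, rec);
kthread_run(chlg_garbage_collect, mdd, "chlg_gc");  /* may deadlock */
journal_stop(handle);

/* GOOD: inside the transaction, only note that GC is wanted; spawn
 * the thread later, from a context holding no transaction handle. */
handle = journal_start(...);
record_changelog_entry(handle, rec);
mdd->gc_wanted = 1;                 /* defer the thread creation */
journal_stop(handle);

if (mdd->gc_wanted)                 /* no transaction held here */
        kthread_run(chlg_garbage_collect, mdd, "chlg_gc");
```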

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: sthiell Stephane Thiell
              Votes: 0
              Watchers: 10
