Lustre / LU-5695

watchdog dispatch thread disappears

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.11.0
    • None
    • 3
    • 15940

    Description

      Sometimes the lc_watchdogd thread disappears without any messages, and the Lustre logs are not dumped after the watchdog triggers.

      Expected kernel log output when the watchdog works correctly:

      LNet: Service thread pid 7096 was inactive for 10.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7096, comm: lctl
      
      Call Trace:
       [<ffffffff81528eb2>] schedule_timeout+0x192/0x2e0
       [<ffffffff81084220>] ? process_timeout+0x0/0x10
       [<ffffffffa0380df7>] proc_trigger_watchdog+0x67/0x80 [libcfs]
       [<ffffffff811fd8e7>] proc_sys_call_handler+0x97/0xd0
       [<ffffffff811fd934>] proc_sys_write+0x14/0x20
       [<ffffffff81188f68>] vfs_write+0xb8/0x1a0
       [<ffffffff81189861>] sys_write+0x51/0x90
       [<ffffffff8152b2be>] ? do_device_not_available+0xe/0x10
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      
      LustreError: dumping log to /tmp/lustre-log.1411548646.7096
      

      When the Lustre logs are not dumped, the kernel log may instead look like this:

      Lustre: DEBUG MARKER: == sanity test 242: Check that watchdog causes kernel log dump == 09:19:38 (1411550378)
      LNet: Service thread pid 12742 stopped after 20.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
      Lustre: DEBUG MARKER: sanity test_242: @@@@@@ FAIL: Lustre log wasn't dumped
      Lustre: DEBUG MARKER: == sanity test complete, duration 29 sec == 09:20:01 (1411550401)
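
      The difference between the two excerpts above can be checked mechanically. The following is a minimal sketch, not part of Lustre or its test suite: the function name `expired_without_dump` is hypothetical, the patterns are copied from the log lines quoted in this report, and it assumes the trailing number in the dump file name is the pid of the affected thread (as in the first excerpt, where pid 7096 maps to `lustre-log.1411548646.7096`).

```python
import re

# Patterns taken from the kernel log lines quoted in this report.
# A watchdog expiry is visible either as "was inactive for" (normal case)
# or only as "stopped after" (the failing case shown above).
EXPIRED_RE = re.compile(
    r"Service thread pid (\d+) (?:was inactive for|stopped after)"
)
# Assumption: the dump file name ends in ".<pid>", as in the sample above.
DUMP_RE = re.compile(r"dumping log to \S*?\.(\d+)$")


def expired_without_dump(log_text):
    """Return pids whose watchdog expired but for which no log dump was reported."""
    expired = []      # pids in order of first appearance
    dumped = set()    # pids that have a "dumping log to" line
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = EXPIRED_RE.search(line)
        if match:
            pid = match.group(1)
            if pid not in expired:
                expired.append(pid)
            continue
        match = DUMP_RE.search(line)
        if match:
            dumped.add(match.group(1))
    return [pid for pid in expired if pid not in dumped]
```

      Under these assumptions, the first excerpt yields an empty list, while the second reports pid 12742, whose watchdog expired without a matching dump line.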
      

            People

              simmonsja James A Simmons
              zam Alexander Zarochentsev