Details
- Bug
- Resolution: Fixed
- Major
- None
- 3
- 15940
Description
Sometimes lc_watchdogd disappears without logging any messages, and the Lustre logs are not dumped after the watchdog is triggered.
How the correct behaviour should look:
LNet: Service thread pid 7096 was inactive for 10.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 7096, comm: lctl
Call Trace:
[<ffffffff81528eb2>] schedule_timeout+0x192/0x2e0
[<ffffffff81084220>] ? process_timeout+0x0/0x10
[<ffffffffa0380df7>] proc_trigger_watchdog+0x67/0x80 [libcfs]
[<ffffffff811fd8e7>] proc_sys_call_handler+0x97/0xd0
[<ffffffff811fd934>] proc_sys_write+0x14/0x20
[<ffffffff81188f68>] vfs_write+0xb8/0x1a0
[<ffffffff81189861>] sys_write+0x51/0x90
[<ffffffff8152b2be>] ? do_device_not_available+0xe/0x10
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
LustreError: dumping log to /tmp/lustre-log.1411548646.7096
How it may look in the kernel logs when the Lustre logs are not dumped:
Lustre: DEBUG MARKER: == sanity test 242: Check that watchdog causes kernel log dump == 09:19:38 (1411550378)
LNet: Service thread pid 12742 stopped after 20.00s. This indicates the system was overloaded (too many service threads, or there were not enough hardware resources).
Lustre: DEBUG MARKER: sanity test_242: @@@@@@ FAIL: Lustre log wasn't dumped
Lustre: DEBUG MARKER: == sanity test complete, duration 29 sec == 09:20:01 (1411550401)
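The difference between the two excerpts can be checked mechanically. The following is a minimal sketch, not the actual sanity test_242 implementation: it scans kernel log text for service threads reported by the watchdog and lists those with no matching "dumping log to" line. The function name `pids_without_dump` is hypothetical, and the assumption that the last component of the dump file name is the thread pid is inferred only from the single example above (`/tmp/lustre-log.1411548646.7096` for pid 7096).

```python
import re
from typing import List

# Message formats are copied from the log excerpts in this ticket.
# A thread is considered "reported" if the watchdog said it was inactive
# or said it stopped after some time.
THREAD_RE = re.compile(
    r"Service thread pid (\d+) (?:was inactive for|stopped after) [\d.]+s"
)

# Assumption: the dump file is named lustre-log.<timestamp>.<pid>,
# as in the one example shown in the ticket.
DUMP_RE = re.compile(r"dumping log to \S*lustre-log\.\d+\.(\d+)")


def pids_without_dump(kernel_log: str) -> List[int]:
    """Return sorted pids the watchdog reported but that have no log dump line."""
    reported = {int(m.group(1)) for m in THREAD_RE.finditer(kernel_log)}
    dumped = {int(m.group(1)) for m in DUMP_RE.finditer(kernel_log)}
    return sorted(reported - dumped)


if __name__ == "__main__":
    failing = (
        "LNet: Service thread pid 12742 stopped after 20.00s. "
        "This indicates the system was overloaded.\n"
        "Lustre: DEBUG MARKER: sanity test_242: @@@@@@ FAIL: "
        "Lustre log wasn't dumped\n"
    )
    # Pid 12742 is listed because no dump line accompanies it.
    print(pids_without_dump(failing))
```

Run against the first excerpt, the function returns an empty list because the dump line for pid 7096 is present. Run against the second, it reports pid 12742.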
Issue Links
- is related to LU-8066 Move lustre procfs handling to sysfs and debugfs. (Open)