Details
Bug
Resolution: Duplicate
Blocker
None
Lustre 2.6.0
3
12451
Description
As of yesterday, while testing master with mmstress, I saw a huge number of threads stuck waiting at the following point, with I/O failing to complete:
sleep_on_page+0xe/0x20
wait_on_page_bit+0x74/0x80
vvp_io_fault_start+0x855/0xc20 [lustre]
cl_io_start+0x72/0x140 [obdclass]
cl_io_loop+0xac/0x1a0 [obdclass]
ll_page_mkwrite+0x280/0x6c0 [lustre]
__do_fault+0xe7/0x570
handle_pte_fault+0xa4/0xcc0
handle_mm_fault+0x1ae/0x240
do_page_fault+0x18f/0x420
page_fault+0x1f/0x30
0x200007ea
0xffffffffffffffff
In effect, these threads cannot complete their page faults. We also ran a quick Cray I/O regression suite on a system, and many (perhaps most) of those tests failed as well.
I looked at the list of commits added since I had last built and used master successfully, and this one jumped out at me:
LU-3531 mdc: release dir page cache after accessing
Release the dir page cache in llite/lmv, so the page will be held until the entries are filled by filldir.
Signed-off-by: wang di <di.wang@intel.com>
Change-Id: I8b24bec74b14ff2b65130c02294821fc16ca1421
Reviewed-on: http://review.whamcloud.com/8935
Tested-by: Jenkins
Reviewed-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
However, reverting only this commit did not fix the problem.
I then rolled back about a week of commits to return to a state I knew was good. With everything after the following commit rolled back, the problem went away:
commit b9b4614c1e302058ed9863b1ab73b7def2c5c924
Author: Oleg Drokin <oleg.drokin@intel.com>
Date: Mon Jan 20 23:10:06 2014 +0000
Revert "LU-3319 procfs: move osp proc handling to seq_files"
This seems to be causing issues like LU-4513 and LU-4510
This reverts commit a97e4898ad9e0b65f457b01bdfa954f7d7cd272d.
Change-Id: I6066a255ded24dbdb76b4804e82a377f1069af5f
Reviewed-on: http://review.whamcloud.com/8931
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Tested-by: Oleg Drokin <oleg.drokin@intel.com>
That leaves me 11 commits behind master (as of my last check). I have not yet identified which patch caused the problem, but current master is broken.
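Narrowing 11 candidate commits down to one is a binary search, which is what `git bisect` automates: since ceil(log2(11)) = 4, at most 4 test builds are needed. Below is a minimal sketch in Python, assuming the commits are ordered oldest to newest, the newest one is bad, and a hypothetical `is_bad` check (build, run mmstress, look for hung threads) gives consistent answers. A flaky hang would break that last assumption. The commit names are placeholders, not the real hashes.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest commit for which is_bad() returns True.

    Assumes commits are ordered oldest to newest, the newest one is bad,
    and every commit after the culprit is also bad (the same assumption
    `git bisect` relies on).
    """
    lo, hi = 0, len(commits) - 1  # invariant: the culprit lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # culprit is mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[lo]


if __name__ == "__main__":
    # Hypothetical stand-ins for the 11 commits between the known-good
    # revert and master; pretend index 7 introduced the hang.
    commits = [f"commit-{i}" for i in range(11)]
    tested: List[str] = []

    def is_bad(c: str) -> bool:
        tested.append(c)  # in practice: build, mount, run mmstress
        return int(c.split("-")[1]) >= 7

    print(first_bad(commits, is_bad), "after", len(tested), "builds")
    # prints: commit-7 after 4 builds
```

Testing the midpoint each time halves the remaining range, so the number of builds grows with the logarithm of the commit count rather than linearly.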
Issue Links
- is duplicated by: LU-4540 Test failure sanity-quota test_8: dbench hung in vvp_page_assume (Resolved)