Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.1.3
Labels:
- ptr

Severity:
2
Rank (Obsolete):
5857

Description

We are seeing the following error.

Dec 13 08:35:39 nbp2-oss1 kernel: LustreError: 0:0:(ldlm_lockd.c:358:waiting_locks_callback()) ### lock callback timer expired after 351s: evicting client at 10.151.34.219@o2ib ns: filter-nbp2-OST0018_UUID lock: ffff8804c55d8480/0x1ca7e7e6c780ff4d lrc: 3/0,0 mode: PW/PW res: 182889173/0 rrc: 5 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xd281632991b12020 expref: 6 pid: 19246 timeout 7391670727

With the client evicted we get dirty_page_discards like this.

Dec 13 08:35:40 r305i3n1 kernel: [1164772.491928] Lustre: 7178:0:(llite_lib.c:2283:ll_dirty_page_discard_warn()) nbp2: dirty page discard: 10.151.26.5@o2ib:/nbp2/fid: [0x5677ca33040:0x2d5:0x0]//mlellis/RunStilt/runs/20120523-Cherskii-d01-WRF-TEST-20121213.15.46.32.UTC/run_d01/Exe/Copy8/cdump may get corrupted (rc -4)

We have seen this happen at the beginning of a job. Now we are runing lflush before the start of every job. Could lflush cause this?

We stilling trying to to reproduce it and gather additional logs.

Attachments

Activity

People

Assignee:: Zhenyu Xu

Reporter:: Mahmoud Hanafi

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 14/Dec/12 6:53 PM

Updated:: 29/Oct/13 6:40 PM

Resolved:: 29/Oct/13 6:40 PM