  1. Lustre
  2. LU-2499

Help debug waiting_locks_callback causing client eviction

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • Lustre 2.1.3
    • 2
    • 5857

    Description

      We are seeing the following error on the OSS, where a lock callback timer expires and the client is evicted:

      Dec 13 08:35:39 nbp2-oss1 kernel: LustreError: 0:0:(ldlm_lockd.c:358:waiting_locks_callback()) ### lock callback timer expired after 351s: evicting client at 10.151.34.219@o2ib ns: filter-nbp2-OST0018_UUID lock: ffff8804c55d8480/0x1ca7e7e6c780ff4d lrc: 3/0,0 mode: PW/PW res: 182889173/0 rrc: 5 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xd281632991b12020 expref: 6 pid: 19246 timeout 7391670727

      Once the client is evicted, it logs dirty page discard warnings like this:

      Dec 13 08:35:40 r305i3n1 kernel: [1164772.491928] Lustre: 7178:0:(llite_lib.c:2283:ll_dirty_page_discard_warn()) nbp2: dirty page discard: 10.151.26.5@o2ib:/nbp2/fid: [0x5677ca33040:0x2d5:0x0]//mlellis/RunStilt/runs/20120523-Cherskii-d01-WRF-TEST-20121213.15.46.32.UTC/run_d01/Exe/Copy8/cdump may get corrupted (rc -4)
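
      To help line up server-side evictions with client-side discards while gathering logs, the two messages above can be reduced to their key fields. The following is a minimal sketch, not a supported tool: the regular expressions are derived only from the two lines quoted in this ticket, and the names parse_eviction and parse_discard are hypothetical. Other Lustre versions may format these messages differently.

```python
import re

# Server-side eviction message. Pattern is an assumption based solely on
# the waiting_locks_callback() line quoted in this ticket.
EVICT_RE = re.compile(
    r"lock callback timer expired after (?P<secs>\d+)s: "
    r"evicting client at (?P<nid>\S+)\s+ns: (?P<ns>\S+)"
    r".*?mode: (?P<mode>\S+) res: (?P<res>\S+)"
)

# Client-side warning. Pattern is an assumption based solely on the
# ll_dirty_page_discard_warn() line quoted in this ticket.
DISCARD_RE = re.compile(
    r"ll_dirty_page_discard_warn\(\)\) (?P<fs>\S+): dirty page discard: "
    r"(?P<device>\S+?)/fid: \[(?P<fid>[^\]]+)\]/(?P<path>\S+) "
    r"may get corrupted \(rc (?P<rc>-?\d+)\)"
)


def parse_eviction(line):
    """Return the eviction fields as a dict, or None if the line does not match."""
    m = EVICT_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["secs"] = int(fields["secs"])  # callback timer duration in seconds
    return fields


def parse_discard(line):
    """Return the dirty page discard fields as a dict, or None if no match."""
    m = DISCARD_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["rc"] = int(fields["rc"])  # return code as printed in the message
    return fields
```

      Running parse_eviction over the OSS syslog and parse_discard over the client syslog gives the evicted NID, the OST namespace, and the affected file paths. The discard message does not carry the client's own NID, so matching the two sides would have to go by timestamp or by a site-specific hostname-to-NID map.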

      We have seen this happen at the beginning of a job. We are now running lflush before the start of every job. Could lflush cause this?

      We are still trying to reproduce it and gather additional logs.

          People

            Assignee: Zhenyu Xu (bobijam)
            Reporter: Mahmoud Hanafi (mhanafi)