Lustre / LU-2499

Help debug waiting_locks_callback causing client eviction

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • Lustre 2.1.3
    • 2
    • 5857

    Description

      We are seeing the following error on the OSS (nbp2-oss1):

      Dec 13 08:35:39 nbp2-oss1 kernel: LustreError: 0:0:(ldlm_lockd.c:358:waiting_locks_callback()) ### lock callback timer expired after 351s: evicting client at 10.151.34.219@o2ib ns: filter-nbp2-OST0018_UUID lock: ffff8804c55d8480/0x1ca7e7e6c780ff4d lrc: 3/0,0 mode: PW/PW res: 182889173/0 rrc: 5 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x20 remote: 0xd281632991b12020 expref: 6 pid: 19246 timeout 7391670727

      Once the client has been evicted, it logs dirty page discard warnings like this:

      Dec 13 08:35:40 r305i3n1 kernel: [1164772.491928] Lustre: 7178:0:(llite_lib.c:2283:ll_dirty_page_discard_warn()) nbp2: dirty page discard: 10.151.26.5@o2ib:/nbp2/fid: [0x5677ca33040:0x2d5:0x0]//mlellis/RunStilt/runs/20120523-Cherskii-d01-WRF-TEST-20121213.15.46.32.UTC/run_d01/Exe/Copy8/cdump may get corrupted (rc -4)

      We have seen this happen at the beginning of a job. We now run lflush before the start of every job. Could lflush cause this?

      We are still trying to reproduce it and gather additional logs.
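
      To match OSS evictions against client-side discard warnings while gathering logs, something like the following could pull the client NID, timer value, and lock resource out of each eviction message. This is only a minimal sketch: the regular expression is inferred from the single OSS line quoted above, the function name `parse_eviction` is made up for illustration, and other Lustre versions may format the message differently.

```python
import re

# Pattern inferred from the one OSS message quoted in this ticket.
# Assumption: the field order (ns, lock, lrc, mode, res) is stable.
EVICT_RE = re.compile(
    r"lock callback timer expired after (?P<secs>\d+)s: "
    r"evicting client at (?P<nid>\S+)\s+"
    r"ns: (?P<ns>\S+) .*?"
    r"mode: (?P<mode>\S+) res: (?P<res>\S+)"
)


def parse_eviction(line):
    """Return a dict of eviction fields, or None if the line does not match."""
    match = EVICT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["secs"] = int(fields["secs"])  # timer value in seconds
    return fields


# Abbreviated copy of the OSS message from the description.
sample = (
    "LustreError: 0:0:(ldlm_lockd.c:358:waiting_locks_callback()) ### lock "
    "callback timer expired after 351s: evicting client at "
    "10.151.34.219@o2ib ns: filter-nbp2-OST0018_UUID lock: "
    "ffff8804c55d8480/0x1ca7e7e6c780ff4d lrc: 3/0,0 mode: PW/PW "
    "res: 182889173/0 rrc: 5"
)

if __name__ == "__main__":
    print(parse_eviction(sample))
```

      The extracted NID and timestamp can then be compared with the `ll_dirty_page_discard_warn()` messages on the clients to see which jobs were affected.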

          People

            bobijam Zhenyu Xu
            mhanafi Mahmoud Hanafi