Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-7321

LustreError: 61814:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.9@o2ib500) returned -5 from glimpse AST

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.5.3
    • LLNL lustre-2.5.4-4chaos_2.6.32_504.16.2.1chaos.ch5.3.x86_64.x86_64
      ZFS Backend
      BG/Q clients
    • 3
    • 9223372036854775807

    Description

      One the file systems that have BlueGene/Q clients, I am seeing these log messages:

      vestai:/g/g0/marc# tail /var/log/lustre.log
      Sep 16 09:25:32 vesta56 kernel: LustreError: 137188:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 4 previous similar messages
      Sep 16 09:25:49 vesta56 kernel: LustreError: 137212:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.11@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0037_UUID lock: ffff8807adc3eb40/0xef9b0828fb0cdca lrc: 5/0,0 mode: PW/PW res: [0x4b17112:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 172.20.17.11@o2ib500 remote: 0x4ceffd8a2ed5c33a expref: 829 pid: 87731 timeout: 9654300922 lvb_type: 0
      Sep 16 09:25:49 vesta56 kernel: LustreError: 137212:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 3 previous similar messages
      Sep 16 09:26:37 vesta56 kernel: LustreError: 87731:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.12@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0037_UUID lock: ffff88064a4bc880/0xef9b0828fb0dd59 lrc: 6/0,0 mode: PW/PW res: [0x4b17112:0x0:0x0].0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 172.20.17.12@o2ib500 remote: 0xdec93cb7eb4eca85 expref: 790 pid: 137212 timeout: 9654349575 lvb_type: 0
      Sep 16 09:26:37 vesta56 kernel: LustreError: 87731:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 8 previous similar messages
      Sep 16 09:29:19 vesta2 kernel: LustreError: 82827:0:(ldlm_lockd.c:463:ldlm_add_waiting_lock()) ### not waiting on destroyed lock (bug 5653) ns: filter-fsv-OST0001_UUID lock: ffff880e14fefd40/0xca78e45d33428446 lrc: 1/0,0 mode: --/PW res: [0x4d27886:0x0:0x0].0 rrc: 6 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x74801080010020 nid: 172.20.17.2@o2ib500 remote: 0xa0f4c9e1f12378c4 expref: 1893 pid: 10293 timeout: 0 lvb_type: 0
      Sep 16 09:29:19 vesta2 kernel: LustreError: 82827:0:(ldlm_lockd.c:463:ldlm_add_waiting_lock()) Skipped 2 previous similar messages
      Sep 16 09:30:59 vesta20 kernel: LustreError: 61820:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.9@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0013_UUID lock: ffff88047e3127c0/0x1475582a881a7576 lrc: 5/0,0 mode: PW/PW res: [0x4a8bd2d:0x0:0x0].0 rrc: 5 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 172.20.17.9@o2ib500 remote: 0xa29fd336c2734cc9 expref: 1031 pid: 61818 timeout: 9654614287 lvb_type: 0
      Sep 16 09:31:15 vesta20 kernel: LustreError: 61814:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.9@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0013_UUID lock: ffff8806b70d6380/0x1475582a881a7f4e lrc: 5/0,0 mode: PW/PW res: [0x4a8bd2d:0x0:0x0].0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60001080010020 nid: 172.20.17.9@o2ib500 remote: 0x79abd336c2734cc9 expref: 1031 pid: 61818 timeout: 9654630757 lvb_type: 0
      Sep 16 09:31:15 vesta20 kernel: LustreError: 61814:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 27 previous similar messages
      vestai:/g/g0/marc# 
      

      There were over 3000 messages printed to the logs that day, not counting the suppressed messages:
      vestai:/g/g0/marc# zgrep -c "returned -5 from glimpse AST" /var/log/lustre.log-20150915.gz
      3232
      vestai:/g/g0/marc#

      These messages only show up with NIDs related to BlueGene/Q clients, and I don't see them on file systems that do not have BlueGene/Q clients. I don't see anything in the console or syslog logs on the clients.

      On the BG/Q clients we are running:
      vulcanio14-ib0@root: cat /proc/fs/lustre/version
      lustre: 2.5.4
      kernel: patchless_client
      build: 2.5.4-4chaos-4chaos--PRISTINE-2.6.32-504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64

      We saw spurious messages years ago, but after doing a splunk query of our logs it looks like it has been ramping up since late 2014:
      Number of messages per month (not counting "x similar messages":
      10/14: 172
      11/14: 8043
      12/14: 15,002
      1/15: 17,191
      2/15: 36,258
      3/15: 53,145
      4/15: 69,215
      5/15: 96,518
      6/15: 54,504
      7/15: 91,304
      8/15: 144,691
      9/15: 236,491

      Attachments

        1. debug_log.vesta-mds1.LU-7321.gz
          177 kB
        2. debug_log.vesta29.LU-7321.gz
          2.45 MB
        3. debug_log.vulcanio17
          557 kB
        4. dump-oss.log.gz
          12.38 MB
        5. dump-client.log
          186 kB

        Issue Links

          Activity

            People

              green Oleg Drokin
              marc@llnl.gov D. Marc Stearman (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: