Lustre / LU-7321

LustreError: 61814:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.9@o2ib500) returned -5 from glimpse AST

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.5.3
    • Environment: LLNL lustre-2.5.4-4chaos_2.6.32_504.16.2.1chaos.ch5.3.x86_64.x86_64
      ZFS Backend
      BG/Q clients
    • Severity: 3

    Description

      On one of the file systems that has BlueGene/Q clients, I am seeing these log messages:

      vestai:/g/g0/marc# tail /var/log/lustre.log
      Sep 16 09:25:32 vesta56 kernel: LustreError: 137188:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 4 previous similar messages
      Sep 16 09:25:49 vesta56 kernel: LustreError: 137212:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.11@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0037_UUID lock: ffff8807adc3eb40/0xef9b0828fb0cdca lrc: 5/0,0 mode: PW/PW res: [0x4b17112:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 172.20.17.11@o2ib500 remote: 0x4ceffd8a2ed5c33a expref: 829 pid: 87731 timeout: 9654300922 lvb_type: 0
      Sep 16 09:25:49 vesta56 kernel: LustreError: 137212:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 3 previous similar messages
      Sep 16 09:26:37 vesta56 kernel: LustreError: 87731:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.12@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0037_UUID lock: ffff88064a4bc880/0xef9b0828fb0dd59 lrc: 6/0,0 mode: PW/PW res: [0x4b17112:0x0:0x0].0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 172.20.17.12@o2ib500 remote: 0xdec93cb7eb4eca85 expref: 790 pid: 137212 timeout: 9654349575 lvb_type: 0
      Sep 16 09:26:37 vesta56 kernel: LustreError: 87731:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 8 previous similar messages
      Sep 16 09:29:19 vesta2 kernel: LustreError: 82827:0:(ldlm_lockd.c:463:ldlm_add_waiting_lock()) ### not waiting on destroyed lock (bug 5653) ns: filter-fsv-OST0001_UUID lock: ffff880e14fefd40/0xca78e45d33428446 lrc: 1/0,0 mode: --/PW res: [0x4d27886:0x0:0x0].0 rrc: 6 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x74801080010020 nid: 172.20.17.2@o2ib500 remote: 0xa0f4c9e1f12378c4 expref: 1893 pid: 10293 timeout: 0 lvb_type: 0
      Sep 16 09:29:19 vesta2 kernel: LustreError: 82827:0:(ldlm_lockd.c:463:ldlm_add_waiting_lock()) Skipped 2 previous similar messages
      Sep 16 09:30:59 vesta20 kernel: LustreError: 61820:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.9@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0013_UUID lock: ffff88047e3127c0/0x1475582a881a7576 lrc: 5/0,0 mode: PW/PW res: [0x4a8bd2d:0x0:0x0].0 rrc: 5 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 172.20.17.9@o2ib500 remote: 0xa29fd336c2734cc9 expref: 1031 pid: 61818 timeout: 9654614287 lvb_type: 0
      Sep 16 09:31:15 vesta20 kernel: LustreError: 61814:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.9@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST0013_UUID lock: ffff8806b70d6380/0x1475582a881a7f4e lrc: 5/0,0 mode: PW/PW res: [0x4a8bd2d:0x0:0x0].0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60001080010020 nid: 172.20.17.9@o2ib500 remote: 0x79abd336c2734cc9 expref: 1031 pid: 61818 timeout: 9654630757 lvb_type: 0
      Sep 16 09:31:15 vesta20 kernel: LustreError: 61814:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) Skipped 27 previous similar messages
      vestai:/g/g0/marc# 
      

      There were over 3000 messages printed to the logs that day, not counting the suppressed messages:
      vestai:/g/g0/marc# zgrep -c "returned -5 from glimpse AST" /var/log/lustre.log-20150915.gz
      3232
      vestai:/g/g0/marc#
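
      For a rough total that also counts the suppressed messages, something along these lines could be run against the same file (a sketch: it assumes the syslog format shown above, and the "Skipped N previous similar messages" lines it sums may also cover ldlm_handle_ast_error messages other than the glimpse AST ones, so it gives an upper bound):

      zgrep "ldlm_handle_ast_error" /var/log/lustre.log-20150915.gz \
        | awk '/returned -5 from glimpse AST/ { total++ }
               # add the counts reported by the rate-limited "Skipped N ..." lines
               match($0, /Skipped [0-9]+ previous similar messages/) {
                   split(substr($0, RSTART, RLENGTH), f, " "); total += f[2]
               }
               END { print total }'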

      These messages only show up with NIDs related to BlueGene/Q clients, and I don't see them on file systems that do not have BlueGene/Q clients. I don't see anything in the console or syslog logs on the clients.

      On the BG/Q clients we are running:
      vulcanio14-ib0@root: cat /proc/fs/lustre/version
      lustre: 2.5.4
      kernel: patchless_client
      build: 2.5.4-4chaos-4chaos--PRISTINE-2.6.32-504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64

      We saw spurious messages years ago, but after doing a Splunk query of our logs it looks like the rate has been ramping up since late 2014.
      Number of messages per month (not counting the "Skipped x previous similar messages" lines):
      10/14: 172
      11/14: 8043
      12/14: 15,002
      1/15: 17,191
      2/15: 36,258
      3/15: 53,145
      4/15: 69,215
      5/15: 96,518
      6/15: 54,504
      7/15: 91,304
      8/15: 144,691
      9/15: 236,491
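
      Those per-month totals could also be reproduced from the rotated logs without Splunk, along these lines (a sketch: it assumes the rotated files are named like /var/log/lustre.log-YYYYMMDD.gz, as in the zgrep above, and it takes the month from the rotation date in the file name rather than from the year-less syslog timestamps on each line):

      for f in /var/log/lustre.log-*.gz; do
          month=$(basename "$f" | sed -E 's/.*-([0-9]{4})([0-9]{2})[0-9]{2}\.gz/\1-\2/')
          count=$(zgrep -c "returned -5 from glimpse AST" "$f")
          echo "$month $count"
      done | awk '{ sum[$1] += $2 } END { for (m in sum) print m, sum[m] }' | sort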

      Attachments

        1. debug_log.vesta29.LU-7321.gz
          2.45 MB
        2. debug_log.vesta-mds1.LU-7321.gz
          177 kB
        3. debug_log.vulcanio17
          557 kB
        4. dump-client.log
          186 kB
        5. dump-oss.log.gz
          12.38 MB

        Issue Links

          Activity


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            And just to add if it helps, the IO node has been up for 6 days:
            [root@vulcansn1:~]# mrsh vulcanio17 uptime
            15:20:58 up 6 days, 6:02, 1 user, load average: 3.39, 5.00, 5.64
            [root@vulcansn1:~]# mrsh vulcanio17 cat /proc/sys/lnet/debug
            ioctl neterror warning error emerg config console
            [root@vulcansn1:~]#


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            I'm not sure if the logs have cycled around, but here is the log from vulcanio17, which I just dumped. I'm not sure which field is the timestamp, but it seems like there are no messages since it last mounted.

            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            I'm not seeing any errors on the IB fabric that would account for this. Could there be some swabbing (byte-order) issues in the IB LND code? We only see these errors on the file systems connected to PPC clients (the BlueGene BG/Q I/O nodes and login nodes). 172.20.17.17@o2ib500 is vulcanio17 (an I/O node on BG/Q).

            utopiabound Nathaniel Clark added a comment:

            It looks like there are a lot of IB errors in the attached debug logs (these messages appear in both of the attached debug logs):

            16471:0:(o2iblnd_cb.c:1011:kiblnd_tx_complete()) Tx -> 172.20.17.17@o2ib500 cookie 0xbca6a70ba sending 1 waiting 0: failed 12
            16471:0:(o2iblnd_cb.c:1895:kiblnd_close_conn_locked()) Closing conn to 172.20.17.17@o2ib500: error -5(waiting)
            

            Could these be the source of the errors?

            The error that is printed on the client is coming up from the network stack.

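
            One quick way to see how widespread those o2iblnd failures are, and which peers they involve, is to count them per NID in the attached dumps (a sketch: it assumes dump-oss.log.gz has been gunzipped to dump-oss.log and that the peer NID appears in the 172.20.17.17@o2ib500 form shown above). The per-NID counts can then be compared against the NIDs in the glimpse AST errors on the console:

            grep -E "kiblnd_tx_complete|kiblnd_close_conn_locked" dump-oss.log \
              | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+@o2ib[0-9]*' \
              | sort | uniq -c | sort -rn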

            morrone Christopher Morrone (Inactive) added a comment:

            Nathaniel, could you please take a step back for a moment and explain the meaning of the error message? I'll start the conversation by giving my best guess at the situation:

            Some server (OST) needs to satisfy a request to look up a file's size, but a client currently holds a range lock on the file, which means the server doesn't know the size (because there may be outstanding dirty data on the client). The server issues a glimpse AST to the client to get that information. The client returns EIO.

            So if I'm in the ballpark, the first step would be to look at the client code and figure out under which situations the response will be "EIO". I mean, heck, maybe the glimpse AST code just hasn't been correctly swabbed and the client can't make sense of the RPC.

            I could be completely off base. If so, please enlighten us.

            I'm trying to figure out if these errors are appearing because MDT layouts are referencing missing OST objects,

            I am not sure that I understand that rationale. Why would an OST ask a client about an object of which it itself had no knowledge? That would be a bug right there, would it not? The fact that the OST looked up the locks on the object and decided that a client had a lock on the end of the object pretty much tells us that the object existed on the OST at the time it issued the glimpse AST to the client, does it not?


            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            Here are the logs from vesta29, and also from the MDS, at the time this log message appeared:

            Dec 15 14:56:21 vesta29 kernel: LustreError: 9813:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.17@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST001c_UUID lock: ffff8803ad803980/0x9749e8c363b75b7f lrc: 5/0,0 mode: PW/PW res: [0x542af58:0x0:0x0].0 rrc: 31 type: EXT [719323136->18446744073709551615] (req 719323136->720371711) flags: 0x60000000000020 nid: 172.20.17.17@o2ib500 remote: 0x1fa09e9911a44c7c expref: 349 pid: 15752 timeout: 7159851604 lvb_type: 0
            

            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            I can probably trigger some dumps on the OSTs and MDS node. Let me tweak some scripts and wait for it to happen.
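
            One way such a trigger could look (a sketch, not the script actually used; the log path, match string, and dump location are assumptions based on the messages above):

            # Run on each OSS: dump the Lustre debug buffer whenever the
            # glimpse AST error shows up in the local log.
            tail -F -n 0 /var/log/lustre.log \
              | grep --line-buffered "returned -5 from glimpse AST" \
              | while read -r line; do
                    ts=$(date +%Y%m%d-%H%M%S)
                    lctl dk "/tmp/dump-oss.$ts.log"
                    echo "$ts dumped after: $line"
                done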

            jfc John Fuchs-Chesney (Inactive) added a comment:

            Hello Marc,

            Are you able to provide the information that Nathaniel has requested?

            Any other action you would like us to take on this one?
            Thanks,
            ~ jfc.


            utopiabound Nathaniel Clark added a comment:

            Can you provide debug logs from the clients, OSS, and MDS when this occurs?
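
            For reference, a minimal capture sequence for such logs, run on each of the client, OSS, and MDS around the time the message appears, might look like this (a sketch: the extra debug flags and the output path are suggestions, not part of the original request):

            lctl set_param debug=+dlmtrace      # add DLM/lock tracing to the debug mask
            lctl set_param debug=+rpctrace      # add RPC tracing
            lctl clear                          # start from an empty debug buffer
            # ...wait for "returned -5 from glimpse AST" to appear in syslog...
            lctl dk /tmp/debug_log.$(hostname).LU-7321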

            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            Just for kicks I picked a couple of other messages on that OST, and found that the files DO exist:

            Oct 20 07:52:13 vesta46 kernel: LustreError: 40313:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.185@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST002d_UUID lock: ffff880430e4bb00/0x44146a450a4fb185 lrc: 5/0,0 mode: PW/PW res: [0x4f814c7:0x0:0x0].0 rrc: 17 type: EXT [191496192->18446744073709551615] (req 191496192->191561727) flags: 0x40000000000000 nid: 172.20.17.185@o2ib500 remote: 0x82ceff91b720bbf5 expref: 18 pid: 40313 timeout: 0 lvb_type: 0
            Oct 20 07:52:41 vesta46 kernel: LustreError: 40316:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.159@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST002d_UUID lock: ffff880aee7881c0/0x44146a450a4fb903 lrc: 5/0,0 mode: PW/PW res: [0x4f81442:0x0:0x0].0 rrc: 224 type: EXT [1861222400->18446744073709551615] (req 1861222400->18446744073709551615) flags: 0x40000000000000 nid: 172.20.17.159@o2ib500 remote: 0x7363d7a3a2c7e459 expref: 134 pid: 40316 timeout: 0 lvb_type: 0
            Oct 20 08:00:33 vesta46 kernel: LustreError: 42292:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.177@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST002d_UUID lock: ffff88068ffb36c0/0x44146a450a50a9cd lrc: 4/0,0 mode: PW/PW res: [0x4f81468:0x0:0x0].0 rrc: 558 type: EXT [1877999616->18446744073709551615] (req 1877999616->18446744073709551615) flags: 0x40000000000000 nid: 172.20.17.177@o2ib500 remote: 0xd4e4f78838543fa8 expref: 291 pid: 42292 timeout: 0 lvb_type: 0
            Oct 20 10:53:33 vesta46 kernel: LustreError: 40317:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.14@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST002d_UUID lock: ffff8801c0a3bbc0/0x44146a450a733fa0 lrc: 5/0,0 mode: PW/PW res: [0x4f831ef:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x50801000000000 nid: 172.20.17.14@o2ib500 remote: 0x7266945e442341a2 expref: 5 pid: 40321 timeout: 0 lvb_type: 0
            
            [root@vesta46:snap]# ls O/0/*/83367016
            O/0/d8/83367016
            [root@vesta46:snap]# ls O/0/*/83366978
            O/0/d2/83366978
            [root@vesta46:snap]# ls O/0/*/83374575
            ls: cannot access O/0/*/83374575: No such file or directory
            [root@vesta46:snap]# 
            [root@vesta46:snap]# ls O/0/*/83367111
            O/0/d7/83367111
            [root@vesta46:snap]# 
            

            So, out of the four messages on that OST this morning, three of the files exist, but the one at 10:53:33 does not exist.

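
            The same check can be scripted for every object ID that shows up in these messages (a sketch: it assumes the console log path, that the OST snapshot is mounted at /mnt/ost-snap, and that, as in the manual checks here, the first field of the resource name is simply the object ID in sequence 0):

            grep "returned -5 from glimpse AST" /var/log/lustre.log \
              | sed -nE 's/.*res: \[0x([0-9a-f]+):0x0:0x0\].*/\1/p' \
              | sort -u \
              | while read -r hexid; do
                    oid=$((16#$hexid))          # hex resource ID -> decimal object ID
                    if ls /mnt/ost-snap/O/0/*/"$oid" >/dev/null 2>&1; then
                        echo "$oid exists"
                    else
                        echo "$oid MISSING"
                    fi
                done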

            marc@llnl.gov D. Marc Stearman (Inactive) added a comment:

            Just to make sure I did the right thing:

            Using this current message:

            Oct 20 10:53:33 vesta46 kernel: LustreError: 40317:0:(ldlm_lockd.c:692:ldlm_handle_ast_error()) ### client (nid 172.20.17.14@o2ib500) returned -5 from glimpse AST ns: filter-fsv-OST002d_UUID lock: ffff8801c0a3bbc0/0x44146a450a733fa0 lrc: 5/0,0 mode: PW/PW res: [0x4f831ef:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->1048575) flags: 0x50801000000000 nid: 172.20.17.14@o2ib500 remote: 0x7266945e442341a2 expref: 5 pid: 40321 timeout: 0 lvb_type: 0
            

      I want to look for object "0x4f831ef:0x0:0x0" on vesta46. I created a snapshot of the OST, and mounted it. "0x4f831ef" translates to 83374575 in decimal.

            In the snapshot, I did this:

            [root@vesta46:snap]# ls
            CONFIGS       oi.10   oi.105  oi.110  oi.116  oi.121  oi.127  oi.18  oi.23  oi.29  oi.34  oi.4   oi.45  oi.50  oi.56  oi.61  oi.67  oi.72  oi.78  oi.83  oi.89  oi.94  quota_slave
            O             oi.100  oi.106  oi.111  oi.117  oi.122  oi.13   oi.19  oi.24  oi.3   oi.35  oi.40  oi.46  oi.51  oi.57  oi.62  oi.68  oi.73  oi.79  oi.84  oi.9   oi.95  seq-200000003-lastid
            health_check  oi.101  oi.107  oi.112  oi.118  oi.123  oi.14   oi.2   oi.25  oi.30  oi.36  oi.41  oi.47  oi.52  oi.58  oi.63  oi.69  oi.74  oi.8   oi.85  oi.90  oi.96  seq_srv
            last_rcvd     oi.102  oi.108  oi.113  oi.119  oi.124  oi.15   oi.20  oi.26  oi.31  oi.37  oi.42  oi.48  oi.53  oi.59  oi.64  oi.7   oi.75  oi.80  oi.86  oi.91  oi.97
            oi.0          oi.103  oi.109  oi.114  oi.12   oi.125  oi.16   oi.21  oi.27  oi.32  oi.38  oi.43  oi.49  oi.54  oi.6   oi.65  oi.70  oi.76  oi.81  oi.87  oi.92  oi.98
            oi.1          oi.104  oi.11   oi.115  oi.120  oi.126  oi.17   oi.22  oi.28  oi.33  oi.39  oi.44  oi.5   oi.55  oi.60  oi.66  oi.71  oi.77  oi.82  oi.88  oi.93  oi.99
            [root@vesta46:snap]# ls O/0
            d0  d1  d10  d11  d12  d13  d14  d15  d16  d17  d18  d19  d2  d20  d21  d22  d23  d24  d25  d26  d27  d28  d29  d3  d30  d31  d4  d5  d6  d7  d8  d9
            [root@vesta46:snap]# ls O/0/*/83374575
            ls: cannot access O/0/*/83374575: No such file or directory
            [root@vesta46:snap]# 
            

            This tells me the object does not exist. The next object ID does exist, so this seems to be a valid range of objects:

            [root@vesta46:snap]# ls O/0/*/83374576
            O/0/d16/83374576
            [root@vesta46:snap]# 
            

            If I did not look in the right place, let me know and I can check again.

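
            As a quick cross-check of the conversion above (a sketch based only on paths already shown in this ticket): the resource name appears to be the plain object ID in sequence 0, and the OST places object <id> under O/0/d<id % 32>, which matches the hits seen earlier (83374576 % 32 = 16 lands in O/0/d16, and 83367016 % 32 = 8 lands in O/0/d8):

            printf '%d\n' 0x4f831ef         # 83374575, the decimal object ID
            echo "d$((0x4f831ef % 32))"     # d15, the directory it would live in if it existed
            echo "d$((0x4f831f0 % 32))"     # d16, matching the existing O/0/d16/83374576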

            People

              Assignee: green Oleg Drokin
              Reporter: marc@llnl.gov D. Marc Stearman (Inactive)
              Votes: 0
              Watchers: 12
