Lustre / LU-13085

(namei.c:87:ll_set_inode()) Can not initialize inode [0x540028b1f:0x2:0x0] without object type: valid = 0x100000001

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.8
    • Environment: corona82 login node
      RHEL 7.7 derivative
      kernel-3.10.0-1062.7.1.1chaos.ch6.x86_64
      lustre-2.12.3_2.chaos-1.4mofed.ch6.x86_64
      filesystem running lustre-2.10.8_5.chaos
    • Severity: 3
    • 9223372036854775807

    Description

      I attempted to list the contents of /p/lustre3/faaland1/, using tab completion to fill in the "faaland1" part of the path. There was a delay of several seconds, then the ls command returned the expected output. On the console of the client node, I saw the following:

      [Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x2400013a0:0x6b6:0x0] without object type: valid = 0x100000001
      [Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(namei.c:87:ll_set_inode()) Skipped 6 previous similar messages
      [Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(llite_lib.c:2426:ll_prep_inode()) new_inode -fatal: rc -12
      [Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(llite_lib.c:2426:ll_prep_inode()) Skipped 6 previous similar messages 

      See https://github.com/LLNL/lustre for the patch stacks.
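
      For reference, the FID in the console error can be mapped back to a path and its MDT placement inspected from any client. The commands below are only a sketch using the mount point and FID from this report; they mirror the checks run in the comments further down, and the exact lfs output varies with the Lustre version.

      # Resolve the FID from the console error to a path
      # (assumes the file system is mounted at /p/lustre3; per the comments below it resolves to /p/lustre3/bennion1)
      lfs fid2path /p/lustre3/ [0x2400013a0:0x6b6:0x0]
      # Check which MDT holds the resolved directory and the directory that was being listed
      lfs getdirstripe /p/lustre3/bennion1
      lfs getdirstripe /p/lustre3/faaland1
      lfs path2fid /p/lustre3/faaland1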

      Attachments

        dk.corona82.1576534399.gz
        dk.corona82.1576635179.gz
        dk.porter8x.1576637982

        Issue Links

          duplicates LU-13099

          Activity

            laisiyao Lai Siyao added a comment -

            This is a duplicate of LU-13099, and the fix is on https://review.whamcloud.com/#/c/37089/.

            pjones Peter Jones added a comment -

            Lai

            Could you please investigate?

            Thanks

            Peter

            ofaaland Olaf Faaland added a comment - - edited

            I gathered debug logs from the client and attached them.

            • first instance: dk.corona82.1576534399.gz
            • second instance with more debug: dk.corona82.1576635179.gz

            Also the debug logs from the porter MDS nodes (/p/lustre3 file system):

            • dk.porter8x.1576637982
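            The attached dk.* files were presumably captured with the standard lctl debug-log workflow; the sketch below shows one way to catch the next occurrence. The debug mask and output path are assumptions and are not stated anywhere in this ticket.

            # Widen the client debug mask (-1 enables all debug flags; a narrower mask keeps the log smaller)
            lctl set_param debug=-1
            # Clear the current debug buffer so the next occurrence stands out
            lctl clear
            # ...reproduce the problem (e.g. tab-complete / ls the affected path), then dump and compress the buffer
            lctl dk /tmp/dk.corona82.$(date +%s)
            gzip /tmp/dk.corona82.*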
            ofaaland Olaf Faaland added a comment -

            Let me know what information you'd like me to try to gather for the next occurrence.

            ofaaland Olaf Faaland added a comment -

            I saw the same error reported by the client earlier today, but I've not yet been able to reproduce it on demand.

            In the earlier case, the reported FID is on a different file system, lustre2, which is running a slightly different tag, lustre-2.10.8_4.chaos.

            [root@corona82:~]# dmesg -T | grep ll_prep_inode -C2
            [Mon Dec 16 12:54:27 2019] LNetError: 45583:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 14 previous similar messages
            [Mon Dec 16 13:02:43 2019] LustreError: 118530:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x580023454:0x2:0x0] without object type: valid = 0x100000001
            [Mon Dec 16 13:02:43 2019] LustreError: 118530:0:(llite_lib.c:2426:ll_prep_inode()) new_inode -fatal: rc -12
            [Mon Dec 16 13:04:37 2019] LNetError: 45583:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 192.168.128.86@o2ib36 added to recovery queue. Health = 900
            [Mon Dec 16 13:04:37 2019] LNetError: 45583:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 11 previous similar messages
            
            [root@corona82:~]# lfs fid2path /p/lustre2/ [0x580023454:0x2:0x0]
            /p/lustre2/faaland1/make-busy/mdt14
            ofaaland Olaf Faaland added a comment -

            The two MDS nodes have nothing in dmesg from the time when the error was reported on the client:

            ----------------
            eporter81
            ----------------
            [Tue Dec 17 16:07:18 2019] Lustre: Skipped 3 previous similar messages
            [Tue Dec 17 16:12:11 2019] Lustre: MGS: Connection restored to ae14aba1-e9ca-73c5-d91e-d5258431f3c5 (at 192.168.128.137@o2ib20)
            [Tue Dec 17 16:12:11 2019] Lustre: Skipped 1 previous similar message
            [Tue Dec 17 16:12:23 2019] Lustre: MGS: Connection restored to 57067c2f-6884-8432-75c3-5d29b7b44621 (at 192.168.128.138@o2ib20)
            [Tue Dec 17 16:12:23 2019] Lustre: Skipped 1 previous similar message
            ----------------
            eporter82
            ----------------
            [Tue Dec 17 15:23:34 2019] Lustre: lustre3-MDT0001: Connection restored to 6e206333-0ad1-f33f-5455-92e3e1592cf5 (at 192.168.128.140@o2ib20)
            [Tue Dec 17 16:07:39 2019] Lustre: lustre3-MDT0001: haven't heard from client a8de14f3-4714-ff41-67ca-ffcfd5d3ce43 (at 192.168.128.138@o2ib20) in 227 seconds. I think it's dead, and I am evicting it. exp ffff99e50e087800, cur 1576627639 expire 1576627489 last 1576627412
            [Tue Dec 17 16:07:39 2019] Lustre: Skipped 1 previous similar message
            [Tue Dec 17 16:12:33 2019] Lustre: lustre3-MDT0001: Connection restored to ae14aba1-e9ca-73c5-d91e-d5258431f3c5 (at 192.168.128.137@o2ib20)
            [Tue Dec 17 16:12:45 2019] Lustre: lustre3-MDT0001: Connection restored to a8de14f3-4714-ff41-67ca-ffcfd5d3ce43 (at 192.168.128.138@o2ib20)
            ofaaland Olaf Faaland added a comment - - edited

            There are 2 MDTs on this file system. Here is the info for the reported FID and for my directory:

            [root@corona82:~]# lfs fid2path /p/lustre3/ [0x2400013a0:0x6b6:0x0]
            /p/lustre3/bennion1
            [root@corona82:~]# lfs getdirstripe /p/lustre3/bennion1
            lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none
            
            [root@corona82:~]# lfs getdirstripe /p/lustre3/faaland1
            lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
            [root@corona82:~]# lfs path2fid /p/lustre3/faaland1
            [0x200000bd0:0x745:0x0]

            People

              Assignee: laisiyao Lai Siyao
              Reporter: ofaaland Olaf Faaland
              Votes: 0
              Watchers: 3
