[LU-13085] (namei.c:87:ll_set_inode()) Can not initialize inode [0x540028b1f:0x2:0x0] without object type: valid = 0x100000001 Created: 18/Dec/19  Updated: 01/Feb/20  Resolved: 22/Jan/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.8
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Olaf Faaland Assignee: Lai Siyao
Resolution: Duplicate Votes: 0
Labels: llnl
Environment:

corona82 login node
RHEL 7.7 derivative
kernel-3.10.0-1062.7.1.1chaos.ch6.x86_64
lustre-2.12.3_2.chaos-1.4mofed.ch6.x86_64
filesystem running lustre-2.10.8_5.chaos


Attachments: File dk.corona82.1576534399.gz     File dk.corona82.1576635179.gz     File dk.porter81.1576637982.gz     File dk.porter82.1576637982.gz    
Issue Links:
Related
is related to LU-13099 ll_set_inode()) Can not initialize in... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I attempted to list the contents of /p/lustre3/faaland1/, using tab completion to fill in the "faaland1" part of the path. There was a delay of several seconds, then the ls command returned the expected output. On the console of the client node, I saw the following:

[Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x2400013a0:0x6b6:0x0] without object type: valid = 0x100000001
[Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(namei.c:87:ll_set_inode()) Skipped 6 previous similar messages
[Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(llite_lib.c:2426:ll_prep_inode()) new_inode -fatal: rc -12
[Tue Dec 17 18:07:31 2019] LustreError: 24055:0:(llite_lib.c:2426:ll_prep_inode()) Skipped 6 previous similar messages 

See https://github.com/LLNL/lustre for the patch stacks.
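
For context: rc -12 is -ENOMEM, which ll_prep_inode() reports when the new inode cannot be set up, and the ll_set_inode() message means the MDT reply carried a valid mask without the object-type bit (OBD_MD_FLTYPE). If the flag values match the upstream headers, valid = 0x100000001 decodes to OBD_MD_FLID | OBD_MD_MDS, i.e. a bare cross-MDT reference with no attributes. Below is a small userspace sketch for decoding the mask; the OBD_MD_* values are taken from the upstream Lustre headers and should be treated as assumptions to double-check against this tree:

/* decode_valid.c - decode an mdt_body "valid" mask like the one in the
 * console error above.  The OBD_MD_* values are copied from the upstream
 * Lustre headers and are assumptions here; verify against the tree in use.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define OBD_MD_FLID    0x00000001ULL           /* object FID is valid */
#define OBD_MD_FLTYPE  0x00000100ULL           /* object type (mode & S_IFMT) */
#define OBD_MD_MDS     0x0000000100000000ULL   /* inode lives on another MDS */

int main(int argc, char **argv)
{
        uint64_t valid = argc > 1 ? strtoull(argv[1], NULL, 0) : 0x100000001ULL;

        printf("valid = %#llx\n", (unsigned long long)valid);
        printf("  OBD_MD_FLID   (FID present)       : %s\n",
               valid & OBD_MD_FLID ? "yes" : "no");
        printf("  OBD_MD_FLTYPE (object type known) : %s\n",
               valid & OBD_MD_FLTYPE ? "yes" : "no");
        printf("  OBD_MD_MDS    (cross-MDT ref)     : %s\n",
               valid & OBD_MD_MDS ? "yes" : "no");
        return 0;
}

Run against 0x100000001 this shows FLID and MDS set but FLTYPE clear, which matches the "without object type" wording in the console error.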



 Comments   
Comment by Olaf Faaland [ 18/Dec/19 ]

There are 2 MDTs on this file system. Here is the info for the FID in the error message and for my own directory:

[root@corona82:~]# lfs fid2path /p/lustre3/ [0x2400013a0:0x6b6:0x0]
/p/lustre3/bennion1
[root@corona82:~]# lfs getdirstripe /p/lustre3/bennion1
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none

[root@corona82:~]# lfs getdirstripe /p/lustre3/faaland1
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@corona82:~]# lfs path2fid /p/lustre3/faaland1
[0x200000bd0:0x745:0x0]
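
As an aside, the bracketed strings in the console errors and in the lfs output above are Lustre FIDs in [sequence:object-id:version] form; the sequence is what maps an object to a particular MDT. A minimal parser for that textual form (the [0x..:0x..:0x..] layout is assumed from the output above, not taken from the lfs source):

/* fid_parse.c - split a FID string such as "[0x2400013a0:0x6b6:0x0]" into
 * its sequence / object-id / version fields.
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(int argc, char **argv)
{
        const char *fid = argc > 1 ? argv[1] : "[0x2400013a0:0x6b6:0x0]";
        uint64_t seq = 0;
        uint32_t oid = 0, ver = 0;

        if (sscanf(fid, "[%" SCNx64 ":%" SCNx32 ":%" SCNx32 "]",
                   &seq, &oid, &ver) != 3) {
                fprintf(stderr, "unrecognized FID format: %s\n", fid);
                return 1;
        }
        printf("seq=%#" PRIx64 " oid=%#" PRIx32 " ver=%#" PRIx32 "\n",
               seq, oid, ver);
        return 0;
}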
Comment by Olaf Faaland [ 18/Dec/19 ]

The two MDS nodes have nothing unusual in dmesg from around the time the error was reported on the client:

----------------
eporter81
----------------
[Tue Dec 17 16:07:18 2019] Lustre: Skipped 3 previous similar messages
[Tue Dec 17 16:12:11 2019] Lustre: MGS: Connection restored to ae14aba1-e9ca-73c5-d91e-d5258431f3c5 (at 192.168.128.137@o2ib20)
[Tue Dec 17 16:12:11 2019] Lustre: Skipped 1 previous similar message
[Tue Dec 17 16:12:23 2019] Lustre: MGS: Connection restored to 57067c2f-6884-8432-75c3-5d29b7b44621 (at 192.168.128.138@o2ib20)
[Tue Dec 17 16:12:23 2019] Lustre: Skipped 1 previous similar message
----------------
eporter82
----------------
[Tue Dec 17 15:23:34 2019] Lustre: lustre3-MDT0001: Connection restored to 6e206333-0ad1-f33f-5455-92e3e1592cf5 (at 192.168.128.140@o2ib20)
[Tue Dec 17 16:07:39 2019] Lustre: lustre3-MDT0001: haven't heard from client a8de14f3-4714-ff41-67ca-ffcfd5d3ce43 (at 192.168.128.138@o2ib20) in 227 seconds. I think it's dead, and I am evicting it. exp ffff99e50e087800, cur 1576627639 expire 1576627489 last 1576627412
[Tue Dec 17 16:07:39 2019] Lustre: Skipped 1 previous similar message
[Tue Dec 17 16:12:33 2019] Lustre: lustre3-MDT0001: Connection restored to ae14aba1-e9ca-73c5-d91e-d5258431f3c5 (at 192.168.128.137@o2ib20)
[Tue Dec 17 16:12:45 2019] Lustre: lustre3-MDT0001: Connection restored to a8de14f3-4714-ff41-67ca-ffcfd5d3ce43 (at 192.168.128.138@o2ib20)
Comment by Olaf Faaland [ 18/Dec/19 ]

I saw the same error reported by the client earlier today, but I've not yet been able to reproduce it on demand.

In the earlier case, the FID reported was on a different file system, lustre2. That file system is running a slightly different tag, lustre-2.10.8_4.chaos.

[root@corona82:~]# dmesg -T | grep ll_prep_inode -C2
[Mon Dec 16 12:54:27 2019] LNetError: 45583:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 14 previous similar messages
[Mon Dec 16 13:02:43 2019] LustreError: 118530:0:(namei.c:87:ll_set_inode()) Can not initialize inode [0x580023454:0x2:0x0] without object type: valid = 0x100000001
[Mon Dec 16 13:02:43 2019] LustreError: 118530:0:(llite_lib.c:2426:ll_prep_inode()) new_inode -fatal: rc -12
[Mon Dec 16 13:04:37 2019] LNetError: 45583:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 192.168.128.86@o2ib36 added to recovery queue. Health = 900
[Mon Dec 16 13:04:37 2019] LNetError: 45583:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) Skipped 11 previous similar messages

[root@corona82:~]# lfs fid2path /p/lustre2/ [0x580023454:0x2:0x0]
/p/lustre2/faaland1/make-busy/mdt14
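
For reference, the same FID-to-path mapping that lfs fid2path does by hand can be driven from C through liblustreapi, which may help when scripting for the next occurrence. A minimal sketch, assuming the 2.10-era llapi_fid2path() signature (device or mount point, FID string, output buffer, buffer length, record number, link number); it defaults to the mount point and FID from the dmesg output above, and builds with the lustre-devel headers and -llustreapi:

/* fid2path.c - programmatic equivalent of "lfs fid2path <mnt> <fid>".
 * Assumes the llapi_fid2path() signature from the 2.10-era lustreapi;
 * link with -llustreapi.
 */
#include <stdio.h>
#include <string.h>
#include <lustre/lustreapi.h>

int main(int argc, char **argv)
{
        const char *mnt = argc > 1 ? argv[1] : "/p/lustre2";
        const char *fid = argc > 2 ? argv[2] : "[0x580023454:0x2:0x0]";
        char path[4096];
        long long recno = -1;
        int linkno = 0;
        int rc;

        /* strip a leading '[' in case the library expects the bare
         * seq:oid:ver form */
        if (*fid == '[')
                fid++;

        rc = llapi_fid2path(mnt, fid, path, sizeof(path), &recno, &linkno);
        if (rc) {
                fprintf(stderr, "llapi_fid2path: %s\n", strerror(-rc));
                return 1;
        }
        printf("%s/%s\n", mnt, path);
        return 0;
}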
Comment by Olaf Faaland [ 18/Dec/19 ]

Let me know what information you'd like me to try to gather for the next occurrence.

Comment by Olaf Faaland [ 18/Dec/19 ]

I gathered debug logs from the client and attached them.

  • first instance: dk.corona82.1576534399.gz
  • second instance with more debug: dk.corona82.1576635179.gz

Also attached are the debug logs from the porter MDS nodes (/p/lustre3 file system):

  • dk.porter8x.1576637982
Comment by Peter Jones [ 18/Dec/19 ]

Lai

Could you please investigate?

Thanks

Peter

Comment by Lai Siyao [ 29/Dec/19 ]

This is a duplicate of LU-13099, and the fix is on https://review.whamcloud.com/#/c/37089/.
