[LU-7913] lustre 2.8 servers and 2.5.5 client - temporary I/O error on 2.8 clients Created: 24/Mar/16  Updated: 13/Oct/21  Resolved: 13/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Olaf Faaland Assignee: Lai Siyao
Resolution: Cannot Reproduce Votes: 0
Labels: llnl, zfs
Environment:

servers: lustre-2.8.0-2.6.32_573.18.1.1chaos.ch5.4.x86_64_g0bbc784.x86_64
client: lustre-2.5.5-3chaos_2.6.32_573.8.1.2chaos.ch5.4.x86_64.x86_64


Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

On 2.8 client, user sees:
ls: cannot access /p/ldne/faaland1: Input/output error

After unmounting and remounting on the 2.8 clients, the problem went away.

Client console log shows:
2016-03-23 16:44:33 LustreError: 32645:0:(llite_lib.c:2309:ll_prep_inode()) new_inode -fatal: rc -5

MDS console log shows:
2016-03-23 16:44:33 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
2016-03-23 16:44:33 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 20 previous similar messages
2016-03-23 16:45:22 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
2016-03-23 16:45:22 LustreError: 44914:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 20 previous similar messages
2016-03-23 16:45:28 LustreError: 44195:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
2016-03-23 16:45:28 LustreError: 44195:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 19 previous similar messages
2016-03-23 16:46:03 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
2016-03-23 16:51:42 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
2016-03-23 16:51:42 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 1 previous similar message
2016-03-23 16:54:19 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) ldne-MDT0002: parent [0x2c0000403:0x2:0x0] is on r
2016-03-23 16:54:19 LustreError: 44199:0:(mdt_handler.c:1376:mdt_getattr_name_lock()) Skipped 3 previous similar messages

The following sequence of steps produced the corrupt directory entry:

1. formatted filesystem on servers running 2.8 RC4
2. mounted on nodes running 2.8 RC4 clients
3. mounted on node running 2.5.5 client
4. on 2.5.5 client, created directories under root including "faaland1"
5. on 2.8 client zwicky80, renamed faaland1 faaland1.old
6. on 2.8 client zwicky80, lfs mkdir faaland1 --count=4; lfs setdirstripe -D --count=4 faaland1
7. On all 2.8 clients OTHER THAN zwicky80 (e.g. zwicky82), attempts to access faaland1 fail. I thought I saw ENOTSUPP in error output somewhere, but cannot find it now.
8. After unmounting on all the clients, and then remounting on the 2.8 clients, access to the directory appeared to work normally again.
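
Steps 4 through 7 can be sketched as a script, one function per node. This is only an illustration under stated assumptions: the mount point /p/ldne and the directory name come from this report, and the two lfs commands are copied from step 6, but the use of plain mkdir, mv and ls, the function names, and the DRY_RUN switch are hypothetical. The prime_cache step reflects the detail, confirmed in the comments, that the failing clients had accessed the directory before it was renamed. With DRY_RUN=1 (the default) the commands are printed rather than executed.

```shell
#!/bin/bash
# Reproduction sketch for steps 4-7. Names below are assumptions for
# illustration, except the path and the lfs commands, which are from the report.
MNT=${MNT:-/p/ldne}
DIR=${DIR:-faaland1}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 4: run on the 2.5.5 client.
step4_create() {
    run mkdir "$MNT/$DIR"
}

# Run on another 2.8 client (e.g. zwicky82) BEFORE the rename, so that
# the client has looked up the original directory.
prime_cache() {
    run ls -ld "$MNT/$DIR"
}

# Steps 5-6: run on 2.8 client zwicky80.
step5_6_replace() {
    run mv "$MNT/$DIR" "$MNT/$DIR.old"
    run lfs mkdir "$MNT/$DIR" --count=4
    run lfs setdirstripe -D --count=4 "$MNT/$DIR"
}

# Step 7: run on the other 2.8 clients; in the report this failed with
# "Input/output error".
step7_check() {
    run ls "$MNT/$DIR"
}
```

Each function is meant to be called on the node named in its comment, in the order listed; set DRY_RUN=0 to actually execute the commands.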



 Comments   
Comment by Oleg Drokin [ 24/Mar/16 ]

I take it the failing clients actually accessed the directory before it was renamed and replaced?
Instead of a remount, does flushing mdc locks help to alleviate this too?

Comment by Oleg Drokin [ 24/Mar/16 ]

Also, does it matter that the original directory was created by a 2.5.x client, or would any client do?

Comment by Joseph Gmitter (Inactive) [ 24/Mar/16 ]

Hi Lai,
Can you try and reproduce this and assess what might be happening?
Thanks.
Joe

Comment by Olaf Faaland [ 29/Mar/16 ]

Oleg,

I've been able to reproduce it only once since I created the ticket, so I don't have answers to all your questions.

Yes, the failing clients accessed the directory before it was renamed and replaced.

I don't yet know whether creating the original directory with a 2.8 client triggers the same behavior or not, and don't know whether flushing mdc locks alleviates the problem.

-Olaf
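
The still-open question of whether flushing mdc locks helps could be checked on an affected 2.8 client along the following lines. This is a sketch, not a tested procedure: it assumes the usual lctl set_param way of clearing a client's lock LRU, and the function name, DRY_RUN switch and path defaults are hypothetical. With DRY_RUN=1 (the default) the commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch: flush the client's MDC locks, then retry the failing access.
MNT=${MNT:-/p/ldne}
DIR=${DIR:-faaland1}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

flush_mdc_locks_and_retry() {
    # Cancel unused locks cached in every MDC lock namespace on this client.
    # The pattern is quoted so that lctl, not the shell, expands it.
    run lctl set_param 'ldlm.namespaces.*mdc*.lru_size=clear'
    # Retry the access that returned "Input/output error".
    run ls -ld "$MNT/$DIR"
}
```

If the ls succeeds after the flush without a remount, that would point at stale cached lock or directory state on the client rather than at on-disk corruption.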
