[LU-12209] cannot create stripe dir: Stale file handle
Created: 19/Apr/19  Updated: 21/Apr/19  Resolved: 20/Apr/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.7
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Stephane Thiell
Assignee: Lai Siyao
Resolution: Duplicate
Votes: 0
Labels: None
Environment: CentOS 7.6, servers 2.10.7, clients 2.12 or 2.10
Attachments: oak-md1-s1-MDT1.dk.gz, oak-md1-s2-MDT0.dk.gz, sh-101-60.dk.gz
Issue Links: duplicates LU-11418 "hung threads on MDT and MDT won't umount" (Resolved)
Severity: 3

Description

I'm facing a new issue on Oak (2.10.7 servers), tried with both 2.10 and 2.12 clients:

As root:

# cd /oak/stanford/groups/
# lfs mkdir -i 1 caiwei
lfs mkdir: dirstripe error on 'caiwei': Stale file handle
lfs setdirstripe: cannot create stripe dir 'caiwei': Stale file handle

# lfs getdirstripe .
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none

Does that ring a bell? Oak uses only DNE v1 with statically striped directories. We never saw this before 2.10.7 (we recently upgraded Oak).

A basic lctl dk doesn't show anything on the MDS, but I may have to enable specific debug flags to see more. No other traces found so far.

Tried with 2.10 and 2.12 clients, with or without idmap.

Thanks!



Comments
Comment by Stephane Thiell [ 19/Apr/19 ]

Note that MDT0001 is still working fine within already-created directories:

[root@oak-rbh01 giocomo]# pwd
/oak/stanford/groups/giocomo
[root@oak-rbh01 giocomo]# lfs getdirstripe .
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none
[root@oak-rbh01 giocomo]# mkdir .testdir
[root@oak-rbh01 giocomo]# lfs mkdir -i 1 .testdir2
[root@oak-rbh01 giocomo]# lfs getdirstripe .testdir*
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none
[root@oak-rbh01 giocomo]# rmdir .testdir*
[root@oak-rbh01 giocomo]# 
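
When comparing many directories, it can help to read the MDT index out of the getdirstripe output programmatically. The following is a minimal sketch that assumes the single-line output format shown in this ticket (other lfs versions may print it differently); the function name parse_getdirstripe and the returned key names are hypothetical, chosen only for illustration.

```python
import re

def parse_getdirstripe(line):
    """Parse one summary line of 'lfs getdirstripe' output.

    Assumes the one-line format seen in this ticket:
    'lmv_stripe_count: N lmv_stripe_offset: N lmv_hash_type: NAME'
    """
    fields = dict(re.findall(r"(lmv_\w+):\s*(\S+)", line))
    return {
        "stripe_count": int(fields["lmv_stripe_count"]),
        # For these unstriped directories, the offset is the MDT index
        # holding the directory.
        "mdt_index": int(fields["lmv_stripe_offset"]),
        "hash_type": fields["lmv_hash_type"],
    }

sample = "lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none"
info = parse_getdirstripe(sample)
print(info["mdt_index"])  # 1
```

In practice the input line would come from running lfs getdirstripe on a client; that step is left out so the sketch runs without a Lustre mount.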
Comment by Patrick Farrell (Inactive) [ 19/Apr/19 ]

If it is this repeatable, can you get -1 debug on the client and the server?  I know that may be a pain server side, but if possible it would be great.

Comment by Stephane Thiell [ 19/Apr/19 ]

Hi Patrick,

OK, I will try (it should be in about an hour; I'm on my way to the office). It does look repeatable, but only when running lfs mkdir -i 1 in a parent directory located on MDT0. See below.

Creating a directory on MDT0 inside a parent directory on MDT1 works:

[root@oak-rbh01 giocomo]# lfs getdirstripe .
lmv_stripe_count: 0 lmv_stripe_offset: 1 lmv_hash_type: none

[root@oak-rbh01 giocomo]# lfs mkdir -i 0 .testdir_mdt0
[root@oak-rbh01 giocomo]# lfs getdirstripe .testdir_mdt0
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@oak-rbh01 giocomo]# rmdir .testdir_mdt0

But not the other way around:

[root@oak-rbh01 giocomo]# cd ../ruthm
[root@oak-rbh01 ruthm]# lfs getdirstripe .
lmv_stripe_count: 0 lmv_stripe_offset: 0 lmv_hash_type: none
[root@oak-rbh01 ruthm]# lfs mkdir -i 0 .testdir_mdt0
[root@oak-rbh01 ruthm]# 
[root@oak-rbh01 ruthm]# lfs mkdir -i 1 .testdir_mdt1
error on LL_IOC_LMV_SETSTRIPE '.testdir_mdt1' (3): Stale file handle
error: mkdir: create stripe dir '.testdir_mdt1' failed
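
The four combinations tried so far in this ticket can be summarized as a small table. This is only a restatement of the transcripts above as data; the names OBSERVED and failing_cases are hypothetical.

```python
# (parent directory MDT index, MDT index passed to 'lfs mkdir -i') -> result
# Values are taken from the transcripts in this ticket.
OBSERVED = {
    (1, 1): "ok",      # giocomo (MDT1): lfs mkdir -i 1 .testdir2
    (1, 0): "ok",      # giocomo (MDT1): lfs mkdir -i 0 .testdir_mdt0
    (0, 0): "ok",      # ruthm (MDT0):   lfs mkdir -i 0 .testdir_mdt0
    (0, 1): "ESTALE",  # ruthm (MDT0):   lfs mkdir -i 1 .testdir_mdt1
}

def failing_cases(observed):
    """Return the (parent, target) pairs that did not succeed."""
    return sorted(pair for pair, result in observed.items() if result != "ok")

print(failing_cases(OBSERVED))  # [(0, 1)]
```

Only the cross-MDT create from a parent on MDT0 to a new directory on MDT1 fails; the reverse direction and both same-MDT cases work.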
Comment by Stephane Thiell [ 19/Apr/19 ]

This is done.
 
Command issued on client sh-101-60 (10.9.101.60@o2ib4) running 2.12 was:

[root@sh-101-60 ruthm]# lctl clear
[root@sh-101-60 ruthm]# lfs mkdir -i 1 .testdir_mdt1
lfs mkdir: dirstripe error on '.testdir_mdt1': Stale file handle
lfs setdirstripe: cannot create stripe dir '.testdir_mdt1': Stale file handle

Nothing else was running on this client.

Client logs attached as sh-101-60.dk.gz

MDT0 and 1 dk logs attached as oak-md1-s2-MDT0.dk.gz and oak-md1-s1-MDT1.dk.gz

Comment by Patrick Farrell (Inactive) [ 19/Apr/19 ]

Stephane,

Thanks for the more detailed logs.

Here's the source of that ESTALE:

00000040:00000001:18.0:1555692294.403870:0:22597:0:(llog_osd.c:322:llog_osd_declare_write_rec()) Process entered
00000040:00000001:18.0:1555692294.403870:0:22597:0:(llog_osd.c:340:llog_osd_declare_write_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000040:00000001:18.0:1555692294.403871:0:22597:0:(llog.c:960:llog_declare_write_rec()) Process leaving (rc=18446744073709551500 : -116 : ffffffffffffff8c)
00000040:00000001:18.0:1555692294.403871:0:22597:0:(llog_cat.c:141:llog_cat_new_log()) Process leaving via out (rc=18446744073709551500 : -116 : 0xffffffffffffff8c) 
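
Each "Process leaving" line prints the same return code three ways: as an unsigned 64-bit integer, as a signed integer, and in hex. A minimal sketch of that conversion follows; the helper names are hypothetical, and the errno table holds only the one Linux value relevant here (116 is ESTALE on Linux).

```python
def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one (two's complement)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value

# Linux errno value only; other platforms number ESTALE differently.
LINUX_ERRNO_NAMES = {116: "ESTALE"}

def describe_rc(raw):
    """Render a raw rc from the debug log as a signed value with an errno name."""
    rc = to_signed64(raw)
    if rc < 0:
        return f"{rc} (-{LINUX_ERRNO_NAMES.get(-rc, 'unknown')})"
    return str(rc)

print(describe_rc(18446744073709551500))  # -116 (-ESTALE)
print(describe_rc(0xFFFFFFFFFFFFFF8C))    # -116 (-ESTALE)
```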

Looks to be out of osp_md_declare_write:

        if (dt2osp_obj(dt)->opo_stale)
                return -ESTALE;

But I'm not sure of much more.  I'm going to ask Lai to take a look at this; it's in the DNE area, as you noted.

As for what would fix this...  A failover/failback of MDT1 might do the trick.  It looks like there's confusion over the state of an object in memory, and I think that might clear it up.

Comment by Stephane Thiell [ 19/Apr/19 ]

Thanks, Patrick, for this analysis. I see that obj->opo_stale = 1; is set only in osp_invalidate()...

Because it isn't impacting production, only new group creation, we won't fail over the MDT today (new groups can wait a bit). We have some interactive jobs running, but I'll try to find a good time over the weekend to do so. Let me know if you want me to grab more debug info before then.

Comment by Lai Siyao [ 20/Apr/19 ]

This looks like the same issue that was fixed by https://review.whamcloud.com/#/c/33401/. Can you apply this patch on all MDSs and try again?

Comment by Stephane Thiell [ 20/Apr/19 ]

Hi Lai,

We restarted the servers with the patch this morning and the problem is now gone. Thanks!

Comment by Peter Jones [ 20/Apr/19 ]

Nice! Thanks, all. sthiell, note that this fix is included in the upcoming 2.12.1.

Comment by Stephane Thiell [ 20/Apr/19 ]

Peter, this patch (https://review.whamcloud.com/#/c/33401/ - LU-11418 llog: refresh remote llog upon -ESTALE) is already available in 2.12.0:

 

commit 71f409c9b31b90fa432f1f46ad4e612fb65c7fcc
Author: Lai Siyao <lai.siyao@intel.com>
Date:   Wed Oct 17 13:29:53 2018 +0800

    LU-11418 llog: refresh remote llog upon -ESTALE

But it's not included in 2.10.7, which we're running on our Oak servers.
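
To illustrate the general idea suggested by the patch title ("refresh remote llog upon -ESTALE"), here is a toy model. It is not the Lustre code and not the content of the patch; it only sketches why a sticky stale flag on a cached handle keeps returning -ESTALE until the handle is reopened. Every name in it (RemoteLogHandle, write_with_refresh, the cache dictionary) is hypothetical.

```python
ESTALE = 116  # Linux errno value, matching rc=-116 in the debug log

class RemoteLogHandle:
    """Toy stand-in for a cached in-memory handle to a remote object."""

    def __init__(self):
        self.stale = False
        self.records = []

    def invalidate(self):
        # Once set, the flag stays set for the lifetime of this handle.
        self.stale = True

    def declare_write(self, record):
        if self.stale:
            return -ESTALE
        self.records.append(record)
        return 0

def write_with_refresh(cache, key, record, open_handle):
    """Write via the cached handle; on -ESTALE, reopen it and retry once."""
    rc = cache[key].declare_write(record)
    if rc == -ESTALE:
        cache[key] = open_handle()  # drop the stale handle, take a fresh one
        rc = cache[key].declare_write(record)
    return rc

cache = {"update_log": RemoteLogHandle()}
cache["update_log"].invalidate()

print(cache["update_log"].declare_write("rec"))                         # -116
print(write_with_refresh(cache, "update_log", "rec", RemoteLogHandle))  # 0
```

Without a refresh step, every write through the invalidated handle fails the same way, which is consistent with the error being fully repeatable until the servers were restarted with the patch.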

Comment by Peter Jones [ 21/Apr/19 ]

Stephane

You are correct. Yet another illustration of why it is confusing to have multiple patches tracked under the same Jira ticket spanning release boundaries.

Peter
