[LU-16806] mkdir: cannot create directory ‘test5’: Object is remote Created: 09/May/23 Updated: 02/Feb/24 Resolved: 05/Jul/23 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0, Lustre 2.15.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Gian-Carlo Defazio | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
server (garter): client (mutt): |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Description |
|
Attempts to create directories are intermittently failing when using "mkdir". So far, "lfs mkdir" succeeds every time, so does file creation using "touch".
|
| Comments |
| Comment by Gian-Carlo Defazio [ 09/May/23 ] |
|
For my notes, local issue is TOSS-6002 |
| Comment by Gian-Carlo Defazio [ 09/May/23 ] |
|
the llnl lustre versions are at https://github.com/LLNL/lustre/tree/2.14.0_19.llnl https://github.com/LLNL/lustre/tree/2.15.2_5.llnl
|
| Comment by Gian-Carlo Defazio [ 09/May/23 ] |
|
Here's an example of a few failures followed by a successful create
(mutt12):mdts_3_osts_3$ pwd
/p/lflood/defazio1/mdtest/mdts_3_osts_3
(mutt12):mdts_3_osts_3$ mkdir junk20
mkdir: cannot create directory ‘junk20’: Object is remote
(mutt12):mdts_3_osts_3(1)$ mkdir junk20
mkdir: cannot create directory ‘junk20’: Object is remote
(mutt12):mdts_3_osts_3(1)$ mkdir junk20
mkdir: cannot create directory ‘junk20’: Object is remote
(mutt12):mdts_3_osts_3(1)$ mkdir junk20
(mutt12):mdts_3_osts_3$ ls -ld junk20
drwx------ 2 defazio1 defazio1 26624 May 8 17:53 junk20
The parent directory is /p/lflood/defazio1/mdtest/mdts_3_osts_3
(mutt12):mdts_3_osts_3$ stat .
File: .
Size: 26624 Blocks: 52 IO Block: 131072 directory
Device: 11602e98h/291516056d Inode: 198161923999531372 Links: 16
Access: (0700/drwx------) Uid: (28153/defazio1) Gid: (28153/defazio1)
Access: 2023-05-08 16:51:39.000000000 -0700
Modify: 2023-05-08 17:53:11.000000000 -0700
Change: 2023-05-08 17:53:11.000000000 -0700
Birth: 2023-05-08 11:43:54.000000000 -0700
It was created with the commands
lfs setdirstripe --mdt-index 3 /p/lflood/defazio1/mdtest/mdts_3_osts_3
lfs setstripe --ost 3 /p/lflood/defazio1/mdtest/mdts_3_osts_3
chown defazio1:defazio1 /p/lflood/defazio1/mdtest/mdts_3_osts_3
(mutt12):mdts_3_osts_3(4)$ lfs getdirstripe .
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
(mutt12):mdts_3_osts_3$ lfs getstripe -d .
stripe_count: 1 stripe_size: 1048576 pattern: raid0 stripe_offset: 3
|
| Comment by Gian-Carlo Defazio [ 09/May/23 ] |
|
In this case some directories are created on the first try, but most fail:
(mutt12):mdts_3_osts_3$ for x in {1..20}; do mkdir test_name$x; done
mkdir: cannot create directory ‘test_name1’: Object is remote
mkdir: cannot create directory ‘test_name2’: Object is remote
mkdir: cannot create directory ‘test_name3’: Object is remote
mkdir: cannot create directory ‘test_name4’: Object is remote
mkdir: cannot create directory ‘test_name5’: Object is remote
mkdir: cannot create directory ‘test_name6’: Object is remote
mkdir: cannot create directory ‘test_name8’: Object is remote
mkdir: cannot create directory ‘test_name9’: Object is remote
mkdir: cannot create directory ‘test_name10’: Object is remote
mkdir: cannot create directory ‘test_name11’: Object is remote
mkdir: cannot create directory ‘test_name12’: Object is remote
mkdir: cannot create directory ‘test_name13’: Object is remote
mkdir: cannot create directory ‘test_name14’: Object is remote
mkdir: cannot create directory ‘test_name15’: Object is remote
mkdir: cannot create directory ‘test_name16’: Object is remote
mkdir: cannot create directory ‘test_name18’: Object is remote
mkdir: cannot create directory ‘test_name19’: Object is remote
mkdir: cannot create directory ‘test_name20’: Object is remote
(mutt12):mdts_3_osts_3(1)$ for x in {1..20}; do mkdir test_name$x; done
mkdir: cannot create directory ‘test_name1’: Object is remote
mkdir: cannot create directory ‘test_name2’: Object is remote
mkdir: cannot create directory ‘test_name3’: Object is remote
mkdir: cannot create directory ‘test_name4’: Object is remote
mkdir: cannot create directory ‘test_name5’: Object is remote
mkdir: cannot create directory ‘test_name7’: File exists
mkdir: cannot create directory ‘test_name8’: Object is remote
mkdir: cannot create directory ‘test_name9’: Object is remote
mkdir: cannot create directory ‘test_name10’: Object is remote
mkdir: cannot create directory ‘test_name12’: Object is remote
mkdir: cannot create directory ‘test_name13’: Object is remote
mkdir: cannot create directory ‘test_name14’: Object is remote
mkdir: cannot create directory ‘test_name16’: Object is remote
mkdir: cannot create directory ‘test_name17’: File exists
mkdir: cannot create directory ‘test_name18’: Object is remote
mkdir: cannot create directory ‘test_name19’: Object is remote
mkdir: cannot create directory ‘test_name20’: Object is remote
The directories that are created inherit the striping (this was done after more mkdir attempts, so more directories exist than the 2 indicated above):
(mutt12):mdts_3_osts_3$ lfs getdirstripe test_name*
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
lmv_stripe_count: 0 lmv_stripe_offset: 3 lmv_hash_type: none
However, using lfs mkdir seems to work fine:
(mutt12):mdts_3_osts_3$ for x in {1..20}; do lfs mkdir -i $(($x % 4)) lfs_name$x; done
(mutt12):mdts_3_osts_3$ ls lfs_name*
lfs_name1:
lfs_name10:
lfs_name11:
lfs_name12:
lfs_name13:
lfs_name14:
lfs_name15:
lfs_name16:
lfs_name17:
lfs_name18:
lfs_name19:
lfs_name2:
lfs_name20:
lfs_name3:
lfs_name4:
lfs_name5:
lfs_name6:
lfs_name7:
lfs_name8:
lfs_name9:
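The plain mkdir loop above can also be driven directly through mkdir(2), which makes it easy to count how many attempts fail with EREMOTE ("Object is remote") versus EEXIST ("File exists"). The following is only a sketch: the struct `mkdir_tally` and the function `try_mkdirs` are hypothetical names introduced here for illustration, and nothing in it is Lustre-specific.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical tally of mkdir(2) outcomes for one pass over the names. */
struct mkdir_tally {
    int created;  /* mkdir() returned 0 */
    int eremote;  /* failed with EREMOTE ("Object is remote") */
    int eexist;   /* failed with EEXIST ("File exists") */
    int other;    /* any other failure, including an over-long path */
};

/* Try to create <dir>/<prefix>1 .. <dir>/<prefix><count> with mode 0700. */
struct mkdir_tally try_mkdirs(const char *dir, const char *prefix, int count)
{
    struct mkdir_tally t = {0, 0, 0, 0};
    char path[4096];

    assert(dir != NULL && prefix != NULL);

    for (int i = 1; i <= count; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s%d", dir, prefix, i);

        if (n < 0 || (size_t)n >= sizeof(path)) {
            t.other++;          /* path did not fit in the buffer */
            continue;
        }
        if (mkdir(path, 0700) == 0)
            t.created++;
        else if (errno == EREMOTE)
            t.eremote++;
        else if (errno == EEXIST)
            t.eexist++;
        else
            t.other++;
    }
    return t;
}
```

Calling `try_mkdirs(".", "test_name", 20)` twice from the affected directory would correspond to the two shell loops shown earlier; on an unaffected filesystem the first pass should create every directory and the second should report only EEXIST.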
|
| Comment by Gian-Carlo Defazio [ 09/May/23 ] |
|
I've uploaded debug logs for the attempted creation of the directory junk8 debug log which contains failed attempt for junk8 is It then succeeded (after failing about 10 times) in |
| Comment by Peter Jones [ 09/May/23 ] |
|
Serguei Can you please advise? Thanks Peter |
| Comment by Serguei Smirnov [ 09/May/23 ] |
|
It looks like this one needs attention of experts in higher-level Lustre. For example, is |
| Comment by Olaf Faaland [ 11/May/23 ] |
|
Hi Serguei or Peter, Can you ask an appropriate Whamcloud-er to advise? The two test clusters involved will be taken away from us for other testing on Tuesday or Wednesday of next week. thanks |
| Comment by Lai Siyao [ 12/May/23 ] |
|
Could you collect MDS debug logs upon failure? I'm afraid some patch for backward compatibility is missing, maybe it's on server side. |
| Comment by Gian-Carlo Defazio [ 17/May/23 ] |
|
I've added mkdir-attempt-client-and-mds.tar.gz which has debug logs for the client and mds's with debug=-1. |
| Comment by Gerrit Updater [ 18/May/23 ] |
|
"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51045 |
| Comment by Lai Siyao [ 18/May/23 ] |
|
This is caused by a bug in 2.14 code, but it's not an issue in the following release because related code has been removed. |
| Comment by Peter Jones [ 18/May/23 ] |
|
So LLNL's exposure to this issue will disappear once they upgrade to 2.15.x? |
| Comment by Lai Siyao [ 19/May/23 ] |
|
Yes, upgrading to 2.15.x can get around this issue since related code has been removed. |
| Comment by Gian-Carlo Defazio [ 24/May/23 ] |
|
The patch fixed the issue on 2.14. I'm a bit confused by what the patch is doing. Is the idea that when (lds->lds_dir_def_striping_set == 0) that means that the parent directory doesn't have default striping set for its sub-directories, so when the sub-directory is created it should instead inherit from the root of the whole filesystem?
|
| Comment by Olaf Faaland [ 24/May/23 ] |
|
Hi Lai, Please also confirm whether you believe 2.14 servers with this patch will work properly with both Lustre 2.12 and 2.15 clients. As we work to switch to Lustre 2.15, we will have Clients: Lustre 2.12, 2.15 thanks |
| Comment by Lai Siyao [ 25/May/23 ] |
|
Gian, this code checks whether the client sent the mkdir request to the wrong MDT. If the parent has a default LMV (the 2.14 code doesn't check this), and this default LMV is not space balanced, mkdir on a remote MDT is not allowed.
if (hint->dah_parent &&
    dt_object_remote(hint->dah_parent) && lds &&
    lds->lds_dir_def_striping_set &&
    lds->lds_dir_def_stripe_offset !=
    LMV_OFFSET_DEFAULT)
        GOTO(out, rc = -EREMOTE);
As for your question, yes, if parent doesn't have default LMV, the filesystem default LMV will be applied. |
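To make the effect of the `lds_dir_def_striping_set` term concrete, here is a small userspace model of the condition quoted above. It is a sketch under stated assumptions, not the Lustre code: `struct mkdir_model`, `check_with_flag`, `check_without_flag` and `model_sanity_check` are hypothetical names, the three null/remote tests on `hint->dah_parent` and `lds` are folded into one boolean, `MODEL_LMV_OFFSET_DEFAULT` is assumed to match Lustre's `LMV_OFFSET_DEFAULT` as `(uint32_t)-1`, and the "without flag" variant assumes the unpatched 2.14 check differed only by omitting that one term, which is one reading of the explanation above rather than something confirmed in this ticket.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: stands in for Lustre's LMV_OFFSET_DEFAULT. */
#define MODEL_LMV_OFFSET_DEFAULT ((uint32_t)-1)

/* Hypothetical, flattened stand-in for the hint/lds fields used by the check. */
struct mkdir_model {
    bool parent_remote;          /* parent object lives on another MDT */
    bool def_striping_set;       /* parent has a default LMV */
    uint32_t def_stripe_offset;  /* MDT index recorded for the default LMV */
};

/* Condition as quoted in the comment above: the default LMV must be set. */
int check_with_flag(const struct mkdir_model *m)
{
    if (m->parent_remote && m->def_striping_set &&
        m->def_stripe_offset != MODEL_LMV_OFFSET_DEFAULT)
        return -EREMOTE;
    return 0;
}

/* Assumed unpatched variant: same condition without the "set" test. */
int check_without_flag(const struct mkdir_model *m)
{
    if (m->parent_remote &&
        m->def_stripe_offset != MODEL_LMV_OFFSET_DEFAULT)
        return -EREMOTE;
    return 0;
}

/* Illustrative inputs only; the offset value 3 is borrowed from the report. */
void model_sanity_check(void)
{
    struct mkdir_model no_default_lmv = { true, false, 3 };
    struct mkdir_model pinned_default = { true, true, 3 };

    /* No default LMV on the parent: only the variant without the flag rejects. */
    assert(check_with_flag(&no_default_lmv) == 0);
    assert(check_without_flag(&no_default_lmv) == -EREMOTE);

    /* A default LMV pinned to one MDT is rejected by both variants. */
    assert(check_with_flag(&pinned_default) == -EREMOTE);
    assert(check_without_flag(&pinned_default) == -EREMOTE);
}
```

In this model, a parent without a default LMV is never rejected once the flag is tested, so the request proceeds and the filesystem default LMV can apply, as described in the answer above.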
| Comment by Lai Siyao [ 25/May/23 ] |
|
Olaf, yes, it works with both 2.12 and 2.15 clients. |
| Comment by Peter Jones [ 03/Jul/23 ] |
|
Is there any further work outstanding here? AFAIK the b2_14 fix has been proven to work and this issue does not exist on more current releases, so I would think that we can close out the ticket... |
| Comment by Gian-Carlo Defazio [ 05/Jul/23 ] |
|
This issue is fixed for us on 2.14 and has been pulled into our local branch. |
| Comment by Gian-Carlo Defazio [ 05/Jul/23 ] |
|
reopening to remove topllnl |
| Comment by Gian-Carlo Defazio [ 05/Jul/23 ] |
|
reclosing after removing topllnl |