[LU-9766] DNE phase 2 - wrong directory inheritance Created: 12/Jul/17 Updated: 09/Nov/21 Resolved: 09/Nov/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0, Lustre 2.9.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jean-Baptiste Riaux (Inactive) | Assignee: | Jean-Baptiste Riaux (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | cea | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
Inheritance of the default directory striping set with "lfs setdirstripe -D" does not work as it should. Setting a default MDT stripe count of 2 on a directory:
[root@vm4 test]# lfs setdirstripe -D -c 2 2-stripes
Creating directories (which should inherit the parent's default striping):
[root@vm4 test]# mkdir 2-stripes/foo.{0..9}
Some of them are correct, but not all:
[root@vm4 test]# lfs getdirstripe 2-stripes/
2-stripes/
lmv_stripe_count: 2 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x60:0x0]
1 [0x340000402:0x1:0x0]
2-stripes//foo.2
lmv_stripe_count: 2 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x340000401:0xb4:0x0]
0 [0x300000402:0x2:0x0]
2-stripes//foo.6
lmv_stripe_count: 2 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x340000401:0xb6:0x0]
0 [0x300000402:0x4:0x0]
2-stripes//foo.5
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x63:0x0]
2-stripes//foo.9
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x65:0x0]
2-stripes//foo.0
lmv_stripe_count: 2 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x340000401:0xb3:0x0]
0 [0x300000402:0x1:0x0]
2-stripes//foo.1
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x61:0x0]
2-stripes//foo.8
lmv_stripe_count: 2 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x340000401:0xb7:0x0]
0 [0x300000402:0x5:0x0]
2-stripes//foo.7
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x64:0x0]
2-stripes//foo.3
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x62:0x0]
2-stripes//foo.4
lmv_stripe_count: 2 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x340000401:0xb5:0x0]
0 [0x300000402:0x3:0x0]
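To quantify the problem, the `lfs getdirstripe` output above can be parsed and each child's stripe count compared with the parent's default. This is a minimal sketch that assumes the exact text layout shown in this ticket (a path line, then `lmv_stripe_count: N lmv_stripe_offset: M`, then the `mdtidx` table). `parse_getdirstripe` and `check_inheritance` are hypothetical helpers, not part of lfs, and `SAMPLE` is an excerpt copied from the output above.

```python
import re
from collections import Counter

# "lmv_stripe_count: 2 lmv_stripe_offset: 1" -> (count, offset)
LAYOUT_RE = re.compile(r"^lmv_stripe_count:\s*(\d+)\s+lmv_stripe_offset:\s*(-?\d+)")
# Per-stripe table rows such as "0 [0x300000401:0x60:0x0]"
ROW_RE = re.compile(r"^\d+\s+\[0x[0-9a-fA-F]+:0x[0-9a-fA-F]+:0x[0-9a-fA-F]+\]$")


def parse_getdirstripe(text):
    """Hypothetical helper: map each directory path to
    (lmv_stripe_count, lmv_stripe_offset), assuming the layout in this ticket."""
    layouts = {}
    path = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("mdtidx") or ROW_RE.match(line):
            continue  # blank lines, table header and per-stripe FID rows
        m = LAYOUT_RE.match(line)
        if m:
            if path is not None:
                layouts[path] = (int(m.group(1)), int(m.group(2)))
            path = None
        else:
            path = line  # any other line is the directory name
    return layouts


def check_inheritance(text, parent, expected_count):
    """Return (children whose stripe count differs from expected_count,
    Counter of starting MDT index over all children)."""
    children = {p: v for p, v in parse_getdirstripe(text).items()
                if p.rstrip("/") != parent.rstrip("/")}
    wrong = sorted(p for p, (count, _) in children.items() if count != expected_count)
    start_mdt = Counter(offset for _, offset in children.values())
    return wrong, start_mdt


# Excerpt copied from the output above (parent plus three children).
SAMPLE = """\
2-stripes/
lmv_stripe_count: 2 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x60:0x0]
1 [0x340000402:0x1:0x0]
2-stripes//foo.2
lmv_stripe_count: 2 lmv_stripe_offset: 1
mdtidx FID[seq:oid:ver]
1 [0x340000401:0xb4:0x0]
0 [0x300000402:0x2:0x0]
2-stripes//foo.5
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x63:0x0]
2-stripes//foo.9
lmv_stripe_count: 1 lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x300000401:0x65:0x0]
"""

if __name__ == "__main__":
    wrong, start_mdt = check_inheritance(SAMPLE, "2-stripes/", expected_count=2)
    print("missing default striping:", wrong)
    print("children per starting MDT:", dict(start_mdt))
```

Applied to the complete output above with expected_count=2, the same check would flag foo.1, foo.3, foo.5, foo.7 and foo.9, each with a single stripe on MDT0, which is consistent with the observation that MDT0 fills up faster.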
In the MDS debug logs, I can see that lod_cache_parent_striping() does not always return the striping defined on the parent; sometimes it returns the default filesystem striping instead:
57168 00000004:00000001:1.0:1499850278.062218:0:8981:0:(lod_object.c:3008:lod_cache_parent_lmv_striping()) Process leaving
57169 00000004:00000001:1.0:1499850278.062219:0:8981:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
57170 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
57171 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3187:lod_ah_init()) inherit EA nr:1 off:-1
57172 00000004:00000040:1.0:1499850278.062221:0:8981:0:(lod_object.c:3195:lod_ah_init()) final striping count:1, offset:-1
57173 00000004:00000001:1.0:1499850278.062221:0:8981:0:(lod_object.c:3246:lod_ah_init()) Process leaving
581753 00000004:00000001:1.0:1499850525.180299:0:9133:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
581754 00000004:00000040:1.0:1499850525.180299:0:9133:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
581755 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3175:lod_ah_init()) set stripe EA nr:2 off:0
581756 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3195:lod_ah_init()) final striping count:2, offset:0
581757 00000004:00000001:1.0:1499850525.180301:0:9133:0:(lod_object.c:3246:lod_ah_init()) Process leaving
This is a problem because, whenever the stripe count is wrong, the resulting directory is placed on MDT0, so MDT0 fills up faster than the other MDTs.
Also, "lfs mkdir -i 1" does not work: it creates a directory with a stripe count of 0 and a single MDT index.
A workaround is to run "lfs setdirstripe -D -c 1" on the parent directory and then create the directories with mkdir.
When creating directories under a parent with a default striping, I sometimes get timeouts on 2.7 and client panics on 2.9.
2.7:
[root@vm4]# mkdir 1-stripe-1/foo.0/foo.{0..9}
mkdir: cannot create directory `1-stripe-1/foo.0/foo.0': Input/output error
mkdir: cannot create directory `1-stripe-1/foo.0/foo.1': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.2': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.3': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.4': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.5': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.6': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.7': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.8': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.9': Cannot send after transport endpoint shutdown
[root@vm4]# mkdir 1-stripe-1/foo.0/foo.{0..9}
mkdir: cannot create directory `1-stripe-1/foo.0/foo.0': Input/output error
2.9:
crash> bt 2135
PID: 2135 TASK: ffff880035860000 CPU: 1 COMMAND: "mkdir"
#0 [ffff880016c4b670] machine_kexec at ffffffff81059cdb
#1 [ffff880016c4b6d0] __crash_kexec at ffffffff81105182
#2 [ffff880016c4b7a0] crash_kexec at ffffffff81105270
#3 [ffff880016c4b7b8] oops_end at ffffffff8168efc8
#4 [ffff880016c4b7e0] no_context at ffffffff8167ebd3
#5 [ffff880016c4b830] __bad_area_nosemaphore at ffffffff8167ec69
#6 [ffff880016c4b878] bad_area at ffffffff8167ef8d
#7 [ffff880016c4b8a0] __do_page_fault at ffffffff81691e5f
#8 [ffff880016c4b900] do_page_fault at ffffffff81691f05
#9 [ffff880016c4b930] page_fault at ffffffff8168e1c8
[exception RIP: memcpy+22]
RIP: ffffffff813269a6 RSP: ffff880016c4b9e0 RFLAGS: 00010283
RAX: ffff8800395fb4c0 RBX: ffff880016c4baf8 RCX: ffff880016c4bfd8
RDX: ffffffffffffffe5 RSI: 0000000000000000 RDI: ffff8800395fb4c0
RBP: ffff880016c4bab8 R8: 0000000000019a80 R9: 0000000000000000
R10: ffff8800395fb4c0 R11: 0000000000aaaaaa R12: 0000000000000025
R13: ffff880016c4bae8 R14: ffff8800358789a0 R15: 0000000000000025
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff880016c4b9e0] ll_lookup_it_finish at ffffffffa0ab5715 [lustre]
#11 [ffff880016c4bac0] ll_lookup_it at ffffffffa0ab70ae [lustre]
#12 [ffff880016c4bb78] ll_lookup_nd at ffffffffa0ab89dd [lustre]
#13 [ffff880016c4bc10] lookup_real at ffffffff812083dd
#14 [ffff880016c4bc30] __lookup_hash at ffffffff81208d52
#15 [ffff880016c4bc60] lookup_slow at ffffffff816833cb
#16 [ffff880016c4bc98] link_path_walk at ffffffff8120b96f
#17 [ffff880016c4bd48] path_lookupat at ffffffff8120bb6b
#18 [ffff880016c4bde0] filename_lookup at ffffffff8120c2cb
#19 [ffff880016c4be18] filename_create at ffffffff8120c3a2
#20 [ffff880016c4bee8] user_path_create at ffffffff8120eee1
#21 [ffff880016c4bf18] sys_mkdirat at ffffffff812101f6
#22 [ffff880016c4bf70] sys_mkdir at ffffffff812102a9
#23 [ffff880016c4bf80] system_call_fastpath at ffffffff81696709
RIP: 00007f9ddb6d29a7 RSP: 00007ffcb2ec5690 RFLAGS: 00010246
RAX: 0000000000000053 RBX: ffffffff81696709 RCX: 00007ffcb2ec57f0
RDX: 00000000000001ff RSI: 00000000000001ff RDI: 00007ffcb2ec9790
RBP: 00007ffcb2ec87d0 R8: 00000000000001ff R9: 00000000004029f0
R10: 000000000000000b R11: 0000000000000206 R12: ffffffff812102a9
R13: ffff880016c4bf78 R14: 00000000000001ff R15: 00007ffcb2ec8820
ORIG_RAX: 0000000000000053 CS: 0033 SS: 002b
|
| Comments |
| Comment by Peter Jones [ 06/Oct/17 ] |
|
Di/Lai, do you have any advice here? Peter |
| Comment by Di Wang [ 06/Oct/17 ] |
|
Do you still have the debug log? It seems there is some communication issue between MDTs, which is why stripes are only created on MDT0. According to the debug log you posted, the parent's default stripe count is 1:
57168 00000004:00000001:1.0:1499850278.062218:0:8981:0:(lod_object.c:3008:lod_cache_parent_lmv_striping()) Process leaving
57169 00000004:00000001:1.0:1499850278.062219:0:8981:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
57170 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
57171 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3187:lod_ah_init()) inherit EA nr:1 off:-1
57172 00000004:00000040:1.0:1499850278.062221:0:8981:0:(lod_object.c:3195:lod_ah_init()) final striping count:1, offset:-1
57173 00000004:00000001:1.0:1499850278.062221:0:8981:0:(lod_object.c:3246:lod_ah_init()) Process leaving
So the child inherits the stripe count correctly. The bottom half is different, though:
581753 00000004:00000001:1.0:1499850525.180299:0:9133:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
581754 00000004:00000040:1.0:1499850525.180299:0:9133:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
581755 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3175:lod_ah_init()) set stripe EA nr:2 off:0
581756 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3195:lod_ah_init()) final striping count:2, offset:0
581757 00000004:00000001:1.0:1499850525.180301:0:9133:0:(lod_object.c:3246:lod_ah_init(.....
This child seems to have been created by "setdirstripe -c2", which overrides the default striping, so the directory is created with 2 stripes.
[root@vm4]# mkdir 1-stripe-1/foo.0/foo.{0..9}
mkdir: cannot create directory `1-stripe-1/foo.0/foo.0': Input/output error
mkdir: cannot create directory `1-stripe-1/foo.0/foo.1': Cannot send after transport endpoint shutdown
mkdir: cannot create directory `1-stripe-1/foo.0/foo.2': Cannot send after transport en...
These failures also suggest there are some communication issues between MDTs. |
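The reading of the lod_ah_init() lines in the comment above can be made mechanical: group the debug records by PID and check whether a "set stripe EA" message appeared before the "final striping" line. This sketch rests on several assumptions: one record per line (as `lctl debug_kernel` writes them), the sixth colon-separated header field being the PID, and message strings matching the excerpts in this ticket exactly (other Lustre versions may word them differently). `summarize_ah_init` is a hypothetical helper, not part of Lustre.

```python
import re

# Header of one Lustre debug record, e.g.
# 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) msg
# ASSUMPTION: the sixth colon-separated field is the PID (8981 / 9133 in this ticket).
RECORD_RE = re.compile(
    r"[0-9a-f]{8}:[0-9a-f]{8}:[\d.]+:[\d.]+:\d+:(?P<pid>\d+):\d+:"
    r"\([\w.]+:\d+:(?P<func>\w+)\(\)\)\s*(?P<msg>.*)$"
)
# ASSUMPTION: message wording as in the excerpts above; may differ by version.
NR_OFF_RE = re.compile(r"nr:(-?\d+) off:(-?\d+)")
FINAL_RE = re.compile(r"final striping count:(-?\d+), offset:(-?\d+)")


def summarize_ah_init(log_text):
    """Hypothetical helper: for each PID, return (final (count, offset), source),
    where source says whether the parent's default or an explicit layout won."""
    per_pid = {}
    for line in log_text.splitlines():
        rec = RECORD_RE.search(line)
        if not rec or rec.group("func") != "lod_ah_init":
            continue  # keep only the lod_ah_init() records
        info = per_pid.setdefault(
            rec.group("pid"), {"default": None, "explicit": None, "final": None})
        msg = rec.group("msg")
        nr_off = NR_OFF_RE.search(msg)
        final = FINAL_RE.match(msg)
        if msg.startswith("inherit default EA") and nr_off:
            info["default"] = (int(nr_off.group(1)), int(nr_off.group(2)))
        elif msg.startswith("set stripe EA") and nr_off:
            info["explicit"] = (int(nr_off.group(1)), int(nr_off.group(2)))
        elif final:
            info["final"] = (int(final.group(1)), int(final.group(2)))

    summary = {}
    for pid, info in per_pid.items():
        if info["explicit"] is not None:
            source = "explicit striping overrides default"
        elif info["default"] and info["final"] and info["final"][0] == info["default"][0]:
            source = "inherited default"
        else:
            source = "unknown"
        summary[pid] = (info["final"], source)
    return summary


# Records copied from the excerpts in this ticket, one per line.
SAMPLE_LOG = """\
57169 00000004:00000001:1.0:1499850278.062219:0:8981:0:(lod_object.c:3053:lod_cache_parent_striping()) Process leaving (rc=0 : 0 : 0)
57170 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
57171 00000004:00000040:1.0:1499850278.062220:0:8981:0:(lod_object.c:3187:lod_ah_init()) inherit EA nr:1 off:-1
57172 00000004:00000040:1.0:1499850278.062221:0:8981:0:(lod_object.c:3195:lod_ah_init()) final striping count:1, offset:-1
581754 00000004:00000040:1.0:1499850525.180299:0:9133:0:(lod_object.c:3155:lod_ah_init()) inherit default EA nr:1 off:-1 t2
581755 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3175:lod_ah_init()) set stripe EA nr:2 off:0
581756 00000004:00000040:1.0:1499850525.180300:0:9133:0:(lod_object.c:3195:lod_ah_init()) final striping count:2, offset:0
"""

if __name__ == "__main__":
    for pid, (final, source) in summarize_ah_init(SAMPLE_LOG).items():
        print(pid, final, source)
```

On these excerpts, PID 8981 ends with a stripe count of 1 inherited from the parent's default, while PID 9133 ends with 2 stripes because the explicit "set stripe EA nr:2" overrode the default, which is the distinction drawn in the comment above.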
| Comment by Jean-Baptiste Riaux (Inactive) [ 02/Nov/17 ] |
|
Thanks for the input. All MDTs were on the same MDS node, and the network failures looked more like a consequence of the test than its cause. |
| Comment by Andreas Dilger [ 09/Nov/21 ] |
|
Tested: this works properly in (at least) 2.14.0 and later. |