[LU-3231] fld_cache_lookup() copies fld_cache_entry onto lu_seq_range Created: 25/Apr/13 Updated: 26/Mar/14 Resolved: 29/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB, fid | ||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 7896 | ||||||||
| Description |
|
fld_cache_lookup() keeps a prev pointer to a struct fld_cache_entry. But when it uses this pointer to return the prev range, it copies the entry onto the range argument and not the entry's fce_range member.

int fld_cache_lookup(struct fld_cache *cache,
                     const seqno_t seq, struct lu_seq_range *range)
{
        struct fld_cache_entry *flde;
        struct fld_cache_entry *prev = NULL;
        cfs_list_t *head;
        ENTRY;
        ...
        cfs_list_for_each_entry(flde, head, fce_list) {
                if (flde->fce_range.lsr_start > seq) {
                        if (prev != NULL)
                                memcpy(range, prev, sizeof(*range));
                        break;
                }
        ...
}
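
For illustration, here is a minimal standalone sketch of the difference between copying the entry and copying its fce_range member. The struct layouts are simplified assumptions (a plain list_head standing in for cfs_list_t, with fce_range placed after two list heads), and copy_prev_buggy() and copy_prev_fixed() are hypothetical helper names, not functions from the Lustre tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's doubly linked list head (assumption). */
struct list_head {
        struct list_head *next, *prev;
};

/* Simplified range layout (assumed field order and widths). */
struct lu_seq_range {
        uint64_t lsr_start;
        uint64_t lsr_end;
        uint32_t lsr_index;
        uint32_t lsr_flags;
};

/* Simplified cache entry: the range is not the first member (assumption). */
struct fld_cache_entry {
        struct list_head    fce_lru;
        struct list_head    fce_list;
        struct lu_seq_range fce_range;
};

/*
 * Hypothetical helper mirroring the reported copy: the source is the entry
 * itself, so the first sizeof(*range) bytes of the entry (list pointers in
 * this layout) are what end up in *range.
 */
static void copy_prev_buggy(struct lu_seq_range *range,
                            const struct fld_cache_entry *prev)
{
        assert(range != NULL && prev != NULL);
        memcpy(range, prev, sizeof(*range));
}

/*
 * Hypothetical helper showing the intended copy: only the embedded
 * fce_range member is copied out.
 */
static void copy_prev_fixed(struct lu_seq_range *range,
                            const struct fld_cache_entry *prev)
{
        assert(range != NULL && prev != NULL);
        *range = prev->fce_range;
}
```

Under this assumed layout on a 64-bit little-endian machine, the buggy copy fills lsr_start and lsr_end with the fce_lru pointers and lsr_flags with the upper 32 bits of fce_list.next, so a kernel address fragment can surface where a flag value is expected.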
|
| Comments |
| Comment by John Hammond [ 25/Apr/13 ] |
|
Please see http://review.whamcloud.com/6171. |
| Comment by Andreas Dilger [ 26/Apr/13 ] |
|
Di, John, how serious is this bug? Is this crashing, data corrupting, etc? Affecting normal usage, or only DNE? Please describe severity of problem, and mark it a blocker if so. |
| Comment by John Hammond [ 26/Apr/13 ] |
|
Sorry, I spent too much time trying to understand what was going on. DNE blocker. I cannot see any evidence that it affects non-DNE setups.

On the current master (2.3.64-43-g507dc87) I'm using MDSCOUNT=2 MOUNT_2=y llmount.sh to set up. About 1 in 5 times this results in an unusable client mount:

# DURATION=10 sh ./lustre/tests/racer.sh
Logging to shared log directory: /tmp/test_logs/1366998603
racer: /root/lustre-release/lustre/tests/racer/racer.sh with 2 MDTs
excepting tests:
m: Checking config lustre mounted on /mnt/lustre2
Checking servers environments
Checking clients m environments
m: Checking config lustre mounted on /mnt/lustre
Checking servers environments
Checking clients m environments
Using TIMEOUT=20
disable quota as required
RACERDIRS=/mnt/lustre /mnt/lustre2
== racer test 1: racer on clients: m DURATION=10 == 12:50:04 (1366998604)
racers pids: 18395 18396 18397 18398
./dir_remote.sh: line 13: /mnt/lustre/racer/5/3: Input/output error
./dir_remote.sh: line 13: /mnt/lustre/racer/0/2: Input/output error
./dir_remote.sh: line 13: /mnt/lustre/racer/5/18: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer/16/5: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer1/7/3: No such file or directory
./dir_remote.sh: line 13: /mnt/lustre2/racer/15/19: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer/14/3: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer1/2/9: No such file or directory
./dir_remote.sh: line 13: /mnt/lustre/racer1/4/7: No such file or directory
./dir_remote.sh: line 13: /mnt/lustre/racer/12/9: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer/10/9: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer/3/16: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer1/3/8: Not a directory
./dir_remote.sh: line 13: /mnt/lustre2/racer/3/16: Input/output error
./dir_remote.sh: line 13: /mnt/lustre2/racer1/3/19: Not a directory
./dir_remote.sh: line 13: /mnt/lustre/racer1/2/5: Not a directory
...

Note the requested flag "ffff8801" below is really part of a list_head from the entry:

Lustre: DEBUG MARKER: == racer test 1: racer on clients: m DURATION=10 == 12:50:04 (1366998604)
Lustre: ctl-lustre-MDT0000: super-sequence allocation rc = 0 [0x0000000280000400-0x00000002c0000400):0:mdt
Lustre: cli-ctl-lustre-MDT0000-osp-MDT0001: Allocated super-sequence [0x00000002c0000400-0x0000000300000400):1:mdt]
LustreError: 15736:0:(fld_handler.c:158:fld_server_lookup()) srv-lustre-MDT0000: FLD cache range [0x0000000280000400-0x00000002c0000400):0:mdt does not matchrequested flag ffff8801: rc = -5
LustreError: 18811:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
LustreError: 15736:0:(fld_handler.c:158:fld_server_lookup()) srv-lustre-MDT0000: FLD cache range [0x0000000280000400-0x00000002c0000400):0:mdt does not matchrequested flag ffff8801: rc = -5
LustreError: 18788:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
LustreError: 15715:0:(mdt_reint.c:332:mdt_md_create()) lustre-MDT0001: remote dir is only permitted on MDT0 or set_param mdt.*.enable_remote_dir=1
LustreError: 15714:0:(mdt_reint.c:332:mdt_md_create()) lustre-MDT0001: remote dir is only permitted on MDT0 or set_param mdt.*.enable_remote_dir=1
LustreError: 19275:0:(lmv_fld.c:78:lmv_fld_lookup()) Error while looking for mds number. Seq 0x280000400, err = -5
LustreError: 19275:0:(lmv_fld.c:78:lmv_fld_lookup()) Skipped 284 previous similar messages |
| Comment by Di Wang [ 26/Apr/13 ] |
|
I think it will affect non-DNE setups too, once the current meta-sequence (128k seqs) is used up and a new seq has to be fetched. This bug should be hit during the sequence range merging in FLD, IMHO. Thanks again, John. |
| Comment by Peter Jones [ 29/Apr/13 ] |
|
Landed for 2.4 |