[LU-1475] lov_update_create_set() error creating fid 0xf99b sub-object on OST idx 0/1: rc = -5 Created: 05/Jun/12 Updated: 29/May/17 Resolved: 29/May/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0, Lustre 2.1.2, Lustre 2.1.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | Oleg Drokin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: | Lustre Tag: v2_1_2_RC2 |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 8533 |
| Description |
|
While using mdsrate to create a large number of files to use up the free inodes on a 48TB OST on the Juelich test cluster, the following EIO error occurred:

+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun -mca orte_rsh_agent rsh:ssh -np 25 -machinefile /tmp/large-lun.machines /usr/lib64/lustre/tests/mdsrate --create --verbose --ndirs 25 --dirfmt '/mnt/lustre/mdsrate/dir%d' --nfiles 1993129 --filefmt 'file%%d' "
<~snip~>
Rank 2: 71.24 creates/sec 140.38 secs (total: 260000 creates 3334.84 secs)
Rank 21: 71.24 creates/sec 140.38 secs (total: 260000 creates 3334.84 secs)
rank 4: open(file264701) error: Input/output error

Dmesg on the MDS node showed:

LustreError: 4988:0:(lov_request.c:572:lov_update_create_set()) error creating fid 0xf99b sub-object on OST idx 0/1: rc = -5

Maloo report:

The same issue also occurred on a 1TB OST: https://maloo.whamcloud.com/test_sets/f900e0aa-aee8-11e1-b0f7-52540035b04c |
| Comments |
| Comment by Peter Jones [ 05/Jun/12 ] |
|
Oleg, could you please look into this one? Thanks! |
| Comment by Jian Yu [ 05/Jun/12 ] |
|
From http://wiki.whamcloud.com/display/ENG/Lustre+2.1.1+release+testing+tracker, I found that the large-number-of-inodes testing passed on Lustre 2.1.1 RC1 against a 128TB OST on a DDN SFA10KE storage appliance. The test was run on a single node at that time. Now, on the Juelich cluster with separate Lustre client, MDS and OSS nodes, I performed the same test on Lustre 2.1.1 RC4, and the EIO error also occurred while testing against a 1TB OST: |
| Comment by Oleg Drokin [ 05/Jun/12 ] |
|
From the logs, the MDS has a strong belief that it got disconnected from the OSTs, even though there are no messages supporting this state (evictions or whatever). |
| Comment by Oleg Drokin [ 05/Jun/12 ] |
|
I imagine a higher debug level might shed some light on the situation, though I am not sure how high you can reasonably go. Ideally, increase debug_mb to 1024 and the debug level to -1 (i.e. lctl set_param debug_mb=1024 and lctl set_param debug=-1); that would probably be best. |
| Comment by Jian Yu [ 06/Jun/12 ] |
OK. I'm provisioning another MDS node n001... |
| Comment by Jian Yu [ 06/Jun/12 ] |
|
Neither n001 nor n002 worked. Finally, I provisioned n003 and booted it up. The testing is ongoing now. |
| Comment by Jian Yu [ 06/Jun/12 ] |
|
Hi Oleg, |
| Comment by Oleg Drokin [ 07/Jun/12 ] |
|
Thanks! I think I see the problem now. It was introduced by WangDi's patch for bug 21379 (change eb7c28ff977f4e0a280558aa74e23f2a9ab0ea0c): in lov_check_and_create_object() we start with rc = -EIO for the early-exit case of no available OSCs, and then run this check:

+ /* check if objects has been created on this ost */
+ for (stripe = 0; stripe < lsm->lsm_stripe_count; stripe++) {
+         /* already have object at this stripe */
+         if (ost_idx == lsm->lsm_oinfo[stripe]->loi_ost_idx)
+                 break;
+ }

If we did find something (the result of a racing create completion, which we can clearly see in the logs), the loop breaks early and the if statement that follows it is not triggered, so we exit via the following RETURN(rc) with rc still being the initial -EIO. I think the fix should just be to either add "else rc = 0" to the last if statement, or to add rc = 0 before the last if statement. Wangdi, what do you think?
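For illustration, here is a minimal standalone C model of that control flow (the structures and field names are simplified stand-ins loosely following the kernel code, not the actual Lustre source), showing how the stale -EIO escapes and where the proposed rc = 0 lands:

#include <errno.h>
#include <stdio.h>

/* Hypothetical simplified stand-ins for the kernel structures. */
struct loi { int loi_ost_idx; };
struct lsm { int lsm_stripe_count; struct loi *lsm_oinfo; };

/* Models lov_check_and_create_object(): 0 on success, -EIO on failure. */
static int check_and_create_object(struct lsm *lsm, int ost_idx)
{
        int stripe;
        int rc = -EIO; /* early-exit default for "no available OSCs" */

        /* check if an object has already been created on this OST */
        for (stripe = 0; stripe < lsm->lsm_stripe_count; stripe++)
                if (ost_idx == lsm->lsm_oinfo[stripe].loi_ost_idx)
                        break; /* a racing create already placed one here */

        if (stripe >= lsm->lsm_stripe_count) {
                /* No pre-existing object: the real code calls obd_create()
                 * here; rc = 0 stands in for a successful create. */
                rc = 0;
        } else {
                /* The buggy code had no else branch, so rc stayed at the
                 * initial -EIO even though the object already exists; the
                 * proposed "else rc = 0" treats that as success. */
                rc = 0;
        }
        return rc;
}

int main(void)
{
        struct loi oinfo[1] = { { .loi_ost_idx = 0 } };
        struct lsm lsm = { .lsm_stripe_count = 1, .lsm_oinfo = oinfo };

        /* OST 0 already holds an object for this stripe: without the fix
         * this path returned -5 (-EIO), matching the reported error. */
        printf("rc = %d\n", check_and_create_object(&lsm, 0));
        return 0;
}

With the else branch in place the program prints rc = 0; removing it reproduces the stale -EIO return seen in the original error message. |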
| Comment by Jian Yu [ 07/Jun/12 ] |
|
Lustre Tag: v2_1_2_RC2

sanity test 220 failed with the same issue: https://maloo.whamcloud.com/test_sets/b0aeb296-b07d-11e1-99c6-52540035b04c

I found this failure occurred before and was marked as |
| Comment by Di Wang [ 07/Jun/12 ] |
Yes, it should reset rc from -EIO to 0 if it found an object on the stripe. |
| Comment by Oleg Drokin [ 07/Jun/12 ] |
|
patch at http://review.whamcloud.com/3058 |
| Comment by Sebastien Piechurski [ 06/Jul/12 ] |
|
I uploaded debug logs in

What is the status of the patch? I see there was a regression introduced with the current version, but can you see why? |
| Comment by Oleg Drokin [ 25/Jul/12 ] |
|
Hm, I just dug some more into the logs as part of trying to improve my original patch, and came to the conclusion that my initial analysis was wrong (and the "fix" is wrong too). What really happens (in the log file I have at hand):

sd 6:0:15:0: attempting task abort! scmd(ffff88060e198d80)
sd 6:0:15:0: [sdo] CDB: Write(10): 2a 00 01 cb 62 c0 00 00 08 00
scsi target6:0:15: handle(0x001d), sas_address(0x5000c50040cf2bc1), phy(18)
scsi target6:0:15: enclosure_logical_id(0x500304800137caff), slot(6)
sd 6:0:15:0: task abort: SUCCESS scmd(ffff88060e198d80)

After this, no more objects are created (the OST is in trouble). Sebastien: the LU-782 log excerpt you posted there has the same sign of a fake request timing out (o-1), so likely something very similar is going on in that case. Anyway, for this bug we need to figure out how come the OST is unhappy. |
| Comment by Sebastien Piechurski [ 26/Jul/12 ] |
|
Oleg,

Looking back at the logs on the OSS from when the problem first appeared, I don't see any obvious problems. However, the problem arose after a maintenance window during which a writeconf was done on all the targets, and all of them were remounted in parallel (around 200 targets at once). |
| Comment by Jodi Levi (Inactive) [ 26/Jul/12 ] |
|
Yu Jian, |
| Comment by Jian Yu [ 01/Aug/12 ] |
|
Hi Oleg and Jodi,

I have not run mdsrate on the b2_1 branch since Lustre 2.1.2 was released. I'll find some Toro test nodes to see whether I can reproduce this on the latest b2_1 and master branches. However, I found |
| Comment by Oleg Drokin [ 06/Aug/12 ] |
|
So LU-1454 was caused by the same problem, but that one comes down to a faulty test and the default behavior: what happens there is that we have an MDT that has just started, and then we restart the (only) OST. For the Bull-reported failures the situation is less clear; on the other hand, we have had multiple complaints from other sites that registering many targets at the same time on the MGS (many = more than 30 or so at a time) results in strange problems, MGS crashes and such. This warrants its own bug, I guess. |
| Comment by Jian Yu [ 13/Aug/12 ] |
|
RHEL6.3/x86_64 (2.1.3 Server + 1.8.8-wc1 Client): |
| Comment by Jian Yu [ 21/Aug/12 ] |
|
Another instance: |
| Comment by Jian Yu [ 15/Oct/12 ] |
|
RHEL6.3/x86_64 (2.3.0 RC3 Server + 1.8.8-wc1 Client): |
| Comment by Andreas Dilger [ 29/May/17 ] |
|
Close old ticket. |