[LU-5204] 2.6 DNE stress testing: EINVAL when attempting to delete file Created: 16/Jun/14 Updated: 18/Feb/15 Resolved: 06/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Emoly Liu |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | dne2 |
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 14529 |
| Description |
|
After our stress testing this weekend, we are unable to delete some (perhaps any?) of the files on a particular OST (OST 38). All of them give EINVAL. For example:

However, OST 38 (OST0027) is showing up in lctl dl, and as far as I know, there are no issues with it. (The dk logs on the OSS don't show any issues.) Here's the relevant part of the log from MDT0000:

I don't know for certain if this is related to DNE2 or not, but this is not an error I've seen before. The file system and objects are still around, so I can provide further data if needed. Any thoughts? |
| Comments |
| Comment by Patrick Farrell (Inactive) [ 16/Jun/14 ] |
|
dk logs (-1) from the client (also MDS008, serving mdt0007) and MDS1/MDT0, which gave the -EINVAL back to the client. There is also an lctl dl from another client showing OST38/OST0027 as available, and an lctl dl from MDS1/MDT0000 showing it as available as well. |
| Comment by Peter Jones [ 17/Jun/14 ] |
|
Di, could you please comment? Thanks, Peter |
| Comment by Di Wang [ 17/Jun/14 ] |
|
According to the console log here, it seems OST38 is not being registered correctly on MDT0000:

static int validate_lod_and_idx(struct lod_device *md, int idx)
{
	/* -EINVAL is returned when the OST index is beyond the target
	 * table, or when its bit is not set in lod_ost_bitmap, i.e. the
	 * OST was never registered with this MDT's LOV. */
	if (unlikely(idx >= md->lod_ost_descs.ltd_tgts_size ||
		     !cfs_bitmap_check(md->lod_ost_bitmap, idx))) {
		CERROR("%s: bad idx: %d of %d\n", lod2obd(md)->obd_name, idx,
		       md->lod_ost_descs.ltd_tgts_size);
		return -EINVAL;
	}
	..........
}
Patrick, could you please try lctl get_param lod.$your_fsname-MDT0000-mdtlov.target_obd and post the result here? Thanks. |
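For reference, a hedged sketch of what a healthy listing from that command looks like (the esfprod fsname is taken from later comments; the exact output below is an assumption, not captured from this system):

# lctl get_param lod.esfprod-MDT0000-mdtlov.target_obd
lod.esfprod-MDT0000-mdtlov.target_obd=
0: esfprod-OST0000_UUID ACTIVE
1: esfprod-OST0001_UUID ACTIVE
...
38: esfprod-OST0026_UUID ACTIVE

If index 38 is missing from the listing, the bitmap check in validate_lod_and_idx() fails and unlinks of files with objects on that OST return -EINVAL. |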
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
Looks like you're right, Di... Here's from MDS0:

0: esfprod-OST0000_UUID ACTIVE

And here's from MDS008:

Any troubleshooting tips for this? Should I just try stopping and starting the file system? (I believe that's been done, but we could do it again.) |
| Comment by Di Wang [ 17/Jun/14 ] |
|
Yes, please. And if you can provide a -1 level debug log from MDS0 (especially when you mount OST38), that would be great. Also, could you please tell me how you restart the FS? I mean the restart order of the nodes: MDTs first, then OSTs, or mixed? Thanks. |
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
Di, I will do that as soon as I can get the system cleared. (The file system is connected to one of our development machines, so it has some actual users at the moment. I should be able to get them cleared out soon.) We usually do OSTs -> MDTs. With larger DNE systems, this has sometimes been problematic, so we have tried starting MDTs -> OSTs. It's not mixed; it's all of one kind, then it moves on to the other. Neither order has been 100% reliable, to be honest. Generally if I do one, then the other, I've been able to get the system to start. |
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
After restart (order was OSTs -> MDTs), same problem on MDS0:

0: esfprod-OST0000_UUID ACTIVE

Attaching start log with debug=-1 from MDS1 as requested... |
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
Start log of MDS1/MDT0 with OST not registering. |
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
Start logs of both mds001 and oss003, which is presenting the affected OST. |
| Comment by Di Wang [ 17/Jun/14 ] |
|
Usually we prefer to start MDT0 first, then the other targets, whether MDTs or OSTs. Btw: does this FS have a separate MGS?

Unfortunately, the debug log does not include the information I need; it seems debug_level (-1) was set at a later time, instead of at the initial mount.

And could you please dump the config log and post it here? If you cannot umount the MGT or MDT0, you can do it on the MGS like this:

[root@mds tests]# debugfs /dev/loop0   # your MGT device
debugfs 1.42.3.wc3 (15-Aug-2012)
debugfs: ls
2 (12) .  2 (12) ..  11 (20) lost+found  25001 (16) CONFIGS  25002 (12) O
25003 (28) REMOTE_PARENT_DIR  13 (16) OI_scrub  14 (16) oi.16.0  15 (16) oi.16.1
16 (16) oi.16.2  17 (16) oi.16.3  18 (16) oi.16.4  19 (16) oi.16.5  20 (16) oi.16.6
21 (16) oi.16.7  22 (16) oi.16.8  23 (16) oi.16.9  24 (16) oi.16.10  25 (16) oi.16.11
26 (16) oi.16.12  27 (16) oi.16.13  28 (16) oi.16.14  29 (16) oi.16.15  30 (16) oi.16.16
31 (16) oi.16.17  32 (16) oi.16.18  33 (16) oi.16.19  34 (16) oi.16.20  35 (16) oi.16.21
36 (16) oi.16.22  37 (16) oi.16.23  38 (16) oi.16.24  39 (16) oi.16.25  40 (16) oi.16.26
41 (16) oi.16.27  42 (16) oi.16.28  43 (16) oi.16.29  44 (16) oi.16.30  45 (16) oi.16.31
46 (16) oi.16.32  47 (16) oi.16.33  48 (16) oi.16.34  49 (16) oi.16.35  50 (16) oi.16.36
51 (16) oi.16.37  52 (16) oi.16.38  53 (16) oi.16.39  54 (16) oi.16.40  55 (16) oi.16.41
56 (16) oi.16.42  57 (16) oi.16.43  58 (16) oi.16.44  59 (16) oi.16.45  60 (16) oi.16.46
61 (16) oi.16.47  62 (16) oi.16.48  63 (16) oi.16.49  64 (16) oi.16.50  65 (16) oi.16.51
66 (16) oi.16.52  67 (16) oi.16.53  68 (16) oi.16.54  69 (16) oi.16.55  70 (16) oi.16.56
71 (16) oi.16.57  72 (16) oi.16.58  73 (16) oi.16.59  74 (16) oi.16.60  75 (16) oi.16.61
76 (16) oi.16.62  77 (16) oi.16.63  25026 (24) NIDTBL_VERSIONS  85 (12) fld
86 (16) seq_ctl  87 (16) seq_srv  88 (20) last_rcvd  50039 (20) quota_master
50042 (20) quota_slave  50043 (12) ROOT  75022 (16) PENDING  98 (28) changelog_catalog
99 (24) changelog_users  100 (20) hsm_actions  101 (24) lfsck_bookmark
102 (24) lfsck_namespace  103 (20) lfsck_layout  109 (20) SLAVE_LOG  116 (20) lov_objid
117 (20) lov_objseq  118 (2600) CATALOGS
debugfs: ls CONFIGS
25001 (12) .  2 (12) ..  12 (20) mountdata  81 (24) params-client  82 (16) params
83 (24) lustre-client  84 (24) lustre-MDT0000  104 (24) lustre-MDT0001
105 (24) lustre-MDT0002  106 (24) lustre-MDT0003  107 (24) lustre-OST0000
108 (3868) lustre-OST0001
debugfs: dump_inode -p CONFIGS/lustre-MDT0000 /tmp/config.log
debugfs: quite
debugfs: Unknown request "quite". Type "?" for a request list.
debugfs: q
[root@mds tests]# ../utils/llog_reader /tmp/config.log
Bit 48 of 69 not set
Header size : 8192
Time : Mon Jun 16 18:11:52 2014
Number of records: 69
Target uuid : config_uuid
-----------------------
..................... |
| Comment by Di Wang [ 17/Jun/14 ] |
|
Btw: I do not think this is related to DNE2; it is probably some config log problem, but let's see after I get the config log. |
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
Weird about the debug logs. I modprobe'd Lustre, then set debug=-1, then started the fs, so the logs should've been taken before the targets were mounted... Anyway: Yes, the MGT and MDT0 are separate devices. We tend to do that so it's easier for backup/restore, etc. |
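A hedged sketch of that capture sequence (the device path and debug_mb value are hypothetical; one possible reason the early entries were missing is that the debug ring buffer wrapped before the dump was taken):

modprobe lustre
lctl set_param debug=-1        # full debug, set before any target is mounted
lctl set_param debug_mb=512    # enlarge the ring buffer so early records survive
mount -t lustre /dev/mdt0_dev /mnt/lustre-mdt0
lctl dk > /tmp/mds1-start.dk   # dump the debug buffer immediately after mount

Dumping with lctl dk right after each mount step keeps more of the early records if the buffer still wraps. |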
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
MDT0 config log as requested |
| Comment by Di Wang [ 17/Jun/14 ] |
|
Interesting: you can see OST0026 is skipped in the config log; that is why OST38 is not registered on MDT0000.

.....
#532 (224)SKIP START marker 1284 (flags=0x05, v2.5.59.0) esfprod 'sys.timeout' Wed Jun 11 16:05:24 2014-Wed Jun 11 16:08:29 2014
#533 (080)SKIP set_timeout=300
#534 (224)SKIP END marker 1284 (flags=0x06, v2.5.59.0) esfprod 'sys.timeout' Wed Jun 11 16:05:24 2014-Wed Jun 11 16:08:29 2014
#535 (224)SKIP START marker 1288 (flags=0x05, v2.5.59.0) esfprod-OST0026 'add osc' Wed Jun 11 16:08:29 2014-
#536 (088)SKIP add_uuid nid=10.151.10.11@o2ib8(0x500080a970a0b) 0: 1:10.151.10.11@o2ib8
#537 (088)SKIP add_uuid nid=10.151.10.11@o2ib8002(0x51f420a970a0b) 0: 1:10.151.10.11@o2ib8
#538 (144)SKIP attach 0:esfprod-OST0026-osc-MDT0000 1:osc 2:esfprod-MDT0000-mdtlov_UUID
#539 (152)SKIP setup 0:esfprod-OST0026-osc-MDT0000 1:esfprod-OST0026_UUID 2:10.151.10.11@o2ib8
#540 (088)SKIP add_uuid nid=10.150.10.12@o2ib8(0x500080a960a0c) 0: 1:10.150.10.12@o2ib8
#541 (120)SKIP add_conn 0:esfprod-OST0026-osc-MDT0000 1:10.150.10.12@o2ib8
#542 (136)SKIP lov_modify_tgts add 0:esfprod-MDT0000-mdtlov 1:esfprod-OST0026_UUID 2:38 3:1
#543 (224)SKIP END marker 1288 (flags=0x06, v2.5.59.0) esfprod-OST0026 'add osc' Wed Jun 11 16:08:29 2014-
#544 (224)SKIP START marker 1335 (flags=0x05, v2.5.59.0) esfprod 'sys.timeout' Wed Jun 11 16:08:29 2014-Wed Jun 11 16:08:32 2014
#545 (080)SKIP set_timeout=300
#547 (224)SKIP END marker 1335 (flags=0x06, v2.5.59.0) esfprod 'sys.timeout' Wed Jun 11 16:08:29 2014-Wed Jun 11 16:08:32 2014
......

Not sure how this happened. Did you ever tweak the config log with tunefs or set_param? |
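To connect the dots, here is the key skipped record annotated (the field meanings below are the usual llog config record layout, an assumption rather than something the dump itself states):

lov_modify_tgts add 0:esfprod-MDT0000-mdtlov   # LOV device on MDT0000
                    1:esfprod-OST0026_UUID     # OST uuid
                    2:38                       # OST index = 38
                    3:1                        # generation

Because this record is marked SKIP, bit 38 is never set in lod_ost_bitmap, so validate_lod_and_idx() returns the -EINVAL seen on the deletes. |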
| Comment by Patrick Farrell (Inactive) [ 17/Jun/14 ] |
|
No, definitely not. We did a stress run of 2.6 with DNE2 (2.6 clients as well), and when it was over and the system had been rebooted, we were in this state where some of the files created during that stress run could not be deleted. We didn't deliberately touch the config at any point in there. |
| Comment by Di Wang [ 17/Jun/14 ] |
|
Patrick, was this FS reformatted before this test? Btw, you can always erase the config log with tunefs --writeconf and remount the FS to fix this config log issue. But we still need to understand the issue here. |
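For completeness, a minimal sketch of that recovery procedure, assuming ldiskfs targets and hypothetical device paths (all clients and targets must be stopped first):

tunefs.lustre --writeconf /dev/mgt_dev    # MGT first (hypothetical device name)
tunefs.lustre --writeconf /dev/mdt0_dev   # MDT0000, then each remaining MDT
tunefs.lustre --writeconf /dev/ost_dev    # each OST in turn

Remounting in order (MGT, MDT0000, other MDTs, OSTs, then clients) regenerates the config logs as each target re-registers with the MGS. |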
| Comment by Di Wang [ 17/Jun/14 ] |
|
Patrick, please also provide the mkfs.lustre command line you used to create the filesystem. I checked the master code and did not find any issue there. |
| Comment by Patrick Farrell (Inactive) [ 18/Jun/14 ] |
|
Di,

Yes, now that we know it's a config log issue, I figured we could fix it with a writeconf operation... But like you said, we'd like to understand the issue. It was not reformatted before the test. It WAS upgraded from 2.5, which required a writeconf operation at that time to get it to start.

Here are the mkfs.lustre options for the MDT (I don't have the device name, but these are the options that were used):

For the MGT:

For one of the OSTs: |
| Comment by Peter Jones [ 18/Jun/14 ] |
|
Emoly Could you please try reproducing this issue? Thanks Peter |
| Comment by Emoly Liu [ 19/Jun/14 ] |
|
Patrick, I will try to upgrade a Lustre file system from 2.5.1 to 2.6 to reproduce this problem. Could you please suggest how many OSTs and MDTs are enough for this test? Also, I know the MGS and MDS should be separate in this test; is there anything else I should pay attention to? Thanks. |
| Comment by Patrick Farrell (Inactive) [ 19/Jun/14 ] |
|
Emoly,

Unfortunately, I don't really know how many is enough. We have 8 MDSes and 8 MDTs, and 4 OSSes and 40 OSTs. It's a test bed system for DNE, which is why it's such a weird configuration. We do have a separate MGT and MDT.

As far as other things: all I know about what we did is that we ran a bunch of different IO tests, like IOR and a large number of tests from the Linux Test Project, in various configurations, all with mkdir replaced by a script which would randomly create striped or remote directories. It would also sometimes create normal directories. We did that last weekend and had this problem on Monday. No idea what was running when it started.

Sorry for not having many specifics on testing; it's a large test suite. We're probably going to fix the system soon by doing a writeconf, so we can continue stress testing DNE2. Let me know if there's anything else I can give you first. |
| Comment by Andreas Dilger [ 19/Jun/14 ] |
|
Is it possible that OST0026 was ever deactivated during testing (e.g. lctl conf_param esfprod-OST0026.osc.active=0 or similar)? That would permanently disable the OST in the config log and seems to me to be the most likely cause of this problem. |
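For context, a hedged sketch of that command and its inverse; conf_param changes are recorded in the MGS config log, which is why such a deactivation would persist across remounts exactly as seen here:

lctl conf_param esfprod-OST0026.osc.active=0   # writes a permanent "inactive" record into the config log
lctl conf_param esfprod-OST0026.osc.active=1   # re-activates the OSC on MDTs and clients

If that had been run at some point, setting active=1 should be enough to bring the OST's records back into effect. |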
| Comment by Patrick Farrell (Inactive) [ 19/Jun/14 ] |
|
Andreas,

It's really unlikely. No one should have been mucking with the system. I can't say it's impossible, but... Now that we've tracked it down to such a strange error, I'm planning to go ahead and fix it, and not worry unless it occurs again in further stress testing. In fact, I'm going to do exactly that unless someone has further information they'd like from the system. (Speak up soon - I'm going to fix it for our stress testing slot tonight.)

I've also (in further testing) hit an MDS0 crash bug that could possibly be related to this one; I'm going to open it shortly and will reference that LU here once I've got it open. |
| Comment by Patrick Farrell (Inactive) [ 19/Jun/14 ] |
|
Opened |
| Comment by Andreas Dilger [ 20/Jun/14 ] |
|
The one obvious problem that I see is that it should ALWAYS be possible to delete a file, even if the OST is unavailable, or configured out of the system. Regardless of what the root cause of the problem is, there needs to be a patch to allow the file to be deleted. |
| Comment by Emoly Liu [ 24/Oct/14 ] |
|
Sorry for my late update. I can't reproduce this issue in my testing environment. |
| Comment by Di Wang [ 24/Oct/14 ] |
|
Since we cannot reproduce the problem locally, I cannot figure out why the config log is "corrupted". If it happens again in DNE testing, please note the steps to reproduce it; we will probably have more ideas then. |
| Comment by Andreas Dilger [ 06/Nov/14 ] |
|
Unable to figure out what the problem is; please reopen if it is hit again. |