[LU-4554] OI scrub always runs on ldiskfs MDS start up Created: 28/Jan/14 Updated: 23/Apr/14 Resolved: 11/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Christopher Morrone | Assignee: | James Nunez (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn4 | ||
| Severity: | 3 |
| Rank (Obsolete): | 12442 |
| Description |
|
We are running Lustre 2.4.0-21chaos (see http://github.com/chaos/lustre); of particular interest are most likely two of the patches that we are carrying. We now find that, at least on the ldiskfs MDS nodes, OI scrub runs on every start-up of the MDS. The console message looks something like this:

    2014-01-28 09:27:28 sumom-mds1 login: LustreError: 0-0: lsc-MDT0000: trigger OI scrub by RPC for [0x7e4d2310f09:0x2ddf:0x0], rc = 0 [1]

Given the frequency of MDS reboots (i.e. often) required lately for other bugs, OI scrub is running far too much. |
| Comments |
| Comment by Peter Jones [ 28/Jan/14 ] |
|
Lai, could you please consult with Fan Yong and respond on this one? Thanks, Peter |
| Comment by Christopher Morrone [ 28/Jan/14 ] |
|
We need to increase the priority on this ticket. The OI scrub is introducing a lot of production problems for us.

First, while OI scrub is running, many client nodes hang while holding the mdc lock. Apparently they are doing an operation that triggers osd_fid_lookup() on the server. The FID lookup fails, and the server decides to start OI scrub (or OI scrub is already running) and returns EINPROGRESS to the client. Unfortunately, the client is now pretty much unusable to anyone until OI scrub completes.

Furthermore, it would appear that OI scrub will not respond to the stop command. I'm told that running "lctl lfsck_stop -M ls5-MDT0000" returns "operation already in progress". |
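A minimal standalone sketch of the retry behaviour described in this comment, assuming the server answers metadata requests with -EINPROGRESS for as long as the OI scrub is running; this is an illustrative model, not Lustre client code, and the function names are made up for the example:

    #include <stdio.h>
    #include <errno.h>

    static int scrub_running = 1;

    /* Stand-in for the server-side FID lookup: while the scrub is still
     * rebuilding OI mappings it cannot answer, so it returns -EINPROGRESS. */
    static int server_fid_lookup(void)
    {
            return scrub_running ? -EINPROGRESS : 0;
    }

    int main(void)
    {
            int attempts = 0;
            int rc;

            /* The client has little choice but to retry; from the user's
             * point of view the node "hangs" until the scrub completes. */
            do {
                    rc = server_fid_lookup();
                    attempts++;
                    if (attempts == 3)
                            scrub_running = 0;  /* pretend the scrub finishes */
            } while (rc == -EINPROGRESS);

            printf("lookup succeeded after %d attempts\n", attempts);
            return 0;
    }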
| Comment by Ned Bass [ 29/Jan/14 ] |
|
I'll note that in the oi_scrub file, the Updated count is 0 and the Success count is 5. From the current scan position relative to the number of inodes used, I think it is nearing completion. So it seems the OI scrub is not finding anything to repair, yet certain client requests trigger the scrub. From server-side debug logs, it seems that the trigger is happening here in osd_fid_lookup():

    /* Search order: 3. OI files. */
    result = osd_oi_lookup(info, dev, fid, id, true);
    if (result == -ENOENT) {
            if (!fid_is_norm(fid) || fid_is_on_ost(info, dev, fid) ||
                !ldiskfs_test_bit(osd_oi_fid2idx(dev, fid),
                                  sf->sf_oi_bitmap))
                    GOTO(out, result = 0);

            goto trigger;
    }

There are no debug log entries from osd_iget() so it must be jumping past the iget: goto label, which means it must take the path above. |
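A small standalone model of the branch being described, under the assumption that the OI lookup has already returned -ENOENT; the flags are plain booleans here rather than the real FID/bitmap tests, so names are illustrative only:

    #include <stdio.h>
    #include <stdbool.h>

    static bool fid_is_norm, fid_is_on_ost, oi_bit_set;

    /* Mirrors the quoted condition: the only way to reach "trigger" is a
     * normal FID, not an OST object, with its OI slot bit set in sf_oi_bitmap. */
    static const char *after_enoent(void)
    {
            if (!fid_is_norm || fid_is_on_ost || !oi_bit_set)
                    return "out (result = 0, no scrub)";
            return "trigger (start/join OI scrub)";
    }

    int main(void)
    {
            /* Case observed in the logs: normal MDT FID, no osd_iget() entries,
             * so the trigger can only have been reached via the bitmap test. */
            fid_is_norm = true;
            fid_is_on_ost = false;
            oi_bit_set = true;
            printf("normal FID, bit set   -> %s\n", after_enoent());

            oi_bit_set = false;
            printf("normal FID, bit clear -> %s\n", after_enoent());
            return 0;
    }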
| Comment by nasf (Inactive) [ 29/Jan/14 ] |
|
The FID [0x7e4d2310f09:0x2ddf:0x0] is a normal FID, so according to the above logic the condition "(ldiskfs_test_bit(osd_oi_fid2idx(dev,fid), sf->sf_oi_bitmap))" should be true. That means the OSD thinks the OI files are being re-created, so it cannot tell whether the object really does not exist or whether the related OI mapping simply has not been re-inserted yet. So would you please show the OI_scrub file under /proc to check whether the system is really re-creating the OI files or not? Thanks! |
| Comment by Christopher Morrone [ 29/Jan/14 ] |
|
Here is the oi_scrub output from one of the MDS nodes that was rebooted on our open network this morning:

    > cat osd-ldiskfs/lsc-MDT0000/oi_scrub
    name: OI_scrub
    magic: 0x4c5fd252
    oi_files: 1
    status: completed
    flags:
    param:
    time_since_last_completed: 28001 seconds
    time_since_latest_start: 36680 seconds
    time_since_last_checkpoint: 28001 seconds
    latest_start_position: 12
    last_checkpoint_position: 991133697
    first_failure_position: N/A
    checked: 170974399
    updated: 0
    failed: 0
    prior_updated: 0
    noscrub: 2284
    igif: 12381456
    success_count: 12
    run_time: 8678 seconds
    average_speed: 19702 objects/sec
    real-time_speed: N/A
    current_position: N/A |
| Comment by nasf (Inactive) [ 29/Jan/14 ] |
|
That is strange; the code should not reach the trigger from that path. Is there any (-1 level) debug log from when the OI scrub was triggered? |
| Comment by Ned Bass [ 29/Jan/14 ] |
|
Fan Yong, the -1 debug log is from a classified system, so I can't send it, but if you have specific questions about it I can look for you. While the scrub was in progress, the flags field only had 'auto'. An example FID from that system that followed the "trigger" path was [0x1a89082ad98:0x4d:0x0]. |
| Comment by Ned Bass [ 29/Jan/14 ] |
|
I also notice osd_fid_lookup() starts the scrub using osd_scrub_start(dev), which only enables the flag SS_AUTO. So even though (ldiskfs_test_bit(osd_oi_fid2idx(dev,fid), sf->sf_oi_bitmap)) is true, (unless I misunderstand something) we would not see the "recreated" flag in oi_scrub. |
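A tiny model of the observation above: the automatic trigger only sets an "auto" flag, so the proc file can show just "auto" even when the sf_oi_bitmap test is what sent us down the trigger path. SS_AUTO is named in this comment; SF_RECREATED and the flag values are assumptions standing in for whatever would print as "recreated":

    #include <stdio.h>

    #define SS_AUTO       0x01   /* scrub was started automatically */
    #define SF_RECREATED  0x02   /* assumption: set only when OI files are rebuilt */

    static void show(const char *when, unsigned int flags)
    {
            printf("%s: flags:%s%s\n", when,
                   (flags & SS_AUTO) ? " auto" : "",
                   (flags & SF_RECREATED) ? " recreated" : "");
    }

    int main(void)
    {
            show("auto trigger via osd_scrub_start()", SS_AUTO);
            show("OI files actually re-created", SS_AUTO | SF_RECREATED);
            return 0;
    }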
| Comment by Ned Bass [ 30/Jan/14 ] |
|
I peeked at the /OI_scrub file while an auto-scrub was running. It showed that bit 0 was set in sf->sf_oi_bitmap. This is wrong, because the OI already exists and OI_scrub has already run to completion several times. I think I see the problem in osd_oi_table_open(). Note the format string assumes the OI containers have names like oi.16.0, oi.16.1, and so on. However, for our upgraded filesystems we have only one OI container named oi.16. So osd_oi_open() returns ENOENT and we proceed to set the "recreated" bit in the bitmap. |
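A simplified, standalone model of the failure mode this comment describes; the "oi.16.%d" format and the helper names are taken from the description above, not from the actual Lustre sources, so treat this as a sketch of the suspected bug rather than the real osd_oi_table_open() code:

    #include <stdio.h>
    #include <string.h>
    #include <errno.h>

    /* Pretend on-disk contents: an upgraded filesystem that has only the
     * single legacy container "oi.16", not "oi.16.0", "oi.16.1", ... */
    static const char *existing[] = { "oi.16" };

    static int fake_oi_open(const char *name)
    {
            for (size_t i = 0; i < sizeof(existing) / sizeof(existing[0]); i++)
                    if (strcmp(existing[i], name) == 0)
                            return 0;
            return -ENOENT;   /* no container with that name */
    }

    int main(void)
    {
            char name[32];
            unsigned int recreated_bitmap = 0;

            /* Model of the table-open loop: the format string always appends
             * an index, so on an upgraded filesystem slot 0 looks missing. */
            for (int idx = 0; idx < 1 /* oi_files: 1 */; idx++) {
                    snprintf(name, sizeof(name), "oi.16.%d", idx);
                    if (fake_oi_open(name) == -ENOENT) {
                            recreated_bitmap |= 1u << idx;  /* wrongly marked "recreated" */
                            printf("%s missing -> bit %d set in sf_oi_bitmap\n",
                                   name, idx);
                    }
            }
            printf("bitmap = 0x%x (nonzero keeps the auto OI scrub triggering)\n",
                   recreated_bitmap);
            return 0;
    }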
| Comment by Ned Bass [ 30/Jan/14 ] |
|
Patch for master: http://review.whamcloud.com/#/c/9067/ |
| Comment by Andreas Dilger [ 03/Feb/14 ] |
|
James, Lai is on holiday this week. Could you please cherry-pick this patch to b2_4 and b2_5 once it has landed to master? This can now be done directly in Gerrit. Please also add the "Lustre-change:" and "Lustre-commit:" tags to the commit messages, as described on https://wiki.hpdd.intel.com/display/PUB/Commit+Comments |
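For reference, a back-ported commit message carrying those tags might look roughly like the following; the subject, body, hash, and trailers are placeholders rather than the real values for this change, and only the tag names come from the comment above:

    LU-4554 osd-ldiskfs: <subject copied from the master patch>

    <commit message body copied from the master patch>

    Lustre-change: http://review.whamcloud.com/9067
    Lustre-commit: <hash of the master commit once it lands>

    Signed-off-by: ...
    Change-Id: ...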
| Comment by James Nunez (Inactive) [ 05/Feb/14 ] |
|
This patch hasn't landed to master yet, but I created b2_5 and b2_4 patches at: b2_4 - http://review.whamcloud.com/#/c/9140/ |
| Comment by Peter Jones [ 11/Feb/14 ] |
|
Landed for 2.5.1 and 2.6 |