[LU-4858] Race when reading ofd proc entries while unmounting OST Created: 03/Apr/14 Updated: 28/May/14 Resolved: 08/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Major |
| Reporter: | James A Simmons | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mn4 | ||
| Environment: |
Affects any OSS with any environment. |
||
| Severity: | 3 |
| Rank (Obsolete): | 13397 |
| Description |
|
This bug was discovered because we run a utility that monitors the OST stats under /proc/fs/lustre/obdfilter/fsname-OSTXXXX. These stats were still being accessed while the test file system was being taken down:
<4>[85795.199509] Lustre: server umount a1_thin-OST0129 complete
We have core dumps as well. |
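For context, the access pattern of such a monitoring utility can be sketched as below. This is a hypothetical user-space reader, not code taken from the utility in question; the function name read_stat_value and its error convention are assumptions made for illustration. It only shows that a poller can see a stats file disappear or fail between two polls; it does not address the kernel-side race itself.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a stats file such as
 * .../obdfilter/fsname-OSTXXXX/filesfree.
 * Returns 0 on success or a negative errno value on failure.
 */
int read_stat_value(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int rc = 0;

    if (fp == NULL)
        return -errno;              /* e.g. -ENOENT once the entry is gone */

    errno = 0;
    if (fscanf(fp, "%ld", value) != 1)
        rc = errno ? -errno : -EIO; /* read error or unparsable content */

    fclose(fp);
    return rc;
}
```

A poller built this way can skip an OST whose entry returns an error and retry on the next cycle instead of aborting.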
| Comments |
| Comment by James Nunez (Inactive) [ 03/Apr/14 ] |
|
James, Are you planning on investigating this one and following this up with a patch? Thanks, |
| Comment by Bruno Faccini (Inactive) [ 04/Apr/14 ] |
|
James, but why don't you stop the utility before you take the filesystem down?! More seriously, at first look it does not seem very logical that in the ofd_init0()/ofd_fini() routines the procfs entries are initialized/stopped (via the ofd_procfs_init()/ofd_procfs_fini() routines) before/after the other players they depend on (like the "stack" and its ofd_stack_init()/ofd_stack_fini() routines). Maybe we can just change this order in the code? If that is not easy or possible, then some protection will have to be added to prevent the race. |
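The reordering suggested above can be illustrated with a toy model. Everything here (struct toy_dev, the toy_* functions) is a hypothetical stand-in and not the real OFD code; it is single-threaded and does not model readers that are already inside a handler. It only shows the intended invariant: proc entries are visible only while the layer they read from exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real Lustre structures or functions. */
struct toy_stack {
    long files_free;
};

struct toy_dev {
    struct toy_stack *stack;    /* lower layer the proc handlers read from */
    bool proc_visible;          /* true while proc entries can be opened */
    struct toy_stack storage;   /* backing memory for the toy stack */
};

static void toy_stack_init(struct toy_dev *d)
{
    d->storage.files_free = 100;
    d->stack = &d->storage;
}

static void toy_stack_fini(struct toy_dev *d)
{
    d->stack = NULL;
}

static void toy_procfs_init(struct toy_dev *d)
{
    d->proc_visible = true;
}

static void toy_procfs_fini(struct toy_dev *d)
{
    d->proc_visible = false;
}

/* Setup: bring up the dependency first, publish the proc entries last. */
void toy_dev_init(struct toy_dev *d)
{
    toy_stack_init(d);
    toy_procfs_init(d);
}

/* Teardown mirrors setup: hide the proc entries first, then drop the stack. */
void toy_dev_fini(struct toy_dev *d)
{
    toy_procfs_fini(d);
    toy_stack_fini(d);
}

/* Invariant: whenever proc entries are visible, the stack must exist. */
bool toy_dev_consistent(const struct toy_dev *d)
{
    return !d->proc_visible || d->stack != NULL;
}
```

With the opposite order (stack torn down while the entries are still visible), toy_dev_consistent() would be false in the window between the two teardown calls, which is the window the race needs.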
| Comment by Peter Jones [ 04/Apr/14 ] |
|
Lai Could you please assist with this one? Thanks Peter |
| Comment by James A Simmons [ 04/Apr/14 ] |
|
Matt Ezell just came up with a patch http://review.whamcloud.com/#/c/9885 to expose the problem. No patch from me yet. Next week I will be at LUG so it might be a while before I have a patch to test. |
| Comment by Matt Ezell [ 07/Apr/14 ] |
|
Bruno - the tool is LMT, which is always running in the background. In this specific case, we were only unmounting some of the OSTs from the OSS, so we wanted it to remain running.

I had a good reproducer in my test environment based on 2.4, but I couldn't get it to fail in Maloo. It turns out that in master, some of the OBD methods (including statfs) were removed from OFD. So accessing /proc/fs/lustre/obdfilter/fsname-OSTXXXX/filesfree (for example) now returns EOPNOTSUPP.

So for master we probably want to symlink the following entries to their corresponding ../../osd-*/fsname-OSTXXXX/<name> entry, similar to how we have brw_stats, read_cache_enable, readcache_max_filesize, and writethrough_cache_enable. If this doesn't happen, tools like LMT may break. The ones I found that don't work are:
blocksize
filesfree
filestotal
kbytesavail
kbytesfree
kbytestotal

After that change, I'm not sure if any of the remaining code paths can hit this race or not. Certainly my old reproducer doesn't work anymore. And having a client do something like "df $MOUNT" just hangs while the unmount is happening. I haven't tried Bruno's suggestion of just reordering the OST tear-down procedure, but I would expect it to be safe.

This also brings up another question of when to use a LASSERT versus handling the unexpected situation gracefully. That is, when should
LASSERT( dev )
be written as
if ( dev == NULL ) return -ENODEV; |
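The two styles in that last question can be put side by side in a small sketch. The names below (struct demo_dev, demo_files_free_asserting, demo_files_free_checked) are hypothetical, and the standard assert() macro stands in for LASSERT; this is an illustration of the trade-off, not the code that landed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device with one statistic; not a real Lustre structure. */
struct demo_dev {
    long files_free;
};

/*
 * Assertion style: a NULL device is treated as a fatal programming error.
 * assert() stands in for LASSERT here.
 */
long demo_files_free_asserting(const struct demo_dev *dev)
{
    assert(dev != NULL);
    return dev->files_free;
}

/*
 * Graceful style: a NULL device is an expected runtime condition (for
 * example, a reader racing with teardown) and is reported to the caller.
 */
int demo_files_free_checked(const struct demo_dev *dev, long *out)
{
    if (dev == NULL)
        return -ENODEV;
    *out = dev->files_free;
    return 0;
}
```

One possible rule of thumb: assert when a NULL pointer can only mean a bug in the caller, and return an error when an outside actor, such as a process reading a proc file during unmount, can legitimately trigger the condition.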
| Comment by Lai Siyao [ 24/Apr/14 ] |
|
Patches are ready: Matt, could you help verify that they work on your system? BTW, I think you can revive http://review.whamcloud.com/#/c/9885, since my patch will fix the issue you mentioned above. |
| Comment by Peter Jones [ 08/May/14 ] |
|
Landed for 2.6. Will consider for b2_5 and b2_4 also. |
| Comment by Andreas Dilger [ 28/May/14 ] |
|
Patch landed to b2_5 for 2.5.2. |