[LU-4858] Race when reading ofd proc entries while unmounting OST Created: 03/Apr/14  Updated: 28/May/14  Resolved: 08/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Major
Reporter: James A Simmons Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: mn4
Environment:

Affects any OSS in any environment.


Severity: 3
Rank (Obsolete): 13397

 Description   

This bug was discovered because we run a utility that monitors OST stats under /proc/fs/lustre/obdfilter/fsname-OSTXXXX. These stats were still being read while the test file system was being taken down.

<4>[85795.199509] Lustre: server umount a1_thin-OST0129 complete
<0>[85795.339106] LustreError: 9005:0:(dt_object.h:1257:dt_statfs()) ASSERTION( dev ) failed:
<0>[85795.357847] LustreError: 9005:0:(dt_object.h:1257:dt_statfs()) LBUG
<4>[85795.377690] Pid: 9005, comm: cerebrod
<4>[85795.387927]
<4>[85795.387928] Call Trace:
<4>[85795.406853] [<ffffffffa03d4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[85795.427104] [<ffffffffa03d4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[85795.447295] [<ffffffffa0d8b3be>] ofd_statfs_internal+0x23e/0x2a0 [ofd]
<4>[85795.467649] [<ffffffffa0d8b487>] ofd_statfs+0x67/0x510 [ofd]
<4>[85795.487335] [<ffffffffa05496d0>] lprocfs_rd_filesfree+0x170/0x4c0 [obdclass]
<4>[85795.508376] [<ffffffff811e9823>] ? proc_reg_open+0xc3/0x160
<4>[85795.528056] [<ffffffff811e9760>] ? proc_reg_open+0x0/0x160
<4>[85795.547148] [<ffffffffa0546563>] lprocfs_fops_read+0xf3/0x1f0 [obdclass]
<4>[85795.567489] [<ffffffff811e9d9e>] proc_reg_read+0x7e/0xc0
<4>[85795.587040] [<ffffffff81181cb5>] vfs_read+0xb5/0x1a0
<4>[85795.597941] [<ffffffff81181df1>] sys_read+0x51/0x90
<4>[85795.617517] [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>[85795.637694]
<0>[85795.647408] Kernel panic - not syncing: LBUG

We have core dumps as well.
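
For reference, the assertion that fires is the LASSERT(dev) check in the dt_statfs() inline wrapper in dt_object.h. A simplified sketch of the path (abbreviated, not the exact upstream code): the proc read of filesfree goes lprocfs_rd_filesfree() -> ofd_statfs() -> dt_statfs(), and by the time it gets there the umount has already released the underlying OSD device:

/* Simplified sketch, not the exact upstream code: the proc reader ends up
 * here via lprocfs_rd_filesfree() -> ofd_statfs() -> dt_statfs(), but the
 * umount has already torn down the OSD device below the OFD. */
static inline int dt_statfs(const struct lu_env *env, struct dt_device *dev,
                            struct obd_statfs *osfs)
{
        LASSERT(dev);           /* dev is NULL here -> LBUG -> kernel panic */
        return dev->dd_ops->dt_statfs(env, dev, osfs);
}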



 Comments   
Comment by James Nunez (Inactive) [ 03/Apr/14 ]

James,

Are you planning on investigating this one and following up with a patch?

Thanks,
James

Comment by Bruno Faccini (Inactive) [ 04/Apr/14 ]

James, but why don't you stop the utility before you take the filesystem down?!

More seriously, at first look it does not seem very logical that in the ofd_[init0,fini]() routines the procfs entries are set up/torn down (via ofd_procfs_init()/ofd_procfs_fini()) before/after the other pieces they depend on (like the "stack" and its ofd_stack_init()/ofd_stack_fini() routines). Maybe we can just change this order in the code? If that is not easy or possible, then some protection will have to be added to prevent the race.
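
To make the ordering point concrete, a rough sketch of the reordered teardown (arguments abbreviated and the body hypothetical; only the ordering matters):

/* Hypothetical sketch of the suggested ordering, arguments abbreviated.
 * Unregister the /proc entries before the devices they report on go away,
 * so no proc reader can race with ofd_stack_fini(). */
static void ofd_fini(const struct lu_env *env, struct ofd_device *ofd)
{
        ofd_procfs_fini(ofd);           /* 1. remove /proc entries first    */
        ofd_stack_fini(env, ofd);       /* 2. then tear down the OSD stack  */
}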

Comment by Peter Jones [ 04/Apr/14 ]

Lai

Could you please assist with this one?

Thanks

Peter

Comment by James A Simmons [ 04/Apr/14 ]

Matt Ezell just came up with a patch http://review.whamcloud.com/#/c/9885 to expose the problem. No patch from me yet. Next week I will be at LUG so it might be a while before I have a patch to test.

Comment by Matt Ezell [ 07/Apr/14 ]

Bruno - the tool is LMT, which is always running in the background. In this specific case we were only unmounting some of the OSTs from the OSS, so we wanted it to keep running.

I had a good reproducer in my test environment based on 2.4, but I couldn't get it to fail in Maloo. It turns out that in master some of the OBD methods (including statfs) were removed from OFD, so accessing /proc/fs/lustre/obdfilter/fsname-OSTXXXX/filesfree (for example) now returns EOPNOTSUPP. For master we probably want to symlink the following entries to their corresponding ../../osd-*/fsname-OSTXXXX/<name> entry, similar to how brw_stats, read_cache_enable, readcache_max_filesize, and writethrough_cache_enable are already handled. If this doesn't happen, tools like LMT may break. The ones I found that don't work are:

blocksize
filesfree
filestotal
kbytesavail
kbytesfree
kbytestotal

After that change, I'm not sure if any of the remaining code paths can hit this race or not. Certainly my old reproducer doesn't work anymore. And having a client do something like "df $MOUNT" just hangs while the unmount is happening.

I haven't tried Bruno's suggestion of just reordering the OST tear-down procedure, but I would expect it to be safe.
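
A rough idea of how such a symlink could be created using the kernel's generic procfs helper (the exact helper Lustre uses, the relative target path, and the osd backend name are assumptions here):

#include <linux/kernel.h>
#include <linux/proc_fs.h>

/* Hypothetical sketch: link obdfilter/<obdname>/<name> to the matching
 * osd-* entry, the same way brw_stats etc. are already handled.  The
 * relative path depth and the "osd-ldiskfs" component are assumptions. */
static int ofd_link_stat_entry(struct proc_dir_entry *obd_proc_entry,
                               const char *obdname, const char *name)
{
        char dest[128];

        snprintf(dest, sizeof(dest), "../../osd-ldiskfs/%s/%s",
                 obdname, name);
        return proc_symlink(name, obd_proc_entry, dest) ? 0 : -ENOMEM;
}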

This also raises another question: when should we use an LASSERT versus handle the unexpected situation gracefully? That is, when should

LASSERT(dev);

be written as

if (dev == NULL)
        return -ENODEV;
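
In the context of this bug, the graceful variant would look roughly like the following (arguments abbreviated and the body hypothetical; shown only to contrast with the LASSERT behaviour, not necessarily what the landed patches do):

/* Hypothetical sketch, arguments abbreviated: fail the proc read with
 * -ENODEV instead of LBUGing when the OSD device is already gone. */
static int ofd_statfs_internal(const struct lu_env *env,
                               struct ofd_device *ofd,
                               struct obd_statfs *osfs)
{
        if (ofd->ofd_osd == NULL)       /* stack already torn down by umount */
                return -ENODEV;

        return dt_statfs(env, ofd->ofd_osd, osfs);
}
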
Comment by Lai Siyao [ 24/Apr/14 ]

Patches are ready:
master: http://review.whamcloud.com/#/c/10082/
2.4: http://review.whamcloud.com/#/c/10083/
2.5: http://review.whamcloud.com/#/c/10084/

Matt, could you help verify it works in your system? BTW I think you can revive http://review.whamcloud.com/#/c/9885 since my patch will fix the issue you mentioned above.

Comment by Peter Jones [ 08/May/14 ]

Landed for 2.6. Will consider for b2_5 and b2_4 also.

Comment by Andreas Dilger [ 28/May/14 ]

Patch landed to b2_5 for 2.5.2.
