[LU-395] obd_zombie_barrier is not barrier actually Created: 06/Jun/11 Updated: 05/Sep/11 Resolved: 05/Sep/11 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mikhail Pershin | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4981 |
| Description |
|
obd_zombie_barrier() is supposed to make sure that no exports remain, so that we can proceed with obd cleanup. In fact it only makes sure that no exports are left on the zombie list, while an export taken off that list may still not be destroyed. This can lead to the issues seen in ORI-125 and ORI-211, because lu_target is freed before all exports are gone. The solution could be to fix the zombie barrier so that it is a real barrier that waits until all exports are destroyed. |
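The gap described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the `zombie_queue` structure and the `zq_*` helpers are hypothetical names invented for illustration and are not Lustre APIs. It shows that "the list is empty" and "every export is destroyed" are different conditions once an export has been dequeued but its destruction has not finished.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the zombie list; not the real Lustre structures. */
struct zombie_queue {
        int queued;     /* exports still sitting on the zombie list */
        int in_flight;  /* exports taken off the list, destruction not finished */
};

/* Take one export off the list; its destruction has started but not ended. */
static bool zq_dequeue(struct zombie_queue *q)
{
        if (q->queued == 0)
                return false;
        q->queued--;
        q->in_flight++;
        return true;
}

/* Record that the destruction of one dequeued export has completed. */
static void zq_destroy_done(struct zombie_queue *q)
{
        assert(q->in_flight > 0);
        q->in_flight--;
}

/* Weak condition: only checks that the list itself is empty. */
static bool zq_list_empty(const struct zombie_queue *q)
{
        return q->queued == 0;
}

/* Strong condition: nothing queued and nothing still being destroyed. */
static bool zq_all_destroyed(const struct zombie_queue *q)
{
        return q->queued == 0 && q->in_flight == 0;
}
```

In this model a barrier built on `zq_list_empty()` would let cleanup proceed while an export is still in flight, whereas one built on `zq_all_destroyed()` would not.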
| Comments |
| Comment by Build Master (Inactive) [ 13/Jun/11 ] |
|
Integrated in Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
|
| Comment by Jinshan Xiong (Inactive) [ 14/Jun/11 ] |
|
I don't know whether this is related to this issue, but unmounting an OST can easily hang with the following stack trace: umount S ffff810005736420 0 15952 15951 (NOTLB) |
| Comment by Build Master (Inactive) [ 14/Jun/11 ] |
|
Integrated in Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
|
| Comment by Jinshan Xiong (Inactive) [ 14/Jun/11 ] |
|
It works with this patch:

diff --git a/lustre/obdclass/obd_config.c b/lustre/obdclass/obd_config.c
index de433ff..7331460 100644
--- a/lustre/obdclass/obd_config.c
+++ b/lustre/obdclass/obd_config.c
@@ -545,10 +545,6 @@ int class_detach(struct obd_device *obd, struct lustre_cfg *lcfg)
                obd->obd_name, obd->obd_uuid.uuid);
         class_decref(obd, "attach", obd);
-
-        /* not strictly necessary, but cleans up eagerly */
-        obd_zombie_impexp_cull();
-
         RETURN(0);
 }
|
| Comment by Lai Siyao [ 14/Jun/11 ] |
|
I hit this hang too. After reviewing the code, I think Jinshan's patch should work: the backtrace above shows sys_umount -> class_detach -> obd_zombie_impexp_cull -> filter_cleanup -> obd_zombie_barrier, but zombies_count will only be decreased after obd_zombie_impexp_cull, so this process is waiting for itself, which is a deadlock. I think it is fine to let the zombie thread call obd_zombie_impexp_cull, and this should also be the reason why a separate zombie thread was created. |
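The self-wait described above can be modelled without actually hanging. The following is a minimal sketch, not Lustre code: `zombies_count` mirrors the counter named in the comment, while `barrier_would_block()`, `cleanup_calls_barrier()` and `cull_one()` are hypothetical stand-ins for the barrier, the cleanup path and the cull. The barrier is replaced by a non-blocking check so the example terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the real counter of zombies not yet fully processed. */
static int zombies_count;

/* Records what the barrier saw when it was reached from inside the cull. */
static bool barrier_blocked_in_cull;

/* Non-blocking stand-in for the barrier: true means a real wait would block. */
static bool barrier_would_block(void)
{
        return zombies_count > 0;
}

/* Stand-in for the cleanup path that reaches the barrier during a cull. */
static void cleanup_calls_barrier(void)
{
        barrier_blocked_in_cull = barrier_would_block();
}

/* Stand-in for the cull: the count is decreased only after cleanup returns. */
static void cull_one(void)
{
        cleanup_calls_barrier();
        zombies_count--;
}
```

With one zombie outstanding, the check made inside `cull_one()` reports that a wait would block, and the decrement that would release it sits after that check in the same thread. Running the cull from a separate thread, as the comment suggests, is one way to avoid this ordering problem.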
| Comment by Sarah Liu [ 16/Jun/11 ] |
|
Running the mmp tests on the latest master build #173 (rhel5/x86_64), and also on review build #956, the tests hang from the beginning. Bobi thinks this is the same issue. The failure can be reproduced every time. |
| Comment by Build Master (Inactive) [ 16/Jun/11 ] |
|
Integrated in Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
|
| Comment by Mikhail Pershin [ 05/Sep/11 ] |
|
Pushed to master |