[LU-395] obd_zombie_barrier is not barrier actually
Created: 06/Jun/11  Updated: 05/Sep/11  Resolved: 05/Sep/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Minor
Reporter: Mikhail Pershin
Assignee: Mikhail Pershin
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4981

 Description   

obd_zombie_barrier() is supposed to make sure that no exports remain, so that obd cleanup can proceed. In fact, it only makes sure that there are no exports on the zombie list head; an export may still not be destroyed at that point. This can lead to the issues seen in ORI-125 and ORI-211, because lu_target is freed before all exports are gone.

The solution could be to fix the zombie barrier so that it is a real barrier, one that waits until all exports are destroyed.
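The gap described above can be shown with a small single-threaded model. This is only a sketch: the struct, the function names and the two-step cull are assumptions made for illustration, not the code in genops.c. It contrasts a check that only looks at the list with a check that waits for a count dropped after destruction.

```c
/* Illustrative model only; all names are hypothetical, not from genops.c. */
#include <assert.h>
#include <stdbool.h>

struct zombie_model {
        int on_list;   /* exports still queued on the zombie list */
        int zombies;   /* exports queued or still being destroyed */
};

/* An export becomes a zombie: queue it and count it. */
void zombie_add(struct zombie_model *zm)
{
        zm->on_list++;
        zm->zombies++;
}

/* Cull step 1: take one export off the list; destroy has not run yet. */
bool zombie_dequeue(struct zombie_model *zm)
{
        if (zm->on_list == 0)
                return false;
        zm->on_list--;
        return true;
}

/* Cull step 2: the export is destroyed; only now drop the count. */
void zombie_destroyed(struct zombie_model *zm)
{
        /* There must be an export that was dequeued but not yet destroyed. */
        assert(zm->zombies > zm->on_list);
        zm->zombies--;
}

/* Check described in the report: an empty list is taken as "done". */
bool barrier_list_empty(const struct zombie_model *zm)
{
        return zm->on_list == 0;
}

/* Proposed check: done only when every export has been destroyed. */
bool barrier_all_destroyed(const struct zombie_model *zm)
{
        return zm->zombies == 0;
}
```

Between zombie_dequeue() and zombie_destroyed(), barrier_list_empty() already returns true while barrier_all_destroyed() still returns false. That window is where a cleanup could free state an export still uses.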



 Comments   
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,server,el5,ofa #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,client,el5,ofa #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Build Master (Inactive) [ 13/Jun/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #167
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Jinshan Xiong (Inactive) [ 14/Jun/11 ]

I don't know if this is related to this issue, but unmounting an OST can easily hang with the following stack trace:

umount S ffff810005736420 0 15952 15951 (NOTLB)
ffff810020677b18 0000000000000082 0000003000000030 ffff810020677b90
ffff810020677a78 0000000000000009 ffff81002b60b040 ffffffff8030db60
0000017b4d2db85e 0000000000000713 ffff81002b60b228 0000000000000000
Call Trace:
[<ffffffff8859ca7c>] :obdclass:obd_zombie_barrier+0x75/0xd4
[<ffffffff8008e41a>] default_wake_function+0x0/0xe
[<ffffffff88a9c648>] :obdfilter:filter_cleanup+0xa9/0x36f
[<ffffffff885ad6db>] :obdclass:class_decref+0x384/0x482
[<ffffffff886b1adb>] :ptlrpc:lut_client_free+0xe9/0x190
[<ffffffff88aa0a36>] :obdfilter:filter_destroy_export+0x21b/0x467
[<ffffffff88598c2b>] :obdclass:obd_zombie_impexp_cull+0x3a3/0x483
[<ffffffff885ae531>] :obdclass:class_detach+0x22f/0x290
[<ffffffff885b1482>] :obdclass:class_process_config+0x1821/0x27c6
[<ffffffff8002ee44>] make_ahead_window+0x82/0x9e
[<ffffffff885b3706>] :obdclass:class_manual_cleanup+0x918/0xb59
[<ffffffff885be684>] :obdclass:server_put_super+0x501/0xc3b
[<ffffffff8002cb33>] mntput_no_expire+0x19/0x89
[<ffffffff80063ae5>] mutex_lock+0xd/0x1d
[<ffffffff8000a712>] __link_path_walk+0xe71/0xfb9
[<ffffffff800515cc>] vfs_quota_sync+0x14b/0x159
[<ffffffff80034f44>] dispose_list+0xc7/0xe0
[<ffffffff800eeca9>] invalidate_inodes+0xce/0xe0
[<ffffffff800e62a8>] generic_shutdown_super+0x79/0xfb
[<ffffffff800e6378>] kill_anon_super+0x9/0x35
[<ffffffff800e6429>] deactivate_super+0x6a/0x82
[<ffffffff800f03e8>] sys_umount+0x245/0x27b
[<ffffffff80023795>] sys_newstat+0x19/0x31
[<ffffffff8005d116>] system_call+0x7e/0x83

Comment by Build Master (Inactive) [ 14/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #170
LU-395 Fix obd_zombie_barrier()

Oleg Drokin : c4e3a69dd9a335d54f747d2e969edefe7bcce4cd
Files :

  • lustre/obdclass/genops.c
Comment by Jinshan Xiong (Inactive) [ 14/Jun/11 ]

It works with this patch:

diff --git a/lustre/obdclass/obd_config.c b/lustre/obdclass/obd_config.c
index de433ff..7331460 100644
--- a/lustre/obdclass/obd_config.c
+++ b/lustre/obdclass/obd_config.c
@@ -545,10 +545,6 @@ int class_detach(struct obd_device *obd, struct lustre_cfg *lcfg)
                obd->obd_name, obd->obd_uuid.uuid);
 
         class_decref(obd, "attach", obd);
-
-        /* not strictly necessary, but cleans up eagerly */
-        obd_zombie_impexp_cull();
-
         RETURN(0);
 }
Comment by Lai Siyao [ 14/Jun/11 ]

I hit this hang too. After reviewing the code, I think Jinshan's patch should work: the backtrace above shows sys_umount -> class_detach -> obd_zombie_impexp_cull -> filter_cleanup -> obd_zombie_barrier, but zombies_count is only decreased after obd_zombie_impexp_cull, so this process is waiting for itself, which is a deadlock. I think it's okay to let the zombie thread call obd_zombie_impexp_cull; this should also be the reason why a separate zombie thread was created.
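The self-wait can be sketched as a small model. The names below are hypothetical, and the ordering (count dropped only after the destroy step returns) is an assumption based on the comment above, not a copy of the Lustre code. Instead of sleeping, the stand-in barrier records that it would have blocked.

```c
/* Illustrative model only; names and ordering are assumptions. */
#include <assert.h>
#include <stdbool.h>

struct cull_model {
        int  zombies;    /* dropped only after the destroy step returns */
        bool self_wait;  /* set when the culling context hits the barrier */
};

/* Stand-in for the barrier condition: pass only with no zombies left. */
bool barrier_can_pass(const struct cull_model *cm)
{
        return cm->zombies == 0;
}

/*
 * Stand-in for destroying the last export: dropping the last reference
 * triggers device cleanup, which calls the barrier from the same context.
 */
void destroy_last_export(struct cull_model *cm)
{
        if (!barrier_can_pass(cm))
                cm->self_wait = true;  /* real code would sleep here forever */
}

/* Cull one export in the calling context, as the umount path did. */
void cull_in_caller(struct cull_model *cm)
{
        assert(cm->zombies > 0);
        destroy_last_export(cm);  /* may reach the barrier ...            */
        cm->zombies--;            /* ... before its own count is dropped  */
}
```

With one zombie left, cull_in_caller() sets self_wait: the barrier is reached while the caller's own export is still counted. If only a separate zombie thread runs the cull, the barrier caller and the thread that drops the count are different contexts, so the wait can finish.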

Comment by Sarah Liu [ 16/Jun/11 ]

Running the mmp tests on the latest master build #173 (rhel5/x86_64), and also on review build #956, the tests hang from the beginning. Bobi thought this is the same issue as here. The failure can be reproduced every time.

Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,client,sles11,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,client,el6,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » i686,client,el5,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » i686,server,el5,ofa #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,client,el5,ofa #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » i686,server,el6,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,server,el6,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,server,el5,ofa #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » x86_64,client,ubuntu1004,ofa #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » i686,client,el5,ofa #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » i686,server,el5,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Build Master (Inactive) [ 16/Jun/11 ]

Integrated in lustre-master » i686,client,el6,inkernel #175
LU-395: obd_zombie_barrier is not barrier actually

Oleg Drokin : c453ccb0f82fcaa5f537620990c4cc90a769b210
Files :

  • lustre/obdclass/obd_config.c
Comment by Mikhail Pershin [ 05/Sep/11 ]

Pushed to master
Commit c453ccb0f82fcaa5f537620990c4cc90a769b210
