The patch https://review.whamcloud.com/#/c/38845/ introduced an issue where ko2iblnd_shutdown never completes.
The following test case reproduces it:
- Start Lustre with LNet Multi-Rail (LNET-MR) on the InfiniBand network
- Turn off two IB ports on one of the OSSs
- Unmount the OSTs on that OSS (simulating OSS failover)
- Bring the two IB ports back up
- Remount the OSTs on that OSS (simulating OSS failback)
- Stop all Lustre services and unload all Lustre modules (lustre_rmmod)
When the Lustre modules were unloaded on all OSSs, the shutdown on some (or all) of the OSSs never completed because it hung in ko2iblnd_shutdown. I also tried the second patch, https://review.whamcloud.com/40937, but the problem still occurred.
BTW, with the LU-14499 patch (which reverts the LU-13638 patch) applied on the servers, this shutdown problem went away.
Hi, Shuichi,
Have you had a chance to verify that the patch fixes the rmmod issue?
Thanks,
YangSheng