Description
sanity-scrub test_5 hangs on OST unmount, with the OST waiting for obd_unlinked_exports, for ZFS/DNE. We’ve seen this test fail twice; the first failure was on 2019-04-17, with logs at https://testing.whamcloud.com/test_sets/b416bc06-60e0-11e9-92fe-52540065bddc.
Looking at the OSS console log, we see
[11834.952415] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-ost2
[11849.146323] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 8 seconds. The obd refcount = 4. Is it stuck?
[11849.148320] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0: (null) 0 stale:0
[11865.150330] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 16 seconds. The obd refcount = 4. Is it stuck?
[11865.152291] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0: (null) 0 stale:0
[11897.154323] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 32 seconds. The obd refcount = 4. Is it stuck?
[11897.156322] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0: (null) 0 stale:0
[11961.158333] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 64 seconds. The obd refcount = 4. Is it stuck?
[11961.160328] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0: (null) 0 stale:0
[12089.162328] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 128 seconds. The obd refcount = 4. Is it stuck?
[12089.164308] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0: (null) 0 stale:0
[12240.336376] INFO: task umount:4154 blocked for more than 120 seconds.
[12240.337520] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12240.338841] umount          D ffff966553984100     0  4154   4153 0x00000080
[12240.340202] Call Trace:
[12240.340846]  [<ffffffffc0e04234>] ? print_export_data.isra.18+0x224/0x240 [obdclass]
[12240.342282]  [<ffffffffbb968c49>] schedule+0x29/0x70
[12240.343148]  [<ffffffffbb966668>] schedule_timeout+0x168/0x2d0
[12240.344251]  [<ffffffffbb2a9920>] ? __internal_add_timer+0x130/0x130
[12240.345354]  [<ffffffffc0e0bddc>] ? dump_exports+0xec/0x100 [obdclass]
[12240.346560]  [<ffffffffc0e0be99>] obd_exports_barrier+0xa9/0x1a0 [obdclass]
[12240.347791]  [<ffffffffc13d2873>] ofd_device_fini+0xa3/0x2d0 [ofd]
[12240.349010]  [<ffffffffc0e231c2>] class_cleanup+0x862/0xbd0 [obdclass]
[12240.350134]  [<ffffffffc0e241bc>] class_process_config+0x65c/0x2830 [obdclass]
[12240.351480]  [<ffffffffc0cd1f37>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[12240.352618]  [<ffffffffc0e26556>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[12240.353967]  [<ffffffffc0e5697e>] server_put_super+0x8de/0xcd0 [obdclass]
[12240.355140]  [<ffffffffbb4441bd>] generic_shutdown_super+0x6d/0x100
[12240.356291]  [<ffffffffbb4445b2>] kill_anon_super+0x12/0x20
[12240.357262]  [<ffffffffc0e290c2>] lustre_kill_super+0x32/0x50 [obdclass]
[12240.358476]  [<ffffffffbb44496e>] deactivate_locked_super+0x4e/0x70
[12240.359543]  [<ffffffffbb4450f6>] deactivate_super+0x46/0x60
[12240.360648]  [<ffffffffbb46367f>] cleanup_mnt+0x3f/0x80
[12240.361549]  [<ffffffffbb463712>] __cleanup_mnt+0x12/0x20
[12240.362568]  [<ffffffffbb2be7db>] task_work_run+0xbb/0xe0
[12240.363517]  [<ffffffffbb22bc65>] do_notify_resume+0xa5/0xc0
[12240.364572]  [<ffffffffbb976124>] int_signal+0x12/0x17
[12345.166337] Lustre: lustre-OST0001 is waiting for obd_unlinked_exports more than 256 seconds. The obd refcount = 4. Is it stuck?
[12345.168318] Lustre: lustre-OST0001: UNLINKED ffff96655cb3f800 028216d1-49cf-def1-f2c5-8e9a09f0ba40 10.9.5.250@tcp 1 (0 0 0) 1 0 1 0: (null) 0 stale:0
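For reference, the console messages suggest obd_exports_barrier() is sitting in a wait-with-exponential-backoff loop: it waits for the lingering exports to go away and warns at doubling intervals (8 s, 16 s, 32 s, ...). The snippet below is only a minimal userspace sketch of that pattern to make the log readable; exports_pending(), the counts, and the timings are illustrative assumptions, not the actual Lustre implementation.

/*
 * Sketch of a wait loop that warns at exponentially growing intervals,
 * as the "waiting for obd_unlinked_exports more than N seconds" messages
 * above suggest.  Not Lustre code; exports_pending() is a hypothetical
 * stand-in for the check that the stuck export list is empty.
 */
#include <stdio.h>
#include <unistd.h>

static int exports_pending(void)
{
        static int remaining = 10;      /* pretend some exports linger */
        return remaining-- > 0;
}

int main(void)
{
        unsigned int waited = 0;
        unsigned int warn_at = 8;       /* first warning after 8 seconds */

        while (exports_pending()) {
                sleep(1);
                waited++;
                if (waited >= warn_at) {
                        printf("waiting for obd_unlinked_exports more than "
                               "%u seconds. Is it stuck?\n", warn_at);
                        warn_at *= 2;   /* back off: 8, 16, 32, ... */
                }
        }
        printf("all exports released after %u seconds\n", waited);
        return 0;
}

In the failing run the condition never clears, so the loop keeps doubling the warning interval (up to 256 seconds in the log above) while the umount task stays blocked.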
This failure looks like several other tickets where the MGS is waiting for obd_unlinked_exports, for example LU-8500, but those tickets are closed.
The second hang took place on 2019-05-17, with logs at https://testing.whamcloud.com/test_sets/c0369af6-7a17-11e9-be83-52540065bddc.