Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.9.0
-
None
-
autotest review-dne
-
3
-
9223372036854775807
Description
sanity-scrub test_1c and test_4a are timing out with a hang in unmount of a MDT. All failures so far are in review-dne.
From the test_log, we see the test is umounting MDTs:
… stop mds1 CMD: onyx-33vm7 grep -c /mnt/lustre-mds1' ' /proc/mounts CMD: onyx-33vm7 umount -d /mnt/lustre-mds1 CMD: onyx-33vm7 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' stop mds2 CMD: onyx-33vm3 grep -c /mnt/lustre-mds2' ' /proc/mounts CMD: onyx-33vm3 umount -d /mnt/lustre-mds2 CMD: onyx-33vm3 lsmod | grep lnet > /dev/null && lctl dl | grep ' ST ' stop mds3 CMD: onyx-33vm7 grep -c /mnt/lustre-mds3' ' /proc/mounts CMD: onyx-33vm7 umount -d /mnt/lustre-mds3
Looking at the MDS that is in the process of unmounting an MDT, in the console logs, we see errors when trying to unmount and the stack trace:
06:49:24:[12121.531608] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds3 06:49:24:[12141.673117] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:49:24:[12141.681956] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:49:24:[12141.688142] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:49:24:[12161.692141] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:49:24:[12161.701736] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:49:24:[12161.708145] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:49:24:[12181.712143] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:49:24:[12181.722112] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:49:24:[12181.729016] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:49:24:[12201.733130] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:49:24:[12201.742520] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:49:24:[12201.749824] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:49:24:[12221.754157] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:49:24:[12221.763268] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:49:24:[12221.770918] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:49:24:[12241.776150] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:49:24:[12241.784285] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:49:24:[12241.792494] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:49:24:[12261.798128] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:54:55:[12261.805202] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:54:55:[12261.813777] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:54:55:[12301.819116] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:54:55:[12301.825881] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) Skipped 1 previous similar message 06:54:55:[12301.831998] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:54:55:[12301.840643] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) Skipped 1 previous similar message 06:54:55:[12301.843920] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. 06:54:55:[12301.849581] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) Skipped 1 previous similar message 06:54:55:[12360.587118] INFO: task osp_up0-2:22450 blocked for more than 120 seconds. 06:54:55:[12360.592257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 06:54:55:[12360.595199] osp_up0-2 D ffffc90004019000 0 22450 2 0x00000080 06:54:55:[12360.597925] ffff88007a277a98 0000000000000046 ffff8800441b8000 ffff88007a277fd8 06:54:55:[12360.600858] ffff88007a277fd8 ffff88007a277fd8 ffff8800441b8000 ffff88004435c240 06:54:55:[12360.603645] ffff8800435be000 ffff880042fd6020 0000000000000000 ffffc90004019000 06:54:55:[12360.606403] Call Trace: 06:54:55:[12360.608738] [<ffffffff8163ba29>] schedule+0x29/0x70 06:54:55:[12360.611237] [<ffffffffa07f797d>] lu_object_find_at+0x4d/0xe0 [obdclass] 06:54:55:[12360.613840] [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20 06:54:55:[12360.616305] [<ffffffffa07f7e3f>] lu_object_find_slice+0x1f/0x90 [obdclass] 06:54:55:[12360.618842] [<ffffffffa0f7c5f4>] osp_trans_stop_cb+0x1b4/0x2c0 [osp] 06:54:55:[12360.621315] [<ffffffffa0f7eeeb>] osp_update_interpret+0x21b/0x4b0 [osp] 06:54:55:[12360.623814] [<ffffffffa0a0d725>] ptlrpc_check_set.part.23+0x425/0x1dd0 [ptlrpc] 06:54:55:[12360.626273] [<ffffffff8108bf50>] ? internal_add_timer+0x70/0x70 06:54:55:[12360.628649] [<ffffffffa0a0f12b>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc] 06:54:55:[12360.631038] [<ffffffffa0a0f6a1>] ptlrpc_set_wait+0x4f1/0x900 [ptlrpc] 06:54:55:[12360.633299] [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20 06:54:55:[12360.635543] [<ffffffffa0a0fb2d>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc] 06:54:55:[12360.637787] [<ffffffffa0f7f6b2>] osp_send_update_req+0x1c2/0x830 [osp] 06:54:55:[12360.640042] [<ffffffffa0f80573>] osp_send_update_thread+0x233/0x5e0 [osp] 06:54:55:[12360.642227] [<ffffffff810b88c0>] ? wake_up_state+0x20/0x20 06:54:55:[12360.644346] [<ffffffffa0f80340>] ? osp_invalidate_request+0x370/0x370 [osp] 06:54:55:[12360.646486] [<ffffffff810a5aef>] kthread+0xcf/0xe0 06:54:55:[12360.648492] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 06:54:55:[12360.650536] [<ffffffff816469d8>] ret_from_fork+0x58/0x90 06:54:55:[12360.652551] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140 06:54:55:[12372.704108] Lustre: 6959:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1468417841/real 1468417841] req@ffff880054be9500 x1539734990553872/t0(0) o250->MGC10.2.4.126@tcp@0@lo:26/25 lens 520/544 e 0 to 1 dl 1468417866 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 06:54:55:[12372.714842] Lustre: 6959:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 46 previous similar messages 06:54:55:[12381.852134] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: rc = -110 waiting for callback (1 != 0) 06:54:55:[12381.861581] LustreError: 23989:0:(import.c:338:ptlrpc_invalidate_import()) Skipped 3 previous similar messages 06:54:55:[12381.864156] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) @@@ still on delayed list req@ffff880054be8300 x1539734990549200/t0(0) o1000->lustre-MDT0000-osp-MDT0002@0@lo:24/4 lens 488/192 e 0 to 0 dl 1468417620 ref 2 fl Interpret:ES/0/0 rc -5/-107 06:54:55:[12381.870152] LustreError: 23989:0:(import.c:372:ptlrpc_invalidate_import()) Skipped 3 previous similar messages 06:54:55:[12381.872522] LustreError: 23989:0:(import.c:378:ptlrpc_invalidate_import()) lustre-MDT0000_UUID: Unregistering RPCs found (0). Network is sluggish? Waiting them to error out. …
sanity-scrub test_1c started failing with this error on July 13, 2016. Logs for some of these failures are at:
https://testing.hpdd.intel.com/test_sets/c22c1f38-48c5-11e6-bf87-5254006e85c2
https://testing.hpdd.intel.com/test_sets/1bf26ff2-48eb-11e6-bf87-5254006e85c2
https://testing.hpdd.intel.com/test_sets/055ea44e-4900-11e6-bf87-5254006e85c2
sanity-scrub test_4a started failing with this error on July 13, 2016. Logs for some of these failures are at:
https://testing.hpdd.intel.com/test_sets/a3238bf4-48d8-11e6-9f8e-5254006e85c2
https://testing.hpdd.intel.com/test_sets/61c003cc-492d-11e6-8968-5254006e85c2
https://testing.hpdd.intel.com/test_sets/cdb52f20-492e-11e6-8968-5254006e85c2