Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Labels: None
- Affects Version/s: Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1, Lustre 2.12.4
- Fix Version/s: None
- Environment: master clients with 2.11.0 or 2.10.5 servers
- Severity: 3
- Rank: 9223372036854775807
Description
conf-sanity test_32b crashes for master clients and 2.10.5 servers. Looking at the client test log at https://testing.whamcloud.com/test_sets/b1b83d10-cedf-11e8-9238-52540065bddc, we see that the test crashes during umount of mdt1:
CMD: /usr/sbin/lctl get_param -n mdc.t32fs-MDT0000*.max_rpcs_in_flight
14
sed: -e expression #1, char 9: unknown option to `s'
pdsh@onyx-37vm9: no remote hosts specified
CMD: /usr/sbin/lctl get_param -n mdc.t32fs-MDT0000*.max_rpcs_in_flight
14
sed: -e expression #1, char 9: unknown option to `s'
pdsh@onyx-37vm9: no remote hosts specified
CMD: onyx-37vm12 umount -d /tmp/t32/mnt/mdt
CMD: onyx-37vm12 umount -d /tmp/t32/mnt/mdt1
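The "unknown option to `s'" failures above are the kind sed produces when text containing a `/` is substituted into an `s///` expression, so sed parses everything after the extra delimiter as flags. A minimal sketch of that failure mode (a hypothetical cause for illustration, not taken from the conf-sanity script itself):

```shell
# Hypothetical reproduction: substituting a value that itself contains
# a '/' into an s/// expression makes sed treat the text after the
# extra delimiter as flags, yielding "unknown option to `s'".
value="lustre/mds"                                # illustrative value with a slash
echo "mdc.t32fs-MDT0000" | sed -e "s/MDT0000/$value/"
# sed: -e expression #1, char ...: unknown option to `s'

# Using a different delimiter avoids the problem:
echo "mdc.t32fs-MDT0000" | sed -e "s|MDT0000|$value|"
# -> mdc.t32fs-lustre/mds
```

Whether this is what the test script hit would need confirming against the sed invocation in conf-sanity itself.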
Looking at the MDS (vm12) console log, we see:
[23682.075589] Lustre: DEBUG MARKER: cat /tmp/t32/list2
[23682.368852] LustreError: 15627:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[23682.888492] LustreError: 15627:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[23682.889529] LustreError: 15627:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Skipped 7 previous similar messages
[23688.214366] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n osd*.*.force_sync=1
[23688.389563] LustreError: 15627:0:(mdt_lib.c:961:mdt_attr_valid_xlate()) Unknown attr bits: 0x60000
[23688.557868] Lustre: DEBUG MARKER: test -f /tmp/t32/sha1sums
[23688.885389] Lustre: DEBUG MARKER: cat /tmp/t32/sha1sums
[23690.182337] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null ||
[23690.182337] /usr/sbin/lctl lustre_build_version 2>/dev/null ||
[23690.182337] /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
[23690.526084] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param t32fs-MDT0000.mdc.max_rpcs_in_flight='14'
[23690.690187] Lustre: Modifying parameter t32fs-MDT0000-mdc.mdc.max_rpcs_in_flight in log t32fs-client
[23690.691354] Lustre: Skipped 6 previous similar messages
[23691.038484] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/mdt
[23691.209348] Lustre: Failing over t32fs-MDT0000
[23691.377221] Lustre: server umount t32fs-MDT0000 complete
[23691.678261] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/mdt1
[23691.844378] LustreError: 11-0: t32fs-MDT0000-lwp-MDT0001: operation mds_disconnect to node 0@lo failed: rc = -107
[23692.002188] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) header@ffff89bbb6989360[0x0, 1, [0x240000401:0x1:0x0] hash exist]{
[23692.003646] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdt@ffff89bbb69893b0mdt-object@ffff89bbb6989360( , writecount=0)
[23692.005001] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdd@ffff89bb9d517870mdd-object@ffff89bb9d517870(open_count=0, valid=0, cltime=0, flags=0)
[23692.006540] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....lod@ffff89bbe46fd5b0lod-object@ffff89bbe46fd5b0
[23692.007729] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....osp@ffff89bbe218f2d0osp-object@ffff89bbe218f280
[23692.008942] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) } header@ffff89bbb6989360
[23692.009948] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) header@ffff89bbb6843510[0x0, 1, [0x280000401:0x1:0x0] hash exist]{
[23692.011273] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdt@ffff89bbb6843560mdt-object@ffff89bbb6843510( , writecount=0)
[23692.012624] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdd@ffff89bbe47c2e60mdd-object@ffff89bbe47c2e60(open_count=0, valid=0, cltime=0, flags=0)
[23692.014163] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....lod@ffff89bbf9a19b60lod-object@ffff89bbf9a19b60
[23692.015351] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....osd-ldiskfs@ffff89bbf93f8400osd-ldiskfs-object@ffff89bbf93f8400(i:ffff89bbdb2b9a38:25056/936018527)[plain]
[23692.017030] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) } header@ffff89bbb6843510
[23692.025094] LustreError: 11816:0:(lu_object.c:1177:lu_de
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
Looking at the kernel crash log, we can see the call trace:
[23691.678261] Lustre: DEBUG MARKER: umount -d /tmp/t32/mnt/mdt1
[23691.844378] LustreError: 11-0: t32fs-MDT0000-lwp-MDT0001: operation mds_disconnect to node 0@lo failed: rc = -107
[23692.002188] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) header@ffff89bbb6989360[0x0, 1, [0x240000401:0x1:0x0] hash exist]{
[23692.003646] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdt@ffff89bbb69893b0mdt-object@ffff89bbb6989360( , writecount=0)
[23692.005001] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdd@ffff89bb9d517870mdd-object@ffff89bb9d517870(open_count=0, valid=0, cltime=0, flags=0)
[23692.006540] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....lod@ffff89bbe46fd5b0lod-object@ffff89bbe46fd5b0
[23692.007729] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....osp@ffff89bbe218f2d0osp-object@ffff89bbe218f280
[23692.008942] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) } header@ffff89bbb6989360
[23692.009948] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) header@ffff89bbb6843510[0x0, 1, [0x280000401:0x1:0x0] hash exist]{
[23692.011273] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdt@ffff89bbb6843560mdt-object@ffff89bbb6843510( , writecount=0)
[23692.012624] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....mdd@ffff89bbe47c2e60mdd-object@ffff89bbe47c2e60(open_count=0, valid=0, cltime=0, flags=0)
[23692.014163] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....lod@ffff89bbf9a19b60lod-object@ffff89bbf9a19b60
[23692.015351] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) ....osd-ldiskfs@ffff89bbf93f8400osd-ldiskfs-object@ffff89bbf93f8400(i:ffff89bbdb2b9a38:25056/936018527)[plain]
[23692.017030] LustreError: 11816:0:(osp_dev.c:1277:osp_device_free()) } header@ffff89bbb6843510
[23692.025094] LustreError: 11816:0:(lu_object.c:1177:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
[23692.025214] LustreError: 17697:0:(mdt_handler.c:4808:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
[23692.025215] LustreError: 17697:0:(mdt_handler.c:4808:mdt_fini()) LBUG
[23692.025217] Pid: 17697, comm: umount 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Mon Aug 27 17:48:12 UTC 2018
[23692.025217] Call Trace:
[23692.025249] [<ffffffffc09287cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[23692.025254] [<ffffffffc092887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[23692.025266] [<ffffffffc13d2872>] mdt_device_fini+0x8f2/0x930 [mdt]
[23692.025302] [<ffffffffc0afc4b7>] class_cleanup+0x987/0xce0 [obdclass]
[23692.025318] [<ffffffffc0afe83f>] class_process_config+0x19bf/0x2420 [obdclass]
[23692.025332] [<ffffffffc0aff466>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[23692.025353] [<ffffffffc0b2c6de>] server_put_super+0x8de/0xcd0 [obdclass]
[23692.025376] [<ffffffffa9c1debd>] generic_shutdown_super+0x6d/0x100
[23692.025378] [<ffffffffa9c1e2a2>] kill_anon_super+0x12/0x20
[23692.025393] [<ffffffffc0b01eb2>] lustre_kill_super+0x32/0x50 [obdclass]
[23692.025395] [<ffffffffa9c1e65e>] deactivate_locked_super+0x4e/0x70
[23692.025397] [<ffffffffa9c1ede6>] deactivate_super+0x46/0x60
[23692.025402] [<ffffffffa9c3cd4f>] cleanup_mnt+0x3f/0x80
[23692.025404] [<ffffffffa9c3cde2>] __cleanup_mnt+0x12/0x20
[23692.025417] [<ffffffffa9ab803b>] task_work_run+0xbb/0xe0
[23692.025426] [<ffffffffa9a2ac55>] do_notify_resume+0xa5/0xc0
[23692.025440] [<ffffffffaa120ad8>] int_signal+0x12/0x17
[23692.025460] [<ffffffffffffffff>] 0xffffffffffffffff
[23692.025463] Kernel panic - not syncing: LBUG
[23692.025469] CPU: 1 PID: 17697 Comm: umount Kdump: loaded Tainted: G OE ------------ 3.10.0-862.9.1.el7_lustre.x86_64 #1
[23692.025469] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[23692.025470] Call Trace:
[23692.025478] [<ffffffffaa10e84e>] dump_stack+0x19/0x1b
[23692.025480] [<ffffffffaa108b50>] panic+0xe8/0x21f
[23692.025486] [<ffffffffc09288cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[23692.025493] [<ffffffffc13d2872>] mdt_device_fini+0x8f2/0x930 [mdt]
[23692.025508] [<ffffffffc0afc4b7>] class_cleanup+0x987/0xce0 [obdclass]
[23692.025522] [<ffffffffc0afe83f>] class_process_config+0x19bf/0x2420 [obdclass]
[23692.025529] [<ffffffffc0933bd7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[23692.025543] [<ffffffffc0aff466>] class_manual_cleanup+0x1c6/0x710 [obdclass]
[23692.025558] [<ffffffffc0b2c6de>] server_put_super+0x8de/0xcd0 [obdclass]
[23692.025561] [<ffffffffa9c1debd>] generic_shutdown_super+0x6d/0x100
[23692.025563] [<ffffffffa9c1e2a2>] kill_anon_super+0x12/0x20
[23692.025577] [<ffffffffc0b01eb2>] lustre_kill_super+0x32/0x50 [obdclass]
[23692.025578] [<ffffffffa9c1e65e>] deactivate_locked_super+0x4e/0x70
[23692.025580] [<ffffffffa9c1ede6>] deactivate_super+0x46/0x60
[23692.025581] [<ffffffffa9c3cd4f>] cleanup_mnt+0x3f/0x80
[23692.025583] [<ffffffffa9c3cde2>] __cleanup_mnt+0x12/0x20
[23692.025584] [<ffffffffa9ab803b>] task_work_run+0xbb/0xe0
[23692.025586] [<ffffffffa9a2ac55>] do_notify_resume+0xa5/0xc0
[23692.025588] [<ffffffffaa120ad8>] int_signal+0x12/0x17
We’ve also seen conf-sanity test_32b crash for 2.11.0 servers, at https://testing.whamcloud.com/test_sets/fd4e6636-d44e-11e8-9238-52540065bddc
Attachments
Issue Links
- is related to LU-11558 t32_verify_quota() not being called by t32_test() (Resolved)
- is related to LU-7872 conf-sanity: test_50i 'test failed to respond and timed out' (Open)