[LU-3316] ASSERTION(list_empty(&ls->ls_los_list)) failure on test suite sanity-quota / test_7c Created: 10/May/13 Updated: 09/Dec/13 Resolved: 16/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Mikhail Pershin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 8206 |
| Description |
|
This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/570b8f14-b713-11e2-bd0f-52540035b04c.

The sub-test test_7c failed with the following error:

Info required for matching: sanity-quota 7c

Console log from mds:
02:42:32:Lustre: DEBUG MARKER: == sanity-quota test 7c: Quota reintegration (restart mds during reintegration) == 02:41:58 (1367919718)
02:42:32:Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_*
02:42:32:Lustre: DEBUG MARKER: lctl set_param fail_val=0
02:42:32:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
02:42:32:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none
02:42:32:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
02:42:32:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
02:42:32:Lustre: DEBUG MARKER: umount -d /mnt/mds1
02:42:32:Lustre: Failing over lustre-MDT0000
02:42:32:Lustre: Skipped 1 previous similar message
02:42:32:LustreError: 21170:0:(local_storage.c:184:ls_device_put()) ASSERTION( list_empty(&ls->ls_los_list) ) failed:
02:42:32:LustreError: 21170:0:(local_storage.c:184:ls_device_put()) LBUG
02:42:32:Pid: 21170, comm: umount
02:42:32:
02:42:32:Call Trace:
02:42:32: [<ffffffffa0590895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
02:42:32: [<ffffffffa0590e97>] lbug_with_loc+0x47/0xb0 [libcfs]
02:42:32: [<ffffffffa06e4859>] ls_device_put+0x1a9/0x1e0 [obdclass]
02:42:32: [<ffffffffa06dc6a5>] llog_osd_cleanup+0xc5/0x140 [obdclass]
02:42:32: [<ffffffffa06b772a>] __llog_ctxt_put+0xca/0x140 [obdclass]
02:42:32: [<ffffffffa06b7854>] llog_cleanup+0xb4/0x440 [obdclass]
02:42:32: [<ffffffffa06d0f31>] ? lprocfs_remove+0x31/0x40 [obdclass]
02:42:32: [<ffffffffa06d13ed>] ? lprocfs_obd_cleanup+0x5d/0xb0 [obdclass]
02:42:32: [<ffffffffa0cd7ad5>] mgs_device_fini+0x1c5/0x5a0 [mgs]
02:42:32: [<ffffffffa06f1907>] class_cleanup+0x577/0xda0 [obdclass]
02:42:32: [<ffffffffa06c6ac6>] ? class_name2dev+0x56/0xe0 [obdclass]
02:42:32: [<ffffffffa06f31ec>] class_process_config+0x10bc/0x1c80 [obdclass]
02:42:32: [<ffffffffa06eca13>] ? lustre_cfg_new+0x353/0x7e0 [obdclass]
02:42:32: [<ffffffffa06f3f29>] class_manual_cleanup+0x179/0x6f0 [obdclass]
02:42:32: [<ffffffffa06c6ac6>] ? class_name2dev+0x56/0xe0 [obdclass]
02:42:32: [<ffffffffa072961d>] server_put_super+0x46d/0xf00 [obdclass]
02:42:32: [<ffffffff8118334b>] generic_shutdown_super+0x5b/0xe0
02:42:32: [<ffffffff81183436>] kill_anon_super+0x16/0x60
02:42:32: [<ffffffffa06f5d86>] lustre_kill_super+0x36/0x60 [obdclass]
02:42:32: [<ffffffff81183bd7>] deactivate_super+0x57/0x80
02:42:32: [<ffffffff811a1c4f>] mntput_no_expire+0xbf/0x110
02:42:32: [<ffffffff811a26bb>] sys_umount+0x7b/0x3a0
02:42:32: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b |
| Comments |
| Comment by Nathaniel Clark [ 10/May/13 ] |
|
I can't find another failure quite like this one, but there are others on the same test with a different ASSERTION crash:

16:28:34:Lustre: DEBUG MARKER: == sanity-quota test 7c: Quota reintegration (restart mds during reintegration) == 16:27:54 (1364858874)
16:28:34:Lustre: DEBUG MARKER: lctl get_param -n osc.*MDT*.sync_*
16:28:34:Lustre: DEBUG MARKER: lctl set_param fail_val=0
16:28:34:Lustre: DEBUG MARKER: lctl set_param fail_loc=0
16:28:34:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=none
16:28:34:Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.ost=ug
16:28:34:Lustre: DEBUG MARKER: grep -c /mnt/mds1' ' /proc/mounts
16:28:34:Lustre: DEBUG MARKER: umount -d /mnt/mds1
16:28:34:Lustre: Failing over lustre-MDT0000
16:28:34:LustreError: 3036:0:(lod_dev.c:813:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed:
16:28:34:LustreError: 3036:0:(lod_dev.c:813:lod_device_free()) LBUG
16:28:34:Pid: 3036, comm: obd_zombid
16:28:34:
16:28:34:Call Trace:
16:28:34: [<ffffffffa05bd895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
16:28:34: [<ffffffffa05bde97>] lbug_with_loc+0x47/0xb0 [libcfs]
16:28:34: [<ffffffffa0e434bb>] lod_device_free+0x1eb/0x220 [lod]
16:28:34: [<ffffffffa0725e4d>] class_decref+0x46d/0x580 [obdclass]
16:28:34: [<ffffffffa0703399>] obd_zombie_impexp_cull+0x309/0x5d0 [obdclass]
16:28:34: [<ffffffffa0703725>] obd_zombie_impexp_thread+0xc5/0x1c0 [obdclass]
16:28:34: [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
16:28:34: [<ffffffffa0703660>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
16:28:34: [<ffffffff8100c0ca>] child_rip+0xa/0x20
16:28:34: [<ffffffffa0703660>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
16:28:34: [<ffffffffa0703660>] ? obd_zombie_impexp_thread+0x0/0x1c0 [obdclass]
16:28:34: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20

https://maloo.whamcloud.com/test_sets/4518d960-9d2d-11e2-a280-52540035b04c |
| Comment by Jodi Levi (Inactive) [ 10/May/13 ] |
|
Mike, |
| Comment by Mikhail Pershin [ 14/May/13 ] |
|
http://review.whamcloud.com/#change,6334

ls_device_put() might be called wrongly if, due to a race, the local_oid_storage struct is not removed. As for the second call trace in comment #1, it doesn't look related.
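A minimal userspace sketch of the reference-counting pattern the comment above describes. It assumes, as the trace suggests, that every storage object kept on the device's list pins one device reference, and that lookups take an object reference under the same mutex that guards the list. All names (model_dev, model_los, los_get(), los_put(), dev_put()) are hypothetical; this is not the Lustre code and not the patch in change 6334. The race being avoided: if an object's count is decremented outside the mutex, a concurrent lookup can take a new reference before the object is unlinked, and an unconditional device put on that path leaves the device count one too low. A later put then reaches zero while the object is still listed, which is the condition ls_device_put() asserts against.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, NOT Lustre code: a device owns a list of storage
 * objects, and each listed object pins one device reference. */
struct model_los;

struct model_dev {
	atomic_int refcount;        /* owner's ref + one per listed object */
	pthread_mutex_t los_mutex;  /* protects los_head and object refcounts */
	struct model_los *los_head;
};

struct model_los {
	int refcount;               /* protected by dev->los_mutex */
	int seq_id;
	struct model_dev *dev;
	struct model_los *next;
};

static void dev_init(struct model_dev *d)
{
	atomic_init(&d->refcount, 1);  /* the owner's reference */
	pthread_mutex_init(&d->los_mutex, NULL);
	d->los_head = NULL;
}

static void dev_put(struct model_dev *d)
{
	int old = atomic_fetch_sub(&d->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		/* Same idea as ASSERTION(list_empty(&ls->ls_los_list)). */
		assert(d->los_head == NULL);
		pthread_mutex_destroy(&d->los_mutex);
	}
}

/* Find the object for seq_id and take a reference, or create it. */
static struct model_los *los_get(struct model_dev *d, int seq_id)
{
	struct model_los *l;

	pthread_mutex_lock(&d->los_mutex);
	for (l = d->los_head; l != NULL; l = l->next)
		if (l->seq_id == seq_id)
			break;
	if (l != NULL) {
		l->refcount++;
	} else if ((l = malloc(sizeof(*l))) != NULL) {
		l->refcount = 1;
		l->seq_id = seq_id;
		l->dev = d;
		l->next = d->los_head;
		d->los_head = l;
		atomic_fetch_add(&d->refcount, 1);  /* pinned by the listed object */
	}
	pthread_mutex_unlock(&d->los_mutex);
	return l;
}

/* Drop a reference; returns true if this call unlinked and freed it. */
static bool los_put(struct model_los *l)
{
	struct model_dev *d = l->dev;
	bool last;

	/* Decrement under the lookup mutex so los_get() cannot hand the
	 * object out between "count reached zero" and "object unlinked". */
	pthread_mutex_lock(&d->los_mutex);
	last = --l->refcount == 0;
	if (last) {
		struct model_los **pp = &d->los_head;

		while (*pp != l)
			pp = &(*pp)->next;
		*pp = l->next;
	}
	pthread_mutex_unlock(&d->los_mutex);

	if (last) {
		free(l);
		dev_put(d);  /* exactly one device put per unlinked object */
	}
	return last;
}
```

Two lookups of the same seq_id share one object and pin the device only once. Only the put that actually unlinks the object drops that device reference, so the owner's final dev_put() finds the list empty.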
| Comment by Andreas Dilger [ 08/Jul/13 ] |
|
Recent failure: |
| Comment by Bruno Faccini (Inactive) [ 04/Aug/13 ] |
|
+1 at https://maloo.whamcloud.com/test_sets/715bb308-fc05-11e2-9222-52540035b04c |
| Comment by Bob Glossman (Inactive) [ 14/Aug/13 ] |
|
Another: https://maloo.whamcloud.com/test_sets/c52a6b34-04e6-11e3-b035-52540035b04c

This test set was on o2ib, not tcp. I wonder if that is significant.
| Comment by Mikhail Pershin [ 16/Sep/13 ] |
|
Patch was landed.
| Comment by Jian Yu [ 03/Dec/13 ] |
|
Lustre Build: http://build.whamcloud.com/job/lustre-b2_4/59/

sanity-scrub test 0 also hit this failure:

Just back-ported the patch to the Lustre b2_4 branch: http://review.whamcloud.com/8461