[LU-10940] sanity test_802: set mdt quota type failed Created: 23/Apr/18 Updated: 10/Aug/18 Resolved: 10/Aug/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for sarah_lw <wei3.liu@intel.com>
This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d3a87ae4-471b-11e8-95c0-52540065bddc
test_802 failed with the following error:
set mdt quota type failed
This failure seems to have started showing up on 2.11.50.51, b3738, on April 9, 2018.
test log
CMD: trevis-4vm4 lctl get_param -n timeout
Using TIMEOUT=20
CMD: trevis-4vm4 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: trevis-4vm1.trevis.hpdd.intel.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
error: get_param: param_path 'mdc/*/connect_flags': No such file or directory
jobstats not supported by server
enable quota as required
CMD: trevis-4vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
CMD: trevis-4vm3 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
[HOST:trevis-4vm1.trevis.hpdd.intel.com] [old_mdt_qtype:ug] [old_ost_qtype:ug] [new_qtype:ug3]
CMD: trevis-4vm4 /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
trevis-4vm4: error: conf_param: Read-only file system
sanity test_802: @@@@@@ FAIL: set mdt quota type failed
Trace dump:
MDS dmesg
[ 7400.522030] Lustre: DEBUG MARKER: SKIP: sanity test_801c
[ 7400.803247] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 802: simulate readonly device ========================================================= 23:28:21 \(1524439701\)
[ 7400.993579] Lustre: DEBUG MARKER: == sanity test 802: simulate readonly device ========================================================= 23:28:21 (1524439701)
[ 7401.164206] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null ||
/usr/sbin/lctl lustre_build_version 2>/dev/null ||
/usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
[ 7401.912727] Lustre: DEBUG MARKER: lctl set_param -n os[cd]*.*MDT*.force_sync=1
[ 7405.691947] Lustre: DEBUG MARKER: lctl set_param -n os[cd]*.*MDT*.force_sync=1
[ 7407.328094] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[ 7407.639629] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
[ 7412.735986] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.31@tcp (stopping)
[ 7412.738161] Lustre: Skipped 3 previous similar messages
[ 7417.729349] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.4.31@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 7417.733806] LustreError: Skipped 15 previous similar messages
[ 7419.993052] Lustre: 7085:0:(client.c:2099:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1524439714/real 1524439714] req@ffff880061347900 x1598483208664528/t0(0) o251->MGC10.9.4.32@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1524439720 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[ 7420.027093] Lustre: server umount lustre-MDT0000 complete
[ 7420.199653] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
[ 7420.520293] Lustre: DEBUG MARKER: modprobe dm-flakey;
dmsetup targets | grep -q flakey
[ 7432.809742] Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
mpts=$(mount | grep -c /mnt/lustre-mds1' ');
if [ $running -ne $mpts ]; then
echo $(hostname) env are INSANE!;
exit 1;
fi
[ 7433.175430] Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
mpts=$(mount | grep -c /mnt/lustre-mds1' ');
if [ $running -ne $mpts ]; then
echo $(hostname) env are INSANE!;
exit 1;
fi
[ 7434.341465] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1
[ 7434.650345] Lustre: DEBUG MARKER: modprobe dm-flakey;
dmsetup targets | grep -q flakey
[ 7434.951416] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
[ 7435.251720] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1
[ 7435.550827] Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
[ 7435.846295] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
[ 7436.141605] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o rdonly_dev /dev/mapper/mds1_flakey /mnt/lustre-mds1
[ 7436.313923] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 7436.317692] Turning device dm-3 (0xfc00003) read-only
[ 7436.319501] Lustre: lustre-MDT0000-osd: set dev_rdonly on this device
[ 7436.395144] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[ 7436.566211] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[ 7436.878021] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
[ 7437.467981] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.468228] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.655837] Lustre: DEBUG MARKER: trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.655862] Lustre: DEBUG MARKER: trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.827912] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[ 7438.131105] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]\{3}[0-9]\{4}'
[ 7438.434939] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]\{3}[0-9]\{4}'
[ 7438.762419] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
[ 7439.087312] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[ 7442.556447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7442.755098] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7447.098744] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7447.288273] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7451.644719] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7451.830105] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7456.196503] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7456.388152] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7460.757075] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7460.957519] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7465.345849] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7465.538568] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7469.992735] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7470.193712] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7471.395975] Lustre: DEBUG MARKER: lctl get_param -n timeout
[ 7471.796323] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
[ 7471.989513] Lustre: DEBUG MARKER: Using TIMEOUT=20
[ 7472.150495] Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
[ 7472.495736] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
[ 7473.143673] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
[ 7473.296428] LustreError: 9740:0:(osd_handler.c:1689:osd_trans_create()) lustre-MDT0000: someone try to start transaction under readonly mode, should be disabled.
[ 7473.302031] CPU: 0 PID: 9740 Comm: llog_process_th Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
[ 7473.307279] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 7473.309907] Call Trace:
[ 7473.312287] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[ 7473.314849] [<ffffffffc0d2ea9c>] osd_trans_create+0x5cc/0x610 [osd_ldiskfs]
[ 7473.317607] [<ffffffffc0877c71>] llog_write+0x91/0x3d0 [obdclass]
[ 7473.320207] [<ffffffffc0db012a>] mgs_modify_handler+0x36a/0x440 [mgs]
[ 7473.322805] [<ffffffffc08759c9>] llog_process_thread+0x839/0x1560 [obdclass]
[ 7473.325492] [<ffffffffc089fc19>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 7473.328172] [<ffffffffc08770ff>] llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
[ 7473.330884] [<ffffffffc0877060>] ? llog_backup+0x500/0x500 [obdclass]
[ 7473.333483] [<ffffffff810b4031>] kthread+0xd1/0xe0
[ 7473.335897] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 7473.338389] [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
[ 7473.340792] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 7473.343238] LustreError: 9739:0:(mgs_llog.c:954:mgs_modify()) MGS: modify lustre/quota.mdt failed: rc = -30
[ 7473.345910] LustreError: 9739:0:(mgs_llog.c:1940:mgs_write_log_direct_all()) MGS: Can't modify llog lustre-MDT0000: rc = -30
[ 7473.348694] CPU: 1 PID: 9739 Comm: lctl Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
[ 7473.351406] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 7473.353790] Call Trace:
[ 7473.355798] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[ 7473.358052] [<ffffffffc0d2ea9c>] osd_trans_create+0x5cc/0x610 [osd_ldiskfs]
[ 7473.360387] [<ffffffffc0877c71>] llog_write+0x91/0x3d0 [obdclass]
[ 7473.362665] [<ffffffffc0dad80e>] record_marker+0x15e/0x2b0 [mgs]
[ 7473.364843] [<ffffffffc0dae9f2>] mgs_write_log_direct+0xe2/0x2d0 [mgs]
[ 7473.367092] [<ffffffffc0dbd6cb>] mgs_write_log_direct_all+0x38b/0x640 [mgs]
[ 7473.369279] [<ffffffffc0dd06ea>] mgs_write_log_quota+0x2d7/0x31d [mgs]
[ 7473.371448] [<ffffffffc0dbe4bb>] mgs_write_log_param+0x5ab/0x1e30 [mgs]
[ 7473.373529] [<ffffffffc0dbfd87>] ? mgs_find_fsdb+0x47/0x70 [mgs]
[ 7473.375591] [<ffffffffc0dc2677>] ? mgs_find_or_make_fsdb+0x67/0x1c0 [mgs]
[ 7473.377614] [<ffffffffc0dc6d6c>] mgs_set_param+0xabc/0xd40 [mgs]
[ 7473.379604] [<ffffffffc0dac23a>] mgs_iocontrol+0xd2a/0xde0 [mgs]
[ 7473.381507] [<ffffffffc088aae3>] class_handle_ioctl+0x18d3/0x1de0 [obdclass]
[ 7473.383517] [<ffffffff811b1f16>] ? do_read_fault.isra.44+0xe6/0x130
[ 7473.385376] [<ffffffff812b72be>] ? security_capable+0x1e/0x20
[ 7473.387227] [<ffffffffc086f802>] obd_class_ioctl+0xd2/0x170 [obdclass]
[ 7473.389074] [<ffffffff81219e90>] do_vfs_ioctl+0x350/0x560
[ 7473.390832] [<ffffffff816bb521>] ? __do_page_fault+0x171/0x450
[ 7473.392525] [<ffffffff8121a141>] SyS_ioctl+0xa1/0xc0
[ 7473.394199] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146
[ 7473.395942] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[ 7473.397679] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146
[ 7473.399422] LustreError: 9739:0:(mgs_llog.c:1948:mgs_write_log_direct_all()) MGS: writing log lustre-MDT0000: rc = -30
[ 7473.401661] CPU: 0 PID: 9741 Comm: llog_process_th Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
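For triage, the failing return codes can be pulled out of a dmesg capture like the one above. The following is a minimal sketch, not part of the Lustre test framework: `sample_log` and `extract_rc` are hypothetical helper names, and the sample lines are copied from the MDS dmesg in this ticket. On Linux, errno 30 is EROFS ("Read-only file system"), which matches the `conf_param: Read-only file system` error in the test log.

```shell
#!/bin/sh
# Sample lines copied from the MDS dmesg in this ticket.
sample_log() {
    cat <<'EOF'
[ 7473.343238] LustreError: 9739:0:(mgs_llog.c:954:mgs_modify()) MGS: modify lustre/quota.mdt failed: rc = -30
[ 7473.345910] LustreError: 9739:0:(mgs_llog.c:1940:mgs_write_log_direct_all()) MGS: Can't modify llog lustre-MDT0000: rc = -30
[ 7473.355798] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
EOF
}

# Hypothetical helper: for every LustreError line that ends in "rc = N",
# print the reporting function name followed by the return code.
# Lines that do not match (e.g. stack frames) are dropped.
extract_rc() {
    sed -n 's/.*LustreError: [^(]*([^:]*:[0-9]*:\([A-Za-z_0-9]*\)()).*rc = \(-\{0,1\}[0-9][0-9]*\)$/\1 \2/p'
}

sample_log | extract_rc
```

Both MGS errors report -30, so the failure is the read-only device rejecting the llog write rather than a quota-specific problem.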
| Comments |
| Comment by Peter Jones [ 02/May/18 ] |
|
Hongchao, could you please investigate? Thanks, Peter |
| Comment by Andreas Dilger [ 02/May/18 ] |
It probably makes sense to see which quota-related patches landed just before then. |
| Comment by James Nunez (Inactive) [ 28/Jul/18 ] |
|
sanity test 802 fails when we run the ‘full’ test group, but passes for all other testing: review-ldiskfs, review-dne, etc. One difference between ‘full’ and all other testing is that for ‘full’ testing we enable quotas for all test suites, and for all other testing we don’t.
For sanity test 802, we stop all servers and then mount them read-only. When we bring up the servers in read-only mode, we try to reset quotas in setup_quota(), and the following call to conf_param on the MGS fails:
do_facet mgs $LCTL conf_param $FSNAME.quota.mdt=$QUOTA_TYPE ||
    error "set mdt quota type failed"
do_facet mgs $LCTL conf_param $FSNAME.quota.ost=$QUOTA_TYPE ||
    error "set ost quota type failed"
One question: is Lustre behaving properly in not allowing calls to conf_param when a server is read-only? More specifically, should we be able to set quotas by calling conf_param on a read-only server? |
| Comment by Andreas Dilger [ 28/Jul/18 ] |
|
Since enabling quota requires changes to the filesystem on the targets, it doesn't make sense to enable it on a read-only filesystem. |
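One way to act on this in a test script is to skip the quota configuration step when the target was mounted with `rdonly_dev` (the mount option shown in the dmesg above). This is only an illustrative sketch under that assumption: `has_rdonly_dev` and `quota_action` are hypothetical names, and it is not a description of the change in https://review.whamcloud.com/32900, whose contents are not shown in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the comma-separated mount option
# string in $1 contains the rdonly_dev option.
has_rdonly_dev() {
    case ",$1," in
    *,rdonly_dev,*) return 0 ;;
    *) return 1 ;;
    esac
}

# Hypothetical decision function: print what a setup step would do
# instead of actually calling lctl, so the sketch runs anywhere.
quota_action() {
    if has_rdonly_dev "$1"; then
        echo "skip"        # read-only target: do not call conf_param
    else
        echo "conf_param"  # writable target: quota type may be set
    fi
}

quota_action "rdonly_dev"
quota_action "user_xattr,errors=remount-ro"
```

Matching on `,rdonly_dev,` with added delimiters avoids false positives from option names that merely contain the same substring.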
| Comment by Gerrit Updater [ 30/Jul/18 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32900 |
| Comment by Gerrit Updater [ 09/Aug/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32900/ |
| Comment by James Nunez (Inactive) [ 10/Aug/18 ] |
|
Patch landed to master |