[LU-10940] sanity test_802: set mdt quota type failed Created: 23/Apr/18  Updated: 10/Aug/18  Resolved: 10/Aug/18

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Hongchao Zhang
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/d3a87ae4-471b-11e8-95c0-52540065bddc

test_802 failed with the following error:

set mdt quota type failed

This failure seems to have started showing up with 2.11.50.51, b3738, on April 9, 2018.

test log

CMD: trevis-4vm4 lctl get_param -n timeout
Using TIMEOUT=20
CMD: trevis-4vm4 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: trevis-4vm1.trevis.hpdd.intel.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
error: get_param: param_path 'mdc/*/connect_flags': No such file or directory
jobstats not supported by server
enable quota as required
CMD: trevis-4vm4 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
CMD: trevis-4vm3 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
[HOST:trevis-4vm1.trevis.hpdd.intel.com] [old_mdt_qtype:ug] [old_ost_qtype:ug] [new_qtype:ug3]
CMD: trevis-4vm4 /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
trevis-4vm4: error: conf_param: Read-only file system
 sanity test_802: @@@@@@ FAIL: set mdt quota type failed 
 Trace dump:

MDS dmesg

[ 7400.522030] Lustre: DEBUG MARKER: SKIP: sanity test_801c
[ 7400.803247] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 802: simulate readonly device ========================================================= 23:28:21 \(1524439701\)
[ 7400.993579] Lustre: DEBUG MARKER: == sanity test 802: simulate readonly device ========================================================= 23:28:21 (1524439701)
[ 7401.164206] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n version 2>/dev/null ||
 /usr/sbin/lctl lustre_build_version 2>/dev/null ||
 /usr/sbin/lctl --version 2>/dev/null | cut -d' ' -f2
[ 7401.912727] Lustre: DEBUG MARKER: lctl set_param -n os[cd]*.*MDT*.force_sync=1
[ 7405.691947] Lustre: DEBUG MARKER: lctl set_param -n os[cd]*.*MDT*.force_sync=1
[ 7407.328094] Lustre: DEBUG MARKER: grep -c /mnt/lustre-mds1' ' /proc/mounts || true
[ 7407.639629] Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds1
[ 7412.735986] Lustre: lustre-MDT0000: Not available for connect from 10.9.4.31@tcp (stopping)
[ 7412.738161] Lustre: Skipped 3 previous similar messages
[ 7417.729349] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.4.31@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 7417.733806] LustreError: Skipped 15 previous similar messages
[ 7419.993052] Lustre: 7085:0:(client.c:2099:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1524439714/real 1524439714] req@ffff880061347900 x1598483208664528/t0(0) o251->MGC10.9.4.32@tcp@0@lo:26/25 lens 224/224 e 0 to 1 dl 1524439720 ref 2 fl Rpc:XN/0/ffffffff rc 0/-1
[ 7420.027093] Lustre: server umount lustre-MDT0000 complete
[ 7420.199653] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
lctl dl | grep ' ST ' || true
[ 7420.520293] Lustre: DEBUG MARKER: modprobe dm-flakey;
 dmsetup targets | grep -q flakey
[ 7432.809742] Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
mpts=$(mount | grep -c /mnt/lustre-mds1' ');
if [ $running -ne $mpts ]; then
 echo $(hostname) env are INSANE!;
 exit 1;
fi
[ 7433.175430] Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre-mds1' ' /proc/mounts);
mpts=$(mount | grep -c /mnt/lustre-mds1' ');
if [ $running -ne $mpts ]; then
 echo $(hostname) env are INSANE!;
 exit 1;
fi
[ 7434.341465] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1
[ 7434.650345] Lustre: DEBUG MARKER: modprobe dm-flakey;
 dmsetup targets | grep -q flakey
[ 7434.951416] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
[ 7435.251720] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1
[ 7435.550827] Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
[ 7435.846295] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
[ 7436.141605] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -o rdonly_dev /dev/mapper/mds1_flakey /mnt/lustre-mds1
[ 7436.313923] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
[ 7436.317692] Turning device dm-3 (0xfc00003) read-only
[ 7436.319501] Lustre: lustre-MDT0000-osd: set dev_rdonly on this device
[ 7436.395144] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
[ 7436.566211] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n health_check
[ 7436.878021] Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
[ 7437.467981] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.468228] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.655837] Lustre: DEBUG MARKER: trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.655862] Lustre: DEBUG MARKER: trevis-4vm4.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7437.827912] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[ 7438.131105] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]\{3}[0-9]\{4}'
[ 7438.434939] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null | grep -E ':[a-zA-Z]\{3}[0-9]\{4}'
[ 7438.762419] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey 2>/dev/null
[ 7439.087312] Lustre: DEBUG MARKER: lctl set_param -n mdt.lustre*.enable_remote_dir=1
[ 7442.556447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7442.755098] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7447.098744] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7447.288273] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7451.644719] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7451.830105] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7456.196503] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7456.388152] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7460.757075] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7460.957519] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7465.345849] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7465.538568] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7469.992735] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7470.193712] Lustre: DEBUG MARKER: trevis-4vm3.trevis.hpdd.intel.com: executing set_default_debug vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck all 4
[ 7471.395975] Lustre: DEBUG MARKER: lctl get_param -n timeout
[ 7471.796323] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Using TIMEOUT=20
[ 7471.989513] Lustre: DEBUG MARKER: Using TIMEOUT=20
[ 7472.150495] Lustre: DEBUG MARKER: lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
[ 7472.495736] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
[ 7473.143673] Lustre: DEBUG MARKER: /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
[ 7473.296428] LustreError: 9740:0:(osd_handler.c:1689:osd_trans_create()) lustre-MDT0000: someone try to start transaction under readonly mode, should be disabled.
[ 7473.302031] CPU: 0 PID: 9740 Comm: llog_process_th Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
[ 7473.307279] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 7473.309907] Call Trace:
[ 7473.312287] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[ 7473.314849] [<ffffffffc0d2ea9c>] osd_trans_create+0x5cc/0x610 [osd_ldiskfs]
[ 7473.317607] [<ffffffffc0877c71>] llog_write+0x91/0x3d0 [obdclass]
[ 7473.320207] [<ffffffffc0db012a>] mgs_modify_handler+0x36a/0x440 [mgs]
[ 7473.322805] [<ffffffffc08759c9>] llog_process_thread+0x839/0x1560 [obdclass]
[ 7473.325492] [<ffffffffc089fc19>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 7473.328172] [<ffffffffc08770ff>] llog_process_thread_daemonize+0x9f/0xe0 [obdclass]
[ 7473.330884] [<ffffffffc0877060>] ? llog_backup+0x500/0x500 [obdclass]
[ 7473.333483] [<ffffffff810b4031>] kthread+0xd1/0xe0
[ 7473.335897] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 7473.338389] [<ffffffff816c0577>] ret_from_fork+0x77/0xb0
[ 7473.340792] [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
[ 7473.343238] LustreError: 9739:0:(mgs_llog.c:954:mgs_modify()) MGS: modify lustre/quota.mdt failed: rc = -30
[ 7473.345910] LustreError: 9739:0:(mgs_llog.c:1940:mgs_write_log_direct_all()) MGS: Can't modify llog lustre-MDT0000: rc = -30
[ 7473.348694] CPU: 1 PID: 9739 Comm: lctl Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1
[ 7473.351406] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 7473.353790] Call Trace:
[ 7473.355798] [<ffffffff816ae7c8>] dump_stack+0x19/0x1b
[ 7473.358052] [<ffffffffc0d2ea9c>] osd_trans_create+0x5cc/0x610 [osd_ldiskfs]
[ 7473.360387] [<ffffffffc0877c71>] llog_write+0x91/0x3d0 [obdclass]
[ 7473.362665] [<ffffffffc0dad80e>] record_marker+0x15e/0x2b0 [mgs]
[ 7473.364843] [<ffffffffc0dae9f2>] mgs_write_log_direct+0xe2/0x2d0 [mgs]
[ 7473.367092] [<ffffffffc0dbd6cb>] mgs_write_log_direct_all+0x38b/0x640 [mgs]
[ 7473.369279] [<ffffffffc0dd06ea>] mgs_write_log_quota+0x2d7/0x31d [mgs]
[ 7473.371448] [<ffffffffc0dbe4bb>] mgs_write_log_param+0x5ab/0x1e30 [mgs]
[ 7473.373529] [<ffffffffc0dbfd87>] ? mgs_find_fsdb+0x47/0x70 [mgs]
[ 7473.375591] [<ffffffffc0dc2677>] ? mgs_find_or_make_fsdb+0x67/0x1c0 [mgs]
[ 7473.377614] [<ffffffffc0dc6d6c>] mgs_set_param+0xabc/0xd40 [mgs]
[ 7473.379604] [<ffffffffc0dac23a>] mgs_iocontrol+0xd2a/0xde0 [mgs]
[ 7473.381507] [<ffffffffc088aae3>] class_handle_ioctl+0x18d3/0x1de0 [obdclass]
[ 7473.383517] [<ffffffff811b1f16>] ? do_read_fault.isra.44+0xe6/0x130
[ 7473.385376] [<ffffffff812b72be>] ? security_capable+0x1e/0x20
[ 7473.387227] [<ffffffffc086f802>] obd_class_ioctl+0xd2/0x170 [obdclass]
[ 7473.389074] [<ffffffff81219e90>] do_vfs_ioctl+0x350/0x560
[ 7473.390832] [<ffffffff816bb521>] ? __do_page_fault+0x171/0x450
[ 7473.392525] [<ffffffff8121a141>] SyS_ioctl+0xa1/0xc0
[ 7473.394199] [<ffffffff816c0655>] ? system_call_after_swapgs+0xa2/0x146
[ 7473.395942] [<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[ 7473.397679] [<ffffffff816c0661>] ? system_call_after_swapgs+0xae/0x146
[ 7473.399422] LustreError: 9739:0:(mgs_llog.c:1948:mgs_write_log_direct_all()) MGS: writing log lustre-MDT0000: rc = -30
[ 7473.401661] CPU: 0 PID: 9741 Comm: llog_process_th Tainted: G OE ------------ 3.10.0-693.21.1.el7_lustre.x86_64 #1

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_802 - set mdt quota type failed



 Comments   
Comment by Peter Jones [ 02/May/18 ]

Hongchao

Could you please investigate?

Thanks

Peter

Comment by Andreas Dilger [ 02/May/18 ]

This failure seems to have started showing up with 2.11.50.51, b3738, on April 9, 2018.

It probably makes sense to see which quota-related patches landed just before then.

Comment by James Nunez (Inactive) [ 28/Jul/18 ]

sanity test 802 fails when we run the ‘full’ test group but passes in all other testing (review-ldiskfs, review-dne, etc.). One difference between ‘full’ and all other testing is that we enable quotas for all test suites in ‘full’ testing and do not enable them anywhere else.

For sanity test 802, we stop all servers and then mount them read-only. When the servers come back up in read-only mode, we try to reset quotas in setup_quota(), and the following call to conf_param on the MGS fails:

        do_facet mgs $LCTL conf_param $FSNAME.quota.mdt=$QUOTA_TYPE ||
                error "set mdt quota type failed"
        do_facet mgs $LCTL conf_param $FSNAME.quota.ost=$QUOTA_TYPE ||
                error "set ost quota type failed"

One question is whether Lustre is behaving properly by refusing conf_param calls when a server is read-only. More specifically, should we be able to set quotas by calling conf_param on a read-only server?

Comment by Andreas Dilger [ 28/Jul/18 ]

Since enabling quota requires changes to the filesystem on the targets, it doesn't make sense to enable it on a read-only filesystem.

Comment by Gerrit Updater [ 30/Jul/18 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32900
Subject: LU-10940 tests: skip sanity test 802 when quota enabled
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: acf177a62c538cf8517a697fe57a20f340de5538
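
A minimal sketch of the kind of guard the patch subject describes. It is not the patch itself, whose contents are not quoted in this ticket. It assumes the test run signals "quotas enabled" through a non-empty ENABLE_QUOTA variable. The skip() and run_test_802() functions are hypothetical stand-ins, not the real test-framework.sh helpers.

```shell
# Hypothetical stand-in for the framework's skip helper:
# print the reason and report success so the suite moves on.
skip() {
	echo "SKIP: $*"
	return 0
}

# Assumed guard: bail out before the read-only remount when the run
# was configured with quotas enabled (ENABLE_QUOTA non-empty), since
# setup_quota() would then call conf_param against a read-only MGS.
run_test_802() {
	if [ -n "${ENABLE_QUOTA:-}" ]; then
		skip "Quota enabled for read-only test"
		return 0
	fi
	# Placeholder for the real test body (remount with rdonly_dev, etc.).
	echo "RUN: simulate readonly device"
}
```

With ENABLE_QUOTA set to any non-empty value the function prints the SKIP line and returns 0. With it unset or empty, the placeholder test body runs.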

Comment by Gerrit Updater [ 09/Aug/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32900/
Subject: LU-10940 tests: skip sanity test 802 when quota enabled
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ddb3d0b61ded0b9507baa25de08a2d51af17b284

Comment by James Nunez (Inactive) [ 10/Aug/18 ]

Patch landed to master
