Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.11.0
-
3
-
9223372036854775807
Description
sanity test_133g fails on the call to find on the MDS. From the test_log, we see the call to find
mds1_proc_dirs='/proc/fs/lustre/ /proc/sys/lnet/ /sys/fs/lustre/ /sys/kernel/debug/lnet/ /sys/kernel/debug/lustre/' sanity test_133g: @@@@@@ FAIL: mds1 find /proc/fs/lustre/ /proc/sys/lnet/ /sys/fs/lustre/ /sys/kernel/debug/lnet/ /sys/kernel/debug/lustre/ failed
Looking at the client2 dmesg, we see the output from 133g
[ 3573.106070] Lustre: DEBUG MARKER: == sanity test 133g: Check for Oopses on bad io area writes/reads in /proc =========================== 21:44:29 (1504907069) [ 3573.249527] Lustre: 18911:0:(lprocfs_status.c:2483:lprocfs_wr_root_squash()) lustre: failed to set root_squash due to bad address, rc = -14 [ 3573.254669] Lustre: 18911:0:(lprocfs_status.c:2479:lprocfs_wr_root_squash()) lustre: failed to set root_squash to "", needs uid:gid format, rc = -22 [ 3573.263113] Lustre: 18916:0:(lprocfs_status.c:2551:lprocfs_wr_nosquash_nids()) lustre: failed to set nosquash_nids to "", bad address rc = -14 [ 3573.268474] Lustre: 18916:0:(lprocfs_status.c:2555:lprocfs_wr_nosquash_nids()) lustre: failed to set nosquash_nids due to string too long rc = -22 [ 3573.349716] LustreError: 18934:0:(gss_cli_upcall.c:240:gss_do_ctx_init_rpc()) ioctl size 5, expect 80, please check lgss_keyring version [ 3573.379338] LustreError: 18984:0:(ldlm_resource.c:355:lru_size_store()) lru_size: invalid value written [ 3573.422121] Lustre: 19067:0:(libcfs_string.c:127:cfs_str2mask()) unknown mask ' '. mask usage: [+|-]<all|type> ... [ 3574.553074] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_133g: @@@@@@ FAIL: mds1 find \/proc\/fs\/lustre\/
Looking at the dmesg log on the MDS1, we see similar output, but a few more lines
[ 3479.763450] Lustre: DEBUG MARKER: find /proc/fs/lustre/ /proc/sys/lnet/ /sys/fs/lustre/ /sys/kernel/debug/lnet/ /sys/kernel/debug/lustre/ -type f -not -name force_lbug -not -name changelog_mask -exec badarea_io {} \; [ 3479.911266] Lustre: 17021:0:(mdt_coordinator.c:1944:mdt_hsm_policy_seq_write()) lustre-MDT0000: ' ' is unknown, supported policies are: [ 3479.944620] LustreError: 17069:0:(mdt_coordinator.c:2097:mdt_hsm_cdt_control_seq_write()) lustre-MDT0000: Valid coordinator control commands are: enabled shutdown disabled purge help [ 3479.950356] Lustre: 17071:0:(lprocfs_status.c:2483:lprocfs_wr_root_squash()) lustre-MDT0000: failed to set root_squash due to bad address, rc = -14 [ 3479.954690] Lustre: 17071:0:(lprocfs_status.c:2479:lprocfs_wr_root_squash()) lustre-MDT0000: failed to set root_squash to "", needs uid:gid format, rc = -22 [ 3479.960431] LustreError: 17072:0:(genops.c:1540:obd_export_evict_by_uuid()) lustre-MDT0000: can't disconnect : no exports found [ 3479.965980] LustreError: 17074:0:(mdt_lproc.c:366:lprocfs_identity_info_seq_write()) lustre-MDT0000: invalid data count = 5, size = 1048 [ 3479.970389] LustreError: 17074:0:(mdt_lproc.c:383:lprocfs_identity_info_seq_write()) lustre-MDT0000: MDS identity downcall bad params [ 3479.975746] Lustre: 17075:0:(lprocfs_status.c:2551:lprocfs_wr_nosquash_nids()) lustre-MDT0000: failed to set nosquash_nids to "", bad address rc = -14 [ 3479.980510] Lustre: 17075:0:(lprocfs_status.c:2555:lprocfs_wr_nosquash_nids()) lustre-MDT0000: failed to set nosquash_nids due to string too long rc = -22 [ 3479.988148] LustreError: 17079:0:(mdt_lproc.c:298:mdt_identity_upcall_seq_write()) lustre-MDT0000: identity upcall too long [ 3480.066568] LustreError: 17187:0:(lproc_fid.c:177:lprocfs_server_fid_width_seq_write()) ctl-lustre-MDT0000: invalid FID sequence width: rc = -14 [ 3480.104133] LustreError: 17240:0:(ldlm_resource.c:104:seq_watermark_write()) Failed to set lock_limit_mb, rc = -14. [ 3480.122289] LustreError: 17250:0:(nodemap_storage.c:420:nodemap_idx_nodemap_update()) cannot add nodemap config to non-existing MGS. [ 3480.128801] LustreError: 17252:0:(nodemap_handler.c:1049:nodemap_create()) cannot add nodemap: ' ': rc = -22 [ 3480.250476] LustreError: 17363:0:(ldlm_resource.c:355:lru_size_store()) lru_size: invalid value written [ 3480.333333] Lustre: 17493:0:(libcfs_string.c:127:cfs_str2mask()) unknown mask ' '. mask usage: [+|-]<all|type> ... [ 3480.468566] Lustre: DEBUG MARKER: /usr/sbin/lctl mark sanity test_133g: @@@@@@ FAIL: mds1 find \/proc\/fs\/lustre\/
This ticket was opened because we are seeing this in testing a separate MGS and MDS. Logs for these failures are at:
https://testing.hpdd.intel.com/test_sets/308ca8f6-97d7-11e7-b944-5254006e85c2
https://testing.hpdd.intel.com/test_sets/774c9f26-97d7-11e7-b944-5254006e85c2
We have also seen this test fail frequently during interop testing.