Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Description
This issue was created by maloo for eaujames <eaujames@ddn.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/bdc21484-cfd2-46d9-a0a9-a46d1a5682b2
test_19 failed with the following error:
test_19 returned 1
This failure appears in a review-ldiskfs-arm session, but the OSS was running on x86_64.
Client test:
CMD: onyx-124vm1 lctl set_param -n osd*.*OS*.force_sync=1
...
CMD: onyx-81vm3 /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
CMD: onyx-81vm3 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: onyx-81vm3 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
CMD: onyx-81vm3 /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
...
Delete is not completed in 37 seconds
CMD: onyx-81vm3 /usr/sbin/lctl get_param osc.*MDT*.sync_*
osc.lustre-OST0000-osc-MDT0000.sync_changes=0
osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0000.sync_in_progress=1
...
CMD: onyx-91vm5.onyx.whamcloud.com runas -u0 -g0 -G0 lfs quota -q /mnt/lustre
running as uid/gid/euid/egid 0/0/0/0, groups: 0
 [lfs] [quota] [-q] [/mnt/lustre]
/usr/lib64/lustre/tests/sanity-sec.sh: line 1293: [21448]: syntax error: operand expected (error token is "[21448]")
CMD: onyx-81vm3 /usr/sbin/lctl nodemap_del c0
CMD: onyx-81vm3 /usr/sbin/lctl nodemap_del c1
CMD: onyx-81vm3 /usr/sbin/lctl nodemap_modify --name default --property admin --value 0
CMD: onyx-81vm3 /usr/sbin/lctl get_param -n nodemap.active
CMD: onyx-81vm3 /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
On MGS 10.240.26.198, default.admin_nodemap = nodemap.default.admin_nodemap=0
CMD: onyx-124vm1 /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
onyx-124vm1: ssh: connect to host onyx-124vm1 port 22: No route to host
pdsh@onyx-91vm5: onyx-124vm1: ssh exited with exit code 255
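For readers unfamiliar with the polling seen in the client log, the following is a minimal sketch of the kind of check that repeated `lctl get_param -n osc.*MDT*.sync_*` calls appear to support: the delete is treated as complete only when every sync_* counter is zero. The helper name `sync_counters_idle` is hypothetical, and the actual logic in the Lustre test framework may differ. The sample values are copied from the log after "Delete is not completed in 37 seconds".

```shell
#!/bin/bash
# Hypothetical helper (not taken from the Lustre test framework): reads
# "name=value" lines, as printed by `lctl get_param osc.*MDT*.sync_*`,
# on stdin and succeeds only if all sync_* counters sum to zero.
sync_counters_idle() {
    awk -F= '
        /sync_/ { total += $2 }
        END { exit (total == 0 ? 0 : 1) }
    '
}

# Counter values copied from the client log in this report.
sample='osc.lustre-OST0000-osc-MDT0000.sync_changes=0
osc.lustre-OST0000-osc-MDT0000.sync_in_flight=0
osc.lustre-OST0000-osc-MDT0000.sync_in_progress=1'

if printf '%s\n' "$sample" | sync_counters_idle; then
    echo "idle"
else
    echo "still syncing"
fi
```

With the logged values the check does not pass, because sync_in_progress is still 1 for lustre-OST0000-osc-MDT0000.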
MDS dmesg:
[18117.644399] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param -n os[cd]*.*MD*.force_sync 1
[18118.434329] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
[18120.214180] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n osc.*MDT*.sync_*
....
[18133.582033] Lustre: 11600:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1649444731/real 1649444731] req@000000009702997f x1729549387801536/t0(0) o13->lustre-OST0006-osc-MDT0000@10.240.30.32@tcp:7/4 lens 224/368 e 0 to 1 dl 1649444738 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-pre-6-0.0'
[18133.582035] Lustre: 11599:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1649444731/real 1649444731] req@00000000f3577720 x1729549387801600/t0(0) o13->lustre-OST0002-osc-MDT0000@10.240.30.32@tcp:7/4 lens 224/368 e 0 to 1 dl 1649444738 ref 1 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'osp-pre-2-0.0'
[18133.582063] Lustre: lustre-OST0002-osc-MDT0000: Connection to lustre-OST0002 (at 10.240.30.32@tcp) was lost; in progress operations using this service will wait for recovery to complete
OST dmesg:
[18108.151196] Lustre: DEBUG MARKER: /usr/sbin/lctl list_nids | grep -w tcp | cut -f 1 -d @
[18108.941076] Lustre: DEBUG MARKER: /usr/sbin/lctl get_param nodemap.c0.trusted_nodemap
[18115.611844] Lustre: DEBUG MARKER: lctl set_param -n osd*.*OS*.force_sync=1
No messages are logged after "force_sync=1" on the OST, but no crash has been reported. The OSS seems to have disappeared.
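The "syntax error: operand expected" message from sanity-sec.sh line 1293 in the client log looks like a secondary effect rather than the root cause. A plausible reading, stated here as an assumption, is that the value "[21448]" came from `lfs quota` output, which can wrap a figure in brackets when it could not collect data from every target, and that the script then used the bracketed string in an arithmetic context. The sketch below only illustrates that failure mode and one way to guard against it; `strip_brackets` is a hypothetical helper, not a function from the test framework.

```shell
#!/bin/bash
# Hypothetical helper: remove one leading "[" and one trailing "]" so that
# a bracketed value can be used in bash arithmetic.
strip_brackets() {
    local v=$1
    v=${v#\[}
    v=${v%\]}
    printf '%s\n' "$v"
}

# Value taken from the error message in the client log.
raw='[21448]'

# Using "$raw" directly in (( ... )) or [[ ... -gt ... ]] is expected to
# fail with an arithmetic syntax error, because "[21448]" is not a number.
used=$(strip_brackets "$raw")

if (( used > 0 )); then
    echo "usage: $used"
fi
```

Whether the test should tolerate a bracketed (possibly inaccurate) value at all is a separate question; with the OSS unreachable, failing the test is arguably the right outcome.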
Hard reset? Misconfiguration of kdumpd? Network issues?
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-sec test_19 - test_19 returned 1