  Lustre / LU-13323

sanity test_160a: timeout


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Severity: 3

    Description

      This issue was created by maloo for S Buisson <sbuisson@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/f0d9c514-dca5-4f09-8180-068456c0d0c8

      test_160a failed with the following error:

      Timeout occurred after 160 mins, last suite running was sanity
      

      After MDT failover, it seems the clients never reconnect.
      On the MDS side:

      [ 5987.459946] Lustre: DEBUG MARKER: umount -d /mnt/lustre-mds1
      [ 5987.641616] Lustre: Failing over lustre-MDT0000
      [ 5988.179879] LustreError: 137-5: lustre-MDT0000_UUID: not available for connect from 10.9.5.43@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 5988.182823] LustreError: Skipped 6 previous similar messages
      [ 5993.848997] Lustre: 25449:0:(client.c:2228:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583256866/real 1583256866]  req@ffff939a9e2c4900 x1660159018546432/t0(0) o251->MGC10.9.6.10@tcp@10.9.6.10@tcp:26/25 lens 224/224 e 0 to 1 dl 1583256872 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'umount.0'
      [ 5994.048821] Lustre: server umount lustre-MDT0000 complete
      [ 5994.292111] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
      [ 5994.292111] lctl dl | grep ' ST ' || true
      [ 5994.650156] Lustre: DEBUG MARKER: modprobe dm-flakey;
      [ 5994.650156] 			 dmsetup targets | grep -q flakey
      [ 5995.000133] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1
      [ 5995.350091] Lustre: DEBUG MARKER: modprobe dm-flakey;
      [ 5995.350091] 			 dmsetup targets | grep -q flakey
      [ 5995.696047] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey >/dev/null 2>&1
      [ 5996.039360] Lustre: DEBUG MARKER: dmsetup status /dev/mapper/mds1_flakey 2>&1
      [ 5996.384483] Lustre: DEBUG MARKER: test -b /dev/mapper/mds1_flakey
      [ 5996.727194] Lustre: DEBUG MARKER: e2label /dev/mapper/mds1_flakey
      [ 5997.067827] Lustre: DEBUG MARKER: mkdir -p /mnt/lustre-mds1; mount -t lustre -olocalrecov  /dev/mapper/mds1_flakey /mnt/lustre-mds1
      [ 5997.294752] LDISKFS-fs (dm-3): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
      [ 5997.326213] Lustre: osd-ldiskfs create tunables for lustre-MDT0000
      [ 5997.721570] Lustre: lustre-MDT0000: Imperative Recovery not enabled, recovery window 60-180
      [ 5997.773182] Lustre: lustre-MDD0000: changelog on
      [ 5997.774238] Lustre: 26177:0:(mdd_device.c:545:mdd_changelog_llog_init()) lustre-MDD0000 : orphan changelog records found, starting from index 0 to index 16, being cleared now
      [ 5997.778242] Lustre: lustre-MDT0000: in recovery but waiting for the first client to connect
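
      For reference, the failover sequence driven on the MDS by the test framework (reconstructed from the DEBUG MARKER lines above; the mds1_flakey device and /mnt/lustre-mds1 mount point are specific to this run) is roughly:

      # Stop the primary MDT (facet failover).
      umount -d /mnt/lustre-mds1
      # Re-create the dm-flakey mapping used by the test framework, then remount the MDT.
      modprobe dm-flakey
      dmsetup targets | grep -q flakey
      dmsetup status /dev/mapper/mds1_flakey
      mkdir -p /mnt/lustre-mds1
      mount -t lustre -o localrecov /dev/mapper/mds1_flakey /mnt/lustre-mds1
      # At this point the MDT logs "in recovery but waiting for the first client to connect".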
      

      On client1:

      [ 6075.180830] Lustre: lustre-MDT0000-mdc-ffff9513656af800: Connection to lustre-MDT0000 (at 10.9.6.10@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [ 6082.181294] Lustre: 19983:0:(client.c:2228:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583256870/real 1583256870]  req@ffff951364fd4480 x1660164743953984/t0(0) o400->MGC10.9.6.10@tcp@10.9.6.10@tcp:26/25 lens 224/224 e 0 to 1 dl 1583256877 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/0:1.0'
      [ 6082.186243] LustreError: 166-1: MGC10.9.6.10@tcp: Connection to MGS (at 10.9.6.10@tcp) was lost; in progress operations using this service will fail
      

      On client2:

      [ 6072.132648] Lustre: lustre-MDT0000-mdc-ffff92943a687000: Connection to lustre-MDT0000 (at 10.9.6.10@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [ 6079.133314] Lustre: 11180:0:(client.c:2228:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1583256867/real 1583256867]  req@ffff929423be0900 x1660159108778048/t0(0) o400->MGC10.9.6.10@tcp@10.9.6.10@tcp:26/25 lens 224/224 e 0 to 1 dl 1583256874 ref 1 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:'kworker/1:2.0'
      [ 6079.136248] LustreError: 166-1: MGC10.9.6.10@tcp: Connection to MGS (at 10.9.6.10@tcp) was lost; in progress operations using this service will fail
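
      To check from a client whether the MDC import ever recovers after the failover, a minimal sketch (assuming standard Lustre client tools; the import name matches the target shown in the messages above):

      # Import state of the MDT connection: FULL once reconnected,
      # DISCONN/CONNECTING while recovery is still pending.
      lctl get_param mdc.lustre-MDT0000-mdc-*.import
      # List configured client devices and their status.
      lctl dl
      # Verify the MDS NID is still reachable over LNet.
      lctl ping 10.9.6.10@tcp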
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_160a - Timeout occurred after 160 mins, last suite running was sanity


          People

            Assignee: WC Triage
            Reporter: Maloo
