LU-6538: Clients getting "operation ost_write failed with -3."

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.5.3
    • Linux meerkat-mds-10-1.local 2.6.32-431.29.2.el6_lustre.gb8d9077.x86_64 #1 SMP Mon Apr 27 12:55:47 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux

    Description

      Some clients can't write to some OSTs.

      [user@gordon-ln3 LTEST]$ cd /oasis/projects/nsf/use300/mahidhar/LTEST
      [user@gordon-ln3 LTEST]$ lfs setstripe -c 1 -i 54 .
      [user@gordon-ln3 LTEST]$ ls
      IOR-2.10.3.tar
      [user@gordon-ln3 LTEST]$ rm IOR-2.10.3.tar
      [user@gordon-ln3 LTEST]$ cp ../IOR-2.10.3.tar .
      cp: writing `./IOR-2.10.3.tar': No such process
      [user@gordon-ln3 LTEST]$
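The "No such process" text printed by cp is the standard Linux message for errno 3 (ESRCH), which is the same code as the -3 in the client log, and the -5 in the server log is EIO. A minimal sketch of that mapping follows; the function name `errno_name` is hypothetical, and the table deliberately covers only the two codes that appear in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: map the negative return codes seen in this ticket
# to their Linux errno names. Only the two codes that appear here are listed.
errno_name() {
    code=${1#-}            # accept "-3" or "3"
    case "$code" in
        3) echo "ESRCH (No such process)" ;;
        5) echo "EIO (Input/output error)" ;;
        *) echo "unknown ($code)" ;;
    esac
}

errno_name -3
errno_name -5
```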

      Client dmesg shows:

      LustreError: 11-0: meerkat-OST0015-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx244@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000e-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000d-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0024-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST002a-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0009-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000e-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000d-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0024-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST002a-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0009-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0026-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0026-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0025-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0017-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0031-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0025-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0031-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000c-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
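To see which OSTs are affected most often, the client messages above can be tallied per OST. The following is only a sketch: the function name `count_ost_write_failures` is hypothetical, and it assumes the log lines have exactly the shape shown in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: read client dmesg text on stdin and print
# "<count> <OST name>" for every OST that logged an ost_write failure,
# most frequent first.
count_ost_write_failures() {
    grep 'operation ost_write failed' |
        sed -n 's/.*: \([^ ]*-OST[0-9a-fA-F]*\)-osc-.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Example with three lines taken from the log above.
count_ost_write_failures <<'EOF'
LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
LustreError: 11-0: meerkat-OST000e-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
EOF
```

For the three sample lines this prints `2 meerkat-OST0036` followed by `1 meerkat-OST000e`. On a live client the input could instead come from `dmesg`.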

      For the server logs, see the attachments.

      Attachments

        1. aggregated_messages_4-27.gz
          2.82 MB
          Haisong Cai
        2. debugfs_out
          13 kB
          Haisong Cai
        3. lustre_messages_today
          166 kB
          Haisong Cai
        4. proc_out
          13 kB
          Haisong Cai

        Activity

          Haisong, is there anything else we need to do on this ticket? Can we close it?

          Niu Yawei (Inactive) added a comment

          Niu,

          A quick test on one of the problematic OSTs shows that this fixes the write problem.
          We are going to reinitialize the quota files on all OSTs and the MDT now.

          Haisong

          Haisong Cai (Inactive) added a comment

          I'm not sure whether the latest e2fsprogs was installed when you enabled the quota feature on the OST/MDT devices (during the upgrade from 2.1 to 2.4). If it wasn't the latest version, I suggest running the procedure on all MDT/OSTs. Of course, you should try a problematic one first to see whether it fixes the problem.

          Niu Yawei (Inactive) added a comment

          [root@meerkat-oss-11-4 log]# umount /meerkat/ost_sdi
          [root@meerkat-oss-11-4 log]# tune2fs -O ^quota /dev/ost_sdi
          tune2fs 1.42.12.wc1 (15-Sep-2014)
          [root@meerkat-oss-11-4 log]# tune2fs -O quota /dev/ost_sdi
          tune2fs 1.42.12.wc1 (15-Sep-2014)
          [root@meerkat-oss-11-4 log]# mount -t lustre /dev/ost_sdi /meerkat/ost_sdi
          mount.lustre: set /sys/block/sdi/queue/max_sectors_kb to 280

          [root@meerkat-oss-11-4 log]#

          Testing now; I will let you know whether it fixes the problem.

          Haisong Cai (Inactive) added a comment

          By the way....

          [root@meerkat-oss-11-4 log]# rpm -aq | grep e2fsprog
          e2fsprogs-libs-1.42.12.wc1-7.el6.x86_64
          e2fsprogs-1.42.12.wc1-7.el6.x86_64

          Haisong Cai (Inactive) added a comment
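The installed e2fsprogs version can be compared against the one recommended elsewhere in this thread (1.42.12-wc1). A minimal sketch, assuming GNU `sort` with its `-V` version-sort option is available; the function name `version_at_least` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# On a real server the installed version could come from:
#   rpm -q --qf '%{VERSION}\n' e2fsprogs
installed="1.42.12.wc1"   # value reported in this ticket
required="1.42.12.wc1"    # version named in the recommendation

if version_at_least "$installed" "$required"; then
    echo "e2fsprogs $installed is new enough"
else
    echo "e2fsprogs $installed is older than $required"
fi
```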

          On all OSTs, or just the ones reporting problems?

          Thanks,
          Haisong

          Haisong Cai (Inactive) added a comment

          Aggregated logs from all MDS/OSS nodes, covering the period before and after the 2.4.3 -> 2.5.3 upgrade.
          We mounted the OSTs/MDT sometime between 3pm and 4pm that day.

          Haisong Cai (Inactive) added a comment

          It looks like the quota accounting files are corrupted somehow. Could you make sure that the latest e2fsprogs (1.42.12-wc1) is installed on the problematic OSTs?
          Then try to fix the quota files as follows:

          • umount the OST;
          • tune2fs -O ^quota $ostdev;
          • tune2fs -O quota $ostdev;
          • mount the OST.

          Niu Yawei (Inactive) added a comment
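The four steps above can be wrapped in a small script so they run in the same order on every target. This is a sketch under stated assumptions, not a vetted procedure: the function name `reset_quota_feature` and the `RUN` switch are hypothetical, the script only prints the commands unless `RUN=1` is set, and it assumes the device and mount point paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical wrapper around the four steps listed above.
# By default it only prints the commands (dry run); set RUN=1 to execute.
# The device and mount point arguments are supplied by the caller.
reset_quota_feature() {
    dev=$1
    mnt=$2
    for cmd in \
        "umount $mnt" \
        "tune2fs -O ^quota $dev" \
        "tune2fs -O quota $dev" \
        "mount -t lustre $dev $mnt"
    do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1     # stop at the first failing step
        else
            echo "would run: $cmd"
        fi
    done
}

# Dry run using the device and mount point from this ticket.
reset_quota_feature /dev/ost_sdi /meerkat/ost_sdi
```

Keeping the dry run as the default means the command list can be reviewed before anything is unmounted.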

          Last few lines from dmesg on 11-4:

          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          LustreError: 4787:0:(qsd_handler.c:1155:qsd_op_adjust()) meerkat-OST000e: fail to locate lqe for id:505418, type:0
          LustreError: 4787:0:(qsd_handler.c:1155:qsd_op_adjust()) Skipped 9787 previous similar messages
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          Lustre: 12307:0:(qsd_reint.c:486:qsd_reint_main()) meerkat-OST0036: reint global for [0x200000006:0x20000:0x0] failed. -3
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          LustreError: 4787:0:(qsd_handler.c:1155:qsd_op_adjust()) meerkat-OST000e: fail to locate lqe for id:505418, type:0
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.
          VFS: find_free_dqentry(): Data block full but it shouldn't.
          VFS: Error -5 occurred while creating quota.

          Haisong Cai (Inactive) added a comment
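A server log like the one above can be condensed into a count of the VFS quota errors plus the list of OSTs named in the Lustre quota (qsd) messages. A minimal sketch, assuming the message formats match those shown in this ticket; the function name `summarize_quota_errors` and the output labels are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read server dmesg text on stdin and print
#   - one "ost=<name>" line per OST named in a Lustre qsd_* message
#   - the number of "VFS: Error -5 occurred while creating quota." lines
summarize_quota_errors() {
    awk '
        /VFS: Error -5 occurred while creating quota/ { vfs++ }
        /\(qsd_[a-z_]*\.c:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /-OST[0-9a-fA-F]+:$/) {
                    name = $i
                    sub(/:$/, "", name)
                    osts[name] = 1
                }
        }
        END {
            printf "vfs_quota_errors=%d\n", vfs
            for (name in osts) print "ost=" name
        }
    ' | LC_ALL=C sort
}

# Example with four lines taken from the log above.
summarize_quota_errors <<'EOF'
VFS: Error -5 occurred while creating quota.
LustreError: 4787:0:(qsd_handler.c:1155:qsd_op_adjust()) meerkat-OST000e: fail to locate lqe for id:505418, type:0
VFS: Error -5 occurred while creating quota.
Lustre: 12307:0:(qsd_reint.c:486:qsd_reint_main()) meerkat-OST0036: reint global for [0x200000006:0x20000:0x0] failed. -3
EOF
```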

          Tried unmounting and remounting a bad OST:

          [root@meerkat-oss-11-4 ~]# mount -t lustre /dev/ost_sdi /meerkat/ost_sdi
          mount.lustre: set /sys/block/sdi/queue/max_sectors_kb to 280

          [root@meerkat-oss-11-4 ~]#

          Haisong Cai (Inactive) added a comment

          People

            niu Niu Yawei (Inactive)
            haisong Haisong Cai (Inactive)