LU-6538: Clients getting "operation ost_write failed with -3."

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • None
    • Lustre 2.5.3
    • Linux meerkat-mds-10-1.local 2.6.32-431.29.2.el6_lustre.gb8d9077.x86_64 #1 SMP Mon Apr 27 12:55:47 PDT 2015 x86_64 x86_64 x86_64 GNU/Linux
    • 4
    • 9223372036854775807

    Description

      Some clients can't write to some OSTs.

      [user@gordon-ln3 LTEST]$ cd /oasis/projects/nsf/use300/mahidhar/LTEST
      [user@gordon-ln3 LTEST]$ lfs setstripe -c 1 -i 54 .
      [user@gordon-ln3 LTEST]$ ls
      IOR-2.10.3.tar
      [user@gordon-ln3 LTEST]$ rm IOR-2.10.3.tar
      [user@gordon-ln3 LTEST]$ cp ../IOR-2.10.3.tar .
      cp: writing `./IOR-2.10.3.tar': No such process
      [user@gordon-ln3 LTEST]$
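The "No such process" text printed by cp and the -3 in the kernel messages describe the same condition: the kernel reports failures as negative errno values, and errno 3 is ESRCH. A minimal Python sketch of that lookup follows. The helper name describe_rc is made up for illustration, and the message text returned depends on the local C library.

```python
import errno
import os

def describe_rc(rc):
    """Map a negative return code such as -3 to (symbolic name, message)."""
    code = abs(rc)
    # errno.errorcode maps numeric codes to names like 'ESRCH'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message for that code.
    return name, os.strerror(code)

print(describe_rc(-3))
```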

      Client dmesg shows:

      LustreError: 11-0: meerkat-OST0015-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx244@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000e-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000d-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0024-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST002a-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0009-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000e-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000d-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0024-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST002a-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0009-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0026-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0026-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0025-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0017-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0031-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0025-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST0031-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
      LustreError: 11-0: meerkat-OST000c-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
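The OST names in these messages carry the OST index in hexadecimal, so OST0036 is index 54, the same OST selected with lfs setstripe -i 54 in the reproduction. To see which OSTs fail most often, the messages can be tallied per OST. The following is a rough Python sketch, not part of the ticket: the function name and regular expression are illustrative, and SAMPLE holds only a few of the lines above.

```python
import re
from collections import Counter

# A few lines copied from the client dmesg above (not the full log).
SAMPLE = """\
LustreError: 11-0: meerkat-OST0015-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx244@tcp, operation ost_write failed with -3.
LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
LustreError: 11-0: meerkat-OST000e-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
LustreError: 11-0: meerkat-OST0036-osc-ffff88102f998c00: Communicating with xx.xx.xx.xx@tcp, operation ost_write failed with -3.
"""

# Capture the 4-digit hex OST index from names like meerkat-OST0036-osc-...
PATTERN = re.compile(r"-OST([0-9a-fA-F]{4})-osc-.*operation ost_write failed with -\d+")

def tally_failed_osts(text):
    """Count ost_write failures per OST index (as a decimal int)."""
    counts = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts

for index, n in sorted(tally_failed_osts(SAMPLE).items()):
    print(f"OST{index:04x} (index {index}): {n} failure(s)")
```

Running the same function over the full dmesg output would give the complete list of affected OSTs to check on the server side.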

      For the server logs, see the attachments.

      Attachments

        1. aggregated_messages_4-27.gz
          2.82 MB
        2. debugfs_out
          13 kB
        3. lustre_messages_today
          166 kB
        4. proc_out
          13 kB

        Activity


          niu Niu Yawei (Inactive) added a comment:

          Haisong, is there anything else we need to do on this ticket? Can we close it?

          haisong Haisong Cai (Inactive) added a comment:

          Niu,

          A quick test on one of the problematic OSTs shows that this fixes the write problem.
          We are now going to reinitialize the quota feature on all OSTs and the MDT.

          Haisong

          niu Niu Yawei (Inactive) added a comment:

          I'm not sure whether the latest e2fsprogs was installed when you enabled the quota feature on the OST/MDT devices (when upgrading from 2.1 to 2.4). If it wasn't the latest version, I suggest you run it (tune2fs -O ^quota followed by tune2fs -O quota) on all MDT/OSTs. Of course, you should try a problematic one first to see whether that fixes the problem.

          haisong Haisong Cai (Inactive) added a comment:

          [root@meerkat-oss-11-4 log]# umount /meerkat/ost_sdi
          [root@meerkat-oss-11-4 log]# tune2fs -O ^quota /dev/ost_sdi
          tune2fs 1.42.12.wc1 (15-Sep-2014)
          [root@meerkat-oss-11-4 log]# tune2fs -O quota /dev/ost_sdi
          tune2fs 1.42.12.wc1 (15-Sep-2014)
          [root@meerkat-oss-11-4 log]# mount -t lustre /dev/ost_sdi /meerkat/ost_sdi
          mount.lustre: set /sys/block/sdi/queue/max_sectors_kb to 280
          [root@meerkat-oss-11-4 log]#

          Testing now; I will let you know whether it fixes the problem.
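The session above follows a fixed order: unmount the target, clear the quota feature, set it again, then remount as Lustre. As a hedged sketch for repeating this on other targets, the Python below only builds and prints the command lines as a dry run; it executes nothing. The function name quota_reset_commands is hypothetical, and the device and mount point are the ones from this one session, so they must be replaced per target.

```python
import shlex

def quota_reset_commands(device, mountpoint):
    """Return the four commands from the session above as argv lists."""
    return [
        ["umount", mountpoint],                      # take the target offline
        ["tune2fs", "-O", "^quota", device],         # clear the quota feature
        ["tune2fs", "-O", "quota", device],          # enable it again
        ["mount", "-t", "lustre", device, mountpoint],
    ]

# Dry run: print shell-quoted command lines for review, do not execute them.
for argv in quota_reset_commands("/dev/ost_sdi", "/meerkat/ost_sdi"):
    print(shlex.join(argv))
```

Keeping the commands as data makes it easy to review the exact order for each OST or MDT before anything is run by hand.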

          haisong Haisong Cai (Inactive) added a comment:

          By the way, these are the installed e2fsprogs packages:

          [root@meerkat-oss-11-4 log]# rpm -aq | grep e2fsprog
          e2fsprogs-libs-1.42.12.wc1-7.el6.x86_64
          e2fsprogs-1.42.12.wc1-7.el6.x86_64
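When checking e2fsprogs across many servers, it can help to split the rpm output into name, version, release, and architecture so hosts can be compared. The Python sketch below assumes the usual name-version-release.arch layout seen in the two lines above; the function name is illustrative, and no "required" version is encoded because the ticket does not state one.

```python
import re

# name-version-release.arch, where version and release contain no hyphen.
RPM_NAME = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)

def parse_rpm_name(package):
    """Split an rpm -q style package string into its four parts."""
    match = RPM_NAME.match(package.strip())
    if match is None:
        raise ValueError(f"unrecognized package string: {package!r}")
    return match.groupdict()

for line in ("e2fsprogs-libs-1.42.12.wc1-7.el6.x86_64",
             "e2fsprogs-1.42.12.wc1-7.el6.x86_64"):
    print(parse_rpm_name(line))
```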

          haisong Haisong Cai (Inactive) added a comment:

          On all OSTs, or just the ones reporting the problem?

          Thanks,
          Haisong

          haisong Haisong Cai (Inactive) added a comment:

          Aggregated logs from all MDS/OSS nodes, from before and after the 2.4.3 -> 2.5.3 upgrade.
          We mounted the OST/MDT sometime between 3pm and 4pm that day.

          People

            niu Niu Yawei (Inactive)
            haisong Haisong Cai (Inactive)