Lustre / LU-9825

Multiple errors on OST/MDS

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.1
    • Labels: None
    • Environment: Soak cluster - testing LU-7899 patch
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      While running soak with the LU-7899 patch, we are seeing multiple repeated errors that are not tied to any system halts.
      The first error, on the OSTs, appeared after an OST failover, during an LFSCK run (LFSCK exceeded its timeout and was aborted).
      There is a hiccup in the network and the simul job dies:

      sys-recov.today:/scratch/logs/syslog/soak-6.log:Aug  3 22:14:38 soak-6 kernel: Lustre: soaked-OST0016: Recovery over after 0:05, of 35 clients 35 recovered and 0 were evicted.
      sys-recov.today:/scratch/logs/syslog/soak-6.log:Aug  3 22:14:49 soak-6 kernel: Lustre: soaked-OST0004: Recovery over after 0:06, of 35 clients 35 recovered and 0 were evicted.
      
      /scratch/logs/syslog/soak-5.log:Aug  3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) Skipped 13 previous similar messages
      /scratch/logs/syslog/soak-5.log:Aug  3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) soaked-OST0003: can't get bonus, rc = -17
      /scratch/logs/syslog/soak-2.log:Aug  3 22:18:07 soak-2 kernel: LustreError: 24114:0:(osd_index.c:224:__osd_xattr_load_by_oid()) Skipped 534 previous similar messages
      /scratch/logs/syslog/soak-2.log:Aug  3 22:18:07 soak-2 kernel: LustreError: 24114:0:(osd_index.c:224:__osd_xattr_load_by_oid()) soaked-OST0012: can't get bonus, rc = -17
      

      A second set of errors comes up on the MDS:

      scratch/logs/syslog/soak-8.log:Aug  3 22:13:46 soak-8 kernel: LustreError: 4491:0:(client.c:3006:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff8807d76d5700 x1574746919153872/t167504869448(167504869448) o6->soaked-OST0016-osc-MDT0000@192.168.1.107@o2ib:28/4 lens 664/400 e 0 to 0 dl 1501798434 ref 2 fl Interpret:R/4/0 rc -2/-2
      /scratch/logs/syslog/soak-8.log:Aug  3 22:13:50 soak-8 kernel: LustreError: 4491:0:(client.c:3006:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff8807f37a1500 x1574746919157120/t167504861127(167504861127) o6->soaked-OST0004-osc-MDT0000@192.168.1.107@o2ib:28/4 lens 664/400 e 0 to 0 dl 1501798438 ref 2 fl Interpret:R/4/0 rc -2/-2
      /scratch/logs/syslog/soak-8.log:Aug  3 22:13:56 soak-8 kernel: LustreError: 4939:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 1 previous similar message
      /scratch/logs/syslog/soak-8.log:Aug  3 22:13:56 soak-8 kernel: LustreError: 4939:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344.
      /scratch/logs/syslog/soak-8.log:Aug  3 22:14:11 soak-8 kernel: LustreError: 11-0: soaked-OST0016-osc-MDT0000: operation ost_destroy to node 192.168.1.107@o2ib failed: rc = -107
      /scratch/logs/syslog/soak-8.log:Aug  3 22:14:33 soak-8 kernel: LustreError: 11-0: soaked-OST0004-osc-MDT0000: operation ost_create to node 192.168.1.107@o2ib failed: rc = -107
      /scratch/logs/syslog/soak-8.log:Aug  3 22:14:33 soak-8 kernel: LustreError: 4596:0:(osp_precreate.c:619:osp_precreate_send()) soaked-OST0004-osc-MDT0000: can't precreate: rc = -107
      /scratch/logs/syslog/soak-8.log:Aug  3 22:14:40 soak-8 kernel: LustreError: 11-0: soaked-OST000a-osc-MDT0000: operation ost_create to node 192.168.1.107@o2ib failed: rc = -107
      /scratch/logs/syslog/soak-8.log:Aug  3 22:14:40 soak-8 kernel: LustreError: 4609:0:(osp_precreate.c:619:osp_precreate_send()) soaked-OST000a-osc-MDT0000: can't precreate: rc = -107
      /scratch/logs/syslog/soak-8.log:Aug  3 22:14:54 soak-8 kernel: LustreError: 11-0: soaked-OST0010-osc-MDT0000: operation ost_statfs to node 192.168.1.107@o2ib failed: rc = -107
      /scratch/logs/syslog/soak-8.log:Aug  3 22:22:14 soak-8 kernel: LustreError: 4692:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 23 previous similar messages
      /scratch/logs/syslog/soak-8.log:Aug  3 22:22:14 soak-8 kernel: LustreError: 4692:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 416 actual 344.
      /scratch/logs/syslog/soak-8.log:Aug  3 22:22:15 soak-8 kernel: LustreError: 4853:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 25 previous similar messages
      /scratch/logs/syslog/soak-8.log:Aug  3 22:22:15 soak-8 kernel: LustreError: 4853:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 416.
      /scratch/logs/syslog/soak-8.log:Aug  3 22:22:28 soak-8 kernel: LustreError: 4769:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344.
      /scratch/logs/syslog/soak-8.log:Aug  3 22:25:20 soak-8 kernel: LustreError: 5065:0:(osp_object.c:582:osp_attr_get()) soaked-OST000b-osc-MDT0000:osp_attr_get update error [0x1000b0000:0x111cd18:0x0]: rc = -4
      

      The servers are not dying, but these errors are new with the patch, so they may be of interest.
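
To make the repeated errors above easier to triage, here is a minimal sketch of a tally script. It is not part of the soak tooling; the names `tally`, `errno_name` and `SAMPLE` are hypothetical, and the small errno table only covers the four Linux codes that appear in this report. It assumes the `rc = N` form seen in most of the lines, so messages without it (for example the `mdt_lvbo_fill()` size mismatches or the `rc -2/-2` replay lines) are counted under "(no rc)".

```python
import re
from collections import Counter

# Linux errno values for the return codes that appear in this report.
LINUX_ERRNO = {2: "ENOENT", 4: "EINTR", 17: "EEXIST", 107: "ENOTCONN"}

# "(file.c:LINE:function())" as printed by Lustre debug messages.
LOCATION = re.compile(r"\((\w+\.c):\d+:(\w+)\(\)\)")
# "operation ost_xxx to node ..." as printed by the 11-0 messages.
OPERATION = re.compile(r"operation (\w+) to node")
RC = re.compile(r"rc = (-?\d+)")


def errno_name(rc):
    """Map a negative kernel return code to its symbolic name."""
    return LINUX_ERRNO.get(abs(rc), "errno %d" % abs(rc))


def tally(lines):
    """Count LustreError lines by (function or operation, errno name)."""
    counts = Counter()
    for line in lines:
        # Ignore non-error lines and the rate-limiting "Skipped N ..." notices.
        if "LustreError" not in line or "previous similar message" in line:
            continue
        loc = LOCATION.search(line)
        op = OPERATION.search(line)
        rc = RC.search(line)
        if loc:
            where = loc.group(2)
        elif op:
            where = op.group(1)
        else:
            where = "(unknown)"
        name = errno_name(int(rc.group(1))) if rc else "(no rc)"
        counts[(where, name)] += 1
    return counts


# A few lines copied from the excerpts in this report.
SAMPLE = [
    "Aug  3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) Skipped 13 previous similar messages",
    "Aug  3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) soaked-OST0003: can't get bonus, rc = -17",
    "Aug  3 22:18:07 soak-2 kernel: LustreError: 24114:0:(osd_index.c:224:__osd_xattr_load_by_oid()) soaked-OST0012: can't get bonus, rc = -17",
    "Aug  3 22:14:11 soak-8 kernel: LustreError: 11-0: soaked-OST0016-osc-MDT0000: operation ost_destroy to node 192.168.1.107@o2ib failed: rc = -107",
    "Aug  3 22:14:33 soak-8 kernel: LustreError: 4596:0:(osp_precreate.c:619:osp_precreate_send()) soaked-OST0004-osc-MDT0000: can't precreate: rc = -107",
]

if __name__ == "__main__":
    for (where, name), n in sorted(tally(SAMPLE).items()):
        print(where, name, n)
```

Read this way, the OST-side messages are all EEXIST from `__osd_xattr_load_by_oid()`, while the MDS-side `rc = -107` messages are ENOTCONN against the node that was failing over. Note that the tally counts only the lines that reached syslog; the "Skipped N previous similar messages" counts would have to be added separately to get true totals.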

            People

              Assignee: laisiyao Lai Siyao
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 6
