  1. Lustre
  2. LU-2679

osc_init_grant(), available grant < 0, the OSS is probably not running with patch from bug20278


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • None
    • 3
    • 6265

    Description

      After some kind of problem on the Sequoia I/O nodes (Lustre 2.3.58-6chaos on both clients and servers), we see the following message:

      Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) ls1-OST002c-osc-c00000038d516c00: available grant < 0, the OSS is probably not running with patch from bug20278 (-720896)
      

      At first glance, this just looks like fallout from whatever else went wrong. But the clause ", the OSS is probably not running with patch from bug20278"
      clearly needs to be removed from the message. Then we need to see whether the situation that triggers the message can be fixed altogether.

      Here is more of the console log from the client leading up to this error:

      Lustre: Mounted ls1-client
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection to ls1-MDT0000 (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection restored to ls1-MDT0000 (at 172.20.5.1@o2ib500)
      Lustre: Skipped 15 previous similar messages
      LustreError: 33933:0:(osc_cache.c:896:osc_extent_wait()) extent c00000036ead6c88@{[3088 -> 3089/3103], [3|0|+|rpc|wihY|c0000002d481f780], [131072|2|+|-|c000000388d7ee70|16|c0000003631d8920]} ls1-OST003b-osc-c00000038d516c00: wait ext to 0 timedout, recovery in progress?
      LustreError: 20532:0:(osc_cache.c:896:osc_extent_wait()) extent c0000003670b4ac0@{[3325 -> 3327/3327], [3|0|+|rpc|wihY|c0000002d4814380], [196608|3|+|-|c0000002de43e7e0|16|c0000003c3ed5400]} ls1-OST012b-osc-c00000038d516c00: wait ext to 0 timedout, recovery in progress?
      LustreError: 11-0: ls1-MDT0000-mdc-c00000038d516c00: Communicating with 172.20.5.1@o2ib500, operation ldlm_enqueue failed with -107
      LustreError: Skipped 5 previous similar messages
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection to ls1-MDT0000 (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      LustreError: 167-0: ls1-MDT0000-mdc-c00000038d516c00: This client was evicted by ls1-MDT0000; in progress operations using this service will fail.
      LustreError: 38233:0:(ldlm_resource.c:804:ldlm_resource_complain()) Namespace ls1-MDT0000-mdc-c00000038d516c00 resource refcount nonzero (440) after lock cleanup; forcing cleanup.
      LustreError: 38233:0:(ldlm_resource.c:810:ldlm_resource_complain()) Resource: c0000002d2961080 (8590005591/14977/0/12) (rc: 440)
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection restored to ls1-MDT0000 (at 172.20.5.1@o2ib500)
      flush_iolink_connection_list freed connections=3
      flush_iolink_connection_list freed connections=6
      LustreError: 11-0: ls1-MDT0000-mdc-c00000038d516c00: Communicating with 172.20.5.1@o2ib500, operation ldlm_enqueue failed with -107
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection to ls1-MDT0000 (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      LustreError: 167-0: ls1-MDT0000-mdc-c00000038d516c00: This client was evicted by ls1-MDT0000; in progress operations using this service will fail.
      LustreError: 38697:0:(mdc_locks.c:784:mdc_enqueue()) ldlm_cli_enqueue: -5
      LustreError: 38697:0:(file.c:2394:ll_inode_revalidate_fini()) ls1: revalidate FID [0x200000001:0x6:0x0] error: rc = -5
      LustreError: 38698:0:(ldlm_resource.c:804:ldlm_resource_complain()) Namespace ls1-MDT0000-mdc-c00000038d516c00 resource refcount nonzero (2) after lock cleanup; forcing cleanup.
      LustreError: 38698:0:(ldlm_resource.c:810:ldlm_resource_complain()) Resource: c0000002d2961080 (8590005591/14977/0/12) (rc: 2)
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection restored to ls1-MDT0000 (at 172.20.5.1@o2ib500)
      LustreError: 11-0: ls1-MDT0000-mdc-c00000038d516c00: Communicating with 172.20.5.1@o2ib500, operation ldlm_enqueue failed with -107
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection to ls1-MDT0000 (at 172.20.5.1@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      LustreError: 167-0: ls1-MDT0000-mdc-c00000038d516c00: This client was evicted by ls1-MDT0000; in progress operations using this service will fail.
      LustreError: 38852:0:(mdc_locks.c:784:mdc_enqueue()) ldlm_cli_enqueue: -5
      LustreError: 38852:0:(file.c:2394:ll_inode_revalidate_fini()) ls1: revalidate FID [0x200000001:0x6:0x0] error: rc = -5
      LustreError: 38853:0:(ldlm_resource.c:804:ldlm_resource_complain()) Namespace ls1-MDT0000-mdc-c00000038d516c00 resource refcount nonzero (2) after lock cleanup; forcing cleanup.
      LustreError: 38853:0:(ldlm_resource.c:810:ldlm_resource_complain()) Resource: c0000002d2961080 (8590005591/14977/0/12) (rc: 2)
      Lustre: ls1-MDT0000-mdc-c00000038d516c00: Connection restored to ls1-MDT0000 (at 172.20.5.1@o2ib500)
      LustreError: 11-0: ls1-OST0001-osc-c00000038d516c00: Communicating with 172.20.1.1@o2ib500, operation ost_statfs failed with -107
      Lustre: ls1-OST0001-osc-c00000038d516c00: Connection to ls1-OST0001 (at 172.20.1.1@o2ib500) was lost; in progress operations using this service will wait for recovery to complete
      Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) ls1-OST0001-osc-c00000038d516c00: available grant < 0, the OSS is probably not running with patch from bug20278 (-327680)
      LustreError: 167-0: ls1-OST0001-osc-c00000038d516c00: This client was evicted by ls1-OST0001; in progress operations using this service will fail.
      LustreError: 38874:0:(osc_lock.c:837:osc_ldlm_completion_ast()) lock@c0000003966e7850[2 3 0 1 1 00000000] R(1):[0, 18446744073709551615]@[0x100090000:0x44046:0x0] {
      LustreError: 38874:0:(osc_lock.c:837:osc_ldlm_completion_ast())     lovsub@c0000003e46e79c0: [111 c0000002d2009928 P(0):[0, 18446744073709551615]@[0x200011557:0x3a81:0x0]]
      LustreError: 38874:0:(osc_lock.c:837:osc_ldlm_completion_ast())     osc@c0000001f3e25d00: c0000001fa215c80    0x20000001001 0xeb88562abbf8497b 3 c000000319f49588 size: 470368256 mtime: 1359019361 atime: 0 ctime: 1359019361 blocks: 3289
      LustreError: 38874:0:(osc_lock.c:837:osc_ldlm_completion_ast()) } lock@c0000003966e7850
      LustreError: 38874:0:(osc_lock.c:837:osc_ldlm_completion_ast()) dlmlock returned -5
      Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) ls1-OST002c-osc-c00000038d516c00: available grant < 0, the OSS is probably not running with patch from bug20278 (-720896)
      Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) Skipped 15 previous similar messages
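
For triage, it can help to pull every negative-grant warning out of a console log and list the affected OSC device next to the reported value. The following is a minimal sketch in Python, not part of Lustre: the helper name `negative_grants` and the regular expression are illustrative assumptions based only on the message format shown in the log above, and would need adjusting if the message text changes.

```python
import re

# Matches the osc_init_grant() warning in the format seen in the log above.
# The pattern is an assumption derived from those lines, not from Lustre source.
GRANT_RE = re.compile(
    r"osc_init_grant\(\)\)\s+(?P<osc>\S+): available grant < 0.*"
    r"\((?P<grant>-\d+)\)\s*$"
)


def negative_grants(log_text):
    """Return (osc_device, grant_bytes) for each negative-grant warning line."""
    hits = []
    for line in log_text.splitlines():
        match = GRANT_RE.search(line)
        if match:
            hits.append((match.group("osc"), int(match.group("grant"))))
    return hits


# Sample lines copied from the console log in this ticket.
SAMPLE = """\
Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) ls1-OST0001-osc-c00000038d516c00: available grant < 0, the OSS is probably not running with patch from bug20278 (-327680)
Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) ls1-OST002c-osc-c00000038d516c00: available grant < 0, the OSS is probably not running with patch from bug20278 (-720896)
Lustre: 3266:0:(osc_request.c:1064:osc_init_grant()) Skipped 15 previous similar messages
"""

if __name__ == "__main__":
    for osc, grant in negative_grants(SAMPLE):
        # Also show the value in 64 KiB units for easier comparison.
        print(f"{osc}: {grant} bytes ({grant / 65536:g} x 64 KiB)")
```

The "Skipped 15 previous similar messages" line is deliberately not matched, since it carries no device name or value. Both values reported here are whole multiples of 65536 bytes (-5 and -11 respectively); whether that corresponds to a page or RPC granularity on these nodes is not established by the log.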
      

          People

            laisiyao Lai Siyao
            morrone Christopher Morrone (Inactive)