Lustre: LU-7392

client evicted: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1)

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Affects Version: Lustre 2.8.0
    • Environment: lola
      build: 2.7.62-28-g0754bc8, 0754bc8f2623bea184111af216f7567608db35b6; soakbuild '20151104.1'
    • Severity: 3

    Description

      Error occurred during soak testing of build '20151104.1' on cluster lola (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20151104.1). MDTs are formatted with ldiskfs and OSTs with ZFS as the storage backend. DNE is enabled. The MDSes are configured in an HA failover configuration.

      Sequence of events:

      • 2015-11-04 18:47:30 – mds_restart lola-9 completed
      • 2015-11-04 18:50:30 – OSS (lola-5) evicts client
        lola-5.log:Nov  4 18:50:30 lola-5 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 192.168.1.131@o2ib100  ns: filter-soaked-OST0007_UUID lock: ffff880313f841c0/0x15cebc1506e2a9b5 lrc: 3/0,0 mode: PW/PW res: [0x2b9a7de:0x0:0x0].0x0 rrc: 4 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000080010020 nid: 192.168.1.131@o2ib100 remote: 0x175f680569401922 expref: 5 pid: 10807 timeout: 4306845817 lvb_type: 0
        

        Similar messages exist on all OSS nodes.

      • 2015-11-04 18:55:45 – client (lola-31) evicted from ost7
        Nov  4 18:50:45 lola-31 kernel: LustreError: 167-0: soaked-OST0007-osc-ffff881071e62400: This client was evicted by soaked-OST0007; in progress operations using this service will fail.
        Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:887:ldlm_resource_complain()) soaked-OST0007-osc-ffff881071e62400: namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1) after lock cleanup; forcing cleanup.
        Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1502:ldlm_resource_dump()) --- Resource: [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount = 2
        Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1505:ldlm_resource_dump()) Granted locks (in reverse order):
        Nov  4 18:50:45 lola-31 kernel: LustreError: 16677:0:(ldlm_resource.c:1508:ldlm_resource_dump()) ### ### ns: soaked-OST0007-osc-ffff881071e62400 lock: ffff880850f12a80/0x175f680569401922 lrc: 3/0,1 mode: PW/PW res: [0x2b9a7de:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x526480000000 nid: local remote: 0x15cebc1506e2a9b5 expref: -99 pid: 15402 timeout: 0 lvb_type: 1
        
      • until 2015-11-04 18:59:05 – OSTs {8, a, b, c} are evicted with the same error messages on the client and the OSSes

      • 2015-11-05 – the client's (lola-31) OSCs stay in state DISCONN or EVICTED for the affected OSTs (see
        file 'evicted-client.txt.bz2')
      • the client node is unusable and all jobs crashed
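The eviction sequence above comes down to reference counting: after the server evicts the client, ldlm_resource_complain() finds that the namespace resource still has a nonzero refcount once all granted locks have been cancelled, i.e. something else is still holding a reference, so cleanup is forced. A minimal conceptual model of that check (illustrative Python only, not Lustre code; all names here are hypothetical):

```python
class Resource:
    """Toy model of an LDLM namespace resource with a reference count."""

    def __init__(self, res_id):
        self.res_id = res_id
        self.refcount = 0
        self.granted_locks = []

    def add_lock(self, handle):
        self.granted_locks.append(handle)
        self.refcount += 1          # each granted lock pins the resource

    def cancel_lock(self, handle):
        self.granted_locks.remove(handle)
        self.refcount -= 1


def cleanup_namespace(resource, leaked_refs=0):
    """Cancel all granted locks; complain if references remain.

    leaked_refs models an outstanding reference (e.g. something still
    using the resource after eviction), which is what the ticket's
    'refcount nonzero (1) after lock cleanup' message indicates.
    """
    resource.refcount += leaked_refs
    for handle in list(resource.granted_locks):
        resource.cancel_lock(handle)
    if resource.refcount != 0:
        return ("namespace resource %s refcount nonzero (%d) "
                "after lock cleanup; forcing cleanup." %
                (resource.res_id, resource.refcount))
    return None


res = Resource("[0x2b9a7de:0x0:0x0].0x0")
res.add_lock("0x175f680569401922")
print(cleanup_namespace(res, leaked_refs=1))
```

In the healthy case (no leaked reference) the refcount drops to zero with the last lock and no complaint is emitted; the ticket shows the unhealthy case.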

      The problem might be related to LU-2067.

      Attached files:

      • OSSes (lola-[2-5]): messages, console log files
      • client lola-31: messages, console log files, 'lctl ..state*' output
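To triage these events across the attached message logs, the eviction and refcount complaints can be pulled out with a small script (a hypothetical helper; the regular expressions are derived from the log lines quoted above):

```python
import re

# e.g. "namespace resource [0x2b9a7de:0x0:0x0].0x0 (ffff8806d80cfcc0) refcount nonzero (1)"
REFCOUNT_RE = re.compile(
    r"namespace resource (?P<res>\[[0-9a-fx:]+\]\.\w+) "
    r"\((?P<addr>[0-9a-f]+)\) refcount nonzero \((?P<count>\d+)\)"
)
# e.g. "evicting client at 192.168.1.131@o2ib100"
EVICT_RE = re.compile(r"evicting client at (?P<nid>\S+)")


def scan(lines):
    """Yield ('evict', nid) and ('refcount', (res, count)) events."""
    for line in lines:
        m = EVICT_RE.search(line)
        if m:
            yield ("evict", m.group("nid"))
            continue
        m = REFCOUNT_RE.search(line)
        if m:
            yield ("refcount", (m.group("res"), int(m.group("count"))))


log = [
    "LustreError: ... evicting client at 192.168.1.131@o2ib100 ...",
    "LustreError: ... namespace resource [0x2b9a7de:0x0:0x0].0x0 "
    "(ffff8806d80cfcc0) refcount nonzero (1) after lock cleanup ...",
]
for kind, details in scan(log):
    print(kind, details)
```

Running this over the decompressed messages-lola-*.log files would give a per-node timeline of evictions and leaked resources.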

      Attachments

        1. console-lola-31.log.bz2
          43 kB
        2. evicted-client.txt.bz2
          0.8 kB
        3. lola-2.log.bz2
          65 kB
        4. lola-3.log.bz2
          75 kB
        5. lola-4.log.bz2
          64 kB
        6. lola-5.log.bz2
          66 kB
        7. messages-lola-2.log.bz2
          456 kB
        8. messages-lola-3.log.bz2
          411 kB
        9. messages-lola-31.log.bz2
          217 kB
        10. messages-lola-4.log.bz2
          226 kB
        11. messages-lola-5.log.bz2
          453 kB

            People

              jay Jinshan Xiong (Inactive)
              heckes Frank Heckes (Inactive)