Lustre / LU-1126

Client file locking issue. Assertion triggered when decrementing a read lock on an item that has no existing read locks.


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: Lustre 2.5.0
    • Affects Version: Lustre 1.8.7
    • None
    • Client: Lustre 1.8.7-wc1, in-kernel IB, RHEL 5.6
      Server: Lustre 1.8.4, CentOS 5.5, Terascala appliance
    • 1
    • 3992

    Description

      The Lustre filesystem is mounted on the client using the -o flock option. Without this option the customer's application will not run. The application uses fcntl for file locking. It does not explicitly release locks; it relies on the file close operation to do that.
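
      The locking pattern described above can be sketched as follows. This is a minimal illustration, not the customer's code and not the attached flock.c: the function name lock_then_close, the whole-file range, and the choice of a read lock (F_RDLCK) are assumptions made for the example. It takes a POSIX lock with fcntl and then closes the descriptor without issuing F_UNLCK, so the lock is dropped on the close path, which is where the traceback below shows locks_remove_posix calling into ll_file_flock.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: take a whole-file POSIX read lock on `path`,
 * then close the file without unlocking first.
 * Returns 0 on success, -1 on any failure.
 */
int lock_then_close(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_RDLCK;    /* assumption: a shared (read) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* length 0 means "through end of file" */

    /* F_SETLKW blocks until the lock can be granted. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl(F_SETLKW)");
        close(fd);
        return -1;
    }

    /*
     * No explicit F_UNLCK here. POSIX record locks held by the process
     * on this file are released when the descriptor is closed.
     */
    return close(fd);
}
```

      On a Lustre client the file would need to live on a filesystem mounted with -o flock for the fcntl call to succeed; on a local filesystem the same code simply exercises the kernel's own POSIX lock handling.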

      The client hardware configuration is a single HP DL980 G7 server with eight 8-core Nehalem-EX CPUs and 512 GB RAM.

      The application workload consists of a number of processes writing to a small number of shared datasets.
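
      A workload of this general shape can be approximated with a few forked writers contending for one file. The sketch below is only an assumption about the access pattern (several processes, one shared file, fcntl locks released by close); the names write_record and run_writers, the record format, and the use of an exclusive whole-file lock are all hypothetical and are not taken from the application or from the attached flock.c.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical writer: lock the shared file, append one record, close. */
static int write_record(const char *path, int id)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* assumption: exclusive lock for writers */
    fl.l_whence = SEEK_SET; /* l_start = 0, l_len = 0: whole file */

    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        close(fd);
        return -1;
    }

    char rec[32];
    int n = snprintf(rec, sizeof rec, "writer %d\n", id);
    ssize_t w = write(fd, rec, (size_t)n);

    /* The lock is released implicitly by close, never by F_UNLCK. */
    int rc = close(fd);
    return (w == (ssize_t)n && rc == 0) ? 0 : -1;
}

/*
 * Fork `nproc` writers against `path` and wait for all of them.
 * Returns the number of children that exited successfully.
 */
int run_writers(const char *path, int nproc)
{
    int started = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0)
            _exit(write_record(path, i) == 0 ? 0 : 1);
        started++;
    }

    int ok = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

      Calling run_writers with a path on the flock-mounted filesystem and a process count comparable to the client's core count would give a rough stand-in for the contention described; whether it triggers the assertion is not something this sketch claims.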

      Below is an example of the traceback.

      Feb 15 21:08:43 pt980a kernel: LustreError: 23926:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) ASSERTION(lock->l_readers > 0) failed
      Feb 15 21:08:43 pt980a kernel: LustreError: 23926:0:(ldlm_lock.c:599:ldlm_lock_decref_internal_nolock()) LBUG
      Feb 15 21:08:43 pt980a kernel: Pid: 23926, comm: sas
      Feb 15 21:08:43 pt980a kernel:
      Feb 15 21:08:43 pt980a kernel: Call Trace:
      Feb 15 21:08:43 pt980a kernel: [<ffffffff889fa6a1>] libcfs_debug_dumpstack+0x51/0x60 [libcfs]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff889fabda>] lbug_with_loc+0x7a/0xd0 [libcfs]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88a02ff0>] tracefile_init+0x0/0x110 [libcfs]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b2161f>] ldlm_lock_decref_internal_nolock+0x7f/0x100 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b492d9>] ldlm_process_flock_lock+0x1089/0x18a0 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88a4c33d>] LNetMDUnlink+0xcd/0xf0 [lnet]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b1fd59>] ldlm_grant_lock+0x4e9/0x550 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b4a5fb>] ldlm_flock_completion_ast+0xa0b/0xaf0 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b23729>] ldlm_lock_enqueue+0x9d9/0xb20 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b3bc8b>] ldlm_cli_enqueue_fini+0xa5b/0xbc0 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88ab567d>] class_handle_hash+0x16d/0x250 [obdclass]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff8008e7f7>] default_wake_function+0x0/0xe
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b3d7cf>] ldlm_cli_enqueue+0x63f/0x700 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b3e0a0>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88d20acf>] ll_file_flock+0x57f/0x680 [lustre]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff88b49bf0>] ldlm_flock_completion_ast+0x0/0xaf0 [ptlrpc]
      Feb 15 21:08:43 pt980a kernel: [<ffffffff8003063e>] locks_remove_posix+0x84/0xa8
      Feb 15 21:08:43 pt980a kernel: [<ffffffff8003007e>] __up_write+0x27/0xf2
      Feb 15 21:08:43 pt980a kernel: [<ffffffff80023da7>] filp_close+0x54/0x64
      Feb 15 21:08:43 pt980a kernel: [<ffffffff8001e211>] sys_close+0x88/0xbd
      Feb 15 21:08:43 pt980a kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0

      Attachments

        1. debug_logs.tar.bz2
          6.21 MB
        2. flock.c
          1 kB
        3. log_lustre.3
          0.8 kB
        4. messages
          2.90 MB

            People

              green Oleg Drokin
              pcpiela Peter Piela (Inactive)
              Votes: 2
              Watchers: 13
