Lustre / LU-376

Client hangs when listing big directory with ls -la

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Affects Version/s: Lustre 2.1.0, Lustre 1.8.6
    • Labels: None
    • Environment:
      Client: 1.8.5
      Server: 2xMDS, 8xOSS, 24xOST, Lustre 2.0.59, RHEL 5.6

    Description

      We have noticed an interoperability issue between 1.8.5 clients and 2.0.59 servers (no other versions were tested).
      Clients running 2.0.59 are not affected by the problem.

      How to reproduce the problem:

      On a client node, run:
      cd /mnt/lustre
      mkdir somebigdir
      cd somebigdir
      for i in `seq 1 10000`; do touch file.$i; done
      ls -la

      The symptom is simple: the 1.8.5 client hangs. When a 2.0.59 client is used, the same listing takes about 4 s.
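
      The reproduction steps above can be wrapped in a small script that bounds the listing with a time limit, so a test run reports the hang instead of blocking the shell. This is only a sketch: the function name repro_ls and its arguments are hypothetical, and it assumes the timeout(1) utility from GNU coreutils is installed (it may be missing on older distributions). A process blocked inside the kernel may not react to the signal that timeout sends, so the limit is best-effort.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): populate a directory and run a
# long listing on it, giving up after a time limit.
repro_ls() {
    dir=$1             # target directory, e.g. /mnt/lustre/somebigdir
    count=${2:-10000}  # number of files to create
    limit=${3:-60}     # seconds before the listing is treated as hung

    mkdir -p "$dir" || return 2

    i=1
    while [ "$i" -le "$count" ]; do
        touch "$dir/file.$i" || return 2
        i=$((i + 1))
    done

    # Assumes GNU coreutils timeout(1); it exits non-zero if the limit is hit.
    if timeout "$limit" ls -la "$dir" > /dev/null; then
        echo "listing completed"
    else
        echo "listing failed or timed out"
        return 1
    fi
}
```

      For example, repro_ls /mnt/lustre/somebigdir 10000 60 would follow the steps from this report with a 60 s limit.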

      The problem is independent of the interconnect: it was reproduced with both @tcp and @o2ib.

      Log messages possibly related to the issue:

      00010000:00010000:10:1306772139.242230:0:3591:0:(ldlm_lock.c:597:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: scratch-MDT0000-mdc-ffff81041677b800 lock: ffff8103f56ec200/0xf6a4fad9013fdffb lrc: 3/1,0 mode: PR/PR res: 8589937616/1 bits 0x3 rrc: 2 type: IBT flags: 0x0 remote: 0x3b122fd677c9380d expref: -99 pid: 1905 timeout: 0
      00010000:00010000:10:1306772139.242239:0:3591:0:(ldlm_lock.c:580:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: scratch-MDT0000-mdc-ffff81041677b800 lock: ffff8103f56ec200/0xf6a4fad9013fdffb lrc: 2/1,0 mode: PR/PR res: 8589937616/1 bits 0x3 rrc: 3 type: IBT flags: 0x0 remote: 0x3b122fd677c9380d expref: -99 pid: 1905 timeout: 0
      00010000:00010000:10:1306772139.242244:0:3591:0:(ldlm_lock.c:1088:ldlm_lock_match()) ### matched (0 0) ns: scratch-MDT0000-mdc-ffff81041677b800 lock: ffff8103f56ec200/0xf6a4fad9013fdffb lrc: 2/1,0 mode: PR/PR res: 8589937616/1 bits 0x3 rrc: 2 type: IBT flags: 0x0 remote: 0x3b122fd677c9380d expref: -99 pid: 1905 timeout: 0
      00000080:00200000:10:1306772139.242252:0:3591:0:(dir.c:594:ll_dir_readpage_20()) VFS Op:inode=144115238810157057/0(ffff8103f56ef920) off 3590582044
      00000100:00100000:10:1306772139.242259:0:3591:0:(client.c:2084:ptlrpc_queue_wait()) Sending RPC pname:cluuid:pid:xid:nid:opc ls:9a637513-e3b6-abe7-b530-d8d413e552d9:3591:x1370249573210902:172.16.193.1@o2ib:37
      00000100:00100000:10:1306772139.242811:0:3591:0:(client.c:2189:ptlrpc_queue_wait()) Completed RPC pname:cluuid:pid:xid:nid:opc ls:9a637513-e3b6-abe7-b530-d8d413e552d9:3591:x1370249573210902:172.16.193.1@o2ib:37
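
      To check how long each RPC in such a log took, the "Sending RPC" and "Completed RPC" lines can be paired by xid. The sketch below assumes the layout seen in the lines above: colon-separated fields, with the timestamp as the fourth field in seconds.microseconds form (six fractional digits), and the xid as a colon-delimited token starting with "x". The function name rpc_latency is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read debug-log lines on stdin and print the elapsed
# time between the "Sending RPC" and "Completed RPC" lines of each xid.
rpc_latency() {
    awk -F: '
    /Sending RPC|Completed RPC/ {
        # Field 4 is the timestamp in "seconds.microseconds" form.
        split($4, t, ".")
        usec = t[1] * 1000000 + t[2]

        # The xid is the colon-delimited token that starts with "x".
        if (!match($0, /:x[0-9]+:/)) next
        xid = substr($0, RSTART + 1, RLENGTH - 2)

        if ($0 ~ /Sending RPC/)
            sent[xid] = usec
        else if (xid in sent)
            printf "%s %d us\n", xid, usec - sent[xid]
    }'
}
```

      Applied to the two ptlrpc_queue_wait() lines above, this prints "x1370249573210902 552 us", so that particular RPC completed in well under a millisecond.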

      I can provide more information and run tests as needed.
      Best regards,

      Lukasz Flis

          People

            Assignee: yong.fan nasf (Inactive)
            Reporter: lflis Lukasz Flis