-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
Client: 1.8.5
Server: 2xMDS, 8xOSS, 24xOST, Lustre 2.0.59, RHEL 5.6
-
3
-
4996
We have noticed an interoperability issue between 1.8.5 clients and a 2.0.59 server (no other versions were tested).
Clients running 2.0.59 are not affected by the problem.
How to reproduce the problem:
On a client node, run:
cd /mnt/lustre
mkdir somebigdir
cd somebigdir
for i in `seq 1 10000`; do touch file.$i; done
ls -la
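The reproduction steps can also be wrapped in a small script so that a hanging listing is reported instead of blocking the terminal. This is only a sketch: the function names populate_dir and check_listing are hypothetical, and it assumes the GNU coreutils `timeout` command is available, which may not be the case on older distributions. A process blocked inside the kernel may also ignore the signal sent by `timeout`, so the result should be read as a hint rather than a definitive test.

```shell
# populate_dir DIR COUNT: create DIR and fill it with COUNT empty files
# named file.1 .. file.COUNT (hypothetical helper, not part of Lustre).
populate_dir() {
    dir=$1
    count=$2
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        : > "$dir/file.$i" || return 1
        i=$((i + 1))
    done
}

# check_listing DIR SECONDS: run `ls -la DIR` under a time limit and print
# "ok", "timeout", or "error <code>" (hypothetical helper).
# Assumes GNU coreutils `timeout`, which exits with 124 when the limit is hit.
check_listing() {
    rc=0
    timeout "$2" ls -la "$1" > /dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo timeout ;;
        *)   echo "error $rc" ;;
    esac
}
```

For example, `populate_dir /mnt/lustre/somebigdir 10000` followed by `check_listing /mnt/lustre/somebigdir 60` would print `timeout` if the listing has not finished after 60 seconds; the 60-second limit is an arbitrary choice.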
The symptom is simple: the 1.8.5 client hangs. With a 2.0.59 client, the same listing takes about 4 s.
The problem is independent of the interconnect: it was reproduced with @tcp as well as with @o2ib.
Log messages possibly related to the issue:
00010000:00010000:10:1306772139.242230:0:3591:0:(ldlm_lock.c:597:ldlm_lock_decref_internal_nolock()) ### ldlm_lock_decref(PR) ns: scratch-MDT0000-mdc-ffff81041677b800 lock: ffff8103f56ec200/0xf6a4fad9013fdffb lrc: 3/1,0 mode: PR/PR res: 8589937616/1 bits 0x3 rrc: 2 type: IBT flags: 0x0 remote: 0x3b122fd677c9380d expref: -99 pid: 1905 timeout: 0
00010000:00010000:10:1306772139.242239:0:3591:0:(ldlm_lock.c:580:ldlm_lock_addref_internal_nolock()) ### ldlm_lock_addref(PR) ns: scratch-MDT0000-mdc-ffff81041677b800 lock: ffff8103f56ec200/0xf6a4fad9013fdffb lrc: 2/1,0 mode: PR/PR res: 8589937616/1 bits 0x3 rrc: 3 type: IBT flags: 0x0 remote: 0x3b122fd677c9380d expref: -99 pid: 1905 timeout: 0
00010000:00010000:10:1306772139.242244:0:3591:0:(ldlm_lock.c:1088:ldlm_lock_match()) ### matched (0 0) ns: scratch-MDT0000-mdc-ffff81041677b800 lock: ffff8103f56ec200/0xf6a4fad9013fdffb lrc: 2/1,0 mode: PR/PR res: 8589937616/1 bits 0x3 rrc: 2 type: IBT flags: 0x0 remote: 0x3b122fd677c9380d expref: -99 pid: 1905 timeout: 0
00000080:00200000:10:1306772139.242252:0:3591:0:(dir.c:594:ll_dir_readpage_20()) VFS Op:inode=144115238810157057/0(ffff8103f56ef920) off 3590582044
00000100:00100000:10:1306772139.242259:0:3591:0:(client.c:2084:ptlrpc_queue_wait()) Sending RPC pname:cluuid:pid:xid:nid:opc ls:9a637513-e3b6-abe7-b530-d8d413e552d9:3591:x1370249573210902:172.16.193.1@o2ib:37
00000100:00100000:10:1306772139.242811:0:3591:0:(client.c:2189:ptlrpc_queue_wait()) Completed RPC pname:cluuid:pid:xid:nid:opc ls:9a637513-e3b6-abe7-b530-d8d413e552d9:3591:x1370249573210902:172.16.193.1@o2ib:37
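To see how long each RPC in such a debug log takes, the "Sending RPC" and "Completed RPC" lines can be paired by xid. The sketch below does this with awk; the function name rpc_latency is hypothetical, and the field positions (timestamp as the fourth colon-separated field, xid as the fourth part of the final pname:cluuid:pid:xid:nid:opc token) are assumptions based only on the lines quoted above.

```shell
# rpc_latency: read Lustre debug log lines on stdin and print, for each xid,
# the number of microseconds between "Sending RPC" and "Completed RPC".
# Field positions are assumed from the sample lines in this report.
rpc_latency() {
    awk '
    /Sending RPC|Completed RPC/ {
        # The 4th colon-separated field is seconds.microseconds.
        split($0, hdr, ":")
        split(hdr[4], ts, ".")
        # The last whitespace-separated token is pname:cluuid:pid:xid:nid:opc.
        split($NF, rpc, ":")
        xid = rpc[4]
        if ($0 ~ /Sending RPC/) {
            sent_s[xid] = ts[1]
            sent_us[xid] = ts[2]
        } else if (xid in sent_s) {
            # Seconds and microseconds are subtracted separately so that the
            # result does not depend on floating-point precision.
            printf "%s %d\n", xid, (ts[1] - sent_s[xid]) * 1000000 + (ts[2] - sent_us[xid])
        }
    }'
}
```

Fed the two ptlrpc_queue_wait() lines above, this prints `x1370249573210902 552`, that is, the RPC completed 552 microseconds after it was sent. That would suggest this particular RPC is not where the time goes, though two lines are too little to conclude much.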
I can provide more information and run tests as needed.
Best Regards
–
Lukasz Flis