[LU-8935] Lustre mount re-export via NFS long timeouts, not working with autofs Created: 13/Dec/16  Updated: 22/Aug/17  Resolved: 29/Jul/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.10.1, Lustre 2.11.0

Type: Bug Priority: Major
Reporter: Steve Dainard (Inactive) Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7.2
Linux dmds1 3.10.0-327.3.1.el7_lustre.x86_64 #1 SMP Thu Feb 18 10:53:23 PST 2016 x86_64 x86_64 x86_64 GNU/Linux


Attachments: File 2016-12-20-syslog.log.vanlustre4     File client.nfs.mount.strace     File client.nfs.mount.strace.timestamps     File dmds1.lustre.messages     File lustre-debug-files.tar.gz     HTML File nfs-server-strace-exportfs-a     File nfs.server.logs    
Issue Links:
Duplicate
Related
Epic/Theme: Lustre-2.8.0
Severity: 2
Rank (Obsolete): 9223372036854775807

 Description   

We have two lustre clusters (all machines CentOS 7.2). Each of the clusters has a lustre filesystem mounted on a CentOS 7.2 server which exports the filesystem via NFS.

The start of our issue may have coincided with a system downtime for hardware maintenance on the 'delta' cluster.

The 'delta' cluster lustre filesystem mounts properly on the 'vanlustre4' NFS server, but clients intermittently cannot access the export via autofs mounts, and when they can it is only after long delays. Manually mounting the NFS export on a client machine does work, but with an incredibly long delay (~1-5 minutes), which I assume is why autofs has issues mounting.

We also attempted to mount the 'delta' lustre filesystem on the 'vanlustre3' server, which exports a lustre filesystem from the 'echo' cluster, but encountered the same long delays, so it seems reasonable to conclude this is not an NFS server issue.

In the kernel logs on the MDS server for the 'delta' cluster, 'dmds1', I see reconnecting messages relating to the NFS server vanlustre4 (10.23.22.114), as well as LustreError ldlm_lib.c:3169:target_bulk_io() messages. I've attached the 'Lustre' logs from syslog.

I've also attached an strace of a client machine manually mounting the 'delta' lustre filesystem export from 'vanlustre4'.

On 'vanlustre4' messages shows multiple auth attempts from my client during the above manual mount which did eventually succeed:
Dec 13 10:19:59 vanlustre4 rpc.mountd[2807]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
Dec 13 10:20:04 vanlustre4 rpc.mountd[2805]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
Dec 13 10:20:14 vanlustre4 rpc.mountd[2812]: authenticated mount request from 10.23.32.109:877 for /user_data (/user_data)
Dec 13 10:20:34 vanlustre4 rpc.mountd[2808]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)
Dec 13 10:20:58 vanlustre4 rpc.mountd[2809]: authenticated mount request from 10.23.32.109:766 for /user_data (/user_data)

Any suggestions on how to further troubleshoot this issue would be appreciated.
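
For reference, a minimal sketch for quantifying the delay from a client (the mount point /mnt/user_data_test is hypothetical; the export path /user_data is the one in the mountd logs above):

sudo mkdir -p /mnt/user_data_test
time sudo mount -t nfs vanlustre4:/user_data /mnt/user_data_test   # this is the step that takes ~1-5 minutes
time ls /mnt/user_data_test                                        # first readdir through NFS
sudo umount /mnt/user_data_test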



 Comments   
Comment by Oleg Drokin [ 14/Dec/16 ]

What is in the lustre client (= nfs server) logs?
Those timeouts for readdir look suspicious; are you sure your network is ok?

Comment by Andreas Dilger [ 14/Dec/16 ]

Given that this problem started with system maintenance, the first place to look would be what changed there. One possible cause is a problem with the network on this client. It is hard to say what the problem is without also seeing the console messages from the client.

Minor note - running strace -ttt will include timestamps in the output so that it is possible to see which commands are taking a long time. I suspect it is mount() that is slow, but hard to know for sure.
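
For example, a sketch of capturing a timestamped trace of the slow manual mount (the output file /tmp/nfs-mount.strace and the mount point /mnt/user_data are just examples):

# -ttt prints absolute microsecond timestamps, -f follows forked children, -o writes to a file
strace -ttt -f -o /tmp/nfs-mount.strace mount -t nfs vanlustre4:/user_data /mnt/user_data
# large gaps between consecutive timestamps (typically around the mount() syscall) show where the time goes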

Comment by Steve Dainard (Inactive) [ 14/Dec/16 ]

I've added the NFS server kernel logs, and another client strace with timestamps.

To clarify the 'hardware maintenance' was a disk shelf being replaced in one of the OSS servers.

Although we cannot export the lustre filesystem via NFS without long delays, the NFS server does run rsync to back up other hosts onto its mounted lustre filesystem, and this works properly, so the lustre filesystem mount itself seems ok; that made us think it was an NFS server issue. But the NFS server being the culprit seems to be ruled out, because if we mount the lustre filesystem and export it on a different NFS server we get the same NFS mount delay.

The network looks pretty solid, apart from quite a few rx_discards, but those counters are not incrementing while I'm test mounting the NFS export on my client system, or at any point today.

Comment by Oleg Drokin [ 15/Dec/16 ]
Dec 11 00:00:56 vanlustre4 kernel: Lustre: 46864:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1481443249/real 1481443249]  req@ffff880946c29e00 x1553319753601680/t0(0) o36->delta-MDT0000-mdc-ffff880fd9647800@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1481443256 ref 2 fl Rpc:X/0/ffffffff rc 0/-1

So the sent time is 1481443249, which is 00:00:49, just 7 seconds earlier.
On one hand that should be more than enough for a readdir (unless the directory is huge or something, I guess).
I have no idea what directory is being listed, but does an ls of it take a long time from any other client? (I imagine since this happens at mount time, it must be one of the directories on the path to the export point, or the export point itself.)
Server logs indicate that actual readdir takes:

Dec 13 00:02:24 dmds1 kernel: LustreError: 49950:0:(ldlm_lib.c:3169:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff884825f43000 x1553326849866236/t0(0) o37->cfbb5c72-aeba-7fa6-0c04-35195c6e8ec0@10.23.22.114@tcp:375/0 lens 568/440 e 0 to 0 dl 1481616150 ref 1 fl Interpret:/0/0 rc 0/0
Dec 13 00:02:46 dmds1 kernel: LustreError: 49821:0:(ldlm_lib.c:3119:target_bulk_io()) @@@ bulk READ failed: rc -107  req@ffff882f3d19ec50 x1553326852292996/t0(0) o37->cfbb5c72-aeba-7fa6-0c04-35195c6e8ec0@10.23.22.114@tcp:434/0 lens 568/440 e 0 to 0 dl 1481616209 ref 1 fl Interpret:/0/0 rc 0/0

So it takes the server another 22 seconds to prepare and send the readdir reply and then notice that nobody is listening anymore.
That makes it about 29 seconds for the whole thing which is quite long.

Also you might want to check that MTU settings everywhere match, since readdir is one of the bigger RPCs the MDS does, so it might be the one to trigger various MTU mismatches.

Another thing - if you rebooted your MDS too, it would start with an empty cache and it'll take a while for it to be repopulated, though one would imagine that would need to happen only once and after that things would become fast, which is clearly not the case here.

Enabling some extended debug on the nfs server/lustre client node (i.e. echo +vfstrace +rpctrace >/proc/sys/lnet/debug, or even -1 to get everything, with a corresponding debug_mb increase) might prove useful to see what directory is being accessed, why it is timing out so fast, and so on.
Additionally, you might want to make sure that the AT parameters and obd_timeout on the clients and servers match.
You don't happen to be mounting filesystems with different obd_timeout values set in them from that nfs reexporting node, do you?
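
A sketch of those checks on the NFS server / lustre client node (parameter names are the standard Lustre ones but may vary slightly by version; the interface name bond0 is taken from the logs later in this ticket and may differ per node):

# enable extended Lustre debugging (or echo -1 for everything)
echo +vfstrace +rpctrace > /proc/sys/lnet/debug
# compare obd_timeout and adaptive-timeout settings with the servers
lctl get_param timeout at_min at_max
# confirm the MTU matches on every node along the path
ip link show bond0 | grep -o 'mtu [0-9]*'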

Comment by Steve Dainard (Inactive) [ 15/Dec/16 ]

From the NFS server, listing the top level in the lustre filesystem is very quick, and the top level only has one sub dir:

[sdai@vanlustre4 user_data]$ pwd
/user_data
[sdai@vanlustre4 user_data]$ time ll
total 4
drwxrwxrwx 3 root root 4096 Oct 27 11:53 ARCHIVE

real 0m0.005s
user 0m0.002s
sys 0m0.002s

If I do a find a few levels deep, the response time is still pretty quick:

[sdai@vanlustre4 user_data]$ time find . -maxdepth 5 | grep ARCHIVE | wc -l
11788

real 0m1.495s
user 0m0.056s
sys 0m0.543s

I'm guessing at this point the cache is coming into play.

I've checked the MTU on all the hosts, and they are all set to the default 1500. This mirrors the setup of the 'echo' cluster hosts which are on the same switches.

You don't happen to be mounting filesystems with different obd_timeout values set in them from that nfs reexporting node, do you?

I've got about 6 hours of time troubleshooting Lustre, so I can't comment too much on configuration.
The mount options for lustre are pretty much just defaults:
fstab: dmds1:/delta /user_data lustre _netdev,localflock,defaults 0 0
mount: 10.23.22.90@tcp:/delta on /user_data type lustre (rw,localflock)

I should also mention, when restarting or re-exporting NFS exports on the NFS server, the service start also hangs for a very long time. ~1-5 minutes. I've attached an strace (see 'nfs-server-strace-exportfs-a') of a different NFS server which has mounted the 'delta' filesystem and run exportfs -r.

I also noticed when I unmounted the delta filesystem on the new NFS test server, the following is logged:
[sdai@sdai-test mnt]$ sudo umount delta
Dec 15 11:12:49 sdai-test kernel: LustreError: 21556:0:(lov_obd.c:922:lov_cleanup()) delta-clilov-ffff880045592000: lov tgt 18 not cleaned! deathrow=0, lovrc=1
Dec 15 11:12:49 sdai-test kernel: Lustre: Unmounted delta-client

And in all cases after echoing -1 to /proc/sys/lnet/debug I'm not seeing any 'Lustre' logging on the NFS server, although to be fair most of the logging we've seen to date has been in the early morning hours, not when troubleshooting NFS exports.
I can't find the debug output file where it's supposed to be:
cat /proc/sys/lnet/debug_path
/tmp/lustre-log
Maybe this is just syslog?

stat /tmp/lustre-log
stat: cannot stat ‘/tmp/lustre-log’: No such file or directory

Comment by Steve Dainard (Inactive) [ 15/Dec/16 ]

In addition, the NFS server looks like it crashed early this morning; here are the last logs from syslog:

Dec 15 00:35:17 vanlustre4 rpc.mountd[2810]: authenticated mount request from 10.23.20.50:999 for /user_data (/user_data)
Dec 15 00:40:01 vanlustre4 systemd: Created slice user-0.slice.
Dec 15 00:40:01 vanlustre4 systemd: Starting user-0.slice.
Dec 15 00:40:01 vanlustre4 systemd: Started Session 1361 of user root.
Dec 15 00:40:01 vanlustre4 systemd: Starting Session 1361 of user root.
Dec 15 00:40:01 vanlustre4 CROND[20207]: (root) CMD (/usr/lib64/sa/sa1 1 1 >& /dev/null)
Dec 15 00:40:01 vanlustre4 systemd: Removed slice user-0.slice.
Dec 15 00:40:01 vanlustre4 systemd: Stopping user-0.slice.
Dec 15 00:40:42 vanlustre4 kernel: Lustre: 20438:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1481791235/real 1481791235] req@ffff881423d53f00 x1553333777759772/t0(0) o36->delta-MDT0000-mdc-ffff880fd9647800@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1481791242 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Dec 15 00:40:42 vanlustre4 kernel: Lustre: 20438:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
Dec 15 00:40:42 vanlustre4 kernel: Lustre: delta-MDT0000-mdc-ffff880fd9647800: Connection to delta-MDT0000 (at 10.23.22.90@tcp) was lost; in progress operations using this service will wait for recovery to complete
Dec 15 00:40:42 vanlustre4 kernel: Lustre: Skipped 82 previous similar messages
Dec 15 00:40:42 vanlustre4 kernel: Lustre: delta-MDT0000-mdc-ffff880fd9647800: Connection restored to 10.23.22.90@tcp (at 10.23.22.90@tcp)
Dec 15 00:40:42 vanlustre4 kernel: Lustre: Skipped 82 previous similar messages
Dec 15 00:43:15 vanlustre4 rpc.mountd[2805]: authenticated mount request from 10.23.20.50:963 for /user_data (/user_data)
Dec 15 00:44:15 vanlustre4 rpc.mountd[2808]: authenticated mount request from 10.23.20.50:780 for /user_data (/user_data)
Dec 15 00:45:15 vanlustre4 rpc.mountd[2806]: authenticated mount request from 10.23.20.50:879 for /user_data (/user_data)
Dec 15 00:47:58 vanlustre4 kernel: LustreError: 21534:0:(niobuf.c:329:ptlrpc_register_bulk()) ASSERTION( desc->bd_nob_transferred == 0 ) failed:
Dec 15 00:47:58 vanlustre4 kernel: LustreError: 21534:0:(niobuf.c:329:ptlrpc_register_bulk()) LBUG
Dec 15 00:56:41 vanlustre4 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="2629" x-info="http://www.rsyslog.com"] start

Comment by Oleg Drokin [ 19/Dec/16 ]

Sorry, I was not really verbose.
Lustre debug logs are accessed using "lctl dk >/some/nonlustre/path" command (once you do the echo -1 to actually enable them).
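
A sketch of the full capture sequence (the dump file path /var/tmp/lustre-debug.txt is just an example on a non-lustre filesystem):

echo -1 > /proc/sys/lnet/debug      # enable full debugging
lctl dk > /dev/null                 # clear whatever is already in the buffer
# ... reproduce the slow NFS mount / exportfs here ...
lctl dk > /var/tmp/lustre-debug.txt # dump the buffer to a file for attachment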

Comment by Steve Dainard (Inactive) [ 19/Dec/16 ]

I've attached the log output from just before the client tries to mount the NFS export, until it mounts successfully. I have a TON of intermediate files as well if they end up being useful.

Comment by Steve Dainard (Inactive) [ 20/Dec/16 ]

Kernel Panic on the NFS server this morning:

Dec 20 01:58:54 vanlustre4 kernel: LustreError: 11-0: delta-MDT0000-mdc-ffff881fe3158000: operation ldlm_enqueue to node 10.23.22.90@tcp failed: rc = -107
Dec 20 01:58:54 vanlustre4 kernel: LustreError: Skipped 4 previous similar messages
Dec 20 02:00:12 vanlustre4 systemd: Created slice user-0.slice.
Dec 20 02:00:12 vanlustre4 systemd: Starting user-0.slice.
Dec 20 02:00:12 vanlustre4 systemd: Started Session 533 of user root.
Dec 20 02:00:12 vanlustre4 systemd: Starting Session 533 of user root.
Dec 20 02:00:12 vanlustre4 CROND[46317]: (root) CMD (/usr/lib64/sa/sa1 1 1 >& /dev/null)
Dec 20 02:00:12 vanlustre4 systemd: Removed slice user-0.slice.
Dec 20 02:00:12 vanlustre4 systemd: Stopping user-0.slice.
Dec 20 02:00:58 vanlustre4 dhclient[2505]: DHCPREQUEST on bond0 to 10.23.20.10 port 67 (xid=0x3f1af2b3)
Dec 20 02:00:58 vanlustre4 dhclient[2505]: DHCPACK from 10.23.20.10 (xid=0x3f1af2b3)
Dec 20 02:01:01 vanlustre4 nscd: 2134 monitored file `/etc/hosts` was moved into place, adding watch
Dec 20 02:01:01 vanlustre4 dhclient[2505]: bound to 10.23.22.114 -- renewal in 17234 seconds.
Dec 20 02:01:12 vanlustre4 systemd: Created slice user-0.slice.
Dec 20 02:01:12 vanlustre4 systemd: Starting user-0.slice.
Dec 20 02:01:12 vanlustre4 systemd: Started Session 534 of user root.
Dec 20 02:01:12 vanlustre4 systemd: Starting Session 534 of user root.
Dec 20 02:01:12 vanlustre4 CROND[46665]: (root) CMD (run-parts /etc/cron.hourly)
Dec 20 02:01:12 vanlustre4 run-parts(/etc/cron.hourly)[4666 starting 0anacron
Dec 20 02:01:12 vanlustre4 anacron[46677]: Anacron started on 2016-12-20
Dec 20 02:01:12 vanlustre4 run-parts(/etc/cron.hourly)[4667 finished 0anacron
Dec 20 02:01:12 vanlustre4 run-parts(/etc/cron.hourly)[4666 starting 0yum-hourly.cron
Dec 20 02:01:12 vanlustre4 run-parts(/etc/cron.hourly)[4668 finished 0yum-hourly.cron
Dec 20 02:01:12 vanlustre4 run-parts(/etc/cron.hourly)[4666 starting tmpwatch-weed
Dec 20 02:01:12 vanlustre4 anacron[46677]: Normal exit (0 jobs run)
Dec 20 02:01:12 vanlustre4 run-parts(/etc/cron.hourly)[4669 finished tmpwatch-weed
Dec 20 02:01:12 vanlustre4 systemd: Removed slice user-0.slice.
Dec 20 02:01:12 vanlustre4 systemd: Stopping user-0.slice.
Dec 20 02:01:38 vanlustre4 kernel: Lustre: 46333:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1482228091/real 1482228091] req@ffff880d96e8a100 x1554054856577112/t0(0) o36->delta-MDT0000-mdc-ffff881fe3158000@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1482228098 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Dec 20 02:01:38 vanlustre4 kernel: Lustre: 46333:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 66 previous similar messages
Dec 20 02:01:38 vanlustre4 kernel: Lustre: delta-MDT0000-mdc-ffff881fe3158000: Connection to delta-MDT0000 (at 10.23.22.90@tcp) was lost; in progress operations using this service will wait for recovery to complete
Dec 20 02:01:38 vanlustre4 kernel: Lustre: Skipped 67 previous similar messages
Dec 20 02:01:38 vanlustre4 kernel: Lustre: delta-MDT0000-mdc-ffff881fe3158000: Connection restored to 10.23.22.90@tcp (at 10.23.22.90@tcp)
Dec 20 02:01:38 vanlustre4 kernel: Lustre: Skipped 67 previous similar messages
Dec 20 02:01:47 vanlustre4 kernel: LustreError: 43858:0:(niobuf.c:329:ptlrpc_register_bulk()) ASSERTION( desc->bd_nob_transferred == 0 ) failed:
Dec 20 02:01:47 vanlustre4 kernel: LustreError: 43858:0:(niobuf.c:329:ptlrpc_register_bulk()) LBUG
Dec 20 02:01:47 vanlustre4 kernel: Pid: 43858, comm: rsync
Dec 20 02:01:47 vanlustre4 kernel: #012Call Trace:
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa07d37d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa07d3d75>] lbug_with_loc+0x45/0xc0 [libcfs]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0b508ed>] ptlrpc_register_bulk+0x10d/0x9c0 [ptlrpc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa07de9e7>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0b51c9f>] ptl_send_rpc+0x1ff/0xdb0 [ptlrpc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa07dc3c4>] ? libcfs_log_return+0x24/0x30 [libcfs]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0b4b189>] ptlrpc_check_set.part.22+0x1779/0x1d70 [ptlrpc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0b4b7db>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0b4bd51>] ptlrpc_set_wait+0x4f1/0x900 [ptlrpc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff810b8c10>] ? default_wake_function+0x0/0x20
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0b4c1dd>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d00501>] mdc_getpage+0x1e1/0x680 [mdc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811b43e9>] ? alloc_pages_current+0xa9/0x170
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d00ad5>] mdc_read_page_remote+0x135/0x670 [mdc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d009a0>] ? mdc_read_page_remote+0x0/0x670 [mdc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811698df>] do_read_cache_page+0x7f/0x190
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff81169a2c>] read_cache_page+0x1c/0x30
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0cff2e4>] mdc_read_page+0x1a4/0xa00 [mdc]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d37122>] lmv_read_page+0x1b2/0x590 [lmv]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d69168>] ll_get_dir_page+0xc8/0x380 [lustre]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0da6260>] ? ll_md_blocking_ast+0x0/0x740 [lustre]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d6958d>] ll_dir_read+0xad/0x320 [lustre]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffffa0d69925>] ll_readdir+0x125/0x4d0 [lustre]
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f21d0>] vfs_readdir+0xb0/0xe0
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff811f25f5>] SyS_getdents+0x95/0x120
Dec 20 02:01:47 vanlustre4 kernel: [<ffffffff81645b09>] system_call_fastpath+0x16/0x1b
Dec 20 02:01:47 vanlustre4 kernel:
Dec 20 02:01:47 vanlustre4 kernel: Kernel panic - not syncing: LBUG

Full syslog attached.

Comment by Steve Dainard (Inactive) [ 22/Dec/16 ]

NFS server crash:

Dec 22 02:03:05 vanlustre4 kernel: Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1482400977/real 1482400977] req@ffff880280affb00 x1554362320709380/t0(0) o36->delta-MDT0000-mdc-ffff880fe594d800@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1482400984 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Dec 22 02:03:05 vanlustre4 kernel: Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 82 previous similar messages
Dec 22 02:03:05 vanlustre4 kernel: Lustre: delta-MDT0000-mdc-ffff880fe594d800: Connection to delta-MDT0000 (at 10.23.22.90@tcp) was lost; in progress operations using this service will wait for recovery to complete
Dec 22 02:03:05 vanlustre4 kernel: Lustre: Skipped 80 previous similar messages
Dec 22 02:03:05 vanlustre4 kernel: Lustre: delta-MDT0000-mdc-ffff880fe594d800: Connection restored to 10.23.22.90@tcp (at 10.23.22.90@tcp)
Dec 22 02:03:05 vanlustre4 kernel: Lustre: Skipped 80 previous similar messages
Dec 22 02:09:49 vanlustre4 sudo: jenkins : TTY=unknown ; PWD=/opt/jenkins/workspace/dirvish-filer-vannfs8 ; USER=root ; COMMAND=/user_data/ARCHIVE/dirvish/dirvish-common/bin/dirvish-filer vannfs8
Dec 22 02:10:01 vanlustre4 systemd: Created slice user-0.slice.
Dec 22 02:10:01 vanlustre4 systemd: Starting user-0.slice.
Dec 22 02:10:01 vanlustre4 systemd: Started Session 157 of user root.
Dec 22 02:10:01 vanlustre4 systemd: Starting Session 157 of user root.
Dec 22 02:10:01 vanlustre4 CROND[41610]: (root) CMD (/usr/lib64/sa/sa1 1 1 >& /dev/null)
Dec 22 02:10:01 vanlustre4 systemd: Removed slice user-0.slice.
Dec 22 02:10:01 vanlustre4 systemd: Stopping user-0.slice.
Dec 22 02:12:03 vanlustre4 kernel: LustreError: 46952:0:(niobuf.c:329:ptlrpc_register_bulk()) ASSERTION( desc->bd_nob_transferred == 0 ) failed:
Dec 22 02:12:03 vanlustre4 kernel: LustreError: 46952:0:(niobuf.c:329:ptlrpc_register_bulk()) LBUG
Dec 22 02:12:03 vanlustre4 kernel: Pid: 46952, comm: rsync
Dec 22 02:12:03 vanlustre4 kernel: #012Call Trace:
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa07bb7d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa07bbd75>] lbug_with_loc+0x45/0xc0 [libcfs]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b258ed>] ptlrpc_register_bulk+0x10d/0x9c0 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa082e452>] ? LNetMDUnlink+0xe2/0x180 [lnet]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b53956>] ? sptlrpc_import_sec_ref+0x36/0x40 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b26c9f>] ptl_send_rpc+0x1ff/0xdb0 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff8108c1fb>] ? lock_timer_base.isra.33+0x2b/0x50
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b20189>] ptlrpc_check_set.part.22+0x1779/0x1d70 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff8108d782>] ? del_timer_sync+0x52/0x60
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b207db>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b20d51>] ptlrpc_set_wait+0x4f1/0x900 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff810b8c10>] ? default_wake_function+0x0/0x20
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0b211dd>] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0cd5501>] mdc_getpage+0x1e1/0x680 [mdc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811b43e9>] ? alloc_pages_current+0xa9/0x170
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0cd5ad5>] mdc_read_page_remote+0x135/0x670 [mdc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0cd59a0>] ? mdc_read_page_remote+0x0/0x670 [mdc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811698df>] do_read_cache_page+0x7f/0x190
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff81169a2c>] read_cache_page+0x1c/0x30
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0cd42e4>] mdc_read_page+0x1a4/0xa00 [mdc]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0d57572>] ? ll_inode_permission+0xb2/0x3f0 [lustre]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0d0c122>] lmv_read_page+0x1b2/0x590 [lmv]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0d3e168>] ll_get_dir_page+0xc8/0x380 [lustre]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0d7b260>] ? ll_md_blocking_ast+0x0/0x740 [lustre]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0d3e58d>] ll_dir_read+0xad/0x320 [lustre]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffffa0d3e925>] ll_readdir+0x125/0x4d0 [lustre]
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f22e0>] ? filldir+0x0/0xf0
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f21d0>] vfs_readdir+0xb0/0xe0
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff811f25f5>] SyS_getdents+0x95/0x120
Dec 22 02:12:03 vanlustre4 kernel: [<ffffffff81645b09>] system_call_fastpath+0x16/0x1b
Dec 22 02:12:03 vanlustre4 kernel:
Dec 22 02:12:03 vanlustre4 kernel: Kernel panic - not syncing: LBUG

Comment by Oleg Drokin [ 22/Dec/16 ]

Hm, did the lbug happen while you had debug set to -1 by any chance? Did you happen to capture a crashdump?

Comment by Oleg Drokin [ 22/Dec/16 ]

About the debug logs: the log buffer is cleared once you do lctl dk, so the intermediate dumps you took potentially have all the info I am looking for, but the one from before the mount has nothing, and the one from once the mount completes potentially only has the last request that was taking a long time.
I see there's a bunch of IO happening on this client too that potentially makes the logs bigger than necessary.
What I am hoping for is a totally idle client that attempts the NFS export of lustre (if exportfs -a reproduces the problem, that's good too) with debug enabled and collected around that event, with no other lustre activity happening.

By increasing the debug_mb setting you increase the size of the debug buffer so more fits in if the process takes a long time (say 1024 in there would be good enough for an otherwise totally idle client, I guess).
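
A sketch of that capture on an otherwise idle client (1024 is the buffer size in MB suggested above; lctl set_param debug_mb is the usual way to resize it, though the exact interface can vary by version):

lctl set_param debug_mb=1024        # enlarge the debug buffer so a slow mount doesn't wrap it
echo -1 > /proc/sys/lnet/debug      # log everything
lctl dk > /dev/null                 # clear the buffer
exportfs -a                         # or trigger the slow NFS mount from a client
lctl dk > /var/tmp/lu8935-idle-client.dk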

Comment by Steve Dainard (Inactive) [ 22/Dec/16 ]

crashdump as in kdump? If yes, I do.

I've got crashes from Dec 10, 12, 15, 17, 18, 20, 22. Do you have somewhere I can upload these to? They're 53-70G in size... at least one of them should have debug set to -1 but I don't think the most recent one would have, although it is the largest on disk.

Comment by Oleg Drokin [ 22/Dec/16 ]

You can upload to our ftp at ftp.hpdd.intel.com, to the /uploads/ directory; just create a directory like LU-8935 for your files.

I would need the crashdump and the vmlinux from the kernel -debuginfo rpm and the lustre-modules-debuginfo rpm (assuming you self-built lustre + kernel; if you used one of the ones we built, just let me know which one it was).

Thank you.
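
For reference, a sketch of locating those pieces on a standard RHEL/CentOS kdump + debuginfo layout (paths and package names are the usual defaults and may differ on this system):

ls /var/crash/                                        # vmcore files written by kdump
rpm -q kernel-debuginfo lustre-modules-debuginfo      # confirm the debuginfo packages are installed
ls /usr/lib/debug/lib/modules/$(uname -r)/vmlinux     # the vmlinux that matches the running kernel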

Comment by Steve Dainard (Inactive) [ 03/Jan/17 ]

I think there is a LU-8935 directory on the FTP server which I uploaded to on the 23rd of Dec, but I can't see that directory now. I also can't create it, so perhaps I don't have perms for that dir anymore. In any case, I'm uploading the two most recent dumps to /uploads/LU8935/ right now, which should be done in ~90 minutes.

We're using the pre-built kernels from lustre repos:
Linux vanlustre4 3.10.0-327.3.1.el7_lustre.x86_64 #1 SMP Thu Feb 18 10:53:23 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

Thanks

Comment by Steve Dainard (Inactive) [ 05/Jan/17 ]

Hello,

Has anyone had a chance to take a look through these kernel dumps?

Thanks

Comment by Oleg Drokin [ 06/Jan/17 ]

I am taking a look at the files you uploaded (they made it in just fine).

Comment by Steve Dainard (Inactive) [ 09/Jan/17 ]

Sometime soon after the delta cluster was created, delta-OST0012 seems to have been deactivated:

2016-11-02-syslog.log:Nov  2 15:10:12 dmds1 kernel: Lustre: 4643:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124604/real 1478124604]  req@ffff8852d8dd7200 x1549476144894000/t0(0) o13->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:7/4 lens 224/368 e 0 to 1 dl 1478124611 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:10:12 dmds1 kernel: Lustre: delta-OST0012-osc-MDT0000: Connection to delta-OST0012 (at 10.23.22.92@tcp) was lost; in progress operations using this service will wait for recovery to complete
2016-11-02-syslog.log:Nov  2 15:10:18 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124612/real 1478124612]  req@ffff88407c1e2100 x1549476144899076/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124618 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:10:48 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124637/real 1478124637]  req@ffff883d0c413900 x1549476144914028/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124648 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:11:18 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124662/real 1478124662]  req@ffff884070e6d100 x1549476144932136/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124678 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:11:40 dmds1 kernel: LustreError: 140-5: Server delta-OST0012 requested index 18, but that index is already in use. Use --writeconf to force
2016-11-02-syslog.log:Nov  2 15:11:40 dmds1 kernel: LustreError: 108857:0:(mgs_handler.c:460:mgs_target_reg()) Failed to write delta-OST0012 log (-98)
2016-11-02-syslog.log:Nov  2 15:11:48 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124687/real 1478124687]  req@ffff884070e6a100 x1549476144946028/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124708 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:12:18 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124712/real 1478124712]  req@ffff880622c00000 x1549476144962948/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124738 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:13:13 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124762/real 1478124762]  req@ffff8843c7f14b00 x1549476145004052/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124793 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:14:08 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124812/real 1478124812]  req@ffff8843c7f16300 x1549476145039664/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124848 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:15:58 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478124912/real 1478124912]  req@ffff8843c7f13c00 x1549476145113496/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478124958 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:18:12 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478125037/real 1478125038]  req@ffff8844389de900 x1549476145200192/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478125092 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:23:12 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478125337/real 1478125337]  req@ffff88446375f200 x1549476145360432/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478125392 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:31:57 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478125862/real 1478125862]  req@ffff88446375a100 x1549476145540712/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478125917 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:43:12 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478126537/real 1478126537]  req@ffff88446375bc00 x1549476145818356/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478126592 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 15:54:27 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478127212/real 1478127212]  req@ffff8843f8cde600 x1549476146019280/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478127267 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 16:05:42 dmds1 kernel: Lustre: 4588:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1478127887/real 1478127887]  req@ffff8843f8cdf200 x1549476146209332/t0(0) o8->delta-OST0012-osc-MDT0000@10.23.22.92@tcp:28/4 lens 520/544 e 0 to 1 dl 1478127942 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-11-02-syslog.log:Nov  2 16:13:00 dmds1 kernel: LustreError: 140-5: Server delta-OST0012 requested index 18, but that index is already in use. Use --writeconf to force
2016-11-02-syslog.log:Nov  2 16:13:00 dmds1 kernel: LustreError: 4713:0:(mgs_handler.c:460:mgs_target_reg()) Failed to write delta-OST0012 log (-98)
2016-11-02-syslog.log:Nov  2 16:50:53 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2016-11-02-syslog.log:Nov  2 16:52:58 dmds1 kernel: Lustre: Permanently deactivating delta-OST0012
2016-11-02-syslog.log:Nov  2 16:52:58 dmds1 kernel: Lustre: Setting parameter delta-OST0012-osc.osc.active in log delta-client
2016-12-09-syslog.log:Dec  9 15:37:59 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2016-12-09-syslog.log:Dec  9 15:52:05 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2016-12-21-syslog.log:Dec 21 12:46:16 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2017-01-03-syslog.log:Jan  3 11:41:22 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2017-01-03-syslog.log:Jan  3 12:35:14 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2017-01-03-syslog.log:Jan  3 15:39:52 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2017-01-04-syslog.log:Jan  4 03:11:15 dmds1 kernel: LustreError: 140-5: Server delta-OST0012 requested index 18, but that index is already in use. Use --writeconf to force
2017-01-04-syslog.log:Jan  4 03:11:15 dmds1 kernel: LustreError: 4702:0:(mgs_handler.c:460:mgs_target_reg()) Failed to write delta-OST0012 log (-98)
2017-01-04-syslog.log:Jan  4 08:49:27 dmds1 kernel: Lustre: setting import delta-OST0012_UUID INACTIVE by administrator request
2017-01-04-syslog.log:Jan  4 08:49:27 dmds1 kernel: LustreError: 114443:0:(osp_precreate.c:899:osp_precreate_cleanup_orphans()) delta-OST0012-osc-MDT0000: cannot cleanup orphans: rc = -108
2017-01-04-syslog.log:Jan  4 09:36:59 dmds1 kernel: Lustre: delta-OST0012-osc-MDT0000: Connection to delta-OST0012 (at 10.23.22.92@tcp) was lost; in progress operations using this service will wait for recovery to complete
2017-01-04-syslog.log:Jan  4 09:36:59 dmds1 kernel: LustreError: 167-0: delta-OST0012-osc-MDT0000: This client was evicted by delta-OST0012; in progress operations using this service will fail.
2017-01-04-syslog.log:Jan  4 09:36:59 dmds1 kernel: Lustre: delta-OST0012-osc-MDT0000: Connection restored to 10.23.22.92@tcp (at 10.23.22.92@tcp)
2017-01-04-syslog.log:Jan  4 10:54:57 dmds1 kernel: Lustre: delta-OST0012-osc-MDT0000: Connection to delta-OST0012 (at 10.23.22.92@tcp) was lost; in progress operations using this service will wait for recovery to complete
2017-01-04-syslog.log:Jan  4 10:54:57 dmds1 kernel: Lustre: delta-OST0012-osc-MDT0000: Connection restored to 10.23.22.92@tcp (at 10.23.22.92@tcp)
2017-01-05-syslog.log:Jan  5 03:08:44 dmds1 kernel: LustreError: 14d-8: No target specified: delta-OST0012_UUID=1

On test client:

# lctl device_list
  0 UP mgc MGC10.23.22.90@tcp 1d99ec10-5360-1b32-d30d-18c34eadf020 5
  1 UP lov delta-clilov-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 4
  2 UP lmv delta-clilmv-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 4
  3 UP mdc delta-MDT0000-mdc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
  4 UP osc delta-OST0000-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
  5 UP osc delta-OST0033-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
  6 UP osc delta-OST0001-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
  7 UP osc delta-OST0034-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
  8 UP osc delta-OST0022-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
  9 UP osc delta-OST0002-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 10 UP osc delta-OST0035-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 11 UP osc delta-OST0023-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 12 UP osc delta-OST0003-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 13 UP osc delta-OST0036-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 14 UP osc delta-OST0037-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 15 UP osc delta-OST0004-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 16 UP osc delta-OST0024-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 17 UP osc delta-OST0038-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 18 UP osc delta-OST0005-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 19 UP osc delta-OST0025-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 20 UP osc delta-OST0039-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 21 UP osc delta-OST0006-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 22 UP osc delta-OST003a-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 23 UP osc delta-OST0026-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 24 UP osc delta-OST0007-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 25 UP osc delta-OST003b-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 26 UP osc delta-OST0008-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 27 UP osc delta-OST0027-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 28 UP osc delta-OST003c-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 29 UP osc delta-OST0009-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 30 UP osc delta-OST003d-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 31 UP osc delta-OST0028-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 32 UP osc delta-OST000a-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 33 UP osc delta-OST003e-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 34 UP osc delta-OST0029-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 35 UP osc delta-OST000b-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 36 UP osc delta-OST003f-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 37 UP osc delta-OST000c-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 38 UP osc delta-OST0040-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 39 UP osc delta-OST002a-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 40 UP osc delta-OST000d-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 41 UP osc delta-OST0041-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 42 UP osc delta-OST0043-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 43 UP osc delta-OST002b-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 44 UP osc delta-OST000e-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 45 UP osc delta-OST000f-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 46 UP osc delta-OST002c-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 47 UP osc delta-OST0010-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 48 UP osc delta-OST002d-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 49 UP osc delta-OST002e-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 50 UP osc delta-OST002f-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 51 UP osc delta-OST0030-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 52 UP osc delta-OST0031-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 53 UP osc delta-OST0032-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 54 UP osc delta-OST0012-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 4
 55 UP osc delta-OST0013-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 56 UP osc delta-OST0014-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 57 UP osc delta-OST0015-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 58 UP osc delta-OST0016-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 59 UP osc delta-OST0017-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 60 UP osc delta-OST0018-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 61 UP osc delta-OST0019-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 62 UP osc delta-OST001a-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 63 UP osc delta-OST001b-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 64 UP osc delta-OST001c-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 65 UP osc delta-OST001d-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 66 UP osc delta-OST001e-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 67 UP osc delta-OST001f-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 68 UP osc delta-OST0020-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 69 UP osc delta-OST0021-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5
 70 UP osc delta-OST0011-osc-ffff880032d64800 32b62615-3145-295b-feb4-bbca447e962d 5

After mounting lustre on my test machine, and activating OST0012, df commands, nfs restarts, and client automounts are returning quickly.

Can you advise whether this is a bug, or where I could permanently enable OST0012 so that it is activated when clients mount lustre?

Comment by Steve Dainard (Inactive) [ 09/Jan/17 ]

lctl --device 54 conf_param delta-OST0012.osc.active looks to have set the OST to activate on client mount.

Is this a known bug, that if an OST is inactive it will cause significant hangs for df, nfs mounts, etc.?
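
For reference, a hedged sketch of the persistent and runtime forms (per the Lustre manual, conf_param is run on the MGS node; the runtime set_param glob matches the per-client osc device names shown in the device list above):

# on the MGS: record the OST as active in the configuration logs so new mounts see it active
lctl conf_param delta-OST0012.osc.active=1
# on an already-mounted client: re-activate the import at runtime
lctl set_param osc.delta-OST0012-osc-*.active=1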

Comment by Oleg Drokin [ 10/Jan/17 ]

Hm, the disabled-OST delay is likely an occurrence of LU-7759. You can work around it with the lazystatfs mount option (the default from 2.9.0 onwards).

Now, your crashes are a new bug that seems to be triggered by some network problem (not 100% sure on that yet, though).
I have not totally finished sorting through the crashdump yet, but what I see already is that there is a readdir request being resent, but as we go to process the resend the request already has the "replied" flag bit set, so its state is not reset. Some sort of a race, it appears?
The logs are unfortunately somewhat spotty, as if some pages were already dropped.
I'll keep this ticket updated as I find more.
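
A sketch of adding lazystatfs to the existing mount described earlier in this ticket (same fstab entry, with the extra option; the llite parameter lets it be toggled without remounting):

# /etc/fstab on the NFS re-export node
dmds1:/delta  /user_data  lustre  _netdev,localflock,lazystatfs,defaults  0 0
# or at runtime on an already-mounted client
lctl set_param llite.delta-*.lazystatfs=1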

Comment by Oleg Drokin [ 10/Jan/17 ]

So the debug logs did not have what I hoped they would, probably due to page dropping.
I see that you run this rsync job on the nfs server and it's always the one crashing somehow?
It seems there are always those request timeouts involved, which might mean a slow MDS.
So, for example, for this message from the 22nd (2nd crashdump):

[48378.577309] Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1482400977/real 1482400977]  req@ffff880280affb00 x1554362320709380/t0(0) o36->delta-MDT0000-mdc-ffff880fe594d800@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1482400984 ref 2 fl Rpc:X/0/ffffffff rc 0/-1

What was the actual time on the node when it was printed? The request was sent at "Thu Dec 22 05:02:57" EST (though I don't know what timezone the servers are in) and I just need to know how much time had elapsed since this request was sent.
There is a strong suspicion that we actually got a reply for this request from the MDS at just about this same time, triggering a very rare race. Can you please check the MDS logs for any messages between those times +-1 minute?

Comment by Steve Dainard (Inactive) [ 10/Jan/17 ]

Not sure about a specific rsync job crashing; I believe there are a few that crash. There are quite a few (hundreds of) clients backing up over rsync to the NFS server.

On NFS server (PST timestamps):

Dec 22 02:03:05 vanlustre4 kernel: Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1482400977/real 1482400977]  req@ffff880280affb00 x1554362320709380/t0(0) o36->delta-MDT0000-mdc-ffff880fe594d800@10.23.22.90@tcp:12/10 lens 608/1016 e 0 to 1 dl 1482400984 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Dec 22 02:03:05 vanlustre4 kernel: Lustre: 20662:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 82 previous similar messages

On MDS dmds1 (PST):

Dec 22 00:01:59 dmds1 kernel: Lustre: delta-MDT0000: Connection restored to 5d0ddfdb-65fb-bb1d-85f6-2509c2d8bc16 (at 10.23.22.114@tcp)
Dec 22 00:02:06 dmds1 kernel: Lustre: delta-MDT0000: Client 89fa1c40-896d-2b3e-6ccd-c237c9d4f145 (at 10.23.22.114@tcp) reconnecting
Dec 22 00:02:06 dmds1 kernel: LustreError: 22492:0:(ldlm_lib.c:3169:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff881c9e367200 x1554361449174764/t0(0) o37->89fa1c40-896d-2b3e-6ccd-c237c9d4f145@10.23.22.114@tcp:307/0 lens 568/440 e 0 to 0 dl 1482393732 ref 1 fl Interpret:/0/0 rc 0/0
Dec 22 00:02:06 dmds1 kernel: Lustre: delta-MDT0000: Connection restored to 5d0ddfdb-65fb-bb1d-85f6-2509c2d8bc16 (at 10.23.22.114@tcp)
Dec 22 00:02:13 dmds1 kernel: Lustre: delta-MDT0000: Client 89fa1c40-896d-2b3e-6ccd-c237c9d4f145 (at 10.23.22.114@tcp) reconnecting
Dec 22 00:02:13 dmds1 kernel: Lustre: delta-MDT0000: Connection restored to 5d0ddfdb-65fb-bb1d-85f6-2509c2d8bc16 (at 10.23.22.114@tcp)

Comment by Oleg Drokin [ 10/Jan/17 ]

So only 55 seconds since the request was sent.
Can you increase your at_min setting to, say, 80 seconds and see if the frequent reconnects go away, along with the "Reconnect on bulk READ" messages on the mds?
You need to do this on all the nodes. The downside is that detection of dead nodes would take longer, but I just want to establish whether it's a slow MDS or something in the network that causes this.
You don't need to have any special extra debugging enabled when doing this.

Also did the lazystatfs option help with the delay?
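
A sketch of raising at_min as suggested (runtime form on every client and server, plus the persistent form set once on the MGS; exact parameter paths can vary by Lustre version):

lctl set_param at_min=80                 # runtime, per node
lctl conf_param delta.sys.at_min=80      # persistent, run on the MGS for the whole filesystem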

Comment by Gerrit Updater [ 13/Apr/17 ]

Lai Siyao (lai.siyao@intel.com) uploaded a new patch: https://review.whamcloud.com/26583
Subject: LU-8935 ptlrpc: missing barrier before wake_up
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1965158cd0d0e4a0d6c1f82474e99332d963ec26

Comment by Lai Siyao [ 13/Apr/17 ]

It looks like ptlrpc_client_wake_req() misses a memory barrier, which may cause ptlrpc_resend_req() to wake up ptlrpc_send_rpc() -> ptlrpc_register_bulk() while the latter doesn't see rq_resend set.

Comment by Gerrit Updater [ 29/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26583/
Subject: LU-8935 ptlrpc: missing barrier before wake_up
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 33033c27aae361069877d56d44714097a208aa76

Comment by Peter Jones [ 29/Jul/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 02/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28321
Subject: LU-8935 ptlrpc: missing barrier before wake_up
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: a40516411152326101a7f19ac81b8d2b2429c509

Comment by Gerrit Updater [ 18/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28321/
Subject: LU-8935 ptlrpc: missing barrier before wake_up
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 0fe9e8f802f873b89283de34ab22909759235dff
