[LU-8250] MDT recovery stalled on secondary node Created: 08/Jun/16  Updated: 26/Oct/16  Resolved: 26/Oct/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Blocker
Reporter: Frank Heckes (Inactive) Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: soak
Environment:

lola
build: commit aa84fbc8165f526dae4bd824a48c186c3ac2f639 + patches


Attachments: File console-lola-11.log-20160608.bz2     File dmesg-lola-11-20160609-0811.bz2     Text File lola-11-lustre-log.20160608-0656.bz2     File messages-lola-11.log-20160608.bz2     File soak.log.2016-06-11-1417.txt.bz2    
Issue Links:
Duplicate
is duplicated by LU-8428 Very long recovery times for MDTs aft... Resolved
Related
is related to LU-8704 RPC sent inside mdd_create transaction. Open
is related to LU-8714 too many update logs during soak-test. Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The error happened during soak testing of build '20160601' (see: https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20160601). DNE is enabled. The MDTs have been formatted using ldiskfs, the OSTs using ZFS. Each MDS hosts one MDT, and the MDSes are configured in an active-active failover configuration.

The issue might be related to LU-7848, although that change is part of the build under test.
The error eventually triggers the oom-killer, which is documented in LU-7836; this ticket might be a duplicate of LU-7836.

Events:
1st Event:

  • 2016-06-03 11:31:10 - failover resource of lola-10 (MDT-2) --> lola-11
  • 2016-06-03 11:36:37 - ... soaked-MDT0002 mounted successfully on lola-11
  • until 2016-06-04 00:44 - soaked-MDT0002 remained in status 'RECOVERING'
  • 2016-06-04 00:44:52 - lola-11 crashed with oom-killer

2nd Event:

  • 2016-06-07 08:34:06,621 triggering fault mds_failover lola-10 (MDT-2) --> lola-11
  • 2016-06-07 08:38:42 - Mounting soaked-MDT0002 on lola-11
  • since 2016-06-07 08:39:32,155 - waiting for recovery to complete
  • memory resources are nearly exhausted:
    [root@lola-11 ~]# date
    Wed Jun  8 07:59:49 PDT 2016
    [root@lola-11 ~]# collectl -sm --verbose
    waiting for 1 second sample...
    
    # MEMORY SUMMARY
    #<-------------------------------Physical Memory--------------------------------------><-----------Swap------------><-------Paging------>
    #   Total    Used    Free    Buff  Cached    Slab  Mapped    Anon  Commit  Locked Inact Total  Used  Free   In  Out Fault MajFt   In  Out
       32006M  30564M   1441M 127144K 676256K  28701M  16196K  69072K 201740K   5008K  509M   15G     0   15G    0    0    28     0    0    8
       32006M  30565M   1441M 127144K 676256K  28701M  16196K  69072K 201740K   5008K  509M   15G     0   15G    0    0    63     0    0    4
       32006M  30565M   1441M 127144K 676256K  28701M  16196K  69072K 201740K   5008K  509M   15G     0   15G    0    0     1     0    0    0
       32006M  30564M   1441M 127144K 676256K  28701M  16196K  69072K 201740K   5008K  509M   15G     0   15G    0    0    17     0    0    0
    

Attached files:

  • 1st event only: crash dump saved to lhn.hpdd.intel.com:/var/crashdumps/lu-7836/lola-11/127.0.0.1-2016-06-04-00:44:52
  • 2nd event only: kernel debug log of lola-11, dmesg
  • Both events: messages, console logs


 Comments   
Comment by Frank Heckes (Inactive) [ 08/Jun/16 ]

Triggering an abort of the recovery hangs:

LustreError: 139015:0:(mdt_handler.c:5896:mdt_iocontrol()) soaked-MDT0002: Aborting recovery for device
LustreError: 139015:0:(ldlm_lib.c:2577:target_stop_recovery_thread()) soaked-MDT0002: Aborting recovery
INFO: task lctl:139015 blocked for more than 120 seconds.
      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
lctl          D 0000000000000013     0 139015 129872 0x00000080
 ffff8804187cfb68 0000000000000082 0000000000000000 ffff880431d7eab0
 ffff880431d7eae8 ffff880431d7f150 00ff8804187cfb18 0000000000000000
 ffff8803fc2c0078 ffff880038775a00 ffff8803fc2c05f8 ffff8804187cffd8
Call Trace:
 [<ffffffff81539b0e>] ? thread_return+0x4e/0x7d0
 [<ffffffff8153aa35>] schedule_timeout+0x215/0x2e0
 [<ffffffff8153a693>] wait_for_common+0x123/0x180
 [<ffffffff81067650>] ? default_wake_function+0x0/0x20
 [<ffffffff8153a7cd>] wait_for_completion+0x1d/0x20
 [<ffffffffa0ae3e80>] target_stop_recovery_thread+0x50/0xe0 [ptlrpc]
 [<ffffffffa1223c02>] mdt_iocontrol+0x6b2/0x920 [mdt]
 [<ffffffff812337cf>] ? security_inode_permission+0x1f/0x30
 [<ffffffffa08d4edc>] class_handle_ioctl+0x15fc/0x20d0 [obdclass]
 [<ffffffff811a5998>] ? do_filp_open+0x798/0xd20
 [<ffffffffa08b92ab>] obd_class_ioctl+0x4b/0x190 [obdclass]
 [<ffffffff811a7f82>] vfs_ioctl+0x22/0xa0
 [<ffffffff811a8124>] do_vfs_ioctl+0x84/0x580
 [<ffffffff811a03f6>] ? final_putname+0x26/0x50
 [<ffffffff811a86a1>] sys_ioctl+0x81/0xa0
 [<ffffffff810e905e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Comment by Cliff White (Inactive) [ 18/Aug/16 ]

We seem to have hit this again with build lustre-master 3424.
The MDS crashes:

Kernel panic - not syncing: Attempted to kill init!^M
Pid: 1, comm: init Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1^M
Call Trace:^M
 [<ffffffff81539407>] ? panic+0xa7/0x16f^M
 [<ffffffff8107d0a7>] ? do_exit+0x867/0x870^M
 [<ffffffff8107d1b7>] ? sys_exit+0x17/0x20^M
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b^M

Upon restart, the MDS failed to complete recovery and timed out. During this time the soak controller halted lola-11 for failover. Attempting to abort the recovery results in timeouts:

LustreError: 7532:0:(ldlm_lib.c:2576:target_stop_recovery_thread()) soaked-MDT0003: Aborting recovery^M
Lustre: 6032:0:(ldlm_lib.c:2025:target_recovery_overseer()) recovery is aborted, evict exports in recovery^M
Lustre: 4563:0:(client.c:2113:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1471532976/real 1471532976]  req@ffff8804070af9c0 x1543013122722384/t0(0) o38->soaked-MDT0002-osp-MDT0003@192.168.1.111@o2ib10:24/4 lens 520/544 e 0 to 1 dl 1471533032 ref 1 fl Rpc:eXN/0/ffffffff rc 0/-1^M
Lustre: 4563:0:(client.c:2113:ptlrpc_expire_one_request()) Skipped 5 previous similar messages^M
Aug 18 08:10:01 lola-10 TIME: Time stamp for console
INFO: task lctl:7532 blocked for more than 120 seconds.^M
      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1^M
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.^M
lctl          D 0000000000000011     0  7532   7440 0x00000080^M
 ffff8804069dbb68 0000000000000086 0000000000000000 ffff880835ef3520^M
 ffff880835ef3558 ffff880835ef3bc0 000000d247916847 0000000000000000^M
 ffff880437394078 0000000100092ec2 ffff8804373945f8 ffff8804069dbfd8^M
Call Trace:^M
 [<ffffffff8153aa35>] schedule_timeout+0x215/0x2e0^M
 [<ffffffff8153a693>] wait_for_common+0x123/0x180^M
 [<ffffffff81067650>] ? default_wake_function+0x0/0x20^M
 [<ffffffff8153a7cd>] wait_for_completion+0x1d/0x20^M
 [<ffffffffa0970f80>] target_stop_recovery_thread+0x50/0xe0 [ptlrpc]^M
 [<ffffffffa10ebf22>] mdt_iocontrol+0x6a2/0xa80 [mdt]^M
 [<ffffffff812337cf>] ? security_inode_permission+0x1f/0x30^M
 [<ffffffffa07843ec>] class_handle_ioctl+0x15fc/0x20d0 [obdclass]^M
 [<ffffffff811a5998>] ? do_filp_open+0x798/0xd20^M
 [<ffffffffa07682ab>] obd_class_ioctl+0x4b/0x190 [obdclass]^M
 [<ffffffff811a7f82>] vfs_ioctl+0x22/0xa0^M
 [<ffffffff811a8124>] do_vfs_ioctl+0x84/0x580^M
 [<ffffffff811a03f6>] ? final_putname+0x26/0x50^M
 [<ffffffff811a86a1>] sys_ioctl+0x81/0xa0^M
 [<ffffffff810e905e>] ? __audit_syscall_exit+0x25e/0x290^M
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b^M
LustreError: 137-5: soaked-MDT0002_UUID: not available for connect from 192.168.1.131@o2ib100 (no target). If you are running an HA pair check that the target is mounted on the other server.^M
LustreError: Skipped 351 previous similar messages^M
INFO: task lctl:7532 blocked for more than 120 seconds.^M
      Tainted: P           -- ------------    2.6.32-573.26.1.el6_lustre.x86_64 #1^M
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.^M
lctl          D 0000000000000011     0  7532   7440 0x00000080^M
 ffff8804069dbb68 0000000000000086 0000000000000000 ffff880835ef3520^M
 ffff880835ef3558 ffff880835ef3bc0 000000d247916847 0000000000000000^M
 ffff880437394078 0000000100092ec2 ffff8804373945f8 ffff8804069dbfd8^M
Call Trace:^M
 [<ffffffff8153aa35>] schedule_timeout+0x215/0x2e0^M
 [<ffffffff8153a693>] wait_for_common+0x123/0x180^M
 [<ffffffff81067650>] ? default_wake_function+0x0/0x20^M
 [<ffffffff8153a7cd>] wait_for_completion+0x1d/0x20^M
 [<ffffffffa0970f80>] target_stop_recovery_thread+0x50/0xe0 [ptlrpc]^M
 [<ffffffffa10ebf22>] mdt_iocontrol+0x6a2/0xa80 [mdt]^M
 [<ffffffff812337cf>] ? security_inode_permission+0x1f/0x30^M
 [<ffffffffa07843ec>] class_handle_ioctl+0x15fc/0x20d0 [obdclass]^M
 [<ffffffff811a5998>] ? do_filp_open+0x798/0xd20^M
 [<ffffffffa07682ab>] obd_class_ioctl+0x4b/0x190 [obdclass]^M
 [<ffffffff811a7f82>] vfs_ioctl+0x22/0xa0^M
 [<ffffffff811a8124>] do_vfs_ioctl+0x84/0x580^M
 [<ffffffff811a03f6>] ? final_putname+0x26/0x50^M
 [<ffffffff811a86a1>] sys_ioctl+0x81/0xa0^M
 [<ffffffff810e905e>] ? __audit_syscall_exit+0x25e/0x290^M
 [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b^M
INFO: task lctl:7532 blocked for more than 120 seconds.^M

At this point the systems were rebooted.
After reboot, the MDT could not be mounted, as a duplicate export was reported:

Aug 18 11:24:44 lola-10 kernel: LDISKFS-fs (dm-1): mounted filesystem with ordered data mode. quota=on. Opts:
Aug 18 11:24:45 lola-10 kernel: LustreError: 6045:0:(tgt_lastrcvd.c:1474:tgt_clients_data_init()) soaked-MDT0002: duplicate export for client generation 1
Aug 18 11:24:45 lola-10 kernel: LustreError: 6045:0:(obd_config.c:578:class_setup()) setup soaked-MDT0002 failed (-114)
Aug 18 11:24:45 lola-10 kernel: LustreError: 6045:0:(obd_config.c:1671:class_config_llog_handler()) MGC192.168.1.108@o2ib10: cfg command failed: rc = -114
Aug 18 11:24:45 lola-10 kernel: Lustre:    cmd=cf003 0:soaked-MDT0002  1:soaked-MDT0002_UUID  2:2  3:soaked-MDT0002-mdtlov  4:f
Aug 18 11:24:45 lola-10 kernel:
Aug 18 11:24:45 lola-10 kernel: LustreError: 15c-8: MGC192.168.1.108@o2ib10: The configuration from log 'soaked-MDT0002' failed (-114). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Aug 18 11:24:45 lola-10 kernel: LustreError: 5951:0:(obd_mount_server.c:1352:server_start_targets()) failed to start server soaked-MDT0002: -114
Aug 18 11:24:45 lola-10 kernel: LustreError: 5951:0:(obd_mount_server.c:1844:server_fill_super()) Unable to start targets: -114
Aug 18 11:24:45 lola-10 kernel: LustreError: 5951:0:(obd_config.c:625:class_cleanup()) Device 4 not setup
Aug 18 11:24:46 lola-10 kernel: Lustre: server umount soaked-MDT0002 complete
Aug 18 11:24:46 lola-10 kernel: LustreError: 5951:0:(obd_mount.c:1453:lustre_fill_super()) Unable to mount  (-114)
Comment by Peter Jones [ 19/Aug/16 ]

Hongchao

Could you please look into this issue?

Thanks

Peter

Comment by Hongchao Zhang [ 23/Aug/16 ]

I have looked at the logs, but I have not found useful information to help identify the cause of the problem.

The increased memory usage could be related to the continuous stream of lock replay requests being queued:

...
00010000:00080000:26.0:1465394173.485502:0:6068:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff88017e947050 x1536423085514416/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:334/0 lens 328/0 e 0 to 0 dl 1465394179 ref 2 fl Interpret:/40/ffffffff rc 0/-1
00010000:00080000:27.0:1465394173.485521:0:6067:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff880247be9c50 x1536423085514432/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:334/0 lens 328/0 e 0 to 0 dl 1465394179 ref 2 fl Interpret:/40/ffffffff rc 0/-1

...

00010000:00080000:26.0:1465394179.502412:0:6068:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff88016a6c2450 x1536423085514928/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:340/0 lens 328/0 e 0 to 0 dl 1465394185 ref 2 fl Interpret:/40/ffffffff rc 0/-1
00010000:00080000:27.0:1465394179.502417:0:6067:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff8802828f5050 x1536423085514944/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:340/0 lens 328/0 e 0 to 0 dl 1465394185 ref 2 fl Interpret:/40/ffffffff rc 0/-1

...

00010000:00080000:27.0:1465394185.503466:0:6067:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff880141b1e850 x1536423085515456/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:346/0 lens 328/0 e 0 to 0 dl 1465394191 ref 2 fl Interpret:/40/ffffffff rc 0/-1
00010000:00080000:26.0:1465394185.503491:0:6068:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff88031ca74050 x1536423085515472/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:346/0 lens 328/0 e 0 to 0 dl 1465394191 ref 2 fl Interpret:/40/ffffffff rc 0/-1

...

00010000:00080000:27.0:1465394191.504477:0:6067:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff88023309b450 x1536423085516016/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:352/0 lens 328/0 e 0 to 0 dl 1465394197 ref 2 fl Interpret:/40/ffffffff rc 0/-1
00010000:00080000:10.0:1465394191.504487:0:6068:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff88023309bc50 x1536423085516032/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:352/0 lens 328/0 e 0 to 0 dl 1465394197 ref 2 fl Interpret:/40/ffffffff rc 0/-1

...

00010000:00080000:10.0:1465394209.507404:0:6068:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff880247b59850 x1536423085517600/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:370/0 lens 328/0 e 0 to 0 dl 1465394215 ref 2 fl Interpret:/40/ffffffff rc 0/-1
00010000:00080000:11.0:1465394209.507410:0:6067:0:(ldlm_lib.c:2733:target_queue_recovery_request()) @@@ queue lock replay req  req@ffff88017e9e1c50 x1536423085517616/t0(0) o101->soaked-MDT0001-mdtlov_UUID@192.168.1.109@o2ib10:370/0 lens 328/0 e 0 to 0 dl 1465394215 ref 2 fl Interpret:/40/ffffffff rc 0/-1

The problem that the MDT can't be mounted (error -114, -EALREADY) is the same as the one described in LU-7794.

Are the stack traces of all processes available? They could help find where the recovery is stuck.
Thanks!

Comment by Frank Heckes (Inactive) [ 23/Aug/16 ]

Hi Hongchao,
there is a crash dump available, stored at lhn.hpdd.intel.com:/var/crashdumps/lu-7836/lola-11/127.0.0.1-2016-06-04-00:44:52 (see above). It's easy to access if you have an account on lola (DCO will create one within about 2 hours if you file a ticket containing your ssh public key). The crashdump tools and lustre.so are installed on the head node.

Comment by Di Wang [ 21/Sep/16 ]

Hmm, right now we do not fail the connection between MDTs, to make sure the FS will not be silently corrupted.

Jun  7 08:40:11 lola-11 kernel: Lustre: soaked-MDT0002: Client f6c679b2-f46e-20c5-89f7-51193ed93a53 (at 192.168.1.121@o2ib100) reconnecting, waiting for 20 clients in recovery for 1:52
Jun  7 08:40:11 lola-11 kernel: Lustre: Skipped 99 previous similar messages
Jun  7 08:40:18 lola-11 sshd[7507]: Accepted publickey for root from 10.4.0.116 port 47943 ssh2
Jun  7 08:40:18 lola-11 sshd[7507]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun  7 08:40:18 lola-11 sshd[7507]: Received disconnect from 10.4.0.116: 11: disconnected by user
Jun  7 08:40:18 lola-11 sshd[7507]: pam_unix(sshd:session): session closed for user root
Jun  7 08:40:33 lola-11 sshd[7530]: Accepted publickey for root from 10.4.0.116 port 47963 ssh2
Jun  7 08:40:33 lola-11 sshd[7530]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun  7 08:40:33 lola-11 sshd[7530]: Received disconnect from 10.4.0.116: 11: disconnected by user
Jun  7 08:40:33 lola-11 sshd[7530]: pam_unix(sshd:session): session closed for user root
Jun  7 08:40:47 lola-11 kernel: Lustre: soaked-MDT0003: recovery is timed out, evict stale exports
Jun  7 08:40:47 lola-11 kernel: Lustre: 6183:0:(ldlm_lib.c:2016:target_recovery_overseer()) soaked-MDT0003 recovery is aborted by hard timeout
Jun  7 08:40:47 lola-11 kernel: Lustre: 6183:0:(ldlm_lib.c:2026:target_recovery_overseer()) recovery is aborted, evict exports in recovery
Jun  7 08:40:47 lola-11 kernel: Lustre: soaked-MDT0003: disconnecting 1 stale clients

I just glanced at the log a bit, and it seems MDT0002 and MDT0003 were recovering at the same time. I thought the soak test would wait for MDT recovery to finish before failing another MDT, no? I might be missing something. Thanks.

Comment by Mikhail Pershin [ 30/Sep/16 ]

Frank, is the patch from http://review.whamcloud.com/#/c/13726/ applied in the build or not? If not, I'd try with it.

Comment by Di Wang [ 12/Oct/16 ]

It turns out one MDS thread is stuck here:

mdt00_016     S 0000000000000005     0  6217      2 0x00000080
 ffff880404c37760 0000000000000046 0000000000000000 ffff880404c37724
 0000000000000000 ffff88043fe82800 00000db7708e9e5b 0000000000000286
 ffff880404c37700 ffffffff81089e8c ffff8804066d9068 ffff880404c37fd8
Call Trace:
 [<ffffffff81089e8c>] ? lock_timer_base+0x3c/0x70
 [<ffffffff8153a9b2>] schedule_timeout+0x192/0x2e0
 [<ffffffff81089fa0>] ? process_timeout+0x0/0x10
 [<ffffffffa0b09821>] ptlrpc_set_wait+0x321/0x960 [ptlrpc]
 [<ffffffffa0afe980>] ? ptlrpc_interrupted_set+0x0/0x120 [ptlrpc]
 [<ffffffff81067650>] ? default_wake_function+0x0/0x20
 [<ffffffffa0b15d05>] ? lustre_msg_set_jobid+0xf5/0x130 [ptlrpc]
 [<ffffffffa0b09ee1>] ptlrpc_queue_wait+0x81/0x220 [ptlrpc]
 [<ffffffffa13d4bc2>] osp_remote_sync+0xf2/0x1e0 [osp]
 [<ffffffffa13ba821>] osp_xattr_get+0x681/0xf90 [osp]
 [<ffffffffa12eb5b5>] lod_xattr_get+0x185/0x760 [lod]
 [<ffffffffa134f917>] mdd_links_read+0x117/0x270 [mdd]
 [<ffffffffa1364996>] ? mdd_attr_set_internal+0xd6/0x2c0 [mdd]
 [<ffffffffa13515bc>] mdd_linkea_prepare+0x3ec/0x4d0 [mdd]
 [<ffffffffa13573f2>] mdd_link+0xde2/0x10c0 [mdd]
 [<ffffffffa1222a8f>] mdt_reint_link+0xb2f/0xce0 [mdt]
 [<ffffffff81299b7a>] ? strlcpy+0x4a/0x60
 [<ffffffffa12178cf>] ? ucred_set_jobid+0x5f/0x70 [mdt]
 [<ffffffffa121b04d>] mdt_reint_rec+0x5d/0x200 [mdt]
 [<ffffffffa1205d5b>] mdt_reint_internal+0x62b/0xa50 [mdt]
 [<ffffffffa120662b>] mdt_reint+0x6b/0x120 [mdt]
 [<ffffffffa0b790cc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
 [<ffffffffa0b25821>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
 [<ffffffff8106ee50>] ? pick_next_task_fair+0xd0/0x130
 [<ffffffff81539896>] ? schedule+0x176/0x3a0
 [<ffffffffa0b24af0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
 [<ffffffff810a138e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a12f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

Other threads are then waiting for the journal:

mdt_out01_008 D 000000000000000b     0  6281      2 0x00000080
 ffff8804030a7a40 0000000000000046 0000000000000000 ffff88079c63b5c0
 ffff8807fbe51ad8 ffff88082d91f400 00000bef4b83696a ffff880828bdba80
 ffff8804030a7a10 0000000100c3b96b ffff8803fbd31068 ffff8804030a7fd8
Call Trace:
 [<ffffffffa0fe7fca>] start_this_handle+0x25a/0x480 [jbd2]
 [<ffffffff811781fb>] ? cache_alloc_refill+0x15b/0x240
 [<ffffffff810a1820>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa0fe83d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
 [<ffffffffa1036a36>] ldiskfs_journal_start_sb+0x56/0xe0 [ldiskfs]
 [<ffffffffa10870f1>] osd_trans_start+0x1e1/0x430 [osd_ldiskfs]
 [<ffffffffa0b7c53c>] out_tx_end+0x9c/0x5d0 [ptlrpc]
 [<ffffffffa0b81ec9>] out_handle+0x11d9/0x18d0 [ptlrpc]
 [<ffffffff8105e9b6>] ? enqueue_task+0x66/0x80
 [<ffffffff8105ab8d>] ? check_preempt_curr+0x6d/0x90
 [<ffffffffa080bc5a>] ? lc_watchdog_touch+0x7a/0x190 [libcfs]
 [<ffffffff8153afce>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0b71eda>] ? req_can_reconstruct+0x6a/0x120 [ptlrpc]
 [<ffffffffa0b790cc>] tgt_request_handle+0x8ec/0x1440 [ptlrpc]
 [<ffffffffa0b25821>] ptlrpc_main+0xd31/0x1800 [ptlrpc]
 [<ffffffff8106ee50>] ? pick_next_task_fair+0xd0/0x130
 [<ffffffff81539896>] ? schedule+0x176/0x3a0
 [<ffffffffa0b24af0>] ? ptlrpc_main+0x0/0x1800 [ptlrpc]
 [<ffffffff810a138e>] kthread+0x9e/0xc0
 [<ffffffff8100c28a>] child_rip+0xa/0x20
 [<ffffffff810a12f0>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20

This in turn causes other MDTs to fail to reconnect to this MDT:

LustreError: 11-0: soaked-MDT0001-osp-MDT0003: operation mds_connect to node 192.168.1.109@o2ib10 failed: rc = -114
LustreError: 11-0: soaked-MDT0001-osp-MDT0003: operation mds_connect to node 192.168.1.109@o2ib10 failed: rc = -114
LustreError: 11-0: soaked-MDT0001-osp-MDT0003: operation mds_connect to node 192.168.1.109@o2ib10 failed: rc = -114

This causes the whole recovery to get stuck. I will cook a fix.
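
[Editor's sketch, for illustration only: the names below (remote_linkea_read, tx_start, and so on) are hypothetical stand-ins, not the real MDD/OSP API. The idea behind the fix (patch 23096, below) is that any step that may block on a remote RPC, such as the linkEA read served by osp_xattr_get(), must run before the journal handle is opened; a thread sleeping in ptlrpc_set_wait() with an open handle stalls every other thread that enters start_this_handle(), which is the hang shown in the traces above.]

/* Minimal, self-contained C illustration of the safe ordering. */
#include <stdio.h>

struct tx { int open; };                         /* stand-in for a journal handle */

static int  remote_linkea_read(void)             { puts("RPC: read linkEA"); return 0; }
static void tx_start(struct tx *t)               { t->open = 1; puts("journal handle opened"); }
static void apply_linkea_update(struct tx *t)    { (void)t; puts("linkEA updated inside tx"); }
static void tx_stop(struct tx *t)                { t->open = 0; puts("journal handle closed"); }

static int mdd_link_sketch(void)
{
	struct tx t = { 0 };
	int rc;

	rc = remote_linkea_read();       /* 1. may block on the network: do it first    */
	if (rc)
		return rc;

	tx_start(&t);                    /* 2. open the journal handle only afterwards  */
	apply_linkea_update(&t);         /* 3. purely local work inside the transaction */
	tx_stop(&t);
	return 0;
}

int main(void)
{
	return mdd_link_sketch();
}

[The broken ordering is the reverse: the transaction is started first and the blocking remote read happens inside it, which is the osp_remote_sync() call visible in the mdt00_016 stack trace above.]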

Comment by Gerrit Updater [ 12/Oct/16 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/23096
Subject: LU-8250 mdd: move linkea prepare out of transaction.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b67d62eefef30fbb3365da3382021820985b5e39

Comment by Andreas Dilger [ 12/Oct/16 ]

It would be useful to add a check for an open journal handle in the declare code and in the ptlrpc code to detect any other code paths like this. That can be done relatively easily with ldiskfs by checking current->journal_info in osd_trans_declare_op() and ptlrpc_set_wait().
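
[Editor's sketch of such a check, assuming the standard jbd2 behaviour that the running task's open journal handle is stored in current->journal_info; this is not the actual patch 23111.]

#include <linux/sched.h>        /* current, task_struct::journal_info */
#include <libcfs/libcfs.h>      /* CWARN */

/* A non-NULL journal_info inside osd_trans_declare_op() or ptlrpc_set_wait()
 * flags a code path that declares new operations or waits for an RPC while a
 * transaction is already open. */
static inline void check_open_journal_handle(const char *caller)
{
	if (unlikely(current->journal_info != NULL))
		CWARN("%s called with an open journal handle %p\n",
		      caller, current->journal_info);
}

/* e.g. call check_open_journal_handle(__func__) at the top of
 * osd_trans_declare_op() and ptlrpc_set_wait() */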

Comment by Gerrit Updater [ 12/Oct/16 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/23111
Subject: LU-8250 osd: add journal info check
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c8bc88c85d6f273b18f7741cc9c0cdc1bcfca4d5

Comment by Di Wang [ 13/Oct/16 ]

There is another panic with patch 23111. Since this ticket is a blocker for 2.9, I will create a new ticket (LU-8704) for that.

Comment by Joseph Gmitter (Inactive) [ 18/Oct/16 ]

http://review.whamcloud.com/23111 has been moved to LU-8704 per discussion with Di.

Comment by Gerrit Updater [ 26/Oct/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23096/
Subject: LU-8250 mdd: move linkea prepare out of transaction.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b596047cae1d3381cafae9c4132e1a84e99ca9d0

Comment by Peter Jones [ 26/Oct/16 ]

Landed for 2.9
