Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.4.1, Lustre 2.5.0
    • Affects Version/s: Lustre 2.4.0
    • Labels: None
    • 3
    • 7398

    Description

      Running conf-sanity.sh in a loop, I hit this crash:

      [ 6625.425296] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 6625.425755] last sysfs file: /sys/devices/system/cpu/possible
      [ 6625.426029] CPU 1 
      [ 6625.426068] Modules linked in: ptlrpc(-) obdclass lvfs ksocklnd lnet libcfs exportfs jbd sha512_generic sha256_generic ext4 mbcache jbd2 virtio_balloon virtio_console i2c_piix4 i2c_core virtio_blk virtio_net virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod nfs lockd fscache nfs_acl auth_rpcgss sunrpc be2iscsi bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: fld]
      [ 6625.429023] 
      [ 6625.429231] Pid: 4991, comm: ll_ping Not tainted 2.6.32-debug #6 Bochs Bochs
      [ 6625.429247] RIP: 0010:[<ffffffff8104d2e6>]  [<ffffffff8104d2e6>] __wake_up_common+0x56/0x90
      [ 6625.429247] RSP: 0018:ffff88008f9e5de0  EFLAGS: 00010082
      [ 6625.429247] RAX: ffffffffa0d963fa RBX: ffff880098e26f68 RCX: 0000000000000000
      [ 6625.429247] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffffffa0d963fa
      [ 6625.429247] RBP: ffff88008f9e5e20 R08: 0000000000000000 R09: 000000000000005c
      [ 6625.429247] R10: 0000000000000001 R11: 0000000000000000 R12: c284e8fffb9889d0
      [ 6625.429247] R13: 00000000ffffb502 R14: 0000000000000000 R15: 0000000000000000
      [ 6625.429247] FS:  00007f7026720700(0000) GS:ffff880006240000(0000) knlGS:0000000000000000
      [ 6625.429247] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6625.429247] CR2: 000000000089ee20 CR3: 00000000905c0000 CR4: 00000000000006e0
      [ 6625.429247] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6625.429247] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [ 6625.429247] Process ll_ping (pid: 4991, threadinfo ffff88008f9e4000, task ffff8800af3ea5c0)
      [ 6625.429247] Stack:
      [ 6625.429247]  ffffffffffffffff 0000000300000001 0000000000000001 ffff880098e26f68
      [ 6625.429247] <d> 0000000000000282 0000000000000003 0000000000000001 0000000000000000
      [ 6625.429247] <d> ffff88008f9e5e60 ffffffff81051f68 ffff88008f9e5e40 0000000000000001
      [ 6625.429247] Call Trace:
      [ 6625.429247]  [<ffffffff81051f68>] __wake_up+0x48/0x70
      [ 6625.429247]  [<ffffffffa0acf7fa>] cfs_waitq_signal+0x1a/0x20 [libcfs]
      [ 6625.429247]  [<ffffffffa0d5f73f>] ptlrpc_pinger_main+0x5cf/0x7f0 [ptlrpc]
      [ 6625.429247]  [<ffffffff81057d60>] ? default_wake_function+0x0/0x20
      [ 6625.429247]  [<ffffffffa0d5f170>] ? ptlrpc_pinger_main+0x0/0x7f0 [ptlrpc]
      [ 6625.429247]  [<ffffffff8100c14a>] child_rip+0xa/0x20
      [ 6625.429247]  [<ffffffffa0d5f170>] ? ptlrpc_pinger_main+0x0/0x7f0 [ptlrpc]
      [ 6625.429247]  [<ffffffffa0d5f170>] ? ptlrpc_pinger_main+0x0/0x7f0 [ptlrpc]
      [ 6625.429247]  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      [ 6625.429247] Code: e8 18 48 39 c7 4c 8b 60 18 74 3d 49 83 ec 18 eb 0b 0f 1f 40 00 4c 89 e0 4c 8d 62 e8 44 8b 28 4c 89 f1 44 89 fa 8b 75 cc 48 89 c7 <ff> 50 10 85 c0 74 0c 41 83 e5 01 74 06 83 6d c8 01 74 0a 4c 39 
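
      The faulting RIP is in __wake_up_common, reached via cfs_waitq_signal from ptlrpc_pinger_main, and the registers (e.g. the non-canonical value in R12) suggest the waitqueue list was corrupted or an entry on it was freed while still linked. As a hedged illustration only, the sketch below is a hypothetical userspace model of the wait-queue walk (simplified; none of these names are the kernel's) showing why freeing a waiter without unlinking it first makes the next wake-up call through garbage:

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical userspace model of the walk __wake_up_common performs:
       * each entry carries a wake callback (like wait_queue_t::func), and
       * the waker invokes it for every entry still on the list.  If a
       * waiter frees its entry without unlinking it first, the next
       * wake-up walks freed memory and calls through a stale function
       * pointer -- the kind of corruption the GPF registers above suggest. */

      struct wait_entry {
          int (*func)(struct wait_entry *e);  /* wake callback */
          struct wait_entry *next;            /* singly linked for brevity */
      };

      struct wait_queue_head {
          struct wait_entry *head;
      };

      static int default_wake(struct wait_entry *e)
      {
          (void)e;
          return 1;                           /* "one task woken" */
      }

      static void add_waiter(struct wait_queue_head *q, struct wait_entry *e)
      {
          e->next = q->head;
          q->head = e;
      }

      static void remove_waiter(struct wait_queue_head *q, struct wait_entry *e)
      {
          struct wait_entry **pp;

          for (pp = &q->head; *pp; pp = &(*pp)->next) {
              if (*pp == e) {
                  *pp = e->next;
                  return;
              }
          }
      }

      /* Walk the list, invoke every callback, return how many were woken.
       * It dereferences e->func, so a freed-but-still-linked entry is fatal. */
      static int wake_up_all(struct wait_queue_head *q)
      {
          struct wait_entry *e;
          int woken = 0;

          for (e = q->head; e; e = e->next)
              woken += e->func(e);
          return woken;
      }

      /* Safe lifecycle: unlink before free, so a later wake-up sees nothing. */
      int demo(void)
      {
          struct wait_queue_head q = { NULL };
          struct wait_entry *e = malloc(sizeof(*e));
          int n;

          e->func = default_wake;
          add_waiter(&q, e);
          n = wake_up_all(&q);                /* wakes the one waiter */

          remove_waiter(&q, e);               /* unlink FIRST...          */
          free(e);                            /* ...then free             */
          n += wake_up_all(&q);               /* empty list: no-op        */
          return n;
      }

      int main(void)
      {
          assert(demo() == 1);
          return 0;
      }
      ```

      Inverting that teardown order (free without unlink) leaves a dangling entry for the next wake-up, which matches a pinger-side race where a waiter goes away while the pinger thread still holds a reference to its waitqueue entry.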
      

      The test output had this:

      == conf-sanity test 22: start a client before osts (should return errs) == 03:20:55 (1364282455)
      start mds service on centos6-14.localnet
      Starting mds1:   -o loop /tmp/lustre-mdt1 /mnt/mds1
      Started lustre-MDT0000
      Client mount with ost in logs, but none running
      start ost1 service on centos6-14.localnet
      Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
      Started lustre-OST0000
      error: list_param: /proc/{fs,sys}/{lnet,lustre}/osc/lustre-OST0000-osc-MDT0000/ost_server_uuid: Found no match
      error: list_param: /proc/{fs,sys}/{lnet,lustre}/osc/lustre-OST0000-osc-MDT0000/ost_server_uuid: Found no match
      Stopping client centos6-14.localnet /mnt/lustre (opts:)
      PASS 
      Client mount with a running ost
      start ost1 service on centos6-14.localnet
      Starting ost1:   -o loop /tmp/lustre-ost1 /mnt/ost1
      Started lustre-OST0000
      mount lustre on /mnt/lustre.....
      Starting client: centos6-14.localnet: -o user_xattr,flock centos6-14.localnet@tcp:/lustre /mnt/lustre
      centos6-14.localnet: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 14 sec
      centos6-14.localnet: osc.lustre-OST0000-osc-MDT0000.ost_server_uuid in FULL state after 0 sec
      setup single mount lustre success
      PASS 
      umount lustre on /mnt/lustre.....
      Stopping client centos6-14.localnet /mnt/lustre (opts:)
      stop ost1 service on centos6-14.localnet
      Stopping /mnt/ost1 (opts:-f) on centos6-14.localnet
      waited 0 for 10 ST ost OSS OSS_uuid 0
      

      Crashdump and modules are in /exports/crashdumps/192.168.10.224-2013-03-26-03\:21\:55/
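
      For reference, the "run in a loop" reproduction can be scripted with a small helper along these lines; run_until_fail is a hypothetical wrapper, not part of the Lustre test framework, and the conf-sanity.sh path is site-specific:

      ```shell
      # Hypothetical helper for the "run conf-sanity.sh in a loop" reproduction:
      # repeats a command until it fails and prints how many clean passes it got.
      run_until_fail() {
          i=0
          while "$@"; do
              i=$((i + 1))
              echo "iteration $i passed" >&2
          done
          echo "$i"
      }

      # From lustre/tests on the test node (path and env are site-specific):
      #   run_until_fail sh conf-sanity.sh
      ```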

    People

        Assignee: dmiter Dmitry Eremin (Inactive)
        Reporter: green Oleg Drokin
        Votes: 0
        Watchers: 3
