Opened 10 years ago

Closed 9 years ago

#169 closed defect (fixed)

Disk failure in the oceano pool on seal

Reported by: antonio Owned by: fernando
Priority: critical Milestone:
Component: Cluster Keywords: zfs, sata, sas expander, hba, reset storm
Cc: antonio, fernando, valva

Description

Today seal raised a WARNING on the oceano pool, activated a hot spare and recovered on its own, all within about 30 minutes.

...
spare-3                      DEGRADED     0     0     0       
  c4t11d0                    FAULTED      5     5     0  too many errors       
  c8t24d0                    ONLINE       0     0     0  740M resilvered
...

iostat shows errors on the devices attached to controller c4:

admin@seal.macc.unican.es:~$ iostat -Cne
  ---- errors ---
  s/w h/w trn tot device
    0  67  85 152 c4
    0   4   8  12 c4t1d0
    0   4   0   4 c4t2d0
    0   4   8  12 c4t3d0
    0   4  18  22 c4t4d0
    0   4   2   6 c4t5d0
    0   4   2   6 c4t6d0
    0   4   2   6 c4t7d0
    0   4   3   7 c4t8d0
    0   4   1   5 c4t9d0
    0   4  23  27 c4t10d0
    0  20   5  25 c4t11d0
    0   7  13  20 c4t12d0
....

so the problem was triggered by a disk on c4.
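To make the comparison between disks easier, the error table can be ranked by counter. The following is only a minimal sketch: the helper names `parse_iostat_errors` and `rank_disks` are hypothetical, and the sample is a subset of the rows pasted above. Note that ranking by hardware errors puts c4t11d0 first, while ranking by total errors puts c4t10d0 first.

```python
import re

# Subset of the `iostat -Cne` output from this ticket.
SAMPLE = """\
  ---- errors ---
  s/w h/w trn tot device
    0  67  85 152 c4
    0   4   8  12 c4t1d0
    0   4  18  22 c4t4d0
    0   4  23  27 c4t10d0
    0  20   5  25 c4t11d0
    0   7  13  20 c4t12d0
"""

# Matches disk names such as c4t11d0, but not the controller row "c4".
DISK_RE = re.compile(r"^c\d+t\d+d\d+$")


def parse_iostat_errors(text):
    """Return one dict per data row (controller rows included)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have four integer counters followed by a device name.
        if len(parts) != 5 or not all(p.isdigit() for p in parts[:4]):
            continue
        soft, hard, transport, total = map(int, parts[:4])
        rows.append({"device": parts[4], "soft": soft, "hard": hard,
                     "transport": transport, "total": total})
    return rows


def rank_disks(rows, key="hard"):
    """Sort disk rows by the chosen counter, worst first."""
    disks = [r for r in rows if DISK_RE.match(r["device"])]
    return sorted(disks, key=lambda r: r[key], reverse=True)


if __name__ == "__main__":
    rows = parse_iostat_errors(SAMPLE)
    for key in ("hard", "transport", "total"):
        worst = rank_disks(rows, key)[0]
        print(key, worst["device"], worst[key])
```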

The messages log shows the following entries (I have removed the intervening sshd messages):

May 31 10:48:11 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:11 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:11 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:11 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:13 seal.macc.unican.es     Log info 0x31123000 received for target 12.
May 31 10:48:13 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:13 seal.macc.unican.es     Log info 0x31123000 received for target 12.
May 31 10:48:13 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:13 seal.macc.unican.es     Log info 0x31123000 received for target 12.
May 31 10:48:13 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:13 seal.macc.unican.es     Log info 0x31123000 received for target 12.
May 31 10:48:13 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:16 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:16 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:16 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:16 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:17 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:17 seal.macc.unican.es     Log info 0x31111000 received for target 12.
May 31 10:48:17 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:20 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:20 seal.macc.unican.es     SAS Discovery Error on port 0. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
May 31 10:48:22 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:22 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:22 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:22 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:27 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:27 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:27 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:27 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:28 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:28 seal.macc.unican.es     Log info 0x31111000 received for target 12.
May 31 10:48:28 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:31 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:31 seal.macc.unican.es     SAS Discovery Error on port 0. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
May 31 10:48:34 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:34 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:34 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:34 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:38 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:38 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:38 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:38 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:40 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:40 seal.macc.unican.es     Log info 0x31111000 received for target 12.
May 31 10:48:40 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:43 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:43 seal.macc.unican.es     SAS Discovery Error on port 0. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
May 31 10:48:45 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:45 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:45 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:45 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:49 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:49 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:49 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:49 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:48:51 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:51 seal.macc.unican.es     Log info 0x31111000 received for target 12.
May 31 10:48:51 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:48:54 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:54 seal.macc.unican.es     SAS Discovery Error on port 0. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
May 31 10:48:56 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:56 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:56 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:56 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000
May 31 10:48:59 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:48:59 seal.macc.unican.es     Disconnected command timeout for Target 10
May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:49:01 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     Log info 0x31140000 received for target 10.
May 31 10:49:01 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
May 31 10:49:01 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     passthrough command timeout
May 31 10:49:01 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     Rev. 8 LSI, Inc. 1068E found.
May 31 10:49:01 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:01 seal.macc.unican.es     mpt2 supports power management.
May 31 10:49:02 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:02 seal.macc.unican.es     mpt2: IOC Operational.
May 31 10:49:16 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:49:16 seal.macc.unican.es     Can only start 1 task management command at a time
May 31 10:50:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:50:16 seal.macc.unican.es     Rev. 8 LSI, Inc. 1068E found.
May 31 10:50:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:50:16 seal.macc.unican.es     mpt2 supports power management.
May 31 10:50:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:50:16 seal.macc.unican.es     mpt2: IOC Operational.
May 31 10:50:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:50:47 seal.macc.unican.es     Rev. 8 LSI, Inc. 1068E found.
May 31 10:50:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:50:47 seal.macc.unican.es     mpt2 supports power management.
May 31 10:50:50 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:50:50 seal.macc.unican.es     mpt2: IOC Operational.
May 31 10:51:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:51:16 seal.macc.unican.es     Rev. 8 LSI, Inc. 1068E found.
May 31 10:51:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:51:16 seal.macc.unican.es     mpt2 supports power management.
May 31 10:51:20 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:51:20 seal.macc.unican.es     mpt2: IOC Operational.
May 31 10:52:46 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:46 seal.macc.unican.es     Disconnected command timeout for Target 11
May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:47 seal.macc.unican.es     Log info 0x31140000 received for target 11.
May 31 10:52:47 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:47 seal.macc.unican.es     Log info 0x31130000 received for target 11.
May 31 10:52:47 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:47 seal.macc.unican.es     Log info 0x31130000 received for target 11.
May 31 10:52:47 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:47 seal.macc.unican.es     Log info 0x31130000 received for target 11.
May 31 10:52:47 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:47 seal.macc.unican.es     Log info 0x31130000 received for target 11.
May 31 10:52:47 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
May 31 10:52:51 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:51 seal.macc.unican.es     mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:52:51 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:51 seal.macc.unican.es     mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000
May 31 10:52:53 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:53 seal.macc.unican.es     Log info 0x31111000 received for target 11.
May 31 10:52:53 seal.macc.unican.es     scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
May 31 10:52:56 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:52:56 seal.macc.unican.es     SAS Discovery Error on port 0. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found|
May 31 10:53:37 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:53:37 seal.macc.unican.es     passthrough command timeout
May 31 10:53:37 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:53:37 seal.macc.unican.es     Rev. 8 LSI, Inc. 1068E found.
May 31 10:53:37 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:53:37 seal.macc.unican.es     mpt2 supports power management.
May 31 10:53:37 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2):
May 31 10:53:37 seal.macc.unican.es     mpt2: IOC Operational.
May 31 10:54:10 seal.macc.unican.es fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
May 31 10:54:10 seal.macc.unican.es EVENT-TIME: Thu May 31 10:54:09 CEST 2012
May 31 10:54:10 seal.macc.unican.es PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890, HOSTNAME: seal.macc.unican.es
May 31 10:54:10 seal.macc.unican.es SOURCE: zfs-diagnosis, REV: 1.0
May 31 10:54:10 seal.macc.unican.es EVENT-ID: 5d33a13b-61e3-cf16-86a7-e9587d510170
May 31 10:54:10 seal.macc.unican.es DESC: The number of I/O errors associated with a ZFS device exceeded
May 31 10:54:10 seal.macc.unican.es          acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
May 31 10:54:10 seal.macc.unican.es AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
May 31 10:54:10 seal.macc.unican.es          will be made to activate a hot spare if available.
May 31 10:54:10 seal.macc.unican.es IMPACT: Fault tolerance of the pool may be compromised.
May 31 10:54:10 seal.macc.unican.es REC-ACTION: Run 'zpool status -x' and replace the bad device.

(I take it c4 is mpt2, right?)

Judging by iostat, c4t11d0 appears to be the first disk to fail, but the messages log starts with errors on target 12 (c4t12d0?), then an error on port 0 (SAS disk?), then errors on target 10. After that the controller appears to be rediscovered (Rev. 8 LSI, Inc. 1068E found) more than once, target 11 finally fails, and only then does ZFS notice that something is wrong.
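The order in which targets show up in the log can be checked mechanically. This is a rough sketch, not a general mpt log parser: the function name `summarize` is hypothetical, the sample lines are copied from the log above, and treating each "Rev. 8 LSI, Inc. 1068E found." line as a controller re-initialisation is this ticket's interpretation rather than documented driver behaviour.

```python
import re
from collections import Counter

# A few continuation lines copied from the messages log in this ticket.
SAMPLE = """\
May 31 10:48:13 seal.macc.unican.es     Log info 0x31123000 received for target 12.
May 31 10:48:17 seal.macc.unican.es     Log info 0x31111000 received for target 12.
May 31 10:48:59 seal.macc.unican.es     Disconnected command timeout for Target 10
May 31 10:49:01 seal.macc.unican.es     Log info 0x31140000 received for target 10.
May 31 10:49:01 seal.macc.unican.es     Rev. 8 LSI, Inc. 1068E found.
May 31 10:52:46 seal.macc.unican.es     Disconnected command timeout for Target 11
May 31 10:52:47 seal.macc.unican.es     Log info 0x31130000 received for target 11.
"""

# Timestamp, hostname, then any message that names a target number.
TARGET_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s+\S+\s+.*?[Tt]arget (\d+)")
# Timestamp, hostname, then the controller banner line.
REINIT_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s+\S+\s+Rev\. \d+ LSI")


def summarize(text):
    """Return (first_seen, counts, reinits) for the given log text."""
    first_seen = {}   # target -> time of its first message, in log order
    counts = Counter()  # target -> number of messages naming it
    reinits = []      # times at which the controller banner appears
    for line in text.splitlines():
        m = TARGET_RE.match(line)
        if m:
            when, target = m.group(1), int(m.group(2))
            first_seen.setdefault(target, when)
            counts[target] += 1
            continue
        m = REINIT_RE.match(line)
        if m:
            reinits.append(m.group(1))
    return first_seen, counts, reinits


if __name__ == "__main__":
    first_seen, counts, reinits = summarize(SAMPLE)
    for target, when in first_seen.items():
        print(f"target {target}: first seen {when}, {counts[target]} messages")
    print("controller banner at:", ", ".join(reinits))
```

On the sample, targets appear in the order 12, 10, 11, which matches the sequence described above.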

Yesterday I sent a message to the zfs-discuss list, because it seems that whenever one disk fails, the other disks on the same backplane start having problems: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-May/051600.html

So this is a well-known problem when SATA disks are connected through SAS expanders: the failure of a single SATA disk can bring down the communication channel between the expander and the HBA.

Change History (2)

comment:1 Changed 10 years ago by fernando

I was in the middle of this whole mess.
The failure on c4 started, of course, when smartmontools was run against the whole system.
And then what you described happened.

Since I was not sure that disk c4t11 had really failed (its S.M.A.R.T. parameters look fine), here is a capture from the very first moment:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    2.3    0.0   60.7    0.0  3.0  7.0 1321.7 3082.1   1 265 c4
    0.0    0.0    1.1    0.0  0.0  2.0    0.1 120007.9   0 100 c4t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t2d0
    1.1    0.0   30.8    0.0  0.0  0.0    0.0    4.6   0   0 c4t3d0
    1.1    0.0   28.9    0.0  0.0  0.0    0.0    7.9   0   0 c4t4d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t5d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t6d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t7d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t8d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t9d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0 239995.7   0 100 c4t10d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t11d0
    0.0    0.0    0.0    0.0  3.0  1.0 179744.6 58331.8  88  64 c4t12d0
   14.2    0.0  455.0    0.0  0.0  0.2    0.0   14.5   0   5 c5


I put c4t11 back into the pool, which is why everything seemed to happen so quickly:

 zpool online oceano c4t11d0
 pfexec zpool clear oceano c4t11d0
 zpool detach oceano c8t24d0 **this is the hot spare disk that had kicked in**

I suspect there is a problem with c4t1.
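The extended statistics captured above can be filtered for devices whose service times have blown up. This is only a sketch under assumptions: the function name `stalled_devices` is hypothetical and the 1000 ms threshold is an arbitrary choice, not a recommended value. The sample rows are copied from the capture; `wsvc_t` and `asvc_t` are the average wait and active service times reported by iostat.

```python
# Header plus a few rows copied from the iostat capture in this comment.
SAMPLE = """\
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    1.1    0.0  0.0  2.0    0.1 120007.9   0 100 c4t1d0
    1.1    0.0   30.8    0.0  0.0  0.0    0.0    4.6   0   0 c4t3d0
    0.0    0.0    0.0    0.0  0.0  4.0    0.0 239995.7   0 100 c4t10d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t11d0
    0.0    0.0    0.0    0.0  3.0  1.0 179744.6 58331.8  88  64 c4t12d0
"""


def stalled_devices(text, min_service_ms=1000.0):
    """Return (device, asvc_t) pairs whose wait or active service time
    is at least min_service_ms (threshold is an assumption)."""
    stalled = []
    for line in text.splitlines():
        parts = line.split()
        # Ten numeric columns followed by the device name.
        if len(parts) != 11:
            continue
        try:
            values = [float(p) for p in parts[:10]]
        except ValueError:
            continue  # header row, not data
        wsvc_t, asvc_t = values[6], values[7]
        if max(wsvc_t, asvc_t) >= min_service_ms:
            stalled.append((parts[10], asvc_t))
    return stalled


if __name__ == "__main__":
    for device, asvc_t in stalled_devices(SAMPLE):
        print(f"{device}: asvc_t={asvc_t}")
```

On these rows the filter picks out c4t1d0, c4t10d0 and c4t12d0, while c4t11d0 shows no activity at all in this capture.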

comment:2 Changed 9 years ago by fernando

  • Resolution set to fixed
  • Status changed from new to closed