Opened 10 years ago
Closed 9 years ago
#169 closed defect (fixed)
Disk failure in the oceano pool on seal
Reported by: | antonio | Owned by: | fernando |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | Cluster | Keywords: | zfs, sata, sas expander, hba, reset storm |
Cc: | antonio, fernando, valva |
Description
Today seal raised a WARNING on the oceano pool. It activated a spare and recovered on its own, all within about 30 minutes.
    ...
      spare-3      DEGRADED     0     0     0
        c4t11d0    FAULTED      5     5     0  too many errors
        c8t24d0    ONLINE       0     0     0  740M resilvered
    ...
iostat shows errors on the devices attached to controller c4:
    admin@seal.macc.unican.es:~$ iostat -Cne
      ---- errors ---
      s/w h/w trn tot device
        0  67  85 152 c4
        0   4   8  12 c4t1d0
        0   4   0   4 c4t2d0
        0   4   8  12 c4t3d0
        0   4  18  22 c4t4d0
        0   4   2   6 c4t5d0
        0   4   2   6 c4t6d0
        0   4   2   6 c4t7d0
        0   4   3   7 c4t8d0
        0   4   1   5 c4t9d0
        0   4  23  27 c4t10d0
        0  20   5  25 c4t11d0
        0   7  13  20 c4t12d0
    ....
so the problem was triggered by a disk on c4.
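To make the per-disk comparison easier to repeat, here is a minimal Python sketch that parses the error columns of the `iostat -Cne` excerpt above and ranks the disks by hard and transport errors. The function names (`parse_errors`, `rank_disks`) are hypothetical helpers, not part of any tool, and the data is just the excerpt quoted in this ticket.

```python
import re

# Error columns from the `iostat -Cne` excerpt quoted in this ticket.
IOSTAT_OUTPUT = """\
  ---- errors ---
  s/w h/w trn tot device
    0  67  85 152 c4
    0   4   8  12 c4t1d0
    0   4   0   4 c4t2d0
    0   4   8  12 c4t3d0
    0   4  18  22 c4t4d0
    0   4   2   6 c4t5d0
    0   4   2   6 c4t6d0
    0   4   2   6 c4t7d0
    0   4   3   7 c4t8d0
    0   4   1   5 c4t9d0
    0   4  23  27 c4t10d0
    0  20   5  25 c4t11d0
    0   7  13  20 c4t12d0
"""

# Four integer counters followed by a device name.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_errors(text):
    """Return one dict per data row; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            sw, hw, trn, tot = (int(g) for g in m.groups()[:4])
            rows.append({"device": m.group(5), "sw": sw, "hw": hw,
                         "trn": trn, "tot": tot})
    return rows


def rank_disks(rows, key):
    """Sort individual disks (cXtYdZ) by one counter, highest first.

    Controller-level aggregate rows such as "c4" are left out.
    """
    disks = [r for r in rows if re.fullmatch(r"c\d+t\d+d\d+", r["device"])]
    return sorted(disks, key=lambda r: r[key], reverse=True)


rows = parse_errors(IOSTAT_OUTPUT)
for key in ("hw", "trn"):
    top = rank_disks(rows, key)[0]
    print(key, top["device"], top[key])
# hw c4t11d0 20
# trn c4t10d0 23
```

On this excerpt the disk with the most hard errors is c4t11d0, the one ZFS faulted, while the most transport errors are on c4t10d0. Every other disk on c4 also reports errors, which fits a problem on the shared path rather than on a single disk.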
Looking at /var/adm/messages, these entries appear (I have removed the interleaved sshd messages):
May 31 10:48:11 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:11 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:11 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:11 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:13 seal.macc.unican.es Log info 0x31123000 received for target 12. May 31 10:48:13 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:13 seal.macc.unican.es Log info 0x31123000 received for target 12. May 31 10:48:13 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:13 seal.macc.unican.es Log info 0x31123000 received for target 12. May 31 10:48:13 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:13 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:13 seal.macc.unican.es Log info 0x31123000 received for target 12. 
May 31 10:48:13 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:16 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:16 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:16 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:16 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:16 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:17 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:17 seal.macc.unican.es Log info 0x31111000 received for target 12. May 31 10:48:17 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:20 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:20 seal.macc.unican.es SAS Discovery Error on port 0. 
DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| May 31 10:48:22 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:22 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:22 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:22 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:27 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:27 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:27 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:27 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:27 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:28 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:28 seal.macc.unican.es Log info 0x31111000 received for target 12. May 31 10:48:28 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:31 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:31 seal.macc.unican.es SAS Discovery Error on port 0. 
DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| May 31 10:48:34 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:34 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:34 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:34 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:38 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:38 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:38 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:38 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:38 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:40 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:40 seal.macc.unican.es Log info 0x31111000 received for target 12. May 31 10:48:40 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:43 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:43 seal.macc.unican.es SAS Discovery Error on port 0. 
DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| May 31 10:48:45 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:45 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:45 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:45 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:49 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:49 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:49 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:49 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:49 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:48:51 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:51 seal.macc.unican.es Log info 0x31111000 received for target 12. May 31 10:48:51 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:48:54 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:54 seal.macc.unican.es SAS Discovery Error on port 0. 
DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| May 31 10:48:56 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:56 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:56 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:56 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31123000 May 31 10:48:59 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:48:59 seal.macc.unican.es Disconnected command timeout for Target 10 May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:49:01 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es Log info 0x31140000 received for target 10. 
May 31 10:49:01 seal.macc.unican.es scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:49:01 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000 May 31 10:49:01 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es passthrough command timeout May 31 10:49:01 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es Rev. 8 LSI, Inc. 1068E found. May 31 10:49:01 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:01 seal.macc.unican.es mpt2 supports power management. May 31 10:49:02 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:02 seal.macc.unican.es mpt2: IOC Operational. May 31 10:49:16 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:49:16 seal.macc.unican.es Can only start 1 task management command at a time May 31 10:50:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:50:16 seal.macc.unican.es Rev. 8 LSI, Inc. 1068E found. May 31 10:50:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:50:16 seal.macc.unican.es mpt2 supports power management. May 31 10:50:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:50:16 seal.macc.unican.es mpt2: IOC Operational. 
May 31 10:50:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:50:47 seal.macc.unican.es Rev. 8 LSI, Inc. 1068E found. May 31 10:50:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:50:47 seal.macc.unican.es mpt2 supports power management. May 31 10:50:50 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:50:50 seal.macc.unican.es mpt2: IOC Operational. May 31 10:51:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:51:16 seal.macc.unican.es Rev. 8 LSI, Inc. 1068E found. May 31 10:51:16 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:51:16 seal.macc.unican.es mpt2 supports power management. May 31 10:51:20 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:51:20 seal.macc.unican.es mpt2: IOC Operational. May 31 10:52:46 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:46 seal.macc.unican.es Disconnected command timeout for Target 11 May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:47 seal.macc.unican.es Log info 0x31140000 received for target 11. May 31 10:52:47 seal.macc.unican.es scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:47 seal.macc.unican.es Log info 0x31130000 received for target 11. 
May 31 10:52:47 seal.macc.unican.es scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:47 seal.macc.unican.es Log info 0x31130000 received for target 11. May 31 10:52:47 seal.macc.unican.es scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:47 seal.macc.unican.es Log info 0x31130000 received for target 11. May 31 10:52:47 seal.macc.unican.es scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc May 31 10:52:47 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:47 seal.macc.unican.es Log info 0x31130000 received for target 11. May 31 10:52:47 seal.macc.unican.es scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc May 31 10:52:51 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:51 seal.macc.unican.es mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:52:51 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:51 seal.macc.unican.es mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31111000 May 31 10:52:53 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:53 seal.macc.unican.es Log info 0x31111000 received for target 11. May 31 10:52:53 seal.macc.unican.es scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc May 31 10:52:56 seal.macc.unican.es scsi: [ID 243001 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:52:56 seal.macc.unican.es SAS Discovery Error on port 0. 
DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| May 31 10:53:37 seal.macc.unican.es scsi: [ID 107833 kern.warning] WARNING: /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:53:37 seal.macc.unican.es passthrough command timeout May 31 10:53:37 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:53:37 seal.macc.unican.es Rev. 8 LSI, Inc. 1068E found. May 31 10:53:37 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:53:37 seal.macc.unican.es mpt2 supports power management. May 31 10:53:37 seal.macc.unican.es scsi: [ID 365881 kern.info] /pci@7a,0/pci8086,3410@9/pci1000,3140@0 (mpt2): May 31 10:53:37 seal.macc.unican.es mpt2: IOC Operational. May 31 10:54:10 seal.macc.unican.es fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major May 31 10:54:10 seal.macc.unican.es EVENT-TIME: Thu May 31 10:54:09 CEST 2012 May 31 10:54:10 seal.macc.unican.es PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890, HOSTNAME: seal.macc.unican.es May 31 10:54:10 seal.macc.unican.es SOURCE: zfs-diagnosis, REV: 1.0 May 31 10:54:10 seal.macc.unican.es EVENT-ID: 5d33a13b-61e3-cf16-86a7-e9587d510170 May 31 10:54:10 seal.macc.unican.es DESC: The number of I/O errors associated with a ZFS device exceeded May 31 10:54:10 seal.macc.unican.es acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information. May 31 10:54:10 seal.macc.unican.es AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt May 31 10:54:10 seal.macc.unican.es will be made to activate a hot spare if available. May 31 10:54:10 seal.macc.unican.es IMPACT: Fault tolerance of the pool may be compromised. May 31 10:54:10 seal.macc.unican.es REC-ACTION: Run 'zpool status -x' and replace the bad device.
(I take it c4 is mpt2, right?)
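The IOCLogInfo values in the excerpt above can be split into bit fields to see which ones differ between messages. The sketch below assumes the field layout used by the Linux mptsas driver for SAS log info words (4-bit bus type, 4-bit originator, 8-bit code, 16-bit subcode). That layout is an assumption here, since the ticket does not document it, and the sketch does not try to interpret what each code means.

```python
from typing import NamedTuple


class LogInfo(NamedTuple):
    bus_type: int    # bits 31-28 (assumed: 0x3 means SAS)
    originator: int  # bits 27-24 (assumed: 0 = IOP, 1 = PL, 2 = IR)
    code: int        # bits 23-16
    subcode: int     # bits 15-0


def decode_loginfo(value: int) -> LogInfo:
    """Split a 32-bit IOCLogInfo word into its assumed bit fields."""
    return LogInfo(
        bus_type=(value >> 28) & 0xF,
        originator=(value >> 24) & 0xF,
        code=(value >> 16) & 0xFF,
        subcode=value & 0xFFFF,
    )


# The distinct IOCLogInfo values that appear in the log excerpt.
for raw in (0x31123000, 0x31111000, 0x31112000, 0x31140000, 0x31130000):
    info = decode_loginfo(raw)
    print(f"{raw:#010x}: bus_type={info.bus_type:#x} "
          f"originator={info.originator:#x} "
          f"code={info.code:#04x} subcode={info.subcode:#06x}")
```

Under that assumption all five values share the same bus type and originator, and only the code and subcode vary. The meaning of each code would have to be looked up in the controller vendor's documentation.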
Judging by iostat, c4t11d0 seems to be the first disk to fail, but the messages log starts with errors on target 12 (c4t12d0?), then an error on port 0 (a SAS disk?), then errors on target 10. After that the controller appears to be rediscovered (Rev. 8 LSI, Inc. 1068E found) more than once, target 11 finally fails, and only then does ZFS notice that something is wrong.
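The order of events described above can be checked mechanically. The following Python sketch scans flattened log text for the first message that mentions each target and for every "IOC Operational" line, which is taken here as a sign that the controller was reinitialised. The sample string contains a handful of entries copied from the excerpt (with the `scsi: [ID ...]` prefixes left out), and `summarize` is a hypothetical helper name.

```python
import re

# A few entries copied from the messages excerpt, flattened as in the ticket.
SAMPLE = (
    "May 31 10:48:13 seal.macc.unican.es Log info 0x31123000 received for target 12. "
    "May 31 10:48:59 seal.macc.unican.es Disconnected command timeout for Target 10 "
    "May 31 10:49:01 seal.macc.unican.es Log info 0x31140000 received for target 10. "
    "May 31 10:49:02 seal.macc.unican.es mpt2: IOC Operational. "
    "May 31 10:52:46 seal.macc.unican.es Disconnected command timeout for Target 11 "
    "May 31 10:52:47 seal.macc.unican.es Log info 0x31140000 received for target 11. "
    "May 31 10:53:37 seal.macc.unican.es mpt2: IOC Operational. "
)

# Syslog timestamp and hostname, followed by one of three message types.
STAMP = r"(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) \S+ "
EVENT = re.compile(
    STAMP + r"(?:"
    r"Log info 0x[0-9a-f]+ received for target (?P<log_target>\d+)\."
    r"|Disconnected command timeout for Target (?P<timeout_target>\d+)"
    r"|mpt\d+: IOC Operational\."
    r")"
)


def summarize(text):
    """Return ({target: first timestamp seen}, [IOC Operational timestamps])."""
    first_seen = {}
    resets = []
    for m in EVENT.finditer(text):
        target = m.group("log_target") or m.group("timeout_target")
        if target is None:
            # Neither target group matched, so this is an IOC Operational line.
            resets.append(m.group("ts"))
        else:
            # Keep only the earliest mention of each target.
            first_seen.setdefault(int(target), m.group("ts"))
    return first_seen, resets


first_seen, resets = summarize(SAMPLE)
for target, ts in first_seen.items():
    print(f"target {target}: first seen {ts}")
print(f"IOC Operational lines: {len(resets)}")
```

Because the pattern anchors on the timestamp rather than on line breaks, the same function should also work on the full excerpt even though its lines have been run together.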
Yesterday I sent a message to the zfs-discuss list, because it seems that whenever one disk fails, the other disks on the same backplane start having problems as well: http://mail.opensolaris.org/pipermail/zfs-discuss/2012-May/051600.html
So this is a well-known problem when SATA disks are connected to SAS expanders: a failure of a single SATA disk can bring down the communication channel between the expander and the HBA.
Change History (2)
comment:1 Changed 10 years ago by fernando
comment:2 Changed 9 years ago by fernando
- Resolution set to fixed
- Status changed from new to closed
I was in the middle of this whole mess.
The failure on c4 began, unsurprisingly, while smartmontools was being run against the whole system.
And then what you described happened.
Since I was not sure that the c4t11 disk had really failed (its S.M.A.R.T. parameters look correct), here is a capture from the first moment:
I put c4t11 back in, and that is why everything seems to have happened so quickly:
I suspect there is a problem with c4t1.