Opened 10 years ago
Closed 9 years ago
#166 closed defect (fixed)
Errors in the oceano pool
Reported by: | antonio | Owned by: | fernando |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | Cluster | Keywords: | |
Cc: | antonio |
Description
On Friday the 25th I started a scrub of the oceano pool, and on Sunday (the 27th, at 22:00), when the scrub was close to finishing, the pool started to fail.
A disk on controller c5 must have failed, since it was the disks attached to that controller that began failing one after another, leaving the pool unusable.
The spares kicked in, but the system kept failing because it continued trying to access the faulty device, which made the disks on that controller fail in a chain.
I identified the failing disk as
c5t5000CCA369C52EE4d0
since it was the one with the most errors according to the iostat command.
I have taken it offline so that the system stops using it. Care is needed, because after a reboot the disk comes back online. How can we make the pool discard it and not include it when recovering?
The system has also brought a second spare into use, although it is not needed. How do we remove that spare from the resilver?
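The ticket does not record which iostat invocation was used. As a minimal sketch, assuming the per-device error summary that `iostat -En` prints on Solaris-family systems (lines of the form `<device> Soft Errors: N Hard Errors: N Transport Errors: N`), the helper below reads that text on standard input and prints the device with the highest combined error count. The function name `worst_disk` is hypothetical and not part of any system tooling.

```shell
#!/bin/sh
# worst_disk: hypothetical helper, not part of Solaris or ZFS.
# Reads `iostat -En`-style text on stdin and prints the device whose
# soft + hard + transport error counters sum highest, followed by that sum.
worst_disk() {
    awk '
        / Soft Errors: / {
            total = 0
            # Every "Errors:" label is followed by its counter value.
            for (i = 2; i < NF; i++) {
                if ($i == "Errors:") total += $(i + 1)
            }
            if (best == "" || total > max) { max = total; best = $1 }
        }
        END { if (best != "") print best, max }
    '
}

# Assumed usage on the live system:
#   iostat -En | worst_disk
```

A high count only points at a suspect; when a bad disk disturbs a shared controller, its neighbours can accumulate transport errors as well, so the result is worth cross-checking against the system logs.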
admin@seal.macc.unican.es:~$ zpool status
  pool: oceano
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h1m, 0.02% done, 107h40m to go
config:

        NAME  STATE  READ  WRITE  CKSUM
        oceano  DEGRADED  0  0  0
          raidz2-0  ONLINE  0  0  0
            c5t5000CCA369C5A416d0  ONLINE  0  0  0
            c5t5000CCA369C5A420d0  ONLINE  0  0  0
            c5t5000CCA369C5A432d0  ONLINE  0  0  0
            c10t5000CCA369C505D5d0  ONLINE  0  0  0
            c10t5000CCA369C506AFd0  ONLINE  0  0  0
            c10t5000CCA369C506BBd0  ONLINE  0  0  0
            c5t5000CCA369C5C19Ad0  ONLINE  0  0  0
            c10t5000CCA369C508C9d0  ONLINE  0  0  0
            c5t5000CCA369C52E05d0  ONLINE  0  0  0
            c10t5000CCA369C508E0d0  ONLINE  0  0  0
            c10t5000CCA369C50609d0  ONLINE  0  0  0
          raidz2-1  ONLINE  0  0  0
            c4t5d0  ONLINE  0  0  0
            c4t6d0  ONLINE  0  0  0
            c4t7d0  ONLINE  0  0  0
            c8t10d0  ONLINE  0  0  0
            c8t11d0  ONLINE  0  0  0
            c8t12d0  ONLINE  0  0  0
            c8t13d0  ONLINE  0  0  0
            c8t14d0  ONLINE  0  0  0
            c8t15d0  ONLINE  0  0  0
            c8t16d0  ONLINE  0  0  0
            c8t17d0  ONLINE  0  0  0
          raidz2-2  ONLINE  0  0  0
            c4t8d0  ONLINE  0  0  0
            c4t9d0  ONLINE  0  0  0
            c4t10d0  ONLINE  0  0  0
            c4t11d0  ONLINE  0  0  0
            c4t2d0  ONLINE  0  0  0
            c8t18d0  ONLINE  0  0  0
            c8t19d0  ONLINE  0  0  0
            c8t20d0  ONLINE  0  0  0
            c8t21d0  ONLINE  0  0  0
            c8t22d0  ONLINE  0  0  0
            c8t23d0  ONLINE  0  0  0
          raidz2-3  ONLINE  0  0  0
            c5t5000CCA369C5A41Dd0  ONLINE  0  0  0
            c10t5000CCA369C4E90Bd0  ONLINE  0  0  0
            c5t5000CCA369C5A42Dd0  ONLINE  0  0  0
            c10t5000CCA369C4F888d0  ONLINE  0  0  0
            c5t5000CCA369C5A374d0  ONLINE  0  0  0
            c10t5000CCA369C50F1Fd0  ONLINE  0  0  0
            c5t5000CCA369C5A407d0  ONLINE  0  0  0
            c10t5000CCA369C224D1d0  ONLINE  0  0  0
            c5t5000CCA369C5A409d0  ONLINE  0  0  0
            c10t5000CCA369C504D1d0  ONLINE  0  0  0
            c5t5000CCA369C59954d0  ONLINE  0  0  0
          raidz2-4  DEGRADED  0  0  0
            spare-0  DEGRADED  0  0  0
              replacing-0  DEGRADED  0  0  0
                c5t5000CCA369C52EE4d0  OFFLINE  0  0  0
                c5t5000CCA369C55766d0  ONLINE  0  0  0  30.9M resilvered
              c8t24d0  ONLINE  0  0  0  31.0M resilvered
            c10t5000CCA369C508E5d0  ONLINE  0  0  0
            c5t5000CCA369C54C04d0  ONLINE  0  0  0
            c10t5000CCA369C508ECd0  ONLINE  0  0  0
            c5t5000CCA369C554CAd0  ONLINE  0  0  0
            c10t5000CCA369C509D4d0  ONLINE  0  0  0
            c5t5000CCA369C598A7d0  ONLINE  0  0  0
            c10t5000CCA369C509ECd0  ONLINE  0  0  0
            spare-8  ONLINE  0  0  0
              c5t5000CCA369C599ACd0  ONLINE  0  0  0
              c10t5000CCA369C50370d0  ONLINE  0  0  0  30.9M resilvered
            c10t5000CCA369C5026Ed0  ONLINE  0  0  0
            c10t5000CCA369C50679d0  ONLINE  0  0  0
          raidz2-5  ONLINE  0  0  0
            c5t5000CCA369C58224d0  ONLINE  0  0  0
            c10t5000CCA369C5084Bd0  ONLINE  0  0  0
            c5t5000CCA369C5190Dd0  ONLINE  0  0  0
            c10t5000CCA369C50680d0  ONLINE  0  0  0
            c10t5000CCA369C5177Bd0  ONLINE  0  0  0
            c5t5000CCA369C59907d0  ONLINE  0  0  0
            c10t5000CCA369C5178Fd0  ONLINE  0  0  0
            c5t5000CCA369C59910d0  ONLINE  0  0  0
            c10t5000CCA369C47080d0  ONLINE  0  0  0
          raidz2-6  ONLINE  0  0  0
            c4t4d0  ONLINE  0  0  0
            c4t12d0  ONLINE  0  0  0
            c8t7d0  ONLINE  0  0  0
            c8t8d0  ONLINE  0  0  0
            c8t9d0  ONLINE  0  0  0
            c8t1d0  ONLINE  0  0  0
            c8t3d0  ONLINE  0  0  0
            c4t3d0  ONLINE  0  0  0
            c8t4d0  ONLINE  0  0  0
        logs
          /dev/zvol/dsk/rpool/oceanolog  ONLINE  0  0  0
        cache
          c8t5d0  ONLINE  0  0  0
          c8t6d0  ONLINE  0  0  0
        spares
          c8t24d0  INUSE  currently in use
          c10t5000CCA369C50370d0  INUSE  currently in use
Change History (4)
comment:1 Changed 10 years ago by antonio
comment:2 Changed 10 years ago by fernando
"I have taken it offline so that the system stops using it. Care is needed, because after a reboot the disk comes back online. How can we make the pool discard it and not include it when recovering?"
Curiously, staying offline is what it should already do; for the disk to have come back online at reboot, the -t option would have had to be given:
zpool offline [-t] pool device ...
    Takes the specified physical device offline. While the device is offline, no attempt is made to read or write to the device. This command is not applicable to spares or cache devices.
    -t  Temporary. Upon reboot, the specified physical device reverts to its previous state.
"The system has also brought a second spare into use, although it is not needed. How do we remove that spare from the resilver?"
One option that works is the following. The hot spare goes back to being a spare disk, although the resilver may start over.
zpool detach pool "hot spare"
But it did not let me:
root@seal.macc.unican.es:~# zpool detach oceano c10t5000CCA369C50370d0
cannot detach c10t5000CCA369C50370d0: no valid replicas
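In this ticket the detach was refused while the resilver was still running, and a detach went through once it had completed. Whether the running resilver was the actual cause of the "no valid replicas" error is not established here, but a cautious approach is to check the resilver state before attempting a detach. A minimal sketch, where the function name `resilver_running` is hypothetical:

```shell
#!/bin/sh
# resilver_running: hypothetical helper, not part of ZFS.
# Reads `zpool status` text on stdin and exits 0 if the scrub/scan line
# reports a resilver in progress, non-zero otherwise.
resilver_running() {
    grep -q 'resilver in progress'
}

# Assumed usage, with the pool and device names from this ticket:
#   if zpool status oceano | resilver_running; then
#       echo "resilver still running, not detaching yet"
#   else
#       pfexec zpool detach oceano c8t24d0
#   fi
```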
comment:3 in reply to: ↑ description Changed 10 years ago by antonio
The whole resilver has now finished:
admin@seal.macc.unican.es:~$ zpool status
  pool: oceano
 state: ONLINE
 scrub: resilver completed after 38h48m with 0 errors on Thu May 31 04:49:33 2012
config:

        NAME  STATE  READ  WRITE  CKSUM
        oceano  ONLINE  0  0  0
          raidz2-0  ONLINE  0  0  0
            c5t5000CCA369C5A416d0  ONLINE  0  0  0  76K resilvered
            c5t5000CCA369C5A420d0  ONLINE  0  0  0  360K resilvered
            c5t5000CCA369C5A432d0  ONLINE  0  0  0  354K resilvered
            c10t5000CCA369C505D5d0  ONLINE  0  0  0
            c10t5000CCA369C506AFd0  ONLINE  0  0  0
            c10t5000CCA369C506BBd0  ONLINE  0  0  0
            c5t5000CCA369C5C19Ad0  ONLINE  0  0  0  81.5K resilvered
            c10t5000CCA369C508C9d0  ONLINE  0  0  0
            c5t5000CCA369C52E05d0  ONLINE  0  0  0  8K resilvered
            c10t5000CCA369C508E0d0  ONLINE  0  0  0
            c10t5000CCA369C50609d0  ONLINE  0  0  0
          raidz2-1  ONLINE  0  0  0
            c4t5d0  ONLINE  0  0  0
            c4t6d0  ONLINE  0  0  0
            c4t7d0  ONLINE  0  0  0
            c8t10d0  ONLINE  0  0  0
            c8t11d0  ONLINE  0  0  0
            c8t12d0  ONLINE  0  0  0
            c8t13d0  ONLINE  0  0  0
            c8t14d0  ONLINE  0  0  0
            c8t15d0  ONLINE  0  0  0
            c8t16d0  ONLINE  0  0  0
            c8t17d0  ONLINE  0  0  0
          raidz2-2  ONLINE  0  0  0
            c4t8d0  ONLINE  0  0  0
            c4t9d0  ONLINE  0  0  0
            c4t10d0  ONLINE  0  0  0
            c4t11d0  ONLINE  0  0  0
            c4t2d0  ONLINE  0  0  0
            c8t18d0  ONLINE  0  0  0
            c8t19d0  ONLINE  0  0  0
            c8t20d0  ONLINE  0  0  0
            c8t21d0  ONLINE  0  0  0
            c8t22d0  ONLINE  0  0  0
            c8t23d0  ONLINE  0  0  0
          raidz2-3  ONLINE  0  0  0
            c5t5000CCA369C5A41Dd0  ONLINE  0  0  0  335K resilvered
            c10t5000CCA369C4E90Bd0  ONLINE  0  0  0
            c5t5000CCA369C5A42Dd0  ONLINE  0  0  0  5.50K resilvered
            c10t5000CCA369C4F888d0  ONLINE  0  0  0
            c5t5000CCA369C5A374d0  ONLINE  0  0  0  7K resilvered
            c10t5000CCA369C50F1Fd0  ONLINE  0  0  0
            c5t5000CCA369C5A407d0  ONLINE  0  0  0  6.50K resilvered
            c10t5000CCA369C224D1d0  ONLINE  0  0  0
            c5t5000CCA369C5A409d0  ONLINE  0  0  0  5.50K resilvered
            c10t5000CCA369C504D1d0  ONLINE  0  0  0
            c5t5000CCA369C59954d0  ONLINE  0  0  0  354K resilvered
          raidz2-4  ONLINE  0  0  0
            spare-0  ONLINE  0  0  0
              c5t5000CCA369C55766d0  ONLINE  0  0  0  1.78T resilvered
              c8t24d0  ONLINE  0  0  0  1.78T resilvered
            c10t5000CCA369C508E5d0  ONLINE  0  0  0
            c5t5000CCA369C54C04d0  ONLINE  0  0  0
            c10t5000CCA369C508ECd0  ONLINE  0  0  0
            c5t5000CCA369C554CAd0  ONLINE  0  0  0  91K resilvered
            c10t5000CCA369C509D4d0  ONLINE  0  0  0
            c5t5000CCA369C598A7d0  ONLINE  0  0  0  94K resilvered
            c10t5000CCA369C509ECd0  ONLINE  0  0  0
            c5t5000CCA369C599ACd0  ONLINE  0  0  0  105K resilvered
            c10t5000CCA369C5026Ed0  ONLINE  0  0  0
            c10t5000CCA369C50679d0  ONLINE  0  0  0
          raidz2-5  ONLINE  0  0  0
            c5t5000CCA369C58224d0  ONLINE  0  0  0  11K resilvered
            c10t5000CCA369C5084Bd0  ONLINE  0  0  0
            c5t5000CCA369C5190Dd0  ONLINE  0  0  0  7.50K resilvered
            c10t5000CCA369C50680d0  ONLINE  0  0  0
            c10t5000CCA369C5177Bd0  ONLINE  0  0  0
            c5t5000CCA369C59907d0  ONLINE  0  0  0  9.50K resilvered
            c10t5000CCA369C5178Fd0  ONLINE  0  0  0
            c5t5000CCA369C59910d0  ONLINE  0  0  0  9.50K resilvered
            c10t5000CCA369C47080d0  ONLINE  0  0  0
          raidz2-6  ONLINE  0  0  0
            c4t4d0  ONLINE  0  0  0
            c4t12d0  ONLINE  0  0  0
            c8t7d0  ONLINE  0  0  0
            c8t8d0  ONLINE  0  0  0
            c8t9d0  ONLINE  0  0  0
            c8t1d0  ONLINE  0  0  0
            c8t3d0  ONLINE  0  0  0
            c4t3d0  ONLINE  0  0  0
            c8t4d0  ONLINE  0  0  0
        logs
          /dev/zvol/dsk/rpool/oceanolog  ONLINE  0  0  0
        cache
          c8t5d0  ONLINE  0  0  0
          c8t6d0  ONLINE  0  0  0
          c10t5000CCA369C51558d0  ONLINE  0  0  0
        spares
          c8t24d0  INUSE  currently in use
          c10t5000CCA369C50370d0  AVAIL
          c5t5000CCA369D347CEd0  AVAIL

errors: No known data errors
One of the spares was released on its own; for the other one, the system did let me detach it:
admin@seal.macc.unican.es:~$ pfexec zpool detach oceano c8t24d0
admin@seal.macc.unican.es:~$ zpool status
  pool: oceano
 state: ONLINE
 scrub: resilver completed after 38h48m with 0 errors on Thu May 31 04:49:33 2012
config:

        NAME  STATE  READ  WRITE  CKSUM
        oceano  ONLINE  0  0  0
          raidz2-0  ONLINE  0  0  0
            c5t5000CCA369C5A416d0  ONLINE  0  0  0  76K resilvered
            c5t5000CCA369C5A420d0  ONLINE  0  0  0  360K resilvered
            c5t5000CCA369C5A432d0  ONLINE  0  0  0  354K resilvered
            c10t5000CCA369C505D5d0  ONLINE  0  0  0
            c10t5000CCA369C506AFd0  ONLINE  0  0  0
            c10t5000CCA369C506BBd0  ONLINE  0  0  0
            c5t5000CCA369C5C19Ad0  ONLINE  0  0  0  81.5K resilvered
            c10t5000CCA369C508C9d0  ONLINE  0  0  0
            c5t5000CCA369C52E05d0  ONLINE  0  0  0  8K resilvered
            c10t5000CCA369C508E0d0  ONLINE  0  0  0
            c10t5000CCA369C50609d0  ONLINE  0  0  0
          raidz2-1  ONLINE  0  0  0
            c4t5d0  ONLINE  0  0  0
            c4t6d0  ONLINE  0  0  0
            c4t7d0  ONLINE  0  0  0
            c8t10d0  ONLINE  0  0  0
            c8t11d0  ONLINE  0  0  0
            c8t12d0  ONLINE  0  0  0
            c8t13d0  ONLINE  0  0  0
            c8t14d0  ONLINE  0  0  0
            c8t15d0  ONLINE  0  0  0
            c8t16d0  ONLINE  0  0  0
            c8t17d0  ONLINE  0  0  0
          raidz2-2  ONLINE  0  0  0
            c4t8d0  ONLINE  0  0  0
            c4t9d0  ONLINE  0  0  0
            c4t10d0  ONLINE  0  0  0
            c4t11d0  ONLINE  0  0  0
            c4t2d0  ONLINE  0  0  0
            c8t18d0  ONLINE  0  0  0
            c8t19d0  ONLINE  0  0  0
            c8t20d0  ONLINE  0  0  0
            c8t21d0  ONLINE  0  0  0
            c8t22d0  ONLINE  0  0  0
            c8t23d0  ONLINE  0  0  0
          raidz2-3  ONLINE  0  0  0
            c5t5000CCA369C5A41Dd0  ONLINE  0  0  0  335K resilvered
            c10t5000CCA369C4E90Bd0  ONLINE  0  0  0
            c5t5000CCA369C5A42Dd0  ONLINE  0  0  0  5.50K resilvered
            c10t5000CCA369C4F888d0  ONLINE  0  0  0
            c5t5000CCA369C5A374d0  ONLINE  0  0  0  7K resilvered
            c10t5000CCA369C50F1Fd0  ONLINE  0  0  0
            c5t5000CCA369C5A407d0  ONLINE  0  0  0  6.50K resilvered
            c10t5000CCA369C224D1d0  ONLINE  0  0  0
            c5t5000CCA369C5A409d0  ONLINE  0  0  0  5.50K resilvered
            c10t5000CCA369C504D1d0  ONLINE  0  0  0
            c5t5000CCA369C59954d0  ONLINE  0  0  0  354K resilvered
          raidz2-4  ONLINE  0  0  0
            c5t5000CCA369C55766d0  ONLINE  0  0  0  1.78T resilvered
            c10t5000CCA369C508E5d0  ONLINE  0  0  0
            c5t5000CCA369C54C04d0  ONLINE  0  0  0
            c10t5000CCA369C508ECd0  ONLINE  0  0  0
            c5t5000CCA369C554CAd0  ONLINE  0  0  0  91K resilvered
            c10t5000CCA369C509D4d0  ONLINE  0  0  0
            c5t5000CCA369C598A7d0  ONLINE  0  0  0  94K resilvered
            c10t5000CCA369C509ECd0  ONLINE  0  0  0
            c5t5000CCA369C599ACd0  ONLINE  0  0  0  105K resilvered
            c10t5000CCA369C5026Ed0  ONLINE  0  0  0
            c10t5000CCA369C50679d0  ONLINE  0  0  0
          raidz2-5  ONLINE  0  0  0
            c5t5000CCA369C58224d0  ONLINE  0  0  0  11K resilvered
            c10t5000CCA369C5084Bd0  ONLINE  0  0  0
            c5t5000CCA369C5190Dd0  ONLINE  0  0  0  7.50K resilvered
            c10t5000CCA369C50680d0  ONLINE  0  0  0
            c10t5000CCA369C5177Bd0  ONLINE  0  0  0
            c5t5000CCA369C59907d0  ONLINE  0  0  0  9.50K resilvered
            c10t5000CCA369C5178Fd0  ONLINE  0  0  0
            c5t5000CCA369C59910d0  ONLINE  0  0  0  9.50K resilvered
            c10t5000CCA369C47080d0  ONLINE  0  0  0
          raidz2-6  ONLINE  0  0  0
            c4t4d0  ONLINE  0  0  0
            c4t12d0  ONLINE  0  0  0
            c8t7d0  ONLINE  0  0  0
            c8t8d0  ONLINE  0  0  0
            c8t9d0  ONLINE  0  0  0
            c8t1d0  ONLINE  0  0  0
            c8t3d0  ONLINE  0  0  0
            c4t3d0  ONLINE  0  0  0
            c8t4d0  ONLINE  0  0  0
        logs
          /dev/zvol/dsk/rpool/oceanolog  ONLINE  0  0  0
        cache
          c8t5d0  ONLINE  0  0  0
          c8t6d0  ONLINE  0  0  0
          c10t5000CCA369C51558d0  ONLINE  0  0  0
        spares
          c8t24d0  AVAIL
          c10t5000CCA369C50370d0  AVAIL
          c5t5000CCA369D347CEd0  AVAIL

errors: No known data errors
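To spot spares that are still attached after a resilver, the `spares` section of `zpool status` can be filtered for entries marked `INUSE`. The sketch below assumes the usual one-device-per-line layout that the command prints on a terminal; the function name `spares_in_use` is hypothetical.

```shell
#!/bin/sh
# spares_in_use: hypothetical helper, not part of ZFS.
# Reads `zpool status` text on stdin and prints the name of every device
# in the "spares" section whose state column is INUSE.
spares_in_use() {
    awk '
        $1 == "spares" && NF == 1 { in_spares = 1; next }   # section header
        in_spares && NF == 0        { in_spares = 0 }       # blank line ends it
        in_spares && $1 == "errors:" { in_spares = 0 }      # so does the footer
        in_spares && $2 == "INUSE"  { print $1 }
    '
}

# Assumed usage, with the pool name from this ticket:
#   zpool status oceano | spares_in_use
```

Each device it prints is a candidate for `zpool detach` once the replaced disk has finished resilvering, as was done with c8t24d0 in this ticket.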
comment:4 Changed 9 years ago by fernando
- Resolution set to fixed
- Status changed from new to closed
The system seems to be recovering, albeit very slowly.