I had a drive fail in one of my vdevs, setting the state of the zpool to degraded:
root@mediaserver:~# zpool status
  pool: zfsraid
 state: DEGRADED
So I started taking steps to replace the drive.
This is the id of the old disk: wwn-0x5000cca225f459d5
This is the id of the replacement disk: wwn-0x5000c5006e38bc61
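(For reference, a quick way to confirm which /dev/sdX node each id maps to, once both drives are known to the system, is something like:
ls -l /dev/disk/by-id/ | grep -E '5000cca225f459d5|5000c5006e38bc61'
That's just a sketch using the wwn suffixes above; the by-id symlinks point at the underlying device nodes.)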
1) Offline the old disk:
zpool offline zfsraid wwn-0x5000cca225f459d5
2) Physically replace old disk with new disk
3) Issue replace command:
zpool replace -o ashift=12 zfsraid wwn-0x5000cca225f459d5 wwn-0x5000c5006e38bc61
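(Side note: ashift=12 matches the 4096-byte physical sectors reported by fdisk further down. If you want to confirm what ashift the existing vdevs were created with, one option is to grep the cached pool config, e.g.:
zdb -C zfsraid | grep ashift
Treat that as a sketch; zdb output varies a bit between ZoL versions.)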
The replace command fails with:
root@mediaserver:~# zpool replace -o ashift=12 zfsraid wwn-0x5000cca225f459d5 wwn-0x5000c5006e38bc61
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/wwn-0x5000c5006e38bc61 does not contain an EFI label but it may contain partition
information in the MBR.
I can't seem to find any information to help. A few forums said to use the -f option, but that seems sketchy. There are no partitions listed on the new drive:
root@mediaserver:~# fdisk -l /dev/disk/by-id/wwn-0x5000c5006e38bc61
Disk /dev/disk/by-id/wwn-0x5000c5006e38bc61: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders, total 5860533168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000
Disk /dev/disk/by-id/wwn-0x5000c5006e38bc61 doesn't contain a valid partition table
root@mediaserver:~#
Do I have to run some command to wipe the new drive?
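(If wiping does turn out to be necessary, the usual candidates would be something like wipefs -a on the whole device, or zeroing the first few MB with dd, e.g.:
dd if=/dev/zero of=/dev/disk/by-id/wwn-0x5000c5006e38bc61 bs=1M count=10
I haven't run either yet; that dd line is only a sketch, and it's obviously destructive to anything on the target disk.)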
These are the last few lines in dmesg relating to the drive:
[420274.400024] scsi 11:0:8:0: Direct-Access ATA ST3000DM001-1CH1 CC29 PQ: 0 ANSI: 6
[420274.400036] scsi 11:0:8:0: SATA: handle(0x000f), sas_addr(0x4433221107000000), phy(7), device_name(0x0000000000000000)
[420274.400039] scsi 11:0:8:0: SATA: enclosure_logical_id(0x5000000080000000), slot(4)
[420274.400130] scsi 11:0:8:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[420274.400134] scsi 11:0:8:0: qdepth(32), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
[420274.400502] sd 11:0:8:0: Attached scsi generic sg17 type 0
[420274.401375] sd 11:0:8:0: [sdr] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[420274.401377] sd 11:0:8:0: [sdr] 4096-byte physical blocks
[420274.475163] sd 11:0:8:0: [sdr] Write Protect is off
[420274.475166] sd 11:0:8:0: [sdr] Mode Sense: 7f 00 10 08
[420274.475966] sd 11:0:8:0: [sdr] Write cache: enabled, read cache: enabled, supports DPO and FUA
[420274.554649] sdr: unknown partition table
[420274.646245] sd 11:0:8:0: [sdr] Attached SCSI disk
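(dmesg shows the new drive attaching as sdr; to double-check that sdr really is the disk behind the wwn id, something like
readlink -f /dev/disk/by-id/wwn-0x5000c5006e38bc61
ought to print /dev/sdr — again, just a sanity-check sketch.)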
This is the version of Ubuntu I'm running:
Ubuntu 12.04.3 LTS \n \l
root@mediaserver:~# uname -a
Linux mediaserver 3.5.0-44-generic #67~precise1-Ubuntu SMP Wed Nov 13 16:16:57 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
@Matt, here's more detail.
TL;DR:
To make the new drive usable to replace the failed one, run parted on it and enter:
mklabel GPT
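(Non-interactive equivalent, if you'd rather script it — a sketch assuming the same by-id path used below:
parted -s /dev/disk/by-id/scsi-SATA_<new_device_id> mklabel gpt
)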
Extended Dance Remix Version:
I had this exact issue and resolved it tonight. I'm using Debian Squeeze (6.0.10) with zfs on linux (0.6.0-1) and 3 x 1TB drives.
root@host:~# zpool status
pool: dead_pool
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: resilvered 6.09G in 3h10m with 0 errors on Tue Sep 1 11:15:24 2015
config:
        NAME                           STATE     READ WRITE CKSUM
        dead_pool                      DEGRADED     0     0     0
          raidz1-0                     DEGRADED     0     0     0
            scsi-SATA_<orig_device_1>  ONLINE       0     0     0
            scsi-SATA_<orig_device_2>  ONLINE       0     0     0
            scsi-SATA_<orig_device_3>  FAULTED      0     3     0  too many errors
Yikes. I went out this afternoon and bought a new drive of the same size (different make/model), powered off and installed it alongside the three existing zfs drives. Power up again, and I watched the flood of I/O errors on the old drive as the system booted. Terrifying stuff.
To replace the old with the new in zfs:
New disk device: /dev/disk/by-id/scsi-SATA_<new_device_id>
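(The long number used below is the guid of the faulted vdev. If zpool status doesn't show it for you, one way to dig it out is to read the ZFS label off the old device — a sketch, assuming the whole-disk layout ZoL creates with a -part1 partition:
zdb -l /dev/disk/by-id/scsi-SATA_<orig_device_3>-part1 | grep -w guid
)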
root@host:~# zpool offline dead_pool 1784233895253655477
root@host:~# zpool replace dead_pool 1784233895253655477 /dev/disk/by-id/scsi-SATA_<new_device_id>
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/scsi-SATA_<new_device_id> does not contain an EFI label but it may contain partition
information in the MBR.
This is where @Matt's question comes into play. Use parted to set up a GPT label (thanks systutorials.com):
root@host:~# parted /dev/disk/by-id/scsi-SATA_<new_device_id>
GNU Parted 2.3
Using /dev/sde
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel GPT
(parted) q
Information: You may need to update /etc/fstab.
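(If you want to double-check the label before retrying, parted's print command should now show a "Partition Table: gpt" line, e.g.:
parted /dev/disk/by-id/scsi-SATA_<new_device_id> print
)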
Try the replace again:
root@host:~# zpool replace dead_pool 1784233895253655477 /dev/disk/by-id/scsi-SATA_<new_device_id>
root@host:~#
Great, it returned successfully. Now check zpool again:
root@host:~# zpool status
pool: dead_pool
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Sep 3 22:31:25 2015
23.8G scanned out of 690G at 19.7M/s, 9h35m to go
7.93G resilvered, 3.45% done
config:
        NAME                             STATE     READ WRITE CKSUM
        dead_pool                        DEGRADED     0     0     0
          raidz1-0                       DEGRADED     0     0     0
            scsi-SATA_<orig_device_1>    ONLINE       0     0     0
            scsi-SATA_<orig_device_2>    ONLINE       0     0     0
            replacing-2                  OFFLINE      0     0     0
              scsi-SATA_<orig_device_3>  OFFLINE      0     0     0
              scsi-SATA_<new_device_id>  ONLINE       0     0     0  (resilvering)
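(To keep an eye on the resilver, something like this works:
watch -n 60 zpool status dead_pool
Once it completes, ZFS detaches the old device from the replacing-2 vdev on its own and the pool should return to ONLINE.)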
Hope this helps.