Using generic drives in a NetApp FAS250

By thomas, 15 May, 2009
We use a retired NetApp as a SnapMirror backup. We no longer have support on the device, and one of its disks failed. I pulled the drive and discovered it was a Seagate ST3146807FC. I found this site, which said it was possible to flash the firmware on a generic drive so it can be used in a Network Appliance filer.

Since our FAS250 does not have a disk shelf like the one vardomskiy uses in his example, I had to find another way to connect the SCA40 drive to a host controller.

I found an Emulex LightPulse 850 (LP850) PCI card with a DB9 connector on eBay for $10. I found this adapter to convert the SCA40 connector on the drive to a standard copper Fibre Channel (DB9) connector. I then needed to buy a NetApp shelf-to-shelf cable to connect the adapter to the LP850. The final part of the setup is a copper Fibre Channel terminator (I'm still not sure whether it is a resistor or a straight loopback; I need to take it apart).

Now, my choice of the LP850 presented more than a few problems. The LP850 is not supported by the current lpfc driver, so I had to download an older version of the driver here. That driver was written for the 2.2 and 2.4 kernel series, so I had to install an ancient version of the operating system to get the card to work.

[root@fcal root]# tar xf lpfc-i386.tar
[root@fcal root]# cd SourceBuild/
[root@fcal SourceBuild]# make
Build Environment root: /lib/modules/2.4.20-8/build
cc -D__GENKSYMS__ -D__KERNEL__=1 -D__SMP__=1 -DMODULE -DMODVERSIONS -include /lib/modules/2.4.20-8/build/include/linux/modversions.h -I./include -I/lib/modules/2.4.20-8/build/drivers/scsi -I/lib/modules/2.4.20-8/build/include/scsi -I/lib/modules/2.4.20-8/build/include -DLP6000 -D_LINUX -I./include -I/lib/modules/2.4.20-8/build/drivers/scsi -I/lib/modules/2.4.20-8/build/include/scsi -I/lib/modules/2.4.20-8/build/include -E fcLINUXfcp.c > lpfc.ver1
In file included from fcLINUXfcp.c:164:
/lib/modules/2.4.20-8/build/include/linux/module.h:15:1: warning: "_set_ver" redefined
In file included from /lib/modules/2.4.20-8/build/include/linux/modversions.h:4,
                 from <command line>:1:
/lib/modules/2.4.20-8/build/include/linux/modsetver.h:9:1: warning: this is the location of the previous definition
cat lpfc.ver1 | /sbin/genksyms -k 2.2.5 > lpfc.ver
cc -Wall -O2 -fomit-frame-pointer -D__KERNEL__=1 -D__SMP__=1 -DMODULE -DMODVERSIONS -include /lib/modules/2.4.20-8/build/include/linux/modversions.h -I./include -I/lib/modules/2.4.20-8/build/drivers/scsi -I/lib/modules/2.4.20-8/build/include/scsi -I/lib/modules/2.4.20-8/build/include -DLP6000 -D_LINUX -I./include -I/lib/modules/2.4.20-8/build/drivers/scsi -I/lib/modules/2.4.20-8/build/include/scsi -I/lib/modules/2.4.20-8/build/include -c fcLINUXfcp.c
fcLINUXfcp.c: In function `lpfc_do_dpc':
fcLINUXfcp.c:1715: structure has no member named `sigmask_lock'
make: *** [build] Error 1

I'm not sure what is going on here, but there is indeed no member called sigmask_lock in this kernel's task_struct. The lines in fcLINUXfcp.c that trigger the error take that spinlock with interrupts disabled (spin_lock_irqsave) around flush_signals(). Since this machine will only be used to relabel a drive, I didn't think it was a big deal to comment them out:

--- SourceBuild/fcLINUXfcp.c.uphill	2009-05-14 18:01:31.000000000 -0400
+++ SourceBuild/fcLINUXfcp.c	2009-05-14 18:01:50.000000000 -0400
@@ -1712,9 +1712,9 @@
       if( signal_pending(current) ) {
 
          iflg = 0;
-         spin_lock_irqsave(&current->sigmask_lock, iflg);
+         //spin_lock_irqsave(&current->sigmask_lock, iflg);
          flush_signals(current);
-         spin_unlock_irqrestore(&current->sigmask_lock, iflg);
+         //spin_unlock_irqrestore(&current->sigmask_lock, iflg);
 
          /* Only allow our driver unload to kill the KP */
          if( ldp->dpc_notify != NULL )

After making this change (you can apply the above as a patch), the code compiles cleanly:

[root@fcal SourceBuild]# make
...
cp lpfcdriver lpfcdriver.o
ld -r -o lpfcdd.2.4.20-8.o lpfcdriver.o fcLINUXfcp.o lpfc.conf.o
ld -r -o lpfndd.2.4.20-8.o fcLINUXlan.o

The resulting driver is lpfcdd.2.4.20-8.o. Inserting it produced an error that I didn't bother fixing (lazy, sorry; if anyone knows how to fix it, let me know):

[root@fcal SourceBuild]# insmod lpfcdd.2.4.20-8.o
lpfcdd.2.4.20-8.o: The module you are trying to load (lpfcdd.2.4.20-8.o) is compiled with a gcc version 2 compiler, while the kernel you are running is compiled with a gcc version 3 compiler. This is known to not work.

The "fix" I opted for was to force-load the module with insmod -f:

[root@fcal SourceBuild]# insmod -f lpfcdd.2.4.20-8.o
Warning: The module you are trying to load (lpfcdd.2.4.20-8.o) is compiled with a gcc version 2 compiler, while the kernel you are running is compiled with a gcc version 3 compiler. This is known to not work.
Warning: loading lpfcdd.2.4.20-8.o will taint the kernel: no license
  See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Warning: loading lpfcdd.2.4.20-8.o will taint the kernel: forced load
Module lpfcdd.2.4.20-8 loaded, with warnings
[root@fcal SourceBuild]# tail /var/log/messages
May 14 18:07:41 fcal kernel: Emulex LightPulse FC SCSI/IP 4.20p
May 14 18:07:42 fcal kernel: !lpfc0:031:Link Up Event received Data: 1 1 1 2
May 14 18:07:45 fcal kernel: scsi1 : Emulex LPFC (LP850) SCSI on PCI bus 01 device 40 irq 3
May 14 18:07:45 fcal kernel: Vendor: SEAGATE Model: ST3146807FC Rev: 0006
May 14 18:07:45 fcal kernel: Type: Direct-Access ANSI SCSI revision: 03
[root@fcal SourceBuild]#

Now that the drive is recognized, we can continue with vardomskiy's method using fwdl.

Using sysconfig -v on the filer, I verified that NetApp names the ST3146807FC X274_SCHT6146F10. Our filer runs Data ONTAP 7.3.1, and the latest firmware for that drive is NA16, so the /etc/disk_fw directory contains X274_SCHT6146F10.NA16.LOD.
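The firmware file name appears to follow a simple pattern: the NetApp product ID reported by sysconfig -v, then the firmware revision, then .LOD. The C sketch below builds and splits names of that shape. The helper names (disk_fw_path, parse_fw_name) are hypothetical, and the pattern is inferred from this one example rather than from NetApp documentation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/etc/disk_fw/<product>.<rev>.LOD", assuming
 * the naming seen in the text (X274_SCHT6146F10.NA16.LOD).
 * Returns 0 on success, -1 if the buffer is too small. */
int disk_fw_path(char *buf, size_t len, const char *product, const char *rev)
{
    assert(buf != NULL && product != NULL && rev != NULL);
    int n = snprintf(buf, len, "/etc/disk_fw/%s.%s.LOD", product, rev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: split "<product>.<rev>.LOD" back into its parts.
 * Returns 0 on success, -1 if the name has a different shape or a field
 * does not fit in the caller's buffer. */
int parse_fw_name(const char *name, char *product, size_t plen,
                  char *rev, size_t rlen)
{
    const char *dot1 = strchr(name, '.');
    if (dot1 == NULL)
        return -1;
    const char *dot2 = strchr(dot1 + 1, '.');
    if (dot2 == NULL || strcmp(dot2, ".LOD") != 0)
        return -1;

    size_t pl = (size_t)(dot1 - name);     /* length of the product ID */
    size_t rl = (size_t)(dot2 - dot1 - 1); /* length of the revision   */
    if (pl == 0 || rl == 0 || pl >= plen || rl >= rlen)
        return -1;

    memcpy(product, name, pl);
    product[pl] = '\0';
    memcpy(rev, dot1 + 1, rl);
    rev[rl] = '\0';
    return 0;
}
```

For example, disk_fw_path(buf, sizeof buf, "X274_SCHT6146F10", "NA16") yields /etc/disk_fw/X274_SCHT6146F10.NA16.LOD.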

Compiling fwdl was trivial, so all that is left is to use it to update the firmware on the drive.

[root@fcal fwdl-1.2.3]# make
g++ -O2 -o fwdl -Dlinux -DDEBUG fwdl.C fwdl-linux.c
[root@fcal fwdl-1.2.3]#

But we don't yet know the drive's device name, so I used Seagate's SeaTools to find it.

[root@fcal root]# tar xf seatools_cli.tar
[root@fcal root]# ./st -l
Host adapter information:
SCSI host adapter emulation for IDE ATAPI devices
Emulex LPFC (LP850) SCSI on PCI bus 01 device 40 irq 3
Drive information:
/dev/sga SEAGATE ST3146807FC 0006 286749487 blocks
[root@fcal root]#

Ok, now we can actually do the firmware upgrade.

[root@fcal fwdl-1.2.3]# ./fwdl /dev/sga X274_SCHT6146F10.NA16.LOD
Gathering inquiry data from the drive...done
Device Type: Disk
Removable: 0
ISO Version: 3
Response Data Format: 12
Additional Length: 8b
Option Bits: a
Vendor ID: SEAGATE
Product ID: ST3146807FC
Revision Level: 0006
Vendor Specific: 3HY6LGTW
About to update drive firmware. This could render the drive unusable.
Are you certain you want to continue? [yN] y
About to update firmware...1...2...3...4...5...updating firmware... done.
[root@fcal fwdl-1.2.3]# cd ..
[root@fcal root]# ./st -l
Host adapter information:
SCSI host adapter emulation for IDE ATAPI devices
Emulex LPFC (LP850) SCSI on PCI bus 01 device 40 irq 3
Drive information:
/dev/sga NETAPP X274_SCHT6146F10 NA16 Cannot read capacity (Sense data = 03/31/00)
[root@fcal root]#
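The "Cannot read capacity (Sense data = 03/31/00)" line after flashing is less alarming than it looks. Reading the triplet as sense key / ASC / ASCQ in hex (an assumption about how SeaTools prints it), sense key 3 is MEDIUM ERROR and ASC/ASCQ 31/00 is MEDIUM FORMAT CORRUPTED in the SCSI SPC tables, which fits a drive that needs a low-level format after the firmware change. A minimal C decoder follows; its ASC/ASCQ table deliberately covers only a handful of codes, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* SCSI sense key names (SPC), indexed by the 4-bit sense key. */
static const char *const sense_keys[16] = {
    "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
    "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
    "BLANK CHECK", "VENDOR SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
    "OBSOLETE", "VOLUME OVERFLOW", "MISCOMPARE", "RESERVED"
};

struct asc_entry { unsigned asc, ascq; const char *text; };

/* A small subset of the SPC additional sense code table. */
static const struct asc_entry asc_table[] = {
    {0x04, 0x00, "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE"},
    {0x04, 0x04, "LOGICAL UNIT NOT READY, FORMAT IN PROGRESS"},
    {0x20, 0x00, "INVALID COMMAND OPERATION CODE"},
    {0x24, 0x00, "INVALID FIELD IN CDB"},
    {0x31, 0x00, "MEDIUM FORMAT CORRUPTED"},
    {0x31, 0x01, "FORMAT COMMAND FAILED"},
};

/* Parse "KK/AA/QQ" (hex key/ASC/ASCQ). Returns 0 on success, -1 otherwise. */
int parse_sense_triplet(const char *s, unsigned *key, unsigned *asc,
                        unsigned *ascq)
{
    if (sscanf(s, "%x/%x/%x", key, asc, ascq) != 3)
        return -1;
    if (*key > 0xF || *asc > 0xFF || *ascq > 0xFF)
        return -1;
    return 0;
}

const char *sense_key_name(unsigned key)
{
    assert(key <= 0xF);
    return sense_keys[key];
}

/* Description for a known ASC/ASCQ pair, or NULL if not in the subset. */
const char *asc_text(unsigned asc, unsigned ascq)
{
    for (size_t i = 0; i < sizeof asc_table / sizeof asc_table[0]; i++)
        if (asc_table[i].asc == asc && asc_table[i].ascq == ascq)
            return asc_table[i].text;
    return NULL;
}
```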

Almost done. The NetApp uses 520-byte sectors, so the drive now has to be reformatted with that sector size. I used sg_format from sg3_utils to do this.
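To put numbers on the 520-byte format: each sector still carries 512 bytes of data, and ONTAP uses the extra 8 bytes per sector for its block checksums. The C sketch below splits a sector count into data and checksum bytes, using the block count sg_format reports for this drive (280790184). The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* 512 data bytes per sector; 520 bytes per sector as formatted for ONTAP. */
enum { DATA_BYTES = 512, NETAPP_SECTOR = 520 };

struct sector_split {
    uint64_t raw_bytes;      /* bytes addressed at 520 B/sector          */
    uint64_t data_bytes;     /* bytes available for data (512 B/sector)  */
    uint64_t overhead_bytes; /* the extra 8 B/sector used for checksums  */
};

/* Split a count of 520-byte sectors into data and checksum bytes. */
struct sector_split split_capacity(uint64_t sectors)
{
    struct sector_split s;
    s.raw_bytes = sectors * NETAPP_SECTOR;
    s.data_bytes = sectors * DATA_BYTES;
    s.overhead_bytes = s.raw_bytes - s.data_bytes;
    /* Sanity check: the overhead is exactly 8 bytes per sector. */
    assert(s.overhead_bytes == sectors * (uint64_t)(NETAPP_SECTOR - DATA_BYTES));
    return s;
}
```

For 280790184 sectors this gives 143764574208 data bytes (137104 MiB, rounded down) and 2246321472 bytes of checksum space.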

[root@fcal root]# cd sg3_utils-1.27/src
[root@fcal src]# ./sg_format --format --size=520 --verbose /dev/sga
inquiry cdb: 12 00 00 00 24 00
NETAPP X274_SCHT6146F10 NA16
peripheral_type: disk [0x0]
PROTECT=0
mode sense (10) cdb: 5a 00 01 00 00 00 00 00 fc 00
mode sense (10): requested 252 bytes but got 28 bytes
Mode Sense (block descriptor) data, prior to changes:
Number of blocks=280790184 [0x10bc84a8]
Block size=520 [0x208]
A FORMAT will commence in 10 seconds
ALL data on /dev/sga will be DESTROYED
Press control-C to abort
A FORMAT will commence in 5 seconds
ALL data on /dev/sga will be DESTROYED
Press control-C to abort
format cdb: 04 18 00 00 00 00
Format has started
Format in progress, 0% done
Format in progress, 0% done
Format in progress, 0% done
Format in progress, 0% done
...
FORMAT Complete
[root@fcal root]# dmesg |tail -3
Attached scsi disk sda at scsi1, channel 0, id 12, lun 0
SCSI device sda: 286749488 512-byte hdwr sectors (146816 MB)
sda: unknown partition table
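The verbose sg_format run prints the raw CDBs it sends. Decoded by hand: INQUIRY (0x12) asks for 36 bytes (0x24), MODE SENSE(10) (0x5a) requests page 0x01 with a 252-byte (0xfc) allocation length, and FORMAT UNIT (0x04) has byte 1 = 0x18, i.e. the FMTDATA bit (a parameter list follows the CDB) and the CMPLST bit set. The sketch below decodes just those three commands; the field offsets follow my reading of the SPC/SBC specs, and cdb_info/decode_cdb are illustrative names, not part of sg3_utils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cdb_info {
    uint8_t opcode;
    const char *name;
    unsigned alloc_len; /* 0 when the command has no allocation length */
    int page_code;      /* MODE SENSE(10) only, else -1 */
    int fmtdata;        /* FORMAT UNIT only, else -1 */
    int cmplst;         /* FORMAT UNIT only, else -1 */
};

/* Decode INQUIRY, MODE SENSE(10) and FORMAT UNIT CDBs.
 * Returns 0 on success, -1 for short or unsupported CDBs. */
int decode_cdb(const uint8_t *cdb, size_t len, struct cdb_info *out)
{
    assert(cdb != NULL && out != NULL);
    if (len == 0)
        return -1;
    out->opcode = cdb[0];
    out->name = "UNKNOWN";
    out->alloc_len = 0;
    out->page_code = -1;
    out->fmtdata = out->cmplst = -1;

    switch (cdb[0]) {
    case 0x12: /* INQUIRY: allocation length in bytes 3-4 */
        if (len < 6)
            return -1;
        out->name = "INQUIRY";
        out->alloc_len = ((unsigned)cdb[3] << 8) | cdb[4];
        break;
    case 0x5A: /* MODE SENSE(10): page code in byte 2, length in bytes 7-8 */
        if (len < 10)
            return -1;
        out->name = "MODE SENSE(10)";
        out->page_code = cdb[2] & 0x3F;
        out->alloc_len = ((unsigned)cdb[7] << 8) | cdb[8];
        break;
    case 0x04: /* FORMAT UNIT: FMTDATA is bit 4, CMPLST bit 3 of byte 1 */
        if (len < 6)
            return -1;
        out->name = "FORMAT UNIT";
        out->fmtdata = (cdb[1] >> 4) & 1;
        out->cmplst = (cdb[1] >> 3) & 1;
        break;
    default:
        return -1;
    }
    return 0;
}
```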

Now that the drive is formatted and showing up as 144GB, we need to get the NetApp to recognize it. Since I'm doing this piecemeal, I don't like the "initialize all disks" option suggested by vardomskiy. Instead, I tried taking ownership of the disk and then copying another drive onto it, to fool the NetApp into thinking it had already labeled the disk.

fs> sysconfig -r
Broken disks

RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
--------- ------  ------------- ---- ---- ---- ----- --------------    --------------
bad label 0b.22   0b    1   6   FC:B   -  FCAL 10000 136000/278528000  137104/280790184
fs> reboot
Starting AUTOBOOT press any key to abort...
Loading: 0xffffffff80001000/25888 0xffffffff80007520/15502440 Entry at 0xffffffff80001000
Starting program at 0xffffffff80001000
Press CTRL-C for special boot menu
..................................................Special boot options menu will be available.

NetApp Release 7.3.1: Thu Jan 8 01:24:50 PST 2009
Copyright (c) 1992-2008 NetApp.
Starting boot on Fri May 15 12:58:16 GMT 2009
Fri May 15 12:58:20 GMT [nvram.battery.state:info]: The NVRAM battery is currently ON.
Fri May 15 12:58:25 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system

(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize owned disks (7 disks are owned by this filer).
(4a) Same as option 4, but create a flexible root volume.
(5) Maintenance mode boot.

Selection (1-5)? 5
*> disk_list
DISK         CHAN  VENDOR   PRODUCT ID       REV  SERIAL#              HW (BLOCKS   BPS) DQ
------------ ----- -------- ---------------- ---- -------------------- -- -------------- --
0b.19        FC:B  NETAPP   X274_HJURE146F10 NA14 404F7958             ff 284820800  520 N
0b.20        FC:B  NETAPP   X274_HJURE146F10 NA14 40456113             ff 284820800  520 N
0b.21        FC:B  NETAPP   X274_HJURE146F10 NA14 404C9761             ff 284820800  520 N
0b.22        FC:B  NETAPP   X274_SCHT6146F10 NA16 3HY6LGTW00007428DWHP ff 280790184  520 N
0b.16        FC:B  NETAPP   X274_SCHT6146F10 NA16 3HY107C9000073480CQK ff 280790184  520 N
0b.17        FC:B  NETAPP   X274_SCHT6146F10 NA16 3HY0YW9J00007347WSB0 ff 280790184  520 N
0b.18        FC:B  NETAPP   X274_SCHT6146F10 NA16 3HY0ZK88000073478DKV ff 280790184  520 N
*> diskcopy -s 0b.16 -d 0b.22
You are about to copy over disk 0b.22 with the contents of disk 0b.16.
Retries at the SCSI layer are: ENABLED
I/O size is 4096 sectors
Any data on disk 0b.22 will be lost!
Are you sure you want to continue with diskcopy? y
Copying from disk 0b.16 to disk 0b.22.
600 MB copied - Copy operation of 68553 MB from disk 0b.16 to disk 0b.22 has completed.
NOTE: disk 0b.16 must be removed from the system prior to rebooting!
*> halt
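In the disk_list output, 0b.16 has the same product ID (X274_SCHT6146F10), firmware revision (NA16), block count (280790184) and 520-byte sectors as 0b.22, which is presumably what makes it a reasonable donor for diskcopy. Requiring all four fields to match is my own assumption about what makes a safe donor, not something ONTAP documents; the struct and helper below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The disk_list fields this sketch cares about. */
struct disk_row {
    const char *name;     /* e.g. "0b.16" */
    const char *product;  /* e.g. "X274_SCHT6146F10" */
    const char *rev;      /* e.g. "NA16" */
    unsigned long blocks; /* e.g. 280790184 */
    unsigned bps;         /* bytes per sector, e.g. 520 */
};

/* Return the first disk other than `target` with the same product ID,
 * firmware revision, block count and sector size, or NULL if none. */
const struct disk_row *pick_donor(const struct disk_row *rows, size_t n,
                                  const struct disk_row *target)
{
    assert(rows != NULL && target != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct disk_row *r = &rows[i];
        if (strcmp(r->name, target->name) == 0)
            continue; /* never copy a disk onto itself */
        if (strcmp(r->product, target->product) == 0 &&
            strcmp(r->rev, target->rev) == 0 &&
            r->blocks == target->blocks && r->bps == target->bps)
            return r;
    }
    return NULL;
}
```

Fed the seven disk_list rows in listing order with 0b.22 as the target, it returns 0b.16, the disk used above.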

Now take 0b.22 out of the NetApp and reboot the filer. After it has rebooted, put the drive back in and it will be marked as a spare:

Fri May 15 10:43:40 EDT [ses.channel.rescanInitiated:info]: Initiating rescan on channel 0b.
Fri May 15 10:43:49 EDT [raid.assim.disk.spare:notice]: Sparing Disk /0b.22 Shelf 1 Bay 6 [NETAPP X274_SCHT6146F10 NA16] S/N [3e
Fri May 15 10:43:50 EDT [sfu.firmwareUpToDate:info]: Firmware is up-to-date on all disk shelves.
fs> sysconfig -r
...
Spare disks

RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
--------- ------  ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare     0b.22   0b    1   6   FC:B   -  FCAL 10000 136000/278528000  137104/280790184 (not zeroed)
fs>
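The two size columns in sysconfig -r can be checked by hand. Assuming the blks figures count 512-byte data units (one per 520-byte sector, since the Phys count matches the disk_list block count), MB is simply blks / 2048, rounded down: 278528000 / 2048 = 136000 and 280790184 / 2048 = 137104. The Used column appears to be the right-sized capacity ONTAP will actually use, which leaves about 1104 MB of this disk idle. A small C sketch of the conversion; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a sysconfig "blks" count (assumed 512-byte units) to MB,
 * rounding down: blks * 512 / 2^20 == blks / 2048. */
uint64_t blks_to_mb(uint64_t blks)
{
    return blks / 2048u;
}

/* MB left unused between the right-sized (Used) and physical (Phys) size. */
uint64_t rightsize_slack_mb(uint64_t used_blks, uint64_t phys_blks)
{
    assert(used_blks <= phys_blks);
    return blks_to_mb(phys_blks) - blks_to_mb(used_blks);
}
```

Both the "bad label" and "spare" listings report the same pair, 136000/278528000 and 137104/280790184, and the conversion reproduces each MB figure.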

Done. The drive is now available as a spare.