2009-05-26

My First JBOD, Part 2: Irony

After unpacking, racking, and mounting the JBOD, I waited until the weekend had started before powering down the server and installing the RAID card. Connected it all up, rebooted into the Adaptec BIOS, and configured the 6x 1TB drives into a RAID6 array. After that, I installed the RAID StorageManager from Sun’s website, and then the “Common Array Manager” software. CAM is supposed to provide a web GUI to an organization’s worth of Sun JBODs, so you can update JBOD firmware and query status and whatnot from a single interface. There are client and server bits written in Java that run on the various boxes, so the data path was going to look like this:

JBOD -> XEN dom0 running remote proxy tool -> XEN domU running web GUI

I say “was going” and “supposed to” because all the remote proxy tool in CAM ended up doing was consistently triggering a kernel panic in the aacraid driver whenever its detection code fired up.

Take a long drag off the irony of driver and firmware issues, and download the latest-n-greatest aacraid driver and firmware from Intel via Sun, and update. Same results. Repeat in various configurations, and before throwing in the towel, get a basic dump and file a bug. I didn’t put any more serious thought into debugging it simply because this whole thing has to be up and running yesterday, and the last time I asked for documentation on the topic, I was rebuffed with a variant of this classic: “If you were smart enough to debug the kernel, you wouldn’t need documentation on how to debug the kernel.”

Take a moment to stand in awe of the massive poisonous cobaggery involved in that statement being offered to someone who wants to help fix a crasher. I’ll wait.

That kind of shit would never fly in any GNOME venue, which is why GNOME kicks so much ass.

Update: The cobaggery about kernel development did not come from Sun or any representative of any company involved in open-source, and was unrelated to this situation at all. I relate it simply as it pertains to debugging kernel issues, and why I don’t do it.



2009-05-23

My First JBOD: Introduction

This is me setting up a JBOD for use by one or more XEN hosts, using professional hardware. It’s not a hack, not throwing a shitload of drives into a PC with some “prosumer” SATA RAID cards that require you to spend weeks fussing with drivers and firmware to get even minimal write performance out of their underpowered hardware RAID.

A former roommate of mine once set up such a beast using a 12-port SATA card which ended up delivering a whopping 1 MBps of write speed in a RAID 5 configuration. I simply don’t have time to play around like that these days, so this is me trading capital for time.

The host machine is a Sun Fire X4200M2 server with an internal RAID10, running a RHEL 5.3 XEN installation. None of the services currently running on this box are critical, which means I can take them down for an hour at the end of the day without trouble, provided I can get them back up again. I also have the (Memorial Day) weekend to get the new JBOD up and running on this box.

After it’s up, however, I will be hosting important business-ey things on various virtual machines using this JBOD: e-mail, website(s), internal wiki, and NAS, along with primary Kerberos, LDAP, Cobbler, and Puppet on the internal RAID; so it’s fairly important that this get up and working, and be stable once it’s going…

The JBOD itself is a Sun StorageTek J4200 array with a single IO module and a PCIe SAS RAID card, running 6x 1TB SATA disks in (eventually) a RAID6 array. I’d like to play around with interesting things like redundant SATA multipathing, but I’m pretty new to the whole storage admin area, so I’m not going to be playing around with those things on *this* setup…
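
For a rough sense of what 6x 1TB in RAID6 works out to, here is a back-of-the-envelope sketch in Python. It assumes the textbook rule that a parity RAID set gives up one drive's worth of space per parity stripe (two for RAID6, one for RAID 5) and ignores controller metadata, filesystem overhead, and the TB-vs-TiB gap. The function name is made up for illustration and is not part of any Sun or Adaptec tool.

```python
def raid_usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Approximate usable capacity of a parity RAID set, in TB.

    Assumption: capacity lost equals parity_drives * drive_tb, with no
    allowance for controller metadata or filesystem overhead.
    """
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * drive_tb


# The array described above: six 1 TB disks, RAID6 (two drives' worth of parity).
raid6_tb = raid_usable_tb(drives=6, drive_tb=1.0, parity_drives=2)

# Same disks as RAID 5 for comparison (one drive's worth of parity).
raid5_tb = raid_usable_tb(drives=6, drive_tb=1.0, parity_drives=1)

print(f"RAID6: {raid6_tb} TB usable, RAID5: {raid5_tb} TB usable")
```

Under those assumptions the trade is one extra terabyte given up in exchange for surviving a second drive failure.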
