IT engineering and a little bit of hacking: 2012

Friday, November 9, 2012

Vegas Poker

I know this is not the usual topic of posts on my blog, but as a longtime poker player who recently had a chance to play some Vegas poker, I found that my pre-trip research just didn't turn up the info I needed. So for the good of the internet, here you go: "Vegas Poker Etiquette for a First Timer."

- take your time folding: it's better to be a slow folder than to fold out of turn. And the guy on your right might be covering his cards, making it more difficult to see if he's folded.

- after you lose, leave. It's against the rules to hang around the table for too long if you're not playing. The exception is that you can stay outside the playing area, which is typically cordoned off.

- don't influence the actions of other players (tell them to call, to fold). It's against the rules.

- bring extra $$. A $100 tournament might have a $25 fee on top.

- if you don't say "raise," it's normally assumed you're calling and need change. This is situational tho: if the bet is $600 and you put down $5500, especially in multiple chips, that's a pretty clear raise.

- put your high value chips in front. This is a rule.

- you can only play the $ that's on the table before a hand starts. So you can put a $100 bill out there and play it instead of chips, but you can't pull out your wallet mid hand and bet up those pocket aces.

- there's usually a monitor someplace visible telling you the blinds, ante, round, time until the blinds are raised, and number of players left if it's a tournament.

- be friendly. If you're a nice guy and you make a faux pas, they'll go easy on you and tell you what you did wrong. If you're a jerk...

- don't string bet. If you don't know what that is, go look it up immediately.

- don't ever tell anyone your cards (or anything about your cards) until the hand is over.

Thursday, November 1, 2012

Honeynet

Now THIS is cool! http://map.honeynet.org/

Tuesday, October 30, 2012

Ports

Quick hits on ports because it keeps being brought up:

There are no 4 port 10GbE cards.
Only onboard ports can be changed from initiator to target or vice versa.
WWPNs are hard coded into the cards and onboard INITIATOR ports.
You can change WWPNs on target ports.

Thursday, October 25, 2012

BJJ!

I went to sleep last night after BJJ, and today is my first day as a blue belt. I graduated yesterday! 2 herniated discs, a torn MCL, 2 cases of MRSA, a million hyperextended joints and 5 years after I first stepped into a dojo, I am a blue belt. It feels pretty fantastic!

Wednesday, September 12, 2012

Non Disruptive 1-Chassis-to-2-Chassis Transition

Can you non-disruptively transition a clustered single chassis system into two chassis? We included a 7.3.6 => 8.1.1 upgrade to try to take advantage of cf takeover -n, which is used when there is a version mismatch to force a takeover when the other controller halts.

Here’s a timeline of what we tried (on a 3240 in the lab) along with the results:

1. Upgrade B

a. Update B to 8.1.1, fail over to A

b. Move B to new chassis and connect interconnect cable

c. Set B's boolean to false

d. cf giveback -f

2. Upgrade A

a. Update A to 8.1.1

b. cf takeover -n failed because the interconnect was determined to be down, so B couldn't see A halting *1

c. A is halted at this point

d. cf takeover -f failed, because of the version mismatch *2

e. cf forcetakeover succeeded

f. Set A's boolean to false

g. cf giveback failed because the interconnect was determined to be down. *3

h. cf giveback -f succeeded.

3. All appears stable, interconnect is up.

Notes:

*1 “Partner is not UP, NDU Takeover Terminated”

*2 “cf: takeover cannot be performed because of reason (interconnect error)”

*3 “cf monitor all” attached

What we found out:

There is a Boolean env variable that tells each controller whether it’s sharing the chassis with another controller, which is called a “CC” configuration (true = yes, CC config). The cool thing about this variable is that ONTAP will automatically set it to the correct value in two cases:

1. Any time the system is in CC configuration, ONTAP will set the correct value itself (true).
2. Any time the system is in CI configuration (i.e. an IOXM is present), ONTAP will set the correct value itself (false).
3. For all other configurations, ONTAP will not change the value.
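The three rules above can be modeled as a tiny decision function. This is only a sketch of the behavior as described in this post, not ONTAP code: the `ChassisConfig` enum, the `OTHER` label, and the function name are made up for illustration, and the real variable name is deliberately left out.

```python
from enum import Enum


class ChassisConfig(Enum):
    """Hypothetical labels for the cases described in the post."""
    CC = "controller-controller"   # two controllers share one chassis
    CI = "controller-iox"          # controller plus IOXM in the chassis
    OTHER = "other"                # anything else


def effective_shared_chassis_flag(config: ChassisConfig, current_value: bool) -> bool:
    """Return the flag value after the automatic correction described above."""
    if config is ChassisConfig.CC:
        return True        # rule 1: forced to true
    if config is ChassisConfig.CI:
        return False       # rule 2: forced to false
    return current_value   # rule 3: left untouched


# A stale "true" survives in the third case
print(effective_shared_chassis_flag(ChassisConfig.OTHER, True))
```

If a controller moved into its own chassis without an IOXM falls under the third rule, the old value would stick, which would explain why the plan sets the boolean by hand.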

Conclusion: The upgrade/cf takeover -n didn't contribute. There is still a viable path for a non-disruptive plan, but it requires a precisely timed halt and cf forcetakeover, which isn’t without risk. Action plan below:

Part 1:

Fail over to A
Move B to new chassis and connect interconnect
Set B's boolean to false
cf giveback -f

Part 2:

Halt A, cf forcetakeover as soon as A drops to LOADER prompt
Set A's boolean to false
Boot A. Interconnect should be up when node reaches 'Waiting for giveback'
cf giveback -f
cf should be enabled

Note: There is also a Boolean env variable that fools the controller into thinking it is in “CI” configuration. It’s an effective override, but it wasn't useful here.

Wednesday, September 5, 2012

Electric Car

With gas going up, I took a look at the Tesla Model S, entry level:

$49,999 (after $7500 credit)
40kWh
125 mile range (optimum)
Recharge: 2 hours

Let's say a 5% interest rate, 8 year payment plan, that's a $633/month payment. $7596/year.

I pay $1920/year car payment + $3200/year in gas. $5100/year total for my car that "recharges" (with gas) in 3 minutes. And my car is slower, older, less cool looking, etc.

But I have to add electricity cost. A 4 cylinder efficient BMW was 3x more expensive to drive than the Tesla; I think my 6 cylinder Sonata is probably 4x. That puts electricity at about $3200 / 4 = $800/year.

So the numbers are: $5100 vs $8396 ($7596 + $800). That's about 65% more expensive, without the wild cards.
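For anyone who wants to plug in their own numbers, here is a rough sketch of the math. It assumes a standard fully amortizing loan with monthly compounding, no down payment, and no taxes or fees; none of those details are stated above, so treat them as my simplifications.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on a fully amortizing loan."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of payments
    return principal * r / (1 - (1 + r) ** -n)


# Figures from the post
tesla_price = 49_999
car_payment_per_year = 1_920
gas_per_year = 3_200
gas_to_electric_ratio = 4         # the post's guess for the Sonata

tesla_payment = round(monthly_payment(tesla_price, 0.05, 8))
electricity_per_year = gas_per_year / gas_to_electric_ratio

tesla_total = tesla_payment * 12 + electricity_per_year
current_total = car_payment_per_year + gas_per_year

print(f"Tesla:   ${tesla_total:,.0f}/year")
print(f"Current: ${current_total:,.0f}/year")
print(f"Premium: {tesla_total / current_total - 1:.0%}")
```

Using the unrounded $5120 for the current car, the premium lands in the mid-60s percent.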

A couple other important things: cost to repair is going to be much more expensive for the Tesla, and how long the $20k battery lasts is a total wild card right now, delivering diminished range and efficiency over time (probably 30-40% less after 5 years, which means a range of only 75-87 miles). And as hard as I drive, I'll likely see a lower range sooner.

I like that the car is heavy for snow and accidents, but with 125 mile range I'd never make it to Michigan again. And my insurance would go up in the Tesla.

But if gas hits $7/gal, the numbers change dramatically...all of a sudden, the two options are on par. Maybe in a couple years! Sorry, Tesla.
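The post doesn't say what gas cost at the time, so this sketch solves for a multiplier rather than a price. It assumes the electricity bill stays put while gas climbs, which is my simplification.

```python
# Figures from the post (dollars per year)
tesla_payment = 7_596
car_payment = 1_920
gas_today = 3_200
electricity = gas_today / 4       # the post's 4x guess, assumed not to move with gas


def breakeven_gas_multiplier() -> float:
    """Factor by which the gas bill must grow for both cars to cost the same."""
    # Solve: car_payment + gas_today * k == tesla_payment + electricity
    return (tesla_payment + electricity - car_payment) / gas_today


k = breakeven_gas_multiplier()
print(f"Gas spending has to grow about {k:.2f}x to break even")
```

So gas has to roughly double. $7/gal is the break-even point only if gas was around $3.46/gal to begin with; that figure is just 7 divided by the multiplier, not a number from the post.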

http://wheels.blogs.nytimes.com/2012/06/25/tesla-model-s-offers-a-lesson-in-electric-vehicle-economics/
http://seekingalpha.com/article/844561-model-s-makes-inroads-for-tesla

Monday, August 20, 2012

NCIE

Things to study:

iSCSI commands

dm-mp: Linux supports the dm-mp (device-mapper multipath) multipathing type.
HBAnyware = Emulex HBA
SANsurfer = QLogic HBA
SnapDrive = HTTP, HTTPS, RPC
Beneficial uses of VLANs: 1) To isolate iSCSI traffic from LAN/WAN traffic. 2) To isolate management traffic from other IP traffic.
zones should be single initiator
A direct connect topology allows for guaranteed maximum network performance for iSCSI.
iSCSI access lists: control which network interfaces on a storage system an initiator can access, and limit the number of network interfaces advertised to a host by the storage system.
Power on order: 1) Network Switches. 2) Disk Shelves. 3) Any Tape Backup Devices. 4) NetApp Controller Heads.
Disable ALUA on igroups connecting using the NetApp DSM.

Brocade Switches

Web Tools : Tool to manage Brocade Switches.

cfgshow : CLI command to display all defined zone information.
configshow : CLI command used to verify configuration.

fabricshow : CLI command to display which Brocade switches are connected into a fabric.

supportshow : CLI command to collect detailed diagnostic information from a Brocade FC switch.
switchshow : CLI command to view current nodes connected to the switch

Cisco Switches

Cisco Fabric Manager : Native GUI switch tool for managing Cisco MDS-Series switches and directors.

show zoneset : CLI command to view all defined zone configuration information.

show zoneset active : CLI command to display currently active zoneset.

show version : CLI command to collect information about the firmware version.

NetApp deduplication (ASIS) is enabled on the volume level.

NetApp Host Utilities Kits perform the following functions: 1) They provide properly set disk and HBA timeout values. 2) They identify and set path priorities for NetApp LUNs.

A LUN can be mapped to an ALUA-enabled igroup and a non-ALUA-enabled igroup.

On ESX 4.0 : Path Selection Policy (PSP) for an MSCS LUN should be set to: MRU (Most recently used).

http://cosonok.blogspot.com/2012/02/netapp-ns0-502-study-notes-part-34-san.html

Monday, July 16, 2012

NCIE-SAN

This post is going to be a mess. Here's a whole amalgamation of data I found that I want to keep here on my blog:

1.U_Port (Universal) - Port on the switch that, after booting, is waiting for something to be plugged into it.
2.G_Port (General Purpose Port) - Port on the switch that auto configures itself once the attached device logs in to the fabric. It becomes an F_Port if the device is a node such as an HBA (N_Port), or an E_Port if it is another switch.
3.FL_Port (Fabric Loop Port) - Disk array that supports Fabric (public) addressing and relies on arbitrated loop services. Many switches autoconfigure to FL-Port after the attached array has logged in.
4.F-Port (Fabric Port) - Array controller or HBA that supports Fabric (Public) addressing.
5.E_Port (Expansion Port) - Inter Switch Link (ISL) or switch to switch connection within a fabric. Older Qlogic and Sun switches refer to this as a T_Port. 1

Port	Full Name	Port Function
N-port	network port or node port	Node port used to connect a node to a Fibre Channel switch
F-port	fabric port	Switch port used to connect the Fibre Channel fabric to a node
L-port	loop port	Node port used to connect a node to a Fibre Channel loop
NL-port	network + loop port	Node port which connects to both loops and switches
FL-port	fabric + loop port	Switch port which connects to both loops and switches
E-port	extender port	Used to cascade Fibre Channel switches together
G-port	general port	General purpose port which can be configured to emulate other port types
EX_port	external port	Connection between a fibre channel router and a fibre channel switch; on the switch side, it looks like a normal E_port -- but on the router side, it is a EX_port
TE_port	trunking E-port	Provides standard E_port functions and allows for routing of multiple virtual SANs by modifying the standard Fibre Channel frame upon ingress/egress of the VSAN environment 2

The definitions above are pretty unclear. The N port is the port literally on the node (server, storage system, etc) and the F port is the one on the switch. You typically connect from the N port to the F port. E ports run from one switch to another. You can create virtual ports for when you want multiple addresses per physical port.
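Here's a small sketch of the pairing rules in that paragraph. The pairs are limited to what's described here (plus the NL/FL loop case from the table); it's deliberately incomplete and the names `VALID_LINKS` and `is_valid_link` are just mine.

```python
# Pairings this sketch knows about: node-to-switch, loop-to-switch, switch-to-switch
VALID_LINKS = {
    frozenset({"N", "F"}),     # HBA or storage port into a switch port
    frozenset({"NL", "FL"}),   # loop-capable node into a loop-capable switch port
    frozenset({"E"}),          # E to E: an ISL between two switches
}


def is_valid_link(a: str, b: str) -> bool:
    """True if port types a and b form one of the pairings listed above."""
    return frozenset({a, b}) in VALID_LINKS


print(is_valid_link("N", "F"))   # node to switch
print(is_valid_link("E", "E"))   # switch to switch
print(is_valid_link("N", "E"))   # a node doesn't plug into an ISL port
```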

It appears NPIV is just the method by which VN_Ports are created.

Quick hits:

1.2.1 NetApp storage system configuration details.

WWPNs of NetApp FC target ports begin with 5 (target HBAs generally begin with 1 for Emulex, 2 for QLogic, and 5 for NetApp - e.g. 50:0a:09:81:83:e1:52:d9 ).
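That first-digit rule of thumb is easy to script. This is a sketch of the heuristic exactly as stated above, not a reliable vendor lookup; the function and table names are mine.

```python
# Leading-digit heuristic from the post; not a guaranteed vendor lookup
FIRST_DIGIT_VENDOR = {"1": "Emulex", "2": "QLogic", "5": "NetApp"}


def guess_vendor(wwpn: str) -> str:
    """Guess the HBA vendor from the first hex digit of a WWPN."""
    digits = wwpn.replace(":", "").lower()
    if len(digits) != 16 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 64-bit WWPN: {wwpn!r}")
    return FIRST_DIGIT_VENDOR.get(digits[0], "unknown")


print(guess_vendor("50:0a:09:81:83:e1:52:d9"))
```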

FC fabric topologies that NetApp supports: 1) A single FC switch. 2) Dual FC switches with no ISLs (Inter-Switch Links). 3) Four FC switches with multiple ISLs between each pair of switches. 4) Four FC switches with multiple ISLs between ALL switches.

iscsi security show : DOT CLI command to display current CHAP settings

Supported iSCSI configurations: Direct-attached, Network-attached (Single-network, Multi-network, VLANs)

iscsi session show -v : DOT CLI command to see if iSCSI digests are enabled.

Two benefits of soft zoning (device WWPN zoning) over hard zoning (domain ID plus port) for Cisco and Brocade FC switches: 1) A device can be connected to any port in the fabric without changing zoning. 2) It is fully interoperable between switch vendors.

Network types: page 39 http://www.filibeto.org/sun/lib/nonsun/brocade/53-0000231-05.pdf
cascade, mesh, core-edge.
also a great resource on trunking, etc

single image cfmode:
http://hd.kvsconsulting.us/netappdoc/733docs/html/ontap/bsag/GUID-31DC026F-2B78-425A-BA55-487782F9909A.html

Disable ALUA on igroups connecting using the NetApp DSM.

1. http://my.opera.com/siyeclover/blog/show.dml/159596
2. http://searchvirtualstorage.techtarget.com/definition/Fibre-Channel-port-names
3. https://www.ibm.com/developerworks/mydeveloperworks/blogs/anthonyv/entry/don_t_say_green_say_aqua1?lang=en

Friday, June 29, 2012

iSCSI

Here's a few concepts I've been studying. A TPG (Target Portal Group) is basically a method of allowing the server to communicate to your storage system via iSCSI on multiple interfaces and multiple connections on a single session. This means you can enable MPIO by having multiple TPGs. Here's the dummy breakdown:
- Each interface (virtual or real) can only be part of one TPG
- Each TPG can have multiple interfaces
- Each session can have multiple connections
- Each connection can only be in one session.
- Each session can only communicate through one TPG
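To make those rules concrete, here's a toy model that enforces them. It's a sketch of the relationships only, not how any iSCSI stack is implemented; the class names, the interface names (e0a, e0b, e0c), and the TPG labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    tpg: str                                               # one TPG per session
    connections: list[str] = field(default_factory=list)   # many connections per session


class Target:
    def __init__(self) -> None:
        self.tpg_of_interface: dict[str, str] = {}      # interface -> its single TPG
        self.session_of_conn: dict[str, Session] = {}   # connection -> its single session

    def add_interface(self, interface: str, tpg: str) -> None:
        if interface in self.tpg_of_interface:
            raise ValueError(f"{interface} already belongs to a TPG")
        self.tpg_of_interface[interface] = tpg

    def add_connection(self, session: Session, conn: str, interface: str) -> None:
        if conn in self.session_of_conn:
            raise ValueError(f"{conn} already belongs to a session")
        if self.tpg_of_interface.get(interface) != session.tpg:
            raise ValueError(f"{interface} is not in TPG {session.tpg}")
        session.connections.append(conn)
        self.session_of_conn[conn] = session


target = Target()
target.add_interface("e0a", "tpg1")
target.add_interface("e0b", "tpg1")   # a TPG can hold several interfaces
target.add_interface("e0c", "tpg2")

s = Session(tpg="tpg1")
target.add_connection(s, "conn1", "e0a")
target.add_connection(s, "conn2", "e0b")   # second connection, same session
print(s.connections)
```

Trying to add a connection for that session over e0c raises a ValueError, because e0c sits in a different TPG.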

A great use of this is when a server has multiple virtual OS's and therefore needs multiple connections to the same storage system. If you have multiple paths, you need ALUA.

When understanding ALUA (Asymmetric Logical Unit Access), it's helpful to know it's also called Target Port Group Support. Basically it's a protocol for determining the best path from the server to the LUN (hence the LU in ALUA). It's defined in the SCSI standard (SPC-3), so it works across vendors and isn't tied to iSCSI hardware.

iSNS is basically DNS for iSCSI, but a little smarter in that it also understands TPGs and helps systems find each other that way.

More data: http://hd.kvsconsulting.us/netappdoc/801docs/html/ontap/bsag/GUID-FF148B6E-6CCB-48CC-9547-7D063A904B40.html

Tuesday, June 26, 2012

NetApp SPC Results

NetApp is buzzing over the last SPC results. The reason it's important is that we proved we're able to do a lot with a little. The test was a first in a few ways:

1. It's our first performance test of the new cluster-mode software, which is going to be huge.

2. The test was FC, while our competitors have historically pigeonholed NetApp as the NAS experts.

3. If you look at the hardware both in cost and amount, we did the same or better latency-wise up to 250,000 IOPS with a smaller number of controllers and disks.

I know list prices don't mean anything, but as a rough guesstimate, look at the 3PAR solution vs NetApp comparison in the recoverymonkey blog (below). Here's the breakdown:

3PAR

List: $5,885,148

IOPS: 450,212.66

Latency: 13.67ms

NetApp

List: $1,672,602

IOPS: 250,039.97
Latency: 3.35ms

You could buy 2 of the NetApp solutions for a bit more than half the price (about 57%), with roughly 50,000 more IOPS and 1/4 the latency of the 3PAR system, assuming the two scale linearly. Wow.
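If you want to check that back-of-the-envelope comparison, here's the arithmetic as a sketch. It assumes IOPS simply add up when you buy two systems and that the 3PAR latency is in milliseconds like the NetApp figure; neither is guaranteed by the benchmark.

```python
# List price (USD), SPC-1 IOPS, and latency as quoted in the post
threepar = {"price": 5_885_148, "iops": 450_212.66, "latency_ms": 13.67}  # unit assumed ms
netapp = {"price": 1_672_602, "iops": 250_039.97, "latency_ms": 3.35}

copies = 2  # two NetApp configurations side by side (assumes linear scaling)

price_ratio = copies * netapp["price"] / threepar["price"]
extra_iops = copies * netapp["iops"] - threepar["iops"]
latency_ratio = netapp["latency_ms"] / threepar["latency_ms"]

print(f"price: {price_ratio:.0%} of 3PAR")
print(f"extra IOPS: {extra_iops:,.0f}")
print(f"latency: {latency_ratio:.2f}x of 3PAR")
print(f"$/IOPS: 3PAR {threepar['price'] / threepar['iops']:.2f}, "
      f"NetApp {netapp['price'] / netapp['iops']:.2f}")
```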

http://www.theregister.co.uk/2012/06/26/netapp_cluster_mode_spc_benchmark/

http://recoverymonkey.org/2012/06/20/netapp-posts-great-cluster-mode-spc-1-result

Disclaimer: I'm affiliated with NetApp and I'm appropriately biased, but not paid for any of the content on this site.

Tuesday, June 5, 2012

NetApp Experience: Shelf ADD => Disk Fail => Failover

During a shelf add last week, I experienced as big of a system outage as I've ever encountered on NetApp equipment. We started seeing a few of these errors, which are normally spurious:

[ses.exceptionShelfLog:info]: Retrieving Exception SES Shelf Log information on channel 0h ESH module A disk shelf ID 4.
[ses.exceptionShelfLog:info]: Retrieving Exception SES Shelf Log information on channel 6b ESH module B disk shelf ID 5.

The first connection went smoothly, but when I unplugged the second connection from the existing loop, I started seeing some scary results. Here's the order of important messages:

NOTE: Currently 14 disks are unowned. Use 'disk show -n' for additional information.

[fci.link.break:error]: Link break detected on Fibre Channel adapter 0h.

[disk.senseError:error]: Disk 7b.32: op 0x2a:1bc91268:0100 sector 0 SCSI:aborted command - (b 47 1 4e)

[raid.disk.maint.start:notice]: Disk /aggr3_thin/plex0/rg0/7b.32 Shelf 2 Bay 0 will be tested.

[diskown.errorReadingOwnership:warning]: error 46 (disk condition triggered maintenance testing) while reading ownership on disk 7b.32

[disk.failmsg:error]: Disk 7b.32 (JXWGA8UM): sense information: SCSI:aborted command(0x0b), ASC(0x47), ASCQ(0x01), FRU(0x00).

[raid.rg.recons.missing:notice]: RAID group /aggr3_thin/plex0/rg0 is missing 1 disk(s).

Spare disk 0b.32 will be used to reconstruct one missing disk in RAID group /aggr3_thin/plex0/rg0.

[raid.rg.recons.start:notice]: /aggr3_thin/plex0/rg0: starting reconstruction, using disk 0b.32

[disk.senseError:error]: Disk 7b.41: op 0x2a:190ca400:0100 sector 0 SCSI:aborted command - (b 47 1 4e)

[diskown.errorReadingOwnership:warning]: error 46 (disk condition triggered maintenance testing) while reading ownership on disk 7b.41

[raid.disk.maint.start:notice]: Disk /aggr3_thin/plex0/rg1/7b.41 Shelf 2 Bay 9 will be tested
[disk.senseError:error]: Disk 7b.37: op 0x2a:190ca500:0100 sector 0 SCSI:aborted command - (b 47 1 4e)

[raid.config.filesystem.disk.failed:error]: File system Disk /aggr3_thin/plex0/rg1/7b.37 Shelf 2 Bay 5 failed.

[disk.senseError:error]: Disk 7b.40: op 0x2a:190ca500:0100 sector 0 SCSI:aborted command - (b 47 1 4e)

[raid.config.filesystem.disk.failed:error]: File system Disk /aggr3_thin/plex0/rg1/7b.40 Shelf 2 Bay 8 failed.

[raid.vol.failed:CRITICAL]: Aggregate aggr3_thin: Failed due to multi-disk error

[disk.failmsg:error]: Disk 7b.37 (JXWG6MLM): sense information: SCSI:aborted command(0x0b), ASC(0x47), ASCQ(0x01), FRU(0x00).

[disk.failmsg:error]: Disk 7b.40 (JXWEEB3M): sense information: SCSI:aborted command(0x0b), ASC(0x47), ASCQ(0x01), FRU(0x00).

[raid.disk.unload.done:info]: Unload of Disk 7b.37 Shelf 2 Bay 5 has completed successfully

[raid.disk.unload.done:info]: Unload of Disk 7b.40 Shelf 2 Bay 8 has completed successfully

Waiting to be taken over. REBOOT in 17 seconds.

[cf.fsm.takeover.mdp:ALERT]: Cluster monitor: takeover attempted after multi-disk failure on partner

Long story short, this system had caused numerous issues in the past, and we replaced both a dead disk and an ESH module. After that, the system stabilized: "Since the ESH module replacement there were no new loop or link breaks noticed in subsequent ASUPs."
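When a wall of messages like that scrolls by, it helps to tally the affected disks per RAID group. Here's a quick sketch that does it for four of the messages above; the regex is my own guess at a pattern that fits these particular lines, not an official log format.

```python
import re
from collections import defaultdict

# A few of the messages from above, limited to the ones that name a RAID group
LOG = """\
[raid.disk.maint.start:notice]: Disk /aggr3_thin/plex0/rg0/7b.32 Shelf 2 Bay 0 will be tested.
[raid.disk.maint.start:notice]: Disk /aggr3_thin/plex0/rg1/7b.41 Shelf 2 Bay 9 will be tested
[raid.config.filesystem.disk.failed:error]: File system Disk /aggr3_thin/plex0/rg1/7b.37 Shelf 2 Bay 5 failed.
[raid.config.filesystem.disk.failed:error]: File system Disk /aggr3_thin/plex0/rg1/7b.40 Shelf 2 Bay 8 failed.
"""

# Matches paths like /aggr3_thin/plex0/rg1/7b.37
DISK_PATH = re.compile(r"Disk /(?P<aggr>[^/\s]+)/plex\d+/(?P<rg>rg\d+)/(?P<disk>\S+)")


def disks_lost_per_raid_group(log: str) -> dict[str, list[str]]:
    """Group disks that failed or went into maintenance by aggregate/RAID group."""
    lost = defaultdict(list)
    for line in log.splitlines():
        m = DISK_PATH.search(line)
        if m:
            lost[f"{m['aggr']}/{m['rg']}"].append(m["disk"])
    return dict(lost)


for rg, disks in disks_lost_per_raid_group(LOG).items():
    print(rg, disks)
```

Run against those lines, rg1 shows three disks dropping out, which lines up with the "multi-disk error" that took the aggregate down.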

Wednesday, April 4, 2012

BJJ: Armbar from Guard

Pretty amazing, how you learn even things you thought you knew.

My armbar from guard has always been terrible: I've spent years faking that move just to get to something else. I could never seem to get my hips shifted quickly enough: putting my foot on their hip was too big of a giveaway.

But yesterday, everything changed. Watch this video by world-class black belt of black belts Pedro Sauer: ignore the foot on the hip. Focus on the other leg. With no foot on the hip, you can swing your other foot across the person's back in one smooth movement, shifting your hips and catching them by surprise. So slick.

http://www.youtube.com/watch?v=zrYiQPp0T-M

NetApp Experience: Mixed Ownership

When determining where to add a shelf to a production system, disk show -o is useful in determining which loops/stacks contain disks owned by the controller you're planning for. When the system is properly set up, this works just fine. When the system already has mixed ownership on a loop but single ownership on the other loops, you would obviously prefer to expand the single ownership loops.

But disk show -o will not indicate mixed ownership, so it's a bit of a trap. The additional step you can take is to either a) check autosupports beforehand for mixed ownership or b) use disk show -v to show all disks, and verify there are no disks on that loop owned by the other controller.
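Option b) boils down to grouping disks by loop and looking for more than one owner. Here's a sketch of that check. It assumes you've already pulled (disk, owner) pairs out of the disk show -v output yourself, since I'm not going to guess at the exact column layout; the owner names are hypothetical.

```python
from collections import defaultdict


def loops_with_mixed_ownership(disks: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each loop to its owners, keeping only loops with more than one owner.

    `disks` is a list of (disk_name, owner) pairs, e.g. ("7b.32", "filerA").
    """
    owners = defaultdict(set)
    for disk_name, owner in disks:
        loop = disk_name.split(".", 1)[0]   # "7b.32" -> loop/adapter "7b"
        owners[loop].add(owner)
    return {loop: who for loop, who in owners.items() if len(who) > 1}


# Hypothetical inventory: loop 0a is mixed, loop 7b is not
inventory = [
    ("0a.16", "filerA"),
    ("0a.17", "filerB"),
    ("7b.32", "filerA"),
    ("7b.33", "filerA"),
]
print(loops_with_mixed_ownership(inventory))
```

Any loop that shows up in the result is one to avoid when picking where the new shelf goes.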

Thursday, March 29, 2012

NetApp Experience: Amber LED

Quick hit: If you have an amber LED on your filer that won't turn off, try this:
1. Disable CF.
2. Halt -f both heads.
3. On reboot, enable CF.

Cool trick!

Wednesday, January 18, 2012

SOPA

Thursday, January 5, 2012

NetApp Insights: NDU Ifgrp

I'm shamelessly plagiarizing some of my colleagues because this data is just too good to not share.

Q). Has anybody converted a standalone physical interface into a single-mode vif/ifgrp non-disruptively before?

It should just be a matter of tweaking the partner interfaces and rc files at the appropriate times and doing some takeover/givebacks, but if someone has actually done it for real rather than me working out what I “think” will work in theory then that would be nice.

My fallback option is to cut and paste, or “source”, a script file that downs all the vlan interfaces, etc., then creates a singlemode ifgrp, then recreates all the vlans, aliases, etc. But rather than playing with live interfaces on an active controller, I'm also looking for other options that might sound a little bit more NDU to a cust that’s a bit hot at the moment.

A). As far as the VIF goes, I would copy the original rc file, put the new rc in place, and yes, do takeover/givebacks. So long as the networking is done correctly, the filers should boot up cleanly using the new rc file.

Tuesday, January 3, 2012

Downloads

This could accurately be filed under the category "rant."

Hey software companies: how do you earn money? When people use your products, right? Yeah. That's when you earn money.

Here's a question: why would you ever make it more difficult for people to TRY your product? Never. That would be stupid.

So what's the #1 behavior you want to encourage? Customer interaction with your product. Well, how do they interact with it? Most of the time, they have to download it. So I would imagine that creating a labyrinth of hyperlinks that would lose and confuse users would be the last thing you'd want to do. But you freakin guys do it all the time.

Case in point: Spybot Search and Destroy. You've got users who probably already have a virus. They're already cranky, they're already frustrated, they already have no patience. They're dumb end users who just want their computers to work. So experiment with me: how many clicks does it take you to go from their home page to actually downloading a product? I count 6. And that's WHILE knowing where to go.

You know where your download link should be? On your home page. Front and center. People who visit the page should be asked (via text on your homepage) to download your tool BEFORE they know what they're downloading. What on earth is more important for a homepage than getting your product in your customer's hands?