Sunday, July 31, 2011

Why more women should go into engineering

Every time I talk with a girl about engineering as a career, I hear the exact same sentiment: "but I really don't like math/I'm no good at math." Plenty of people smarter than me have puzzled over the lack of women in the sciences (engineering being applied science), and they have found other big reasons. Let me humbly add my voice to the crowd: a lot more women would be a lot more successful in this industry than they think, and here's why.

It's the industry's best-kept secret: most engineering work is 40% organization and 60% communication. 0% calculus. 0% physics. Seriously. And this isn't my idea: I realized it while reading the work of much more experienced and respected engineers.

Once you graduate from college, only a small subset of careers will deal with differentials, Bohr's law, or the properties of electromagnetic radiation. Just stick it out through school and you're golden.

Let me give you an example. Yesterday I spoke with a woman who volunteers at a food pantry, where part of her job is managing what food gets handed out. Behind the scenes, she has to figure out which food will go bad when, the relative nutritional value of the donated food, and how much space is left in the freezer, fridge, and cupboards.

That kind of scarce-resource management is virtually identical to SAN storage tiering, and it requires just as little knowledge of math and physics. What both require is organization to keep track of everything and communication to get people on board. Earn trust. Share your ideas, respect others' ideas, pick the best plan, be willing to compromise, then follow through and execute. If you can do that, no one cares that you can't do an integral.

And here's why you should go into engineering:

  • We get the best working conditions: the most perks, flexible schedules, good money, no screaming customers (usually), and none of the nasty deadlines you see in hospitals or sales. There's less pressure from your boss, plus lots of easy-to-get training, expense accounts, work travel, etc.
  • Engineering/IT culture is more relaxed. We're just easier to get along with.
  • Our managers are usually better than those in other industries. Because this is such an intangible profession, the work is impossible to reduce to an equation, so bosses tend to judge results rather than process. Typically, my bosses haven't cared at all where I am, what I'm working on, or how I'm doing it. In fact, my boss usually didn't understand what I was doing in the first place: it took me several weeks of research, and no leader can keep up with ten people's weekly gains in knowledge. Just get things done and don't break anything.
  • Engineers are in desperate demand, especially if you know a foreign language.
  • You can build your skills at no expense, with no permission and no equipment needed. You don't need a license the way a nurse does, and you don't need a degree in your new skill. Everything you want to learn is online for free, and if you understand it, that will be readily apparent to experts in the field, and they'll find a job for you.

In conclusion: we need more women contributing to the economy in engineering! For women whose stumbling block is intimidation by math and physics, I hope I've knocked that myth out. Don't quit because you hate math. Make it through college, and then come enjoy the reward. :-)

Thursday, July 28, 2011

NetApp Training Brain Dump: ONTAP 8.0 Simulator Troubleshooting

In previous posts here and here, I went through how to set up the ONTAP 8.0 simulator. Unfortunately, on one of my laptops the VM kept rebooting on its own, printing cryptic messages right before it shut down again.

I tried reinstalling the VM from a completely fresh download of the ONTAP simulator, to no avail. Here's how I finally fixed it:
  1. Use this flowchart to press Ctrl-C into the Special Boot Menu.
  2. Choose option 4 (clean configuration and initialize all disks) and enter 'y' to confirm this choice twice.
  3. When the machine finishes the complete wipe, use the same flowchart again to get to the LOADER prompt.
  4. From there, run boot_backup to boot the backup kernel.
  5. When the system comes up, you should be able to run setup successfully.
Finally, NetApp has awesomely provided trial licenses for the simulator, covering just about anything you'll want to check out (NetApp login required).

Wednesday, July 27, 2011

NetApp Experience: Hardware

I've got a few more hardware knowledge nuggets for you. First, there is a best practice for connecting shelves to a SAS PCI card, and it has to do with the card's internal architecture. The general idea is that each card contains two single points of failure, called ASICs: ports A/B (or 1/2) are paired on one ASIC, and C/D (or 3/4) on the other.


For this reason, when you connect a stack to a single SAS PCI card (which you should avoid in the first place, though it is occasionally unavoidable), use A/C as the starting ports of the paths and B/D as the return paths.


Onboard ports are paired on ASICs the same way: 0a-0b, 0c-0d, 0e-0f, etc.
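To make the pairing rule concrete, here is a minimal Python sketch that checks whether a stack's starting path and return path land on different ASICs. It assumes ONTAP-style port names (slot number plus letter, e.g. `1a` for port A of a card in slot 1, `0c` for an onboard port) and that consecutive letters share an ASIC (a/b, c/d, e/f), as described above. The function names and the sample cabling plan are hypothetical, and the check covers only the ASIC rule, not the rest of NetApp's cabling guidance.

```python
import re

# Assumption (from the post): ports are paired two per ASIC in letter order,
# so a/b share one ASIC, c/d the next, e/f the next, and so on.
PORT_PATTERN = re.compile(r"^(\d+)([a-z])$")


def asic_id(port: str) -> tuple[int, int]:
    """Return (slot, asic_number) for an ONTAP-style port name like '1a' or '0d'."""
    match = PORT_PATTERN.match(port.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized port name: {port!r}")
    slot = int(match.group(1))
    position = ord(match.group(2)) - ord("a")  # a -> 0, b -> 1, c -> 2, ...
    return slot, position // 2


def paths_on_separate_asics(start_port: str, return_port: str) -> bool:
    """True if a single ASIC failure cannot take down both paths to a stack."""
    return asic_id(start_port) != asic_id(return_port)


# Hypothetical plan for two stacks on one quad-port card in slot 1:
# starts on A and C, returns on B and D, crossed so each stack spans both ASICs.
plan = {"stack1": ("1a", "1d"), "stack2": ("1c", "1b")}
for stack, (start, ret) in plan.items():
    print(stack, start, ret, paths_on_separate_asics(start, ret))
```

Pairing `1a` with `1b` for the same stack would return `False`, since both ports sit on the first ASIC; that is exactly the single point of failure the A/C-out, B/D-back convention avoids.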

Second, I ran into a customer that had a TON of problems with a system. The problems showed up in all sorts of weird ways, leading the admins to update ONTAP and all the firmware. They finally traced the issue to a single disk, which they replaced. But the replacement disk failed, and so did the next one, so they pulled it and left the slot empty. Long story short, we swapped out the entire shelf chassis, moving the disks, ESH modules, and power supplies into the new chassis. After all this, we made the call to put an entirely new disk into the new chassis.

Although this resolved the customer's issue, one interesting note was that the ESH modules did not retain the shelf ID. It turns out that while the ESH module holds the shelf ID in volatile memory, the ID is actually stored permanently in the shelf chassis's internal circuitry and read by the ESH module at boot. Whoa!

Monday, July 25, 2011

NetApp Experience: Networking

For reasons unknown, a disproportionate share of the trouble tech people run into is network-related. I recently hit some of these issues, which forced me to take a close look at the specifics of ONTAP networking. Here are some important details:
  • Make sure the switch ports are configured for LACP if you are creating LACP interface groups (vifs) in ONTAP.
  • These are the possible link statuses:
    • Up: the link is sending and receiving data.
    • Down: the link is down but believed to be operational.
    • Broken: the link is inactive and believed to be non-operational.
Here's a problem I ran into; see if you can spot the issue. On boot, ONTAP presents these messages:
vif: Cannot create a multi-level 802.3ad compliant vif: vif1
vif: Cannot create a multi-level 802.3ad compliant vif: vif2
vif: vif1 cannot create multi level 802.3ad vif
vif: Failure adding vif1. Continuing with other interfaces
vif: vif2 cannot create multi level 802.3ad vif
vif: Failure adding vif2. Continuing with other interfaces

Here are some more hints:
rdfile /etc/rc
#Auto-generated by setup Thu Jul 14 19:22:30 GMT 2011
hostname NAME
vif create lacp vif1 -b ip e0a e0b
vif create lacp vif2 -b ip e0c e0d
vif create lacp supervif1 -b ip vif1 vif2
ifconfig supervif1 `hostname`-supervif1 mediatype auto netmask 255.255.255.0
route add default 10.18.33.1 1
routed on
options dns.domainname acme.corp.com
options dns.enable on
options nis.enable off
savecore


ifconfig -a
e0a: flags=0xa508866 mtu 1500
        ether 00:00:00:00:00:00 (auto-100tx-fd-cfg_down) flowcontrol full
        trunked vif1
e0b: flags=0xa508866 mtu 1500
        ether 00:00:00:00:00:00 (auto-unknown-cfg_down) flowcontrol full
        trunked vif1
e0c: flags=0xa508866 mtu 1500
        ether 00:00:00:00:00:00 (auto-unknown-cfg_down) flowcontrol full
        trunked vif2
e0d: flags=0xa508866 mtu 1500
        ether 00:00:00:00:00:00 (auto-unknown-cfg_down) flowcontrol full
        trunked vif2
lo: flags=0x1948049 mtu 9188
        inet 127.0.0.1 netmask-or-prefix 0xff000000 broadcast 127.0.0.1
vif1: flags=0x22408862 mtu 1500
        ether 00:00:00:00:00:00 (Disabled virtual interface)
vif2: flags=0x22408862 mtu 1500
        ether 00:00:00:00:00:00 (Disabled virtual interface)
supervif1: flags=0x2354b863 mtu 1500
        inet 10.18.33.33 netmask-or-prefix 0xffffff00 broadcast 10.18.33.255
        ether 02:a0:98:2c:01:9c (Disabled virtual interface)

Solution: you can't create multi-level multimode or LACP vifs. Essentially, if you have two load-balanced vifs, you can't build another load-balanced vif out of them. NetApp documents this as an unsupported configuration.
https://kb.netapp.com/support/index?page=content&id=3011251 (requires NetApp login)

So what to do?  Simple!  Just create one big multi or LACP vif out of the ports you wanted to use in the first place.
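One way to catch this before a reboot is to scan `/etc/rc` for the pattern. Below is a minimal Python sketch, assuming the `vif create <mode> <name> -b <policy> <interfaces...>` syntax seen in the rc file above; the helper name is hypothetical, and it only knows about the `-b` option. It flags any `multi` or `lacp` vif whose members include a vif defined earlier in the file, which is the configuration described here; it makes no judgment about other layouts.

```python
def find_multilevel_vifs(rc_text: str) -> list[str]:
    """Return names of multi/LACP vifs that are built on top of other vifs."""
    created = set()  # vif names defined so far in the file
    flagged = []
    for line in rc_text.splitlines():
        tokens = line.split()
        # Only look at lines shaped like: vif create <mode> <name> ...
        if len(tokens) < 4 or tokens[0] != "vif" or tokens[1] != "create":
            continue
        mode, name, rest = tokens[2], tokens[3], tokens[4:]
        members = []
        skip_next = False
        for tok in rest:
            if skip_next:  # value belonging to the previous flag
                skip_next = False
            elif tok == "-b":  # load-balancing policy: consumes one argument
                skip_next = True
            else:
                members.append(tok)
        # A load-balanced vif whose members are themselves vifs is multi-level.
        if mode in ("multi", "lacp") and any(m in created for m in members):
            flagged.append(name)
        created.add(name)
    return flagged


rc = """\
vif create lacp vif1 -b ip e0a e0b
vif create lacp vif2 -b ip e0c e0d
vif create lacp supervif1 -b ip vif1 vif2
"""
print(find_multilevel_vifs(rc))  # ['supervif1']
```

Run against a single vif built directly from all four ports, such as `vif create lacp vif1 -b ip e0a e0b e0c e0d`, the sketch returns an empty list.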

Wednesday, July 6, 2011

NetApp Experience: Think on your feet (Part 3)

One of the little-known intricacies of FAS systems is the 3-5 second rule. It's kind of like the rule for food that hits the floor, except instead of germs on your food, you get a panicked filer.

The 3-5 second rule is not science. It's not a best practice. It's not in a white paper. It's just experience. What the rule says is this: you can unplug a shelf cable and plug it back in, as long as you do so before 3-5 seconds pass. Now, as a computer engineer, I find the 40% ambiguity (a 2-second spread on a 5-second upper bound) frightening. But the principle is sound: if a shelf briefly loses contact with the controller or the rest of the shelves, the system will tolerate the momentary issue.

I saw this in action recently. A shelf-to-shelf cable connecting a new DS14 shelf to the existing loop had been plugged in from the A module to the B module (it should run A module to A module). The system was live, with disk autoassign turned off. We quickly unplugged the cable from the B module and plugged it into the A module. What we observed was a momentary error stating that the ESH module was blocking traffic, which quickly cleared. It did cause an amber light to turn on for half an hour; apparently the light lags significantly behind the actual state of the machine.

Nuance.