Monday, January 31, 2011

Working with IBM Scaled x460's

IBM offers the ability to stack two or more x460's together to make the resources of all the servers (RAM, CPU, etc.) available to the Primary server.  This is pretty old hardware, though, and much better solutions have been developed since.

But if you find yourself having to deal with one of these, here's some information I learned recently while troubleshooting this dinosaur of a machine:

1. IBM says the average cost to repair at this time is $2661.
2. These servers are not clustered (implies redundancy): they are scaled (implies increased performance capacity).
3. When the two servers are properly working together, the state is called “merged.”  The primary OS will show the total RAM of both boxes, minus overhead.
4. When the two servers are not merged properly, the primary OS will show less than half the total RAM of the two boxes.  This is a good test for whether they are merged or not.
5. When the secondary is merged properly, it will display a black screen to the effect of “This server is merged, please view primary server.”
6. Each server has an independent BIOS.
7. Both RSA’s should be available.  The RSA’s are also merged in some sense, but I wasn’t able to look into this very much.
8. You obviously want to avoid powering down the Secondary before the Primary: this would be akin to yanking half the sticks of RAM out of a live machine.
9. Removing the power supplies and putting them back in helps clear spurious errors and reset the machine.
10. I recommend against ever touching, looking at, or thinking about this equipment.
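
The RAM test from items 3 and 4 can be sketched in a few lines of Python. This is only a rough sketch, not an IBM tool: the function name `looks_merged` and its parameters are hypothetical, and it assumes you already know how much RAM is installed in each box and what the primary OS is reporting.

```python
def looks_merged(reported_gb, installed_gb_per_node):
    """Guess whether the scaled servers are merged from the RAM the primary OS reports.

    Hypothetical helper: a merged pair should report close to the combined
    total (minus overhead), while an unmerged primary reports less than
    half of the combined total.
    """
    total_gb = sum(installed_gb_per_node)
    # Less than half of the combined RAM means the primary only sees itself.
    return reported_gb >= total_gb / 2


print(looks_merged(62.5, [32, 32]))  # True
print(looks_merged(31.2, [32, 32]))  # False
```

The numbers above are made up for illustration; plug in whatever your boxes actually have installed.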


To shut down these servers:
1. Shutting down the Primary via the OS will automatically shut down the Secondary.  Wait until the power lights on both are blinking.
2. I have seen this hang before, at a blank grey screen.  If it does: manually power down the Primary, and then the Secondary.

To start up these servers:
1. There is a “latching” mechanism here:
  a. KVM the Secondary.
  b. Start up the Secondary first.
  c. When the Secondary says “Waiting for primary” at the blue IBM screen (1 minute or so), start up the Primary.  If you wait too long, the Secondary will give up and begin a “failure to boot” loop.
  d. The Primary will display an "initializing system memory, please wait" screen for several minutes.
  e. The Primary will then display “Initializing PCI devices.”
  f. The Primary will then display “Searching for Secondary server.”
    i. If it is not able to find the Secondary server, the Primary may automatically shut down.  If you start it again and it is still unable to find the Secondary, it will boot to the OS unmerged.
    ii. If you are attempting to get into the Primary’s BIOS, press F1 shortly after the Primary displays that it was unable to merge.  It will acknowledge your input and boot to setup.
    iii. If it is able to find the Secondary, it will indicate its merging attempt was successful and then boot properly.

Nehalem Performance Optimization

Found a really well put together IBM document on Nehalem performance configuration*; highlights and link below:

RAM Configuration
1. Identical configuration for each memory channel (3 channels per Proc), and same speed RAM across the board. Within a memory channel you can mix sizes, but each memory channel must have an identical configuration. For best performance, each channel would have a single DIMM.


2. If a dual Proc machine only has RAM in one bank, there is a significant performance hit for the second Proc to access the “Remote RAM.”


3. The optimal configurations for dual Proc Nehalem servers are
a. 6GB (6x1GB)
b. 12GB (6x2GB)
c. 18GB (6x2GB, 6x1GB)
d. 24GB (6x4GB or 12x2GB)
e. 48GB (6x8GB or 12x4GB)
f. 72GB (6x4GB, 6x8GB)
g. 96GB (6x16GB or 12x8GB)


4. For single Proc Database servers, the best configs would be
a. 3GB (3x1GB sticks)
b. 6GB (3x2GB sticks)
c. 9GB (3x2GB, 3x1GB sticks)
d. 12GB (3x4GB sticks)
e. 24GB (6x4GB sticks)
f. 36GB (3x8GB, 3x4GB sticks)
g. 48GB (6x8GB sticks)


5. In general, avoid populating DIMM slots 1 (the first slot), 4, 9, or 12. Doing so unbalances the memory channels and decreases performance, so it is preferable to increase the DIMM size across the board rather than add more sticks.


6. Populate the furthest slots first, in this order: (3, 6, 8) and then (2, 5, 7)


7. Always use dual rank memory if available (e.g. 2Rx4, 2Rx8, etc).
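
The first rule above (every memory channel gets an identical DIMM configuration) is easy to sanity check in code. Here is a minimal Python sketch, assuming you describe each channel as a list of DIMM sizes in GB. The function names are hypothetical and nothing here comes from the IBM document itself.

```python
def is_balanced(channels):
    """Return True if every memory channel holds the same set of DIMM sizes."""
    layouts = {tuple(sorted(dimms)) for dimms in channels}
    return len(layouts) == 1


def total_gb(channels):
    """Total installed RAM in GB across all channels."""
    return sum(sum(dimms) for dimms in channels)


# Dual Proc box: 2 Procs x 3 channels = 6 channels.
config_18gb = [[2, 1]] * 6                 # the 18GB (6x2GB, 6x1GB) layout
lopsided = [[4], [4], [4], [4], [4], []]   # one channel left empty

print(total_gb(config_18gb), is_balanced(config_18gb))  # 18 True
print(total_gb(lopsided), is_balanced(lopsided))        # 20 False
```

This only checks the "identical channels" rule; it knows nothing about slot numbering or rank, which still need to be checked by eye.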


BIOS Settings (more analysis to come):
Setting                 Maximum Performance Setting per IBM
Memory Speed            Auto
Memory Channel Mode     Independent
Socket Interleaving     NUMA
Patrol Scrubbing        Disabled
Demand Scrubbing        Enabled
C-States                Enabled
Turbo Mode              Enabled
Thermal Mode            Performance
Hyper Threading         Dependent upon App




*This is for x3650 M2/x3550 M2&M3/dx360 M2. HS22’s have other requirements.

Optimizing Nehalem Performance (Dead link?)

Friday, January 28, 2011

IBM ASU

In my research into the uEFI settings on IBM's Nehalem offerings (info soon), I ran into a bit of a gem: IBM Advanced Settings Utility. It's a command line utility that lets you script changes to BIOS/RSA settings via the RSA. Trust me, IBM's not paying me for this publicity to my legions of readers, but I'm a pretty big fan of this thing so far. I sifted through IBM's typically lacking documentation* and translated it into usable English for you.



  1. ASU can edit select settings on the uEFI/BIOS and IMM/RSA. The options are surprisingly robust on an x3650 M2; I'm not sure about RSA's or RSA2's yet.
  2. Some settings take effect without a reboot. I didn't have time to dig into this.
  3. Search IBM.com for ASU to download it; their links change too frequently to post a URL here. The architecture (64 vs 32 bit) of the exe refers to the server you are running this tool on, as opposed to the target server being configured. These two are not always the same thing, as it works over the network too.
  4. Double clicking the downloaded exe extracts ASU.exe and supporting files.
  5. Two methods of configuring a server using ASU:
    1. Run on local server. It will connect to the IMM via OS integrated drivers: "USB in-band interface." Basically a virtual NIC in Windows.
      1. CMD: Asu.exe batch c:\admin\uEFI.log
    2. Run on a hop server. It will connect to the IMM over the network at the IP you specify (!!!!).
      1. CMD: Asu.exe batch c:\admin\uEFI.log options --host
      2. e.g. Asu.exe batch c:\admin\uEFI.log options --host 10.10.10.10
      3. If you go over ethernet, you will need to supply credentials. Do so using the following syntax:
         CMD: Asu.exe batch c:\admin\uEFI.log options --host --user --password
         e.g. Asu.exe batch c:\admin\uEFI.log options --host 10.10.10.10 --user batman --password robin
  6. Developing your batch file
    1. Cmd: Asu.exe Show > c:\admin\uEFI.log
    2. Edit
      1. Remove top lines
      2. Syntax: set "setting value"
        1. e.g. set IMM.IMMInfo_Location "1234 Batcave Drive"
        2. Use this line to see your options:
           asu.exe showvalues C:\admin\Values.log
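
Writing the batch file by hand gets tedious past a handful of settings. Here is a minimal Python sketch that turns a dictionary into `set` lines in the syntax shown above. The helper name `build_batch_lines` is hypothetical, and the rule of quoting only values that contain spaces is an assumption based on the one example in this post, not something I have confirmed in IBM's documentation.

```python
def build_batch_lines(settings):
    """Turn {setting_name: value} into ASU batch 'set' lines."""
    lines = []
    for name, value in settings.items():
        value = str(value)
        # Assumption: values with spaces need double quotes, as in the example above.
        if " " in value:
            value = f'"{value}"'
        lines.append(f"set {name} {value}")
    return lines


batch = build_batch_lines({"IMM.IMMInfo_Location": "1234 Batcave Drive"})
print("\n".join(batch))
# set IMM.IMMInfo_Location "1234 Batcave Drive"
```

Save the output to a file and hand that file to Asu.exe batch as in the commands above.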


You may need to enable the "Allow commands on USB interface" setting if your script connects to the IMM but then fails to execute properly.
"Note: The ASU works with a disabled USB in-band interface if an IPMI device driver is installed."

ASU Guide

Notes on the "Allow commands on USB interface" option.


*Most tech writers are far too focused on being complete and accurate to remember that documentation is supposed to be helpful. The efficiency of the reader seems to rarely be taken into consideration, so most of it ends up looking like a Yahoo search: tons of data you don't need and one thing you do need, written in a way that will make perfect sense once you already know it. And that's why I use Google :-)