IT engineering and a little bit of hacking: March 2017

Thursday, March 30, 2017

Rancher Setup Basics

A client asked recently how SolidFire can integrate with Rancher. I had a few RHEL servers available, so I'm going to set up Rancher on RHEL. Here are the first steps:

Install a supported version of Docker (check that your Docker, K8s, and Rancher versions are compatible with each other):

curl https://releases.rancher.com/install-docker/1.12.sh | sh

sudo service docker start

sudo docker run -d --restart=unless-stopped -p 8080:8080 rancher/server

Alright, let's pause here. What did we just do? First, we installed Docker. Docker is the software that lets you easily download, create, run, and manage containers. Next, we made sure the Docker service was running. Last, we downloaded and started a container that runs the Rancher software. At this point, you should be able to reach Rancher's GUI on port 8080 of the server.

So let's get K8s and Trident up! First we need a place to deploy K8s. Click Infrastructure | Hosts.

Click Add Host.

And then click Save.

Then enter the IP address of the server that will function as a host for containers. Follow the instructions to copy-paste the command into a console on your new host server.

Done!

https://docs.rancher.com/rancher/v1.5/en/installing-rancher/installing-server/#single-container

Why Rancher: http://rancher.com/beyond-kubernetes/

Tuesday, March 28, 2017

Understanding SolidFire Capacity

To calculate the effective capacity available, follow this formula:

a. Error Threshold (#3) minus Used Capacity (#1) = Available physical space

In the example below, 86.24TB - 48.41TB = 37.83TB

b. Multiply Dedupe Ratio * Compression Ratio, then divide by 2 (for Double Helix)

In the example below, (1.81 * 1.79)/2 = 1.62:1

c. Multiply the results of (a) and (b)

37.83TB * 1.62 = 61.28TB

*This calculation assumes the current rates of dedupe and compression will continue.
*This calculates capacity until the Error Threshold is reached, not the Total Capacity.
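
The formula above can be written as a short function. This is only a sketch of the arithmetic in this post, not a SolidFire or ActiveIQ API; the function and parameter names are made up for illustration, and the inputs are the lab values from the example. With those values it comes out at about 61.3 TB.

```python
def effective_capacity_tb(error_threshold_tb, used_tb, dedupe_ratio, compression_ratio):
    """Estimate how much more data fits before the Error Threshold is reached."""
    # Step 1: physical space left before the Error Threshold.
    available_physical_tb = error_threshold_tb - used_tb
    # Step 2: efficiency ratio, halved because Double Helix keeps two copies of the data.
    efficiency = (dedupe_ratio * compression_ratio) / 2
    # Step 3: effective capacity still available at the current efficiency.
    return available_physical_tb * efficiency


if __name__ == "__main__":
    estimate = effective_capacity_tb(86.24, 48.41, 1.81, 1.79)
    print(f"Effective capacity remaining: {estimate:.2f} TB")
```

The same caveats apply: the result is only as good as the assumption that the dedupe and compression ratios hold steady.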

From ActiveIQ, our cloud monitoring tool:

1. Used Capacity. This is the capacity physically taken up on disk by data. It is the end result after dedupe, compression, and Double Helix have been applied.
2. Warning Threshold. This is an adjustable threshold that alerts you when you’re approaching the Error Threshold.
3. Error Threshold. This is the point after which the system cannot rebuild the second copy of data after a node loss. This is calculated by subtracting one node’s physical block capacity from the Total Capacity.
4. Total Capacity. This is the raw physical space on disk. In this example, 1.92TB * 9 SSDs * 5 nodes = 86.42TB.

From the SolidFire GUI:

5. Block Remaining. This is calculated by subtracting the Used Capacity (#1) from Total Capacity (#4).

6. Block Capacity until Warning. This is calculated by subtracting the Used Capacity (#1) from Warning Threshold (#2).
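
The two GUI figures are simple subtractions from the ActiveIQ numbers, which a small helper makes explicit. This is a sketch with hypothetical names, not anything the GUI exposes. The used, error, and total values are the ones quoted in this post; the 80.0TB warning threshold is a made-up value, since the warning threshold is adjustable and not given here.

```python
def capacity_summary(used_tb, warning_tb, error_tb, total_tb):
    """Derive the remaining-space figures from the four reported values."""
    return {
        "block_remaining_tb": total_tb - used_tb,   # item 5
        "until_warning_tb": warning_tb - used_tb,   # item 6
        "until_error_tb": error_tb - used_tb,       # available physical space
    }


if __name__ == "__main__":
    # 80.0 is an assumed warning threshold, used only for illustration.
    summary = capacity_summary(used_tb=48.41, warning_tb=80.0,
                               error_tb=86.24, total_tb=86.42)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```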

***Bonus***

7. In ActiveIQ, under Reporting | Cluster Efficiency, hover over the graph to view the dedupe and compression ratios. In this lab system, 1.81 * 1.79 = 3.24:1

Setting up PowerShell for SolidFire

A simple guide to setting up PowerShell for SolidFire.

Enable PowerShell
Download SolidFire PowerShell toolkit
Unzip the toolkit
Navigate to PowerShell-master\PowerShell-master\Install\ and run SolidFire_PowerShell_1_3_1_4-install.msi
Done!

https://www.youtube.com/watch?v=nvnpzP4-T5I

Tuesday, March 7, 2017

Trident in Action

Some screenshots of NetApp's dynamic storage provisioner for K8s! In this case, we're using OpenShift on SolidFire.

Saturday, March 4, 2017

OpenShift, Docker, and Elasticsearch

***This is part of an ongoing series I call "Mode 1 Storage Guy goes to a Mode 2 World." I'm not an expert (yet), YMMV.***

There are already many good setup docs for Elasticsearch on OpenShift. So what I'm going to do is flesh out the concepts so those instructions make more sense.

OpenShift

OpenShift has a concept of a project. This is how it provides multitenancy.

Next is the namespace, which is the Kubernetes concept underneath: an OpenShift project is a Kubernetes namespace with some extra annotations.

Next is the image. An image is a pre-packed container probably with an application installed, like MongoDB or JBOSS. You can create new images by installing an application into a container and saving that container. The image concept is analogous to a VM Template: you keep it updated and deploy fresh containers from it.

A stream seems to be a set of evolving images: for example, when you download the latest CentOS, you're not asking for a specific version, just the latest version. The stream is the set of successive images that you get the latest from.

Application. This is the normal definition of an application. However, in the context of containers, it's good to think about the application as separate from the container it runs in.

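The catalog is a view of the images held in the registry.

The registry is merely what OSE (OpenShift Enterprise) calls the collection of images, held in a folder.

A Pod is the OSE equivalent of a container. Strictly speaking, it is a group of one or more containers that are deployed together and share networking and storage.

A deployment is the mechanism that manages where pods run, how many there are, and how they are replicated for a given application.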

Persistent Volume Claim (PVC): In OpenShift, this is a request for storage. It can sit unfulfilled.

Persistent Volume (PV): This is the cluster's record of a real LUN/export. When a PVC is matched (bound) to a PV, the request is fulfilled.

Don't panic if you see the word "cartridge." That's just British for container.
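
The relationship between a storage request (PVC) and the real storage behind it can be sketched as a small matching problem. This is a toy in-memory model, not the OpenShift or Kubernetes API: the function name, the claim and volume names, and the "smallest volume that is big enough" rule are all simplifying assumptions. It does show how a claim can sit unfulfilled.

```python
def bind_claims(claims, volumes):
    """Match each claim to the smallest free volume that is large enough.

    claims:  dict of claim name -> requested size in GiB
    volumes: dict of volume name -> size in GiB
    Returns (bound, pending): bound maps claim -> volume name,
    pending lists the claims that could not be matched.
    """
    free = dict(volumes)  # copy so the caller's dict is not modified
    bound, pending = {}, []
    for claim, requested in claims.items():
        candidates = [(size, name) for name, size in free.items() if size >= requested]
        if candidates:
            _, chosen = min(candidates)  # smallest volume that fits
            bound[claim] = chosen
            del free[chosen]             # a volume serves only one claim
        else:
            pending.append(claim)        # the request stays unfulfilled
    return bound, pending


if __name__ == "__main__":
    claims = {"es-data": 50, "es-logs": 500}   # hypothetical claims
    volumes = {"lun-a": 100, "lun-b": 20}      # hypothetical LUNs
    bound, pending = bind_claims(claims, volumes)
    print("bound:", bound)
    print("pending:", pending)
```

Here the 50GiB claim is matched to the 100GiB LUN, and the 500GiB claim waits until someone (or a dynamic provisioner) supplies a big enough volume.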

Some notes:
OpenShift's GUI is very foreign. Take some time getting used to it.
Unless you're an old Linux admin, you're going to need to take the Linux learning curve seriously. You'll really need to brush up on vi, ls, cat, curl, and wget.
The & symbol is your friend. For any command that appears hung (but isn't hung, it's actually running) like "docker run ", throw an '&' at the end and I betcha it'll give you your prompt back.
I had no luck connecting to the OpenShift GUI via Chrome, but Firefox worked fine (remember, https and 8443).
When deploying images from Docker into OpenShift, don't underestimate the Pull Secrets. OpenShift has to be allowed to pull the images from Docker. This mechanism is intended to prevent unauthorized access to Docker images.

Declarative State Storage

***This is part of an ongoing series I call "Mode 1 Storage Guy goes to a Mode 2 World." I'm not an expert (yet), YMMV.***

Anyone working on containers will have heard the term "declarative state" with regard to the number and type of containers. Basically, it just means you have a platform that ensures you always have x number of containers. In other words, whenever a container dies/fails for any reason, the platform will recreate one in its place. You're declaring how you want it to be from now on.

Say you tell Kubernetes "I want 100 Apache webserver containers" on a cluster of 10 hardware servers. It will spread 10 containers onto each server. If you lose 2 servers, Kubernetes will re-create those 20 lost containers and spread them across the 8 remaining.*
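
The 100-containers-on-10-servers example can be simulated in a few lines. This is a toy model of the declarative idea only: it is not how the Kubernetes scheduler is implemented, and the function name, server names, and least-loaded placement rule are assumptions made for illustration.

```python
from collections import Counter


def reconcile(placements, healthy_servers, desired_count):
    """Return placements with exactly desired_count containers on healthy servers.

    placements is a list with one entry (the server name) per running container.
    """
    if not healthy_servers:
        raise ValueError("no healthy servers to place containers on")
    healthy = set(healthy_servers)
    # Containers on dead servers are gone; trim any surplus beyond the declared count.
    running = [server for server in placements if server in healthy][:desired_count]
    load = Counter({server: 0 for server in healthy_servers})
    load.update(running)
    # Create the missing containers one at a time on the least-loaded server.
    while len(running) < desired_count:
        target = min(healthy_servers, key=lambda server: load[server])
        running.append(target)
        load[target] += 1
    return running


if __name__ == "__main__":
    servers = [f"server-{i}" for i in range(10)]
    placements = reconcile([], servers, 100)            # initial rollout
    survivors = servers[2:]                             # two servers are lost
    placements = reconcile(placements, survivors, 100)  # platform repairs the gap
    print(sorted(Counter(placements).items()))
```

The caller never says "start 20 containers." It only repeats the declaration (100 containers), and the reconcile step works out what is missing. After the failure the 8 survivors end up with 12 or 13 containers each.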

This is very different from a VMware-type mindset. In VMware you can say "deploy 100 VMs off this template," but VMware doesn't watch the VMs, count them, and restart them if they go down. This is "imperative code," meaning "do what I tell you now," following a single order.

The existence of declarative state means storage has to change. Previously, the storage layer just created 10 LUNs, presented them to the right servers, and was done. If a server died, storage didn't automatically present those LUNs to a new server, or delete them. It sat, static, until someone came and manually fixed it.

Luckily, in Kubernetes 1.5 there is a new concept of StatefulSets. StatefulSets are all about enabling stability (i.e. persistence), and when combined with NetApp's powerful Trident connector it means you can create Declarative State Storage. Which is exciting, because as far as I know I coined the phrase :-)

Declarative state storage means when a container is deleted, its corresponding volume is deleted. When a container dies and is recreated, its corresponding volume is connected automatically. When an application is scaled from 10 to 100 containers, the volumes are provisioned automatically.
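
Those three behaviors (provision on scale-up, reconnect on re-creation, delete on removal) can be sketched as one reconcile step over volumes. This is a toy in-memory model, not Trident's or Kubernetes' API; the class, the container names, and the "vol-" naming scheme are all hypothetical.

```python
class VolumeReconciler:
    """Toy model: keeps exactly one volume per declared container."""

    def __init__(self):
        self.volumes = {}      # container name -> volume name
        self.attached = set()  # containers whose volume is currently connected

    def reconcile(self, declared, running):
        declared, running = set(declared), set(running)
        # Provision a volume for every declared container that lacks one.
        for name in sorted(declared - set(self.volumes)):
            self.volumes[name] = f"vol-{name}"
        # Delete the volume of any container that is no longer declared.
        for name in sorted(set(self.volumes) - declared):
            del self.volumes[name]
        # Connect volumes only to containers that are declared and running.
        self.attached = declared & running


if __name__ == "__main__":
    storage = VolumeReconciler()
    ten = [f"web-{i}" for i in range(10)]
    storage.reconcile(declared=ten, running=ten)
    # web-3 dies: its volume is kept, just not connected.
    storage.reconcile(declared=ten, running=[n for n in ten if n != "web-3"])
    print("web-3" in storage.volumes, "web-3" in storage.attached)
    # web-3 is recreated: the same volume is connected again.
    storage.reconcile(declared=ten, running=ten)
    # Scale from 10 to 100: the extra volumes are provisioned.
    hundred = [f"web-{i}" for i in range(100)]
    storage.reconcile(declared=hundred, running=hundred)
    print(len(storage.volumes))
```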

More to come!

*In reality, Kubernetes does not keep a standby copy of each container. The replacements are brand-new containers started from the same image, so any data that was not on persistent storage is gone. That is exactly why the storage layer matters here.