Sunday, February 18, 2018

R Syntax Explained

Aggregate: This applies a function (like mean) to each subset of a data set, split up however you need.  For example, if you have a table of car sales details and prices (called mydata) and you want to know average sale prices, you can't just average the entire price column: you need the average price for each type of car.  Let's say your table has these columns: make, model, and price.

Aggregate takes a few inputs.  The first is the data you care about: in this case, the table mydata, but specifically the price column.  The second is a list of the columns that define your subsets.  For example, we want every row that matches "Ford" and "F150" grouped together so we can average their price.  So our second input is the categories we want to break the data out into: in this case, every unique combination of make and model.  The last input is the function we want to apply to each subset: mean, median, etc.

result <- aggregate(mydata$PRICE, by = list(mydata$MAKE, mydata$MODEL), FUN = mean)
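To make the grouping concrete, here is a rough plain-Python analogue of what that aggregate call computes. It is not R and not how aggregate is implemented; the sample rows and the aggregate_mean helper are made up for illustration, and the column names just mirror MAKE, MODEL, and PRICE from the command above.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows standing in for mydata (hypothetical values).
mydata = [
    {"MAKE": "Ford", "MODEL": "F150", "PRICE": 30000},
    {"MAKE": "Ford", "MODEL": "F150", "PRICE": 34000},
    {"MAKE": "Ford", "MODEL": "Focus", "PRICE": 18000},
    {"MAKE": "Honda", "MODEL": "Civic", "PRICE": 21000},
]


def aggregate_mean(rows, value_col, by_cols):
    """Hypothetical helper: average value_col within each unique by_cols combination."""
    groups = defaultdict(list)
    for row in rows:
        # Each unique (MAKE, MODEL) pair becomes one group key.
        key = tuple(row[col] for col in by_cols)
        groups[key].append(row[value_col])
    # Apply the function (here, mean) to each group's list of prices.
    return {key: mean(values) for key, values in groups.items()}


result = aggregate_mean(mydata, "PRICE", ["MAKE", "MODEL"])
for (make, model), avg_price in result.items():
    print(make, model, avg_price)
```

The two F150 rows collapse into a single group averaging 32000, giving one result per make and model combination, which is the same shape of answer aggregate returns.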

Filter and Select:  One of my favorite combinations (both come from the dplyr package). 
Filter takes two inputs: your dataset, and how you'd like to subset it.  So first input is our table mydata, easy enough.  Second input is a test: we give it the column Make, and test the values to see if they equal (==) Ford.  If the row's Make column contains Ford, filter will keep that row.  Otherwise, it's tossed. 

Select is then given the result from filter.  Filter has snagged every Ford row from our original table (mydata) and kept every column.  In other words, mydata started with columns make, model, and price, and filter's output has all those columns too. 

Select takes two inputs: one is your data set (here, the filtered Ford rows), the other is the column(s) you want to keep.  In this case, we want to keep the price column.  So this command eliminates every row that isn't a Ford sale and gives you a 1-column table of the prices of those Fords. 

In other words, the command below answers the question "give me just the prices from every Ford sale in the table."

library(dplyr)
select(filter(mydata, Make == 'Ford'), Price)
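Here is the same filter-then-select idea as a plain-Python sketch. It is an analogue for illustration, not dplyr itself; the rows are hypothetical and use the Make and Price column names from the command above.

```python
# Made-up sample rows standing in for mydata (hypothetical values).
mydata = [
    {"Make": "Ford", "Model": "F150", "Price": 30000},
    {"Make": "Honda", "Model": "Civic", "Price": 21000},
    {"Make": "Ford", "Model": "Focus", "Price": 18000},
]


def filter_rows(rows, keep):
    """Keep only the rows for which keep(row) is True (the role filter plays)."""
    return [row for row in rows if keep(row)]


def select_cols(rows, *cols):
    """Keep only the named columns in every row (the role select plays)."""
    return [{col: row[col] for col in cols} for row in rows]


# The inner call runs first: drop non-Ford rows, then keep just the Price column.
ford_prices = select_cols(filter_rows(mydata, lambda row: row["Make"] == "Ford"), "Price")
print(ford_prices)
```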

Native data frame manipulation: Sometimes you don't need commands like filter or aggregate to get the subset of data you want.  Let's say you have a 1-column table (let's call it car_returns) of the prices of all the cars that customers brought back and had to be refunded.  How would you identify the make and model of the returned cars just from the price?  So let's say the question is "give me all the rows (including make, model, and price) from my original table (mydata) that match these prices."

In general, you can subset a data frame by putting square brackets after its name: mydata[rows, columns].  Whatever goes before the comma picks rows, and whatever goes after it picks columns.  For the rows, we need to say which column in mydata corresponds to the values in car_returns: obviously, price.  We aren't comparing mydata's price column to a single price, though; we need to compare it to every price in car_returns.  So we use %in% to say "we want all the rows from mydata where PRICE equals one of the values from car_returns."  Because car_returns is a 1-column table rather than a plain vector, car_returns[[1]] pulls that column out as a vector for %in% to check against.

mydata[mydata$PRICE %in% car_returns[[1]], ]

The other thing you'll notice is the comma at the end with nothing after it.  Leaving the column slot empty means "keep every column," so we get make, model, and price for each returned car.  If we just wanted the Make of those cars, we would name that column after the comma:

mydata[mydata$PRICE %in% car_returns[[1]], "MAKE"]
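A plain-Python sketch of the same %in% lookup shows what the row selector is doing: build the set of returned prices once, then keep each row whose price is in it. The rows and return prices below are hypothetical.

```python
# Made-up sample rows standing in for mydata (hypothetical values).
mydata = [
    {"MAKE": "Ford", "MODEL": "F150", "PRICE": 34000},
    {"MAKE": "Honda", "MODEL": "Civic", "PRICE": 21000},
    {"MAKE": "Ford", "MODEL": "Focus", "PRICE": 18000},
]
# Hypothetical refunded prices standing in for the single column of car_returns.
car_returns = [34000, 21000]

# A set makes each "is this price one of the returned ones?" check fast.
returned_prices = set(car_returns)

# Like the %in% row selector with an empty column slot: every column of matching rows.
returned_rows = [row for row in mydata if row["PRICE"] in returned_prices]

# Like naming MAKE after the comma: just that one column.
returned_makes = [row["MAKE"] for row in returned_rows]
print(returned_makes)
```

One caveat applies to the R version too: matching on price alone also picks up any car that merely shares a price with a returned one, so a unique ID column is a safer key if you have one.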

Tuesday, January 23, 2018

Dahua IP Cam Setup

  1. Power: this camera doesn't come with a power supply.  Seriously.
    1. Use either PoE or the 12V input.  Do not do both!
    2. If your camera is outside, use PoE.  Look for something like this to boost your power: Single Gigabit Port PoE+ Injector – 30W – 802.3at 
    3. If you're going to use 12V, something like this works:
      1. BV-Tech DC12V 1A UL-Listed Switching Power Supply Adapter for CCTV - 5 Pack - Black
  2. Connecting to your camera
    1. Connect the camera via ethernet to your router/switch
    2. Internet Explorer works OK but has issues.  Use Chrome; it will prompt you to download and use a specific app.
    3. There appears to be no default IP; the camera gets its address via DHCP.  So log into your router and see which IPs are connected to it.  Mine always landed at or .15.
    4. Open a browser to http://
    5. Default username/password is admin/admin

  3. Basic Setup
    1. Change the password:
      1. System | Account | click the pencil under "modify" | check "modify password"
    2. Connect to wifi:
      1. Network | wifi | check "enable" | double-click the correct network and enter the password
    3. Set the wifi IP address:
      1. Network | TCP/IP | At "ethernet card" click the dropdown and select "Wireless" | click the "static" button | enter an IP address that you would like this camera to live on permanently | Click Save
      2. Set a new name for your device, something like "garage" or "front door" | click Save
      3. Open the browser to your new IP address
      4. Unplug the ethernet cable
    4. Upgrade to the latest firmware (very important if you don't want to get hacked)
      1. Find your device's latest firmware at
      2. Unzip the package
      3. My firmware file was called DH_IPC-ACK-Themis_EngSpnFrn_N_V2.400.0000.15.R.20170804.bin
      4. System | Upgrade | Browse | Select the firmware file
      5. Click Upgrade
    5. Set System Time
      1. System | General | Date&Time | Set your GMT time | Save
      2. Set DST here if you want.
  4. Advanced Setup
    1. Enable HTTPS (this is really important: it encrypts your connection to your camera)
      1. Network | Create | fill in all the boxes | change duration to 5000 days | click Save
      2. Click Install | Click Download | Click Save
      3. Check the box for "enable HTTPS" | Click Save
    2. I set the frame rate from 30 to 10 and enabled smart codec.

Monday, December 18, 2017

Bandcamp, Music and Android

Here's a quick walkthrough on how to download music from Bandcamp onto your Android phone.  First, you'll need an unzip app; Easy Unrar is free.  Find it in the Google Play Store and install it.

I'll throw red dots on these screenshots to make it easy for you to follow along.

Next, get your download code and click on the link - for example,  Input your code and click next.

Now click "here's how"

Click the second "Here's how", then select your file type (I used MP3 220, but I'm not a music expert so YMMV).   Then click download.

Great!  Your music is downloaded.  Now we need to unzip it.  Open Easy Unrar and click the "Download" folder.

If you've had your phone a while, it might be tough to wade through all the downloads and find your album.  So click on the sort button on the upper right.

Choose sort by file size, large to small.

Since your album is probably more than 100MB, it should be near the top.  Here you can see Hope Hymns, the album I want.  Check the box and click Extract.

Check this box and click extract as well.

OK, good news and bad news.  Bad news is you're not done; good news is you're almost done.  Now we need Google Play Music to rescan and discover your unzipped album files.  Open up the Settings app and click on Applications.

Now find and click Google Play Music

Almost done!  Click Storage

!Important!  DO NOT click "Clear Data."  DO click "Clear Cache"

Now reboot your phone, open up Google Play Music, and you're done!  Enjoy.

Thursday, September 21, 2017


This is just a running list of useful things about Pandas as I learn.

If you have data coming in from a .csv, use this: df=pd.read_csv('file.csv')

If your dataframe has strings and you want them to be numbers, use this: df.column = pd.to_numeric(df.column, errors='coerce')

If you have date/time in Unix epoch format (seconds since 1970-01-01), you can convert using this: df['date'] = pd.to_datetime(df['date'], unit='s')
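A small self-contained run of the two conversions above, on a made-up DataFrame (the price and date column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical data: prices stored as strings, dates as Unix epoch seconds.
df = pd.DataFrame({
    "price": ["100", "250", "n/a"],
    "date": [0, 86400, 1500000000],
})

# errors='coerce' turns anything unparseable ("n/a") into NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# unit='s' interprets the integers as seconds since 1970-01-01.
df["date"] = pd.to_datetime(df["date"], unit="s")

print(df.dtypes)
print(df)
```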

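If you want to select columns by position: df.iloc[:, :2] (older code uses df.ix[:,:2], but .ix was deprecated and then removed in pandas 1.0; use .iloc for positions and .loc for labels)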

df.describe() gives you count, mean, std, min, max, and the 25th, 50th, and 75th percentiles for each numeric column


Filter rows by a column's value: df[df.column == value]

Grab specific columns by name: df1 = df[['a','b']]

data.iloc[:, 0:2] # first two columns of data frame with all rows

This squares every value in the columns at positions 1 through 6, skipping the first column (with numpy imported): df.iloc[:,1:7]=df.iloc[:,1:7].apply(numpy.square, axis=0)
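Here is a made-up 7-column DataFrame exercising the column-selection tips and the squaring trick above. The column names id and x1 through x6 are hypothetical; the first column holds labels, so the squaring leaves it alone.

```python
import numpy
import pandas as pd

# Hypothetical frame: one label column followed by six numeric columns x1..x6.
df = pd.DataFrame({"id": ["a", "b"]})
for i in range(1, 7):
    df[f"x{i}"] = [i, i + 1]

first_two = df.iloc[:, 0:2]  # all rows, first two columns by position
by_name = df[["id", "x1"]]   # the same two columns, picked by label

# Square the columns at positions 1-6 (x1..x6), leaving "id" untouched.
df.iloc[:, 1:7] = df.iloc[:, 1:7].apply(numpy.square, axis=0)
print(df)
```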

Great link on managing Jupyter:

Great guide to pandas:


Thursday, March 30, 2017

Rancher Setup Basics

A client asked recently how SolidFire can integrate with Rancher.  I had a few RHEL servers available, so I'm going to set up Rancher on RHEL.  Here are the first steps:

Install a supported version of Docker (make sure your Docker, K8s, and Rancher versions are compatible): curl | sh

sudo service docker start

sudo docker run -d --restart=unless-stopped -p 8080:8080 rancher/server

Alright, let's pause here.  What did we just do?  First, we installed Docker.  Docker is the software that lets you easily download, create, run, and manage containers.  Next, we made sure the Docker service was running.  Last, we downloaded and started a container that runs the Rancher server.  At this point, you should be able to reach Rancher's GUI at :8080.

So let's get K8s and Trident up!  First we need a place to deploy K8s.  Click Infrastructure | Hosts.

Click add host.

And then save

Then enter the IP address of the server that will function as a host for containers.  Follow the instructions to copy-paste the command into a console on your new host server.


Why Rancher: 

Tuesday, March 28, 2017

Understanding SolidFire Capacity

To calculate the effective capacity available, follow this formula:

  • Error Threshold (#3) minus Used Capacity (#1) = Available physical space
    • In the example below, 86.24TB - 48.41TB = 37.83TB
  • Multiply the Dedupe Ratio by the Compression Ratio, then divide by 2 (for Double Helix)
    • In the example below, (1.81 * 1.79)/2 = 1.62:1
  • Multiply the results of the first two steps
    • 37.83TB * 1.62 = 61.28TB

*This calculation assumes the current rate of dedupe and compression will continue
*This calculates capacity until the Error Threshold is reached, not the Total Capacity.
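The three steps above fit in a small Python function, run here on the example numbers from this post. The function name and parameters are made up for illustration; it only encodes the formula, so the same assumptions apply (current dedupe and compression rates continue, and capacity is measured up to the Error Threshold).

```python
def effective_capacity_tb(error_threshold_tb, used_tb, dedupe_ratio, compression_ratio):
    """Hypothetical helper: estimated effective capacity left before the Error Threshold."""
    # Step 1: physical space left before hitting the Error Threshold.
    available_physical_tb = error_threshold_tb - used_tb
    # Step 2: combined efficiency, halved because Double Helix keeps two copies.
    efficiency = (dedupe_ratio * compression_ratio) / 2
    # Step 3: scale the physical space by that efficiency.
    return available_physical_tb * efficiency


effective = effective_capacity_tb(86.24, 48.41, 1.81, 1.79)
print(f"{effective:.2f}TB")  # prints 61.28TB
```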

From ActiveIQ, our cloud monitoring tool:

  1. Used Capacity.  This is the capacity physically taken up on disk by data.  After dedupe, compression, and double helix occur, this number is the end result.
  2. Warning Threshold.  This is an adjustable threshold that alerts you when you're approaching the Error Threshold.
  3. Error Threshold.  This is the point after which the system cannot rebuild the second copy of data after a node loss.  This is calculated by subtracting one node’s physical block capacity from the Total Capacity.
  4. Total Capacity.  This is the raw physical space on disk.  In this example, 1.92TB * 9 SSDs * 5 nodes = 86.4TB.

From the SolidFire GUI:
  5. Block Remaining.  This is calculated by subtracting the Used Capacity (#1) from the Total Capacity (#4).
  6. Block Capacity until Warning.  This is calculated by subtracting the Used Capacity (#1) from the Warning Threshold (#2).
  7. In ActiveIQ, under Reporting | Cluster Efficiency, hover over the graph to view the dedupe and compression ratios.  In this lab system, 1.81 * 1.79 = 3.24:1
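The capacity definitions above reduce to a few multiplications and subtractions. This sketch encodes them in Python, assuming every node has the same number and size of drives (so that "one node's capacity" is a single number); the function names are hypothetical, and the drive counts are the ones from the Total Capacity example.

```python
def total_capacity_tb(drive_tb, drives_per_node, nodes):
    """Raw physical space: drive size x drives per node x node count."""
    return drive_tb * drives_per_node * nodes


def error_threshold_tb(drive_tb, drives_per_node, nodes):
    """Total Capacity minus one node's raw capacity (assumes identical nodes)."""
    one_node_tb = drive_tb * drives_per_node
    return total_capacity_tb(drive_tb, drives_per_node, nodes) - one_node_tb


def block_remaining_tb(total_tb, used_tb):
    """GUI 'Block Remaining': Total Capacity minus Used Capacity."""
    return total_tb - used_tb


def until_warning_tb(warning_tb, used_tb):
    """GUI 'Block Capacity until Warning': Warning Threshold minus Used Capacity."""
    return warning_tb - used_tb


total = total_capacity_tb(1.92, 9, 5)
error = error_threshold_tb(1.92, 9, 5)
print(f"Total: {total:.2f}TB, Error Threshold: {error:.2f}TB")
```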

Setting up PowerShell for SolidFire

A simple guide to setting up PowerShell for SolidFire.
  1. Enable PowerShell
  2. Download SolidFire PowerShell toolkit 
  3. Unzip the toolkit
  4. Navigate to PowerShell-master\PowerShell-master\Install\ and run SolidFire_PowerShell_1_3_1_4-install.msi 
  5. Done!