Saturday, March 2, 2019

Tensorbook Premium Review

For a lightweight work laptop, I've been using a Surfacebook 1 for a long time.  It has a great keyboard and display, plus it's light.  But I've had some serious problems with it.  For one, the performance is so bad that sometimes Outlook just crashes.  The other problem is storage performance: the Surfacebook has an awful SSD, with terrible latency and throughput.

The bigger issue has been that the tablet-to-keyboard connection frequently disconnects, leaving you unable to control your laptop until it re-connects. Sometimes that would happen a lot.  I had this issue with the first two Surfacebooks MS sent - the third has not had it, to my relief. 

Now that I'm really getting into ML for my graduate degree, I've found the Surfacebook just fails out of some of the notebooks I'm running, and others take 20+ minutes.  So I decided it was time for an upgrade!

I settled on the Tensorbook Premium, since it had the best specs I could find anywhere in the $2800 price range, and I wanted to gain more Linux experience. It matched the hardware and price of the MSI system and comes with Ubuntu pre-installed, with all the drivers, CUDA, etc. validated and worked out.  I had spent hours trying to get my Surfacebook to work correctly with CUDA and Tensorflow, to no avail.  Here are my gripes so far:
  • hard drive doesn't come encrypted?  And if you want to encrypt it, you'd have to wipe the image in order to do so.
  • no jupyter, anaconda, python installed
  • battery lasts 2 hours at best
  • It has a numberpad, so you spend 90% of your time on the left hand side of the screen, where the actual keyboard is.  Why is this so wide?
  • Capslock has a delay, so typing is a giant pain.  Typing a case-sensitive password is a nightmare (and no, I will never learn to use the shift key!  Old habits die hard).
  • The capslock key doesn't have a light to indicate on or off.
  • battery drivers have no idea how much time is left, and the % does not match the time
  • gets hot and loud

To test performance I used a Jupyter notebook from my grad school class that grabs 2000 pictures of cats and dogs, converts them to greyscale arrays, and then trains a DNN and a CNN.  The Tensorbook did it in 62 seconds and hit 1.6GB/s write to disk.  WOW!

Here are the Tensorbook Premium results:
Processing image files to 512x512 color or grayscale arrays 
  • Image processing run time: 44.2s 
  • Image processing CPU/GPU/RAM bottleneck time: 35.7s 
  • Image processing Disk IO bottleneck time: 8.5s
Overall notebook run time: 62.2s

And here are the Surfacebook results:
Processing image files to 512x512 color or grayscale arrays 
  • Image processing run time: 401.3s 
  • Image processing CPU/GPU/RAM bottleneck time: 129.5s 
  • Image processing Disk IO bottleneck time: 271.8s

Overall notebook run time: 1,078s (18 minutes)

The Surfacebook hit 200MB/s read at one point, and 60MB/s write while it was saving all those scaled files. The Tensorbook hit 1,600MB/s write during the saves and only tapped out the GPU during the NN training.

Implications for data storage
1) 1.6GB/s from a local NVMe SSD is amazing
2) There are lots of metadata ops (read file names, edit file names, list directory). These types of ops might run into scale issues on server Linux file systems.
3) For the image processing, there is a lot of read and write to disk.  Disk IO was about 1/5 of the image processing time on my Tensorbook (8.5s of 44.2s) and about 2/3 on my Surfacebook (271.8s of 401.3s).  Throughput matters!
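
The disk IO share can be checked with a few lines of arithmetic. This is only a sketch using the timings reported above, and it assumes the Surfacebook disk IO figure is 271.8s (401.3s minus 129.5s). The dictionary layout and the `disk_share` helper are made up for illustration.

```python
# Timings in seconds, copied from the results above.
# Assumption: the Surfacebook disk IO time is 271.8s (401.3 - 129.5).
results = {
    "Tensorbook": {"image_total": 44.2, "disk_io": 8.5, "notebook": 62.2},
    "Surfacebook": {"image_total": 401.3, "disk_io": 271.8, "notebook": 1078.0},
}


def disk_share(r):
    """Fraction of image processing time attributed to disk IO."""
    return r["disk_io"] / r["image_total"]


for name, r in results.items():
    print(f"{name}: {disk_share(r):.0%} of image processing was disk IO")

# Ratio of the overall notebook run times.
speedup = results["Surfacebook"]["notebook"] / results["Tensorbook"]["notebook"]
print(f"Overall notebook speedup: {speedup:.1f}x")
```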

Monday, January 7, 2019

Pandas Part 2 (Ongoing)

If you want to print the name of a column, just do df.columns[i], where i is the column's position (for example, df.columns[0])

To print a column, try df["name of column"]

To print the number of rows/columns, len(df) or len(df.columns)

To identify the class/type of an object, type(obj)
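
Here is a small sketch that pulls the notes above together. The dataframe and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; the column names are made up for illustration.
df = pd.DataFrame({"name": ["cat", "dog", "cat"], "weight": [4.0, 20.5, 3.2]})

first_col_name = df.columns[0]  # name of the first column
name_col = df["name"]           # the column itself, as a Series
n_rows = len(df)                # number of rows
n_cols = len(df.columns)        # number of columns

print(first_col_name)
print(n_rows, n_cols)
print(type(name_col))
```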

Simple iteration: for i in range(start, stop):

To copy a subset of columns into a new dataframe: df_python = survey_df[['column name1', 'column name 2']]

To print out unique values in a column: print(df[col].unique())
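
A quick sketch of column selection and unique values, assuming a small made-up survey dataframe (the data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical survey data; names are for illustration only.
survey_df = pd.DataFrame({
    "language": ["Python", "R", "Python"],
    "years": [3, 5, 1],
    "editor": ["vim", "RStudio", "VS Code"],
})

# Double brackets select several columns and return a new DataFrame.
df_python = survey_df[["language", "years"]]

# unique() returns the distinct values in order of first appearance.
unique_languages = df_python["language"].unique()
print(unique_languages)
```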

If you want to manually create a dataframe, do this:

df= pd.DataFrame(columns=['col1', 'col2'])
df['col1']=['data','data2','data3'] or

If you want to create a new column that transforms existing text values into a numeric value, do this:

new_df = pd.merge(df1, df2[["name of column 1 to bring", "name of column 2 to bring"]], left_on="which column from df 1 to match", right_on="which column from df 2 to match", how='left')
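
As a rough sketch of the idea, here is a left merge against a small lookup table that maps text values to numbers. The `answers` and `lookup` dataframes and all of their column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: survey answers stored as text.
answers = pd.DataFrame({"respondent": [1, 2, 3], "rating": ["low", "high", "medium"]})

# Hypothetical lookup table that assigns a number to each text value.
lookup = pd.DataFrame({"label": ["low", "medium", "high"], "score": [1, 2, 3]})

# A left merge keeps every row of `answers` and brings in `score`
# wherever `rating` matches `label`.
merged = pd.merge(
    answers,
    lookup[["label", "score"]],
    left_on="rating",
    right_on="label",
    how="left",
)
print(merged[["respondent", "rating", "score"]])
```

Note that the column named in right_on has to be one of the columns you bring over from the second dataframe, otherwise there is nothing to match on.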

Get rid of columns that aren't helpful: del df['column_name']

To assign a value to a specific cell in a df, df.loc[0, 'COL_NAME'] = x (the older df.ix is deprecated)
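
A short sketch of dropping a column and setting single cells, using a made-up dataframe. It uses .loc (label-based) and .iloc (position-based), since .ix was removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical data; "unused" stands in for a column that isn't helpful.
df = pd.DataFrame({"COL_NAME": [1, 2, 3], "unused": ["a", "b", "c"]})

del df["unused"]            # drop a column in place
df.loc[0, "COL_NAME"] = 99  # row label 0, column "COL_NAME"
df.iloc[1, 0] = 42          # row position 1, column position 0
print(df)
```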