Monday, December 5, 2016

Pub Business: Part 1

My family and I recently took a massive risk - we purchased an Irish pub.  The pub is in downtown Grand Rapids, MI and had been struggling from a lack of investment and management, and from ownership neglect.  It was losing money and it wasn't hard to see why: service was awful, the building was dated, the beer was skunky, and everything smelled.

We ran the bar for 4 weeks after purchase to get a better understanding of the business, then shut it down for 6 weeks for renovations.  We re-opened 2 weeks ago and have endured the trial by fire, and I think we've come out the other side satisfactorily.  In this post I'd like to lay out our financial modeling, our business strategy, and the results so far.

From the previous owner's tax returns I put together an income statement.  It revealed that draft costs and liquor costs were higher than industry standard, accounting for $7k in losses.  I frankly don't believe either of those numbers: the draft lines were 120ft long and poured pure foam, plus the staff (and their friends) were drinking for free.  I'd personally seen bartenders giving away booze!  I'm curious for anyone's advice on how (or why) the previous owner could have hidden these costs.

The other thing this reveals is that food costs as a % of food sales are way too high: 43% in contrast with industry-standard 29%.  There's $35k disappearing there!
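To make that gap concrete, here's the variance math as a quick Python sketch.  The ~$250k annual food-sales figure is my assumption, chosen to back out the ~$35k number above:

```python
def excess_food_cost(food_sales, actual_pct, target_pct):
    """Dollars lost to food costs above the industry-standard percentage of food sales."""
    return food_sales * (actual_pct - target_pct)

# 43% actual vs the 29% standard; ~$250k annual food sales (an assumption)
# backs out the ~$35k disappearing.
loss = excess_food_cost(250_000, 0.43, 0.29)
```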


Our business plan was pretty simple in concept:

  1. $275k in loans and $225k in capital to purchase the bar and completely remodel it.
  2. Tap talent for the NEW bar!
    1. Design (logo, artwork, menu)
    2. Construction (floors, ceiling, new bar, new cooler and draft lines, booths, etc)
  3. Transform the staff and culture: many of the original team did not make the cut.  Anyone whose honesty we questioned or who did not have a customer-centric outlook quickly phased out of the bar, and the pub's great new look attracted better employees.  We took a zero tolerance policy to theft, sexual harassment, and set high standards for everyone.
  4. Implement systems of quality: daily and weekly checklists to avoid crises and ensure tasks like ordering and cleaning were completed with accountability, plus replacing aging systems with modern ones with lower TCO.
  5. Advertise: we are developing a strategy that combines media, community involvement, social media, and a ground game to fill our pub during non-peak hours.
  6. Marketing: we tapped a mixologist and chef for parts of our menu.  Much of the whiskey, beer, and food selection was made by ownership.  And we made big pricing changes: I graphed the COGS and price of every item and created tiers of target margin, and we now only carry items that fit into those tiers.
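The pricing exercise in item 6 can be sketched roughly like this.  The item names, prices, and tier cutoffs are made up for illustration, not our actual menu data:

```python
def margin(price, cogs):
    """Gross margin as a fraction of the menu price."""
    return (price - cogs) / price

def tier(m, cutoffs=(0.60, 0.80)):
    """Place a margin in a target tier; None means the item doesn't fit and is dropped."""
    if m >= cutoffs[1]:
        return "premium"
    if m >= cutoffs[0]:
        return "standard"
    return None

menu = {"well whiskey": (6.00, 0.90), "craft draft": (5.50, 1.80), "imported snack": (4.00, 2.50)}
kept = {name: tier(margin(price, cogs)) for name, (price, cogs) in menu.items()}
# the whiskey lands in "premium", the draft in "standard", and the snack gets dropped
```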
Opening Day

Along the way, we've learned a few lessons.  One is that owning a pub is all-consuming: there are always a million things to be done, and you have to learn time management like none other.  Another is having to accept that in the bar business, you're simply going to have some unhappy customers, and you have to figure out what's in your power.  And last, uncertainty kills job creation.  We'd like to hire several more people, but it's so hard to predict demand and sales that we are just taking the work on ourselves.
I could cover a million other steps we took to drive sales, but let's leave that for Part 2.  So how about the results so far - are we going to be miserable failures?  For a baseline, I pulled daily sales data from the past two years (2014 sales were better than 2015 sales) and graphed it:
A couple of takeaways here: we are running at 247% of 2015 sales and 153% of 2014 sales.  We have not had a single day where we didn't exceed previous years!

We are wildly exceeding our expectations.  If this pace continues, we'll have plenty of money to hire a manager and let the pub stand largely on its own operationally.  That would give us the ability to focus on business development and efficiency.  I've created a calculator to model increased costs with increased sales, but there's a lot of uncertainty - will this trend continue?  We can't do 147% better than last year forever, right?  How much of this is just a new-bar pop, and how much can we create new business and beat the competition?
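That calculator is essentially a contribution-margin model.  Here's a stripped-down sketch; the 65% variable-cost ratio and the sales and fixed-cost figures are placeholder assumptions, not our books:

```python
def monthly_profit(sales, variable_cost_pct, fixed_costs):
    """Profit once variable costs scale with sales and fixed costs don't."""
    return sales * (1 - variable_cost_pct) - fixed_costs

baseline = monthly_profit(40_000, 0.65, 12_000)
current = monthly_profit(40_000 * 2.47, 0.65, 12_000)  # at 247% of baseline sales
```

Because the fixed costs don't scale, profit grows much faster than sales do - which is exactly why predicting demand matters so much.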
Part 2 coming soon.  In the meantime, check out our website and Facebook!  Or if you find yourself in west Michigan, stop in for a drink :-)

Saturday, July 23, 2016

SolidFire vs EMC ScaleIO

Doing a bit of research and thought I'd write this down for posterity.  Disclaimer: I'm a SolidFire engineer.

SolidFire Pros:

  1. All-Flash optimized
  2. Global, inline dedupe and compression
  3. Enterprise data services:
    1. Snapshots, cloning, replication (sync, async, snapshot based)
    2. Automation (OpenStack, APIs, etc)
    3. VVol support and VMware integration
  4. QOS
  5. iSCSI or FCP

ScaleIO Pros (and trade-offs):
  1. All flash or hybrid
  2. No dedupe or compression.  Never will have global dedupe.
  3. Scales larger (1,000 nodes instead of 100 nodes)
  4. Wider whitebox support
  5. iSCSI only
  6. Can live on top of a compute node, occupying free resources.
More conversation here:

Thursday, June 16, 2016

SolidFire Architecture #1

 It's time I write a long-overdue overview of SolidFire: how it works, how it solves problems, and why service providers love it.  So here is Part 1!

First, SolidFire is not the solution to everything.  But it is the best in the world at what it does solve, which is why it won Gartner awards for the last two years.  Since this is an engineering blog, let's talk about how it works.

SolidFire hardware is regular servers with SSD's and no RAID, so you get commodity hardware prices and a truly software-defined architecture.  It protects data by writing it in two places using an algorithm we call Double Helix, and then earns space back with inline compression and global inline dedupe.  Global inline dedupe allows for much greater dedupe ratios than anything else on the market, because every block of data stored in the cluster is unique.  Other storage solutions have silos of dedupe: pools of blocks that are unique locally but duplicated many times throughout the environment.
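The principle behind global dedupe can be sketched in a few lines.  This is a toy content-addressed store for illustration only, not SolidFire's actual implementation:

```python
import hashlib

# Toy content-addressed store: any block, written by any tenant anywhere in the
# "cluster", is stored at most once, keyed by its content hash.
class GlobalDedupeStore:
    def __init__(self):
        self.blocks = {}  # content hash -> block data

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(key, data)  # a duplicate write costs no space
        return key

store = GlobalDedupeStore()
a = store.write(b"common OS image block")
b = store.write(b"common OS image block")  # same content from a different volume
assert a == b and len(store.blocks) == 1   # stored once, cluster-wide
```

With per-silo dedupe, that same block would be stored once *per silo*; globally, it's stored once, period.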
The SolidFire robot

Today SF is iSCSI and FCP only.  When you create a LUN, SF chooses where in the cluster to place the data, removing the enormous complexity of what we call the "placement question."  Let's spend some time on that: in most traditional storage environments, you have a couple of storage nodes that form capacity and performance silos.  When you scale out to 20 or 1000 nodes, provisioning encounters a complex question: where do I place this data?  That spurs hours of performance and capacity analysis, trending, and peaks-vs-average conversations.  On SF, the cluster does it for you.

It also solves the performance question that multi-tenancy brings by allowing you to provision performance.  Not just capacity, but performance!  SF does this by allowing you to set a minimum, maximum, and burst for each volume, guaranteeing a service level.
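Conceptually, the max/burst side of that policy is a simple clamp.  This toy sketch uses a made-up `credits` flag to stand in for however burst allowance is actually tracked, and it ignores the minimum (which is a guarantee the scheduler honors under contention, not a cap):

```python
def throttled_iops(requested, maximum, burst, credits):
    """Cap a volume's delivered IOPS at its max, or at its burst limit while credits last."""
    ceiling = burst if credits > 0 else maximum
    return min(requested, ceiling)

assert throttled_iops(9_000, maximum=5_000, burst=8_000, credits=10) == 8_000
assert throttled_iops(9_000, maximum=5_000, burst=8_000, credits=0) == 5_000
assert throttled_iops(1_200, maximum=5_000, burst=8_000, credits=0) == 1_200
```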

I've only scratched the surface on this one: we'll save the scale cluster model and more for the next blog post.

Sunday, May 29, 2016

BlackPhone 2 Review

I've been excited about the Blackphone 2 for quite some time and finally switched over to it.  Partially because I want to make it harder for criminals, companies, and the government to intrude on my privacy, and partially because I want to encourage the tech industry to implement smart security measures.  Why in the world does Pandora demand access to my iPhone's calendar?  This has gotten out of hand.

Unfortunately, the Blackphone 2 is not ready for prime time.  Here's a short list of why.
  • Any attempt to update the OS is met with "Download failed" with no debug data for why it failed.
  • It's freaking huge.  One-handed, fully 50% of the screen is out of your thumb's reach.  And the weight of it is killing my wrist.  In a world where I need a map on my phone while driving or coffee in my hand while texting, a phone this big is basically worthless.
  • The touch screen is really inaccurate.  I'll tap the same icon 5-10 times before it's registered and acted upon.  Sometimes I'll tap a button in an app and SilentOS will pass the tap back to the home screen, opening a completely different application.
  • Google maps keeps failing to find my location, even though I've given it permission. 
  • Even though I blocked Facebook from accessing my location, it can still get my location via Google Play Services.  So what's the point, Blackphone?  It's not granular permission if you're giving them a giant back door.
  • The volume/power combo for taking a screenshot is very hit or miss.
  • No built-in visual voicemail.  Gotta download your carrier's app.
  • Notifications take a TON of configuring.  Let me put it this way: new applications should not be able to display full notifications on a locked screen.  For example, Whatsapp or Facebook Messenger shouldn't default to publishing the entire text.  A private-by-design phone should at least only show the name of the contact, but preferably only show that a message exists.
  • The phone's usability is 5 years behind an iPhone.  Some examples:
    • My mom texted me a picture and I wanted to send it to my sister.  There's literally no way to copy/save a picture in BlackPhone's text app. 
    • Highlighting text is a real, real pain.  But copying it is impossible, because the copy button is at the very top of the screen, 4 inches away from your thumb.
    • The triangle-circle-square buttons at the bottom of the screen are endless torture for me.  Apps like Outlook have their own set of buttons at the bottom of the screen, and it's completely counter-intuitive to have two layers of buttons.  I accidentally reply to emails several times a day.
    • To open the camera from a lock screen on an iPhone, you swipe upwards.  To do the same on the Blackphone you have to unlock the phone.  This is a big deal because I frequently want to record events transpiring around me, and I might miss it while unlocking the phone.
  • It takes forever to reboot.
  • It's noisy.  Why did I just get a notification that I took a screenshot?  I know I took a screenshot.  Why did the keyboard icon just appear in the upper right?  I know I'm using the keyboard. 
Some good things:
  • The display is great
  • Tethering works great
  • The wifi by GPS feature works great
  • I love the granular control "security center" where it works.  
  • Apps seem to be compatible and usually work just fine.
Google play services (as mentioned before) kills the whole point of the phone by handing control over:

I'm going to be asking Silent Circle if there's a new, smaller version coming out in the next couple months.  If not, I'll probably give this to some kid to use as a tablet.

Friday, May 6, 2016

Content Delivery Networks

I've been doing a bit of research into both Data Science and the hyperscalers and a few things have struck me.  One is that Google Fiber is incredibly strategic for Google Platform.

The ability to ensure QOS and cache by controlling the last mile of delivery is a huge advantage.  Everything AWS does is dependent on the telcos, because all of their wonderful technology is designed to deliver something (data, a website, a database) from AWS to somewhere, or from somewhere to AWS.  And that something almost always goes through Verizon, ATT, Comcast, Time Warner, etc.

So one of the most innovative, client-focused, fast-paced tech giants is completely dependent on the most change-resistant, entrenched, oligopolistic companies in the US.  ATT and Verizon do some good business in the private-cloud enterprise space, but I've seen up close and personal that their company culture is as bureaucratic as it comes.  And of course, there is cost involved: as all of us commercial users of ISPs know, the oligopoly always gets its money.

That's why Google Fiber is so forward thinking.  They're going to leapfrog AWS on this one, and if Google succeeds in laying a network quickly, in 15-20 years it could give Google Platform a killer competitive edge.  

Wednesday, March 16, 2016

SolidFire and ONTAP

I had a reseller ask last week "Now that NetApp bought SolidFire, are they going to kill all-flash FAS?"  My answer: not on your life. 

NetApp has sold tens of thousands of all-flash FAS (AFF) systems, which run CDOT, our flagship operating system.  It's a great product that enormous enterprises (and governments) are spending a billion dollars a year on: there's no way we'd back down from continuing to invest in R&D there.  

Besides that, SolidFire has a completely different architecture than AFF.  One way to understand it is that AFF's architecture starts with smaller building blocks.  Here's what I mean:

  1. AFF dedupes each volume individually: SolidFire dedupes the entire cluster.
  2. AFF protects each disk using RAID: SolidFire protects each node using two copies of everything.
  3. AFF puts QOS on each volume: SolidFire shows you whether your QOS promises exceed the cluster's ability.
  4. AFF deals with node failure by having a redundant partner take over: SolidFire deals with node failure by having ALL the other nodes pick up the slack.

These are different architectures, which solve different needs.   I thought this was a great overview of SolidFire as well:

Tuesday, February 23, 2016

Performance Archive Part II

More on performance archiving: it collects data at the highest granularity possible for each statistic, often 1 second.  The time range of each dataset is up to 6 hours long, and each node preserves its own set of this performance data. 

There is nothing like this (rolling perf data collection) on-box in 7-Mode.  You'd have to run PerfStat.

NetApp has tools to analyze the data, but they are not customer facing.    You provide the case number in the command line and it auto-uploads.

You can specify the exact time window you’re looking for:
cs001::> system node autosupport invoke-performance-archive -node cs001-pn01 -start-date "02/22/2016 8:00:00" -end-date "02/22/2016 14:00:00"

There is minimal impact on the NetApp system from running this – it already runs in the background and keeps up to 28 days of data.

Friday, February 19, 2016


SolidFire is a "make private cloud easy" solution primarily designed for service providers.  It's a "born in OpenStack" all-flash whitebox solution that aims to be stupid-easy to deploy and manage.

The goal for SolidFire is not to be the fastest, the most resilient, or the most features.  It aims to answer one question, best in class: "How do I easily deploy Storage as a Service?"  You can see this in their design choices:
  1. Because this is a product service providers sell, they're flash only, have required QOS policies, and skip all the management tools, leaving that to OpenStack.
  2. Because they use two copies of everything instead of RAID, they achieve node level resiliency and skip expensive hardware and software, using inline dedupe/compression to recover the space delta.  This also spreads performance requirements across the entire cluster.
  3. Because they expect you'll be deploying a single configuration thousands of times, they support only 1 protocol and have very limited configuration options.
  4. Because this is for a cloud, not a single-purpose deployment, the cluster (up to 100 nodes) auto-grows when you add a new node and recovers quickly when you lose one.
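Design choice #2 above is easy to sanity-check with arithmetic.  The 2:1 dedupe/compression ratio here is an illustrative assumption, not a published SolidFire number:

```python
def effective_capacity(raw_tb, copies=2, efficiency_ratio=2.0):
    """Usable logical TB after replication overhead and data reduction."""
    return raw_tb / copies * efficiency_ratio

# 40TB raw with two full copies and 2:1 dedupe+compression: the overhead is erased.
assert effective_capacity(40) == 40.0
# Without data reduction, two copies would halve the usable space.
assert effective_capacity(40, efficiency_ratio=1.0) == 20.0
```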
A few technical details:
  • Platform today is Dell servers.  Now that Dell owns EMC, it'll probably convert to Cisco.
    • 10 drives per node
    • SF2405: 5-10TB and 50k IOPS
    • SF4805: 10-20TB and 50k IOPS
    • SF9605: 20-40TB and 50k IOPS
    • SF9010:  20-40TB and 75k IOPS
  • Features:
    • Inline dedupe and compression
    • For QOS you can set min, max, and burst limits.
    • Mix any node platform
    • You can hot remove nodes
    • iSCSI, FCP (with a gateway device)
    • native snapshot capability and can backup to any Amazon Web Services S3 or OpenStack SWIFT-compatible API. 
  • Under the hood:
    • Nodes are connected via 10GbE over your shared network.  Not a private intracluster network.
    • “All connections for a particular LUN presented to storage go back to the primary node for that LUN. IE: multipath doesn't help you weather a failover. They're dependent on long iSCSI timeouts to give them time to fail a node and redirect traffic.”
  • Performance and QOS:
  • Node Loss Demo:
SolidFire won Gold in Storage magazine's 2015 Products of the Year, Storage Systems: All-Flash Systems category.

CDOT Tip: Performance Troubleshooting

NetApp is engineering simpler, more elegant tools for our clients to manage their large technology deployments.  We’ve found recently that clients are relatively unaware of one tool that can make your life much easier when you are investigating a performance issue: Performance Archives.

Performance Archives have existed in ONTAP for a long time, but beginning with 8.3 (released November 2014), the payload was updated to specifically enable diagnostic use cases, trending use cases, etc.  In 8.3, there is a command “system node autosupport invoke-performance-archive” which allows customers (HTTPS enabled) to ship back up to 6 hours' worth of data at a time, collected at a much higher resolution than off-box PerfStat ever did (per-second for many counters) AND allows you to “go back in time” up to 28 days, depending on customer configuration.

The tool we recommended pre-8.3, PerfStat, will continue to function through 8.4 per current plan, but we recognize it is a post-failure collection mechanism, which is not ideal.  In other words, after you run into a performance issue, you have to set up PerfStat and wait for that issue to recur.  Performance Archives give you the ability to instantly look back several hours and catch the problem in the act.

We’re also making big improvements to Performance AutoSupports:  We are focused on efficiently streaming real-time performance information to our cloud and making it easier to access this content within the client-facing NetApp ASUP infrastructure. This will allow for our customers and NetApp support engineers to do trending, analytics, diagnostics and more.

And of course OnCommand Performance Manager remains your go-to tool for retaining, graphing, and machine learning analysis for your entire NetApp footprint.

Read more here:

Thursday, February 18, 2016


  We’ve all seen the cloud coming for years, but now suddenly our clients are finding hyperscalers compelling and maybe essential.  We encourage our clients to use the public cloud wherever it’s optimal and we’d love to partner with customers as they make the transition.  One of the ways we can help is with Cloud ONTAP.

  Retaining the security, standards, and integrity of your data as you start to use the cloud isn't easy.  If your data is already on NetApp, it's a cinch to deploy NetApp's OS in the cloud and replicate the data over, solving all those problems. 

Cloud ONTAP is purchasable in two forms:
  • Pay as you go – Buy directly from AWS and pay per hour.
  • Bring Your Own License: 6 or 12 month “everything included” license quoted by NetApp.
    • I heard NetApp can put together a 48 month license quote if needed.
    • Includes up to 368TB of capacity.
  • Neither of these options include the cost of AWS, which depends on several factors, including:
    • Which instance size you choose (M4.4XL, M4.2XL, M3.XL, M3.2XL, R3.XL, R3.2XL, etc).  This is a menu of RAM/CPU combos.*
    • If you’re using NetApp aggr encryption, you can only choose M4.2XL
    • What type of disk you chose (SSD or HDD).
    • How much capacity you need.
    • How utilized you expect the CPU to be.
    • How much data you expect to be transferred out of the instance.  Transfer into AWS is free.

You can use the AWS cost calculator to estimate the cost:

*Testing indicates an M3.2XL instance caps out at ~10k IOPS (100% read) and 1.5k IOPS (100% write) at 20ms latency.  M4.4XL should accomplish roughly 2x the performance of M3.2XL.

  In most situations, using storage in the cloud is going to cost you 3+ times the acquisition cost of the same on-prem storage array over a 4 year life.  After accounting for datacenter costs and hardware management though, you may find the TCO comparable.  NetApp offers a free cloud workshop to help you identify which workloads are great cloud candidates to help you sort through your real requirements and costs.  Feel free to reach out to your friendly neighborhood netapp engineer or myself if you're interested!
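A stripped-down version of that 4-year comparison: the ~3x acquisition multiplier comes from the paragraph above, while the datacenter and admin overheads are assumed placeholders to show how the TCO can converge:

```python
def four_year_tco(acquisition, annual_dc_cost=0, annual_admin_cost=0):
    """Total cost over a 4-year life: purchase price plus recurring overheads."""
    return acquisition + 4 * (annual_dc_cost + annual_admin_cost)

on_prem = four_year_tco(100_000, annual_dc_cost=20_000, annual_admin_cost=30_000)
cloud = four_year_tco(300_000)  # ~3x the acquisition cost, minimal ops overhead
assert on_prem == cloud == 300_000  # comparable once overheads are counted
```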

All technical information is as of CDOT 8.3.2, CDOT 8.4 is expected to have big improvement for Cloud ONTAP.

Tuesday, January 26, 2016

NetApp SDK and API

First, some important documentation: here is a good document for the Powershell Toolkit, and all SDK Documentation can be found here.

In the SDK you'll find a help file.  

But the really useful information is buried a bit here:

Where you'll find info on all the objects and methods available!

Also here is the developer network community site:  You'll find the developer community site the most helpful of all of them by a long shot!

Monday, January 25, 2016

CDOT Tip: Vol Language

A few notes on vol languages:
1) With cDOT 8.2 you can change the SVM language afterwards, and volumes within the SVM can have different settings (from each other and from the SVM root), but you can’t change a volume language after creation time.
2) cDOT doesn’t have a “undefined” language setting so it may be necessary to change the volume language on the 7-Mode system before migrating it over to cDOT.
3) You can only replicate to a volume with the same language as the source volume.
4) Newly created volumes in CDOT will inherit the SVM default language.
5) NetApp often recommends customers use C.UTF-8 (particularly for the SVM root volume), because it will allow namespace traversal to child volumes of any language.

Even more details:
  en_US is a subset of C.UTF-8: the first 128 characters of both character sets match, are stored in ASCII, and are 1 byte.  UTF-8 differs in that it includes more character sets and stores them in more than 1 byte.  Ideally, everyone is trying to get to UTF-8 (any variant), as the character set is the same in all versions of UTF-8.  Volume language only impacts UNIX hosts, not Windows.  All UNIX hosts should maintain a matching UTF-8 locale, but this is not always possible: more current distros of *NIX default to UTF-8, while older volumes may be configured for something else.  The only time a customer will experience a potential issue is when there are high-order characters (above the first 128) and the UNIX host and volume language do not match.
  When a host opens a file on a volume, it interprets the data through the lens of the host's locale.  Given that most new installs of *NIX are UTF-8 and that en_US is a subset of UTF-8, the recommendation from Engineering (last I heard) was UTF-8.  It's difficult to align both host and volume, since old volumes are often a different language.  The problem arises when there are high-order characters the host cannot correctly interpret (perhaps because a high-order character spans multiple bytes): in this case you could get a bag of bits, or if the character sets don't align perfectly, the data could be interpreted as something else, e.g. en_US = $ but UTF-8 = % (just an example to make a point, but there are character sets that don't align).  The customers I would be more concerned about are those that share files internationally.

  The volume language is completely irrelevant for names with ASCII-only characters.  The problem starts when names contain non-ASCII characters.  The reason UTF-8 was selected as the new default is that this problem goes away with UTF-8.
  It does not really matter whether it's en_us.UTF-8 or he.UTF-8.  The difference between them is in the handling of date format, the currency sign, the comma in thousands, and other things that are almost irrelevant for ONTAP.  Currently, ONTAP does not pay attention to these “details”.  It only cares about the character set, which is identical for all UTF-8 variations.  And that's the reason UTF-8 was selected as the new default (C.UTF-8, to be more accurate).
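The high-order-character problem described above is easy to demonstrate: the same bytes decode differently under different character sets, while ASCII-only names are immune.

```python
# Bytes written by a UTF-8 host, then read through two different
# "volume language" lenses.
data = "café".encode("utf-8")
assert data.decode("utf-8") == "café"       # matching locale: correct
assert data.decode("latin-1") == "cafÃ©"    # mismatched locale: mojibake
assert "menu".encode("utf-8").decode("latin-1") == "menu"  # ASCII-only is always safe
```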

Sunday, January 17, 2016

Memory as a Service and Apache Spark

Apache Spark is quickly crowding out MapReduce as the framework for orchestrating data analytics on compute clusters like Hadoop.  It's much faster (~100x), requires less disk storage and throughput, is more resilient, and is easier to code for.

MapReduce accomplishes speed and resiliency by sending 3 copies of each piece of data to 3 different nodes.  Each node writes it to disk and then starts working, which means you're tripling the disk writes and tripling the space required to do a job.  One way to solve this is to use direct-attach storage arrays: you can connect 4+ Hadoop nodes to one array and then send only 2 copies of each piece of data, relying on the storage array's RAID protection for a layer of protection.  This also allows you to scale capacity and performance as needed.  Data storage companies see this architecture as their way to contribute and sell to the Big Data industry.

But here comes Spark, whose "Resilient Distributed Datasets" record the lineage of each dataset, so missing data can always be recomputed if a node goes down.  Since the data has a layer of resiliency built in, writing to disk isn't needed, and the work can be done in RAM.
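A toy sketch of the RDD idea: rather than storing extra replicas on disk, each dataset records how it was derived, so a lost partition can simply be recomputed.  (Real Spark tracks a full DAG of transformations; this shows only the principle.)

```python
# Toy lineage-based recovery, in the spirit of Spark RDDs.
class ToyRDD:
    def __init__(self, parent, fn):
        self.parent, self.fn = parent, fn  # the lineage: source data + transformation
        self.cache = None                  # the in-memory materialized result

    def compute(self):
        if self.cache is None:             # lost, or never materialized
            self.cache = [self.fn(x) for x in self.parent]
        return self.cache

doubled = ToyRDD([1, 2, 3], lambda x: x * 2)
assert doubled.compute() == [2, 4, 6]
doubled.cache = None                       # simulate losing the in-memory partition
assert doubled.compute() == [2, 4, 6]      # recomputed from lineage, no replica needed
```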

This basically eliminates today's value proposition of data storage companies.  But it does require a bunch more RAM, which is expensive.  I'm starting to see potential in the developing Flash as Memory Extension market: connect an all-flash or 3dxpoint array to several nodes and you have a slower but much cheaper way to run Spark.  For use cases that don't need blazing speed and have cost constraints, that could work well.

Memory as a shared resource...looks like we're heading toward a Memory as a Service model similar to RAMcloud!

Thursday, January 14, 2016

Developing Memory Trends

There are a few new types of memory breaking onto the scene that are reportedly going to change the world.  The trick to bringing a memory technology to the commercial market today is nailing all 4 key properties: fast, dense, non-volatile, and inexpensive.  
  1. Normal DDR3 DRAM is fast (6-20 nanoseconds) and dense, but volatile and expensive at $50/GB for enterprise server DRAM.
  2. NAND flash is dense, non-volatile, and inexpensive (~$5/GB for enterprise SSD), but it's nowhere near the speed of DRAM at >50,000 nanoseconds
  3. Memristors sound amazing, but they don't exist yet and likely won't in the next 5 years.  
  4. 3D Xpoint appears to be the only viable option right now.  Intel and Micron report it is:
    1. Fast: 1,000x faster than Flash would mean 50 nanosecond range.
    2. Non-volatile
    3. Up to 50% less expensive than DRAM
    4. 10x denser than DRAM
Importantly, 3Dxpoint is reported to be durable.  One of the drawbacks of Flash is that writes damage it over time, so a disk that starts at 1TB may effectively be a 200GB disk after 4 years.  To solve this, manufacturers pack up to 6x the needed amount of Flash into an SSD, so when an area goes bad the SSD simply maps your writes to a brand-new area of the disk.  This does two bad things: drives up cost and drives down performance.
"We show that future gains in density will come at significant drops in performance and reliability. As a result, SSD manufacturers and users will face a tough choice in trading off between cost, performance, capacity and reliability."  Source
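The overprovisioning arithmetic works out like this; the 6x raw multiple comes from the paragraph above, and the wear fractions are illustrative:

```python
def visible_capacity(advertised_gb, raw_multiple, fraction_worn):
    """User-visible capacity once worn cells are remapped to spare area."""
    healthy = advertised_gb * raw_multiple * (1 - fraction_worn)
    return min(advertised_gb, healthy)

# With 6x spare area, half the raw flash can die before the user sees any loss.
assert visible_capacity(1_000, 6, 0.5) == 1_000
# With no spare area, heavy wear shrinks the disk: the 1TB-to-200GB scenario.
assert round(visible_capacity(1_000, 1, 0.8)) == 200
```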
If 3dxpoint delivers on its promises, there will be several huge impacts.
  • All-3dxpoint arrays.  Today's storage operating systems will need to be completely re-written as they are simply not capable of going from 50,000ns disk latencies to 50ns.  
  • All systems will have more memory.  At 50% the cost of DRAM, rather than spend less money on memory we'll probably just spend the same money on 2x the amount of memory.
  • Because you'll have 2x the memory, operating systems will need to be re-written.  Our current OS's are designed around the cost constraints of memory gradually declining according to Moore's law.  3dxpoint would thrust us forward along that line and require serious software engineering to take advantage of it.
  • Since it's non-volatile, operating systems will need to be re-written.  Lose power?  Start right back where you left off.  It also means you couldn't resolve an application/OS freeze-up with a hard reboot.
  • FaME (Flash as Memory Extension) will give way to 3dxpoint as Memory Extension and accelerate the trend rapidly.  The cost of 3dxpoint ($25/GB?) makes a strong case for the shared-resource model of today's data storage industry, while the performance and non-volatility would make it the natural successor to SAP HANA's "database in memory" architecture (and Apache Spark, too).
DRAM's performance advantage over 3dxpoint probably means DRAM won't go away in the enterprise.  Rather, 3dxpoint would be another tier of memory, below DRAM and above disk, with DRAM mirrored to the 3dxpoint for its non-volatility.  

Wednesday, January 13, 2016

Top of Rack Flash and FaME

One developing topic is Flash as Memory Extension, or FaME.  With flash becoming cheaper and cheaper and CPU's getting faster and faster, there's an architectural bottleneck to solve: there's a huge performance gap between the DRAM and the SAN.  

FaME tries to solve this by accessing SSDs in way that mimics access to RAM, skipping the SCSI stack and driving down latency.  This begins to close the DRAM-SAN gap, achieving latencies in the 500-900ns range.  There are a whole host of ways to accomplish this.
  • NVMe is one, essentially a PCI-slot compatible SSD.  Pop it in a server and it acts like DRAM.  The disadvantage here is you've re-introduced the problems the SAN/NAS was introduced to solve: stranded capacity, lack of data protection features (snapshots, replication), and you'll need to come up with a way to make this NVMe available to multiple nodes for parallelization and redundancy. Think Fusion-IO, which also required significant re-coding of applications.
    • RoCE Infiniband has port-to-port latencies ~100ns
    • RoCE Ethernet has port-to-port latencies ~230ns, and RoCE v2 is routable.  RoCE v1 is a link-layer protocol; however, RoCE v2 is not yet supported by many OS's.
  • iWARP is a protocol that allows RDMA wrapped in a packet for a stateful protocol like TCP.  
  • Memcached is an open-source way for your servers to use other servers as DRAM extension.  I'm a bit fuzzy on whether it simply uses the second server as a place to put part of your current working set or if it offloads portions of the computation as well.  In any case, here's a good explanation.
  • HDFS, key value store semantics and other protocols: this may be the smartest way to do things, just let the application speak directly to a storage array the same way it would speak to RAM.  
Currently the fastest all-flash SAN/NAS arrays have latencies in the 100,000-200,000ns range.  This is miles away from DRAM, which is in the 6-20ns range.  FaME is aiming for 900ns.  Architectures like HANA and Spark try to solve this by putting the entire workload in DRAM, which is expensive and means a power outage requires a long process of re-loading data from disk into DRAM.

Looks like FaME will be a good price/performance solution until we develop a super-cheap static RAM.

Wednesday, January 6, 2016

CDOT 8.3.2: What's New

The next release of our ONTAP operating system, CDOT 8.3.2, is expected to have several key features and is expected February-ish. 8.3.2RC2 is already out!  Here’s what’s new:

1) Inline Dedupe: supported on All-flash and FlashPool enabled systems.  This feature reduces the transactions to disk and reduces storage footprint by deduping in-memory.
    a. Enabled on all-flash arrays by default
    b. Uses 4k block size and stays deduped when replicated.
    c. Syntax: volume efficiency modify -vserver NorthAmerica -volume /vol/production-001 -inline-deduplication true

2) QOS (adjustable performance limit) is now supported on clusters up to 24 nodes (up from 8 nodes)!

3) Support for 3.84TB TLC SSDs.  These huge flash drives have a lower cost per GB than smaller SSDs.  They also consume 85% less power and take up 82% fewer rack units than a same-performance HDD array.

4) Static Adaptive Compression: in 8.3.1 we introduced a high-efficiency compression algorithm for inline compression.  8.3.2 gives you the ability to run this algorithm on an existing volume or dataset.
    a. This is important for when you do a vol move from a system without inline compression enabled to one with it enabled, for example from a HDD FAS to an all-flash FAS.

5) Copy Free Transition: This feature allows you to shut down a 7-mode cluster and hook the disk shelves into a CDOT system.  The data is converted to CDOT, vfilers are converted to vservers, and in a 2-8 hour window the entire 7 to C migration is complete.

6) Migrate a volume between vservers: the equivalent of vfiler move, this allows you to re-host a volume between vservers.  This only works once, and only for volumes transitioned from 7-mode.

7) Inline Foreign LUN Import: Migrate FCP data by presenting the LUN at the NetApp controller, and NetApp then presents the LUN to the host.  In the background, NetApp will copy the data over and then you can retire the old system and LUN.  This is vendor-agnostic as of 8.3.1 and works for All-Flash FAS in 8.3.2.

8) SVM-DR MSID replication.  SVM-DR allows you to replicate an entire vserver and all its configuration for push-button disaster recovery.  MSIDs are the unique identifiers for volumes, and replicating them allows applications like VMware to accept the DR exports without re-mounting them, greatly shortening your RTO.

9) Audit Logging enhancements: records login failures and IP addresses of logins.

Read more at 8.3.2RC2 release notes: