Monday, January 25, 2016

CDOT Tip: Vol Language

A few notes on vol languages:
1) With cDOT 8.2 you can change the SVM language after the SVM is created, and volumes within the SVM can have different settings (from each other and from the SVM root), but you can’t change a volume’s language after the volume is created.
2) cDOT doesn’t have an “undefined” language setting, so it may be necessary to change the volume language on the 7-Mode system before migrating it over to cDOT.
3) You can only replicate to a volume with the same language as the source volume.
4) Newly created volumes in cDOT will inherit the SVM default language.
5) NetApp often recommends customers use C.UTF-8 (particularly for the SVM root volume), because it will allow namespace traversal to child volumes of any language.

Even more details:
  en_US is a subset of C.UTF-8. The first 128 characters of both character sets match; they are the ASCII characters and are stored in 1 byte each. UTF-8 differs in that it covers many more characters and stores them in more than 1 byte. Ideally, everyone is trying to get to UTF-8 (any variant), since the character set is the same in every UTF-8 language setting. Volume language only impacts UNIX hosts, not Windows. Ideally every UNIX host would use a UTF-8 locale that matches the volume, but this is not always possible: current *NIX distros default to UTF-8, while older volumes may be configured for something else. The only time a customer is likely to hit an issue is when there are high-order characters (beyond the first 128) and the UNIX host locale and the volume language do not match.
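To make the byte-count point concrete, here is a minimal Python sketch. It only illustrates general UTF-8 encoding, not anything ONTAP-specific; the sample characters and the helper name `utf8_len` are my own choices for illustration.

```python
# Compare how many bytes individual characters take when stored as UTF-8.
# The sample characters below are arbitrary picks for illustration.
samples = ["A", "z", "~", "é", "€", "你"]


def utf8_len(ch: str) -> int:
    """Return the number of bytes used to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    print(repr(ch), "code point:", ord(ch), "UTF-8 bytes:", utf8_len(ch))

# For the first 128 code points, the UTF-8 bytes are identical to the ASCII bytes.
ascii_matches_utf8 = all(
    chr(i).encode("utf-8") == chr(i).encode("ascii") for i in range(128)
)
print("first 128 characters identical in ASCII and UTF-8:", ascii_matches_utf8)
```

Characters in the ASCII range come out as 1 byte each, while the accented letter, the euro sign, and the CJK character each need more than one.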
 
  When a host opens a file on a volume, it interprets the data through the lens of the host’s locale. Given that most new installs of *NIX will be UTF-8 and that en_US is a subset of UTF-8, the recommendation from Engineering (last I heard) was UTF-8. It’s difficult to align host and volume because old volumes often use a different language. The problem can arise when there are high-order characters and the host cannot interpret the data correctly (for example, because the character takes multiple bytes). In that case you could get a bag of bits, or, if the character sets do not align, the data could be interpreted as something else. E.g. en_US = $ but .UTF-8 = % (a made-up example to make the point, but there are character sets that don’t align). The customers I would be more concerned about are those that share files internationally.
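Both failure modes described above (“interpreted as something else” and “a bag of bits”) can be reproduced with plain Python string encoding. This is only a sketch of the general mismatch, not of how ONTAP or any NFS client behaves; the file name, the choice of ISO-8859-1 as the “other” character set, and the helper `try_decode` are assumptions made for the example.

```python
from typing import Optional

name = "résumé.txt"  # hypothetical file name with two non-ASCII characters

# Bytes as a UTF-8 host would store the name.
stored_utf8 = name.encode("utf-8")

# A host assuming ISO-8859-1 maps every byte to *some* character,
# so it silently shows a different name instead of failing.
misread = stored_utf8.decode("iso-8859-1")
print("UTF-8 bytes read as ISO-8859-1:", misread)


def try_decode(raw: bytes, charset: str) -> Optional[str]:
    """Decode raw bytes with the given charset; return None if they are invalid."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# The opposite direction: ISO-8859-1 bytes are not valid UTF-8 here,
# so a UTF-8 host cannot turn them into text at all.
stored_legacy = name.encode("iso-8859-1")
unreadable = try_decode(stored_legacy, "utf-8")
print("ISO-8859-1 bytes read as UTF-8:", unreadable)
```

Note that the ASCII part of the name (“r”, “sum”, “.txt”) survives in both directions; only the high-order characters are affected.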

  The volume language is completely irrelevant for names that contain only ASCII characters. The problem starts when names contain non-ASCII characters. The reason UTF-8 was selected as the new default is that this problem goes away with UTF-8.
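If you want a rough idea of which names could be affected at all, a quick check for non-ASCII characters is enough. The sketch below is a hypothetical helper (the function name and the sample names are mine), and it assumes you already have the names as a list of strings.

```python
def non_ascii_names(names):
    """Return only the names that contain at least one non-ASCII character."""
    return [n for n in names if not n.isascii()]


# Hypothetical sample names; only the last two contain non-ASCII characters.
sample = ["report.txt", "notes_2016.doc", "résumé.txt", "日本語.doc"]
print(non_ascii_names(sample))
```

Anything this does not flag is ASCII-only, so the volume language should not matter for it.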
 
  It does not really matter whether it’s en_us.UTF-8 or he.UTF-8. The difference between the two is in the handling of date format, currency sign, thousands separator, and other things that are almost irrelevant to ONTAP. Currently, ONTAP does not pay attention to these “details”. It only cares about the character set, which is identical across all UTF-8 variations. That’s the reason UTF-8 (C.UTF-8, to be more accurate) was selected as the new default.
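As a small illustration of “only the character set matters”, the sketch below compares just the part of a language name after the dot. This is a hypothetical helper based on the naming convention used in this post, not a description of how ONTAP itself compares languages.

```python
def charset_of(language: str) -> str:
    """Return the part after the dot, upper-cased, or '' if there is no dot."""
    _, _, charset = language.partition(".")
    return charset.upper()


def same_charset(a: str, b: str) -> bool:
    """True when both names carry an explicit charset suffix and the suffixes match."""
    return charset_of(a) != "" and charset_of(a) == charset_of(b)


print(same_charset("en_us.UTF-8", "he.UTF-8"))  # both end in .UTF-8
print(same_charset("en_US", "he.UTF-8"))        # en_US has no charset suffix
```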

Comments:

  1. Would it be a problem, for example, to set he.UTF-8 on a volume that hosts a datastore LUN for VMware?
