Table of contents:

Encoding in general

ANSI chars 0-127

Coding pages

Unicode

Encoding on Windows

How encoding works in Windows cmd

New line in Windows cmd

Encoding in general

ANSI chars 0-127

https://en.wikipedia.org/wiki/ASCII

ASCII chars 0-127 (decimal), 00-7F (hex), often loosely called "ANSI" chars on Windows:



   | 0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F  
---+----------------------------------------------------------------
0x | NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI   
1x | DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US   
2x | SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /  
3x | 0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?  
4x | @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O  
5x | P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _  
6x | `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o  
7x | p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL  

Characters 00-1F (hex) and 7F (DEL): control characters
Characters 20-7E (hex): printable characters
All 128 chars (0-127) can be represented by 7 bits
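The split into control and printable characters can be checked with a few lines of Python. This is a minimal sketch; the helper name `classify` is our own, not a library function.

```python
def classify(code: int) -> str:
    """Classify an ASCII code (0-127) as 'control' or 'printable'."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not an ASCII code")
    # 0x00-0x1F and 0x7F (DEL) are control characters
    if code < 0x20 or code == 0x7F:
        return "control"
    return "printable"

for ch in ["\n", "\r", " ", "A", "~", "\x7f"]:
    code = ord(ch)
    # hex value, number of bits actually needed (never more than 7), class
    print(f"{code:#04x} {code.bit_length()} bits {classify(code)}")
```

Counting over `range(128)` gives 95 printable and 33 control characters.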

Coding pages

8-bit code pages can encode 256 chars: 0-255 (decimal), 00-FF (hex). In general:

encoding for 00-7F is always the same (= ASCII)
encoding for 80-FF is specific to each code page (usually language-specific letters and symbols)

Examples:

CP 1251 (Cyrillic, Windows)
OEM 437 (US, the original IBM PC code page)
OEM 866 (Cyrillic, DOS)
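A quick way to see both rules at once is to decode the same byte with each of the code pages above. This sketch uses Python's standard `cp1251`, `cp437` and `cp866` codecs; `decode_everywhere` is a hypothetical helper name.

```python
CODE_PAGES = ["cp1251", "cp437", "cp866"]

def decode_everywhere(data: bytes) -> dict:
    """Decode the same bytes with each code page."""
    return {cp: data.decode(cp, errors="replace") for cp in CODE_PAGES}

# 0x41 is in the 00-7F range: the same letter in every code page
print(decode_everywhere(b"\x41"))  # {'cp1251': 'A', 'cp437': 'A', 'cp866': 'A'}
# 0x80 is in the 80-FF range: a different letter in each code page
print(decode_everywhere(b"\x80"))  # {'cp1251': 'Ђ', 'cp437': 'Ç', 'cp866': 'А'}
```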

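OEM vs CP ("ANSI") code pages:

OEM code pages are the older (DOS-era) ones; the Windows console (cmd) still uses the OEM code page by default
CP ("ANSI") code pages are used today by GUI apps, for example by apps calling the non-Unicode Windows APIs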
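Windows reports both system code pages through the kernel32 functions `GetACP` (ANSI) and `GetOEMCP` (OEM). A minimal Python sketch via `ctypes`, assuming it runs on Windows; the helper name `windows_code_pages` is ours, and it simply returns `None` elsewhere.

```python
import sys

def windows_code_pages():
    """Return the system's ANSI and OEM code page numbers, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    kernel32 = ctypes.windll.kernel32
    # GetACP: code page used by the non-Unicode ("ANSI") APIs, e.g. GUI apps
    # GetOEMCP: DOS-era code page, the console's default
    return {"ansi": kernel32.GetACP(), "oem": kernel32.GetOEMCP()}

print(windows_code_pages())
```

On a US-English system this typically gives 1252 and 437; on a Russian one, 1251 and 866.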

Unicode

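UTF-8, UTF-16 and UTF-32 all encode the same set of characters (Unicode code points). They differ only in how a code point is turned into bytes: UTF-8 uses 1-4 bytes per character, UTF-16 uses 2 or 4, UTF-32 always uses 4.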
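The byte counts can be checked directly in Python. This sketch uses the little-endian codec names `utf-16-le` and `utf-32-le` so that Python does not prepend a byte order mark; the sample characters are arbitrary.

```python
# "A", "é", "€" and an emoji (U+1F600, outside the 16-bit range)
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]  # -le variants: no BOM

for ch in SAMPLES:
    sizes = {enc: len(ch.encode(enc)) for enc in ENCODINGS}
    # every encoding round-trips to the very same character
    assert all(ch.encode(enc).decode(enc) == ch for enc in ENCODINGS)
    print(f"U+{ord(ch):04X}", sizes)
```

For these four characters UTF-8 needs 1, 2, 3 and 4 bytes, UTF-16 needs 2, 2, 2 and 4 (the emoji becomes a surrogate pair), and UTF-32 always needs 4.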

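Do ASCII chars (0-127) look the same in all encodings? Only partly: they are encoded as the same single bytes in UTF-8 and in the 8-bit code pages, so pure-ASCII text is identical there. In UTF-16 and UTF-32 each ASCII char takes 2 or 4 bytes (e.g. "A" is 41 00 in UTF-16 LE), so the bytes differ.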
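A small Python check of this ASCII compatibility; the sample text is arbitrary and only uses ASCII chars.

```python
text = "Hello, cmd!"  # ASCII-only text
ascii_bytes = text.encode("ascii")

for enc in ["utf-8", "cp1251", "cp437", "cp866", "utf-16-le", "utf-32-le"]:
    same = text.encode(enc) == ascii_bytes
    print(f"{enc:10} identical to ASCII: {same}")

# In UTF-16/UTF-32 every ASCII char is padded with zero bytes
print("A".encode("utf-16-le"), "A".encode("utf-32-le"))
```

The comparison is `True` for UTF-8 and the three code pages and `False` for UTF-16 and UTF-32.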

Encoding on Windows

How encoding works in Windows cmd

CLI apps write a sequence of bytes (like 11001100 01010101 …) to stdout/stderr in Windows cmd. The app's author may have had any encoding in mind when producing those bytes. Unless the app inspects the console settings itself, cmd interprets the bytes using the currently active code page.
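An app that does inspect the console can ask for the active output code page with the kernel32 function `GetConsoleOutputCP` and encode its text accordingly. A hedged Python sketch: the helper names are hypothetical, the mapping of code page N to the codec name `cpN` is an assumption that holds for common code pages, and non-Windows terminals are assumed to expect UTF-8. Python's own `sys.stdout` already uses the Unicode console API on Windows (PEP 528), so this only matters for bytes written raw.

```python
import codecs
import sys

def console_output_codec(default: str = "utf-8") -> str:
    """Best-effort name of the Python codec matching the console's output code page."""
    if sys.platform != "win32":
        return default  # assumption: non-Windows terminals expect UTF-8
    import ctypes
    cp = ctypes.windll.kernel32.GetConsoleOutputCP()  # e.g. 437, 866, 65001
    name = "utf-8" if cp == 65001 else f"cp{cp}"
    try:
        codecs.lookup(name)  # does Python ship a codec for this code page?
    except LookupError:
        return default
    return name

def encode_for_console(text: str) -> bytes:
    """Encode text for the active code page; unmappable chars become '?'."""
    return text.encode(console_output_codec(), errors="replace")

print(console_output_codec(), encode_for_console("Hello"))
```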

If [author’s encoding] matches [currently active code page], everything is rendered as expected (as long as the current font supports all the chars). If not, every char outside the ASCII range (0-127) turns into gibberish.
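The match/mismatch cases can be simulated in Python by treating the console as "decode the app's bytes with the active code page". This is a model, not what cmd literally runs; `render` is a hypothetical helper, and code page 65001 is represented by Python's `utf-8` codec.

```python
def render(data: bytes, active_code_page: str) -> str:
    """Simulate the console: decode the app's bytes with the active code page."""
    return data.decode(active_code_page, errors="replace")

message = "Status: Привет"           # ASCII prefix + Cyrillic word
app_bytes = message.encode("utf-8")   # the author had UTF-8 in mind

# Mismatch: "Status: " survives, the Cyrillic part turns into gibberish
print(render(app_bytes, "cp866"))
# Match: active code page 65001 (UTF-8), as selected by chcp 65001
print(render(app_bytes, "utf-8"))
# Match: an OEM 866 author on an OEM 866 console
print(render(message.encode("cp866"), "cp866"))
```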

New line in Windows cmd

Windows uses \r\n for a new line and Linux uses \n; many apps handle this difference automatically (\r = 0D and \n = 0A are ASCII control chars from the 00-1F hex range)
To enable UTF-8 encoding in Windows cmd (for both input and output of the terminal), run “chcp 65001”; apps that output UTF-8 byte sequences will then be rendered correctly
Coloring in Windows cmd is done by enabling VT100 (virtual terminal) processing, available since ~2015 in Windows 10:
- Colors are encoded as escape sequences: the control char ESC (1B hex) followed by printable chars, e.g. ESC[31m for red
- Linux terminals have long worked this way; Windows cmd added support only recently
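Both kinds of control chars from the notes above can be shown in one Python sketch. The newline part relies on Python's "universal newlines" mode (`newline=None`). The color part uses the real Win32 console functions `GetStdHandle`, `GetConsoleMode` and `SetConsoleMode` with the `ENABLE_VIRTUAL_TERMINAL_PROCESSING` flag (0x0004); the helper names `read_text` and `enable_vt_mode` are hypothetical, and on non-Windows systems the terminal is assumed to understand VT sequences already.

```python
import io
import sys

# --- New lines: \r = 0x0D (CR), \n = 0x0A (LF), both control chars ---
windows_text = b"line 1\r\nline 2\r\n"
unix_text = b"line 1\nline 2\n"

def read_text(data: bytes) -> str:
    # newline=None: "universal newlines", \r\n and \r are read as \n
    return io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None).read()

assert read_text(windows_text) == read_text(unix_text) == "line 1\nline 2\n"

# --- Colors: ESC (0x1B) followed by printable chars, e.g. ESC [ 3 1 m ---
ESC = "\x1b"
RED, RESET = ESC + "[31m", ESC + "[0m"

def enable_vt_mode() -> bool:
    """Turn on VT sequence processing for stdout in the Windows console."""
    if sys.platform != "win32":
        return True  # assumption: the terminal already understands VT sequences
    import ctypes
    from ctypes import wintypes
    kernel32 = ctypes.windll.kernel32
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    STD_OUTPUT_HANDLE = -11
    ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # e.g. stdout is redirected to a file, not a console
    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))

if __name__ == "__main__":
    if enable_vt_mode():
        print(f"{RED}error:{RESET} something went wrong")
```

Only the first char of each color sequence is a control char; the rest (`[31m`) is ordinary printable ASCII, which is why a terminal without VT support prints it literally.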