Notessh2a

Basic Data Types

Go has three basic data types: boolean, numeric, and string. Each type has a zero value that is assigned when a variable is declared without initialization.
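As a minimal sketch of the zero-value rule, the program below declares one variable of each kind without initializing it and prints the result. The helper name zeroValues is made up for this illustration.

```go
package main

import "fmt"

// zeroValues (a hypothetical helper) declares variables without
// initializing them and returns whatever Go assigned to them.
func zeroValues() (bool, int, float64, string) {
	var b bool
	var i int
	var f float64
	var s string
	return b, i, f, s
}

func main() {
	b, i, f, s := zeroValues()
	// %q quotes the string so the empty value is visible.
	fmt.Printf("bool: %v, int: %v, float64: %v, string: %q\n", b, i, f, s)
	// bool: false, int: 0, float64: 0, string: ""
}
```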

Boolean

Keyword    Values
bool       true or false

Zero value: false.

Numeric

Keyword       Size               Values
uint8/byte    8-bit              0 to 255
uint16        16-bit             0 to 65535
uint32        32-bit             0 to 4294967295
uint64        64-bit             0 to 18446744073709551615
int8          8-bit              -128 to 127
int16         16-bit             -32768 to 32767
int32/rune    32-bit             -2147483648 to 2147483647
int64         64-bit             -9223372036854775808 to 9223372036854775807
float32       32-bit             approx. -3.4e+38 to 3.4e+38
float64       64-bit             approx. -1.8e+308 to 1.8e+308
uint          32-bit or 64-bit   same range as uint32 or uint64, depending on the platform
int           32-bit or 64-bit   same range as int32 or int64, depending on the platform

Zero value: 0.
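A few of these limits can be checked against the constants in the standard math package, and the platform-dependent size of int and uint is available as bits.UintSize. The sketch below assumes nothing beyond the standard library. The helper name platformIntBits is invented for the example.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// platformIntBits (a hypothetical helper) reports the size in bits
// of int and uint on the current platform: 32 or 64.
func platformIntBits() int {
	return bits.UintSize
}

func main() {
	fmt.Println(math.MaxUint8, math.MinInt8, math.MaxInt8) // 255 -128 127
	fmt.Println(math.MaxInt32)                             // 2147483647
	fmt.Println(platformIntBits())                         // 32 or 64, depending on the platform

	// byte and rune are aliases, so %T reports the underlying names.
	var b byte = 'A'
	var r rune = 'é'
	fmt.Printf("%T %T\n", b, r) // uint8 int32
}
```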

String

Keyword    Values
string     "text in double quotes" or `backticks`

Zero value: "".

Extra:

    • A string written with double quotes is an interpreted string literal. Escape sequences such as \n, \t, \uXXXX and \xNN are processed into their byte values. It cannot span multiple lines in source code.
    • A string written with backticks is a raw string literal. Its content is preserved exactly as written with no escape processing. It may span multiple lines.
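The difference between the two literal forms can be seen by writing the same characters both ways. This is only a sketch, and the helper name literals is made up for it.

```go
package main

import "fmt"

// literals (a hypothetical helper) returns the characters a\tb written
// first as an interpreted literal and then as a raw literal.
func literals() (interpreted, raw string) {
	interpreted = "a\tb" // \t is processed into a single tab byte
	raw = `a\tb`         // the backslash and the t are kept as two characters
	return interpreted, raw
}

func main() {
	interpreted, raw := literals()
	fmt.Println(len(interpreted), len(raw)) // 3 4

	// A raw literal may span lines; the line break becomes part of the string.
	multi := `line one
line two`
	fmt.Printf("%q\n", multi) // "line one\nline two"
}
```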
  • Technically, a string is a read-only slice of bytes:

    str := "abcd" // [97 98 99 100]
    
    fmt.Printf("str[0]: %v, type: %T\n", str[0], str[0]) // str[0]: 97, type: uint8
    
    for i, v := range str {
    	fmt.Println(i, v)
    }
    
    // 0 97
    // 1 98
    // 2 99
    // 3 100

    But it's not always one byte per character. Some characters require more than one byte:

    str := "é"
    fmt.Printf("str: %v, len: %v, bytes: %v", str, len(str), []byte(str)) // str: é, len: 2, bytes: [195 169]

    len() returns the byte count, not the character count.

    This is because Go source files are UTF-8, so the text of a string literal is stored as UTF-8 encoded bytes.

    UTF-8 uses variable length encoding. A character may use 1 to 4 bytes depending on its Unicode code point.

    First code point    Last code point    Byte 1      Byte 2      Byte 3      Byte 4
    0                   127                0yyyzzzz
    128                 2,047              110xxxyy    10yyzzzz
    2,048               65,535             1110wwww    10xxxxyy    10yyzzzz
    65,536              1,114,111          11110uvv    10vvwwww    10xxxxyy    10yyzzzz
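The byte counts in the table can be checked with the standard unicode/utf8 package. The sketch below compares len with utf8.RuneCountInString, then asks utf8.RuneLen for the encoded size of one sample code point from each row. The helper name byteAndRuneCount and the sample characters are chosen for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (a hypothetical helper) returns the number of bytes
// and the number of code points in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(byteAndRuneCount("é")) // 2 1

	// One sample code point from each row of the table.
	for _, r := range []rune{'A', 'é', '€', '😀'} {
		fmt.Printf("U+%04X needs %d byte(s)\n", r, utf8.RuneLen(r))
	}
	// U+0041 needs 1 byte(s)
	// U+00E9 needs 2 byte(s)
	// U+20AC needs 3 byte(s)
	// U+1F600 needs 4 byte(s)
}
```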

    Explanation:

    The character "é" has Unicode code point U+00E9 (233). Since 233 is in the range 128 to 2047, it uses two bytes in UTF-8:

    str := "é" // [195 169]
    
    fmt.Printf("str[0]: %v\n", str[0]) // str[0]: 195
    fmt.Printf("str[1]: %v\n", str[1]) // str[1]: 169

    Proof (decoding):

    195 is 11000011 and 169 is 10101001.

    Matching the bytes using the table above (2nd row):

    • First byte: 11000011 (110xxxyy) -> 110 and 00011.
    • Second byte: 10101001 (10yyzzzz) -> 10 and 101001.

    Combining 00011 and 101001 gives 00011101001 (binary) = 233 (decimal).
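The same decoding can be written with bit operations. The sketch below handles only the two-byte row of the table and does no validation. decodeTwoByte is a name invented for the example, and utf8.DecodeRuneInString from the standard library is used as a cross-check.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeTwoByte (a hypothetical helper) decodes a two-byte UTF-8 sequence by hand.
// It assumes b0 has the form 110xxxyy and b1 the form 10yyzzzz.
func decodeTwoByte(b0, b1 byte) rune {
	high := rune(b0 & 0b00011111) // drop the 110 prefix, keep 5 payload bits
	low := rune(b1 & 0b00111111)  // drop the 10 prefix, keep 6 payload bits
	return high<<6 | low
}

func main() {
	fmt.Println(decodeTwoByte(195, 169)) // 233

	// The standard library decoder agrees and also reports the width in bytes.
	r, size := utf8.DecodeRuneInString("é")
	fmt.Println(r, size) // 233 2
}
```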

    Extra:

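    When iterating with range, Go decodes one UTF-8 character per iteration and yields the byte index where it starts together with its Unicode code point (a rune). The index can therefore skip values: in the output below it jumps from 1 to 3 because é occupies two bytes.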

    str := "Héllo" // [72 195 169 108 108 111]
    
    for i, v := range str {
    	fmt.Println(i, v)
    }
    
    // 0 72
    // 1 233
    // 3 108
    // 4 108
    // 5 111
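For contrast, the sketch below walks the same string in three ways: with range (code points), by index (bytes), and after a conversion to []rune. The helper name runeStarts is made up for this illustration.

```go
package main

import "fmt"

// runeStarts (a hypothetical helper) returns the byte index at which
// each code point of s begins.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	str := "Héllo"
	fmt.Println(runeStarts(str)) // [0 1 3 4 5]

	// Indexing walks bytes, not code points.
	for i := 0; i < len(str); i++ {
		fmt.Print(str[i], " ")
	}
	fmt.Println() // the line reads: 72 195 169 108 108 111

	// Converting to []rune gives one element per code point.
	fmt.Println([]rune(str)) // [72 233 108 108 111]
}
```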

    A string does not always hold UTF-8 encoded bytes. Strings can contain arbitrary bytes, but when created from string literals, those bytes are (almost always) UTF-8.

    // Constructing a string directly from raw bytes:
    var1 := string([]byte{0, 1, 2, 65, 195, 255, 195, 169, 0xff, 0xfd})
    fmt.Printf("var1: %v, len: %v, bytes: %v\n", var1, len(var1), []byte(var1))
    // var1: A��é��, len: 10, bytes: [0 1 2 65 195 255 195 169 255 253]
    // (bytes 0, 1 and 2 are control characters and are not visible in the output)
    
    // String literal containing byte-level escape sequences:
    var2 := "\xbd\x20\xb2\x3d\xbc\x48"
    fmt.Printf("var2: %v, len: %v, bytes: %v\n", var2, len(var2), []byte(var2))
    // var2: � �=�H, len: 6, bytes: [189 32 178 61 188 72]
    
    // String constructed from a regular string literal:
    var3 := "Hello"
    fmt.Printf("var3: %v, len: %v, bytes: %v\n", var3, len(var3), []byte(var3))
    // var3: Hello, len: 5, bytes: [72 101 108 108 111]

    For example, the byte sequence for var1 is [0 1 2 65 195 255 195 169 255 253], where some readable characters can be spotted, such as A and é.

    • A has Unicode code point 65, which is in the 0 to 127 range. In UTF-8, this range uses one byte, so 65 maps directly to A.
    • é has Unicode code point 233, which is in the 128 to 2047 range. In UTF-8, this uses two bytes. In var1, these are 195 and 169, which together decode to 233.

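    The bytes 0, 1 and 2 are valid but unprintable control characters. The remaining bytes (195 followed by 255, and the trailing 255 and 253) do not form valid UTF-8 sequences, so they are displayed as the replacement character (�, U+FFFD).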
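Whether a string holds valid UTF-8 can be tested with utf8.ValidString. The sketch below also counts the offending bytes, relying on the fact that range yields U+FFFD with a width of one byte for each byte it cannot decode. countInvalid is a name invented for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countInvalid (a hypothetical helper) returns how many bytes of s
// are not part of a valid UTF-8 sequence.
func countInvalid(s string) int {
	n := 0
	for i, r := range s {
		if r == utf8.RuneError {
			// A genuine U+FFFD in the input is valid and three bytes wide,
			// so check the decoded width to tell the two cases apart.
			if _, size := utf8.DecodeRuneInString(s[i:]); size == 1 {
				n++
			}
		}
	}
	return n
}

func main() {
	var1 := string([]byte{0, 1, 2, 65, 195, 255, 195, 169, 0xff, 0xfd})
	fmt.Println(utf8.ValidString(var1))    // false
	fmt.Println(countInvalid(var1))        // 4
	fmt.Println(utf8.ValidString("Hello")) // true
}
```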
