Basic Data Types
Go has three basic data types: boolean, numeric, and string. Each type has a zero value that is assigned when a variable is declared without initialization.
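The zero-value rule can be checked with a few uninitialized declarations. This is a minimal sketch; the helper name `zeroValues` is only illustrative and is not part of any library.

```go
package main

import "fmt"

// zeroValues (hypothetical helper) returns the values Go assigns to
// variables that are declared without an initializer.
func zeroValues() (bool, int, float64, string) {
	var b bool
	var i int
	var f float64
	var s string
	return b, i, f, s
}

func main() {
	b, i, f, s := zeroValues()
	// %q quotes the string so that the empty value is visible.
	fmt.Printf("bool: %v, int: %v, float64: %v, string: %q\n", b, i, f, s)
	// bool: false, int: 0, float64: 0, string: ""
}
```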
Boolean
| Keyword | Values |
|---|---|
| bool | true or false |

Zero value: `false`.
Numeric
| Keyword | Size | Values |
|---|---|---|
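| uint8 / byte | 8-bit | 0 to 255 |
| uint16 | 16-bit | 0 to 65535 |
| uint32 | 32-bit | 0 to 4294967295 |
| uint64 | 64-bit | 0 to 18446744073709551615 |
| int8 | 8-bit | -128 to 127 |
| int16 | 16-bit | -32768 to 32767 |
| int32 / rune | 32-bit | -2147483648 to 2147483647 |
| int64 | 64-bit | -9223372036854775808 to 9223372036854775807 |
| float32 | 32-bit | approximately -3.4e+38 to 3.4e+38 |
| float64 | 64-bit | approximately -1.8e+308 to 1.8e+308 |
| uint | 32-bit or 64-bit (platform-dependent) | same range as uint32 or uint64 |
| int | 32-bit or 64-bit (platform-dependent) | same range as int32 or int64 |

`byte` is an alias for `uint8` and `rune` is an alias for `int32`. `int` and `uint` are distinct types, not aliases; only their size matches `int32`/`uint32` or `int64`/`uint64`.

Zero value: `0`.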
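The limits in the table are available as constants in the standard `math` package, and `math/bits` exposes the platform-dependent size of `int` and `uint`. The sketch below prints a few of them; the helpers `wrapUint8` and `intBits` are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// wrapUint8 (hypothetical helper) adds two uint8 values. Unsigned
// arithmetic wraps around modulo 256 at run time.
func wrapUint8(a, b uint8) uint8 {
	return a + b
}

// intBits (hypothetical helper) reports the size in bits of int and
// uint on the current platform.
func intBits() int {
	return bits.UintSize
}

func main() {
	fmt.Println(math.MinInt8, math.MaxInt8, math.MaxUint8) // -128 127 255
	fmt.Println(uint64(math.MaxUint64))                    // 18446744073709551615
	fmt.Println(intBits())                                 // 32 or 64, depending on the platform
	fmt.Println(wrapUint8(255, 1))                         // 0

	// byte and rune are aliases, so %T reports the underlying type names.
	var b byte = 'A'
	var r rune = 'é'
	fmt.Printf("%T %T\n", b, r) // uint8 int32
}
```

Note that the wrap-around only happens at run time: a constant expression such as `uint8(255 + 1)` is rejected by the compiler.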
String
| Keyword | Values |
|---|---|
| string | `"text in double quotes"` or `` `backticks` `` |

Zero value: `""` (the empty string).
Extra:
- A string written with double quotes is an interpreted string literal. Escape sequences such as `\n`, `\t`, `\uXXXX`, and `\xNN` are processed into the bytes they represent. It cannot span multiple lines in source code.
- A string written with backticks is a raw string literal. Its content is preserved exactly as written, with no escape processing. It may span multiple lines.
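The difference between the two literal forms shows up in the resulting bytes. In this small sketch the same four source characters are written once as an interpreted literal and once as a raw literal; the helper name `literals` is hypothetical.

```go
package main

import "fmt"

// literals (hypothetical helper) returns the same source text written
// as an interpreted string literal and as a raw string literal.
func literals() (interpreted, raw string) {
	interpreted = "a\tb" // \t is processed into a single tab byte
	raw = `a\tb`         // backslash and 't' are kept as two separate bytes
	return interpreted, raw
}

func main() {
	interpreted, raw := literals()
	fmt.Println(len(interpreted), []byte(interpreted)) // 3 [97 9 98]
	fmt.Println(len(raw), []byte(raw))                 // 4 [97 92 116 98]
}
```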
Technically, a string is a read-only slice of bytes:
```go
str := "abcd" // [97 98 99 100]
fmt.Printf("str[0]: %v, type: %T\n", str[0], str[0])
// str[0]: 97, type: uint8

for i, v := range str {
	fmt.Println(i, v)
}
// 0 97
// 1 98
// 2 99
// 3 100
```

But it's not always one byte per character. Some characters require more than one byte:
```go
str := "é"
fmt.Printf("str: %v, len: %v, bytes: %v\n", str, len(str), []byte(str))
// str: é, len: 2, bytes: [195 169]
```

`len()` returns the byte count, not the character count. This is because Go source code is UTF-8, so string literals are stored as UTF-8 encoded bytes.
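To count code points rather than bytes, the standard `unicode/utf8` package provides `utf8.RuneCountInString`. The sketch below compares both counts; the helper name `counts` is hypothetical. Keep in mind that a code point is not always the same as a user-perceived character (combining marks, for example, are separate code points).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts (hypothetical helper) returns the byte length of s and the
// number of Unicode code points it contains.
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("Héllo")
	fmt.Println(b, r) // 6 5

	// Converting to []rune allows indexing by code point instead of by byte.
	runes := []rune("Héllo")
	fmt.Println(string(runes[1])) // é
}
```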
UTF-8 uses variable length encoding. A character may use 1 to 4 bytes depending on its Unicode code point.
| First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
|---|---|---|---|---|---|
| 0 | 127 | 0yyyzzzz | | | |
| 128 | 2,047 | 110xxxyy | 10yyzzzz | | |
| 2,048 | 65,535 | 1110wwww | 10xxxxyy | 10yyzzzz | |
| 65,536 | 1,114,111 | 11110uvv | 10vvwwww | 10xxxxyy | 10yyzzzz |

Explanation:
The character "é" has Unicode code point `U+00E9` (233). Since 233 is in the range 128 to 2047, it uses two bytes in UTF-8:

```go
str := "é" // [195 169]
fmt.Printf("str[0]: %v\n", str[0]) // str[0]: 195
fmt.Printf("str[1]: %v\n", str[1]) // str[1]: 169
```

Proof (decoding):
`195` is `11000011` and `169` is `10101001`. Matching the bytes using the table above (2nd row):
- First byte: `11000011` (`110xxxyy`) -> `110` and `00011`.
- Second byte: `10101001` (`10yyzzzz`) -> `10` and `101001`.
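The same prefix stripping can be expressed with bit masks. The following is a minimal sketch, assuming the input really is a two-byte sequence: `decodeTwoByte` is a hypothetical helper (not a standard library function) that does no validation. In real code, `utf8.DecodeRune` from `unicode/utf8` is the safer choice.

```go
package main

import "fmt"

// decodeTwoByte (hypothetical helper) decodes a two-byte UTF-8 sequence
// of the form 110xxxyy 10yyzzzz. It does not validate the prefixes.
func decodeTwoByte(b1, b2 byte) rune {
	high := rune(b1 & 0x1F) // drop the 110 prefix, keep the low 5 bits
	low := rune(b2 & 0x3F)  // drop the 10 prefix, keep the low 6 bits
	return high<<6 | low    // concatenate the 5 + 6 payload bits
}

func main() {
	r := decodeTwoByte(195, 169)
	fmt.Printf("%d %U %c\n", r, r, r) // 233 U+00E9 é
}
```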
Combining `00011` and `101001` gives `00011101001` (binary) = `233` (decimal).

Extra:
When iterating with `range`, Go decodes each UTF-8 character and returns its Unicode code point together with the byte index at which it starts. That is why the index below jumps from 1 to 3:

```go
str := "Héllo" // [72 195 169 108 108 111]
for i, v := range str {
	fmt.Println(i, v)
}
// 0 72
// 1 233
// 3 108
// 4 108
// 5 111
```

A string does not always hold UTF-8 encoded bytes. Strings can contain arbitrary bytes, but when created from string literals, those bytes are (almost always) UTF-8.
```go
// Constructing a string directly from raw bytes:
var1 := string([]byte{0, 1, 2, 65, 195, 255, 195, 169, 0xff, 0xfd})
fmt.Printf("var1: %v, len: %v, bytes: %v\n", var1, len(var1), []byte(var1))
// var1: A��é��, len: 10, bytes: [0 1 2 65 195 255 195 169 255 253]

// String literal containing byte-level escape sequences:
var2 := "\xbd\x20\xb2\x3d\xbc\x48"
fmt.Printf("var2: %v, len: %v, bytes: %v\n", var2, len(var2), []byte(var2))
// var2: � �=�H, len: 6, bytes: [189 32 178 61 188 72]

// String constructed from a regular string literal:
var3 := "Hello"
fmt.Printf("var3: %v, len: %v, bytes: %v\n", var3, len(var3), []byte(var3))
// var3: Hello, len: 5, bytes: [72 101 108 108 111]
```

For example, the byte sequence for `var1` is `[0 1 2 65 195 255 195 169 255 253]`, where some readable characters can be spotted, such as `A` and `é`.
- `A` has Unicode code point 65, which is in the 0 to 127 range. In UTF-8, this range uses one byte, so 65 maps directly to `A`.
- `é` has Unicode code point 233, which is in the 128 to 2047 range. In UTF-8, this uses two bytes. In `var1`, these are 195 and 169, which together decode to 233.
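The bytes 0, 1, and 2 are valid but non-printable control characters. The remaining bytes (the first 195, both 255s, and 253) do not form valid UTF-8 sequences, so they are shown as replacement characters (`�`).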
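Whether a string holds valid UTF-8 can be checked with `utf8.ValidString`, and `utf8.DecodeRuneInString` returns `utf8.RuneError` with a width of 1 for each byte it cannot decode. The sketch below uses both on the byte sequence from `var1`; the helper name `invalidBytes` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidBytes (hypothetical helper) counts the bytes in s that are not
// part of a valid UTF-8 sequence.
func invalidBytes(s string) int {
	n := 0
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		// RuneError with size 1 signals an undecodable byte; a genuine
		// U+FFFD in the input would be reported with size 3.
		if r == utf8.RuneError && size == 1 {
			n++
		}
		i += size
	}
	return n
}

func main() {
	s := string([]byte{0, 1, 2, 65, 195, 255, 195, 169, 0xff, 0xfd})
	fmt.Println(utf8.ValidString(s), invalidBytes(s))             // false 4
	fmt.Println(utf8.ValidString("Hello"), invalidBytes("Hello")) // true 0
}
```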