Intro
Second part of my Rust learning notes, about Rust datatypes. This corresponds to chapter 3 of Programming Rust, 2nd Edition.
Fundamental Types
Coming from a Python-heavy background, I find Rust's type handling completely different: everything is statically typed, and there's a lot of nuance in the basic datatypes. Fortunately the compiler does a decent job of inferring types, which avoids much of the verbosity I disliked in Java.
Type overview
- Integer types: i8, i16, i32, i64, i128, u8, u16, u32, u64, u128
- Integer literals: 42, -5i8, 0x400u16, 0o644i16, 20_922_789_888_000u64, b'*'
- Address-width integers: isize, usize
- Floats: f32, f64
- Bool: bool
- Unicode character (32 bits): char
- Tuple (mixed types allowed!): (char, u8, i32)
- Unit (empty tuple): ()
- Named-field struct: struct S { x: f32, y: f32 }
- Tuple-like struct: struct T(i32, char);
- Unit-like struct (no fields): struct E;
- Enum: enum Attend { OnTime, Late(u32) }
- Box, an owning pointer to a value on the heap: Box<T>
- Shared and mutable references: &i32, &mut i32
- String, UTF-8, dynamically sized: String
- Reference to str, a non-owning pointer to UTF-8 text: &str
- Array, fixed length, all elements of the same type: [f64; 4], [u8; 256]
- Vector, variable length: Vec<u64>
- Reference to a slice: &[u8], &mut [u8]
- Optional value, either None or Some(v): Option<&str>
- Result of an operation that may fail, either Ok(v) or Err(e): Result<u64, Error>
- Trait object: &dyn Any, &mut dyn Read
- Function pointer: fn(&str) -> bool
- Closure, e.g.: |a, b| { a*a + b*b }
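To see a few of these types side by side, here is a small sketch that declares the struct and enum types from the overview and builds one value of each. The field values and the helper minutes_late are made up purely for illustration.

```rust
// The struct and enum declarations from the overview; the values below are arbitrary.
struct S { x: f32, y: f32 }
struct T(i32, char);
struct E;

enum Attend { OnTime, Late(u32) }

// Hypothetical helper, only for illustration: minutes late for an Attend value.
fn minutes_late(a: &Attend) -> u32 {
    match a {
        Attend::OnTime => 0,
        Attend::Late(m) => *m,
    }
}

fn main() {
    let s = S { x: 1.0, y: 2.5 };               // named fields
    let t = T(7, 'z');                          // fields accessed as t.0, t.1
    let _e = E;                                 // a value with no fields at all
    let tup: (char, u8, i32) = ('a', 255, -1);  // mixed-type tuple
    let late: Box<Attend> = Box::new(Attend::Late(5)); // enum value on the heap
    let on_time = Attend::OnTime;
    println!("{} {} {} {}", s.x + s.y, t.0, t.1, tup.1); // 3.5 7 z 255
    // &Box<Attend> is automatically dereferenced to &Attend here
    println!("{} {}", minutes_late(&late), minutes_late(&on_time)); // 5 0
}
```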
Numbers
Fixed-Width Numeric Types
For efficiency, Rust provides fixed-width integers from 8 to 128 bits, signed and unsigned, and floats of 32 and 64 bits (see above). isize/usize have the "native" width, i.e. the size of an address on the target machine, usually 32 or 64 bits.
Byte values are represented as u8; this is what you get when reading from a socket or a binary file.
One constraint regarding arrays: indices must be usize.
Integer literals can be suffixed with their type, e.g. 123u32. If the suffix is left off, Rust tries to infer the type from context, falling back to i32 if nothing constrains it. The prefixes 0x, 0o, and 0b denote hexadecimal, octal, and binary literals. For readability, underscores can be inserted between the digits, e.g. 20_922_789_888_000u64.
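A quick sketch of the literal notations and of usize indexing; the function names literals and third are made up for this example.

```rust
/// Returns a few literals written in different notations.
fn literals() -> (u32, u16, i16, u8, u64) {
    (
        123u32,                // type suffix
        0x400u16,              // hexadecimal: 1024
        0o644i16,              // octal: 420
        0b1010_1010u8,         // binary with an underscore for readability: 170
        20_922_789_888_000u64, // underscores as digit separators
    )
}

/// Indexing requires a usize; an i32 index would not compile.
fn third(arr: &[i32; 3]) -> i32 {
    let i: usize = 2;
    arr[i]
}

fn main() {
    let (a, b, c, d, e) = literals();
    let f = 42; // unconstrained integer literal: defaults to i32
    println!("{} {} {} {} {} {} {}", a, b, c, d, e, f, third(&[10, 20, 30]));
    // prints: 123 1024 420 170 20922789888000 42 30
}
```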
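Chars are not numeric: char is a distinct type, not a kind of integer as in C. For bytes, however, there are byte literals: b'A' and 65u8 are equivalent. Byte literals use backslashes for escaping, e.g. b'\'' for a single quote and b'\\' for a backslash. Arbitrary byte values can be written in hex as b'\xHH', e.g. the ASCII escape character: b'\x1b'.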
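A small sketch of byte literals and of the explicit cast needed to get a number out of a char; escaped_bytes is a name made up for this example.

```rust
/// Returns some escaped byte literals as their u8 values.
fn escaped_bytes() -> (u8, u8, u8) {
    (
        b'\'',   // single quote: 39
        b'\\',   // backslash: 92
        b'\x1b', // ASCII escape (ESC), written in hex: 27
    )
}

fn main() {
    assert_eq!(b'A', 65u8); // a byte literal is simply a u8
    let c = 'A';
    // let n: u8 = c;       // would not compile: char is not a numeric type
    let n = c as u8;        // an explicit cast is required
    assert_eq!(n, 65);
    println!("{:?}", escaped_bytes()); // (39, 92, 27)
}
```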
The standard library provides a number of integer methods, e.g. 2u16.pow(4) and (-4i32).abs(). Note the parentheses around the negative integer: method calls have higher precedence than unary minus, so -4i32.abs() would be parsed as -(4i32.abs()). Also note that while Rust usually infers types, a method call on an unsuffixed literal such as (-4).abs() is ambiguous; Rust refuses to guess and reports a compile error.
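The precedence pitfall is easy to demonstrate; int_methods is a hypothetical name used only to bundle the results.

```rust
fn int_methods() -> (u16, i32, i32) {
    let p = 2u16.pow(4);   // 16
    let a = (-4i32).abs(); // 4
    let b = -4i32.abs();   // parsed as -(4i32.abs()), i.e. -4
    (p, a, b)
}

fn main() {
    println!("{:?}", int_methods()); // (16, 4, -4)
    // (-4).abs() would not compile: the literal's integer type is ambiguous.
}
```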
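Integer overflow behaves differently depending on the build type: in a debug build Rust panics at runtime, in a release build (with default settings) the value silently wraps around. When you need a specific behavior, there are method families for checked operations (returning None on overflow), wrapping operations (always wrap around), saturating operations (clamp to the nearest representable value), and overflowing operations (return the wrapped result plus a flag telling whether overflow occurred). They have the prefixes checked_, wrapping_, saturating_, and overflowing_, e.g. checked_add().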
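Here is a sketch comparing the four method families on a u8 that overflows; overflow_variants is a made-up helper name.

```rust
fn overflow_variants(x: u8, y: u8) -> (Option<u8>, u8, u8, (u8, bool)) {
    (
        x.checked_add(y),     // None on overflow
        x.wrapping_add(y),    // wraps around modulo 256
        x.saturating_add(y),  // clamps to u8::MAX
        x.overflowing_add(y), // wrapped result plus an overflow flag
    )
}

fn main() {
    println!("{:?}", overflow_variants(250, 10)); // (None, 4, 255, (4, true))
    println!("{:?}", overflow_variants(250, 5));  // (Some(255), 255, 255, (255, false))
    // A plain x + y with 250 and 10 would panic in a debug build
    // and wrap to 4 in a release build (with default settings).
}
```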
Floating-Point Types
Rust's f32 and f64 correspond to the float and double types in C. When inferring the type of a floating-point literal, Rust picks f64 if both would be possible. Inference always keeps ints and floats separate: Rust never infers a float type for an integer literal or vice versa.
The std::f32::consts and std::f64::consts modules provide widely used constants such as E and PI. Note that there are the primitive types f32/f64, but also a module for each (std::f32 / std::f64).
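A sketch using one of the constants and showing that integers and floats must be converted explicitly; circle_area is a hypothetical helper.

```rust
use std::f64::consts::PI;

// Hypothetical helper: area of a circle with radius r.
fn circle_area(r: f64) -> f64 {
    PI * r * r
}

fn main() {
    let x = 1.5;          // unsuffixed float literal
    let n = 3;            // unsuffixed integer literal: defaults to i32
    // let y = n * x;     // would not compile: no implicit int/float mixing
    let y = n as f64 * x; // explicit conversion
    println!("{}", y);    // 4.5
    println!("{:.4}", circle_area(2.0)); // 12.5664
    println!("{}", std::f32::consts::E);
}
```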
Bool
Comparison operators produce bool values, and control structures such as if and while only accept a bool condition. So you must write if x != 0 { ... }; the Pythonesque if x { ... } is not allowed.
The as operator converts bool to integers (false as i32 == 0, true as i32 == 1); the other way around is not possible, though.
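A short sketch of bool conditions and the one-way cast; is_nonzero is a made-up helper name.

```rust
fn is_nonzero(x: i32) -> bool {
    x != 0 // a comparison yields a bool
}

fn main() {
    let x = 5;
    // if x { ... }        // would not compile: expected bool, found integer
    if is_nonzero(x) {
        println!("x is non-zero");
    }
    println!("{} {}", true as i32, false as u8); // 1 0
    // let b = 1 as bool;  // would not compile: only bool-to-integer casts exist
}
```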
Characters
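A char represents a single Unicode character as a 32-bit value. Strings, on the other hand, are encoded as UTF-8, i.e. a String is not a vector of chars.
Char literals are written in single quotes, e.g. '*'. Characters in the range U+0000 to U+007F can also be written in hex as '\xHH', and any Unicode character as '\u{HHHHHH}' (up to six hex digits).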
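The difference between chars and UTF-8 bytes shows up as soon as a string contains a non-ASCII character. A sketch (byte_and_char_counts is a hypothetical helper):

```rust
/// Returns (number of UTF-8 bytes, number of chars) for a string.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let star = '\x2A';      // hex form, only for U+0000..=U+007F
    let heart = '\u{2764}'; // any Unicode code point
    assert_eq!(star, '*');
    assert_eq!(std::mem::size_of::<char>(), 4); // a char is always 32 bits
    println!("{} {}", star, heart as u32); // * 10084
    println!("{}", heart);
    // "h\u{e9}llo" is "héllo": the é takes two bytes in UTF-8
    println!("{:?}", byte_and_char_counts("h\u{e9}llo")); // (6, 5)
}
```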
Tuples
Tuple elements don't have to be of the same type. They can only be indexed with a constant, e.g. t.2
Functions often return tuples; use pattern-matching syntax to unpack them:
let (head, tail) = text.split_at(21);
The zero-tuple () (the unit type) is used where the context requires a type but there is no meaningful value to carry. E.g. a function that doesn't return a value has the return type ().
Btw, trailing commas are allowed wherever commas are used: in tuples, function parameters, arrays, etc.
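A sketch pulling the tuple points together. The sample text is made up, and split_demo and nothing are hypothetical helper names.

```rust
// Any string with at least 21 ASCII bytes would work here.
fn split_demo(text: &str) -> (&str, &str) {
    text.split_at(21) // returns a tuple of two string slices
}

fn nothing() {} // no return value: the return type is ()

fn main() {
    let t = ('x', 7u8, -3i32,);          // trailing comma is allowed
    println!("{} {} {}", t.0, t.1, t.2); // constant indices only
    let (head, tail) = split_demo("I see the eigenvalue in thine eye");
    println!("[{}] [{}]", head, tail);   // [I see the eigenvalue ] [in thine eye]
    let unit: () = nothing();
    assert_eq!(unit, ());
}
```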
Pointer Types
References
This is something I really dig about Rust: how it exposes pointers explicitly, but in a safe way. This gives C-like control while still preventing the kind of catastrophic faults that plague C/C++. And separating references into shared (read-only) and exclusive (read-write) helps a lot with safety, also in the face of concurrency.
&T: a shared, read-only reference
&mut T: an exclusive, read-write reference
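A minimal sketch of the shared vs. exclusive rule; add_one is a made-up function.

```rust
fn add_one(n: &mut i32) {
    *n += 1; // an exclusive reference may modify its target
}

fn main() {
    let mut x = 10;
    let r1 = &x;
    let r2 = &x;               // any number of shared references at once
    println!("{} {}", r1, r2); // last use of r1 and r2
    add_one(&mut x);           // now an exclusive reference is allowed
    // println!("{}", r1);     // would not compile: r1 would overlap the &mut borrow
    println!("{}", x);         // 11
}
```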
Box
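Box::new allocates a value on the heap and returns an owning pointer to it: let b = Box::new(somevalue); When b goes out of scope (and hasn't been moved elsewhere), the value is dropped and the heap memory freed.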
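A small sketch of boxing a tuple; boxed_pair is a hypothetical helper and the values are arbitrary.

```rust
fn boxed_pair() -> Box<(i32, &'static str)> {
    let t = (12, "eggs");
    Box::new(t) // moves the tuple into a heap allocation
}

fn main() {
    let b = boxed_pair();
    println!("{} {}", b.0, b.1);  // field access dereferences the Box automatically
    let inner: (i32, &str) = *b;  // explicit deref reads the value out of the box
    assert_eq!(inner.0, 12);
} // b goes out of scope here and its heap memory is freed
```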
Raw Pointers
Just like C pointers. Creating one is safe, but it may only be dereferenced in an unsafe block, so this is where the foot-shooting area begins.
*const T: read-only
*mut T: read-write
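A sketch of creating and dereferencing raw pointers. bump_via_raw is a hypothetical helper for illustration only; in real code a plain *x += 1 through the &mut is the normal way.

```rust
// Increments the target through a raw pointer (illustration only).
fn bump_via_raw(x: &mut i32) {
    let p: *mut i32 = x; // creating a raw pointer is safe
    unsafe {
        *p += 1;         // dereferencing it requires unsafe
    }
}

fn main() {
    let mut x = 5;
    bump_via_raw(&mut x);
    let q: *const i32 = &x; // read-only raw pointer
    let value = unsafe { *q };
    println!("{}", value);  // 6
}
```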
Arrays, Vectors, and Slices
[T; N]
Array of N values, each of type T. Arrays have a constant size, and the size is part of the type. E.g. let x = [true; 10000] is an array of 10000 bools, all set to true, and let buf = [0u8; 1024] is a 1 KiB buffer, zero-initialized.
Vec<T>
Vector, dynamically allocated on the heap, like a Python list
&[T], &mut [T]
A shared slice of Ts and a mutable slice of Ts. A slice reference can point into an array or a vector; it is a fat pointer consisting of a pointer to the first element and the length. Shared vs. mutable works as for other references.
For generality, slice references should be used in function signatures wherever either an array or a vector would work. Rust automatically converts a reference to an array (or to a vector) into a slice reference, so many methods defined on slices are available for arrays as well.
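A sketch of slice parameters accepting arrays, vectors, and sub-slices alike; sum and double_all are made-up names.

```rust
// Accepts arrays, vectors, and sub-slices alike.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

// A mutable slice lets the function modify elements in place (but not change the length).
fn double_all(values: &mut [f64]) {
    for v in values.iter_mut() {
        *v *= 2.0;
    }
}

fn main() {
    let arr = [1.0, 2.0, 3.0];
    let mut v = vec![4.0, 5.0, 6.0];
    println!("{}", sum(&arr));    // 6   (&[f64; 3] coerces to &[f64])
    println!("{}", sum(&v));      // 15  (&Vec<f64> coerces to &[f64])
    println!("{}", sum(&v[1..])); // 11  (a sub-slice)
    double_all(&mut v);
    println!("{:?}", v);          // [8.0, 10.0, 12.0]
}
```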
For vectors there is the vec![] macro. Some examples:
Creating from a literal:
let mut nums = vec![1,2,3,4]
Adding a new elem:
nums.push(123)
Collecting items from some iterable into a vector:
let v: Vec<i32> = (0..5).collect()
Similar to arrays, vectors also can use methods defined on slices, e.g.:
Length:
nums.len()
Sort:
nums.sort()
Reverse:
nums.reverse()
A new empty vector can be created with Vec::new(). As an optimization, you can pre-allocate capacity with Vec::with_capacity(N) if you have a good guess what the final size will be. This avoids the reallocations (and copying of the elements) that otherwise happen whenever the vector outgrows its capacity.
Some vector methods:
Insert:
ve.insert(pos, val)
Remove:
ve.remove(pos)
Remove and return last:
ve.pop()
Iterating over a vector:
let nums = vec![1, 2, 3, 4];
// iterating by value consumes nums; loop over &nums to keep using it afterwards
for x in nums {
    println!("=> {}", x);
}
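The vector operations listed in this section can be tried together in one sketch; vec_demo is a hypothetical helper and the numbers are arbitrary.

```rust
fn vec_demo() -> (Vec<i32>, i32, Option<i32>) {
    let mut nums = vec![3, 1, 4, 1];
    nums.push(5);                 // [3, 1, 4, 1, 5]
    nums.insert(0, 9);            // [9, 3, 1, 4, 1, 5]
    let removed = nums.remove(2); // takes out the 1 at index 2
    nums.sort();                  // [1, 3, 4, 5, 9]
    nums.reverse();               // [9, 5, 4, 3, 1]
    let last = nums.pop();        // Some(1), leaves [9, 5, 4, 3]
    (nums, removed, last)
}

fn main() {
    let (nums, removed, last) = vec_demo();
    for x in &nums {              // borrow: nums is still usable afterwards
        println!("=> {}", x);
    }
    println!("{} {:?} {}", removed, last, nums.len()); // 1 Some(1) 4

    let squares: Vec<i32> = (0..5).map(|i| i * i).collect();
    let mut buf: Vec<u8> = Vec::with_capacity(1024); // room for 1024 bytes, length 0
    buf.push(42);                 // fits in the pre-allocated capacity
    println!("{:?} {} {}", squares, buf.len(), buf.capacity() >= 1024); // [0, 1, 4, 9, 16] 1 true
}
```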
Slices are regions of arrays or vectors. Since a slice can have any length, it can't be stored directly in a variable; only references to slices can.
String Literals
Standard string literals are double-quoted and can span several lines; they allow the usual backslash escapes. If escapes are not wanted, there are raw strings, r"some \raw text", and, to include double quotes, raw strings delimited by a number of hash signs: r###"Look ma -- a \" quote"###. Raw strings don't recognize escapes.
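A sketch contrasting normal and raw string literals. The strings are made up, and TWO_LINES is a hypothetical constant whose second line deliberately starts at column 0.

```rust
// Spans two source lines, so the string contains a newline.
const TWO_LINES: &str = "first line
second line";

fn main() {
    assert_eq!(TWO_LINES, "first line\nsecond line");

    // A backslash at the end of a line skips the newline and the next line's leading spaces.
    let joined = "one, \
                  two";
    assert_eq!(joined, "one, two");

    let path = r"C:\Users\no\escapes";           // backslashes kept as-is
    assert_eq!(path, "C:\\Users\\no\\escapes");

    let quoted = r###"Look ma -- a \" quote"###; // the hashes allow a bare " inside
    assert_eq!(quoted, "Look ma -- a \\\" quote");
    println!("{}\n{}\n{}", joined, path, quoted);
}
```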
Byte Strings
A byte string is a sequence of u8 values rather than Unicode text: b"GET" has the type &[u8; 3]. There is also a raw variant: br"POST".
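A sketch of byte strings in use; is_get is a hypothetical helper that checks the start of a raw request line.

```rust
// Hypothetical helper: checks whether a raw request line starts with the GET method.
fn is_get(request_line: &[u8]) -> bool {
    request_line.starts_with(b"GET")
}

fn main() {
    let method: &[u8; 3] = b"GET";                   // a byte string is an array of u8
    assert_eq!(method, &[b'G', b'E', b'T']);
    let raw = br"C:\no\escapes";                     // raw byte string
    assert_eq!(raw[2], b'\\');
    println!("{} {}", is_get(b"GET /index.html"), is_get(b"POST /form")); // true false
}
```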
Using Strings
A &str is similar to a &[T]: a fat pointer to some data (which can come from a string literal or a String). A String, on the other hand, resembles a Vec<T>: it lives on the heap, is resizable, etc.
A String can be created from a &str with the .to_string() method, which copies the text. The format!() macro builds a new String, interpolating its arguments the same way println!() does.
Slices of strings (e.g. [&str] or [String]) have the methods .concat() and .join(), which produce a single String from several strings. Strings have the assortment of methods you'd expect for case conversion, finding and replacing, trimming, etc. Strings can also be compared with the usual comparison operators.
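A sketch exercising the String operations mentioned above; greeting is a made-up helper and the words are arbitrary.

```rust
fn greeting(name: &str) -> String {
    format!("hello, {}!", name) // builds a new String, interpolating like println!
}

fn main() {
    let literal: &str = "world";
    let owned: String = literal.to_string();  // copies the text into a heap String
    let part: &str = &owned[1..4];            // "orl", a slice borrowing from owned
    println!("{} {}", greeting(&owned), part); // hello, world! orl

    let words = ["veni", "vidi", "vici"];
    assert_eq!(words.concat(), "venividivici");
    assert_eq!(words.join(", "), "veni, vidi, vici");

    assert_eq!("  Mixed Case ".trim().to_uppercase(), "MIXED CASE");
    assert_eq!("a-b-c".replace('-', "+"), "a+b+c");
    assert!("apple" < "banana");              // lexicographic comparison
    assert!(owned == "world");                // a String compares with a &str
}
```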
Strings in Rust are always valid UTF-8. For other kinds of text-like data:
- use Vec<u8> for binary data that isn't UTF-8
- use std::path::PathBuf for filesystem paths
- use std::ffi::OsString for things like CLI args or environment variables
- and finally std::ffi::CString for null-terminated strings when interoperating with C
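A sketch touching each of these types once. The file and argument names are made up, and is_valid_utf8 is a hypothetical helper.

```rust
use std::ffi::{CString, OsStr, OsString};
use std::path::PathBuf;

// Hypothetical helper: true if the bytes form valid UTF-8 (and so could become a String).
fn is_valid_utf8(bytes: Vec<u8>) -> bool {
    String::from_utf8(bytes).is_ok()
}

fn main() {
    // Binary data: the byte 0xff can never appear in UTF-8.
    assert!(!is_valid_utf8(vec![0x66, 0x6f, 0xff]));

    // Paths: PathBuf inserts the platform's separator for us.
    let mut path = PathBuf::from("data");
    path.push("notes.txt");
    assert_eq!(path.file_name(), Some(OsStr::new("notes.txt")));

    // OS strings: conversion to &str may fail, so to_str() returns an Option.
    let arg = OsString::from("--verbose");
    assert_eq!(arg.to_str(), Some("--verbose"));

    // C strings: a trailing NUL is added, interior NULs are rejected.
    let c = CString::new("hi").expect("no interior NUL");
    assert_eq!(c.as_bytes_with_nul(), b"hi\0");
    assert!(CString::new("a\0b").is_err());
    println!("{}", path.display());
}
```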
Type Aliases
The type keyword declares an alias for an existing type, similar to typedef in C: type Bytes = Vec<u8>;
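A sketch showing that an alias is just another name for the same type; decode is a hypothetical helper.

```rust
type Bytes = Vec<u8>;

// Hypothetical helper: interprets the bytes as UTF-8, replacing invalid sequences.
fn decode(data: &Bytes) -> String {
    String::from_utf8_lossy(data).into_owned()
}

fn main() {
    let b: Bytes = vec![b'h', b'i'];
    let v: Vec<u8> = b.clone(); // an alias is not a new type: Bytes and Vec<u8> are interchangeable
    println!("{} {}", decode(&b), decode(&v)); // hi hi
}
```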
Coda
This concludes the basic Rust types. There's a rich selection of datatypes, in line with Rust's efficiency mission. The most important and interesting bits to me were the strict division between immutable and mutable, and the way Rust handles references, again with the distinction between shared-and-read-only and mutable-but-exclusive. This is gold for safety, especially in concurrent code.