Structs blog post initial version

4 years ago · c694f1ab3d
parent 3d79740786
commit c694f1ab3d
8 changed files with 309 additions and 42 deletions
--- a/media/smolbotbot.jpeg
+++ b/media/smolbotbot.jpeg
--- a/media/struct_diagram_1.png
+++ b/media/struct_diagram_1.png
--- a/media/struct_diagram_2.png
+++ b/media/struct_diagram_2.png
--- a/media/struct_diagram_3.png
+++ b/media/struct_diagram_3.png
--- a/media/struct_diagram_4.png
+++ b/media/struct_diagram_4.png
--- a/media/struct_diagram_5.png
+++ b/media/struct_diagram_5.png
--- a/src/structs.toml.md
+++ b/src/structs.toml.md
@ -0,0 +1,309 @@
 title = 'How the struct gets made'
 subtitle = 'In which peek behind the curtain to see how compilers represent our data types.'
 author = 'Tom Panton'
 tags = []
 ---
 A while back I came across a question online asking why Rust uses a different layout for structs
 than C. "Layout" here refers to the way a struct gets represented as a sequence of bytes in memory.
 I think it's an excellent question, and it gives us an excuse to mess around with a debugger to see
 what's going on in memory, so let's have a go at answering it!
 To know _why_ the two languages lay out structs differently, we first need to know _how_ they lay
 them out. Let's define a little test struct in C for us to poke and prod at:
 ```C
 #include <stdint.h>
 struct TestStruct {
    uint32_t x; // 32 bits = 4 bytes
    uint64_t y; // 64 bits = 8 bytes
    uint32_t z; // 32 bits = 4 bytes
 };
 ```
 At a glance, representing `TestStruct` in memory doesn't seem like a particularly difficult thing
 to do. My first guess is that it can be 16 contiguous bytes where the first 4 bytes represent `x`,
 the next 8 represent `y` and the last 4 represent `z`. Something like this:
 ![A row of squares representing TestStruct, with each square representing one byte. The first four squares are labelled x, the next eight are labelled y and the last four are labelled z.](/article_media/struct_diagram_1.png)
 Let's do a quick experiment to see if I'm right! We'll use C's `sizeof` operator find the size of
 `TestStruct` in bytes. If my prediction is correct, it should be 16 bytes.
 ```C
@@struct_size.c@@
 #include <stdint.h>
 #include <stdio.h>
 struct TestStruct {
    uint32_t x;
    uint64_t y;
    uint32_t z;
 };
 int main() {
    printf("%zu bytes\n", sizeof (struct TestStruct));
    return 0;
 }
 ```
 Let's run it and see what we get:
 ```
 $ clang struct_size.c
 $ ./a.out
 24 bytes
 ```
 24 bytes?! I was wrong! So what the heck is C doing here?
 Well, let's take a look using a debugger! We'll create a `TestStruct` variable and fill its fields
 with some values that will be easy to spot later:
 ```C
@@struct_layout.c@@
 #include <stdint.h>
 struct TestStruct {
    uint32_t x;
    uint64_t y;
    uint32_t z;
 };
 int main() {
    struct TestStruct test;
    test.x = 0xcafebabe;
    test.y = 0x0123456789abcdef;
    test.z = 0xfeedface;
    return 0;
 }
 ```
 Now let's compile it and load it into LLDB:
 ```
 $ clang -g struct_layout.c
 $ lldb a.out
 Current executable set to '/home/tom/structs/a.out' (x86_64).
 ```
 Let's put a breakpoint to pause the program right before `main` returns on line 14, and then we can
 read the 24 bytes of memory representing our `test` variable. To make it a little easier to read,
 we'll ask LLDB to organise the bytes into groups of four.
 ```
 (lldb) breakpoint set --file struct_layout.c --line 14
 (lldb) run
 (lldb) memory read --format x --size 4 --count `24 / 4` `&test`
 0x7fffffffea10: 0xcafebabe 0x00000000 0x89abcdef 0x01234567
 0x7fffffffea20: 0xfeedface 0x00000000
 ```
 We can see `0xcafebabe` which is the value we stored in `test.x`, `0x0123456789abcdef` which we
 stored in `test.y` (the two groups of four bytes are displayed in reverse because my machine is
 [little-endian](https://en.wikipedia.org/wiki/Endianness)) and `0xfeedface` which we stored in
 `test.z`. However, there are also some bytes that we didn't tell C to store: there's four bytes
 of zeroes sitting between `test.x` and `test.y`, and another four bytes of zeroes after `test.z`!
 Although we don't know what these extra 8 bytes are doing there yet, at least we now have a more
 accurate idea of how `TestStruct` looks in memory:
 ![An updated version of the previous diagram. Four additional squares labelled with question marks have been added between x and y, and another four squares labelled by question marks have been added to the end.](/article_media/struct_diagram_2.png)
 To understand what those extra bytes are there for, we first need to know about _alignment_.
 ## So, what's this "alignment" stuff?
 Every type has both a size and an alignment. Whereas the size determines how much memory is
 required to represent the type, the alignment determines where that memory is allowed to be.
 The rule for alignment is simple:
 > A value should be stored at a memory address that is a multiple of its alignment.
 Most modern CPUs expect this rule to be followed; if it's broken, a variety of platform-dependent
 Bad Things can happen such as performance penalties, [crashes](https://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf#page=93), and [changes to the atomicity guarantees of instructions](https://www.amd.com/system/files/TechDocs/24593.pdf#page=252).
 Let's look at some examples. Similar to the `sizeof` operator, we can use the `alignof` operator
 to find the alignment of a particular type:
 ```C
@@alignment.c@@
 #include <stdalign.h>
 #include <stdint.h>
 #include <stdio.h>
 int main() {
    printf("uint8_t:  %zu\n", alignof (uint8_t));
    printf("uint32_t: %zu\n", alignof (uint32_t));
    printf("uint64_t: %zu\n", alignof (uint64_t));
    return 0;
 }
 ```
 ```
 $ clang alignment.c
 $ ./a.out
 uint8_t:  1
 uint32_t: 4
 uint64_t: 8
 ```
 C tells us that `uint8_t` has an alignment of 1. Every memory address is a multiple of 1, so that
 means it's ok for a `uint8_t` to live at any memory address. `uint32_t`, on the other hand, has
 an alignment of 4, so it can only live at memory addresses 0, 4, 8, 12, 16 and so on.
 Now we can explain what the mysterious extra bytes in `TestStruct` are there for! The 4 bytes
 between `x` and `y` are _padding_ to ensure that `y` follows the rule of alignment. `y` is a
 `uint64_t` which has an alignment of 8 (on 64-bit platforms); without the padding it would be at
 offset 4, which is not a multiple of 8, but when we add the 4 bytes of padding it ends up at offset
 8 instead, which is of course a multiple of 8.
 ![A diagram showing the struct with and without the padding bytes added between x and y. Without the padding, x is immediately followed by y; since x has size 4, y is at offset 4 which is not a multiple of 8. With the 4 bytes of padding between x and y, y is at offset 8 which is a multiple of 8.](/article_media/struct_diagram_3.png)
 The 4 bytes of padding after `z` are there to make sure the rule of alignment is followed when we
 have an _array_ of `TestStruct`. Suppose we have an array `struct TestStruct a[2]`; arrays are
 represented by just storing the elements contiguously in memory, so without the
 padding after `z`, `a[1].y` would be at offset 20 + 8 = 28 from the start of the array, which is
 not a multiple of 8 so it would break the rule of alignment. With the padding after `z` included,
 `a[1].y` is at offset 24 + 8 = 32 from the start of the array, which is a multiple of 8.
 ![A diagram showing an array of two TestStructs with and without the padding after z. Both arrays include the padding between x and y, however. Without the 4 bytes of padding, the y field of the second element ends up at offset 28, which is not a multiple of 8. With the padding, it ends up at offset 32, which is a multiple of 8.](/article_media/struct_diagram_4.png)
 Ok, so now we have a sense of how C lays out structs; the fields are put in memory in the same
 order as we wrote them in the struct definition, and extra padding is inserted after some of the
 fields when it is needed to follow the rule of alignment.
 ## Turning our attention to Rust
 Time to find out what Rust does differently to C! Let's start off by defining a Rust equivalent
 of `TestStruct` and checking its size:
 ```Rust
@@struct_size.rs@@
 #![allow(dead_code)]
 use std::mem::size_of;
 struct TestStruct {
    x: u32,
    y: u64,
    z: u32,
 }
 fn main() {
    println!("{} bytes", size_of::<TestStruct>());
 }
 ```
 ```
 $ rustc struct_size.rs
 $ ./struct_size
 16 bytes
 ```
 16 bytes is smaller than the 24 bytes used by C, so Rust can't be laying out `TestStruct` the same
 way. To find out what it's doing, let's use the same trick from before of filling the fields of
 the struct with some dummy values then reading the memory using LLDB:
 ```Rust
@@struct_layout.rs@@
 #![feature(bench_black_box)]
 #![allow(dead_code)]
 use std::hint::black_box;
 struct TestStruct {
    x: u32,
    y: u64,
    z: u32,
 }
 fn main() {
    let test = TestStruct {
        x: 0xcafebabe,
        y: 0x0123456789abcdef,
        z: 0xfeedface,
    };
    // Our test value is not actually used for anything in the program, so the
    // Rust compiler wants to optimise it out. We encourage it not to do this
    // by using the black box function.
    black_box(test);
 }
 ```
 ```
 $ rustc -g struct_layout.rs
 $ lldb struct_layout
 (lldb) breakpoint set --file struct_layout.rs --line 22
 (lldb) run
 (lldb) memory read --format x --size 4 --count `16 / 4` `&test`
 0x7fffffffe3d8: 0x89abcdef 0x01234567 0xcafebabe 0xfeedface
 ```
 Two things jump out: there's no padding bytes, and the value we stored in `y` appears before the
 value we stored in `x`. It looks like Rust has **changed the order of the fields**! This is a cool
 little optimisation; by switching the order of `x` and `y` in memory, all of the fields obey the 
 rule of alignment without the need for any padding. `y` is now at offset 0 which is a multiple of
 8, `x` is at offset 8 which is a multiple of 4, and `z` is at offset 12 which is a multiple of 4.
 ![A diagram comparing the C-style struct layout to the Rust-style one. In the C-style layout, the fields x, y and z appear in order, with 4 bytes of padding between x and y and another 4 bytes of padding after z. In the Rust-style layout, y appears first, then x and lastly z, with no padding anywhere in the struct. The Rust-style is two-thirds the size of the C-style layout.](/article_media/struct_diagram_5.png)
 Getting rid of the padding can have some practical performance benefits; since the overall size
 of the struct is smaller, we can fit more in the CPU's limited cache memory, which is _much_
 faster to access than RAM.
 ## "Let me choose the order, dammit!"
 Rust's way of doing things might improve performance, but, (angrily shaking fist), what right does
 the compiler have to mess with the order of our fields without our permission?! We specifically
 said that `x` comes before `y` when we defined `TestStruct`; wouldn't it be better for Rust to
 just tell us that it's a suboptimal ordering rather than silently moving the fields around? Then,
 we could decide whether or not we want to listen to the compiler's recommendation and manually
 change the order of the fields, which would give us more control.
 Unfortunately, this manual approach has problems; in particular, it doesn't play nice with
 generics. Suppose we have a generic struct like this:
 ```Rust
 struct GenericStruct<T, U> {
    x: T,
    y: U,
    z: u32,
 }
 ```
 There's no single ordering of the struct's fields that's optimal (in terms of the amount of
 padding required) for all possible choices of `T` and `U`. For example, the only two orderings
 that are optimal for both `GenericStruct<u32, u64>` and `GenericStruct<u64, u32>` are `x, z, y`
 and `y, z, x`, but neither of these two orderings are optimal for `GenericStruct<u16, u16>`.
 Whatever ordering we pick, there's going to be some choice of `T` and `U` that uses more padding
 than the minimum possible amount.
 That's why it's useful for Rust to pick the order of the fields for us; Rust can use different
 orderings depending on the values of the generic parameters so that padding is always minimised.
 For `GenericStruct<u32, u64>` it can use the ordering `x, z, y`, and for
 `GenericStruct<u16, u16>` it can use a different ordering `x, y, z`.
 Despite this, there's still going to be situations where we _need_ to manually specify the order
 of the fields in memory, so Rust provides us with the `#[repr(C)]` attribute which lets us use
 C's memory layout for a particular struct.
 ## Back to the original question
 Time to answer the question we started with: _why_ do the two languages use different layouts?
 Since C is often used for very low-level tasks like FFI and interfacing with hardware, it's
 important that data has a consistent and predictable layout in memory; therefore the programmer
 is given complete control over the ordering of fields. If you were writing an IP implementation
 in C by
 [casting the received bytes](https://github.com/torvalds/linux/blob/c1084b6c5620a743f86947caca66d90f24060f56/include/linux/ip.h#L21)
 to a
 [struct representing the header format](https://github.com/torvalds/linux/blob/c1084b6c5620a743f86947caca66d90f24060f56/include/uapi/linux/ip.h#L86),
 and the compiler decided to rearrange the order of that struct's fields, then your program would
 misinterpret the IP headers!
 Since casting between bytes and structs
 [can't be done in safe Rust](https://doc.rust-lang.org/nomicon/transmutes.html), it's more
 acceptable for Rust to take a bit of control away from the programmer and reorder the fields.
 This means that structs will always have the optimal size without the need for the programmer to
 think about alignment, even for generic structs that are impossible to optimise by hand.
--- a/src/test.toml.md
+++ b/src/test.toml.md
@ -1,42 +0,0 @@
 title = 'Testing'
 subtitle = 'In which we test my post renderer works correctly.'
 author = 'Tom Panton'
 tags = []
 published = '2022-06-05T11:00:13.318217Z'
 ---
 # Heading
 ## Subheading
 Here is some **text**!!
 testing that _italics_ work and ~~strikethrough~~!
 | Column A | Column 2 |
 |:--------:|----------|
 | aiwdio w | apwidho  |
 ```ruby
@@main.rb@@
 def fib(n)
  if n == 0 then
    0
  elsif n == 1 then
    1
  else
    fib(n - 2) + fib(n - 1)
  end
 end
 ```
 Here is some code without any specific language:
 ```
 func foo(f) {
  if halts(f) {
    print("Turings hate them for discovering this one simple trick");
  }
 }
 ```
 Here is a [shameless plug](https://smolbotbot.com)!
 ![A drawing of smolbotbot](/article_media/smolbotbot.jpeg)