Saturday, November 16

Honey, I shrunk {fmt}: bringing binary size to 14k and ditching the C++ runtime

The {fmt} formatting library is known for its small binary footprint,
often producing code that is several times smaller per function call compared
to alternatives like IOStreams, Boost Format, or, somewhat ironically,
tinyformat. This is mainly achieved through careful application of type erasure
on various levels, which effectively minimizes template bloat.

Formatting arguments are passed via type-erased format_args:

auto vformat(string_view fmt, format_args args) -> std::string;

template <typename... T>
auto format(format_string<T...> fmt, T&&... args) -> std::string {
  return vformat(fmt, fmt::make_format_args(args...));
}

As you can see, format delegates all its work to vformat, which is not a
template.
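To make the pattern concrete, here is a minimal, self-contained sketch of the same idea. It is not {fmt}'s actual implementation: the names `sketch::arg`, `sketch::args_view`, `sketch::vformat` and `sketch::format` are hypothetical, only `int` and string arguments are supported, and only plain `{}` placeholders are recognized.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of {fmt}

// A type-erased argument: a tag plus the value in one of two fixed forms.
struct arg {
  enum class kind { integer, string };
  kind type;
  long long integer = 0;
  std::string_view string;

  arg(int v) : type(kind::integer), integer(v) {}
  arg(const char* s) : type(kind::string), string(s) {}
  arg(std::string_view s) : type(kind::string), string(s) {}
};

// A non-owning view of an array of erased arguments.
struct args_view {
  const arg* data;
  std::size_t size;
};

// Not a template: this body is compiled once, no matter how many
// different argument type combinations the callers use.
inline std::string vformat(std::string_view fmt, args_view args) {
  std::string out;
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    if (fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}') {
      if (next >= args.size) throw std::runtime_error("argument not found");
      const arg& a = args.data[next++];
      if (a.type == arg::kind::integer)
        out += std::to_string(a.integer);
      else
        out.append(a.string);
      ++i;  // skip the closing '}'
    } else {
      out += fmt[i];
    }
  }
  return out;
}

// The only template: it erases the argument types and forwards.
template <typename... T>
std::string format(std::string_view fmt, const T&... args) {
  // The trailing sentinel keeps the array non-empty when there are no args.
  const arg erased[] = {arg(args)..., arg(0)};
  return vformat(fmt, {erased, sizeof...(T)});
}

}  // namespace sketch
```

Each distinct call such as `sketch::format("The answer is {}.", 42)` instantiates only the small forwarding wrapper, while the parsing and output logic lives in the single non-template `vformat`.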

Output iterators and other output types are also type-erased through a specially
designed buffer API.
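The same trick can be sketched for the output side. The following is an illustration under stated assumptions rather than {fmt}'s real buffer API: `output_buffer`, `iterator_adapter`, `write_answer`, `write_answer_to` and `answer_string` are hypothetical names, and the 16-byte staging area is an arbitrary choice.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of {fmt}

// Non-template base class that all formatting code writes into.
class output_buffer {
 public:
  void append(std::string_view s) {
    for (char c : s) {
      if (size_ == capacity_) flush_(*this);  // indirect call only when full
      data_[size_++] = c;
    }
  }

 protected:
  using flush_fn = void (*)(output_buffer&);
  output_buffer(char* data, std::size_t capacity, flush_fn flush)
      : data_(data), capacity_(capacity), flush_(flush) {}

  char* data_;
  std::size_t size_ = 0;
  std::size_t capacity_;
  flush_fn flush_;
};

// Thin template adapter: the only code instantiated per iterator type.
template <typename OutputIt>
class iterator_adapter : public output_buffer {
 public:
  explicit iterator_adapter(OutputIt out)
      : output_buffer(storage_, sizeof(storage_), do_flush), out_(out) {}

  // Drains what is still staged and returns the advanced iterator.
  OutputIt finish() {
    do_flush(*this);
    return out_;
  }

 private:
  static void do_flush(output_buffer& b) {
    auto& self = static_cast<iterator_adapter&>(b);
    for (std::size_t i = 0; i < self.size_; ++i) *self.out_++ = self.storage_[i];
    self.size_ = 0;
  }

  char storage_[16];
  OutputIt out_;
};

// Not a template: works with any output through the erased base class.
inline void write_answer(output_buffer& buf, int value) {
  buf.append("The answer is ");
  buf.append(std::to_string(value));
  buf.append(".");
}

template <typename OutputIt>
OutputIt write_answer_to(OutputIt out, int value) {
  iterator_adapter<OutputIt> buf(out);
  write_answer(buf, value);
  return buf.finish();
}

// Convenience wrapper showing one concrete output type.
inline std::string answer_string(int value) {
  std::string s;
  write_answer_to(std::back_inserter(s), value);
  return s;
}

}  // namespace sketch
```

Here `write_answer` plays the role of the formatting core: it is compiled once, and supporting a new output iterator type costs only another instantiation of the small adapter.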

This approach confines template usage to a minimal top-level layer, leading to
both a smaller binary size and faster build times.

For example, the following code:

// test.cc
#include <fmt/base.h>

int main() {
  fmt::print("The answer is {}.", 42);
}

compiles to just

.LC0:
.string "The answer is {}."
main:
sub rsp, 24
mov eax, 1
mov edi, OFFSET FLAT:.LC0
mov esi, 17
mov rcx, rsp
mov rdx, rax
mov DWORD PTR [rsp], 42
call fmt::v11::vprint(fmt::v11::basic_string_view<char>, fmt::v11::basic_format_args<fmt::v11::context>)
xor eax, eax
add rsp, 24
ret

godbolt

It is much smaller than the equivalent IOStreams code and comparable to that
of printf:

.LC0:
.string "The answer is %d."
main:
sub rsp, 8
mov esi, 42
mov edi, OFFSET FLAT:.LC0
xor eax, eax
call printf
xor eax, eax
add rsp, 8
ret

godbolt

Unlike printf, {fmt} offers full runtime type safety. Errors in format strings
can be caught at compile time, and even when the format string is determined at
runtime, errors are reported through exceptions, preventing undefined behavior,
memory corruption, and potential crashes. Additionally, {fmt} calls are
generally more efficient, particularly when using positional arguments, which C
varargs are not well-suited for.

Back in 2020, I dedicated some time to optimizing the library size,
successfully reducing it to under 100kB (just ~57kB with -Os -flto).
A lot has changed since then. Most notably, {fmt} now uses the exceptional
Dragonbox algorithm for floating-point formatting, kindly
contributed by its author, Junekey Jeon. Let’s explore how these changes have
impacted the binary size and see if further reductions are possible.

But why, some say, the binary size? Why choose this as our goal?

There has been considerable interest in using {fmt} on memory-constrained
devices, see e.g. #758 and #1226 for just two examples from
the distant past. A particularly intriguing use case is retro computing, with
people using {fmt} on systems like Amiga (#4054).

We’ll apply the same methodology as in previous work, examining the
executable size of a program that uses {fmt}, as this is most relevant to end
users. All tests will be conducted on an aarch64 Ubuntu 22.04 system with GCC
11.4.0.

First, let’s establish the baseline: what is the binary size for the latest
version of {fmt} (11.0.2)?
