Bytecode compression in Stak Scheme

@raviqqe

December 14, 2025

Contents

  • Stak Scheme
  • Progress
    • Bytecode compression with LZSS
  • Future work

Stak Scheme

  • A bytecode compiler and virtual machine (VM) for Scheme
    • The compiler is written in Scheme.
    • The VM is written in Rust.
  • It aims to support the R7RS-small standard.
  • Forked from Ribbit Scheme

References

Progress

  • Bytecode compression with LZSS

Bytecode compression

  • Ribbit Scheme compresses its bytecode using LZSS.
    • Only in the binary (256-bit base) bytecode mode
  • Stak Scheme now implements the same.
  • Bytecode compression is based on the LZSS algorithm.

LZSS algorithm

  • LZSS is a dictionary-based compression algorithm with a sliding window.
  • It also encodes repetitions of multiple bytes.

Uncompressed data:

         current
            | Look back by 3 bytes.
            | Repeat 8 bytes.
            v
|x|y|z|a|b|c|a|b|c|a|b|c|a|b|d|e|f|...

Compressed data:

|x|y|z|a|b|c|3,8|d|e|f|...

Encoding format

  • For general byte sequences, we need to distinguish:
    • Literal bytes
    • Offset and length pairs

Options

  • Header bytes
    • Adds a header of packed bit flags for every 8 bytes
  • Tight coupling with the code format of underlying byte sequences
    • This is what Ribbit and Stak Scheme use.
    • Stak Scheme borrows one bit from every byte.

Results

Bytecode sizes (bytes)

Source file Before (v0.11.9) After (v0.11.11) Saving rate
run.scm 53883 41092 0.237
repl.scm 53506 40740 0.239
compile.scm 70501 53402 0.243

Future work

  • Tree-based vector
  • Soft float