kageru’s blog

Complexity killed the cat

Note: this is quite specific to video encoding.
Please don’t read this and then scream “kageru doesn’t want people to write idiomatic code”.
Thank you.

Complexity is a known problem.

Lots of people have written about it at length, and almost everyone seems to agree that complexity is something to avoid when writing software. Still, it seems to appear wherever we go.
What is it that makes it so tempting and so hard to control?

I recently realized that even video encoding (that is, filtering and encoding like many fansubbers do) is no longer safe. The complexity distribution in encoding used to be very simple:
A select few people write plugins in C/C++, some of which use pretty fancy math, to accomplish a specific task. Everyone else then uses a simple scripting language to combine these plugins. Back in the Avisynth days, that might have looked like this:

# read source
FFVideoSource("my_file.mp4")
# resize
Spline36Resize(1280, 720)
# deband
f3kdb(18, 64, 64)

The scripting language allowed for function definitions, conditionals via the ternary operator (but no if/else keywords), and loops were implemented with recursion. The language was pretty limited, and that proved to be quite painful when more complex logic was needed, but people somehow made it work, often creating unreadable operator chains and recursive rabbit holes.

Introducing: a proper scripting language

For Vapoursynth, the modern replacement of Avisynth¹, no custom language was implemented for the scripts. Instead, Python was used. That allowed users to replace the dreaded ternary nesting with much simpler if/elif/else chains and just gave them more freedom overall.

For a while, this resulted in much more readable and straight-forward scripts. But, just like all newfound powers, it would soon be misused.

One early example was the port of TAA. Not only does the only public function it defines accept 25 parameters (one of those a 2-element tuple) and **kwargs and has 17 explicit raise statements, it also defines no fewer than 12 classes, which form an inheritance hierarchy with 5 levels. It also contains this gem:

# Use lambda for lazy evaluation
mask_kernel = {
    0: lambda: lambda a, b, *args, **kwargs: b,
    1: lambda: mask_lthresh(clip, mthr, mlthresh, mask_sobel, mpand, opencl=opencl,
                            opencl_device=opencl_device, **kwargs),
    2: lambda: mask_lthresh(clip, mthr, mlthresh, mask_robert, mpand, **kwargs),
    # goes on like that for 10 more cases, some of which use string keys
}

It’s 700 lines of pure overengineered obfuscation because someone decided to bring enterprise Java into the encoding world.² It does what it’s supposed to do, but I don’t think new encoders will be able to learn much from it or change it to their needs – which is often more important because you, the maintainer, won’t always be around to make the necessary adjustments.

Don’t create the ancient and arcane scriptures of tomorrow. There are too many of those already.

Taking the fun out of functions

TAA showed years ago how to do idiomatic enterprise Java in Vapoursynth. Being a Kotlin developer, I of course had other ideas of what good code should look like.³ Why should I let people pass 20 parameters and **kwargs down my inheritance hierarchy when they could just give me a few Callables instead?

Say you have an AA script that passes kwargs to an internal function, but you also want to accept parameters for a resizer call and give the user the choice between two common resizers. Where before you would write:

def some_aa_filter(clip: VideoNode, width: int, height: int, depth: int, kernel: str, chroma_pos: int, fmtc_chroma_pos: str, use_zimg = True, **kwargs) -> VideoNode:
    clip = aa(clip, **kwargs)
    if use_zimg:
        return zimg_resizer(clip, width, height, kernel, chroma_pos)
    return core.fmtc.resample(clip, width, height, depth, kernel, fmtc_chroma_pos)

you could now do this:

def some_aa_filter(clip: VideoNode, width: int, height: int, resizer: Callable[[VideoNode, int, int], VideoNode], **kwargs) -> VideoNode:
    clip = aa(clip, **kwargs)
    return resizer(clip, width, height)

No need to have all those parameters for the resizer that someone may or may not wish to specify at some point. I could even provide a default value for the resizer argument that just uses a bicubic resize, and if someone wanted to specify their own resizer, they could totally do that. Sounds great until you realize that the caller now has to understand functional arguments and create them either with a lambda or something like functools.partial. Both are nontrivial for the target audience, which mostly consists of regular people (i.e. not programmers) who just want to save their favorite anime from whatever the mastering company did to it this time.
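To make that cost concrete, here is a minimal sketch of what the caller ends up writing. Clip, bicubic_resize, and the simplified some_aa_filter are hypothetical stand-ins so the example runs without Vapoursynth; only functools.partial and lambda are real Python features.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable


@dataclass(frozen=True)
class Clip:
    """Hypothetical stand-in for a VideoNode; it only tracks dimensions and a note."""
    width: int
    height: int
    note: str = ""


def bicubic_resize(clip: Clip, width: int, height: int, b: float = 1 / 3, c: float = 1 / 3) -> Clip:
    """Hypothetical resizer; a real one would call into a plugin."""
    return Clip(width, height, f"bicubic(b={b:.2f}, c={c:.2f})")


Resizer = Callable[[Clip, int, int], Clip]


def some_aa_filter(clip: Clip, width: int, height: int, resizer: Resizer = bicubic_resize) -> Clip:
    # The AA step is omitted; only the hand-off to the resizer matters here.
    return resizer(clip, width, height)


source = Clip(1920, 1080)

# Default resizer: the caller needs no knowledge of callables.
default = some_aa_filter(source, 1280, 720)

# Custom parameters via functools.partial ...
sharp = some_aa_filter(source, 1280, 720, resizer=partial(bicubic_resize, b=0.0, c=0.5))

# ... or via a lambda that does the same thing.
also_sharp = some_aa_filter(
    source, 1280, 720,
    resizer=lambda clip, w, h: bicubic_resize(clip, w, h, b=0.0, c=0.5),
)
```

The default call stays simple. The complexity only shows up once someone wants to change a resizer parameter, and that is exactly the moment a non-programmer needs help.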

But they can handle this, right? It’s just a little bit of complexity that gives them sooo much more freedom.

The real use case was a little more complicated than just an AA function, and I decided to keep the callable. I felt it was necessary to make the function useful, but I later realized it’s very easy to go too far with this. Being the person who wrote the code, I often don’t realize what parts are difficult to understand. I think most of us have experienced that at some point.

How much is too much?

I was recently confronted with this when someone opened a pull request for vsutil which added decorators for things like @disallow_variable_format.
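For readers who haven't met decorators, this is roughly what such a helper could look like. It is a sketch under assumptions, not the code from that pull request: Clip is a hypothetical stand-in for a VideoNode, and a format of None is used here to mean "variable format".

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Optional


@dataclass
class Clip:
    """Hypothetical stand-in for a VideoNode; format is None when it is variable."""
    format: Optional[str]


def disallow_variable_format(function: Callable) -> Callable:
    """Reject variable-format clips before the wrapped function runs."""
    @wraps(function)
    def wrapper(clip: Clip, *args, **kwargs):
        if clip.format is None:
            raise ValueError(f"{function.__name__}: variable-format clips are not supported")
        return function(clip, *args, **kwargs)
    return wrapper


@disallow_variable_format
def describe(clip: Clip) -> str:
    return f"clip with format {clip.format}"


# The same check without a decorator: two more lines per function, but no indirection.
def describe_plain(clip: Clip) -> str:
    if clip.format is None:
        raise ValueError("describe_plain: variable-format clips are not supported")
    return f"clip with format {clip.format}"
```

Both versions behave the same. The decorator saves repetition, but the reader has to know that a function can wrap another function before the first line of describe makes sense.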

In one of my comments, I wrote:

“Having decorators at all is already a level of complexity that might scare away potential contributors (most VS users don’t know much about Python), but I think they’re quite self-explanatory in this case, so I’m fine with that.”

to which someone replied:

“vsutil is already beyond this point with using typehints and unit tests in the first place imo.”

While I personally disagree that typehints and unit tests obfuscate code as much as decorators and other Python magic, it still got me thinking. Not because I desperately want contributors with zero programming knowledge, but because I would like to create code that the target audience can actually read and understand. People can’t learn from code that they can’t understand at all, and I believe that reading other people’s code is a good way to improve your own.

I certainly learned a lot (about encoding but also in general) by doing that. Not everyone has the luxury of a personal mentor, but everyone can go on GitHub, read the code of more experienced encoders, and learn from that.⁴

There are more factors than just the code itself. Some repositories, vsutil included, have slowly turned into proper Python modules. That’s not a bad thing per se because it gives us the ability to publish PyPI packages which also simplifies packaging for the AUR or similar repositories, but there is a point at which it makes the folder structure confusing. I think this is still within reason,⁵ but we should be careful to keep it that way.

Complexity rarely comes all at once. It’s death by a thousand pull requests that slowly make everything more and more complicated, one reasonable step at a time, and before you know it, you don’t understand your own repository.

Maybe I’m too afraid to reject pull requests because “someone put a lot of work into this”, but thinking more about this made me realize that just blindly accepting them will do a lot more harm over time.

The complex is the enemy of the good

Maybe we should only stray from the basics when absolutely necessary, no matter what your (or my) favorite programming style is. Video filtering is about scripting, not understanding someone’s OOP hierarchies, reimplementing popular FP patterns, or emulating any other programming paradigm.

If someone approaches you because they can’t figure out how to call a function you’ve written, it’s probably your fault and not theirs.⁶

What I really want to say is: please just think twice before turning a 100 line file of helper functions into a 2000 loc project that is 50% docstrings, 40% error handling, has 5 @decorators per function, 3 different linters, and reads like the Haskell code of a drunk freshman transpiled to Python.

I promise I’ll try to do the same, even if it means typing three lines instead of just one.⁷
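The trade in question, side by side. The reduce version is the one-liner from the last footnote; the loop version is what replaces it. The names iterate_reduce and iterate_loop exist only to tell the two apart in this sketch.

```python
from functools import reduce


def iterate_reduce(base, function, count):
    # The one-liner: fold over a range and ignore the range values.
    return reduce(lambda value, _: function(value), range(count), base)


def iterate_loop(base, function, count):
    # The same behavior spelled out as a loop.
    for _ in range(count):
        base = function(base)
    return base


print(iterate_reduce(1, lambda x: x * 2, 10))  # 1024
print(iterate_loop(1, lambda x: x * 2, 10))    # 1024
```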

I say replacement, but there are still lots of people who refuse to switch, even in $currentYear.↩︎
There are more examples like this one, TAA has just been bugging me for a long time. It’s by no means the only script that has grown far beyond critical mass.↩︎
Kotlin functions often take functional arguments, which is well-supported by the syntax and also much easier if you have statically checked types. That obviously does not translate to Python, but it doesn’t stop me from trying.↩︎
but please don’t just copy code. If you want to copy something because it does exactly what you need, at least try to understand it beforehand. I still regret merging a kagefunc PR once without properly going through the code, because it left me with 50 lines that I barely understood myself and have been procrastinating to refactor ever since.↩︎
having all of the vsutil code in __init__.py does seem weird to me, but that has already been discussed and should change soon.↩︎
Unless they’re just missing a dependency and can’t read the error message.↩︎
And trust me, I’ll miss
def iterate(base, function, count): return reduce(lambda v, _: function(v), range(count), base),
but a simple for loop is just much more readable to non-FP people.↩︎

Stream, Sequence, Iterator – a story of laziness and sad JVM benchmarking noises

Many programming languages have started to include more functional features in their standard libraries. One of those features is lazy collections, for lack of a better term, which seem to have a different name in each language (we’ll just call them iterators here) and sometimes vastly differing implementations. One thing they all have in common, though, is a lack of trust in their performance.

For almost every language out there that offers lazy iterators, there will be people telling you not to use them for performance reasons, more often than not without any data to back that up.

I was personally interested in this because, being a Java/Kotlin developer, I use Java’s Streams and Kotlin’s Sequences almost every day with relatively little regard for potential performance implications. They are intuitive to write and are easy to reason about, which is usually much more important than the results of a thousand microbenchmarks, so please don’t stop using your favorite language feature because it’s 2.8% slower than the alternative. Most code is already bad enough as is without desperate optimization attempts.

Still, I wanted to know how they compare to imperative code. There are some resources on this for Java 8’s Stream API, but Kotlin’s Sequences seem to just be accepted as more convenient Streams, without much discussion about their performance.¹

What is an iterator?

You can think of an iterator as a pipeline. It lets you write code as a sequence of instructions to be applied to all elements of a container.

Let’s use a simple example to demonstrate this. We want to take all numbers from 1 to 100,000, multiply each of them by 2, and then sum all of them.²

First, the imperative solution:

var sum = 0
for (i in 1..100_000) {
    sum += i * 2
}
return sum

and now the functional version using a Sequence (Kotlin’s name for Streams/iterators):

return (1..100_000).asSequence()
    .map { it * 2 }
    .sum()

An iterator is not a list, and it doesn’t support indexing,³ because it doesn’t actually contain any data. It just knows how to get or compute it for you, but you don’t know how it does that. You don’t even always know when (or if at all) an iterator will end (in this case, we do, because we create the Sequence from the range 1..100_000, meaning it will produce the numbers from 1 to 100,000 before it ends).
You can tell an iterator to produce or emit data if you want to use it (which is often called ‘consuming’ because if you read something from the pipeline, it’s usually gone), or you can add a new step to it and hand the new pipeline to someone else, who can then consume it or add even more steps.

An important aspect to note is: adding an operation to the pipeline does nothing until someone actually starts reading from it, and even then, only the elements that are consumed are computed.
This makes it possible to operate on huge data sets⁴ while keeping memory usage low, because only the currently active element has to be held in memory.
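This laziness is easy to observe. The sketch below uses Python's built-in lazy map rather than a Kotlin Sequence, simply because the bookkeeping is short; double and computed are names made up for this example.

```python
from itertools import count, islice

computed = []  # records which elements were actually doubled


def double(value: int) -> int:
    computed.append(value)
    return value * 2


# An infinite source with one pipeline step attached; nothing runs yet.
pipeline = map(double, count(1))
print(computed)  # []

# Consuming three elements computes exactly three elements.
first_three = list(islice(pipeline, 3))
print(first_three)  # [2, 4, 6]
print(computed)     # [1, 2, 3]
```

The source is infinite, yet the program terminates and never holds more than one pending element, because each value is only produced when the consumer asks for it.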

Cold, hard numbers

We’ll use that small example from the last section as our first example: take a range of numbers, double each number, and compute the sum – except this time, we’ll do the numbers from 1 to 1 billion. Since everything we’re doing is lazy, memory usage shouldn’t be an issue.

I will solve this problem with several different implementations and benchmark all of them. Here are the approaches I came up with:

a simple for loop in Java
Java’s LongStream
a for each loop with a range in Kotlin
Java’s LongStream called from Kotlin⁵
Java’s Stream wrapped in a Kotlin Sequence
a Kotlin range wrapped in a Sequence
Kotlin’s Sequence with a generator to create the range

The benchmarks were executed on an Intel Xeon E3-1271 v3 with 32 GB of RAM, running Arch Linux with kernel 5.4.20-1-lts, using the (at the time of writing) latest OpenJDK preview build (15-ea+17-717), Kotlin 1.4-M1, and jmh version 1.23.
The bytecode target was set to Java 15 for the Java code and Java 13 for Kotlin (newer versions are currently unsupported).

Source code for the Java tests:

public long stream() {
    return LongStream.range(1, upper)
        .map(l -> l * 2)
        .sum();
}

public long loop() {
    long sum = 0;
    for (long i = 0; i < upper; i++) {
        sum += i * 2;
    }
    return sum;
}

and for Kotlin:

fun stream() =
    LongStream.range(1, upper)
        .map { it * 2 }
        .sum()

fun loop(): Long {
    var sum = 0L
    for (l in 1L until upper) {
        sum += l * 2
    }
    return sum
}

fun streamWrappedInSequence() =
    LongStream.range(1L, upper)
        .asSequence()
        .map { it * 2 }
        .sum()

fun sequence() =
    (1 until upper).asSequence()
        .map { it * 2 }
        .sum()

fun withGenerator() =
    generateSequence(0L, { it + 1L })
        .take(upper.toInt())
        .map { it * 2 }
        .sum()

with const val upper = 1_000_000_000L.⁶

Without wasting any more of your time, here are the results:

Benchmark                       Mode  Cnt      Score      Error  Units
Java.loop                       avgt   25    446.055 ±    0.677  ms/op
Java.stream                     avgt   25    601.424 ±   12.606  ms/op
Kotlin.loop                     avgt   25    446.600 ±    1.164  ms/op
Kotlin.sequence                 avgt   25   2732.604 ±    6.644  ms/op
Kotlin.stream                   avgt   25    593.353 ±    1.408  ms/op
Kotlin.streamWrappedInSequence  avgt   25   3829.209 ±   33.569  ms/op
Kotlin.withGenerator            avgt   25   8374.149 ±  880.647  ms/op

(full JMH output)

Unsurprisingly, using Streams from Java and Kotlin is almost identical in terms of performance. The same is true for imperative loops, meaning Kotlin ranges introduce no overhead compared to incrementing for loops.

More surprisingly, using Sequences is an order of magnitude slower. That was not at all according to my expectations, so I investigated.

As it turns out, Java’s LongStream exists because Stream<Long> is much slower. The JVM has to use Long (uppercase) rather than long when the type is used for generics, which involves an additional boxing step and the allocation for the Long object.⁷
Still, we now know that Streams have about 25% overhead compared to the simple loop for this example, that generating sequences is a comparatively slow process, and that wrapping Streams comes at a considerable cost (compared to a sequence created from a range).

That last point seemed odd, so I attached a profiler to see where the CPU time is lost.

Flamegraph of streamWrappedInSequence()

We can see that the LongStream can produce a PrimitiveIterator.OfLong that is used as a source for the Sequence. The operation of boxing a primitive long into an object Long (that’s the Long.valueOf() step) takes almost as long as advancing the underlying iterator itself.
7.7% of the CPU time is spent in Sequence.hasNext(). The exact breakdown of that looks as follows:

Checking if a Sequence has more elements

The Sequence introduces very little overhead here, as it just delegates to hasNext() of the underlying iterator.
Worth noting is that the iterator calls accept() as part of hasNext(), which will already advance the underlying iterator. The value returned by that will be stored temporarily until nextLong() is called.

public boolean tryAdvance(LongConsumer consumer) {
    final long i = from;
    if (i < upTo) {
        from++;
        consumer.accept(i);
        return true;
    }
    // more stuff down here
}

where consumer.accept() is

public void accept(T t) {
    valueReady = true;
    nextElement = t;
}

Knowing this, I have to wonder why nextLong() takes as long as it does. Looking at the implementation, I don’t understand where all that time is going. hasNext() should always be called before next(), so next() just has to return a precomputed value.
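The buffering described above can be sketched in a few lines. This is a Python analogue with hypothetical names (BufferingIterator, has_next, next_value), not the JDK's code; it only mirrors the valueReady/nextElement logic from the snippets.

```python
class BufferingIterator:
    """Adapter that exposes has_next()/next_value() on top of a plain iterator."""

    def __init__(self, source):
        self._source = iter(source)
        self._value_ready = False
        self._next_element = None

    def has_next(self) -> bool:
        if not self._value_ready:
            try:
                # The source is advanced here, not in next_value().
                self._next_element = next(self._source)
                self._value_ready = True
            except StopIteration:
                return False
        return True

    def next_value(self):
        if not self._value_ready and not self.has_next():
            raise StopIteration
        # Hand out the buffered element and mark the buffer as empty.
        self._value_ready = False
        return self._next_element


it = BufferingIterator([10, 20])
while it.has_next():
    print(it.next_value())  # prints 10, then 20
```

In this model, next_value() does nothing but flip a flag and return a field whenever has_next() was called first, which is why the time attributed to nextLong() in the profile is surprising.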

Nevertheless, we can now explain the performance difference with the additional boxing step.
Primitives good; everything else bad.

With that in mind, I wrote a second test that avoids the unboxing issue to compare Streams and Sequences.
The next snippet uses a simple wrapper class that guarantees that we have no primitives to execute a few operations on a Stream/Sequence.
I’ll use this opportunity to also compare parallel and sequential streams.

The steps are simple:

take a long
create a LongWrapper from it
double the contained value (which creates a new LongWrapper)
extract the value
calculate the sum

That may sound overcomplicated, but it’s sadly close to the reality of enterprise code. Wrapper types are everywhere.

inner class LongWrapper(val value: Long) {
    fun double() = LongWrapper(value * 2)
}

fun sequence(): Long =
    (1 until upper).asSequence()
        .map(::LongWrapper)
        .map(LongWrapper::double)
        .map(LongWrapper::value)
        .sum()

fun stream(): Optional<Long> =
    StreamSupport.stream((1 until upper).spliterator(), false)
        .map(::LongWrapper)
        .map(LongWrapper::double)
        .map(LongWrapper::value)
        .reduce(Long::plus)

fun parallelStream(): Optional<Long> =
    StreamSupport.stream((1 until upper).spliterator(), true)
        .map(::LongWrapper)
        .map(LongWrapper::double)
        .map(LongWrapper::value)
        .reduce(Long::plus)


fun loop(): Long {
    var sum = 0L
    for (l in 1 until upper) {
        val wrapper = LongWrapper(l)
        val doubled = wrapper.double()
        sum += doubled.value
    }
    return sum
}

The results here paint a different picture:

NonPrimitive.loop               avgt   25    445.992 ±    0.642  ms/op
NonPrimitive.sequence           avgt   25  27257.399 ±  342.686  ms/op
NonPrimitive.stream             avgt   25  44673.318 ± 1325.832  ms/op
NonPrimitive.parallelStream     avgt   25  33856.919 ±  249.911  ms/op

Full results are in the JMH log from earlier.

The overhead of Java Streams is much higher than that of Kotlin Sequences. Even a parallel Stream is slower than a Sequence, although Sequences only use a single thread. Both are miles behind the simple for loop, though. My first assumption was that the compiler optimized away the wrapper type and just added the longs, but looking at the bytecode, the constructor invocation and the double() method calls are still there. It’s hard to know what the JIT does at runtime, but the numbers certainly suggest that the wrapper is simply optimized away.
The profiler report wasn’t helpful either, which further leads me to believe that the JIT just deletes the method and inlines the calculations.

This tells us that not only do Streams/Sequences have a very measurable overhead, but they severely limit the optimizer’s (be it compile-time or JIT) ability to understand the code, which can lead to significant slowdowns in code that can be optimized. Obviously, code that doesn’t rely on the optimizer as much won’t be affected to the same degree.

Conclusion

Overall, I think that Kotlin’s Sequences are a good addition to the language, despite their flaws.
They are significantly slower than Streams when working with primitives because the Java standard library has subtypes for many generic constructs to more efficiently handle primitive types, but in most real-world JVM applications (that being enterprise-level bloatware), primitives are the exception rather than the rule. Still, Kotlin already has some types that optimize for this, such as LongIterator, but without a LongSequence to go with it, the boxing will still happen eventually, and all the performance gains are void.

I hope that we can get a few more types like it in the future, which will be especially useful once Kotlin/Native reaches maturity and starts being used for small/embedded hardware.

Apart from the performance, Sequences are also a lot easier to understand and even extend than Streams. Implementing your own Sequence requires barely more than an implementation of the underlying iterator, as can be seen in CoalescingSequence which I implemented last year to get a feeling for how all of this works.
Streams on the other hand are a lot more complex. Internally, each pipeline stage is a Sink, which extends Consumer<T>, so a stage is really just a void accept(T input) that can be called repeatedly. That makes it a lot harder to grasp where data is coming from and how it is requested, at least to me.

Simplicity is often underrated in software, but I consider it a huge plus for Sequences.

I will continue to use them liberally, unless I find myself in a situation where I need to process a huge number of primitives. And even then, I now know that Java’s Streams are a good alternative, as long as my code isn’t plain stupid and in dire need of the JIT optimizer.
25% might sound like a lot, but it’s more than worth it if it means leaving code that is much easier to understand and modify for the next person.
Unless you’re actually in a very performance-critical part of your application, but if you ever find yourself in that situation, you should switch to a different language.

Writing simple and correct code should always be more important than writing fast code.

On the note of switching languages: I was originally going to include Rust’s iterators here for comparison, but rustc optimized away all of my benchmarks with constant time solutions. That was a fascinating discovery for me, and I might write a separate blog post where I dissect some of the assembly that rustc/LLVM produced, but I feel like I’ll need to learn a few more things about compilers first.

If you’ve ever used them, you’ll know what I mean. Java’s Streams are built in a way that allows for easy parallelism, but brings its own problems and limitations for the usage.↩︎
You could also just compute the sum and take that * 2, but we specifically want that intermediate step for the example.↩︎
Or any other operation like it. No iterator[0], no iterator.get(0) or whatever your favorite language uses. An operation like iterator.last() might exist, but it will consume the entire iterator instead of just accessing the last element.↩︎
Huge or even infinite. Infinite iterators can be very useful and are used a lot in functional languages, but they’re not today’s topic.↩︎
Mainly to make sure there is no performance difference between the two.↩︎
1 until upper is used in these examples because, unlike lower..upper, until is end-exclusive like Java’s LongStream.range().↩︎
The JVM has a few primitive types, such as int, long, or char. They are different from any other type because they cannot be null. Every regular type on the JVM extends java.lang.Object and is just a reference that is being passed around. The primitives are values, not references, so there’s a lot less overhead involved. Unfortunately, primitives can’t be used as generic types, so a list of longs will always convert the long to Long before adding it.↩︎

Writing less code

Code is bad. It’s confusing, it’s easy to break, and it needs to be maintained or even updated. And the more code you have, the worse it gets.

I sometimes get bored, perhaps more often than I’d like to admit, and one of the things I do to fight that boredom is writing code. I’ve created lots of small pieces of software, most of which are awful, useless, or both. My old blog was one of them, although the exact classification into those categories shall be left as an exercise to the reader.

I realized the process of writing and uploading content to it was also anything but streamlined and likely contributed to my lack of motivation to write and release anything, so I decided to replace it. At first, I thought about using Jekyll, but remember, I’m bored and looking for opportunities to write code (which admittedly is the opposite of today’s title).

So I decided to rewrite it. Not as another Python Django application, not as a Rails project or whatever people do these days. No, I wanted to know how little I could get away with. I wasn’t golfing for line count, obviously (because that’s just stupid), but I ideally wanted a simple shell script that would do everything I needed and only that. I wanted to write markdown and get static HTML. Simple as that. So here’s how you do that while writing as little code as possible:

$ pandoc input.md -t html > output.html

And that’s the secret to all of this.

DRY? More like DRSE

The DRY principle (“don’t repeat yourself”) is something most programmers are familiar with and are probably trying to adhere to. Writing duplicate code feels inherently wrong to most people. But why not take that one step further? Don’t just not repeat yourself; don’t repeat someone else either. If someone has already written software that converts markdown to html, you don’t have to do it again. That part might have been obvious, but we can apply it to almost everything that is necessary for this little project.¹

The components

So what does my blog need to do? Well, quite simple:

read markdown and convert it to HTML
generate an index of all the blog entries
include some basic CSS/JS in the output
update itself automatically when I publish something
be compatible with the content from my previous blog

That last point might be the worst, but it’s what I wanted/needed.

The old blog had a simple sqlite database that would hold the title, date, and link of all blog posts. It then had a predefined template for site header and footer and would just insert the content between those. Relatively simple, but way more than what was necessary and also relatively slow because the template would be rendered for each request. Oh, and I had to write the content directly in HTML.

Static pages converted from markdown would do the job just as well, so that was my new goal.

Markdown conversion

The first and most obvious step is converting my hand-written markdown files to beautiful HTML for the browser. As mentioned previously, I am going to use pandoc for the conversion logic.

All I had to do now was define a folder structure which in my case has a src folder with all the .md files and a content folder with the resulting .html documents. The rest is a simple loop and some shell built-ins.

convert_file() {
    path="$9"
    outpath="content/$(basename "$path" .md).html"
    pandoc "$path" -t html > "$outpath"
}

ls -ltu src/*.md | tail -n+1 | while read f; do convert_file $f; done

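I used ls -l to have each file on a separate line which makes the parsing much easier. ls -t sorts the files by time so the newest entries are at the top; note that the added -u makes that the access time, so drop it if you want modification time. tail -n+1 passes every line through unchanged, which is fine here: the total xxx line that ls -l normally prints first only appears when listing a directory, not when the files come from a glob.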

Step 1 done.

Index generation

This problem was partially solved in the last step because I already had a list of all output paths sorted by edit date. All that is left now is to generate some static HTML from that. I thus make some changes:

output() {
    echo "$1" >> index.html
}

create_entry() {
    # the code from step 1
    path="$9"
    outpath="content/$(basename "$path" .md).html"
    pandoc "$path" -t html > "$outpath"
    # and some html output
    output "<a href=\"$outpath\">$outpath</a>"
}

rm -f index.html # -f so it doesn’t fail if index.html doesn’t exist yet
ls -ltu src/*.md | tail -n+1 | while read f; do create_entry $f; done

That will give us a list of links to the blog entries with the filenames as titles, but we can do better than that. First, by extracting titles from the files. This is based on the assumption that I begin every blog post with an h1 heading, or a single # Heading in markdown.

title="$(rg 'h1' "$outpath" | head -n1 | rg -o '(?<=>).*(?=<)' --pcre2)"

Match the first line that contains an h1 and return whatever is inside > and < – the title.
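If the lookaround syntax is unfamiliar, the same pattern can be tried out in Python's re module. This is only an illustration of what the regex matches, not part of the blog script; extract_title is a hypothetical helper that mimics the rg | head -n1 | rg -o chain.

```python
import re
from typing import Optional

# (?<=>) requires a ">" directly before the match, (?=<) a "<" directly after it.
TITLE_PATTERN = re.compile(r"(?<=>).*(?=<)")


def extract_title(html: str) -> Optional[str]:
    # Only look at the first line containing "h1", like `rg 'h1' | head -n1`.
    for line in html.splitlines():
        if "h1" in line:
            match = TITLE_PATTERN.search(line)
            return match.group(0) if match else None
    return None


page = '<h1 id="writing-less-code">Writing less code</h1>\n<p>Code is bad.</p>'
print(extract_title(page))  # Writing less code
```

Because .* is greedy, the match runs from the first > to the last < on the line, so a heading with inline markup would keep its inner tags.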

By then making the src directory part of a git repository (which I wanted to do anyway because it’s a good way to track changes), we can get the creation time of each file.

created=$(git log --follow --format=%as "$path" | tail -1)

--format=%as prints the author date of each commit as YYYY-MM-DD, and tail -1 picks the oldest one, i.e. the commit that created the file. man git-log is your friend here.

We can combine this with some more static HTML to turn our index into a table with all the titles, dates, and links:

html_entry() {
    output '<tr>'
    path="$1"
    time="$2"
    title="$3"
    output "<td class=\"first\"><a href=\"$path\">$title</a></td>"
    output "<td class=\"second\">$time</td></tr>"
}

create_entry() {
    # mentally insert previous code here
    # ...
    html_entry "$outpath" "created on $created" "$title"
}

rm -f index.html
output '<h1>Blog index</h1>'
output '<table>'
ls -ltu src/*.md | tail -n+1 | while read f; do create_entry $f; done
output '</table>'

It looks quite plain, but we have a fully functional index for our blog. Onto step 3.

Styling

For this, we can use a lesser known nginx feature that allows us to prepend something to the body of each page and append something after. I changed the config and created a simple header as a static html file that would include the necessary resources.

location / {
    add_before_body /before_body.html;
    add_after_body /after_body.html;
    index index.html;
}

That’s it. Next step.

Automatic updates

At first, I had the entire script run every few minutes via cron, but markup conversion isn’t that cheap, so I only wanted to regenerate the files if something actually changed.

Since we’re already using git for the sources, we have everything we need. I can simply check if there are changes upstream.

has_updates() {
    git fetch > /dev/null 2>&1 # POSIX sh has no &> shorthand
    diff="$(git diff master origin/master)"
    if [ "$diff" ]; then
        return 0
    else
        return 1
    fi
}

if has_updates; then
    # this merges origin/master into local master
    git pull
    # run the previous code
    ...
fi

I’m not super familiar with shell scripting, so if there’s a better way to do that boolean return in POSIX sh, feel free to tell me.

And now, the dreaded last step.

Legacy garbage

That last part was actually quite simple. I added a legacy/index.html with a hand-written list of all previous blog entries, and then made it appear last on the generated index with entry "legacy" "before 2020" "Older posts". Since I use nginx to add the header and footer to every page, the legacy index and legacy pages work almost out of the box. After some slight adjustments to the old content pages, everything looks as intended.

Summary

I now have a working static page generator for my blog in under 50 lines of shell code. It does what I need and only that. The code is (relatively) simple and fully POSIX sh compliant. It’s not built to be super general or reusable, but that wasn’t the goal here.

I am aware that I built this with relatively little regard to dependencies. Pandoc is huge, and the ripgrep call could be replaced with standard grep. I know that, but for now, I don’t care.

If you want to take a look at the final result, the code is on my gitea.

I guess the only question now is: will this new blog give me the motivation to write more? Only time will tell.
I do have a few more ideas, and none of them are encoding-related. Sorry.

Edit: It was brought to my attention that this is very similar to Luke Smith’s lb. I think the comparison is fair, but we seem to have different priorities. He writes HTML; I write markdown. He uses rsync; I want everything in git and also use that to sync. He didn’t want dependencies; I… use pandoc. :^)

Still very interesting to see his approach to this, so thanks for pointing that out.

Now I’m considering adding RSS at some point. We’ll see.

within reason, otherwise we wouldn’t write any code at all or do something ridiculous like depend on an external library to left-pad a string ↩︎