kageru’s blog
Complexity killed the cat
https://blog.kageru.moe/content/vscomplexity.html
<p>Note: this is quite specific to video encoding.<br />
Please don’t read this and then scream “kageru doesn’t want people to write idiomatic code”.<br />
Thank you.</p>
<hr />
<p>Complexity is a known problem.</p>
<p>Lots of people have written about it at length, and almost everyone seems to agree that complexity is something to avoid when writing software. Still, it seems to appear wherever we go.<br />
What is it that makes it so tempting and so hard to control?</p>
<p>I recently realized that even video encoding (that is, filtering and encoding like many fansubbers do) is no longer safe. The complexity distribution in encoding used to be very simple:<br />
A select few people write plugins in C/C++, some of which use pretty fancy math, to accomplish a specific task. Everyone else then uses a simple scripting language to combine these plugins. Back in the Avisynth days, that might have looked like this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1"></a><span class="co"># read source</span></span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="ex">FFVideoSource</span>(<span class="st">"my_file.mp4"</span>)</span>
<span id="cb1-3"><a href="#cb1-3"></a><span class="co"># resize</span></span>
<span id="cb1-4"><a href="#cb1-4"></a><span class="ex">Spline36Resize</span>(1280, 720)</span>
<span id="cb1-5"><a href="#cb1-5"></a><span class="co"># deband</span></span>
<span id="cb1-6"><a href="#cb1-6"></a><span class="ex">f3kdb</span>(18, 64, 64)</span></code></pre></div>
<p>The scripting language allowed for function definitions, conditionals via the <a href="https://en.wikipedia.org/wiki/%3F:">ternary operator</a> (but no if/else keywords), and loops were implemented with recursion. The language was pretty limited, and that proved to be quite painful when more complex logic was needed, but people somehow made it work, often creating unreadable operator chains and recursive rabbit holes.</p>
<h2 id="introducing-a-proper-scripting-language">Introducing: a proper scripting language</h2>
<p>For Vapoursynth, the modern replacement of Avisynth<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>, no custom language was implemented for the scripts. Instead, Python was used. That allowed users to replace the dreaded ternary nesting with much simpler if/elif/else chains and just gave them more freedom overall.</p>
<p>For a while, this resulted in much more readable and straightforward scripts. But, just like all newfound powers, it would soon be misused.</p>
<p>One early example was the port of TAA. Not only does the only public function it defines accept 25 parameters (one of those a 2-element tuple) <em>and</em> <code>**kwargs</code> and contain 17 explicit <code>raise</code> statements, the script also defines no fewer than 12 classes, which form an inheritance hierarchy with 5 levels. It also contains this gem:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode py"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="co"># Use lambda for lazy evaluation</span></span>
<span id="cb2-2"><a href="#cb2-2"></a>mask_kernel <span class="op">=</span> {</span>
<span id="cb2-3"><a href="#cb2-3"></a> <span class="dv">0</span>: <span class="kw">lambda</span>: <span class="kw">lambda</span> a, b, <span class="op">*</span>args, <span class="op">**</span>kwargs: b,</span>
<span id="cb2-4"><a href="#cb2-4"></a> <span class="dv">1</span>: <span class="kw">lambda</span>: mask_lthresh(clip, mthr, mlthresh, mask_sobel, mpand, opencl<span class="op">=</span>opencl,</span>
<span id="cb2-5"><a href="#cb2-5"></a> opencl_device<span class="op">=</span>opencl_device, <span class="op">**</span>kwargs),</span>
<span id="cb2-6"><a href="#cb2-6"></a> <span class="dv">2</span>: <span class="kw">lambda</span>: mask_lthresh(clip, mthr, mlthresh, mask_robert, mpand, <span class="op">**</span>kwargs),</span>
<span id="cb2-7"><a href="#cb2-7"></a> <span class="co"># goes on like that for 10 more cases, some of which use string keys</span></span>
<span id="cb2-8"><a href="#cb2-8"></a>}</span></code></pre></div>
<p>It’s 700 lines of pure overengineered obfuscation because someone decided to bring enterprise Java into the encoding world.<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> It does what it’s supposed to do, but I don’t think new encoders will be able to learn much from it or change it to their needs – which is often more important because you, the maintainer, won’t always be around to make the necessary adjustments.</p>
<p>Don’t create the ancient and arcane scriptures of tomorrow. There are too many of those already.</p>
<h2 id="taking-the-fun-out-of-functions">Taking the fun out of functions</h2>
<p>TAA showed years ago how to do idiomatic enterprise Java in Vapoursynth. Being a Kotlin developer, I of course had other ideas of what good code should look like.<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> Why should I let people pass 20 parameters and <code>**kwargs</code> down my inheritance hierarchy when they could just give me a few <code>Callables</code> instead?</p>
<p>Say you have an AA script that passes <code>kwargs</code> to an internal function, but you also want to accept parameters for a resizer call and give the user the choice between two common resizers. Where before you would write:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode py"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1"></a><span class="kw">def</span> some_aa_filter(clip: VideoNode, width: <span class="bu">int</span>, height: <span class="bu">int</span>, depth: <span class="bu">int</span>, kernel: <span class="bu">str</span>, chroma_pos: <span class="bu">int</span>, fmtc_chroma_pos: <span class="bu">str</span>, use_zimg <span class="op">=</span> <span class="va">True</span>, <span class="op">**</span>kwargs) <span class="op">-></span> VideoNode:</span>
<span id="cb3-2"><a href="#cb3-2"></a> clip <span class="op">=</span> aa(clip, <span class="op">**</span>kwargs)</span>
<span id="cb3-3"><a href="#cb3-3"></a> <span class="cf">if</span> use_zimg:</span>
<span id="cb3-4"><a href="#cb3-4"></a> <span class="cf">return</span> zimg_resizer(clip, width, height, kernel, chroma_pos)</span>
<span id="cb3-5"><a href="#cb3-5"></a> <span class="cf">return</span> core.fmtc.resample(clip, width, height, depth, kernel, fmtc_chroma_pos)</span></code></pre></div>
<p>you could now do this:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode py"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1"></a><span class="kw">def</span> some_aa_filter(clip: VideoNode, width: <span class="bu">int</span>, height: <span class="bu">int</span>, resizer: Callable[[VideoNode, <span class="bu">int</span>, <span class="bu">int</span>], VideoNode], <span class="op">**</span>kwargs) <span class="op">-></span> VideoNode:</span>
<span id="cb4-2"><a href="#cb4-2"></a> clip <span class="op">=</span> aa(clip, <span class="op">**</span>kwargs)</span>
<span id="cb4-3"><a href="#cb4-3"></a> <span class="cf">return</span> resizer(clip, width, height)</span></code></pre></div>
<p>No need to have all those parameters for the resizer that someone may or may not wish to specify at some point. I could even provide a default value for the <code>resizer</code> argument that just uses a bicubic resize, and if someone wanted to specify their own resizer, they could totally do that. Sounds great until you realize that the caller now has to understand functional arguments and create them either with a <code>lambda</code> or something like <code>functools.partial</code>. Both are nontrivial for the target audience, which mostly consists of regular people (i.e. not programmers) who just want to save their favorite anime from whatever the mastering company did to it this time.</p>
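<p>To make that cost concrete, here is a minimal sketch of what the caller ends up writing. <code>VideoNode</code> is a stand-in class so the snippet runs without Vapoursynth, and <code>fake_resize</code> is a hypothetical resizer invented for this illustration; neither is the real API.</p>

```python
from functools import partial
from typing import Callable


class VideoNode:
    """Stand-in for vapoursynth.VideoNode so the sketch runs on its own."""

    def __init__(self, width: int, height: int, note: str = "source"):
        self.width, self.height, self.note = width, height, note


Resizer = Callable[[VideoNode, int, int], VideoNode]


def fake_resize(clip: VideoNode, width: int, height: int, kernel: str = "bicubic") -> VideoNode:
    """Hypothetical resizer; a real one would call into a resize plugin."""
    return VideoNode(width, height, note=f"resized with {kernel}")


def some_aa_filter(clip: VideoNode, width: int, height: int, resizer: Resizer = fake_resize) -> VideoNode:
    # The anti-aliasing step is omitted; only the resizer hand-off matters here.
    return resizer(clip, width, height)


src = VideoNode(1920, 1080)

# Option 1: a lambda that fixes the extra argument.
with_lambda = some_aa_filter(
    src, 1280, 720, resizer=lambda c, w, h: fake_resize(c, w, h, kernel="spline36")
)

# Option 2: functools.partial does the same without spelling out the parameters.
with_partial = some_aa_filter(src, 1280, 720, resizer=partial(fake_resize, kernel="lanczos"))

# Option 3: rely on the default (bicubic) resizer.
with_default = some_aa_filter(src, 1280, 720)
```

<p>The signature of <code>some_aa_filter</code> stays short, but anything beyond the default now requires the caller to know what a <code>lambda</code> or a <code>partial</code> is.</p>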
<p>But they can handle this, right? It’s just a little bit of complexity that gives them <em>sooo</em> much more freedom.</p>
<p>The real use case was a little more complicated than just an AA function, and I decided to keep the callable. I felt it was necessary to make the function useful, but I later realized it’s very easy to go too far with this. Being the person who wrote the code, I often don’t realize what parts are difficult to understand. I think most of us have experienced that at some point.</p>
<h2 id="how-much-is-too-much">How much is too much?</h2>
<p>I was recently confronted with this when someone opened a <a href="https://github.com/Irrational-Encoding-Wizardry/vsutil/pull/37">pull request for vsutil</a> which added decorators for things like <code>@disallow_variable_format</code>.</p>
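<p>For readers who have not seen one, this is roughly what such a decorator looks like. It is a sketch and not the code from the pull request: <code>VideoNode</code> is a stand-in class, the sketch assumes that a variable-format clip has <code>format</code> set to <code>None</code>, and <code>get_format_name</code> is a made-up function to decorate.</p>

```python
from functools import wraps
from typing import Any, Callable, Optional


class VideoNode:
    """Stand-in for vapoursynth.VideoNode; format is None for variable-format clips (assumption)."""

    def __init__(self, format: Optional[str]):
        self.format = format


def disallow_variable_format(function: Callable[..., Any]) -> Callable[..., Any]:
    """Reject variable-format clips before the wrapped function runs."""

    @wraps(function)  # keep the wrapped function's name and docstring
    def wrapper(clip: VideoNode, *args: Any, **kwargs: Any) -> Any:
        if clip.format is None:
            raise ValueError(f"{function.__name__}: variable-format clips are not supported")
        return function(clip, *args, **kwargs)

    return wrapper


@disallow_variable_format
def get_format_name(clip: VideoNode) -> str:
    # Hypothetical function; the check above guarantees format is set here.
    return clip.format
```

<p>At the call site, the one-line annotation replaces a repeated <code>if</code>/<code>raise</code> block, but the reader has to know that <code>@</code> wraps the function in another function to understand where the error comes from.</p>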
<p>In one of my comments, I wrote:</p>
<blockquote>
<p>“Having decorators at all is already a level of complexity that might scare away potential contributors (most VS users don’t know much about Python), but I think they’re quite self-explanatory in this case, so I’m fine with that.”</p>
</blockquote>
<p>to which someone replied:</p>
<blockquote>
<p>“vsutil is already beyond this point with using typehints and unit tests in the first place imo.”</p>
</blockquote>
<p>While I personally disagree that typehints and unit tests obfuscate code as much as decorators and other Python magic, it still got me thinking. Not because I desperately want contributors with zero programming knowledge, but because I would like to create code that the target audience can actually read and understand. People can’t learn from code that they can’t understand at all, and I believe that reading other people’s code is a good way to improve your own.</p>
<p>I certainly learned a lot (about encoding but also in general) by doing that. Not everyone has the luxury of a personal mentor, but everyone can go on GitHub, read the code of more experienced encoders, and learn from that.<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a></p>
<p>There are more factors than just the code itself. Some repositories, vsutil included, have slowly turned into proper Python modules. That’s not a bad thing per se because it gives us the ability to publish PyPI packages, which also simplifies packaging for the AUR or similar repositories, but there is a point at which it makes the folder structure confusing. I think this is still within reason,<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a> but we should be careful to keep it that way.</p>
<p>Complexity rarely comes all at once. It’s death by a thousand pull requests that slowly make everything more and more complicated, one reasonable step at a time, and before you know it, you don’t understand your own repository.</p>
<p>Maybe I’m too afraid to reject pull requests because “someone put a lot of work into this”, but thinking more about this made me realize that just blindly accepting them will do a lot more harm over time.</p>
<h2 id="the-complex-is-the-enemy-of-the-good">The complex is the enemy of the good</h2>
<p>Maybe we should only stray from the basics when absolutely necessary, no matter what your (or my) favorite programming style is. Video filtering is about scripting, not understanding someone’s OOP hierarchies, reimplementing popular FP patterns, or emulating any other programming paradigm.</p>
<p>If someone approaches you because they can’t figure out how to call a function you’ve written, it’s probably your fault and not theirs.<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a></p>
<p>What I really want to say is: please just think twice before turning a 100 line file of helper functions into a 2000 loc project that is 50% docstrings, 40% error handling, has 5 <code>@decorators</code> per function, 3 different linters, and reads like the Haskell code of a drunk freshman transpiled to Python.</p>
<p>I promise I’ll try to do the same, even if it means typing three lines instead of just one.<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a></p>
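<p>For comparison, here is the <code>iterate</code> one-liner from the last footnote next to a plain loop. The names <code>iterate_reduce</code> and <code>iterate_loop</code> are only used here to tell the two apart.</p>

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def iterate_reduce(base: T, function: Callable[[T], T], count: int) -> T:
    """Apply function to base count times, written as a fold over a range."""
    return reduce(lambda v, _: function(v), range(count), base)


def iterate_loop(base: T, function: Callable[[T], T], count: int) -> T:
    """The same thing as a plain loop."""
    result = base
    for _ in range(count):
        result = function(result)
    return result


# Doubling 1 ten times gives 2**10 with either version.
doubled_reduce = iterate_reduce(1, lambda x: x * 2, 10)
doubled_loop = iterate_loop(1, lambda x: x * 2, 10)
```

<p>Both return the same result; the loop just doesn’t require the reader to know what a fold is.</p>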
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>I say replacement, but there are still lots of people who refuse to switch, even in $currentYear.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>There are more examples like this one, TAA has just been bugging me for a long time. It’s by no means the only script that has grown far beyond critical mass.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3" role="doc-endnote"><p>Kotlin functions often take functional arguments, which is well-supported by the syntax and also much easier if you have statically checked types. That obviously does not translate to Python, but it doesn’t stop me from trying.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4" role="doc-endnote"><p>but please don’t just copy code. If you want to copy something because it does exactly what you need, at least try to understand it beforehand. I still regret merging a kagefunc PR once without properly going through the code, because it left me with 50 lines that I barely understood myself and have been procrastinating to refactor ever since.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5" role="doc-endnote"><p>having all of the vsutil code in <code>__init__.py</code> does seem weird to me, but that has already been discussed and should change soon.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6" role="doc-endnote"><p>Unless they’re just missing a dependency and can’t read the error message.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7" role="doc-endnote"><p>And trust me, I’ll miss<br />
<code>def iterate(base, function, count): return reduce(lambda v, _: function(v), range(count), base)</code>,<br />
but a simple <code>for</code> loop is just much more readable to non-FP people.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
Stream, Sequence, Iterator – a story of laziness and sad JVM benchmarking noises
https://blog.kageru.moe/content/iterators.html
<p>Many programming languages have started to include more functional features in their standard libraries. One of those features is lazy collections, for lack of a better term, which seem to have a different name in each language (we’ll just call them iterators here) and sometimes vastly differing implementations. One thing they all have in common, though, is a lack of trust in their performance.</p>
<p>For almost every language out there that offers lazy iterators, there will be people telling you not to use them for performance reasons, more often than not without any data to back that up.</p>
<p>I was personally interested in this because, being a Java/Kotlin developer, I use Java’s Streams and Kotlin’s Sequences almost every day with relatively little regard for potential performance implications. They are intuitive to write and are easy to reason about, which is usually much more important than the results of a thousand microbenchmarks, so please don’t stop using your favorite language feature because it’s 2.8% slower than the alternative. Most code is already bad enough as is without desperate optimization attempts.</p>
<p>Still, I wanted to know how they compare to imperative code. There are some resources on this for Java 8’s Stream API, but Kotlin’s Sequences seem to just be accepted as more convenient Streams, without much discussion about their performance.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
<h2 id="what-is-an-iterator">What <em>is</em> an iterator?</h2>
<p>You can think of an iterator as a pipeline. It lets you write code as a sequence of instructions to be applied to all elements of a container.</p>
<p>Let’s use a simple example to demonstrate this. We want to take all numbers from 1 to 100,000, multiply each of them by 2, and then sum all of them.<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
<p>First, the imperative solution:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode kotlin"><code class="sourceCode kotlin"><span id="cb1-1"><a href="#cb1-1"></a><span class="kw">var</span> <span class="va">sum</span> = 0L <span class="co">// Long, because the total exceeds Int.MAX_VALUE</span></span>
<span id="cb1-2"><a href="#cb1-2"></a><span class="cf">for</span> (i <span class="kw">in</span> <span class="dv">1</span>..<span class="dv">100</span>_<span class="dv">000</span>) {</span>
<span id="cb1-3"><a href="#cb1-3"></a> sum += i * <span class="dv">2</span></span>
<span id="cb1-4"><a href="#cb1-4"></a>}</span>
<span id="cb1-5"><a href="#cb1-5"></a><span class="kw">return</span> sum</span></code></pre></div>
<p>and now the functional version using a Sequence (Kotlin’s name for Streams/iterators):</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode kotlin"><code class="sourceCode kotlin"><span id="cb2-1"><a href="#cb2-1"></a><span class="kw">return</span> (<span class="dv">1</span>..<span class="dv">100</span>_<span class="dv">000</span>).asSequence()</span>
<span id="cb2-2"><a href="#cb2-2"></a> .map { it * 2L } <span class="co">// Long, because the total exceeds Int.MAX_VALUE</span></span>
<span id="cb2-3"><a href="#cb2-3"></a> .sum()</span></code></pre></div>
<p>An iterator is not a list, and it doesn’t support indexing,<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> because it doesn’t actually contain any data. It just knows how to get or compute it for you, but you don’t know how it does that. You don’t even always know when (or if at all) an iterator will end (in this case, we do, because we create the Sequence from the range <code>1..100_000</code>, meaning it will produce the numbers from 1 to 100,000 before it ends).<br />
You can tell an iterator to produce or emit data if you want to use it (which is often called ‘consuming’ because if you read something from the pipeline, it’s usually gone), or you can add a new step to it and hand the new pipeline to someone else, who can then consume it or add even more steps.</p>
<p>An important aspect to note is: adding an operation to the pipeline does nothing until someone actually starts reading from it, and even then, only the elements that are consumed are computed.<br />
This makes it possible to operate on huge data sets<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a> while keeping memory usage low, because only the currently active element has to be held in memory.</p>
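<p>The laziness is easy to observe. The sketch below uses a Python generator as a stand-in, on the assumption that it behaves like a Sequence or Stream in this one respect; the <code>processed</code> list and the <code>doubled</code> helper exist only to make the evaluation order visible.</p>

```python
from itertools import count, islice

processed = []  # records which elements the pipeline actually touched


def doubled(numbers):
    """Lazily double each number, logging every element that gets computed."""
    for n in numbers:
        processed.append(n)
        yield n * 2


# Building the pipeline on an infinite source computes nothing yet.
pipeline = doubled(count(1))
processed_before_consuming = list(processed)

# Consuming three elements computes exactly three elements.
first_three = list(islice(pipeline, 3))

# The example from the text: double 1..100,000 and sum, one element at a time.
total = sum(n * 2 for n in range(1, 100_001))
```

<p>The source here is infinite, yet only the three requested elements are ever computed, and the sum never holds more than one element in memory at a time.</p>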
<h2 id="cold-hard-numbers">Cold, hard numbers</h2>
<p>We’ll use that small example from the last section as our first example: take a range of numbers, double each number, and compute the sum – except this time, we’ll do the numbers from 1 to 1 billion. Since everything we’re doing is lazy, memory usage shouldn’t be an issue.</p>
<p>I will solve this with several different implementations and benchmark each of them. Here are the approaches I came up with:</p>
<ul>
<li>a simple for loop in Java</li>
<li>Java’s LongStream</li>
<li>a for each loop with a range in Kotlin</li>
<li>Java’s LongStream called from Kotlin<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a></li>
<li>Java’s Stream wrapped in a Kotlin Sequence</li>
<li>a Kotlin range wrapped in a Sequence</li>
<li>Kotlin’s Sequence with a generator to create the range</li>
</ul>
<p>The benchmarks were executed on an Intel Xeon E3-1271 v3 with 32 GB of RAM, running Arch Linux with kernel 5.4.20-1-lts, using the (at the time of writing) latest OpenJDK preview build (<code>15-ea+17-717</code>), Kotlin 1.4-M1, and <a href="https://openjdk.java.net/projects/code-tools/jmh/">jmh</a> version 1.23.<br />
The bytecode target was set to Java 15 for the Java code and Java 13 for Kotlin (newer versions are currently unsupported).</p>
<p>Source code for the Java tests:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb3-1"><a href="#cb3-1"></a><span class="kw">public</span> <span class="dt">long</span> <span class="fu">stream</span>() {</span>
<span id="cb3-2"><a href="#cb3-2"></a> <span class="kw">return</span> LongStream.<span class="fu">range</span>(<span class="dv">1</span>, upper)</span>
<span id="cb3-3"><a href="#cb3-3"></a> .<span class="fu">map</span>(l -> l * <span class="dv">2</span>)</span>
<span id="cb3-4"><a href="#cb3-4"></a> .<span class="fu">sum</span>();</span>
<span id="cb3-5"><a href="#cb3-5"></a>}</span>
<span id="cb3-6"><a href="#cb3-6"></a></span>
<span id="cb3-7"><a href="#cb3-7"></a><span class="kw">public</span> <span class="dt">long</span> <span class="fu">loop</span>() {</span>
<span id="cb3-8"><a href="#cb3-8"></a> <span class="dt">long</span> sum = <span class="dv">0</span>;</span>
<span id="cb3-9"><a href="#cb3-9"></a> <span class="kw">for</span> (<span class="dt">long</span> i = <span class="dv">0</span>; i < upper; i++) {</span>
<span id="cb3-10"><a href="#cb3-10"></a> sum += i * <span class="dv">2</span>;</span>
<span id="cb3-11"><a href="#cb3-11"></a> }</span>
<span id="cb3-12"><a href="#cb3-12"></a> <span class="kw">return</span> sum;</span>
<span id="cb3-13"><a href="#cb3-13"></a>}</span></code></pre></div>
<p>and for Kotlin:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode kotlin"><code class="sourceCode kotlin"><span id="cb4-1"><a href="#cb4-1"></a><span class="kw">fun</span> <span class="fu">stream</span>() =</span>
<span id="cb4-2"><a href="#cb4-2"></a> LongStream.range(<span class="dv">1</span>, upper)</span>
<span id="cb4-3"><a href="#cb4-3"></a> .map { it * <span class="dv">2</span> }</span>
<span id="cb4-4"><a href="#cb4-4"></a> .sum()</span>
<span id="cb4-5"><a href="#cb4-5"></a></span>
<span id="cb4-6"><a href="#cb4-6"></a><span class="kw">fun</span> <span class="fu">loop</span>(): <span class="dt">Long</span> {</span>
<span id="cb4-7"><a href="#cb4-7"></a> <span class="kw">var</span> <span class="va">sum</span> = 0L</span>
<span id="cb4-8"><a href="#cb4-8"></a> <span class="cf">for</span> (l <span class="kw">in</span> 1L until upper) {</span>
<span id="cb4-9"><a href="#cb4-9"></a> sum += l * <span class="dv">2</span></span>
<span id="cb4-10"><a href="#cb4-10"></a> }</span>
<span id="cb4-11"><a href="#cb4-11"></a> <span class="kw">return</span> sum</span>
<span id="cb4-12"><a href="#cb4-12"></a>}</span>
<span id="cb4-13"><a href="#cb4-13"></a></span>
<span id="cb4-14"><a href="#cb4-14"></a><span class="kw">fun</span> <span class="fu">streamWrappedInSequence</span>() =</span>
<span id="cb4-15"><a href="#cb4-15"></a> LongStream.range(1L, upper)</span>
<span id="cb4-16"><a href="#cb4-16"></a> .asSequence()</span>
<span id="cb4-17"><a href="#cb4-17"></a> .map { it * <span class="dv">2</span> }</span>
<span id="cb4-18"><a href="#cb4-18"></a> .sum()</span>
<span id="cb4-19"><a href="#cb4-19"></a></span>
<span id="cb4-20"><a href="#cb4-20"></a><span class="kw">fun</span> <span class="fu">sequence</span>() =</span>
<span id="cb4-21"><a href="#cb4-21"></a> (<span class="dv">1</span> until upper).asSequence()</span>
<span id="cb4-22"><a href="#cb4-22"></a> .map { it * <span class="dv">2</span> }</span>
<span id="cb4-23"><a href="#cb4-23"></a> .sum()</span>
<span id="cb4-24"><a href="#cb4-24"></a></span>
<span id="cb4-25"><a href="#cb4-25"></a><span class="kw">fun</span> <span class="fu">withGenerator</span>() =</span>
<span id="cb4-26"><a href="#cb4-26"></a> generateSequence(0L, { it + 1L })</span>
<span id="cb4-27"><a href="#cb4-27"></a> .take(upper.toInt())</span>
<span id="cb4-28"><a href="#cb4-28"></a> .map { it * <span class="dv">2</span> }</span>
<span id="cb4-29"><a href="#cb4-29"></a> .sum()</span></code></pre></div>
<p>with <code>const val upper = 1_000_000_000L</code>.<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a></p>
<p>Without wasting any more of your time, here are the results:</p>
<pre class="plaintext"><code>Benchmark Mode Cnt Score Error Units
Java.loop avgt 25 446.055 ± 0.677 ms/op
Java.stream avgt 25 601.424 ± 12.606 ms/op
Kotlin.loop avgt 25 446.600 ± 1.164 ms/op
Kotlin.sequence avgt 25 2732.604 ± 6.644 ms/op
Kotlin.stream avgt 25 593.353 ± 1.408 ms/op
Kotlin.streamWrappedInSequence avgt 25 3829.209 ± 33.569 ms/op
Kotlin.withGenerator avgt 25 8374.149 ± 880.647 ms/op</code></pre>
<p>(<a href="https://ruru.moe/pSK13p8">full JMH output</a>)</p>
<p>Unsurprisingly, using Streams from Java and Kotlin is almost identical in terms of performance. The same is true for imperative loops, meaning Kotlin ranges introduce no overhead compared to incrementing for loops.</p>
<p>More surprisingly, Sequences are roughly 6 to 19 times slower than the plain loop, depending on how they are created. That was not at all what I expected, so I investigated.</p>
<p>As it turns out, Java’s <code>LongStream</code> exists because <code>Stream<Long></code> is <em>much</em> slower. The JVM has to use <code>Long</code> (uppercase) rather than <code>long</code> when the type is used for generics, which involves an additional boxing step and the allocation for the <code>Long</code> object.<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a><br />
Still, we now know that Streams have about 33 to 35% overhead compared to the simple loop for this example, that generating sequences is a comparatively slow process, and that wrapping Streams comes at a considerable cost (compared to a sequence created from a range).</p>
<p>That last point seemed odd, so I attached a profiler to see where the CPU time is lost.</p>
<figure>
<img src="https://i.kageru.moe/knT2Eg.png" alt="" /><figcaption>Flamegraph of <code>streamWrappedInSequence()</code></figcaption>
</figure>
<p>We can see that the <code>LongStream</code> can produce a <code>PrimitiveIterator.OfLong</code> that is used as a source for the Sequence. The operation of boxing a primitive <code>long</code> into an object <code>Long</code> (that’s the <code>Long.valueOf()</code> step) takes almost as long as advancing the underlying iterator itself.<br />
7.7% of the CPU time is spent in <code>Sequence.hasNext()</code>. The exact breakdown of that looks as follows:</p>
<figure>
<img src="https://i.kageru.moe/k4NHhR.png" alt="" /><figcaption>Checking if a Sequence has more elements</figcaption>
</figure>
<p>The Sequence introduces very little overhead here, as it just delegates to <code>hasNext()</code> of the underlying iterator.<br />
Worth noting is that the iterator calls <code>accept()</code> as part of <code>hasNext()</code>, which will already advance the underlying iterator. The value returned by that will be stored temporarily until <code>nextLong()</code> is called.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb6-1"><a href="#cb6-1"></a><span class="kw">public</span> <span class="dt">boolean</span> <span class="fu">tryAdvance</span>(LongConsumer consumer) {</span>
<span id="cb6-2"><a href="#cb6-2"></a> <span class="dt">final</span> <span class="dt">long</span> i = from;</span>
<span id="cb6-3"><a href="#cb6-3"></a> <span class="kw">if</span> (i < upTo) {</span>
<span id="cb6-4"><a href="#cb6-4"></a> from++;</span>
<span id="cb6-5"><a href="#cb6-5"></a> consumer.<span class="fu">accept</span>(i);</span>
<span id="cb6-6"><a href="#cb6-6"></a> <span class="kw">return</span> <span class="kw">true</span>;</span>
<span id="cb6-7"><a href="#cb6-7"></a> }</span>
<span id="cb6-8"><a href="#cb6-8"></a> <span class="co">// more stuff down here</span></span>
<span id="cb6-9"><a href="#cb6-9"></a>}</span></code></pre></div>
<p>where <code>consumer.accept()</code> is</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb7-1"><a href="#cb7-1"></a><span class="kw">public</span> <span class="dt">void</span> <span class="fu">accept</span>(T t) {</span>
<span id="cb7-2"><a href="#cb7-2"></a> valueReady = <span class="kw">true</span>;</span>
<span id="cb7-3"><a href="#cb7-3"></a> nextElement = t;</span>
<span id="cb7-4"><a href="#cb7-4"></a>}</span></code></pre></div>
<p>Knowing this, I have to wonder why <code>nextLong()</code> takes as long as it does. Looking at <a href="https://github.com/openjdk/jdk/blob/6bab0f539fba8fb441697846347597b4a0ade428/src/java.base/share/classes/java/util/Spliterators.java#L756">the implementation</a>, I don’t understand where all that time is going. <code>hasNext()</code> should always be called before <code>next()</code>, so <code>next()</code> just has to return a precomputed value.</p>
<p>Nevertheless, we can now explain the performance difference with the additional boxing step.<br />
Primitives good; everything else bad.</p>
<p>With that in mind, I wrote a second test that takes boxing out of the equation when comparing Streams and Sequences.<br />
The next snippet runs a few operations on a Stream/Sequence of a simple wrapper class, which guarantees that neither side gets to work on primitives.<br />
I’ll use this opportunity to also compare parallel and sequential streams.</p>
<p>The steps are simple:</p>
<ol type="1">
<li>take a long</li>
<li>create a LongWrapper from it</li>
<li>double the contained value (which creates a new LongWrapper)</li>
<li>extract the value</li>
<li>calculate the sum</li>
</ol>
<p>That may sound overcomplicated, but it’s sadly close to the reality of enterprise code. Wrapper types are everywhere.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode kotlin"><code class="sourceCode kotlin"><span id="cb8-1"><a href="#cb8-1"></a>inner <span class="kw">class</span> LongWrapper(<span class="kw">val</span> <span class="va">value</span>: <span class="dt">Long</span>) {</span>
<span id="cb8-2"><a href="#cb8-2"></a> <span class="kw">fun</span> <span class="fu">double</span>() = LongWrapper(value * <span class="dv">2</span>)</span>
<span id="cb8-3"><a href="#cb8-3"></a>}</span>
<span id="cb8-4"><a href="#cb8-4"></a></span>
<span id="cb8-5"><a href="#cb8-5"></a><span class="kw">fun</span> <span class="fu">sequence</span>(): <span class="dt">Long</span> =</span>
<span id="cb8-6"><a href="#cb8-6"></a> (<span class="dv">1</span> until upper).asSequence()</span>
<span id="cb8-7"><a href="#cb8-7"></a> .map(::LongWrapper)</span>
<span id="cb8-8"><a href="#cb8-8"></a> .map(LongWrapper::double)</span>
<span id="cb8-9"><a href="#cb8-9"></a> .map(LongWrapper::value)</span>
<span id="cb8-10"><a href="#cb8-10"></a> .sum()</span>
<span id="cb8-11"><a href="#cb8-11"></a></span>
<span id="cb8-12"><a href="#cb8-12"></a><span class="kw">fun</span> <span class="fu">stream</span>(): <span class="dt">Optional</span><<span class="dt">Long</span>> =</span>
<span id="cb8-13"><a href="#cb8-13"></a> StreamSupport.stream((<span class="dv">1</span> until upper).spliterator(), <span class="kw">false</span>)</span>
<span id="cb8-14"><a href="#cb8-14"></a> .map(::LongWrapper)</span>
<span id="cb8-15"><a href="#cb8-15"></a> .map(LongWrapper::double)</span>
<span id="cb8-16"><a href="#cb8-16"></a> .map(LongWrapper::value)</span>
<span id="cb8-17"><a href="#cb8-17"></a> .reduce(<span class="kw">Long</span>::plus)</span>
<span id="cb8-18"><a href="#cb8-18"></a></span>
<span id="cb8-19"><a href="#cb8-19"></a><span class="kw">fun</span> <span class="fu">parallelStream</span>(): <span class="dt">Optional</span><<span class="dt">Long</span>> =</span>
<span id="cb8-20"><a href="#cb8-20"></a> StreamSupport.stream((<span class="dv">1</span> until upper).spliterator(), <span class="kw">true</span>)</span>
<span id="cb8-21"><a href="#cb8-21"></a> .map(::LongWrapper)</span>
<span id="cb8-22"><a href="#cb8-22"></a> .map(LongWrapper::double)</span>
<span id="cb8-23"><a href="#cb8-23"></a> .map(LongWrapper::value)</span>
<span id="cb8-24"><a href="#cb8-24"></a> .reduce(<span class="kw">Long</span>::plus)</span>
<span id="cb8-25"><a href="#cb8-25"></a></span>
<span id="cb8-26"><a href="#cb8-26"></a></span>
<span id="cb8-27"><a href="#cb8-27"></a><span class="kw">fun</span> <span class="fu">loop</span>(): <span class="dt">Long</span> {</span>
<span id="cb8-28"><a href="#cb8-28"></a> <span class="kw">var</span> <span class="va">sum</span> = 0L</span>
<span id="cb8-29"><a href="#cb8-29"></a> <span class="cf">for</span> (l <span class="kw">in</span> <span class="dv">1</span> until upper) {</span>
<span id="cb8-30"><a href="#cb8-30"></a> <span class="kw">val</span> <span class="va">wrapper</span> = LongWrapper(l)</span>
<span id="cb8-31"><a href="#cb8-31"></a> <span class="kw">val</span> <span class="va">doubled</span> = wrapper.double()</span>
<span id="cb8-32"><a href="#cb8-32"></a> sum += doubled.value</span>
<span id="cb8-33"><a href="#cb8-33"></a> }</span>
<span id="cb8-34"><a href="#cb8-34"></a> <span class="kw">return</span> sum</span>
<span id="cb8-35"><a href="#cb8-35"></a>}</span></code></pre></div>
<p>The results here paint a different picture:</p>
<pre class="plaintext"><code>NonPrimitive.loop avgt 25 445.992 ± 0.642 ms/op
NonPrimitive.sequence avgt 25 27257.399 ± 342.686 ms/op
NonPrimitive.stream avgt 25 44673.318 ± 1325.832 ms/op
NonPrimitive.parallelStream avgt 25 33856.919 ± 249.911 ms/op</code></pre>
<p>Full results are in the <a href="https://ruru.moe/pSK13p8">JMH log from earlier</a>.</p>
<p>The overhead of Java Streams is much higher than that of Kotlin Sequences. Even a parallel Stream is slower than a Sequence, although Sequences only use a single thread. Both are miles behind the simple for loop, though. My first assumption was that the compiler optimized away the wrapper type and just added the longs, but looking at <a href="https://p.kageru.moe/AUJKiG">the bytecode</a>, the constructor invocation and the <code>double()</code> method calls are still there. It’s hard to know what the JIT does at runtime, but the numbers certainly suggest that the wrapper is simply optimized away.<br />
The profiler report wasn’t helpful either, which further leads me to believe that the JIT just deletes the method and inlines the calculations.</p>
<p>This tells us that not only do Streams/Sequences have a very measurable overhead, but they also severely limit the optimizer’s (be it compile-time or JIT) ability to understand the code, which can lead to significant slowdowns in code that could otherwise be optimized. Obviously, code that doesn’t rely on the optimizer as much won’t be affected to the same degree.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Overall, I think that Kotlin’s Sequences are a good addition to the language, despite their flaws.<br />
They are significantly slower than Streams when working with primitives because the Java standard library has subtypes for many generic constructs to more efficiently handle primitive types, but in most real-world JVM applications (that being enterprise-level bloatware), primitives are the exception rather than the rule. Still, Kotlin already has some types that optimize for this, such as <code>LongIterator</code>, but without a <code>LongSequence</code> to go with it, the boxing will still happen eventually, and all the performance gains are void.</p>
<p>I hope that we can get a few more types like it in the future, which will be especially useful once Kotlin/Native reaches maturity and starts being used for small/embedded hardware.</p>
<p>Apart from the performance, Sequences are also a lot easier to understand and even extend than Streams. Implementing your own Sequence requires barely more than an implementation of the underlying iterator, as can be seen in <a href="https://git.kageru.moe/kageru/Sekwences/src/branch/master/src/main/kotlin/moe/kageru/sekwences/CoalescingSequence.kt">CoalescingSequence</a> which I implemented last year to get a feeling for how all of this works.<br />
Streams on the other hand are a lot more complex. Internally, each stage of a Stream pipeline is a <code>Sink<T></code>, which extends <code>Consumer<T></code>, so a stage is actually just a <code>void accept(T input)</code> that can be called repeatedly. That makes it a lot harder to grasp where data is coming from and how it is requested, at least to me.</p>
<p>Simplicity is often underrated in software, but I consider it a huge plus for Sequences.</p>
<p>I will continue to use them liberally, unless I find myself in a situation where I need to process a huge number of primitives. And even then, I now know that Java’s Streams are a good alternative, as long as my code isn’t plain stupid and in dire need of the JIT optimizer.<br />
25% might sound like a lot, but it’s more than worth it if it means leaving code that is much easier to understand and modify for the next person.<br />
Unless you’re actually in a very performance-critical part of your application, but if you ever find yourself in that situation, you should switch to a different language.</p>
<p>Writing simple and correct code should always be more important than writing fast code.<br />
<br />
<br />
<br />
On the note of switching languages: I was originally going to include Rust’s iterators here for comparison, but rustc optimized away all of my benchmarks with <a href="https://godbolt.org/z/iJaWVP">constant time solutions</a>. That was a fascinating discovery for me, and I might write a separate blog post where I dissect some of the assembly that rustc/LLVM produced, but I feel like I’ll need to learn a few more things about compilers first.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>If you’ve ever used them, you’ll know what I mean. Java’s Streams are built in a way that allows for easy parallelism, but brings its own problems and limitations for the usage.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>You could also just compute the sum and take that * 2, but we specifically want that intermediate step for the example.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3" role="doc-endnote"><p>Or any other operation like it. No <code>iterator[0]</code>, no <code>iterator.get(0)</code> or whatever your favorite language uses. An operation like <code>iterator.last()</code> might exist, but it will consume the entire iterator instead of just accessing the last element.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4" role="doc-endnote"><p>Huge or even infinite. Infinite iterators can be very useful and are used a lot in functional languages, but they’re not today’s topic.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5" role="doc-endnote"><p>Mainly to make sure there is no performance difference between the two.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6" role="doc-endnote"><p><code>1 until upper</code> is used in these examples because unlike <code>lower..upper</code>, <code>until</code> is end-exclusive like Java’s <code>LongStream.range()</code>.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7" role="doc-endnote"><p>The JVM has a few primitive types, such as <code>int</code>, <code>char</code>, or <code>boolean</code>. They are different from any other type because they cannot be <code>null</code>. Every regular type on the JVM extends <code>java.lang.Object</code> and is just a reference that is being passed around. The primitives are values, not references, so there’s a lot less overhead involved. Unfortunately, primitives can’t be used as generic types, so a list of longs will always convert the <code>long</code> to <code>Long</code> before adding it.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
Writing less code
https://blog.kageru.moe/content/lesscode.html
<p>Code is bad. It’s confusing, it’s easy to break, and it needs to be maintained or even updated. And the more code you have, the worse it gets.</p>
<p>I sometimes get bored, perhaps more often than I’d like to admit, and one of the things I do to fight that boredom is writing code. I’ve created lots of small pieces of software, most of which are awful, useless, or both. My old blog was one of them, although the exact classification into those categories shall be left as an exercise to the reader.</p>
<p>I realized the process of writing and uploading content to it was also anything but streamlined and likely contributed to my lack of motivation to write and release anything, so I decided to replace it. At first, I thought about using <a href="https://jekyllrb.com/">Jekyll</a>, but remember, I’m bored and looking for opportunities to write code (which admittedly is the opposite of today’s title).</p>
<p>So I decided to rewrite it. Not as another Python Django application, not as a Rails project or whatever people do these days. No, I wanted to know how little I could get away with. I wasn’t golfing for line count, obviously (because that’s just stupid), but I ideally wanted a simple shell script that would do everything I needed and only that. I wanted to write markdown and get static HTML. Simple as that. So here’s how you do that while writing as little code as possible:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1"></a>$ <span class="ex">pandoc</span> input.md -t html <span class="op">></span> output.html</span></code></pre></div>
<p>And that’s the secret to all of this.</p>
<h2 id="dry-more-like-drse">DRY? More like DRSE</h2>
<p>The DRY principle (“don’t repeat yourself”) is something most programmers are familiar with and are probably trying to adhere to. Writing duplicate code feels inherently wrong to most people. But why not take that one step further? Don’t just not repeat yourself; don’t repeat someone else either. If someone has already written software that converts markdown to html, you don’t have to do it again. That part might have been obvious, but we can apply it to almost everything that is necessary for this little project.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
<h2 id="the-components">The components</h2>
<p>So what does my blog need to do? Well, quite simple:</p>
<ul>
<li>read markdown and convert it to HTML</li>
<li>generate an index of all the blog entries</li>
<li>include some basic CSS/JS in the output</li>
<li>update itself automatically when I publish something</li>
<li>be compatible with the content from my previous blog</li>
</ul>
<p>That last point might be the worst, but it’s what I wanted/needed.</p>
<p>The old blog had a simple sqlite database that would hold the title, date, and link of all blog posts. It then had a predefined template for site header and footer and would just insert the content between those. Relatively simple, but way more than what was necessary and also relatively slow because the template would be rendered for each request. Oh, and I had to write the content directly in HTML.</p>
<p>Static pages converted from markdown would do the job just as well, so that was my new goal.</p>
<h3 id="markdown-conversion">Markdown conversion</h3>
<p>The first and most obvious step is converting my hand-written markdown files to beautiful HTML for the browser. As mentioned previously, I am going to use pandoc for the conversion logic.</p>
<p>All I had to do now was define a folder structure which in my case has a <code>src</code> folder with all the .md files and a <code>content</code> folder with the resulting .html documents. The rest is a simple loop and some shell built-ins.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1"></a><span class="fu">convert_file()</span> <span class="kw">{</span></span>
<span id="cb2-2"><a href="#cb2-2"></a> <span class="va">path=</span><span class="st">"</span><span class="va">$9</span><span class="st">"</span></span>
<span id="cb2-3"><a href="#cb2-3"></a> <span class="va">outpath=</span><span class="st">"content/</span><span class="va">$(</span><span class="fu">basename</span> <span class="st">"</span><span class="va">$path</span><span class="st">"</span> .md<span class="va">)</span><span class="st">.html"</span></span>
<span id="cb2-4"><a href="#cb2-4"></a> <span class="ex">pandoc</span> <span class="st">"</span><span class="va">$path</span><span class="st">"</span> -t html <span class="op">></span> <span class="st">"</span><span class="va">$outpath</span><span class="st">"</span></span>
<span id="cb2-5"><a href="#cb2-5"></a><span class="kw">}</span></span>
<span id="cb2-6"><a href="#cb2-6"></a></span>
<span id="cb2-7"><a href="#cb2-7"></a><span class="fu">ls</span> -ltu src/*.md <span class="kw">|</span> <span class="fu">tail</span> -n+1 <span class="kw">|</span> <span class="kw">while</span> <span class="bu">read</span> <span class="va">f</span>; <span class="kw">do</span> <span class="ex">convert_file</span> <span class="va">$f</span><span class="kw">;</span> <span class="kw">done</span></span></code></pre></div>
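<p>I used <code>ls -l</code> to have each file on a separate line which makes the parsing much easier. <code>ls -t</code> sorts the files by time so the newest entries are at the top; note that <code>-u</code> makes <code>ls</code> use the last access time rather than the modification time. <code>tail -n+1</code> passes every line through unchanged, which is fine here: because <code>ls</code> is given explicit file names, <code>-l</code> doesn’t print a <code>total xxx</code> line that would need removing.</p>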
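<p>As a variation on the loop above, here is a minimal sketch that avoids picking the file name out of the long listing. <code>list_sources</code> and <code>output_path</code> are made-up helper names, and the sketch assumes file names without newlines; it sorts by modification time via plain <code>ls -t</code>.</p>

```shell
#!/bin/sh
# Sketch only: list_sources and output_path are hypothetical helper names.

# Print the markdown files in directory $1, newest modification time first.
# Assumes file names contain no newlines.
list_sources() {
    ls -t "$1"/*.md
}

# Map a source path such as src/post.md to content/post.html.
output_path() {
    printf 'content/%s.html\n' "$(basename "$1" .md)"
}

# Usage, mirroring the loop above (left commented out because it needs pandoc):
# list_sources src | while IFS= read -r path; do
#     pandoc "$path" -t html > "$(output_path "$path")"
# done
```

<p>Reading whole lines with <code>IFS= read -r</code> means a path with spaces would survive, which the field-based approach does not handle.</p>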
<p>Step 1 done.</p>
<h3 id="index-generation">Index generation</h3>
<p>This problem was partially solved in the last step because I already had a list of all output paths sorted by edit date. All that is left now is to generate some static html from that. I thus make some changes:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1"></a><span class="fu">output()</span> <span class="kw">{</span></span>
<span id="cb3-2"><a href="#cb3-2"></a> <span class="bu">echo</span> <span class="st">"</span><span class="va">$1</span><span class="st">"</span> <span class="op">>></span> index.html</span>
<span id="cb3-3"><a href="#cb3-3"></a><span class="kw">}</span></span>
<span id="cb3-4"><a href="#cb3-4"></a></span>
<span id="cb3-5"><a href="#cb3-5"></a><span class="fu">create_entry()</span> <span class="kw">{</span></span>
<span id="cb3-6"><a href="#cb3-6"></a> <span class="co"># the code from step 1</span></span>
<span id="cb3-7"><a href="#cb3-7"></a> <span class="va">path=</span><span class="st">"</span><span class="va">$9</span><span class="st">"</span></span>
<span id="cb3-8"><a href="#cb3-8"></a> <span class="va">outpath=</span><span class="st">"content/</span><span class="va">$(</span><span class="fu">basename</span> <span class="st">"</span><span class="va">$path</span><span class="st">"</span> .md<span class="va">)</span><span class="st">.html"</span></span>
<span id="cb3-9"><a href="#cb3-9"></a> <span class="ex">pandoc</span> <span class="st">"</span><span class="va">$path</span><span class="st">"</span> -t html <span class="op">></span> <span class="st">"</span><span class="va">$outpath</span><span class="st">"</span></span>
<span id="cb3-10"><a href="#cb3-10"></a> <span class="co"># and some html output</span></span>
<span id="cb3-11"><a href="#cb3-11"></a> <span class="ex">output</span> <span class="st">"<a href=</span><span class="dt">\"</span><span class="va">$outpath</span><span class="dt">\"</span><span class="st">></span><span class="va">$outpath</span><span class="st"></a>"</span></span>
<span id="cb3-12"><a href="#cb3-12"></a><span class="kw">}</span></span>
<span id="cb3-13"><a href="#cb3-13"></a></span>
<span id="cb3-14"><a href="#cb3-14"></a><span class="fu">rm</span> -f index.html <span class="co"># -f so it doesn’t fail if index.html doesn’t exist yet</span></span>
<span id="cb3-15"><a href="#cb3-15"></a><span class="fu">ls</span> -ltu src/*.md <span class="kw">|</span> <span class="fu">tail</span> -n+1 <span class="kw">|</span> <span class="kw">while</span> <span class="bu">read</span> <span class="va">f</span>; <span class="kw">do</span> <span class="ex">create_entry</span> <span class="va">$f</span><span class="kw">;</span> <span class="kw">done</span></span></code></pre></div>
<p>That will give us a list of links to the blog entries with the filenames as titles, but we can do better than that. First, by extracting titles from the files. This is based on the assumption that I begin every blog post with an h1 heading, or a single <code># Heading</code> in markdown.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1"></a><span class="va">title=</span><span class="st">"</span><span class="va">$(</span><span class="ex">rg</span> <span class="st">'h1'</span> <span class="st">"</span><span class="va">$outpath</span><span class="st">"</span> <span class="kw">|</span> <span class="fu">head</span> -n1 <span class="kw">|</span> <span class="ex">rg</span> -o <span class="st">'(?<=>).*(?=<)'</span> --pcre2<span class="va">)</span><span class="st">"</span></span></code></pre></div>
<p>Match the first line that contains an h1 and return whatever is inside <code>></code> and <code><</code> – the title.</p>
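<p>The same extraction should also be possible without ripgrep. The following is only a sketch with plain <code>sed</code>; <code>extract_title</code> is a made-up name, and it assumes that pandoc writes the opening and closing h1 tags on the same line.</p>

```shell
#!/bin/sh
# Sketch only: extract_title is a hypothetical helper name.

# Print the text of the first h1 element in the HTML file $1.
# Assumes the opening and closing h1 tags are on the same line.
extract_title() {
    sed -n 's/.*<h1[^>]*>\(.*\)<\/h1>.*/\1/p' "$1" | head -n1
}
```

<p>It would be called as <code>title="$(extract_title "$outpath")"</code>.</p>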
<p>By then making the <code>src</code> directory part of a git repository (which I wanted to do anyway because it’s a good way to track changes), we can get the creation time of each file.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1"></a><span class="va">created=$(</span><span class="fu">git</span> log --follow --format=%as <span class="st">"</span><span class="va">$path</span><span class="st">"</span> <span class="kw">|</span> <span class="fu">tail</span> -1<span class="va">)</span></span></code></pre></div>
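<p><code>--format=%as</code> prints the author date of each commit as YYYY-MM-DD. Since <code>git log</code> lists the newest commit first, <code>tail -1</code> picks the oldest one, which is the commit that created the file. <code>man git-log</code> is your friend here.</p>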
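<p>Wrapped in a function, the lookup could look like this sketch. <code>created_date</code> is a made-up name, and it assumes the script runs inside the repository that tracks the file.</p>

```shell
#!/bin/sh
# Sketch only: created_date is a hypothetical helper name.

# Print the author date (YYYY-MM-DD) of the oldest commit that touched file $1.
# Must be run inside the git repository that tracks the file.
created_date() {
    git log --follow --format=%as -- "$1" | tail -n 1
}
```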
<p>We can combine this with some more static HTML to turn our index into a table with all the titles, dates, and links:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb6-1"><a href="#cb6-1"></a><span class="fu">html_entry()</span> <span class="kw">{</span></span>
<span id="cb6-2"><a href="#cb6-2"></a> <span class="ex">output</span> <span class="st">'<tr>'</span></span>
<span id="cb6-3"><a href="#cb6-3"></a> <span class="va">path=</span><span class="st">"</span><span class="va">$1</span><span class="st">"</span></span>
<span id="cb6-4"><a href="#cb6-4"></a> <span class="va">time=</span><span class="st">"</span><span class="va">$2</span><span class="st">"</span></span>
<span id="cb6-5"><a href="#cb6-5"></a> <span class="va">title=</span><span class="st">"</span><span class="va">$3</span><span class="st">"</span></span>
<span id="cb6-6"><a href="#cb6-6"></a> <span class="ex">output</span> <span class="st">"<td class=</span><span class="dt">\"</span><span class="st">first</span><span class="dt">\"</span><span class="st">><a href=</span><span class="dt">\"</span><span class="va">$path</span><span class="dt">\"</span><span class="st">></span><span class="va">$title</span><span class="st"></a></td>"</span></span>
<span id="cb6-7"><a href="#cb6-7"></a> <span class="ex">output</span> <span class="st">"<td class=</span><span class="dt">\"</span><span class="st">second</span><span class="dt">\"</span><span class="st">></span><span class="va">$time</span><span class="st"></td></tr>"</span></span>
<span id="cb6-8"><a href="#cb6-8"></a><span class="kw">}</span></span>
<span id="cb6-9"><a href="#cb6-9"></a></span>
<span id="cb6-10"><a href="#cb6-10"></a><span class="fu">create_entry()</span> <span class="kw">{</span></span>
<span id="cb6-11"><a href="#cb6-11"></a> <span class="co"># mentally insert previous code here</span></span>
<span id="cb6-12"><a href="#cb6-12"></a> <span class="co"># ...</span></span>
<span id="cb6-13"><a href="#cb6-13"></a> <span class="ex">html_entry</span> <span class="st">"</span><span class="va">$outpath</span><span class="st">"</span> <span class="st">"created on </span><span class="va">$created</span><span class="st">"</span> <span class="st">"</span><span class="va">$title</span><span class="st">"</span></span>
<span id="cb6-14"><a href="#cb6-14"></a>}</span>
<span id="cb6-15"><a href="#cb6-15"></a></span>
<span id="cb6-16"><a href="#cb6-16"></a><span class="fu">rm</span> -f index.html</span>
<span id="cb6-17"><a href="#cb6-17"></a><span class="ex">output</span> <span class="st">'<h1>Blog index</h1>'</span></span>
<span id="cb6-18"><a href="#cb6-18"></a><span class="ex">output</span> <span class="st">'<table>'</span></span>
<span id="cb6-19"><a href="#cb6-19"></a><span class="fu">ls</span> -ltu src/*.md <span class="kw">|</span> <span class="fu">tail</span> -n+1 <span class="kw">|</span> <span class="kw">while</span> <span class="bu">read</span> <span class="va">f</span>; <span class="kw">do</span> <span class="ex">create_entry</span> <span class="va">$f</span><span class="kw">;</span> <span class="kw">done</span></span>
<span id="cb6-20"><a href="#cb6-20"></a><span class="ex">output</span> <span class="st">'</table>'</span></span></code></pre></div>
<p>It looks quite plain, but we have a fully functional index for our blog. Onto step 3.</p>
<h3 id="styling">Styling</h3>
<p>For this, we can use a lesser known nginx feature that allows us to prepend something to the body of each page and append something after. I changed the config and created a simple header as a static html file that would include the necessary resources.</p>
<pre class="plaintext"><code>location / {
add_before_body /before_body.html;
add_after_body /after_body.html;
index index.html;
}</code></pre>
<p>That’s it. Next step.</p>
<h3 id="automatic-updates">Automatic updates</h3>
<p>At first, I had the entire script run every few minutes via <code>cron</code>, but markup conversion isn’t that cheap, so I only wanted to regenerate the files if something actually changed.</p>
<p>Since we’re already using git for the sources, we have everything we need. I can simply check if there are changes upstream.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode sh"><code class="sourceCode bash"><span id="cb8-1"><a href="#cb8-1"></a><span class="fu">has_updates()</span> <span class="kw">{</span></span>
<span id="cb8-2"><a href="#cb8-2"></a> <span class="fu">git</span> fetch <span class="op">></span> /dev/null <span class="op">2>&1</span></span>
<span id="cb8-3"><a href="#cb8-3"></a> <span class="va">diff=</span><span class="st">"</span><span class="va">$(</span><span class="fu">git</span> diff master origin/master<span class="va">)</span><span class="st">"</span></span>
<span id="cb8-4"><a href="#cb8-4"></a> <span class="kw">if</span><span class="bu"> [</span> <span class="st">"</span><span class="va">$diff</span><span class="st">"</span><span class="bu"> ]</span>; <span class="kw">then</span></span>
<span id="cb8-5"><a href="#cb8-5"></a> <span class="bu">return</span> 0</span>
<span id="cb8-6"><a href="#cb8-6"></a> <span class="kw">else</span></span>
<span id="cb8-7"><a href="#cb8-7"></a> <span class="bu">return</span> 1</span>
<span id="cb8-8"><a href="#cb8-8"></a> <span class="kw">fi</span></span>
<span id="cb8-9"><a href="#cb8-9"></a><span class="kw">}</span></span>
<span id="cb8-10"><a href="#cb8-10"></a></span>
<span id="cb8-11"><a href="#cb8-11"></a><span class="kw">if</span> <span class="ex">has_updates</span><span class="kw">;</span> <span class="kw">then</span></span>
<span id="cb8-12"><a href="#cb8-12"></a> <span class="co"># this merges origin/master into local master</span></span>
<span id="cb8-13"><a href="#cb8-13"></a> <span class="fu">git</span> pull</span>
<span id="cb8-14"><a href="#cb8-14"></a> <span class="co"># run the previous code</span></span>
<span id="cb8-15"><a href="#cb8-15"></a> <span class="ex">...</span></span>
<span id="cb8-16"><a href="#cb8-16"></a><span class="kw">fi</span></span></code></pre></div>
<p>I’m not super familiar with shell scripting, so if there’s a better way to do that boolean return in POSIX sh, feel free to <a href="https://kageru.moe/contact/">tell me</a>.</p>
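<p>On the boolean return: a shell function returns the exit status of its last command, so the explicit <code>if</code>/<code>else</code> can probably be dropped. A minimal sketch, where <code>non_empty</code> is a made-up helper name:</p>

```shell
#!/bin/sh
# Sketch only: non_empty is a hypothetical helper name.

# Succeeds (exit status 0) exactly when the argument is a non-empty string.
# The status of the test command becomes the function's return value.
non_empty() {
    [ -n "$1" ]
}

# Same behaviour as the if/else version: status 0 when local master and
# origin/master differ, non-zero otherwise.
has_updates() {
    git fetch > /dev/null 2>&1
    non_empty "$(git diff master origin/master)"
}
```

<p><code>git diff --quiet</code> also reports differences through its exit status, so <code>! git diff --quiet master origin/master</code> would be another option.</p>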
<p>And now, the dreaded last step.</p>
<h3 id="legacy-garbage">Legacy garbage</h3>
<p>That last part was actually quite simple. I added a <code>legacy/index.html</code> with a hand-written list of all previous blog entries, and then made it appear last on the generated index with <code>html_entry "legacy" "before 2020" "Older posts"</code>. Since I use nginx to add the header and footer to every page, the legacy index and legacy pages work almost out of the box. After some slight adjustments to the old content pages, everything looks as intended.</p>
<h2 id="summary">Summary</h2>
<p>I now have a working static page generator for my blog in under 50 lines of shell code. It does what I need and only that. The code is (relatively) simple and fully POSIX sh compliant. It’s not built to be super general or reusable, but that wasn’t the goal here.</p>
<p>I am aware that I built this with relatively little regard to dependencies. Pandoc is huge, and the ripgrep call could be replaced with standard grep. I know that, but for now, I don’t care.</p>
<p>If you want to take a look at the final result, the code is <a href="https://git.kageru.moe/kageru/mdb">on my gitea</a>.</p>
<p>I guess the only question now is: will this new blog give me the motivation to write more? Only time will tell.<br />
I do have a few more ideas, and none of them are encoding-related. Sorry.</p>
<p><strong>Edit:</strong> It was brought to my attention that this is very similar to <a href="https://github.com/LukeSmithxyz/lb">Luke Smith’s lb</a>. I think the comparison is fair, but we seem to have different priorities. He writes HTML; I write markdown. He uses rsync; I want everything in git and also use that to sync. He didn’t want dependencies; I… use pandoc. :^)</p>
<p>Still very interesting to see his approach to this, so thanks for pointing that out.</p>
<p>Now I’m considering adding RSS at some point. We’ll see.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>within reason, otherwise we wouldn’t write any code at all or do something ridiculous like depend on an external library to <a href="https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/">left-pad a string</a><a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>