Now With More Plaintext!

It’s been a little while since I needlessly tweaked my blog, just to add a feature nobody asked for, but I got bit by the “why the hell not?” bug and decided to cross-publish all of my posts in plaintext format in addition to HTML.

While the end-result is pretty straightforward (just click on the txt link at the bottom of this post for an example), getting there took a little bit of discovery on my part because I’m not particularly familiar with Jekyll Generators—I say a “little bit” because, if I’m being honest, GitHub Copilot did most of the hard work for me, so all I had to do was expand on some scaffolding.

Generating a Generator

So, as mentioned, the first thing I needed to do was create a Jekyll Generator to turn my Markdown posts into plain text ones (.md => .txt).

For the uninitiated, a generator is a plugin that allows you to generate additional content during the build process, typically from information that has already been inventoried or processed. Generating alternative formats (like my plaintext generator) is a common use case, though modifying existing content is not unheard of either.

To get started, though, we need to create two files: the generator plugin itself, and a template file. Let’s start with the template (which we will place in _layouts/txt.html):

---
layout: null
---
{{ content }}

All of our plaintext files will be built off the above template, which (if you are new to Jekyll templates) is exactly what it looks like: a page that prints the provided content with no additional layouts applied.

Next, we need to create the generator, which will be placed in _plugins/jekyll-txt-generator/jekyll-txt-generator.rb. For the purposes of this tutorial, we’re going to start with as basic a functionality as possible: converting our Markdown post files into plain text files with no additional processing:

module Jekyll
  class TxtGenerator < Generator
    safe true
    priority :low

    def generate(site)
      site.posts.docs.each do |post|
        site.pages << TxtPost.new(site, post)
      end
    end
  end

  class TxtPost < Page
    def initialize(site, post)
      @site = site
      @base = site.source
      @dir  = File.dirname(post.url)
      @name = File.basename(post.url, File.extname(post.url)) + '.txt'

      content = txtify(post)

      self.process(@name)
      self.read_yaml(File.join(@base, '_layouts'), 'txt.html')
      self.content = content
      self.data['layout'] = nil
    end

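    def txtify(post)
      # Start from an empty string; without this, `content` resolves to the
      # page's own (still nil) content accessor and `<<` raises NoMethodError
      content = ""
      content << post.content
      content
    end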
  end
end

At its core, the generator iterates through every post in the _posts directory (the default directory for posts) and passes each one to the TxtPost class, which is a subclass of Page. TxtPost takes that data, modifies it as necessary using the txtify() method (which is to say, it does nothing yet), applies it to the txt.html template created above, and writes it out to a new .txt file alongside the generated .html file.
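The output location falls out of the two File calls in initialize. Here is a minimal sketch, runnable outside of Jekyll, assuming a post URL that ends in .html (the URL and the txt_path_for helper are made up for illustration and are not part of the plugin):

```ruby
# Hypothetical helper mirroring how TxtPost derives @dir and @name from post.url.
def txt_path_for(post_url)
  dir  = File.dirname(post_url)                                   # directory part of the URL
  name = File.basename(post_url, File.extname(post_url)) + '.txt' # swap the extension for .txt
  File.join(dir, name)
end

puts txt_path_for('/2023/12/30/hello-world.html')
# prints /2023/12/30/hello-world.txt
```

In other words, the text file always lands next to the HTML file it mirrors, whatever permalink structure produced that URL.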

Getting MUDdy with It

While the end-results were acceptable, I quickly realized that I could do a lot more to make things readable, namely including a nicely-formatted post Title and Date at the top of each generated text file.

To do this, I wanted to modify the txtify() method to prefix the post.content with the aforementioned post title and date, as well as a nice line to separate it from the rest of the post content:

def txtify(post)
  content = ""
  content << 'Title: ' + post.data['title'] + "\n"
  content << 'Date: ' + post.date.strftime('%B %d, %Y') + "\n"
  content << "\n=-=-=-=\n\n"
  content << post.content

  content
end

For the unaware, I am a huge fan of Multi-User Dungeons (MUDs). The early ancestors of MMORPGs, MUDs are the reason I got into software development in the first place, so the horizontal line I chose to separate the post data from the content is a nod to the very first MUD I coded for, Ainoki.

For whatever reason, the original developer of Ainoki really loved to use alternating equals signs and dashes as separators, and I’ve liked that aesthetic ever since.

Once the generator runs, we now have text files that look something like this:

Title: A Polyglot Hello World Has Appeared!
Date: December 30, 2023

=-=-=-=

You ever have an interesting problem drop in your lap that you just can't let go of?

Pretty simple, but looks nice and gets the job done.
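One detail worth knowing about that date format: %d zero-pads the day of the month. Below is a quick sketch in plain Ruby showing the header for a single-digit day, along with the %-d variant that drops the padding. The build_header helper and the example title are hypothetical; only the format strings come from the code above.

```ruby
# Hypothetical helper that builds the same header txtify() prepends.
def build_header(title, date, day_format = '%d')
  header = +""
  header << 'Title: ' + title + "\n"
  header << 'Date: ' + date.strftime("%B #{day_format}, %Y") + "\n"
  header << "\n=-=-=-=\n\n"
  header
end

date = Time.new(2024, 1, 5)
puts build_header('Example Post', date)        # Date line reads: January 05, 2024
puts build_header('Example Post', date, '%-d') # Date line reads: January 5, 2024
```

Whether the leading zero is worth removing is purely a matter of taste.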

Not-Quite Markdown

Thankfully, I write all of my posts in Markdown, which is already a pretty plaintext-friendly format, so I didn’t have to go too crazy converting HTML to plaintext (which would be more than a little irritating).

But there are a few elements in Markdown that aren’t the most reader-friendly in their native form that I wanted to address:

Images
Links

Markdown links and images both follow a similar format: [I am a link](http://link.awesome) and ![I am an image alt text](http://link.awesome/image.png). Not the worst thing in the world, but also not the easiest thing to read.

So, at a bare minimum, I knew that I wanted to preserve most of the Markdowniness of my posts, with the exception of more readable images and links.

Hello, Footnotes

A while back, I ran a self-hosted Gopherhole (Gopher being one of the early competitors of the World Wide Web that is still available in some capacities today).

While managing a Gopherhole ended up being a bit too much overhead for me (maybe I’ll revisit it someday if I can simplify my delivery… probably using another Jekyll plugin!), one common pattern employed by most of the phlogs (Gopher Blogs… get it?) I read was a footnote system for page links.

This made a lot of sense to copy, since you can’t actually create hyperlinks in plaintext anyway, which got me to here:

def txtify(post)
  content = ""
  content << 'Title: ' + post.data['title'] + "\n"
  content << 'Date: ' + post.date.strftime('%B %d, %Y') + "\n"
  content << "\n=-=-=-=\n\n"
  content << post.content

  footnotes = []

  # Convert Markdown links to footnotes
  content.gsub!(/\[([^\]]+)\]\(([^\)]+)\)/) do |match|
    footnotes << "#{$2}"
    "#{$1}[#{footnotes.size}]"
  end

  # Trim trailing newlines at the end of the file
  content.gsub!(/\n+\z/, '')

  # Append footnotes
  content << "\n\n=-=-=-=\n\n"

  footnotes.each_with_index do |link, index|
    # add config.url to relative links
    if link.start_with?('/')
      link = @site.config['url'] + link
    end

    content << "[#{index+1}]: #{link}\n"
  end

  content
end

To break it down, I did the following:

Identified all Markdown-formatted links using everyone’s favorite tool to use: regular expressions (/\[([^\]]+)\]\(([^\)]+)\)/).
Iterated through each link and saved the link in a footnotes array, and rewrote the Markdown links in a pretty simple format (Link Title[1]).
Wrote out each link at the bottom of the page in a footnotes section, separated with my handy-dandy MUD-style =-=-=-= separator.

I also made sure to convert my internal links to fully-qualified links, so the reader could simply copy-paste the entire link instead of having to type in my domain name.
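To see those steps in isolation, here is a minimal standalone sketch that runs outside of Jekyll. The links_to_footnotes helper, the sample sentence, and the https://example.com base URL are all assumptions for illustration; only the regular expression is taken from the plugin code.

```ruby
# The same link pattern used in txtify().
LINK_PATTERN = /\[([^\]]+)\]\(([^\)]+)\)/

# Hypothetical helper: returns the rewritten text and the footnote lines.
def links_to_footnotes(text, base_url)
  footnotes = []

  body = text.gsub(LINK_PATTERN) do
    label = Regexp.last_match(1)
    url   = Regexp.last_match(2)
    url   = base_url + url if url.start_with?('/') # fully qualify relative links
    footnotes << url
    "#{label}[#{footnotes.size}]"
  end

  notes = footnotes.each_with_index.map { |url, i| "[#{i + 1}]: #{url}" }
  [body, notes]
end

body, notes = links_to_footnotes(
  'Try [HandBrake](https://handbrake.fr/) or my [about page](/about).',
  'https://example.com'
)
puts body  # Try HandBrake[1] or my about page[2].
puts notes # [1]: https://handbrake.fr/ and [2]: https://example.com/about, one per line
```

This version qualifies relative links as they are collected rather than when they are printed, which is just a matter of preference; the output is the same either way.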

In the end this resulted in a nice footer section that looks like the following:

Are you in the same boat? If so, I'd love to hear how you are approaching the problem!

=-=-=-=

[1]: https://handbrake.fr/
[2]: https://www.everand.com/
[3]: https://www.amazon.com/Browse-Kindle-Unlimited-Books/
[4]: https://www.overdrive.com/apps/libby
[5]: https://www.reddit.com/r/opendirectories/
[6]: https://calibre-ebook.com/
[7]: https://www.moondownload.com/

*swoon*

And Finally, Images

The last thing I needed to do was deal with my images.

To be honest, by the time I got here I was starting to get burned out. I mean, honestly, how many different ways can you render a link to an image? Turns out, not many if you care about your mental health.

What I ultimately landed on was an extremely simple Image: prefix before each image link.

I don’t put images inline in my posts (Markdown allows it, but I never do), so knowing that an image will always sit on its own line made it easy enough to cut a few corners here.

All I really had to do was write another regular expression to identify the image format, throw away the alt text (I know, not at all accessible of me, but the vast majority of my images don’t actually add anything to the posts beyond aesthetics), and then add the Image: prefix before the link:

def txtify(post)
  content = ""
  content << 'Title: ' + post.data['title'] + "\n"
  content << 'Date: ' + post.date.strftime('%B %d, %Y') + "\n"
  content << "\n=-=-=-=\n\n"
  content << post.content

  footnotes = []

  # Convert Markdown images to references. This runs before the link pass
  # because the link pattern also matches the [alt](url) part of an image.
  content.gsub!(/\!\[([^\]]*)\]\(([^\)]+)\)/) do |match|
    link = $2

    # add config.url to relative images
    if link.start_with?('/')
      link = @site.config['url'] + link
    end

    "Image: #{link}"
  end

  # Convert Markdown links to footnotes
  content.gsub!(/\[([^\]]+)\]\(([^\)]+)\)/) do |match|
    footnotes << "#{$2}"
    "#{$1}[#{footnotes.size}]"
  end

  # Trim trailing newlines at the end of the file
  content.gsub!(/\n+\z/, '')

  # Append footnotes
  content << "\n\n=-=-=-=\n\n"

  footnotes.each_with_index do |link, index|
    # add config.url to relative links
    if link.start_with?('/')
      link = @site.config['url'] + link
    end

    content << "[#{index+1}]: #{link}\n"
  end

  content
end

Simple, no?

I won’t bother breaking the code down this time, because it’s just a variation on the link parser, and I’m sure you get the idea by now. What it outputs is something like the following:

Image: http://flower.codes/assets/img/posts/windowsill-books.jpg

My wife and I have been talking about moving a lot lately.

We've hit a wall with our current environment, and have been longing for something just a bit simpler for quite some time now.

It’s not perfect, but I don’t care enough to try and figure out what perfect means, so good enough is definitely good enough here.
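One wrinkle with having two patterns: an image is just a link with a ! in front of it, so the link pattern also matches the [alt](url) part of an image whenever the alt text is non-empty. Besides running the image pass first, a negative lookbehind on the link pattern keeps the two apart. The sketch below is standalone Ruby; the images_to_references helper, the STRICT_LINK_PATTERN name, and the https://example.com base URL are assumptions for illustration.

```ruby
# The image pattern from txtify(), plus a link pattern that refuses to match
# when the opening bracket is preceded by "!".
IMAGE_PATTERN       = /!\[([^\]]*)\]\(([^\)]+)\)/
STRICT_LINK_PATTERN = /(?<!!)\[([^\]]+)\]\(([^\)]+)\)/

# Hypothetical standalone version of the image pass.
def images_to_references(text, base_url)
  text.gsub(IMAGE_PATTERN) do
    url = Regexp.last_match(2)
    url = base_url + url if url.start_with?('/') # fully qualify relative images
    "Image: #{url}"
  end
end

puts images_to_references('![Books](/assets/img/books.jpg)', 'https://example.com')
# prints Image: https://example.com/assets/img/books.jpg

puts '![Books](/assets/img/books.jpg)'.match?(STRICT_LINK_PATTERN) # false
puts '[Books](/books)'.match?(STRICT_LINK_PATTERN)                 # true
```

With the lookbehind in place, the order of the two passes no longer matters.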

What’s Next?

This was a fun exercise for me.

I’ve been wanting to cross-publish my posts in different formats for quite some time but, for reasons not entirely clear to me, kept putting it off.

But I’m glad I did it, and it’s got me thinking about my next experiment with this Jekyll-based nightmare I call a blog; namely, properly cross-posting to a Gopherhole or some other format without adding any unnecessary overhead.

Stay tuned, my friends.