
A Polyglot Hello World Has Appeared!

You ever have an interesting problem drop in your lap that you just can't let go of?

I was goofing off on the Internet last night and was completely nerd sniped by the phrase "polyglot hello world." In a nutshell, a polyglot hello world is a "Hello World" program that is syntactically correct in as many programming languages as possible.

For example, something that can be executed as both C and Python.

This concept wormed itself into the deep recesses of my brain and, quite literally, broke me for the next 6 hours. After being exposed to the concept, I deliberately closed the article I was reading because I wanted to see what I could come up with on my own, and after a ton of trial and error, this is what I came up with:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

Looks like a fucking mess, right?

Well, what you're reading is a block of code that can be successfully executed (or compiled) in C, C++, and Objective-C (which isn't very impressive), but also Ruby, Python, Julia, CoffeeScript, and Bash.
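If you want to verify that claim yourself, here's a quick sanity check for two of the interpreters (this assumes bash and python3 are on your PATH, and hello.poly is just a filename I picked):

```shell
# Save the polyglot to a file and run it under two of the interpreters.
cat > hello.poly <<'EOF'
#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");
EOF

bash hello.poly      # prints: Hello World!
python3 hello.poly   # prints: Hello World!
```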

Impressed yet?

C/C++/Objective-C

To craft this beautiful monstrosity, I had to employ a number of different tricks. The most important is the fact that a few of these languages use the # character in a syntactically meaningful way (C, C++, and Objective-C), while the others treat it as a comment character.

That allows us to take advantage of C's preprocessor support and craft a macro that looks and operates in exactly the same way as Python's. To make things more readable, here is what this source code looks like when syntax highlighted as a C program:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

As you can see, it boils down to just a few lines of actual code and some old-school compiler tricks:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}

What the above block of code does is simply define a macro called print() so that, when the compiler sees print("Hello World!"), it gets rewritten as:

int main() {
	puts("Hello World!");
	return 0;
}

Additionally, C has an unofficial "multi-line comment" system using preprocessor directives: wrap a block of code in an #if 0 ... #endif conditional and the preprocessor strips it out before compilation, which you can see here:

#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif

So, once the preprocessor has run, what the C-based compilers ultimately see is code that looks like this:

#include <stdio.h>

int main() {
	puts("Hello World!");
	return 0;
}

Pretty cool, right?

Bash

Alright, let's move on to the Bash script. With our syntax highlighter flipped to "Shell," you can see that the first five lines are just comments:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

The actual first line of code the shell script runs is the only one that matters, namely:

"""echo" "Hello World!";"exit";puts "Hello World!";

So what's happening here?

Well, Bash scripts are effectively just a collection of commands you'd type into the terminal, and a semicolon is equivalent to hitting the return key. So let's remove some of the extraneous quotes and get a better idea of what's actually being executed:

echo "Hello World!"
exit
puts "Hello World!";

Easier to read?

I thought so too.

But what's with that puts at the bottom? That never gets executed, because the exit command stops the script immediately after echoing "Hello World!"

So, in reality, our script ends up being just the following code:

echo "Hello World!"
exit
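The reason that weird-looking """echo" works at all is that the shell glues adjacent quoted strings into a single word, so "" plus "echo" is just the word echo. You can verify that with a one-liner:

```shell
# "" and "echo" concatenate into the single word: echo
"""echo" "Hello World!"   # prints: Hello World!
```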

Ruby

Still curious about that puts, though? Well, that's one of the only pieces of code that Ruby cares about. If we switch our syntax highlighter up again, you can see what the Ruby interpreter sees:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

Again, there are a few quirks we're taking advantage of here. First, the # lines are comments in Ruby, so those get ignored. Second, the __END__ marker tells the Ruby interpreter to stop parsing entirely: everything after it is treated as data rather than code (some syntax highlighters handle this well, while others don't).

So, with that in mind, the Ruby script looks more like this:

"""echo" "Hello World!";"exit";puts "Hello World!";

Just like in the other languages, semicolons act as statement separators, so a different way to look at it is this:

"""echo" "Hello World!"
"exit"
puts "Hello World!"

Clear as mud, right?

Keep in mind that you can drop bare string literals into any Ruby script like this, and if they aren't assigned to a variable, they're simply evaluated and thrown away. So what's actually happening is this:

puts "Hello World!"

Et voila! Now we've got Ruby!

Python

Next on our list is Python:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

Just like the others, Python uses the # character to denote single-line comments. What makes Python stand out is its triple-quote sequence ("""), which opens a multi-line string that's commonly used as a block comment. When combined with Bash and Ruby's tolerance for extra string literals, we can take advantage of it by throwing in a few back-to-back "empty" or "unnecessary" strings:

"""echo" "Hello World!";"exit";puts "Hello World!";

Python interprets everything from that opening """ down to the closing """ (on the """#""" line) as one big string that gets evaluated and discarded. Between that hacky reality and our single-line comments, what's really getting seen by the Python interpreter is this:

print("Hello World!");
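You can watch this in action with a stripped-down version: everything between the opening """ and the closing """ is one big string expression that gets evaluated and thrown away, so only the final print() runs:

```shell
# Python: the first three lines form one discarded string expression;
# only the last print() actually executes.
python3 - <<'EOF'
"""echo" "Hello World!";"exit";puts "Hello World!";
print("not executed")
"""#"""
print("Hello World!")
EOF
```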

Julia

So once I had knocked Python, Ruby, Bash, and the C-based languages off my list, I started looking for ways to stretch a bit further.

I've never written a line of Julia in my life. To be honest, I couldn't even tell you what Julia is normally used for, but it landed on my radar as a candidate because it met two criteria I'd settled on:

  1. The language must support #-based single-line comments.
  2. The language must have a unique way to denote multi-line comments without interfering in the other languages.

Julia was the first language I found that matched both of those criteria perfectly:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

For Julia, the multi-line comments look like this:

#=
THIS IS A MULTILINE COMMENT
=#

That equals-sign at the close of the multi-line comment almost got in the way, but I discovered that putting a # in front of it let the other interpreters ignore the line while Julia still recognizes the closing =#.

Gross.

Also, like many of the other languages, Julia has a print() function we can use without interfering with the other interpreters. But because I was shooting for consistent output ("Hello World!" followed by a newline), I couldn't just let it fall through to the Python/C print() call at the bottom.

It needed to print a newline!

Which means I both had to and got to use Julia's multi-line comment markers to carve out a spot where I could print our "Hello World!" and exit() early. This required the Julia code to come after all the other code, but before Python's block comment closed, ultimately leaving us with the following Julia code:

print("Hello World!\n");
exit();

CoffeeScript

And finally, we've hit CoffeeScript:

#include <stdio.h>
#define print(a) int main(){puts(a);return 0;}
#if 0
###
#=
"""echo" "Hello World!";"exit";puts "Hello World!";
__END__
print("Hello World!");
###
console.log 'Hello World!'
process.exit()
# =#
print("Hello World!\n"); exit();
"""#"""
#endif
print("Hello World!");

Another language I've never actually written, CoffeeScript is very JavaScript-esque, except that it supports (you guessed it) #-based comments instead of C-style // and /* */ comments. It also met my earlier criteria, and CoffeeScript's multi-line comment syntax is a pretty simple (and easily ignorable by other languages) ### marker to both start and end.

The other advantage CoffeeScript has (one shared amongst most of these languages) is that a call to a function that doesn't exist only blows up when it's actually executed, which means invalid functions can show up after process.exit() is called, and the CoffeeScript interpreter couldn't care less.

This let us shoehorn our CoffeeScript "Hello World" in just before Julia's closing # =#, like so:

console.log 'Hello World!'
process.exit()

Fin.

Not gonna lie, this monstrosity of code is both the most beautiful thing I've ever written, and the most disturbing. Seriously, once I got the challenge into my head I couldn't get it out. I dreamt about it last night—when I was actually able to sleep.

I'd love to find a way to expand on this with other languages, but I'm not sure I have the emotional fortitude for it at this point.

I'm going to go take a nap.

--

This is post 013 of #100DaysToOffload