Being creative is hard work, let's go optimizing instead! My graphics engine dye was pretty naive about displaying text, and it was wasteful. Let's see how I made it all better with this one weird tip.

Disclaimer: Even after a few years I'm still very much an OpenGL newbie. Please don't hit me with crowbars.

Once upon a time, OpenGL was easy to use - and also falling out of relevancy as far as high-performance 3D graphics were concerned. But it wasn't all bad! You could basically pick up any library out there and integrate it with your existing GL project. Not that it's a good idea, but it usually just worked.

For example, if you wanted to display TrueType fonts in your OpenGL application, you could just go ahead and pick up ftgl (with ooc bindings), and you'd be a happy camper.

But then later you might want to support OpenGL ES, or just OpenGL 3, and then FTGL wouldn't know what to do with a core context, where glPushMatrix and glVertex2f is a thing of the past. Of course, you could look for an OpenGL3 port of FTGL, but where's the fun in that?

Displaying TrueType text isn't that hard, at least for the simple case (ASCII, Americano-centric text), Let's go over a few approaches.

Pre-processing into Bitmap fonts

The nice thing about TrueType fonts is that they're basically vector graphics with a shitton of metrics on top of them. So, they look good at pretty much any size.

But in your game, chances are you don't have arbitrary large text. And if you do, it's probably upscaled from a lower-res texture. So if you don't feel like loading TrueType fonts into your game directly and having to bother with all these, you can turn them into bitmap fonts instead, maybe using something like BMFont.

That is a totally valid approach, which I chose to disregard, because, again, where's the fun in that? And I tend to dislike approaches that require a lot of preprocessing / external tools without a lot of obvious benefits. Choosing a good font is hard, I'd rather have them easy to swap in and out.

Using FreeType

FreeType is life, FreeType is love - it's basically the open-source package to read, interpret, and render .ttf (TrueType) font files. It usually goes a little something like that:

And then you have a black and white of each glyph somewhere in memory. This is easy to convert into an RGBA32 format OpenGL likes. Good! But what now? How can you display them? Well, you have a few solutions.

One texture per glyph

That was my first approach. If you handle each glyph as a separate sprite, you can just lay them out properly and you'll have text rendering. Don't go home quite yet though - that approach is perhaps the least efficient.

Let's say that, like me, you only care about characters between ASCII 32 and ASCII 127 (all the printable characters) - that makes 95 glyphs in memory, and each of those glyphs has:

And you have to use one OpenGL draw call per glyph. It might not seem like a big deal but let's put that into perspective - with a simple scene, 7 tile layers, 1 main character, and one line of text (about 20 letters long), 2/3 of our draw calls would be dedicated to text rendering. Not so hot.

The advantage of this approach, of course, is that even if you modify the text 60 times per second, the cost doesn't grow - all the textures, VBOs and stuff are already there, it's just a matter of drawing them in the right places.

One texture per text object

The way text works in dye, is that you have GlText objects that have a value string. One can assume at least a bit of coherency, e.g. they won't be updated every frame (and even if the value 'pointer' is updated every frame, e.g. in naive UI code, chances are the actual string contents are the same, and that's an easy equality test), so what we can do instead is forget about an individual texture per glyph, and have one texture per text object instead.

Whenever text changes (or at initialization), you need to do the following:

Now, that's bad on several levels - first off, chances are you'll be creating/destroying textures often. You could be smart (and complicate your code) by allocating textures bigger than needed and actually updating only a subset (with glTexSubImage2D) of the texture, but that doesn't get you off the hook - you're still blitting (copying bitmap data) by hand, on the CPU, using either loops or memcpy or whatever, and that's kind of silly when you have a GPU sitting around doing practically nothing.

So, for text that never ever changes, that might be the fastest approach - you only have one quad (well, two triangles, or one if you're smart about texture coordinates). For everything else, e.g. the UI in an action game that changes pretty much constantly, it's not that hot.

One texture per font-size tuple, one VBO per text object

That's the approach used in freetype-gl - from which I heavily drew inspiration for my own implementation.

The idea is to manually blit all the glyphs you need into a single texture that's tightly packed:

The "one texture per glyph" approach was wasteful in more ways than one - since we only want power-of-two texture sizes, often the texture was larger than needed - it's not a big deal if the glyph is 12x14 and the texture is 16x16, but a 257x257 glyph would then require a 512x512 texture.. and that gets wasteful quick.

Another way that approach was wasteful was that having that many different textures pretty much prevented us from having a single draw call for a text object, even if we wanted to - we'd have to sample from many different texture objects and I don't know a simple way to do that in OpenGL.

But by having all the glyphs for a font-size tuple in a single texture, we can have a single Vertex Buffer that contains a whole text object. In Dye, VBOs are usually laid out like this:

Since GL_QUADS is deprecated in recent versions of OpenGL, we need two triangles, aka 6 vertices, which makes 6 * 4 = 24 floats per character. Surely there's a way to reduce that even more, but I'm not going to cover it here. (Basically: GL_TRIANGLE_STRIP allows you to define a quad in only 4 vertices, but then you need degenerate polygons to move on to another quad elsewhere - so you go from 6N to 6N - 2, which is not a huge save).

But wait, we didn't even cover how to pack a font tightly in a single texture. When faced with a hard problem that's been solved before, pretend to read the paper, and then follow suit and port the C++ code.

You know what, let's do things proper. With pen and paper even. Let's dive into that algorithm.

The RectangleBinPack algorithm

RectangleBinPack is an algorithm and a data structure. It's basically a binary tree. You have to choose a 'bin size'. Let's try it with a 6x6 bin. At first, it looks like this:

There's only one node in our tree, we'll name it "root" (although nodes don't actually have names in the data structure). Its position is 0, 0 (the origin is on top left), and its size is 6x6. So far, so good.

Let's try to insert a 1x2 rectangle in there:

What happened? Well, the 1x2 rectangle fit in the 6x6 root node, the root node was a leaf (it had no children), so:

The root node is no longer a leaf node, so it no longer represents free space, but occupied space instead. l and r now span the space that remains free.

Let's insert a 3x1 rectangle now. We'll walk our node, depth-first:

We can start to see that the algorithm tries to leave big spaces free by the way it chooses split directions. We can also see that for a low occupancy, most shapes are going to fit in the shape of an inverted L (or a capital gamma if you want).

Let's do one last step to show something interesting. Let's insert a 5x2 rectangle. Same as before, we do a depth-first walk (implemented here by recursive function call):

But wait, where is l3? The tree tells us it's a 0x2 node at 6,1. The area of l3 is null, since its width is 0: we call that a "degenerate" node. We could add a special case in our algorithm to remove degenerate nodes from the graph, or we could leave it in: since no rectangle is ever going to fit in there, it doesn't hurt. Traversal performance isn't impacted either, because it is never going to split.

And there you have it. If you keep doing that you'll eventually reach a point where all your rectangles are snuggling together in a bin, or you'll reach a point where the bin's occupancy is too high to fit another rectangle in there, and then you can start over with a larger bin.

For dye, we start out with a 32x32 bin and then go on to try 64x64, 128x128, etc.

Future work

There's still a lot that could be improved on the work done so far:

Anyway, I hope you enjoyed the article! Let this article prove that I'm just an attention-grabber looking for fame and to make an easy buck by talking about popular subjects everybody wants to read about.

Until next time, take care!