Performance insights into info-beamer pi

Posted Jan 10 2015 by Florian Wesch

Performance has always been a top priority while developing info-beamer for the Raspberry Pi. A lot has changed since version 0.6.3 (released in November 2013). It's time for some benchmarks and for some insights into how the improvements were made.

Benchmark: font output

This is a node that just displays text of different sizes in various locations. It also uses random text (in the form of random numbers), so the internal text caching of info-beamer is tested.

Snapshot of font test benchmark output

Here's the performance (measured in frames per second, fps) observed in the different info-beamer versions, starting with 0.6.3:

Benchmark of font test node

As you can see, font rendering was really slow in 0.6.3. Each character was rendered individually into its own texture and text was then drawn by rendering those textures. This changed in 0.7.0: now there is only one texture per loaded font. This texture contains all the characters required to render text. This technique is called a font atlas and gives a huge performance boost compared to rendering each character individually. A font atlas dynamically created by info-beamer pi looks like this:

font atlas generated by info-beamer pi

As you can see, info-beamer tries to be smart so it can store as many characters as possible in a single texture. If there is no room left, a bigger texture is used and the atlas is regenerated. The atlas is shared among all nodes using the same font, so no precious texture memory is wasted if the same font is used multiple times.
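To make the idea of packing and regrowing more concrete, here is a minimal sketch in Python. It is not info-beamer's actual packing code: the `ShelfAtlas` class, the simple row-based ("shelf") packing strategy and the doubling of the texture size are all assumptions made for illustration.

```python
class ShelfAtlas:
    """Toy glyph atlas that packs glyph rectangles into rows ("shelves").

    Hypothetical illustration only; info-beamer's real packer may differ.
    """

    def __init__(self, size=64):
        self.size = size        # the texture is size x size pixels
        self.glyphs = {}        # char -> (width, height)
        self.positions = {}     # char -> (x, y) inside the texture

    def _pack(self):
        """Try to place all known glyphs; return positions or None if full."""
        positions = {}
        x = y = shelf_height = 0
        # Placing the tallest glyphs first keeps the shelves tightly filled.
        for char, (w, h) in sorted(self.glyphs.items(), key=lambda g: -g[1][1]):
            if x + w > self.size:
                # Current row is full: start a new shelf below it.
                x, y, shelf_height = 0, y + shelf_height, 0
            if w > self.size or y + h > self.size:
                return None
            positions[char] = (x, y)
            x += w
            shelf_height = max(shelf_height, h)
        return positions

    def add(self, char, width, height):
        """Register a glyph, growing and regenerating the atlas if needed."""
        if char in self.glyphs:
            return self.positions[char]
        self.glyphs[char] = (width, height)
        while (packed := self._pack()) is None:
            self.size *= 2      # no room left: use a bigger texture
        self.positions = packed
        return self.positions[char]


atlas = ShelfAtlas(size=16)
for char in "abcd":
    atlas.add(char, 8, 8)       # four 8x8 glyphs exactly fill 16x16
atlas.add("e", 8, 8)            # the fifth one forces a 32x32 atlas
```

Note that after the atlas grows, every glyph may end up at a new position, which is why the whole atlas is regenerated rather than patched.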

The rendering is done by uploading a bunch of squares to the GPU that can then be rendered in one batch. The uploaded information is cached so it doesn't have to be uploaded again if the text rendered doesn't change. So rendering a text that doesn't change is very fast.

Since uploaded text also uses memory on the GPU, info-beamer tries to optimize what's cached there: Text that has not been rendered recently is removed. This ensures a good balance between performance and memory usage.
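The caching and eviction described above can be sketched roughly like this. Everything here is an assumption for illustration: the `TextCache` class, the `max_age` threshold measured in frames and the fixed-width quads are hypothetical stand-ins, not info-beamer internals.

```python
from collections import OrderedDict


class TextCache:
    """Toy cache for per-text quad data, evicting text not rendered recently."""

    def __init__(self, max_age=60):
        self.max_age = max_age          # frames a text may stay unused
        self.entries = OrderedDict()    # key -> (quads, last_used_frame)
        self.uploads = 0                # counts simulated GPU uploads

    def get(self, font, size, text, frame):
        """Return the quads for a text, uploading them only on a cache miss."""
        key = (font, size, text)
        if key in self.entries:
            quads, _ = self.entries.pop(key)
        else:
            quads = self._build_quads(text, size)
            self.uploads += 1
        # Re-inserting keeps the dict ordered from least to most recently used.
        self.entries[key] = (quads, frame)
        return quads

    def evict(self, frame):
        """Drop every text that was last rendered more than max_age frames ago."""
        while self.entries:
            key, (_, last_used) = next(iter(self.entries.items()))
            if frame - last_used <= self.max_age:
                break
            del self.entries[key]

    @staticmethod
    def _build_quads(text, size):
        # One (x, y, width, height) square per character, fixed width.
        return [(i * size, 0, size, size) for i in range(len(text))]


cache = TextCache(max_age=2)
cache.get("font.ttf", 10, "hello", frame=0)
cache.get("font.ttf", 10, "hello", frame=1)   # cache hit, no new upload
cache.get("font.ttf", 10, "42", frame=5)      # new text, second upload
cache.evict(frame=5)                           # "hello" is too old and is dropped
```

Static text hits the cache every frame, while the random numbers used in the benchmark cause a steady stream of new uploads and evictions.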

Benchmark: A game scoreboard

This node was used at a small conference where players could compete in a programming game. The node displays the current scoreboard. The node was modified to show more spaceships that are all flying over the screen so it can be used as a benchmark.

Snapshot of scoreboard benchmark output

That's quite a lot of spaceships. Let's look at the performance graph:

Benchmark of scoreboard node

The small change between 0.6.3 and 0.7 can again be attributed to the change in font rendering. The huge jump in 0.8.1 is a result of a change made to the default shader. In OpenGL there are multiple transformations applied to anything that is rendered: a model, a view and a projection matrix transform object coordinates from model space to world space, then to camera space and finally to screen coordinates.

You can think of that as the following steps: The model matrix takes a virtual object and places it into the world. Now this object exists somewhere. To be useful it should be visible. This is achieved by moving it in front of a virtual camera. So the view matrix moves the object in front of a camera and all its coordinates are now relative to the camera. Finally a projection matrix transforms those into a flat representation (kind of like taking a picture of the object).

Each of those transformations is represented by a matrix. Coordinates are then multiplied by each of those matrices. There is a trick to speed up these multiplications: You can compose the matrices once so you get a single model view projection matrix that can then be used to project coordinates directly from model space to screen coordinates. Before 0.8.1 this composition wasn't fully utilized: Only the ModelView and the Projection matrix were precalculated. As a result, for each texel drawn there were two matrix multiplications instead of one. This slowed things down quite a bit. 0.8.1 fixes this mistake by precalculating the complete model view projection matrix once and then doing only a single matrix multiplication per texel. As you can see it makes quite a difference.
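The following small Python sketch shows why the composition works: applying the matrices one after another gives the same result as applying the precomposed matrix once. The concrete matrices are made-up examples (the `projection` here is a simple scale standing in for a real projection), and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


model = translation(2, 0, 0)          # place the object in the world
view = translation(0, 0, -4)          # camera sits 4 units back
projection = scaling(0.5, 0.5, 1)     # stand-in for a real projection

vertex = [1, 2, 3, 1]

# Three multiplications per coordinate: one for each matrix.
stepwise = transform(projection, transform(view, transform(model, vertex)))

# Two multiplications per coordinate: ModelView precomposed, Projection separate.
modelview = matmul(view, model)
two_step = transform(projection, transform(modelview, vertex))

# One multiplication per coordinate: everything composed once up front.
mvp = matmul(projection, modelview)
direct = transform(mvp, vertex)

print(stepwise == two_step == direct)
```

The composition itself costs a few matrix multiplications, but it happens only once per draw call instead of once per coordinate.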

Benchmark: conference information

This is a node that was used at a conference. It shows information about the next talk. It uses a shader to animate the background so it displays a rotating vortex. A font is used to show some information about the next talk.

Snapshot of 30c3-room benchmark output

Benchmark of 30c3-room node

As you can see there has been quite some improvement since 0.6.3. The reason for the jump between 0.6.3 and 0.7.0 is the improved font rendering (see the first benchmark). The change between 0.8.0 and 0.8.1 is caused by the way shaders are set up (see the previous benchmark). Finally, the change between 0.7.2 and 0.8.0 is related to a change in how output is rendered to the screen. Let's see what this means.

The Raspberry Pi can output multiple layers of visual information stacked on top of each other. Programs request a new layer to draw onto. When info-beamer starts it sets up a new layer and creates an OpenGL surface bound to it. That way output generated by OpenGL ends up on the screen.

Now there is a problem if you output content that uses a 4:3 aspect ratio on a 16:9 screen. You have to scale the output at some point so it looks correct. Previously info-beamer always rendered into a texture and then fitted this texture into the full screen output layer while preserving the aspect ratio.

In 0.8.0 this changed. Now info-beamer is smarter than that. Instead of allocating a fullscreen layer it requests a layer that corresponds to the size of the root node fitted into the available screen size and then renders directly into that layer. So there is no longer any need to render into an intermediate texture. This saves some GPU bandwidth since the root node output is rendered directly to the screen and not into a texture. The higher the resolution the root node requests (using gl.setup), the more performance you gain from that change.
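Fitting the root node into the available screen size boils down to a small calculation. The sketch below assumes the layer is centered on the screen; the `fit` function and its return format are hypothetical and not part of the info-beamer API.

```python
def fit(node_w, node_h, screen_w, screen_h):
    """Return (x, y, width, height) of the largest centered rectangle
    on the screen that has the same aspect ratio as the root node."""
    scale = min(screen_w / node_w, screen_h / node_h)
    w, h = round(node_w * scale), round(node_h * scale)
    return (screen_w - w) // 2, (screen_h - h) // 2, w, h


# A 4:3 root node (1024x768) on a 16:9 screen (1920x1080):
print(fit(1024, 768, 1920, 1080))   # (240, 0, 1440, 1080)
```

In this example the layer covers 1440x1080 pixels with a 240 pixel wide unused bar on each side of the screen.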

Summary

Developing info-beamer is fun. It's an interesting challenge to find more ways to improve performance. 0.8.1, which is going to be officially stable in the next few weeks, is certainly the fastest info-beamer ever. Enjoy. As always, let me know if you have any feedback or questions.

Download the Prerelease
