That the Hero is slower than the second-generation devices is no big surprise. However, the PowerVR chip in the Droid is slightly faster than the Adreno chip in the Nexus One, so the preceding results are a little strange at first sight. On closer inspection, we can probably attribute the difference not to GPU power but to the fact that we call many OpenGL ES methods each frame. Each of these is a Java Native Interface (JNI) call: it crosses into native C code, which costs more than calling a plain Java method on Dalvik. The Nexus One has a JIT compiler and can optimize a little on the Java side. So let's just assume the difference stems from the JIT compiler (which is probably not entirely correct).
Now let's examine what's bad for OpenGL ES:
■ Changing states a lot per frame (e.g., blending, enabling/disabling texture mapping).
■ Changing matrices a lot per frame.
■ Binding textures a lot per frame.
■ Changing the vertex, color, and texture coordinate pointers a lot per frame.
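To see how quickly these costs add up, here's a plain-Java sketch contrasting a naive per-sprite loop with one that sets state once per frame. There's no real OpenGL ES here; the `Gl` class below is a stand-in we define purely for illustration, and it just counts state changes:

```java
// Stand-in for OpenGL ES state-setting calls; counts how many state changes occur.
class Gl {
    int stateChanges;
    void enableBlending()    { stateChanges++; }
    void bindTexture(int id) { stateChanges++; }
    void setPointers()       { stateChanges++; }
}

class StateChangeDemo {
    // Naive: full state setup before every single sprite.
    static int naive(Gl gl, int numSprites) {
        for (int i = 0; i < numSprites; i++) {
            gl.enableBlending();
            gl.bindTexture(0);
            gl.setPointers();
            // glDrawElements() for one sprite would go here
        }
        return gl.stateChanges;
    }

    // Batched: state is set once, then all sprites are drawn with it.
    static int batched(Gl gl, int numSprites) {
        gl.enableBlending();
        gl.bindTexture(0);
        gl.setPointers();
        // one glDrawElements() covering all sprites would go here
        return gl.stateChanges;
    }
}
```

With 100 sprites, the naive loop issues 300 state changes per frame while the batched version issues 3, regardless of sprite count.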
It all really boils down to changing state. Why is this costly? GPUs work like an assembly line in a factory. While the front of the line processes new incoming pieces, the end of the line finishes off pieces already processed by previous stages of the line. Let's try it with a little car factory analogy.
The production line has a few states, such as the tools that are available to factory workers, the type of bolts that are used to assemble parts of the cars, the color the cars get painted with, and so on. Yes, real car factories have multiple assembly lines, but let's just pretend there's only one. Now, each stage of the line will be busy as long as we don't change any of the states. As soon as we change a single state, however, the line will stall until all the cars currently being assembled are finished off. Only then can we actually change the state and assemble cars with the new paint/bolts/whatever.
The key insight is that a call to glDrawElements() or glDrawArrays() is not immediately executed. Instead the command is put into a buffer that is processed asynchronously by the GPU. This means that the calls to the drawing methods will not block. It's therefore a bad idea to measure the time a call to glDrawElements() takes, as the actual work might be performed in the future. That's why we measure FPS instead. When the framebuffer is swapped (yes, we use double-buffering with OpenGL ES as well), OpenGL ES makes sure that all pending operations are executed.
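A minimal sketch of such an FPS counter in plain Java. On Android you would feed it System.nanoTime() once per frame from your render method; the timestamps are injected as a parameter here just to keep the logic self-contained and testable:

```java
// Minimal FPS counter sketch. Call frame() once per rendered frame with a
// monotonic timestamp in nanoseconds (e.g., System.nanoTime() on Android).
class FpsCounter {
    private long windowStart = -1;  // start of the current one-second window
    private int frames;             // frames counted in the current window
    private int lastFps;            // frames counted in the last full window

    // Returns the FPS measured over the most recently completed second.
    public int frame(long nowNanos) {
        if (windowStart < 0) windowStart = nowNanos;
        frames++;
        if (nowNanos - windowStart >= 1_000_000_000L) {
            lastFps = frames;      // a full second has passed: publish the count
            frames = 0;
            windowStart = nowNanos;
        }
        return lastFps;
    }
}
```

Because the count is only published once per second, a single slow glDrawElements() call doesn't skew the measurement; you see the throughput of the whole pipeline.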
So translating the car factory analogy to OpenGL ES means the following. While new triangles enter the command buffer via a call to glDrawElements() or glDrawArrays(), the GPU pipeline might finish off the rendering of currently processed triangles from earlier calls to the render methods (e.g., a triangle can be currently processed in the rasterization stage of the pipeline). This has the following implications:
■ Changing the currently bound texture is expensive. Any triangles in the command buffer that have not been processed yet and that use the texture must be rendered first. The pipeline will stall.
■ Changing the vertex, color, and texture coordinate pointers is expensive. Any triangles in the command buffer that haven't been rendered yet and use the old pointers must be rendered first. The pipeline will stall.
■ Changing blending state is expensive. Any triangles in the command buffer that need/don't need blending and haven't been rendered yet must be rendered first. The pipeline will stall.
■ Changing the model-view or projection matrix is expensive. Any triangles in the command buffer that haven't been processed yet and to which the old matrices should be applied must be rendered first. The pipeline will stall.
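One common way to avoid the first kind of stall is to never issue a redundant state change in the first place. Here's a minimal plain-Java sketch of a texture-bind cache; there's no real GL involved, and `bindFn` merely stands in for the actual glBindTexture call:

```java
import java.util.function.IntConsumer;

// Sketch of a tiny state cache: skip the bind when that texture is already
// bound, so the pipeline never stalls for a no-op state change.
class TextureBinder {
    private int boundTexture = -1;     // nothing bound yet
    private final IntConsumer bindFn;  // stand-in for the real glBindTexture call
    int actualBinds;                   // how many binds actually reached the GL

    TextureBinder(IntConsumer bindFn) { this.bindFn = bindFn; }

    void bind(int textureId) {
        if (textureId == boundTexture) return;  // redundant bind: skip it
        bindFn.accept(textureId);
        boundTexture = textureId;
        actualBinds++;
    }
}
```

The same caching idea applies to blending state, pointers, and matrices: track what the GL already has, and only call into it when the value actually changes.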
The quintessence of all this: reduce your state changes, all of them.
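A complementary trick is to sort your draw calls by texture, so that all sprites sharing a texture are drawn back to back; the number of binds then drops from one per sprite to one per distinct texture. A plain-Java sketch, where `Sprite` is a hypothetical class holding a texture id:

```java
import java.util.Arrays;
import java.util.Comparator;

class BatchSorter {
    // Hypothetical sprite: just a texture id for this illustration.
    static class Sprite {
        final int textureId;
        Sprite(int textureId) { this.textureId = textureId; }
    }

    // How many texture binds are needed to draw the sprites in the given order.
    static int countBinds(Sprite[] sprites) {
        int binds = 0, bound = -1;
        for (Sprite s : sprites) {
            if (s.textureId != bound) { binds++; bound = s.textureId; }
            // draw the sprite here
        }
        return binds;
    }

    // Group sprites by texture so equal textures are adjacent.
    static void sortByTexture(Sprite[] sprites) {
        Arrays.sort(sprites, Comparator.comparingInt(s -> s.textureId));
    }
}
```

Drawing four sprites that alternate between two textures costs four binds in draw order, but only two after sorting.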