Generic Mesh Refinement.pdf

Graphics Hardware (2005)

M. Meissner, B.- O. Schneider (Editors)

Generic Mesh Renement on GPU

Tamy Boubekeur & Christophe Schlick

LaBRI INRIA CNRS University of Bordeaux, France

Abstract

Many recent publications have shown that a large variety of computation involved in computer graphics can be

moved from the CPU to the GPU, by a clever use of vertex or fragment shaders. Nonetheless there is still one

kind of algorithms that is hard to translate from CPU to GPU: mesh renement techniques . The main reason

for this, is that vertex shaders available on current graphics hardware do not allow the generation of additional

vertices on a mesh stored in graphics hardware. In this paper, we propose a general solution to generate mesh

renement on GPU. The main idea is to dene a generic renement pattern that will be used to virtually create

additional inner vertices for a given polygon. These vertices are then translated according to some procedural

displacement map dening the underlying geometry (similarly, the normal vectors may be transformed according

to some procedural normal map). For illustration purpose, we use a tesselated triangular pattern, but many other

renement patterns may be employed. To show its exibility, the technique has been applied on a large variety

of renement techniques: procedural displacement mapping, as well as more complex techniques such as curved

PN-triangles or ST-meshes.

Categories and Subject Descriptors (according to ACM CCS) : 1.3.3 [Computer Graphics]: Line and curve generation

1.3.3 [Computer Graphics]: Display algorithms

1. Introduction

One ubiquitous technique to generate complex geometric

models in computer graphics, is to start from a coarse model

and apply some renement techniques to get the nal en-

riched model. The many renement techniques that have

been proposed in the literature over the years, can be divided

in two main families: displacement mapping that are usually

employed to add some geometric details to a coarse model,

and subdivision surfaces that are used to generate smooth

surfaces from a small number of polygons. Trying to get a

full GPU implementation of displacement mapping or sub-

division surfaces is challenging because of one major limita-

tion of current graphics hardware: the lack of geometry gen-

eration. Indeed, the rendering pipeline of current GPU has

been designed to efciently rasterize a huge list of indexed

polygons, but it is not able to generate more polygons than

those sent through the graphics bus, by the application run-

ning on the CPU. One major consequence is that in current

high-end applications, the whole mesh renement process

is done on the CPU, and the huge set of rened triangles is

then sent to the GPU. To avoid a bandwidth bottleneck on the

graphics bus, a trade-off has to be found, between uploading

detailed geometry or uploading detailed textures. This issue

explains why subdivision surfaces or displacement mapping

are mainly used in applications where the rendering is done

(a) CPU

(b) GPU

Figure 1: Starting from a coarse mesh (a) provided by the

CPU, the GPU uses a generic renement pattern (b) to cre-

ate an enriched mesh (c) in a standalone process, thus keep-

ing the CPU resources totally free to perform other tasks,

such as deforming or animating the coarse model.

off-line (e.g. movie industry) and are seldom implemented

in realtime 3D engines. Note that some specic tessella-

tion units (that were able to split some high-level geomet-

ric patch into a set of tris or quads) have sometimes been

included in specialized hardware (e.g. B-Spline and Bezier

tessellation on SGI hardware, Curved PN-Triangle tessella-

tion [ VPBM01 ] on the ATI Radeon 8500), but no current

standard hardware does include a generic tessellation unit,

that would enable a GPU implementation of multi-purpose

renement techniques.

In this paper, we propose a general principle to generate

mesh renement on existing GPUs that only requires an

c The Eurographics Association 2005.

Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU

easy-to-implement vertex shader. The main idea is to dene

a generic renement pattern that will be used to virtually cre-

ate additional inner vertices for a given polygon. As a con-

sequence, the footprint of the rened mesh is dramatically

reduced, both on the CPU and the GPU memory, and the

amount of data that is transmitted through the graphics bus

becomes totally independent of the renement depth used

for the rendering (see Figure 1 ) .

When real-time constraints require some LOD rendering,

the mesh has to be stored n times, for n different levels of

detail.

When the mesh includes a dynamic behavior that cannot

be expressed by vertex shaders, it has to be updated from

the CPU at each frame.

When considering these limitations, it appears quite obvious

that a desirable solution would be to use only a coarse mesh

at the CPU level (such a coarse mesh can be efciently up-

dated and transmitted on a frame by frame basis, in the case

of complex dynamic behaviors) and to add some on-the-y

mesh renement process at the GPU level (which only gen-

erates the appropriate level-of-detail, according to the view-

ing parameters). This section presents the solution that we

propose for this mesh renement process.

2. Previous Works

In the eld of fast mesh renement for real-time applica-

tions, a large among of work has been done to optimize eval-

uation of various subdivision surface schemes (see [ ZS00 ]

for a survey). But, despite the popularity of subdivision

surfaces in high-end software environments, relatively few

work has been published about hardware assisted implemen-

tation of subdivision schemes. Bishoff et al. [ BKS00 ] have

proposed to use the forward differenciation to compute the

limit position of vertices for the Loop scheme [ Loo87 ] . Re-

cently, Bolz et Schroder [ BS02 ] [ BS03 ] have proposed an

hardware evaluation of the Loop scheme, including the ad-

ditional "shape factor" rules introduced by Biermann et al.

[ BLZ00 ] . The main problem with subdivision schemes such

as Loop or Catmull-Clark, is that the renement process

by subdivision schemes is heavily based on dynamic adja-

cency information. Such information can be easily provided

in software implementation by using clever multi-linked dy-

namic data structures, but is extremely difcult to trans-

late into hardware-friendly terms. As an alternative to high

quality subdivision methods, Vlachos et al. have proposed

a fast mesh renement method called Curved PN Trian-

gles [ VPBM01 ] . This method is based on the construction of

a triangular Bezier patches [ Far01 ] for each triangle, driven

by its 3 vertices and associated normals. As shown by Chung

and Kim [ CK03 ] , this renement scheme is well adapted to

hardware implementation, but it requires some specic tes-

sellation unit (note that such a specic tessellation unit has

actually been included in the ATI Radeon 8500 hardware in

2001). In contrast with our pure vertex shading approach,

and for the case of subdivision surfaces, fragment shad-

ing techniques has been intensively used to render rened

meshes, using for instance geometry images [ LHSW03 ] or

spiral-enumerated fragment meshes [ SJP05 ] .

3.2. Renement Pattern

(a) Coarse Mesh

in CPU Memory

(b) RP in GPU Memory (c) Rened Mesh

on Display

Figure 2: Our renement method only requires one single

renement pattern RP stored at full resolution as a vertex

buffer on the GPU memory. In our example, RP is a tessel-

lated triangle that is splitted into 8 different triangle strips

(each strip has a different color)

The main idea of our approach is to use one single Rene-

ment Pattern (RP) for each polygonal shape that exists in the

mesh. In the remainder of the paper, we only consider the

usual case of triangular meshes (and thus the case of a trian-

gular renement pattern), but the approach is still valid for

other polygonal shapes. This renement pattern is transmit-

ted once for all, from the CPU to the GPU, as a vertex buffer

containing a few strips (see Figure 2 ) . Let A T be the set of

attributes of a coarse triangle T . Typically A T contains the

3 vertex position, 3 vertex normals, 3 vertex colors, and 3

texture coordinates. We propose to render the rened mesh

with the following algorithm:

3. Mesh renement on GPUs

3.1. Overview

render (Mesh M )

for each coarse triangle T of M do

place A T as uniform variables for the vertex shader;

draw RP ;

While GPU-friendly primitives, such as display lists or ver-

tex buffer objects, dramatically reduce the CPU memory

footprint an d the trafc jam on the graphics bus (the model

is loaded once for all on the GPU memory and is then can-

celed from the CPU memory), it also involves some strong

limitations:

Actually, embedded within the draw function, there is a ver-

tex shader which transforms the vertices of RP toward the

position, rotation and scale of T (see at the end of this sub-

section for details). One major advantage of this algorithm,

is that the amount of data transmitted on the graphic bus is

The whole mesh has to t into the GPU memory (this may

be difcult for very detailed meshes).

c The Eurographics Association 2005.

Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU

independent of the resolution of the rened mesh. The task

of the CPU is thus reduced to the uploading of the coarse

triangle attributes and eventually some additional proper-

ties that characterize the behavior of the rened mesh (for

instance, a random value for a fractal noise). The uniform

variables placed onto the GPU stay constant during all the

drawing process of RP . Let now suppose that a functional

f A T : [ 0 ; 1 ] 2 ! R 3 can be constructed over A T . To evalu-

ate f A T at each vertex V of RP , we have to recover its

parametrization f u ; v g onto T . Since RP is only used for

the topological storage purpose, we propose to encode the

parametrization at V as its position vector: V xyz : =f w ; u ; v g

where w = 1 u v . Now, during the vertex shading pass,

the GPU can clearly identify the parametrization f u ; v g for

each vertex V of RP , and thus evaluate its functional value

f A T ( u ; v ) . Of course, each attribute in the set A T may eventu-

ally be interpolated by a different functional. The most sim-

ple application of our principle is a tessellation of the mesh,

combined with some height eld texture. Let us consider the

position attributes f P 0 ; P 1 ; P 2 g of the current coarse triangle

drawn and the parametrization f 1 u v ; u ; v g (encoded as

the position of inner vertices) of each vertex V of RP . To

perform a simple tessellation, we just have to interpolate be-

tween f P 0 ; P 1 ; P 2 g to obtain the output position V xyz of V :

V xyz : = wP 0 + uP 1 + vP 2 = V x P 0 + V y P 1 + V z P 2

Figure 2 shows the principle of this scheme. Now, since we

have a regular parametrization of RP , it is straightforward

to use it to interpolate between the texture coordinates of

T . For instance, if a height map is stored on the GPU as a

texture, by querying a height value at each inner vertex V , a

displacement mapping generated at the resolution of the RP

is straightforwardly obtained.

different resolutions and by selecting the desired resolution

at rendering-time (see Figure 3 ) .

4. Examples

We have shown that the displacement mapping by height

eld texture was straightforward to integrate with our

method. But a large variety of other local mesh en-

hancements can be integrated in the vertex shader, and

consistently applied on the rened coarse meshes. This

section illustrates some of the results that we have obtained

with our generic on-the-y renement process.

(a) 1x1

(b) 16x16

Figure 4: Procedural displacement with high frequencies.

The smoothness of the function appears at a very high tessel-

lation rate. Our approach limits the GPU memory footprint

to only one tesselated triangle, with an equivalent framerate

to the complete tesselated object stored as a vertex buffer.

Procedural Displacement Mapping: The Shader Model

3 [ NVi ] has introduced the access to the texture memory

from the vertex shading stage. One direct consequence of

this extension is to allow GPU implementation of displace-

ment mapping. Unfortunately, several strong limitations

occur with such an implementation: First, texture look-up

with random access is very expensive when writing vertex

or fragment shaders, which means that the performance

is dramatically reduced in that case. The only solution to

get reasonable performance is to achieve coherent cache-

friendly texture access, which is far from being always

possible. Second, aliasing artefacts are quite common as

textures are not ltered when accessed by the vertex shader.

Third, this approach is not suitable for dynamic effects, like

waves on the ocean, for instance.

Following the work of Perlin, it is widely admitted that

many kinds of visual enrichments (especially dynamic vi-

sual enrichments) can be efciently replaced by a continu-

ous procedural function, rather than storing and interpolat-

ing a discrete texture. This is of special interest for displace-

ment mappings which often have dynamic behaviors. In our

framework, implementing procedural displacement mapping

simply means that the uploaded attributes at each coarse tri-

angle should embed all the mandatory data for the construc-

tion of the procedural displacement map on that triangle,

which usually means a few number of oating point values.

3.3. Levels-Of-Detail on GPU

(a) CPU

(b) GPU

Figure 3: Different LODs of a coarse model can be rendered

by simply selecting the appropriate RP vertex buffer on the

GPU. Here, the vertex shader smoothes the mesh while pre-

serving the sharp creases specied at coarse level.

A basic LOD scheme (depending on the distance of the ob-

ject to the current point of view) can be generated by storing

on the GPU a set of patterns RP 0 ; RP 1 :::; RP n computed at

The nice property of this approach is that the renement

pattern can be adapted to the level of renement needed

c The Eurographics Association 2005.

Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU

to get correct antialiased effects. Figure 4 shows that,

even with pathological examples that exhibit very high

frequencies, the displacement process can be adapted to get

a nice aliasing free rendering. The displacement process

used in Figure 4 simply consists in an elevation of each

inner vertex P along its normal vector N according to a

sinusoidal function: a sin ( f jj P jj) , so the only additional data

to be transmitted to the GPU are the amplitude a and the

frequency f . The same principle is obviously applicable

to more complex functions such as the ubiquitous Perlin

Noise.

standard for all graphics cards. So, another contribution

of the present paper is to provide a generic solution to the

hardware implementation of the Curved PN-Triangles on all

programmable GPUs. With our method, the CPU has just to

precompute the 10 PN-Triangle coefcients (see Appendix

A ) associated to each coarse triangle, and transmit them

with the set of color attributes, before the rendering call

of the renement pattern RP . The evaluation of the Bezier

patch b ( u ; v ) is then performed on the GPU by the vertex

shader according to this bunch of uniform coefcients. Note

that these coefcients are the only information transmitted

through the graphics bus, whatever the level of renement

generated by the RP ; these coefcients may also be updated

at a very low cost, in the case of dynamic models. Figure

5 presents an example of our hardware PN-Triangles

renement on a coarse mesh.

(a) original

(b) 16x16

(a) original

(b) 16x16

Figure 6: Out renement method applied to ST-Meshes. The

per-vertex tension and sharpness shape parameters express

sharp creases that are propagated to the rened mesh.

Figure 5: Our renement method adapted to PN-Triangles.

Curved PN Triangles: The work of Vlachos et al. on

Curved PN-Triangles [ VPBM01 ] has demonstrated that

a purely local interpolating renement scheme can be

used to obtain a fast mesh smoothing, providing a visual

smoothness , despite the lack of high order geometric

continuity of the underlying patches. The principle of the

Curved PN-Triangle is quite simple: for each triangle of a

mesh, a cubic Bezier surface is constructed with the 3 points

P k and their 3 associated normal vectors N k . With such a

loose denition, it is not possible to generate a continuous

surface across edge boundaries of the initial mesh. So to get

aesthetic results that offer some kind of visual continuity,

Vlachos et al. have proposed to independently generate a

smooth normal eld over the mesh, by using a linear or

a quadratic interpolation of normal vectors (the general

process of PN-Triangles is briey recalled in Appendix A ) .

Scalar Tagged mesh: With the machinery proposed

by Vlachos et al., it becomes possible to propose a complete

surface control system during the renement. In particular,

a set of additional per-vertex scalar attributes can help to

manage other local differential geometry behaviors, such

as sharp edges or cone-like vertices. We have recently

introduced a generalization of PN-Triangles, called Scalar

Tagged Meshes (ST-Meshes, for short, also called Scalar

Tagged PN Triangles in [ BRS05 ] ), that allows the user

to manipulated some additional scalar shape parameters,

classied in two families:

vertex parameters : for instance, a tension parameter con-

trols the local curvature of the rened surface, at each ver-

tex;

edge parameters : for instance, a sharpness parameter

controls the amount of tangent plane discontinuity at each

edge, and a bias parameter controls the orientation of this

discontinuity.

Practically, Curved PN-Triangles offer a visual quality sim-

ilar to the Modifed Buttery subdivision scheme [ DLG90 ] ,

but are much more GPU friendly: they are local, require only

a constant amount of data, and can be directly applied on an

indexed face set representation, without any additional topo-

logical structure.

These shape parameters are then simply combined to modify

the Bezier coefcients used in the PN-Triangle model to gen-

erate the rened geometry and the normal vector eld. The

reader may nd the formulations for these modied coef-

cients and normal in Appendix B . Here again, the coefcient

are transmitted as uniform variables to the vertex shader, and

Unfortunately, even if Curved PN-Triangles have been

integrated in the previous generation of ATI hardware

(8500 family [ ATI ] ), they have never been specied as a

c The Eurographics Association 2005.

Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU

the remainder of the process is totally similar as above. Fig-

ure 6 presents an example of our hardware renement pro-

cess applied on an ST-Mesh, where the scalar tags have been

set to dene two sharp creases.

the ST-Mesh presented on Figure 6 is made of 162 triangles.

We obtain the same framerate of 61 fps when it is rened to

331 938 triangles either statically (with a full vertex buffer

object) or dynamically (with a RP at 64x64, which corre-

sponds to a recursive subdivision at level 6, a quite usual

renement rate for CG models).

5. Performance

We have analyzed the whole performance of our system, by

comparing the rendering framerate and the memory foot-

print (number of stored triangles) of a given mesh stored at a

given resolution, either by using a static vertex buffer object

(VBO) storing the whole mesh as triangle strips, or by using

our generic renement technique where just one tessellated

renement pattern is actually stored. Our conguration is an

Intel P4 at 3.0 GHz, 1GB RAM, with a NVIDIA Quadro FX

4400 (PCIe). All the tests have been performed under MS

Windows using OpenGL Shading Language [ KBR04 ] . The

results are summarized in Figure 7 .

Limitations: The simple LOD scheme described in Sec-

tion 3.3 is not adaptative, and produces popping effects

[ Hop96 ] . One of our current works is to try to resolve this

constraint locally, at the RP level.

6. Conclusion

We have proposed an easy-to-implement method to obtain

a virtual mesh generation on the GPU. The contributions of

our approach are:

A very simple method to obtain a low cost tessellation of

meshes.

Coarse Triangles

100

1000

Static VBO

64x64 - FPS

973 : 10

107 : 46

11 : 94

A generic and economic hardware implementation of the

Curved PN-Triangle on standard programmable GPU.

64x64 - Vertices

42250

422500

4225000

A virtual ` on-the-y GPU tesselator unit , for the cost of a

linear interpolation between 3 vertices by rened vertex.

128x128 - FPS

271 : 85

29 : 87

3 : 97

128x128 - Vertices

166410 1664100 16641000

Our approach

64x64 - FPS

965 : 12

107 : 11

11 : 72

The possibility to render rened and enriched meshes

without neither storing nor transmitting through the

graphics bus, the high resolution ready-to-render mesh,

that would virtually not t in GPU memory for a very

deep renement.

64x64 - Vertices

4225

128x128 - FPS

269 : 35

29 : 05

3 : 81

128x128 - Vertices

16641

Figure 7: Rendering framerate and GPU footprint (in num-

ber of vertices) for the rendering of some tesselated models

(without culling).

Our method always involves a constant amount of data trans-

mission to the GPU, independently of the target resolution.

This approach is clearly interesting in the case of dynamic

objects, for completely local renement scheme and for pro-

cedural surfaces generated on simple shapes. Our approach

is particulary fruitfully for very deep renement and when

the bottleneck of an application using intensively tesselation

is identied on the CPU-GPU bus (AGP or PCI express for

instance).

One of our main future work will be to propose a better LOD

scheme, adaptive and geomorphic. We investigate also the

rendering of true subdivision surfaces and high quality para-

metric surfaces, by reducing to a xe size the per-triangle

information required.

The main result of our on-the-y renement technique is

clearly visible: the framerate obtained is equivalent to the

one obtained by storing the full renement mesh, but the

footprint is dramatically reduces, as expected. Globally, our

approach divides the GPU footprint by about the number

of input coarse triangles. Note that we have just given the

number of vertices, but the GPU memory also handles the

topology. After a strippication step, the static vertex buffer

topology represents about 50% of additional data to manage

in GPU memory, while the topology cost is totally negligible

with our approach. In terms of framerate, the slight perfor-

mance drop of our system is essentially due to the barycen-

tric interpolation between the 3 original vertices, when eval-

uating the position and normal at each inner vertex of the

RP . Actually, a simple tessellation is never used in practice

and is usua lly combined with more advanced surface rene-

ment procedures, like the ones presented in Section 4 . In

that case, the computation of the procedural function usu-

ally represents the most expensive part of the vertex shader,

and the framerates obtained with static vertex-buffer objects

and with our approach are totally equivalent. For instance,

References

[ATI] ATI: Ati technologies. http://www.ati.com. 4

[BKS00] B ISHOFF S., K OBBELT L., S EIDEL H.-P.: Towards

hardware implementation of loop subdivision. Proc. ACM Work-

shop on Graphics Hardware (2000). 2

[BLZ00] B IERMANN H., L EVIN A., Z ORIN D.: Piecewise

smooth subdivision surfaces with normal control. Proc. ACM

SIGGRAPH (2000). 2

c The Eurographics Association 2005.

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: