Generic Mesh Refinement.pdf

(2016 KB) Pobierz
680732659 UNPDF
Graphics Hardware (2005)
M. Meissner, B.- O. Schneider (Editors)
Generic Mesh Renement on GPU
Tamy Boubekeur & Christophe Schlick
LaBRI INRIA CNRS University of Bordeaux, France
Abstract
Many recent publications have shown that a large variety of computation involved in computer graphics can be
moved from the CPU to the GPU, by a clever use of vertex or fragment shaders. Nonetheless there is still one
kind of algorithms that is hard to translate from CPU to GPU: mesh renement techniques . The main reason
for this, is that vertex shaders available on current graphics hardware do not allow the generation of additional
vertices on a mesh stored in graphics hardware. In this paper, we propose a general solution to generate mesh
renement on GPU. The main idea is to dene a generic renement pattern that will be used to virtually create
additional inner vertices for a given polygon. These vertices are then translated according to some procedural
displacement map dening the underlying geometry (similarly, the normal vectors may be transformed according
to some procedural normal map). For illustration purpose, we use a tesselated triangular pattern, but many other
renement patterns may be employed. To show its exibility, the technique has been applied on a large variety
of renement techniques: procedural displacement mapping, as well as more complex techniques such as curved
PN-triangles or ST-meshes.
Categories and Subject Descriptors (according to ACM CCS) : 1.3.3 [Computer Graphics]: Line and curve generation
1.3.3 [Computer Graphics]: Display algorithms
1. Introduction
One ubiquitous technique to generate complex geometric
models in computer graphics, is to start from a coarse model
and apply some renement techniques to get the nal en-
riched model. The many renement techniques that have
been proposed in the literature over the years, can be divided
in two main families: displacement mapping that are usually
employed to add some geometric details to a coarse model,
and subdivision surfaces that are used to generate smooth
surfaces from a small number of polygons. Trying to get a
full GPU implementation of displacement mapping or sub-
division surfaces is challenging because of one major limita-
tion of current graphics hardware: the lack of geometry gen-
eration. Indeed, the rendering pipeline of current GPU has
been designed to efciently rasterize a huge list of indexed
polygons, but it is not able to generate more polygons than
those sent through the graphics bus, by the application run-
ning on the CPU. One major consequence is that in current
high-end applications, the whole mesh renement process
is done on the CPU, and the huge set of rened triangles is
then sent to the GPU. To avoid a bandwidth bottleneck on the
graphics bus, a trade-off has to be found, between uploading
detailed geometry or uploading detailed textures. This issue
explains why subdivision surfaces or displacement mapping
are mainly used in applications where the rendering is done
(a) CPU
(b) GPU
(c) Display
Figure 1: Starting from a coarse mesh (a) provided by the
CPU, the GPU uses a generic renement pattern (b) to cre-
ate an enriched mesh (c) in a standalone process, thus keep-
ing the CPU resources totally free to perform other tasks,
such as deforming or animating the coarse model.
off-line (e.g. movie industry) and are seldom implemented
in realtime 3D engines. Note that some specic tessella-
tion units (that were able to split some high-level geomet-
ric patch into a set of tris or quads) have sometimes been
included in specialized hardware (e.g. B-Spline and Bezier
tessellation on SGI hardware, Curved PN-Triangle tessella-
tion [ VPBM01 ] on the ATI Radeon 8500), but no current
standard hardware does include a generic tessellation unit,
that would enable a GPU implementation of multi-purpose
renement techniques.
In this paper, we propose a general principle to generate
mesh renement on existing GPUs that only requires an
c The Eurographics Association 2005.
680732659.005.png
Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU
easy-to-implement vertex shader. The main idea is to dene
a generic renement pattern that will be used to virtually cre-
ate additional inner vertices for a given polygon. As a con-
sequence, the footprint of the rened mesh is dramatically
reduced, both on the CPU and the GPU memory, and the
amount of data that is transmitted through the graphics bus
becomes totally independent of the renement depth used
for the rendering (see Figure 1 ) .
When real-time constraints require some LOD rendering,
the mesh has to be stored n times, for n different levels of
detail.
When the mesh includes a dynamic behavior that cannot
be expressed by vertex shaders, it has to be updated from
the CPU at each frame.
When considering these limitations, it appears quite obvious
that a desirable solution would be to use only a coarse mesh
at the CPU level (such a coarse mesh can be efciently up-
dated and transmitted on a frame by frame basis, in the case
of complex dynamic behaviors) and to add some on-the-y
mesh renement process at the GPU level (which only gen-
erates the appropriate level-of-detail, according to the view-
ing parameters). This section presents the solution that we
propose for this mesh renement process.
2. Previous Works
In the eld of fast mesh renement for real-time applica-
tions, a large among of work has been done to optimize eval-
uation of various subdivision surface schemes (see [ ZS00 ]
for a survey). But, despite the popularity of subdivision
surfaces in high-end software environments, relatively few
work has been published about hardware assisted implemen-
tation of subdivision schemes. Bishoff et al. [ BKS00 ] have
proposed to use the forward differenciation to compute the
limit position of vertices for the Loop scheme [ Loo87 ] . Re-
cently, Bolz et Schroder [ BS02 ] [ BS03 ] have proposed an
hardware evaluation of the Loop scheme, including the ad-
ditional "shape factor" rules introduced by Biermann et al.
[ BLZ00 ] . The main problem with subdivision schemes such
as Loop or Catmull-Clark, is that the renement process
by subdivision schemes is heavily based on dynamic adja-
cency information. Such information can be easily provided
in software implementation by using clever multi-linked dy-
namic data structures, but is extremely difcult to trans-
late into hardware-friendly terms. As an alternative to high
quality subdivision methods, Vlachos et al. have proposed
a fast mesh renement method called Curved PN Trian-
gles [ VPBM01 ] . This method is based on the construction of
a triangular Bezier patches [ Far01 ] for each triangle, driven
by its 3 vertices and associated normals. As shown by Chung
and Kim [ CK03 ] , this renement scheme is well adapted to
hardware implementation, but it requires some specic tes-
sellation unit (note that such a specic tessellation unit has
actually been included in the ATI Radeon 8500 hardware in
2001). In contrast with our pure vertex shading approach,
and for the case of subdivision surfaces, fragment shad-
ing techniques has been intensively used to render rened
meshes, using for instance geometry images [ LHSW03 ] or
spiral-enumerated fragment meshes [ SJP05 ] .
3.2. Renement Pattern
(a) Coarse Mesh
in CPU Memory
(b) RP in GPU Memory (c) Rened Mesh
on Display
Figure 2: Our renement method only requires one single
renement pattern RP stored at full resolution as a vertex
buffer on the GPU memory. In our example, RP is a tessel-
lated triangle that is splitted into 8 different triangle strips
(each strip has a different color)
The main idea of our approach is to use one single Rene-
ment Pattern (RP) for each polygonal shape that exists in the
mesh. In the remainder of the paper, we only consider the
usual case of triangular meshes (and thus the case of a trian-
gular renement pattern), but the approach is still valid for
other polygonal shapes. This renement pattern is transmit-
ted once for all, from the CPU to the GPU, as a vertex buffer
containing a few strips (see Figure 2 ) . Let A T be the set of
attributes of a coarse triangle T . Typically A T contains the
3 vertex position, 3 vertex normals, 3 vertex colors, and 3
texture coordinates. We propose to render the rened mesh
with the following algorithm:
3. Mesh renement on GPUs
3.1. Overview
render (Mesh M )
for each coarse triangle T of M do
place A T as uniform variables for the vertex shader;
draw RP ;
While GPU-friendly primitives, such as display lists or ver-
tex buffer objects, dramatically reduce the CPU memory
footprint an d the trafc jam on the graphics bus (the model
is loaded once for all on the GPU memory and is then can-
celed from the CPU memory), it also involves some strong
limitations:
Actually, embedded within the draw function, there is a ver-
tex shader which transforms the vertices of RP toward the
position, rotation and scale of T (see at the end of this sub-
section for details). One major advantage of this algorithm,
is that the amount of data transmitted on the graphic bus is
The whole mesh has to t into the GPU memory (this may
be difcult for very detailed meshes).
c The Eurographics Association 2005.
680732659.006.png
Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU
independent of the resolution of the rened mesh. The task
of the CPU is thus reduced to the uploading of the coarse
triangle attributes and eventually some additional proper-
ties that characterize the behavior of the rened mesh (for
instance, a random value for a fractal noise). The uniform
variables placed onto the GPU stay constant during all the
drawing process of RP . Let now suppose that a functional
f A T : [ 0 ; 1 ] 2 ! R 3 can be constructed over A T . To evalu-
ate f A T at each vertex V of RP , we have to recover its
parametrization f u ; v g onto T . Since RP is only used for
the topological storage purpose, we propose to encode the
parametrization at V as its position vector: V xyz : =f w ; u ; v g
where w = 1 u v . Now, during the vertex shading pass,
the GPU can clearly identify the parametrization f u ; v g for
each vertex V of RP , and thus evaluate its functional value
f A T ( u ; v ) . Of course, each attribute in the set A T may eventu-
ally be interpolated by a different functional. The most sim-
ple application of our principle is a tessellation of the mesh,
combined with some height eld texture. Let us consider the
position attributes f P 0 ; P 1 ; P 2 g of the current coarse triangle
drawn and the parametrization f 1 u v ; u ; v g (encoded as
the position of inner vertices) of each vertex V of RP . To
perform a simple tessellation, we just have to interpolate be-
tween f P 0 ; P 1 ; P 2 g to obtain the output position V xyz of V :
V xyz : = wP 0 + uP 1 + vP 2 = V x P 0 + V y P 1 + V z P 2
Figure 2 shows the principle of this scheme. Now, since we
have a regular parametrization of RP , it is straightforward
to use it to interpolate between the texture coordinates of
T . For instance, if a height map is stored on the GPU as a
texture, by querying a height value at each inner vertex V , a
displacement mapping generated at the resolution of the RP
is straightforwardly obtained.
different resolutions and by selecting the desired resolution
at rendering-time (see Figure 3 ) .
4. Examples
We have shown that the displacement mapping by height
eld texture was straightforward to integrate with our
method. But a large variety of other local mesh en-
hancements can be integrated in the vertex shader, and
consistently applied on the rened coarse meshes. This
section illustrates some of the results that we have obtained
with our generic on-the-y renement process.
(a) 1x1
(b) 16x16
(c) 256x256
Figure 4: Procedural displacement with high frequencies.
The smoothness of the function appears at a very high tessel-
lation rate. Our approach limits the GPU memory footprint
to only one tesselated triangle, with an equivalent framerate
to the complete tesselated object stored as a vertex buffer.
Procedural Displacement Mapping: The Shader Model
3 [ NVi ] has introduced the access to the texture memory
from the vertex shading stage. One direct consequence of
this extension is to allow GPU implementation of displace-
ment mapping. Unfortunately, several strong limitations
occur with such an implementation: First, texture look-up
with random access is very expensive when writing vertex
or fragment shaders, which means that the performance
is dramatically reduced in that case. The only solution to
get reasonable performance is to achieve coherent cache-
friendly texture access, which is far from being always
possible. Second, aliasing artefacts are quite common as
textures are not ltered when accessed by the vertex shader.
Third, this approach is not suitable for dynamic effects, like
waves on the ocean, for instance.
Following the work of Perlin, it is widely admitted that
many kinds of visual enrichments (especially dynamic vi-
sual enrichments) can be efciently replaced by a continu-
ous procedural function, rather than storing and interpolat-
ing a discrete texture. This is of special interest for displace-
ment mappings which often have dynamic behaviors. In our
framework, implementing procedural displacement mapping
simply means that the uploaded attributes at each coarse tri-
angle should embed all the mandatory data for the construc-
tion of the procedural displacement map on that triangle,
which usually means a few number of oating point values.
3.3. Levels-Of-Detail on GPU
(a) CPU
(b) GPU
(c) Display
Figure 3: Different LODs of a coarse model can be rendered
by simply selecting the appropriate RP vertex buffer on the
GPU. Here, the vertex shader smoothes the mesh while pre-
serving the sharp creases specied at coarse level.
A basic LOD scheme (depending on the distance of the ob-
ject to the current point of view) can be generated by storing
on the GPU a set of patterns RP 0 ; RP 1 :::; RP n computed at
The nice property of this approach is that the renement
pattern can be adapted to the level of renement needed
c The Eurographics Association 2005.
680732659.007.png
Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU
to get correct antialiased effects. Figure 4 shows that,
even with pathological examples that exhibit very high
frequencies, the displacement process can be adapted to get
a nice aliasing free rendering. The displacement process
used in Figure 4 simply consists in an elevation of each
inner vertex P along its normal vector N according to a
sinusoidal function: a sin ( f jj P jj) , so the only additional data
to be transmitted to the GPU are the amplitude a and the
frequency f . The same principle is obviously applicable
to more complex functions such as the ubiquitous Perlin
Noise.
standard for all graphics cards. So, another contribution
of the present paper is to provide a generic solution to the
hardware implementation of the Curved PN-Triangles on all
programmable GPUs. With our method, the CPU has just to
precompute the 10 PN-Triangle coefcients (see Appendix
A ) associated to each coarse triangle, and transmit them
with the set of color attributes, before the rendering call
of the renement pattern RP . The evaluation of the Bezier
patch b ( u ; v ) is then performed on the GPU by the vertex
shader according to this bunch of uniform coefcients. Note
that these coefcients are the only information transmitted
through the graphics bus, whatever the level of renement
generated by the RP ; these coefcients may also be updated
at a very low cost, in the case of dynamic models. Figure
5 presents an example of our hardware PN-Triangles
renement on a coarse mesh.
(a) original
(b) 16x16
(a) original
(b) 16x16
Figure 6: Out renement method applied to ST-Meshes. The
per-vertex tension and sharpness shape parameters express
sharp creases that are propagated to the rened mesh.
Figure 5: Our renement method adapted to PN-Triangles.
Curved PN Triangles: The work of Vlachos et al. on
Curved PN-Triangles [ VPBM01 ] has demonstrated that
a purely local interpolating renement scheme can be
used to obtain a fast mesh smoothing, providing a visual
smoothness , despite the lack of high order geometric
continuity of the underlying patches. The principle of the
Curved PN-Triangle is quite simple: for each triangle of a
mesh, a cubic Bezier surface is constructed with the 3 points
P k and their 3 associated normal vectors N k . With such a
loose denition, it is not possible to generate a continuous
surface across edge boundaries of the initial mesh. So to get
aesthetic results that offer some kind of visual continuity,
Vlachos et al. have proposed to independently generate a
smooth normal eld over the mesh, by using a linear or
a quadratic interpolation of normal vectors (the general
process of PN-Triangles is briey recalled in Appendix A ) .
Scalar Tagged mesh: With the machinery proposed
by Vlachos et al., it becomes possible to propose a complete
surface control system during the renement. In particular,
a set of additional per-vertex scalar attributes can help to
manage other local differential geometry behaviors, such
as sharp edges or cone-like vertices. We have recently
introduced a generalization of PN-Triangles, called Scalar
Tagged Meshes (ST-Meshes, for short, also called Scalar
Tagged PN Triangles in [ BRS05 ] ), that allows the user
to manipulated some additional scalar shape parameters,
classied in two families:
vertex parameters : for instance, a tension parameter con-
trols the local curvature of the rened surface, at each ver-
tex;
edge parameters : for instance, a sharpness parameter
controls the amount of tangent plane discontinuity at each
edge, and a bias parameter controls the orientation of this
discontinuity.
Practically, Curved PN-Triangles offer a visual quality sim-
ilar to the Modifed Buttery subdivision scheme [ DLG90 ] ,
but are much more GPU friendly: they are local, require only
a constant amount of data, and can be directly applied on an
indexed face set representation, without any additional topo-
logical structure.
These shape parameters are then simply combined to modify
the Bezier coefcients used in the PN-Triangle model to gen-
erate the rened geometry and the normal vector eld. The
reader may nd the formulations for these modied coef-
cients and normal in Appendix B . Here again, the coefcient
are transmitted as uniform variables to the vertex shader, and
Unfortunately, even if Curved PN-Triangles have been
integrated in the previous generation of ATI hardware
(8500 family [ ATI ] ), they have never been specied as a
c The Eurographics Association 2005.
680732659.008.png
Tamy Boubekeur & Christophe Schlick / Generic Mesh Renement on GPU
the remainder of the process is totally similar as above. Fig-
ure 6 presents an example of our hardware renement pro-
cess applied on an ST-Mesh, where the scalar tags have been
set to dene two sharp creases.
the ST-Mesh presented on Figure 6 is made of 162 triangles.
We obtain the same framerate of 61 fps when it is rened to
331 938 triangles either statically (with a full vertex buffer
object) or dynamically (with a RP at 64x64, which corre-
sponds to a recursive subdivision at level 6, a quite usual
renement rate for CG models).
5. Performance
We have analyzed the whole performance of our system, by
comparing the rendering framerate and the memory foot-
print (number of stored triangles) of a given mesh stored at a
given resolution, either by using a static vertex buffer object
(VBO) storing the whole mesh as triangle strips, or by using
our generic renement technique where just one tessellated
renement pattern is actually stored. Our conguration is an
Intel P4 at 3.0 GHz, 1GB RAM, with a NVIDIA Quadro FX
4400 (PCIe). All the tests have been performed under MS
Windows using OpenGL Shading Language [ KBR04 ] . The
results are summarized in Figure 7 .
Limitations: The simple LOD scheme described in Sec-
tion 3.3 is not adaptative, and produces popping effects
[ Hop96 ] . One of our current works is to try to resolve this
constraint locally, at the RP level.
6. Conclusion
We have proposed an easy-to-implement method to obtain
a virtual mesh generation on the GPU. The contributions of
our approach are:
A very simple method to obtain a low cost tessellation of
meshes.
Coarse Triangles
10
100
1000
Static VBO
64x64 - FPS
973 : 10
107 : 46
11 : 94
A generic and economic hardware implementation of the
Curved PN-Triangle on standard programmable GPU.
64x64 - Vertices
42250
422500
4225000
A virtual ` on-the-y GPU tesselator unit , for the cost of a
linear interpolation between 3 vertices by rened vertex.
128x128 - FPS
271 : 85
29 : 87
3 : 97
128x128 - Vertices
166410 1664100 16641000
Our approach
64x64 - FPS
965 : 12
107 : 11
11 : 72
The possibility to render rened and enriched meshes
without neither storing nor transmitting through the
graphics bus, the high resolution ready-to-render mesh,
that would virtually not t in GPU memory for a very
deep renement.
64x64 - Vertices
4225
4225
4225
128x128 - FPS
269 : 35
29 : 05
3 : 81
128x128 - Vertices
16641
16641
16641
Figure 7: Rendering framerate and GPU footprint (in num-
ber of vertices) for the rendering of some tesselated models
(without culling).
Our method always involves a constant amount of data trans-
mission to the GPU, independently of the target resolution.
This approach is clearly interesting in the case of dynamic
objects, for completely local renement scheme and for pro-
cedural surfaces generated on simple shapes. Our approach
is particulary fruitfully for very deep renement and when
the bottleneck of an application using intensively tesselation
is identied on the CPU-GPU bus (AGP or PCI express for
instance).
One of our main future work will be to propose a better LOD
scheme, adaptive and geomorphic. We investigate also the
rendering of true subdivision surfaces and high quality para-
metric surfaces, by reducing to a xe size the per-triangle
information required.
The main result of our on-the-y renement technique is
clearly visible: the framerate obtained is equivalent to the
one obtained by storing the full renement mesh, but the
footprint is dramatically reduces, as expected. Globally, our
approach divides the GPU footprint by about the number
of input coarse triangles. Note that we have just given the
number of vertices, but the GPU memory also handles the
topology. After a strippication step, the static vertex buffer
topology represents about 50% of additional data to manage
in GPU memory, while the topology cost is totally negligible
with our approach. In terms of framerate, the slight perfor-
mance drop of our system is essentially due to the barycen-
tric interpolation between the 3 original vertices, when eval-
uating the position and normal at each inner vertex of the
RP . Actually, a simple tessellation is never used in practice
and is usua lly combined with more advanced surface rene-
ment procedures, like the ones presented in Section 4 . In
that case, the computation of the procedural function usu-
ally represents the most expensive part of the vertex shader,
and the framerates obtained with static vertex-buffer objects
and with our approach are totally equivalent. For instance,
References
[ATI] ATI: Ati technologies. http://www.ati.com. 4
[BKS00] B ISHOFF S., K OBBELT L., S EIDEL H.-P.: Towards
hardware implementation of loop subdivision. Proc. ACM Work-
shop on Graphics Hardware (2000). 2
[BLZ00] B IERMANN H., L EVIN A., Z ORIN D.: Piecewise
smooth subdivision surfaces with normal control. Proc. ACM
SIGGRAPH (2000). 2
c The Eurographics Association 2005.
680732659.001.png 680732659.002.png 680732659.003.png 680732659.004.png
Zgłoś jeśli naruszono regulamin