3DISO: Adapting Isometric Scrolling Theory to 3D Worldspace, Part 5

March 6, 2002
by Andrew M. Phelps

Part 1 of this article discussed some of the impetus behind moving our existing isometric engine over to Shockwave 3D, and the basic theory behind scrolling games.

In Part 2 we went into the math behind scrolling, and introduced the concept of isometric landscapes.

Part 3 discussed mapping systems: how the tile grid is given a coordinate system, and how that choice affects movement strategies.

Pathfinding using the A* algorithm made up most of Part 4.

Sample Director 8.5 movie source is available for download in ZIP or SIT archive format (both files are approximately 1.2MB). If you have Shockwave 8.5, you can view the 3DISO engine (~185K).

4 Scrolling in a 3D World (continued)

4.2 Scene Objects and Object Hierarchy

The engine generates a series of models within the 3D castmember, each of which uses a single, distinct mesh. (We could reference the same mesh repeatedly where it is not deformed for elevation mapping, which would further increase performance.) It is therefore worth considering how best to store these objects for use. The 3DISO engine places three requirements on the geometry used. First, it must be easy to transform all of the tiles as a unit, because they are translated on the xz plane with every character movement. Second, each tile must be texturable independently, so that the engine can capitalize on the strengths of 2D engines and reuse the same bitmap wherever possible. Third, the objects should allow their vertices to be reset in real time, so that the y values can be set from map information to produce terrain. The first two of these criteria are relatively simple to meet.

Any Shockwave 3D model can be textured independently, so that is a non-issue -- provided that each tile is represented by a single model within the SW3D hierarchy.

There are many ways to move all of the tiles cohesively as a unit. The first is to loop through all of the tiles and transform each one independently. This works, but it is fairly slow at the speed with which interpreted code runs. The solution used in the code base provided is to create one additional tile, perfectly centered under the character at startup, known as the Bounding Tile. After initialization and positioning on the grid, all other tiles are set as children of the Bounding Tile in the scene hierarchy. From then on, any transformation of the Bounding Tile is applied not only to that tile but to all of its children, which in this case means every tile in the scene. Thus, instead of calling each tile upon movement, the character script can simply execute

call (#translate, gBoundingTile, (x), 0, z, #world)

with the effect of moving all tiles on the xz plane relative to the center of the world.
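A minimal sketch of the parenting step follows. It assumes a global list named gTileModels that holds a reference to every tile model (a hypothetical name; the engine's own bookkeeping may differ) and a gBoundingTile model that has already been created.

```lingo
-- Sketch only: gTileModels is a hypothetical flat list of tile models.
global gBoundingTile, gTileModels

on AttachTilesToBoundingTile
  repeat with tile in gTileModels
    -- #preserveWorld keeps each tile where it already sits on the grid
    gBoundingTile.addChild(tile, #preserveWorld)
  end repeat
end

on ScrollWorld dx, dz
  -- one call moves the Bounding Tile and, through the hierarchy, every child
  gBoundingTile.translate(dx, 0, dz, #world)
end
```

The cost of a scroll step then no longer depends on how many Lingo calls are made per tile, because the hierarchy update happens inside the 3D engine rather than in interpreted code.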

The third requirement, however, is difficult at best. To manipulate each vertex of each tile's mesh rapidly in real time, this engine does not use the recommended methodology for vertex-level transformations in 3D cast members. Specifically, 3DISO does not use the meshDeform modifier, as this was significantly slower when tested against our method. Instead, the engine stores for each "tile" an object of type tileRef that holds two pointers: the first to its model, and the second to the mesh-type model resource. When the RenderWorldFromMap handler loops through each tile to represent squares in the map, it sets the shader of the model through the model pointer, and manipulates the vertices of the mesh directly through the vertexList property obtained through the mesh pointer (instead of accessing the vertexList created by applying a meshDeform modifier to the model resource). This optimization was responsible for a speed gain of approximately 20%. The exact reason for this increase remains unknown. A plausible cause is the extra object structure that the meshDeform modifier places between the handler call and the eventual vertex transformation, but this is speculation.
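The two-pointer object might be sketched as the parent script below. The property and handler names (pModel, pMesh, SetVertexHeights) are assumptions made for illustration and are not taken from the 3DISO source; the sketch also assumes the heights list is ordered the same way as the mesh's vertexList.

```lingo
-- Parent script sketch for a tile reference (all names are assumptions)
property pModel   -- the tile's model, used for shader assignment
property pMesh    -- the mesh-type model resource, used for vertex access

on new me, aModel, aMesh
  pModel = aModel
  pMesh = aMesh
  return me
end

on SetVertexHeights me, heights
  -- heights: one y value per vertex, in vertexList order
  verts = pMesh.vertexList
  repeat with i = 1 to verts.count
    v = verts[i]
    verts[i] = vector(v.x, heights[i], v.z)
  end repeat
  -- write the edited list straight back to the resource, with no meshDeform
  pMesh.vertexList = verts
end
```

If faces or normals are changed as well, a call to the resource's build() command may also be needed; the sketch only touches vertex positions.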

A further optimization would be to remove the two-dimensional array and move the texturing and vertex adjustment calls into a method inside the tile_ref object. This method would then be invoked through the optimized call command, passing a flat array consisting of all the tile objects in the scene. The speed difference between a flat list and a multi-dimensional array of objects is well documented both by this author[10] and by many others.[11] The structures are left as is in this incarnation, but will likely change in future releases.
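That proposed change could look something like the sketch below. The names gTileRefs, FlattenGrid and UpdateFromMap are hypothetical, and UpdateFromMap is assumed to be a handler defined in the tile objects' parent script.

```lingo
-- Sketch only: gTileRefs and UpdateFromMap are hypothetical names.
global gTileRefs

on FlattenGrid grid
  -- copy a list of rows into a single flat list of tile objects
  flat = []
  repeat with row in grid
    repeat with cell in row
      flat.append(cell)
    end repeat
  end repeat
  return flat
end

on RefreshAllTiles
  -- call() sends the handler to every object in the list in one statement
  call(#UpdateFromMap, gTileRefs)
end
```

The flattening is done once at startup (for example, gTileRefs = FlattenGrid(someGrid)), so the per-frame work is a single call() rather than a nested repeat loop.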

4.3 Camera Locks and the Character-centric Viewpoint

Once the keyboard is appropriately mapped into daemon-driven controls, moving the camera to change the view of the world becomes a relatively simple exercise, with one small flaw. Because the engine does not use a two-dimensional system to display the isometric projection, it is free to change the viewpoint and decidedly ignore, in large part, all of the issues associated with classical isometric implementations. There are no corresponding issues with regard to perspective correction and lens simulation as this is (for the most part) built into the Shockwave 3D environment.

There are, however, two issues that deserve attention: setting limits on the minimum and maximum height of the camera, and the slightly more complex matter of creating a rotation hierarchy to avoid "gimbal lock". The height limits are easily handled if one treats each "move" of the camera on the y (up) axis as a positive or negative increment to a global counter. Setting minimum and maximum values on this counter gives an effective bounding mechanism that is quicker than actually querying the world for the location of the camera in world space (although the gain is, for the most part, marginal unless there is continuous camera animation).
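The counter approach can be sketched as follows. All four globals, their starting values, and the handler names are assumptions chosen for illustration.

```lingo
-- Sketch only: these globals are hypothetical names, not taken from 3DISO.
global gCamSteps       -- net number of height steps taken so far
global gCamMinSteps    -- lowest allowed value of gCamSteps
global gCamMaxSteps    -- highest allowed value of gCamSteps
global gCamStepSize    -- world units moved per step

on InitCameraLimits
  gCamSteps = 0
  gCamMinSteps = -5
  gCamMaxSteps = 20
  gCamStepSize = 10.0
end

on MoveCameraHeight cam, direction
  -- direction is +1 (up) or -1 (down)
  newSteps = gCamSteps + direction
  -- test the counter instead of asking the world for the camera position
  if newSteps < gCamMinSteps or newSteps > gCamMaxSteps then return FALSE
  gCamSteps = newSteps
  cam.translate(0, direction * gCamStepSize, 0, #world)
  return TRUE
end
```

The counter only stays in step with the camera if every vertical move goes through the same handler, which is the trade-off for not reading the position back from the scene.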

The rotational problem is slightly more complex, and its cause deserves attention. If a camera can be said to "point at" the origin from a particular point in space, that camera will be unable to rotate correctly around the origin if it has already been rotated around the y axis by an angle that is not a multiple of 90 degrees. (Actually, this problem has nothing to do with the origin itself; it can be reproduced in any quadrant using any object that rotates around a point that is not its own center of reference.) Consider a camera placed 200 units towards the viewer along the z axis and 100 units "up" on the y axis, rotated to point at the origin (0, 0, 0). Such a projection would, in fact, produce the standard isometric viewpoint described earlier. Now assume that the desired movement scheme is that of our world, meaning that the user is free to rotate around the y axis to face any direction, and free to rotate around the world-space x axis to tilt into either a bird's-eye view or a view in which the angle is very close to the ground plane. Tilting is easy at first, because the camera sits directly over the world's z axis, and so to tilt we rotate around the world's x axis. If, however, the user has rotated the camera around the y axis, we now need a vector perpendicular to the facing direction of the camera and parallel to the ground plane. Neither the world x axis nor the world z axis fits this description any longer, and rotating around them produces undesirable results. In fact, this exact scenario is the classic description of gimbal lock as originally discovered, which is the underlying inability of vector/matrix multiplication to solve this problem directly.

There is, however, a very simple solution that exists in many animation packages (Maya, Max, etc.) and which is duplicated here. The idea is to represent a camera not with one object but with two: a source and a target. The camera source is the point used for vertex projections, while the target is used for rotations. The source receives the same transformations as the target, relative to the target. If the target is placed at the origin and rotated about the y axis by the same number of degrees as the camera source, then the camera source can be tilted by rotating it about the x axis of the camera target rather than that of the world. This produces the correct transformation on the source point, and the projection is correct. In most modern systems, this is implemented as a two-node hierarchy, in which transformations applied to the top of the hierarchy are propagated down to the end, relative to the start node. Thus, in most animation systems, it is possible to create a camera target and a camera, and to make the target the "parent" of the camera. Often this is done automatically within the user interface, and the animator can simply animate them together or separately.

This system is implemented in the 3DISO engine by creating a "dummy" object (which is how animation systems implemented this solution before pre-built camera targets existed). The dummy is placed at the origin and rotated whenever the camera rotates on the y axis. Whenever the camera tilts relative to the ground plane, it rotates about the x axis of the dummy, thus avoiding the traditional gimbal-lock issue.
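One way such a dummy rig might be set up is sketched below. It uses an empty group node as the dummy, which is an assumption: the engine's dummy may be a different kind of node, and the global and handler names are hypothetical. The camera position is the (0, 100, 200) example from the text.

```lingo
-- Sketch only: gCamDummy, gCamera and the handler names are hypothetical.
global gCamDummy, gCamera

on SetupCameraRig world
  -- world: the Shockwave 3D cast member
  gCamera = world.camera(1)
  gCamDummy = world.newGroup("camDummy")   -- a new group sits at the origin
  gCamera.transform.position = vector(0, 100, 200)
  gCamera.pointAt(vector(0, 0, 0), vector(0, 1, 0))
  -- make the dummy the camera's parent without moving the camera
  gCamDummy.addChild(gCamera, #preserveWorld)
end

on TurnCamera degrees
  -- yaw: spin the dummy about the world y axis; the camera follows as its child
  gCamDummy.rotate(0, degrees, 0, #world)
end

on TiltCamera degrees
  -- pitch: rotate about the dummy's x axis, which has turned with the yaw
  gCamera.rotate(degrees, 0, 0, gCamDummy)
end
```

Because the dummy carries the accumulated yaw, its x axis is always perpendicular to the camera's facing direction and parallel to the ground plane, which is the vector the tilt needs.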

4.4 Height Mapping and Terrain Generation

The engine presented here uses a simple image to store the map height values. This is accomplished by the DrawMapFromImage and GetHeightFromPixel handlers shown in Figure 20. The first handler loops through the pixels of the image in such a way that the map stores a square for every 2 pixels in the image. Thus a 300x300 image produces a 150x150 tile map. The height at the middle of a tile is computed by simply averaging the heights of its 4 corners. The height itself is computed from the color of the pixel in a very simple fashion (a more correct reader would convert the image to grayscale before computing the height from the pixel). All the pixel reader does is add the red, green, and blue channels, take the average (a value from 0-255), and return that value divided by a predefined constant divisor. This divisor can be thought of as a global scale: it has the effect of setting a maximum height for the land, to which all other values are scaled.

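on DrawMapFromImage whichMember

  -- each map square reads the four pixels at its corners
  repeat with x = 1 to D3DISO[#gMapSizeX]
    repeat with y = 1 to D3DISO[#gMapSizeY]
      a = GetHeightFromPixel(whichMember, x, y)
      b = GetHeightFromPixel(whichMember, x+1, y)
      c = GetHeightFromPixel(whichMember, x+1, y+1)
      d = GetHeightFromPixel(whichMember, x, y+1)
      -- the center height is the average of the four corners
      e = (a + b + c + d) / 4
      D3DISO[#gMap][x][y].tHeight = [a,b,c,d,e]
    end repeat
  end repeat

end


on GetHeightFromPixel whichMember, x, y

  daImage = member(whichMember).image
  c = daImage.getPixel(x,y)
  -- getPixel returns 0 for a pixel outside the image; leave that height at 0
  if c <> 0 then
    c = (c.red + c.green + c.blue) / 3
  end if
  -- divisor is the predefined global scale constant described in the text
  return c / divisor

end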


Figure 20: Method(s) to read height from image and plot terrain in map structure.
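As a worked example of the arithmetic, assume a divisor of 8 (an arbitrary illustrative value; the engine's actual constant is not shown here). A pixel of rgb(60, 120, 180) averages to (60 + 120 + 180) / 3 = 120, giving a height of 120 / 8 = 15. A tile with corner heights 15, 15, 17 and 17 then has a center height of 64 / 4 = 16. The same steps as stand-alone handlers, with hypothetical names and the divisor passed in as a parameter:

```lingo
-- Sketch only: takes the channel values directly so it can be tried in the
-- Message window without an image. Handler names are hypothetical.
on HeightFromChannels r, g, b, divisor
  avg = (r + g + b) / 3            -- integer average, 0-255
  return avg / float(divisor)      -- scaled height
end

on CenterHeight a, b, c, d
  return (a + b + c + d) / 4.0     -- height at the middle of the tile
end
```

This sketch divides as floats so that small height differences are not rounded away; the handlers in Figure 20 will use integer division wherever both operands are integers.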

Andrew (Andy!) is a professor at the Rochester Institute of Technology (RIT) serving in the Dept. of Information Technology, specializing in Multimedia and Web Programming. While completing his MS in Information Technology, he became increasingly interested in multi-user virtual spaces. He is also developing a game programming curriculum, with an emphasis on Lingo based solutions as well as more traditional approaches. Visit his home at

Copyright 1997-2017, Director Online. Article content copyright by respective authors.