Improving Product Discovery of Tabletop RPG Maps (Preliminary Investigation)
As catalogs for digital marketplaces grow in size, customers have greater difficulty finding products that meet their needs. Marketplace owners might improve product discovery by adding new categories, tags, or other searchable and filterable metadata to product descriptions. As it is often difficult to get product owners to update old product descriptions, a marketplace owner may use automated methods to “backfill” product data. We investigate using multi-modal computer models to extract data from digital products, specifically digital maps for tabletop role playing games. Across five tasks, we find that models perform in line with their ranking on the Open VLM Leaderboard. State-of-the-art commercial models perform well with zero-shot feature extraction and image association/captioning tasks, but all models perform poorly with reasoning and quantitative tasks.
Introduction
Tabletop role playing game maps primarily serve two roles: a way for players and the game master to track positions of characters and creatures via miniatures or tokens (see battlemap below), or a depiction of an area for the game master’s use as they run an adventure which denotes buildings, catacombs, traps, and locations of creatures (see dungeon map below). The former is either printed and placed in the center of the table during play or displayed using virtual tabletop (VTT) software.

Example Battlemap/VTT-style Map (source: Owlette Publications LLC)

Example Game Master Dungeon Map
Game masters need to marry a map with their intended adventure and campaign setting, so the choice of a map is a multi-factor decision. Although game masters often make their own maps for an adventure, high-quality maps require skill and creativity and can be time-consuming to make, so game masters may instead purchase maps. Multiple marketplaces exist to sell maps created by independent publishers. The marketplaces support product discovery via categorization filters, free-form text search, and limited recommendation systems.
Since maps are digital goods and marketplaces have access to the product (marketplaces store the product within their databases), a marketplace may use computer vision technology to extract product data and augment their product discovery tools and database. As a means to maintain accuracy, publishers could approve any additional data being added to the product description or database. A possible activity flow for this process could be:
- Publisher creates product description and uploads product to Marketplace
- Model (mediated by the Marketplace) processes the product and computes additional metadata
- Publisher examines and approves additional metadata
- Customers use tools provided by the Marketplace to discover products
Computer vision technology has improved greatly in the past few years, particularly in its ability to identify and extract objects and other features from an image. The combination of disparate neural network architectures allows models to interpret image data with semantically-laden text, greatly improving performance on tasks such as image captioning and visual question answering.
Methodology
As a preliminary study, our objective is to gain a sense of the current quality of existing models for this domain, gain insight on what engineering efforts will be required for a large scale project, and characterize some of the likely challenges such a project would encounter.
We evaluated zero-shot performance of five existing multi-modal neural networks using five prompts or tasks. The prompts were:
List locations or spaces featured within the map. Format as a bullet list.
List items featured within the map. Format as a bullet list.
In units of the listed map scale or map grids, if they exist, what are the dimensions of this map?
Is movement within this map mostly linear or mostly branching?
List types of adventures or scenarios this map is suitable for. Format as a bullet list.
The first two prompts focus on identifying and extracting features, the former focusing on locations where characters may be placed, while the latter on items the characters might pick up or interact with. The third prompt involves interpreting, quantitatively, a map grid or scale. The fourth prompt involves semi-subjective interpretation of the map features along with the external concept of movement. The fifth prompt tests themed open-ended captioning.
We selected five multi-modal LLMs for the evaluation (see table below). Since we would like to determine a necessary “minimum strength,” we selected models across a variety of benchmark levels. The Avg Score column in the table acts as a proxy for the model’s general strength (source: Open VLM Leaderboard).
Short Name | Model | Avg Score | Type |
---|---|---|---|
Gemini | Gemini-2.5 Pro Preview 05-06 | 80.1 | API |
GPT | GPT-4.1 20250414 | 75.9 | API |
Pixtral | Pixtral-12B | 61 | OS |
Gemma3 | Gemma3-4B | 55.4 | OS |
LLaVA | LLaVA-v1.5-7B | 36.9 | OS |
All models were evaluated using the same core prompts. We adjusted the prompts to fit model requirements (e.g. adding ‘USER’ and ‘ASSISTANT’ breaks), but the core prompt was constant.
We did not supply contextual data to any of the models. Each prompt was presented as a stand-alone invocation rather than a conversation to avoid interacting effects. Some models required converting images to RGB mode first, but we otherwise did not scale, crop, or convert the images for a specific model.
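The stand-alone invocation scheme can be sketched as follows. This is a minimal illustration rather than the study's code: the `TEMPLATES` dictionary, the `build_requests` helper, and the exact `USER`/`ASSISTANT` wrapper text are assumptions; only the five core prompts come from the text above.

```python
# The five core prompts, copied from the study.
CORE_PROMPTS = [
    "List locations or spaces featured within the map. Format as a bullet list.",
    "List items featured within the map. Format as a bullet list.",
    "In units of the listed map scale or map grids, if they exist, "
    "what are the dimensions of this map?",
    "Is movement within this map mostly linear or mostly branching?",
    "List types of adventures or scenarios this map is suitable for. "
    "Format as a bullet list.",
]

# Hypothetical per-model wrappers. The USER/ASSISTANT layout is an assumed
# example of a model-specific adjustment, not a documented format.
TEMPLATES = {
    "plain": "{prompt}",
    "user_assistant": "USER: <image>\n{prompt}\nASSISTANT:",
}


def build_requests(image_path, template_name):
    """Return one independent request per core prompt (no shared history)."""
    template = TEMPLATES[template_name]
    return [
        {"image": image_path, "text": template.format(prompt=prompt)}
        for prompt in CORE_PROMPTS
    ]
```

Because each request carries only the image and a single wrapped prompt, an answer to one prompt cannot influence the answer to another.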
For our dataset, we randomly selected products from a digital marketplace. All products were created between 2022 and 2024, inclusive. We dropped products from our selection if they were city or world maps or were not maps (e.g. collections of images meant to build new maps). Since most products include multiple maps and map variants (e.g. day and night), we selected a single map from each product, preferring ones meant for VTT use and variants with labeled content.
Processed File | Size (MB) | Width x Height (pixels) | Type |
---|---|---|---|
M001.jpg | 1.2 | 1800 x 3648 | VTT |
M002.jpg | 10.0 | 6000 x 4800 | VTT |
M003.jpg | 1.5 | 2380 x 2380 | VTT |
M004.jpg | 0.9 | 2160 x 2880 | VTT |
M005.jpg | 8.1 | 8400 x 8400 | VTT |
M006.jpg | 0.9 | 2523 x 3220 | GM |
M007.jpg | 2.1 | 2523 x 3220 | GM |
M008.jpg | 0.6 | 1400 x 1400 | VTT |
We ran the three open-source models on a 16-core, 64 GB virtual machine without an attached GPU. Our code used the Hugging Face Transformers library. We called the two API models using their commercial endpoints.
Results and Evaluation
For each of the five prompts, we evaluated the quality of output with prompt-specific measures and human judgement. Although we selected the sample images randomly, this study does not include sufficient data points for a significance test. Instead, as a preliminary study, we want a rough measure of baseline capability in order to identify potential research directions and likely necessary data engineering efforts.
For transparency, we report the output of the models for each task within a table. However, the output has been edited for uniformity and space. The models often output filler sentences, repeat themselves, and change formatting. Any large-scale application of these models will require an engineering effort to standardize the outputs.
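As a rough idea of what such standardization might involve, the sketch below keeps only bullet lines, strips the bullet markers, and drops repeated entries. The function name `normalize_response` and the rule that non-bullet lines count as filler are assumptions made for illustration; the edits applied to the tables in this study were made by hand.

```python
import re

# Matches a leading "-", "*", "•", "1." or "1)" bullet marker.
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def normalize_response(text):
    """Turn a raw model reply into a de-duplicated list of bullet items."""
    items, seen = [], set()
    for line in text.splitlines():
        if not _BULLET.match(line):
            continue  # assumption: non-bullet lines are filler sentences
        item = _BULLET.sub("", line).strip()
        key = item.casefold()  # case-insensitive duplicate check
        if item and key not in seen:
            seen.add(key)
            items.append(item)
    return items
```

For a reply that repeats “A dungeon with a staircase” many times, this returns the phrase once, which is the kind of clean-up applied to the looping LLaVA outputs.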
Inference Time and Money
Inference times in seconds, not including any file upload times, were:
Model | Min. Inf Time | Med. Inf Time | Max. Inf Time |
---|---|---|---|
Gemma3 | 118 | 122 | 189 |
LLaVA | 235 | 261 | 326 |
Pixtral | 905 | 1193 | 1425 |
Gemini | 31 | 49 | 290 |
GPT | 33 | 39 | 50 |
Our intent here is not to rigorously benchmark inference performance but to give a sense of the scale of latencies. The APIs, which are presumably tuned and run on high-end machines, typically needed thirty to sixty seconds to evaluate five prompts on an image. Run locally on a high-end machine that lacks a GPU, inferences took multiple minutes to complete.
The marginal cost of calling the API services was between $0.002 and $0.003 per image. The APIs use a variety of factors to calculate a cost in terms of tokens, but this calculation is dominated by the size of the image in pixels.
To compute the marginal cost of local inference, we use the hourly cost of running a virtual machine and the processing throughput. The virtual machine used for this study costs $0.54/hour. This translates to costs from $0.018 to $0.18 per image processed, depending on the model.
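The arithmetic can be reproduced with a few lines of Python. The sketch assumes the per-image cost is the hourly rate multiplied by the median inference time from the table above; the study does not state which time statistic was used, so treat that choice as an assumption.

```python
VM_COST_PER_HOUR = 0.54  # USD per hour, as reported for the study's VM

# Median seconds to run all five prompts on one image (inference-time table).
MEDIAN_SECONDS = {"Gemma3": 122, "LLaVA": 261, "Pixtral": 1193}


def cost_per_image(seconds, hourly_rate=VM_COST_PER_HOUR):
    """Marginal cost in USD of occupying the VM for `seconds`."""
    return hourly_rate * seconds / 3600.0


for model, seconds in MEDIAN_SECONDS.items():
    print(f"{model}: ${cost_per_image(seconds):.3f} per image")
```

Under this assumption Gemma3 comes to roughly $0.018 and Pixtral to roughly $0.18 per image, which matches the range quoted above.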
At the API prices reported above, we expect an operator could process 10,000 images for less than $100.
Prompt 1: Locations
List locations or spaces featured within the map. Format as a bullet list.
This task tests the model’s performance at finding map features that are conceptually a “room” or defined space. Characters should be able to occupy a location. For each response, we apply a subjective criterion: the list must describe rooms in a way that is accurate and meaningful in a product query.
We’ve summarized the results in the table below. Both Gemini and GPT consistently extracted useful details from the images, although tuning or other filters may be required to make the outputs useful. The three open-source models typically produced poor-quality output. When labels were present, all models preferred to extract those labels rather than interpret details in the image.
Processed File | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|
M001.jpg | Fail | Pass | Fail | Fail | Pass |
M002.jpg | Fail | Fail | Fail | Pass | Pass |
M003.jpg | Fail | Pass | Fail | Pass | Fail |
M004.jpg | Fail | Fail | Fail | Pass | Pass |
M005.jpg | Fail | Fail | Fail | Pass | Pass |
M006.jpg | Fail | Fail | Fail | Pass | Pass |
M007.jpg | Fail | Fail | Pass | Pass | Fail |
M008.jpg | Fail | Pass | Fail | Pass | Pass |
Details
M001 shows the deck of a ship. There is one room to the bow with a hammock, table, barrels and a crate. There is also a small room to the bow holding an anchor. The middle area hosts four cannon, two to a side, along with more barrels, crates, a ladder, and a grate. The aft is separated into two rooms, each with a hammock and a table with chair. All the models pick up on the naval theme. Pixtral picks up on the contents of rooms, but none of the rooms themselves. Gemma3 reports a “Captain’s Quarters” and a “Navigation Station,” but it is unclear what room is affiliated with either description. Similarly, Gemini uses general terms to refer to sections of a boat, without reference to specific rooms on the map, and reports “lifeboats/rafts”. GPT fares better by referring to “Crew quarters” and “Private cabins” without suggesting specific roles.
M002 is a cliff-face, dominated by a large skull where the mouth opens into a cave. The cliff face is embedded with single-room buildings connected by stairs. The ground features a small camp and some streams crossed with bridges. None of the models pick up on the skull imagery, although all models except Pixtral report an entrance to a cave. Gemma3 fails to identify the buildings and the camp, while LLaVA combines these features into a generic “small village” label and inflates the small rocks in the streams into islands. GPT provides the most complete and detailed list, followed by Gemini.
M003 is a large tree with a small opening that leads to an interior, hollowed-out room that is likely a lair for a creature. LLaVA directly extracts the hollowed-out nature of the room (although it falls into a loop and keeps repeating the fact). Gemini reports the fact somewhat indirectly, referring to it as a “central clearing/nest”. The other three models report the existence of the tree and note other elements of the image, but not the interesting part of the image: the hollow.
M004 is a complex map, featuring a bridge leading to a fortress of rooms, with internal paths and stairways connecting detailed rooms. LLaVA fails and simply repeats “A dungeon with a staircase”. All models other than Gemini associate rooms with functions in a non-apparent manner. Is a room with a table and three seating places the “dining hall”? Which room is a “jail” or “prison”?
M005 is a difficult map for a human to interpret since the color range is low contrast and the buildings are largely collapsed with few distinguishing marks. Of the open source models, Pixtral does best by listing “structures”. None of the open source models identify the stream or river in the picture, but both API models do. Gemini’s descriptions are matter-of-fact, while GPT’s are more detailed and add color (not necessarily supported by the image).
Both M006 and M007 use traditional RPG cartography as found in TSR publications in the 80s and 90s (similar to the GM map above). Rooms are labeled and, while doors, traps, and secret passages are marked, rooms are not populated with images of their contents. Gemini and GPT extracted all the room labels and other markers from the map; however, GPT also annotated each room number with descriptions that fail to follow the actual room layouts. Of the three open-source models, Pixtral performed the best, extracting about one-third of the room labels in M006’s case and about 80% in M007’s case. From a product discovery perspective, extracting room labels is not useful by itself, as customers are not searching for a map that includes “room 33”. However, customers may be interested in maps with a certain number or range of rooms, so the output may be useful as part of an analysis pipeline.
Map M008 is an octagonal bar or tavern. Gemma3 and Pixtral extract elements of the image, but fail to extract the meaning of the elements. LLaVA, Gemini, and GPT identify the bar and the other dining/drinking elements.
Responses
Processed File | Ground Truth | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|---|
M001.jpg | (5 locations) Three living areas/berths Midship/Cannon firing Anchor storage | Deck Captain’s Quarters Navigation Station | Cabin Storage area Wooden planks Sail Anchor Rope Sword Ship Cannon Barrel Chest | Sail Barrel Anchor Rope Cannon Ladder Grating Wheel Oar Barrel Barrel Barrel Barrel Barrel Barrel Barrel Barrel Barrel Barrel Barrel | Deck Cannon emplacements Central grate/hatch Bow (forecastle area with anchor, ropes, and supplies) Stern (aftcastle area with cargo) Lifeboats/rafts | Forecastle deck Storage areas Weapon/armory area Cannon stations Main deck Stairwell/ladder access Crew quarters Private cabins or enclosed spaces Miscellaneous deck equipment |
M002.jpg | House/Bedroom (5) Bridge (3) Entry to cave Camp Rivers | Large Mountain Cave Entrance River (Two connected streams) | A river flowing through the middle of the map A waterfall in the middle of the map A cave with a waterfall inside A bridge over the river A small village with buildings and a water source A large mountain with a waterfall on it A small island in the middle of the river A large island in the middle of the river A small island in the middle | House with multiple windows Open area with a well House with a large door House with a small entrance House with a large entrance House with a small window House with a large entrance and a small window House with a small entrance and a small window House with a large entrance and a small window House with a small entrance and a large window House with a large | Cave entrance River Stone bridges Buildings/Huts Staircases Tents/Makeshift shelters Grassy areas/Vegetation Rocky cliffs/Cave walls Small pond/Pool of water Central plaza/Open area | Cave entrance in the cliff face Rocky cliffside and surrounding rock formations Multiple interior rooms built into the cliffside Room with beds and tables Room with chests and barrels Room with beds Room with beds and a table Room with crates and supplies Water streams and river systems running through the map Multiple stone bridges crossing the water A central tent camp area with several tents Grassland and forested areas surrounding the water Scattered trees, shrubs, and stones throughout the landscape |
M003.jpg | Room inside tree hollow | Central massive tree with a large root system Small patches of grass | A large tree with a hole in the center A small tree with a hole in the center A tree with a hole in the center A tree with a hole in the center A tree with a hole in the center A tree with a hole in the center A tree with a hole in the center A tree with a hole in the center A tree with a hole in the | Central Tree Trunk Upper Branches Lower Branches Surrounding Bushes Ground Cover Small Rocks Fallen Leaves Small Plants Grass Patches | The Great Tree Central Clearing/Nest Surrounding Forest Floor Tree Stumps Bone Pile Debris/Wreckage Thickets/Bushes Smaller Trees Fallen Logs/Branches | Large central tree stump Exposed and sprawling tree roots area Central pool or depression with muddy water Broken wooden debris in the central depression Small clusters of bushes and shrubs around the roots Scattered tree stumps throughout the map Various fallen logs and branches Surrounding dense forest/wooded edge Patches of colorful foliage Open grassy clearings between roots and trees |
M004.jpg | Bridge Gate Magic defense rings 10 Rooms | Main Entrance Grand Hall Library Armory | A dungeon with a staircase A dungeon with a staircase A dungeon with a staircase A dungeon with a staircase A dungeon with a staircase A dungeon with a staircase A dungeon with a staircase A dungeon with a staircase | Entrance/Exit Hallway Room with Red Light Control Room Storage Room Laboratory Prison Cells Torture Chamber Dungeon Bridge Chasm | Chasm/Abyss Fortified walkways/ledges along the chasm Balconies with glowing teal orbs and spikes Rooms (possibly barracks, studies, or chambers) Stairways Areas with barrels and crates (storage) Circular platforms with magical symbols Wooden bridges or walkways connecting upper levels Corridors/Passages Small alcoves or guard posts along the walls | Entry hallway/corridor Main hall or crossroads Stairs Small bedrooms or chambers Armory or storage rooms Dining area or mess hall Office or study Ritual or summoning room Jail or holding cells Guard posts or sentry rooms Bridge Balcony/ledge overlooking a chasm Arcane or crystal chambers Tentacle-filled chasm/abyss Storage alcoves with barrels and sacks |
M005.jpg | Rooms Stream | Central Chamber Collapsed Tunnel 1 Collapsed Tunnel 2 | A large, dark room A small, dark room A large, dark room with a clock A small, dark room with a clock A large, dark room with a clock and a small, dark room with a clock A large, dark room with a clock and a small, dark room with a clock A large, dark room with a clock and a small, dark room with a clock | Top-left circular structure Top-right circular structure Top-center rectangular structure Middle-left square structure Middle-right square structure Bottom-left square structure Bottom-right square structure Central irregularly shaped area | Glowing Purple River Star-Patterned Circular Chamber Staired Circular Dais/Platform Northern Ruins Southern Ruins Rocky Island | Circular ritual or summoning chamber Large ruined rooms or halls Central underground river or stream Rocky outcroppings and cave formations along the river Isolated rocky island or promontory (center of river) Small ruined structure or shrine Passageways and corridors connecting various rooms and spaces Arched or circular entrance/exit Crumbled and partially-collapsed areas throughout map |
M006.jpg | Rooms (60) | 11 12 13 14 | 10..27 | 1..19a | 1..58 (all room labels) | Room 1 .. Room 60 Multiple staircases Multiple well/circle rooms Several small antechambers and corridors Spaces marked "b" and "c" (possibly doors or barriers) Decorative/statue symbols in some rooms |
M007.jpg | Rooms (24) Forest area | 24 23 22 21 | 23 24 Lists 25..40 (none exist) | 1..20 | 1..24 (all room labels) | 24 entries described like so. Room descriptions do not match the map 1. Cave entrance (northernmost entry) 2. Small cavern 3. Room with barrier/door 4. Chamber with central structure 5. Large winding corridor |
M008.jpg | Outside table (4) Inside bar | Octagonal Table Stone Floor Walls Chairs | A large wooden structure A dining area with tables and chairs A bar with a variety of drinks A large wooden table A large wooden table A large wooden table A large wooden table A large wooden table | Central Table Barrels Benches Tables Floor | Central Hexagonal Building/Bar Bar Counter Service Area (behind the bar) Spiral Staircase Seating Area (inside) Awnings Outdoor Paved Area Outdoor Tables | Central main building/structure Central bar or serving counter Multiple colored seating areas (tables and chairs) around the main room Four exterior market stalls or vendor stands Four exterior round tables with benches Surrounding open plaza or paved area |
Prompt 2: Items
List items featured within the map. Format as a bullet list.
This task is intended to evaluate sub-image identification within a larger image, focusing on images that are “items”: things characters could interact with, as opposed to rooms the characters could inhabit. This is a feature associated with “battle maps,” or maps that will be seen by players, as opposed to maps only seen by the game master. Thus, we remove M005, M006, and M007 from the evaluation of this task. M005 is excluded because the map lacks details of items, and M006 and M007 are not battle maps.
This task led the LLaVA model to repeat itself. For clarity, we have removed duplicate entries.
For this task, we use Jaccard similarity, also called intersection over union, to quantify how effectively the models are finding all the items. The score ranges from zero to one, where one would indicate the model replied with all of the items in the ground truth and without any extras. Mathematically:
$$ J(A, B) = \frac{|A \cap B|}{|A \cup B|} $$
For our evaluation, we ignore plurality and counts. For example, “cannon” is treated as a correct answer for M001, as well as “cannons” and “cannon (6)”, even though there are only four.
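A minimal sketch of this scoring follows. The `normalize_item` helper is a hypothetical and deliberately naive stand-in for the matching described above (it lowercases, drops parenthesized counts, and strips a trailing “s”); the study's matching was judged by a human and also handled synonyms, which this sketch does not attempt.

```python
import re


def normalize_item(label):
    """Lowercase, drop parenthesized counts, and strip a simple plural 's'."""
    label = re.sub(r"\(.*?\)", "", label).strip().lower()
    if label.endswith("s") and not label.endswith("ss"):
        label = label[:-1]
    return label


def jaccard(truth, predicted):
    """Intersection over union of two item lists after normalization."""
    a = {normalize_item(item) for item in truth}
    b = {normalize_item(item) for item in predicted}
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)
```

For example, `jaccard(["Anchor", "Cannon (4)", "Ladder"], ["Cannons", "Anchor", "Wheel"])` matches two items out of four distinct ones and therefore scores 0.5.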
The scores for the models on M001, M002, M003, M004, and M008:
Processed File | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|
M001.jpg | 2 / 14 (0.14) | 5 / 15 (0.33) | 6 / 16 (0.38) | 10 / 14 (0.71) | 12 / 14 (0.86) |
M002.jpg | 0 / 7 (0) | 0 / 11 (0) | 1 / 10 (0.10) | 4 / 16 (0.25) | 4 / 13 (0.31) |
M003.jpg | 0 / 4 (0) | 0 / 4 (0) | 1 / 11 (0.09) | 3 / 11 (0.27) | 2 / 11 (0.18) |
M004.jpg | 0 / 20 (0) | 0 / 20 (0) | 2 / 20 (0.10) | 20 / 20 (1.00) | 20 / 20 (1.00) |
M008.jpg | 2 / 7 (0.29) | 2 / 5 (0.4) | 3 / 7 (0.43) | 5 / 9 (0.56) | 4 / 11 (0.36) |
For the open source models, their errors were often ones of omission – not reporting elements of an image. In contrast, the API model scores were damaged by reporting non-existent elements or reporting elements that did not fit within the definition of an item. This is a case where further prompt engineering might significantly improve the quality of the answers.
Responses
Processed File | Ground Truth | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|---|
M001.jpg | Hammock Barrels Crates Sword Table Bottles Anchor Rope coil Cannon (4) Ladder Grate Mast bases | Wooden Deck Anchor Two Cannons Cannon Balls | Wooden barrels Sword Anchor Book Rope Clock Chest Barrels | Barrels Anchor Ladder Grating Ropes Cannon Oars Chests Wooden Planks Wheel | Cannons Wooden barrels Wooden crates Lifeboats (or folded sails) Ladder Grate Mast bases (circular objects) Anchor Rope coil Bottles Cutlass/Sword Planks/Decking (as part of the ship structure) | Cannons (6) Barrels (several) Wooden crates (several) Hammocks (4) Anchor Rope coil Table with a sword and bottles Ladders (2) Grate/hatch Round hatches/openings (3) Wooden planks (flooring) Small chest/box Small round table or stool |
M002.jpg | Tent Bed Table Crates | A large cave A river Stone buildings | Waterfall River Water source Cave Mountain Buildings Trees | Several buildings, some with windows and doors Multiple bridges crossing over waterways A winding river or stream Various patches of green vegetation Rocky terrain with large boulders and outcrops A few small structures resembling huts or shelters | Cave entrance River/Stream Waterfalls Stone bridges (two) Houses/Buildings Stairs Trees/Bushes Grassy areas Rocks/Boulders Tents/Tarpaulins Pond/Small body of water Crates and Barrels Beds Tables and Chairs Fireplaces Pathways/Dirt tracks | Mountain/cliff face with a large cave entrance Flowing river/stream starting from the cave and branching out Stone bridges crossing the river/stream Rocky terrain with patches of grass and greenery Multiple tents set up near the river Various stone buildings/rooms carved into the mountain/cliff sides Furnishings inside the buildings such as beds, tables, barrels, crates, and sacks Stairs leading up to the stone buildings/rooms Ruined or broken bridge Small pond or pool Freestanding boulders and rocks scattered throughout Trees and bushes dotting the landscape Pathways and trails winding between features |
M003.jpg | Shrubs Tree stumps Pile of bones | A massive, ancient tree with a wide, gnarled root system | A large tree with a hole in the center | Large tree with extensive roots Small plants Bushes Rocks Fallen leaves Branches Moss Flowers Grass patches | Giant tree roots Campfire Pile of bones Trees Bushes Tree stumps Fallen logs Mushrooms Red berries or fruit Grassy ground Dirt/muddy patch | Giant tree stump with extensive roots Central hollow area within the stump, partially filled with water or mud Broken wooden cart or debris in the central hollow Various bushes and shrubs scattered around the roots Tree stumps from felled trees Fallen logs and branches Rocks and boulders Diverse trees with autumn-colored foliage surrounding the area Small patches of flowers or undergrowth Leaf litter and scattered debris |
M004.jpg | Special | A large, damp, stone room, reminiscent of a cavern | A dungeon with a staircase | Multiple rooms with various symbols and colors Red symbol in a room at the top left Yellow symbol in a room at the top center Blue symbol in a room at the top right Central corridor with two prominent rooms Left room with a yellow symbol Right room with a red symbol Mechanical elements Tentacle-like structures with green gems Chains and | Stone walls and floors Wooden bridges Chains Large, tentacle-like appendages Barrels Crates Tables Chairs Rugs/Carpets Beds or cots Stairs Torches or lanterns providing light Books, scrolls, or papers A pentagram or similar magical symbol on the floor An abyss or chasm below the structures Wooden planking or scaffolding Doors or archways Chests or coffers Tapestries or cloth hangings Pillars or columns Balconies or ledges Large, spiked, metallic objects A circular stone platform or area | Wooden stairs and ladders Stone staircases Various tables and chairs Rugs and carpets Chests and crates Beds Bookcases Barrels Wall torches Stone archways Mounted shields and wall decorations Desk with books and papers Magical circles or glowing symbols Large chains Stone bridge over a chasm Giant, glowing crystals embedded in fixtures Massive tentacles emerging from the darkness below Closed chests and cabinets Armories or weapon racks Sectioned and barred rooms/cells or vaults |
M008.jpg | Bar table Chairs Barrels Bottle rack Tables (round) | Octagonal Table Wooden Chairs Stone Flooring Lantern | Tables Chairs | Tables Barrels Mugs Bench Wooden Bar | Central hexagonal bar/kiosk structure Wooden flooring Cobblestone paving Awnings Barrels Spiral staircase Shelves/racks Round wooden tables Chairs/stools | Large central octagonal structure (building/tavern layout) Hexagonal bar counter in the center Barrels and mugs on the bar counter Bar stools in red, blue, yellow, and green arranged around the bar counter Booths or low couches/benches around the interior edge of the building Semi-circular stairwell behind the bar Four large market stalls/tents surrounding the octagonal building Four round wooden tables with four chairs each located outside the building Wooden plank floor inside the building Cobblestone pavement outside the building and under the stalls/tables |
Prompt 3: Grid Dimensions
In units of the listed map scale or map grids, if they exist, what are the dimensions of this map?
This task involves identifying a map feature (a grid) and then counting squares. If a model performs well at this task, a marketplace might be able to offer geographical information system-like filters.
Processed File | Ground Truth | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|---|
M001.jpg | 4x11 | 20x20 | non answer | 10x20 | 5x10 | 6x16 |
M002.jpg | 40x30 | 800 squares | non answer | 40x30 | 30x20 | 40x22 |
M003.jpg | 34x34 | non answer | 10x10 | 10x10 | 20x20 | 28x28 |
M004.jpg | no grid | non answer | 1000x1000 | 10x15 | no grid | no grid |
M005.jpg | 60x60 | non answer | 1000x1000 | 40x40 | 30x30 | 28x20 |
M006.jpg | 34x44 | 60x60 | non answer | 40x50 | 32x30 | 34x43 |
M007.jpg | 34x44 | non answer | non answer | 24x32 | 20x25 | 30x25 |
M008.jpg | no grid | non answer | non answer | 10x10 | no grid | no grid |
All of the models failed this task. The two most advanced models, Gemini and GPT, successfully gave a “cannot be determined” answer when the maps lacked a grid or any scale marker. (Contrast this with “non answer” which uses filler phrases.) However, they were otherwise highly unreliable with their grid counts, even if they were close (numerical difference) in some cases.
This task will be difficult for LLMs. First, since LLMs work on a tiling of the source images, the image will usually be distorted in order to be standardized. Second, this task involves counting, for which neural networks are not well suited. If this were an important element for product discovery, we expect that a specialized model using a convolutional neural network (CNN) to extract the grids, or perhaps a more direct technique (e.g. Hough transform), would be efficient and reliable.
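To give a flavor of the “more direct technique” option, the sketch below counts grid cells with a projection profile, which is a simplification of the Hough transform restricted to horizontal and vertical lines; it is not a full Hough implementation. It assumes a grayscale image supplied as a list of rows of 0–255 values, dark grid lines that span the whole image, and a fixed darkness threshold. The function names are hypothetical, and real maps with textured artwork would need considerably more robust processing.

```python
def line_positions(profile, threshold):
    """Return the centre index of each run of values below the threshold."""
    positions, start = [], None
    for i, value in enumerate(profile):
        if value < threshold and start is None:
            start = i  # a dark run begins
        elif value >= threshold and start is not None:
            positions.append((start + i - 1) // 2)  # the run ended at i - 1
            start = None
    if start is not None:  # a run touching the image edge
        positions.append((start + len(profile) - 1) // 2)
    return positions


def grid_dimensions(gray, threshold=64):
    """Estimate (columns, rows) of a dark, axis-aligned grid."""
    height, width = len(gray), len(gray[0])
    # Mean brightness of every column and every row.
    col_profile = [sum(gray[y][x] for y in range(height)) / height
                   for x in range(width)]
    row_profile = [sum(gray[y][x] for x in range(width)) / width
                   for y in range(height)]
    vertical = line_positions(col_profile, threshold)
    horizontal = line_positions(row_profile, threshold)
    # N parallel lines bound N - 1 cells along an axis.
    return max(len(vertical) - 1, 0), max(len(horizontal) - 1, 0)
```

The design choice here is to count lines rather than ask a model to count squares: once the line positions are known, the dimensions follow from simple arithmetic.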
Prompt 4: Linear or Branching Movement
Is movement within this map mostly linear or mostly branching?
This task involves a higher-order understanding of the graph or connected nature of rooms, along with an understanding of how characters move along the network. Practically, this distinction is an important element of adventure design.
Processed File | Ground Truth | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|---|
M001.jpg | Linear | Mostly linear | Mostly branching | Mostly linear | Mostly linear | Mostly linear |
M002.jpg | Branching | Non answer | Mostly linear | Mostly branching | Mostly branching | Mostly branching |
M003.jpg | Linear | Non answer | Mostly linear | Mostly branching | Mostly branching | Mostly branching |
M004.jpg | Branching | Mix | Mostly linear | Mostly linear | Mostly branching | Mostly branching |
M005.jpg | Branching | Mostly branching | Mostly linear | Mostly branching | Mostly branching | Mostly linear |
M006.jpg | Branching | Non answer | Mostly linear | Mostly branching | Mostly branching | Mostly branching |
M007.jpg | Linear (Complex) | Non answer | Mostly linear | Mostly branching | Mostly branching | Mostly branching |
M008.jpg | Linear | Primarily linear | Mostly linear | Mostly linear | Mostly branching | Mostly branching |
If we treat this task as a boolean classifier, Pixtral and Gemini scored the best with 5 correct out of 8. GPT scored next with 4 correct out of 8, and Gemma3 and LLaVA scored 3 out of 8. (We graded Gemma3’s non answers as incorrect answers, but they could be graded more severely.) Overall, the results suggest performance no better than a coin flip. However, this is a subjective task, and we expect that if more humans were involved in building a labeled dataset, inter-rater reliability would be low.
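The tallies can be recomputed from the table above. In this sketch the answers are transcribed with “Mostly”/“Primarily” stripped, and non answers and the “Mix” reply are encoded as `None` so that they count as incorrect; that encoding is our reading of the grading rule, not code from the study.

```python
# Ground truth for M001..M008, in order.
TRUTH = ["linear", "branching", "linear", "branching",
         "branching", "branching", "linear", "linear"]

# Model answers transcribed from the table; None marks a non answer or "Mix".
ANSWERS = {
    "Gemma3":  ["linear", None, None, None, "branching", None, None, "linear"],
    "LLaVA":   ["branching", "linear", "linear", "linear",
                "linear", "linear", "linear", "linear"],
    "Pixtral": ["linear", "branching", "branching", "linear",
                "branching", "branching", "branching", "linear"],
    "Gemini":  ["linear", "branching", "branching", "branching",
                "branching", "branching", "branching", "branching"],
    "GPT":     ["linear", "branching", "branching", "branching",
                "linear", "branching", "branching", "branching"],
}


def correct_count(truth, answers):
    """Number of answers that exactly match the ground-truth label."""
    return sum(answer == label for answer, label in zip(answers, truth))


for model, answers in ANSWERS.items():
    print(f"{model}: {correct_count(TRUTH, answers)} / {len(TRUTH)}")
```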
The reasoning text provided by Gemini and GPT suggests the models are recognizing the graph-like nature of the maps and how movement flows through passages and doors. With further experimentation and a more developed dataset, higher performance may be unlocked.
Prompt 5: Scenarios
List types of adventures or scenarios this map is suitable for. Format as a bullet list.
This task is open-ended generation of adventure ideas or classifications, similar to image caption generation. Alternatively, a marketplace owner could turn this into a classification problem by restricting the list of scenarios to a defined set. Publishers, of course, could use this technology to augment their own product descriptions with lists of creative ways the maps could be used in games.
Because this is an open-ended generation task, we use a metric modeled after signal-to-noise ratio. (A method like CIDEr would be more rigorous but more expensive to implement.) Each adventure or scenario line is assigned to either the “signal” or the “noise” category. The score is the count of “signal” responses divided by the count of “noise” responses plus one (to avoid division by zero). Writing s for the set of “signal” responses and n for the set of “noise” responses:
$$ h = \frac{|s|}{|n| + 1} $$
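The metric is simple enough to compute directly from labeled lines. The following minimal sketch assumes each response line has already been hand-labeled as `"signal"` or `"noise"`; the function names and label strings are illustrative and are not taken from the study.

```python
def signal_to_noise(signal_count: int, noise_count: int) -> float:
    """Score h = |s| / (|n| + 1); the +1 avoids division by zero."""
    if signal_count < 0 or noise_count < 0:
        raise ValueError("counts must be non-negative")
    return signal_count / (noise_count + 1)


def score_labels(labels: list[str]) -> float:
    """Score one model response given a per-line list of labels.

    Each label is expected to be "signal" or "noise" (hypothetical strings);
    any other label is ignored.
    """
    signal_count = sum(1 for label in labels if label == "signal")
    noise_count = sum(1 for label in labels if label == "noise")
    return signal_to_noise(signal_count, noise_count)


# Example: a response with 12 signal lines and 1 noise line.
example = ["signal"] * 12 + ["noise"]
print(score_labels(example))  # 12 / (1 + 1)
```

One consequence of the `+ 1` term is that a response with no noise scores exactly its signal count, so longer all-signal lists score higher than shorter ones.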
Processed File | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|
M001.jpg | 1/1 (1) | 2/1 (2) | 4/1 (4) | 11/1 (11) | 15/1 (15) |
M002.jpg | 1/1 (1) | 1/14 (0.07) | 2/3 (0.7) | 12/2 (6) | 12/1 (12) |
M003.jpg | 1/1 (1) | 6/5 (1.2) | 3/2 (1.5) | 8/1 (8) | 15/1 (15) |
M004.jpg | 1/1 (1) | 10/1 (10) | 5/1 (5) | 8/1 (8) | 15/1 (15) |
M005.jpg | 1/1 (1) | 1/14 (0.07) | 4/1 (4) | 11/1 (11) | 13/1 (13) |
M006.jpg | 1/1 (1) | 1/14 (0.07) | 3/1 (3) | 12/1 (12) | 20/1 (20) |
M007.jpg | 0/1 (0) | 6/1 (6) | 4/1 (4) | 11/1 (11) | 20/1 (20) |
M008.jpg | 1/1 (1) | 4/9 (0.4) | 0/12 (0) | 10/1 (10) | 12/1 (12) |
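In the table above, each cell gives the ratio followed by the resulting score in parentheses. Gemini and GPT both excel at this task. Gemini’s responses tend to be more “classification”-oriented while GPT’s tend to read more like story prompts, although neither is fully consistent. Gemma3 scores low because the model outputs few responses. Pixtral’s scores are more consistently high than LLaVA’s. The table below lists each model’s responses alongside the area each map portrays.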
Processed File | Portrayed Area | Gemma3 | LLaVA | Pixtral | Gemini | GPT |
---|---|---|---|---|---|---|
M001.jpg | Ship | Pirate Ship Heist | Pirate Adventure; Treasure hunt | Pirate Adventures; Historical Seafaring Missions; Underwater Exploration; Mutiny and Intrigue | Pirate Attack/Boarding Action; Naval Combat Encounter; Smuggling Operation; Mutiny; Escaping a Sinking Ship; Treasure Hunt/Retrieval; Assassination/Kidnapping Target on Board; Investigation/Mystery; Defending Against a Sea Monster Attack; Transport Mission; Stealth/Infiltration Mission | Pirate ship battles at sea; Naval skirmishes between rival crews; Boarding and raiding encounters; Ship-to-ship chases and cannon duels; Exploring a derelict or haunted vessel; Smuggling or contraband hiding missions; Mutiny or shipboard betrayal scenarios; Prisoner escape from the brig; Rescue mission for captured allies; Treasure transport and theft; Monster or sea creature attacks on the ship; Diplomatic or parley meetings on neutral waters; Undercover infiltration of an enemy ship; Investigating sabotage or sabotage attempts; Storm survival or shipwreck scenarios |
M002.jpg | Rooms on cliff face | Bandit Ambush | Dungeon crawl; Fantasy adventure; Exploration game; Role-playing game; Stealth game; Survival game; Puzzle-solving game; Strategy game; Horror game; Science fiction game; Post-apocalyptic game; Historical reenactment game; Cultural exploration game; Educational game | Fantasy Exploration; River Trade; Dungeon Crawl; Survival | Bandit Hideout/Raider Camp; Cultist Stronghold; Lost Tribe/Hidden Settlement; Monster Lair; Siege/Defense Scenario; Ambush Site; Exploration of Ancient Ruins; Rescue Mission; Source of a Problem; Negotiation/Diplomacy Hub; Secret Passage/Escape Route; Guardian of a Pass; Resource Scramble | Exploring a bandit or outlaw hideout nestled in the mountains; Assaulting a fortified goblin or orc camp in search of stolen goods or hostages; Investigating ancient ruins or a dwarven outpost hidden within a cliff face; Defending a mountain pass against an invading force or monster attack; Navigating a river and its cave system to reach a lost treasure or artifact; Mounting a rescue mission for prisoners held in one of the outbuildings or caves; Tracing the source of a corrupted river affecting nearby villages; Delving into a cave system rumored to house a dangerous creature or valuable resource; Delivering supplies or messages between isolated outposts during a time of war; Negotiating peace or trade with a faction that controls the mountain stronghold; Escorting a group of refugees or travelers through treacherous mountainous terrain; Uncovering a secret cult or forbidden ritual taking place within the caves |
M003.jpg | Room inside huge tree | Monster Lair | Fantasy adventure; Exploration of a magical forest; Quest for hidden treasure; Encounter with mythical creatures; Survival in a dangerous environment; Escape from a cursed tree; Battle between good and evil forces; Puzzle-solving in a mysterious setting; Cooperation between characters to overcome challenges; Role-playing game | Fantasy Exploration; Ambush Scenario; Treasure Hunt; Nature-based Quests | Monster Lair; Ritual Site; Sacred Grove/Ancient Tree Guardian; Ambush Point; Hidden Entrance; Clue to a Mystery; Survival Challenge/Rest Stop Gone Wrong; Negotiation with Fey or Nature Spirits | Ancient tree ritual site: Place for druidic ceremonies or summoning rituals. Feywild crossing: Portal or gateway to the Feywild or another mystical plane. Forest guardian’s lair: Home of a treant, dryad, or nature spirit the party must parley with or defeat. Bandit or cultist camp: Group using the tree’s shelter for nefarious purposes. Quest objective: Recover a magical item hidden among the roots or in the tree’s hollow. Ambush encounter: Enemies attack as the party navigates the tangled roots. Puzzle or trap location: Roots conceal clues or dangerous mechanisms. Gathering place: Neutral meeting ground for woodland creatures, druids, or tribes. Corrupted grove: The tree is dying or cursed, and restoring it is the mission. Nature’s trial: Test for rangers, druids, or nature-themed characters to prove themselves. Fey or nature festival: The site hosts celebrations or strange happenings. Hidden entrance: Secret trapdoor or passage beneath the roots leading to an underground dungeon. Monster lair: Home of giant spiders, ankhegs, blights, or similar forest-dwelling threats. Vision or dream sequence: The tree appears in a prophetic dream or mystical vision. Healing or wish-granting site: The tree bestows boons, heals, or answers deep questions. |
M004.jpg | Fortress and bridge | Dungeon Crawl | Dungeon crawl; Exploration of a haunted castle; Stealthy infiltration of a fortress; Escaping a prison or dungeon; A quest for hidden treasure; A battle between rival factions; A race against time to save a kingdom; A mission to rescue a captured princess; A journey through a magical portal; A test of courage | Dungeon Crawl; Escape Room; Stealth Mission; Investigation; Trap Disarmament | Infiltrating an Evil Cult’s Lair; Rescuing a Prisoner from a Fortified Dungeon; Assaulting a Cliffside Fortress; Stopping a Ritual to Summon/Control a Monster; Exploring an Abandoned, Monster-Infested Stronghold; Sabotaging an Enemy War Machine or Operation; Escaping from a Perilous Prison; Investigating the Source of a Strange Magical Phenomenon | Infiltrating a cultist stronghold hidden deep underground; Rescuing prisoners from a dungeon surrounded by monstrous tentacles; Investigating an ancient, forgotten fortress sitting atop a slumbering creature; Defending the fortress from a monstrous attack breaking through the floor; Escaping from a collapsing stronghold as tentacles breach the walls; Searching for magical artifacts hidden within a forbidden underground sanctum; Negotiating or allying with a cult that worships an eldritch being; Conducting a heist on a fortress protected by unnatural guardians; A ritual gone wrong unleashing tentacled horrors from beneath; Exploring a fortress built on the remains of a titanic, ancient creature; A final boss encounter with an eldritch, tentacled entity; Reclaiming a corrupted outpost for a noble house or order; Uncovering the secrets behind the fortress’s construction and its hidden dangers; Escorting an NPC through a treacherous, monster-infested stronghold; Sealing a rift to another dimension opening beneath the fortress |
M005.jpg | Ruins with stream | Dungeon Crawl | Dungeon crawl; Fantasy adventure; Horror-themed game; Steampunk setting; Science fiction adventure; Mystery solving; Puzzle-solving game; Exploration game; Survival game; Role-playing game; Board game; Strategy game; Tactical game; Turn-based game | Dungeon Crawl; Escape Room; Underground Exploration; Investigative Missions | Ancient Ruin Exploration; Cultist Lair Assault; Underdark/Subterranean Journey; Monster Hunt; Lost Civilization Discovery; Necromancer’s Stronghold; Shadowfell Excursion; Retrieval Mission; Guardian of a Forbidden Place; Escape from a Collapsing Dungeon; Ritual Disruption | Exploration of Ancient Ruins; Dungeon Delve; Underdark Adventure; Magical River Crossing; Rescue Mission; Cultist Lair; Haunted Expedition; Artifact Retrieval; Fugitive Chase; Environmental Challenge; Secret Society Base; Waterborne Encounter; Interdimensional Portal |
M006.jpg | Dungeon complex | Escape Room | Dungeon crawl; Fantasy adventure; Science fiction adventure; Steampunk adventure; Horror adventure; Mystery adventure; Puzzle-solving adventure; Exploration game; Role-playing game; Board game; Strategy game; Tactical game; City-building game; Survival game | Dungeon Crawl; Escape Room; Mystery Investigation | Classic Dungeon Crawl; Lost Temple or Sanctuary; Ancient Vault or Treasury; Wizard’s Labyrinth or Magical Research Facility; Cultist Stronghold/Lair; Prison Complex; Exploration of an Abandoned Fortress/City Core; Infiltration and Espionage Mission; Monster Lair Complex; Quest to Activate/Deactivate an Ancient Device; Gauntlet/Trial Site; Underground Bunker/Shelter | Classic dungeon crawl with traps, monsters, and treasure; Exploring an ancient, long-forgotten underground city; Infiltrating the lair of a villain or cult; Recovering a lost artifact from a crypt or tomb; Navigating a labyrinth filled with puzzles and challenges; Mapping out an old, abandoned wizard’s stronghold; Escaping from imprisonment in an underground complex; Investigating rumors of hauntings or strange occurrences; Conducting a rescue mission for captured allies or townsfolk; Searching for the hidden vault of a powerful sorcerer; Battling rival adventurers or treasure hunters; Encountering and negotiating with a secret underground society; Uncovering the mystery behind a sealed magical portal; Fighting through the defenses of an undead king’s throne room; Delving into the ruins to learn secrets of an ancient civilization; Clearing out a den of monsters threatening the surface; Preventing the summoning of a dangerous extraplanar entity; Discovering and dealing with a living dungeon (sentient architecture); Retrieving lost knowledge from a hidden library or archive; Surviving a supernatural cataclysm that has engulfed the dungeon |
M007.jpg | Cave network and forest | D&D 5e | Exploring a mysterious underwater cave system; Navigating through a maze of underwater tunnels and chambers; Searching for hidden treasures or artifacts within the cave system; Encountering dangerous sea creatures and overcoming challenges; Solving puzzles and riddles to progress through the cave system; Escaping from a submerged temple or ruins | Exploration Quests; Navigation Challenges; Pirate Adventures; Maritime Commerce | Dungeon Crawl; Coastal Cave Exploration; Smuggler’s Den/Pirate Hideout; Monster Lair; Search and Rescue; Infiltration Mission; Treasure Hunt; Escape from Captivity; Geological Survey/Exploration; Ritual Site; Amphibious Assault | Hidden pirate or smuggler cave base; Lost treasure hunt on a remote island; Monster lair or nest exploration; Shipwreck survivors seeking shelter; Secret cult or ritual site; Rescue mission for captured NPCs; Forgotten tomb or ancient ruins; Lair of aquatic or amphibious creatures; Bandit stronghold or hideout; Gathering rare herbs or resources for a quest; Underwater or coastal infiltration scenario; Battle between rival pirate factions; Investigation of mysterious disappearances; Protecting island inhabitants from an invading force; Discovery of a natural wonder or magical anomaly; Navigating a maze of tunnels to escape pursuers; First-contact adventure with isolated island dwellers; A haunted or cursed cave system; Race to claim a newly discovered cave’s riches; Expedition to study unique island flora or fauna |
M008.jpg | Circular bar | Diplomacy/Negotiation | Pirate adventure; Treasure hunt; Medieval fantasy; Dungeon crawl; Board game; Role-playing game; Strategy game; Party game; Themed event; Restaurant or bar setting; Historical reenactment; Cultural celebration | Role-Playing Campaigns; Dungeons & Dragons adventures; Pathfinder scenarios; Fantasy and medieval role-playing games; Board Games; Strategy games with tactical combat; Miniature wargames; Fantasy-themed board games; Tabletop Simulations; Virtual tabletop role-playing sessions; Online gaming scenarios; Digital strategy games | Tavern Brawl; Information Gathering; Ambush/Assassination Attempt; Stealth/Heist; Secret Meeting; Festival Stall; Defending the Establishment; Investigation Hub; Recruitment Drive; Celebration/Gathering Spot | Meeting an important contact or informant in a lively tavern or public house; Gathering information or rumors from locals and travelers; Holding a negotiation, peace talk, or clandestine meeting; Participating in or breaking up a heated tavern brawl; Investigating suspicious activity, theft, or a murder at a busy establishment; Entertaining a crowd or taking part in a storytelling or bardic competition; Setting up an ambush for a villainous NPC; Surviving an unexpected attack or raid unfolding inside the tavern; Hosting a festival, market day, or other community celebration; Recruiting adventurers or mercenaries for a quest; Solving a puzzle or riddle as part of a secretive society’s initiation; Escaping from guards or pursuing enemies by blending into the crowd |
Conclusion
Multi-modal large language models vary widely in quality. Product discovery applications will likely require, at a minimum, models with late 2024 or 2025 state-of-the-art performance. For VTT-style maps, models can yield lists of rooms and items found within a map, although individual entries may be irrelevant for the purpose and the lists often contain hallucinated or misinterpreted items. For GM-style maps, models extract labels with high recall and precision, but other details may be imprecise. Our tests showed that all models were poor at understanding scale or grid systems, although some could detect the existence of a grid. Similarly, reasoning about movement through the map may be beyond current LLMs, although we suspect this area needs more rigorous study. LLMs scored highly on generating scenario or story ideas tied to a map, suggesting this might be a straightforward product discovery mechanism.
Notes
The dataset was not incorporated as training material into any model.