Attempting to analyse plans with GPT-4 in early 2024
It couldn’t 6 months ago…
Every 6 months I try prompting multimodal language models with an architectural plan from a real estate website to see if they can critique it for me. Your average home buyer isn’t likely to be able to tell whether a space will feel cramped, hot or dark just from looking at a plan, so identifying these things automatically would be obviously valuable.
I first tried this with GPT-4 early in 2023. While it was exciting that this was possible at all, GPT-4’s ability to reason about plans was poor. Multimodal models are technically capable of recognizing plans and can usually identify labelled elements, but drawing any conclusions about the quality of the spaces has been beyond them. Gemini Flash started to do marginally better in 2024, but would still hallucinate details and struggle to understand circulation between rooms. A model often seems to understand 90% of the plan but misses a single wall, leading to wildly inaccurate conclusions, such as needing to go through a bathroom to get to a bedroom, or rooms having no windows.
Gemini 2.0 Is Good At PDFing
The recent release of Gemini 2.0 turned the most heads for its near-perfect ability to extract arbitrary data from PDFs. Because the API uses the same number of tokens for each page in a PDF as it does for an image, I am guessing that Gemini is not operating on text tokens but is instead parsing PDF pages as images. If Gemini has suddenly gotten much better at extracting data from images, then perhaps it is also better at reading plans?
I found this plan on realestate.com.au to try it out. It’s a particularly weird plan, so the model can’t just fall back on stock observations about the issues or benefits of conventional floor plan layouts. I loaded this up in Runchat (try it yourself here https://www.runchat.app/editor?id=eecabff5-f252-4984-8d4a-c5b926156228) and used it as context to a prompt with Gemini 2.0. First, I tried extracting some structured data to see how well the model could parse the image.
List each room in the plan. For each room, return the name, area and floor level as well as a brief architectural analysis of the pros and cons of the space
As expected, Gemini 2.0 is very good at returning accurate data from the image. For all of the returned rooms, it gets the area, floor and room name spot on. It still weirdly has an issue with bathrooms and doesn’t want to count these as rooms. However, the analysis is an absolute turd and could at best be described as stating the obvious. To Gemini’s credit, the reasoning ability of models does tend to suffer when they are forced to return valid JSON output like this, but still, no prizes for insight here.
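The structured request above boils down to asking for one record per room, which makes the output easy to check mechanically. Here is a minimal sketch of what that check might look like in Python, assuming the model returns a JSON array with the keys `name`, `area_m2`, `floor` and `analysis`. Those key names and the helpers `parse_rooms` and `missing_rooms` are illustrative, not a Runchat or Gemini schema, and the sample values are made up rather than taken from the actual response.

```python
import json
from dataclasses import dataclass


@dataclass
class Room:
    name: str
    area_m2: float
    floor: str
    analysis: str


def parse_rooms(raw: str) -> list[Room]:
    """Parse a JSON array of room records, rejecting non-positive areas."""
    rooms = []
    for item in json.loads(raw):
        area = float(item["area_m2"])
        if area <= 0:
            raise ValueError(f"non-positive area for {item['name']!r}")
        rooms.append(
            Room(str(item["name"]), area, str(item["floor"]), str(item["analysis"]))
        )
    return rooms


def missing_rooms(rooms: list[Room], expected: list[str]) -> list[str]:
    """Return labels expected on the plan that the model did not report."""
    found = {room.name.lower() for room in rooms}
    return [label for label in expected if label.lower() not in found]


# Made-up sample response, for illustration only (hypothetical key names).
sample = """[
  {"name": "Living", "area_m2": 61.2, "floor": "Ground", "analysis": "Large, flexible space."},
  {"name": "Sunroom", "area_m2": 7.7, "floor": "Ground", "analysis": "Small but bright."}
]"""

rooms = parse_rooms(sample)
# Labels read off the plan by hand; anything the model skipped shows up here.
print(missing_rooms(rooms, ["Living", "Sunroom", "Bath/L'dry", "PDR"]))
```

Comparing the returned names against a hand-written list of labels is one cheap way to catch omissions like the skipped bathrooms.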
So I tried removing the structured output and just asking for an analysis to see if this would improve things. The new prompt and the response from Gemini 2.0 are:
This is a floor plan of a property I am considering purchasing in Fitzroy, Victoria. Can you conduct an architectural analysis of the floor plan and identify any particular strengths and weaknesses to the design, then propose changes and modifications.
Here's a brief architectural analysis of the floor plan:
Strengths:
Clear Zoning: Ground floor primarily for living/entry, upper floor for bedrooms/library and bathroom areas and balcony.
Good Natural Light: The sunroom and multiple windows suggest ample natural light in certain areas.
Weaknesses:
Limited Flow/Awkward Circulation: The layout appears disjointed, especially the connection between the ground floor and first floor and to the outdoor space.
Kitchen position: The position of the kitchen does not feel natural or aligned with modern standards.
Outdoor Access: The location of outdoor access seems limited.
Proposed Modifications:
Reconfigure the Kitchen: The dining and kitchen area could be reconfigured for modern living.
Improve First Floor flow: Re-evaluate connections between the study, balcony and first floor bedroom.
Consider a Second Outdoor Space: Explore creating a dedicated outdoor space accessible to the kitchen and dining area.
This is a high-level analysis. A full review would consider specific needs and budget.
This analysis is better than our JSON table, but still lacking any real insight. The goal is to be able to reveal strengths or weaknesses to the design that wouldn’t be apparent to the average potential home buyer. We might hope that the model could consider the aspect of the building, the specific dimensions of the spaces, challenges to renovations or changes to how spaces are used and so on. But this brings us to the second recent advancement in language models: the ability to “think” about a problem for longer.
Analysing plans with reasoning models
I tried the exact same prompt with Gemini 2.0 using thinking mode, which takes longer to return a response but gives the model an opportunity to plan its response to the user’s query. This has the emergent property of increasing the likelihood of “wait a minute” or “aha!” moments, where the model realises something about its internal reasoning that might lead to a different response or solution. So can reasoning models think about architectural plans?
Floor Plan Analysis: Fitzroy Property
Here's an architectural analysis of the floor plan you provided:
Strengths:
Zoning: Clear separation between ground and first floors potentially offers good zoning for living and sleeping areas.
Dedicated Spaces: Features distinct rooms like a Library and Study, offering flexibility for work or hobbies.
Outdoor Areas: Includes a Balcony and Courtyard, providing valuable outdoor living spaces.
Master Suite: First-floor bedroom benefits from an ensuite (ENS) and walk-in robe (WIR), creating a private master suite.
Large Living Area: Generous "Living" room (5.1 x 12.0) on the ground floor provides ample space.
Weaknesses:
Entry Hallway: Long, narrow "Entry" (6.0 x 3.2) could feel like wasted space, potentially lacking natural light and purpose.
Ground Floor Bedroom/Office Access: "Office/Bedroom" (3.1 x 4.0) is accessed through the "Bath/L'dry", creating an awkward and privacy-compromising layout.
Small Ground Floor Bedroom: "Bedroom" (3.0 x 3.0) on the ground floor is relatively small and may feel cramped.
Kitchen Isolation: Kitchen appears somewhat isolated from the main "Living" and "Dining" areas, potentially hindering social interaction.
Sunroom Integration: "Sunroom" (3.2 x 2.4) feels small and potentially disconnected from the main living spaces.
Stair Location: Stair location isn't immediately obvious in relation to main living areas, potentially leading to less intuitive circulation between floors.
Proposed Modifications:
Reconfigure Entry/Office/Bath:
Option 1 (Major Change): Relocate the "Bath/L'dry" to create direct access to the "Office/Bedroom" from the "Entry" or "Living" area, improving privacy and flow. Consider making the "Entry" wider or repurposing some space.
Option 2 (Minor Change): If relocation is difficult, consider creating a separate hallway or partition within the "Bath/L'dry" to provide more privacy for the "Office/Bedroom" access.
Open Kitchen to Living/Dining: Explore opening up the wall between the "Kitchen" and "Dining" or "Living" to create a more open-plan living space and improve flow and social interaction.
Sunroom Enhancement: Assess if the "Sunroom" could be enlarged or better integrated into the "Dining" or "Kitchen" area to increase its usability. Consider larger windows or doors to maximize light and connection to the outside.
Clarify Stair Location/Access: If the stairs are indeed only accessible near the "Entry," evaluate if repositioning or adding a secondary staircase could improve first-floor access from the main living areas on the ground floor.
These modifications aim to address the weaknesses by improving circulation, privacy, and the usability of key spaces, ultimately enhancing the property's functionality and liveability. Remember to consider structural feasibility and budget implications for any proposed changes.
This is interesting. The thinking model did slightly better with the strengths and weaknesses than the vanilla model. It picks up on issues with going through a powder room to get to a bedroom (sort of; it seems confused between the office and the bedroom). But it still misses many of the more obvious issues with the plan that a potential buyer might want to resolve, for example:
The upstairs ensuite is enormous
The upstairs circulation is extremely inefficient
Ground floor bedrooms don’t have natural light
and so on.
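Some size checks of this kind can be done deterministically once the labels have been extracted, rather than left to the model's judgement. Here is a minimal sketch, assuming labels in the "Name (W x D)" form that appears in the model's output above, and assuming the dimensions are in metres (the units are not stated). `area_from_label` is a hypothetical helper, and rooms with no printed dimensions, such as the ensuite, simply come back as unknown.

```python
import re
from typing import Optional

# Matches labels such as 'Living (5.1 x 12.0)'.
# Labels without dimensions (e.g. 'ENS') do not match.
LABEL = re.compile(
    r"^\s*(?P<name>.+?)\s*\((?P<w>\d+(?:\.\d+)?)\s*x\s*(?P<d>\d+(?:\.\d+)?)\)\s*$"
)


def area_from_label(label: str) -> Optional[float]:
    """Return width * depth for a 'Name (W x D)' label, or None if no dimensions."""
    match = LABEL.match(label)
    if match is None:
        return None
    return round(float(match["w"]) * float(match["d"]), 2)


labels = [
    "Living (5.1 x 12.0)",
    "Entry (6.0 x 3.2)",
    "Bedroom (3.0 x 3.0)",
    "Sunroom (3.2 x 2.4)",
    "ENS",
]
for label in labels:
    area = area_from_label(label)
    if area is None:
        print(f"{label}: unknown (no dimensions on plan)")
    else:
        # Assumes the printed dimensions are in metres.
        print(f"{label}: {area} m2")
```

This only covers rectangular rooms with printed dimensions, so it complements the model's reading of the plan rather than replacing it.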
Is it a me problem or a model problem?
This is one of those use cases for AI where models exhibit stunning multimodal and reasoning capabilities and yet because they cannot be 100% relied upon they remain practically useless. However, we’re a lot closer to being there now than we were 6 months ago, or 6 months before that. It seems possible that even just improved prompts might lead to better results. A quick experiment with prompting the model to explicitly consider things that it hadn’t on its own did improve things somewhat:
This is a floor plan of a property I am considering purchasing in Fitzroy, Victoria. Can you conduct an architectural analysis of the floor plan and identify any particular strengths and weaknesses to the design, then propose changes and modifications. For each room, consider whether the area and circulation are fit for purpose. Consider ventilation and access to natural light. Consider how efficient the floor plan is, and where and how a potential buyer could improve the efficiency of the plan.
This does improve things. It is still far from perfect and reveals some of the model’s pitfalls: it cannot estimate the size of the ensuite because its dimensions are not labelled, and it cannot identify all the windows in the plan. The latter is particularly interesting, as the same flaw has persisted in language model plan understanding since my first tests with GPT-4. However, the model does now pick up on the fact that the ground floor bedrooms don’t have any natural light and that circulation space is inefficient:
Here's an architectural analysis of the floor plan for the Fitzroy property:
Strengths:
Distinct Zones: Clear separation between ground and first floor functions, with living spaces downstairs and bedrooms upstairs.
Multiple Living Spaces: Offers flexibility with living, dining, sunroom, and library areas.
Outdoor Access: Balcony and Courtyard provide valuable outdoor spaces.
WIR and ENS: Master bedroom benefits from a walk-in robe and ensuite, enhancing functionality.
OSP: Off-street parking is a significant asset in Fitzroy.
Weaknesses:
Circulation: Long, narrow hallway on the ground floor might feel constricted. Internal staircase placement could impact flow.
Natural Light in Ground Floor Centre: Central bedroom/office and bath/laundry areas may rely on artificial light.
Kitchen Location: Kitchen placement is somewhat isolated and not directly connected to the main living area, potentially impacting social flow.
Balcony Access: Balcony is only accessible via the library, limiting its usability for other first-floor rooms.
Efficiency of Space: Some areas, like the entry and hallways, consume space that could be utilized more effectively.
Room-by-Room Analysis:
Ground Floor:
Entry (6.0 x 3.2): Generous entry, could feel overly spacious.
Living (5.1 x 12.0): Very large living area, potential for flexible layouts.
Kitchen: Adequate size, but somewhat isolated. Needs assessment for layout efficiency.
Dining (3.3 x 4.5): Decent dining space, but passage through it to PDR might be awkward.
Sunroom (3.2 x 2.4): Small sunroom, could be a pleasant reading nook or breakfast area.
Bedroom (3.0 x 3.0): Ground floor bedroom is small, potentially better suited as a study or guest room.
Bath/L'dry: Combined bath/laundry might feel cramped, assess storage and ventilation.
PDR: Convenient powder room, well located near living and dining.
First Floor:
Library (2.6 x 4.4): Narrow library, consider shelving and furniture placement carefully. Balcony access is a plus.
Balcony (3.5 x 2.3): Good sized balcony, but access is limited.
Study (4.2 x 4.2): Spacious study, good for work from home. Desk placement needs consideration for light and privacy.
Bedroom (4.7 x 3.2): Master bedroom is a good size with ENS and WIR.
ENS: Ensuite size not indicated, assess adequacy.
WIR (3.3 x 4.6): Walk-in robe is spacious, excellent storage.
Outdoor:
Courtyard: Valuable private outdoor space, assess size and usability.
OSP: Highly desirable off-street parking.
R.O.W: Right of way, understand any implications or limitations.
Ventilation & Natural Light:
Ground Floor: Living and Sunroom likely have good natural light. Kitchen and Dining light depends on window placement (not shown). Central bedroom and bath/laundry might be dark, requiring good ventilation strategy.
First Floor: Study and Bedroom should receive good light. Library's natural light depends on window placement and balcony door. Balcony provides good ventilation for Library.
Efficiency:
Moderate Efficiency: Some circulation space feels inefficient, particularly ground floor hallway. Kitchen's separation from living areas impacts flow. Staircase location creates division on ground floor.
Potential Improvement: Reconfiguring ground floor layout could enhance flow and space utilization.
Proposed Modifications:
Ground Floor Kitchen/Living: Consider opening up the kitchen to the dining or living area to create a more social and contemporary layout. This might involve removing walls and reconfiguring the kitchen.
Ground Floor Bedroom Repurpose: Re-evaluate the ground floor bedroom. Could become a more spacious study, home office, or be integrated into a larger, open-plan living zone if structurally feasible.
Balcony Access Improvement: Explore options to improve balcony access from the master bedroom or study, potentially adding a door from the study to the balcony.
Bath/Laundry Redesign: If cramped, consider separating bath and laundry functions or optimizing layout for better flow and storage.
Entry Optimization: Assess if the large entry is needed or if some space could be redistributed to the living area or kitchen.
Staircase Integration: Consider if the staircase can be repositioned or redesigned to improve flow and reduce its dividing effect on the ground floor.
Conclusion:
The property offers good bones with distinct living zones, outdoor space, and parking. However, the ground floor layout has some inefficiencies in circulation and kitchen placement. Potential buyers could significantly enhance the property by reconfiguring the ground floor layout to improve flow, light, and social connectivity, particularly around the kitchen and living areas. Addressing balcony access would further enhance the first floor amenities
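Stepping back from the model's response: the revised prompt used for this run is just the original prompt plus a short checklist of things to consider. If you want to experiment with different checklists, one way to keep the variations tidy is to build the prompt from parts. This is a minimal sketch in Python; `build_prompt` is a hypothetical helper, not part of Runchat or any model API.

```python
BASE_PROMPT = (
    "This is a floor plan of a property I am considering purchasing in Fitzroy, Victoria. "
    "Can you conduct an architectural analysis of the floor plan and identify any particular "
    "strengths and weaknesses to the design, then propose changes and modifications."
)

# The explicit considerations added in the revised prompt.
CHECKLIST = [
    "For each room, consider whether the area and circulation are fit for purpose.",
    "Consider ventilation and access to natural light.",
    "Consider how efficient the floor plan is, and where and how a potential buyer "
    "could improve the efficiency of the plan.",
]


def build_prompt(base: str, checklist: list[str]) -> str:
    """Append explicit considerations to a base prompt, separated by single spaces."""
    return " ".join([base.strip(), *(item.strip() for item in checklist)])


prompt = build_prompt(BASE_PROMPT, CHECKLIST)
print(prompt)
```

Swapping items in and out of `CHECKLIST` makes it easier to see which instruction is responsible for a given improvement in the response.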
Probably better than nothing for most people
We are already at the point where this kind of analysis is probably better than nothing for your average punter. While I don’t expect most people to be pasting every plan into Runchat or ChatGPT, this is the kind of task that could run automatically in the background, with results collated and emailed to you on a weekly basis. We could get language models to perform this kind of structured and qualitative analysis on multiple floor plans each week, then compare the analyses to make suggestions or identify concerns for buyers before they visit home opens.
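The collation step could be as simple as the following Python sketch. It assumes each plan has already been analysed and reduced to a list of concerns, and it leaves out fetching listings, calling the model and sending the email. `PlanAnalysis` and `weekly_digest` are hypothetical names, and the listings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class PlanAnalysis:
    address: str
    url: str
    concerns: list[str] = field(default_factory=list)


def weekly_digest(analyses: list[PlanAnalysis]) -> str:
    """Format analyses as a plain-text digest, fewest concerns first."""
    lines = [f"{len(analyses)} floor plans analysed this week", ""]
    for item in sorted(analyses, key=lambda a: len(a.concerns)):
        lines.append(f"{item.address} ({item.url})")
        if item.concerns:
            lines.extend(f"  - {concern}" for concern in item.concerns)
        else:
            lines.append("  - no concerns flagged")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder listings, for illustration only.
digest = weekly_digest([
    PlanAnalysis(
        "1 Example St",
        "https://example.com/1",
        ["Ground floor bedrooms lack natural light", "Inefficient circulation"],
    ),
    PlanAnalysis("2 Sample Rd", "https://example.com/2"),
])
print(digest)
```

Sorting by the number of concerns is a crude ranking. Weighting concerns by how much they matter to a particular buyer would be an obvious next step.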
Try it out with your own plan
You can generate a table of spaces and an analysis of your own plan with this Runchat:
https://www.runchat.app/editor?id=eecabff5-f252-4984-8d4a-c5b926156228
Check out our tutorials and examples on https://runchat.notion.site for how to automate jobs with spreadsheets if you want to try building the weekly plan analysis tool.