Testing GPT-4 with Vision for Drafting Descriptions of Patent Figures

OpenAI’s latest GPT-4 with Vision model can analyze input images and answer questions about them. This new GPT capability can be a real game-changer for patent drafters, so we’ve updated ClaimMaster 2024 +Drafting version to allow drafters to pass-in figures directly to GPT for writing detailed figure descriptions. The latest version of ClaimMaster will also let you connect ClaimMaster to your own private GPT-4 Vision models in OpenAI and Azureif you are interested in trying this feature but have not yet set up a private OpenAI or Azure GPT account, contact us and we’ll provide you with a short-term access to the latest OpenAI GPT vision model for testing (complimentary).

We tested GPT to see how well it does with different types of common patent drawings and the results were surprisingly good, especially when GPT was provided with sufficient context to describe the technology and elements shown in the figures. For our tests, we’ve asked GPT to write descriptions of modified figures from various issued patents. These figures were sufficiently altered in structure, numbering, and part names to make it harder for GPT to trace them back to the original applications. We’ve also tried a few hand-drawn figures with similar success. To streamline the drafting process, we used ClaimMaster GPT prompt templates on low “creativity” settings. The prompts were also auto-populated with part names and brief figure descriptions from the document and the figures passed to GPT were annotated with part # locations. While the results were generally good, they were not always consistent on a first attempt, so keep on reading for more in-depth analysis and check out our recommendations for producing better drafting outcomes at the bottom of the article.

1. Cross-Sectional Figures

Initially, we tasked GPT with drafting a detailed description of a patent figure showing a commonly utilized cross-sectional view in optical and semiconductor device applications. GPT accurately identified the spatial relationships among different layers within the cross-sectional view. Additionally, it incorporated technical specifics about mode stripping based on the figure’s description (i.e., “cladding mode stripper”). However, GPT had some trouble with elements that weren’t clearly labeled, so we had to adjust the contrast and shading of the figure until GPT was able to correctly pinpoint their location. You may also want to experiment with part # “annotation” option in ClaimMaster, as it may help GPT identify locations of part #s in less clear or densely populated figures.

Figure + GPT Prompt:

cross-section figure
Write the description of a substrate cross-section shown in the attached figure 6 for a patent application. FIG. 6 is a side cross-sectional view of another embodiment of a cladding mode stripper. Use part names and numbers specified below in the description. Do not place part numbers in parentheses.
###
cladding mode stripper device 600, base 602, first block 606A, second block 606B, bottom surface 607, opaque cover 608, heat sink 610, optical fiber 612, first cladding 616, groove 618
####

GPT Output:

FIG. 6 illustrates a cross-sectional view of a cladding mode stripper device 600. The device includes a base 602 that supports the various components of the cladding mode stripper device 600. A heat sink 610 is disposed under base 602, providing thermal management for the device. The cladding mode stripper device 600 further comprises a first block 606A and a second block 606B. An optical fiber 612 is positioned above the first block 606A. The optical fiber 612 includes a first cladding 616 that surrounds the core of the optical fiber (not labeled). The first cladding 616 is designed to guide light along the optical fiber 612. First block 606A and second block 606B are arranged on opposite sides of the optical fiber 612. A groove 618 is formed between the first block 606A and the second block 606B, wherein the optical fiber 612 is seated. An opaque cover 608 is positioned over the first block 606A, the second block 606B, and the air gap 624. The opaque cover 608 serves to contain and block any light escaping from the cladding modes, preventing it from exiting the device and interfering with external components. The bottom surface 607 of the cladding mode stripper device 600 is in contact with the heat sink 610, facilitating the transfer of heat away from the device to maintain optimal operating temperatures.

2. Mechanical Designs

We then asked GPT to describe a patent figure of a 3D mechanical drawing of the cladding mode stripper from the same application. This time GPT had much easier time with the description. GPT provided satisfactory output in the first attempt, likely because the elements in the figure were spaced out well for image recognition. It also inferred and added some basic technical details based on the provided part names and a brief description of the figure.

Figure + GPT Prompt:

mechanical drawing
Write the description of a device shown in the attached figure 8 for a patent application. FIG. 8 is an exploded view of a prototype of the cladding mode stripper. Use part names and numbers specified below in the description. Do not place part numbers in parentheses. 
### 
cladding mode stripper 800, base 802, glass block 806A, glass block 806B, opaque cover 808, curved groove 818
####

GPT Output:

FIG. 8 illustrates an exploded view of a cladding mode stripper, designated generally as cladding mode stripper 800. The cladding mode stripper 800 is composed of several components that, when assembled, function to strip cladding modes from an optical fiber, not shown. At the base of the cladding mode stripper 800 is the base 802, which serves as the foundation for the assembly. The base 802 is designed to support the other components and may include features for alignment and securing the cladding mode stripper 800 within an optical system. Positioned above the base 802 are two sets of glass blocks, glass block 806A and glass block 806B. These glass blocks are arranged in a stacked configuration and are designed to interface with the optical fiber. The glass blocks 806A and 806B are responsible for absorbing and dissipating the light from the cladding modes of the optical fiber. The glass blocks are separated by a space to accommodate a curved groove 818. The curved groove 818 is integral to the function of the cladding mode stripper 800, providing a path for the optical fiber and ensuring proper interaction between the fiber and the glass blocks 806A and 806B. Completing the assembly is the opaque cover 808, which is positioned above the glass block 806B. The opaque cover 808 serves to protect the internal components of the cladding mode stripper 800 and may also prevent the escape of stripped light from the cladding modes, thereby improving the safety and efficiency of the device. The components are designed to be assembled in a manner that ensures the optical fiber is properly aligned with the curved groove 818 and in contact with the glass blocks 806A and 806B for optimal mode stripping performance. The precise arrangement and dimensions of the components are critical to the function of the cladding mode stripper 800 and are determined based on the specific requirements of the optical system in which it is to be used.

3. Graphs

Next, we asked GPT to write a description of a 2-D graph showing modulation error ratio vs attenuation level. Notably, GPT was able to recognize and incorporate the labels from the graph’s axes into the figure description. Additionally, although the brief description we provided mentioned both balanced and imbalanced digital channels, curves 110 and 120 were not explicitly labeled as such in the prompt or the figure. Nevertheless, GPT accurately associated curve 110 with the “balanced” channel and curve 120 with the “imbalanced” channel in its generated output. While the correct identification of curves may be accidental, it still appears very impressive.

Figure + GPT Prompt:

graph figure
Write the description of a plot shown in the attached figure 1 for a patent application. FIG. 1 is a schematic plot of modulation error ratio [MER] against attenuation level for a balanced digital channel and for an imbalanced digital channel. Do not place part numbers in parentheses. 
### 
first curve 110, first regime 111, second regime 112, optimal attenuation level 113, second curve 120
####

GPT Output:

Figure 1 illustrates a schematic plot of Modulation Error Ratio (MER) in decibels (dB) against attenuation level, also in decibels (dB), for two different types of digital channels. The vertical axis represents the MER, which is a measure of the quality of a digital television signal. The horizontal axis represents the attenuation level, which indicates the reduction in signal strength. The first curve 110 represents the MER for a balanced digital channel. This curve shows a distinct peak indicating the optimal performance of the channel. Within the first curve 110, there are two regimes of interest. The first regime 111 is where the MER increases with attenuation level, showing an improvement in signal quality up to a certain point. The second regime 112 is characterized by a decrease in MER with further increases in attenuation level, indicating a degradation in signal quality beyond the optimal point. The optimal attenuation level 113 is denoted by the peak of the first curve 110, at which the MER is maximized for the balanced digital channel. This point represents the ideal attenuation setting for the channel to achieve the best signal quality. The second curve 120 represents the MER for an imbalanced digital channel. This curve does not exhibit a peak like the first curve 110 but instead shows a consistent decline in MER as the attenuation level increases. The second curve 120 demonstrates the typical performance of a digital channel that is not optimized, where the signal quality continuously deteriorates with increasing attenuation. The comparison between the first curve 110 and the second curve 120 highlights the importance of balancing a digital channel to achieve an optimal attenuation level 113, which results in the highest possible MER and, consequently, the best signal quality.

4. System Diagrams

We also tasked GPT with generating a description for a sample system or network-level figure commonly employed in networking patent applications. The resulting description appears a bit generic, especially when discussing core nodes 205 and packet flows 206a-c, but it’s not outright wrong. While not perfect, it serves as a decent starting point for the draft, although additional details provided by the drafter would enhance the figure description. Notably, GPT was able to extract block names directly from the figure.

Figure + GPT Prompt:

system figure
Write the description of a system shown in the attached figure 2 for a patent application. Figure 2 illustrates a test system for monitoring a network segment of the Ethernet data network. Use part names and numbers specified below in the description. 
### 
network segment 200, test unit 201, test unit 202, second port 203b, service ports 203, first port 203a, intermediate node 204, core node 205, primary data route 206a, secondary data route 206b, tertiary data route 206c, traffic monitor 207
####

GPT Output:

Figure 2 illustrates a test system designed for monitoring a network segment 200 within an Ethernet data network. The system comprises an initiating test unit 201 and a responding test unit 202, which are configured to communicate over the network segment 200. The initiating test unit 201 is equipped with a first port 203a, while the responding test unit 202 includes a second port 203b. These test units 201 and 202 are utilized to initiate and respond to network traffic tests, respectively. The network segment 200 is further defined by multiple service ports 203, which facilitate the connection and flow of data between various nodes within the network. The system includes at least one intermediate node 204 and a core node 205, which are interconnected by a primary data route 206a. The primary data route 206a represents the main pathway for data transmission between the initiating test unit 201 and the responding test unit 202. In addition to the primary data route 206a, the system is equipped with a secondary data route 206b and a tertiary data route 206c. These alternative routes provide additional pathways for data transmission, ensuring redundancy and enabling the system to maintain communication in the event of a failure or congestion on the primary data route 206a. To monitor the traffic flow and performance of the network segment 200, traffic monitors 207 are strategically positioned to observe data transmission along the primary data route 206a, secondary data route 206b, and tertiary data route 206c. These traffic monitors 207 are capable of analyzing the traffic for various metrics such as latency, packet loss, and throughput, providing valuable insights into the network segment’s performance and reliability. The configuration of the test system as shown in Figure 2 allows for comprehensive monitoring and testing of the network segment 200, ensuring that the Ethernet data network operates efficiently and effectively. The system’s ability to route data through multiple paths and its inclusion of traffic monitors 207 contribute to its robustness and adaptability in a dynamic networking environment.

5. Complex Flowcharts

Finally, we’ve given GPT a fairly intricate flowchart to see whether it could map out logical flows shown in the diagram. GPT had some trouble with conditional branches, but after a few minor tweaks to figure layout, part # positions, and “creativity” settings in the prompt, GPT was able to identify the order of steps in the flowchart (mostly correct) and write out its description. GPT was also able to extract step labels directly from the figure. Although the output requires editing, it generally matches the flow depicted in the figure and could serve as a reasonable initial draft outline.

Figure + GPT Prompt:

flowchart figure
Write the description of a substrate cross-section shown in the attached figure 3 for a patent application.  Figure 3 is a logical flow diagram of a method of upgrading an internal logic of a programmable pluggable transceiver. Use part names and numbers specified below in the description. Do not place part numbers in parentheses.  
### 
step 301, determination step 302, next idle period step 304, loopback step 306, end-of frame step 310, disabled frame insertion step 312, disabled check 314, pass-through switch step 316, mask data step 318, step 320, unmasking step 322, re-enable frame processing step 324, re-enable control step 326
####

GPT Output:

Figure 3 illustrates a logical flow diagram for a method of upgrading the internal logic of a programmable pluggable transceiver. The process begins at START 301 and proceeds to determination step 302, where the system checks if Loopback is active. If Loopback is active, the process moves to next idle period step 304, where it waits for the next idle period before proceeding to disable Loopback step 306. The process checks if Frame Insertion is active at step 308. If active, the system waits for the end of the current frame at end-of-frame step 310 before proceeding to disable Frame Insertion step 312. Once both Loopback and Frame Insertion are disabled, as determined at disabled check 314, the system switches to Pass-through Mode step 316. In this mode, mask data step 318 is executed to mask data and control multiplexing. Following this, the system can reprogram the upgradeable logic at step 320. After the reprogramming is complete, the process continues to unmasking step 322, where data and control multiplexing are unmasked. The system then proceeds to re-enable frame processing step 324, followed by re-enable control step 326, which re-enables data and control multiplexing. This completes the upgrade process for the internal logic of the programmable pluggable transceiver as depicted in Figure 3.

Other Types of Figures

We’ve also tested GPT with basic hand-drawn “system” figures, chemical formulas and biotech figures. GPT was able to handle hand-drawn figures without too many issues, provided they were clear and not too different from the CAD-drawn ones. It had harder time with chemical and biotech figures – while GPT often recognized the general subject matter of the figures, it struggled to generate acceptable drafts of patent application sections. This could be attributed to the lack of clearly labeled part numbers and names in the figures, which would have provided GPT with additional context about the subject matter. Additionally, patent applications in chemical and biotech fields follow a different format from those in “predictable arts” patents, making them less conducive to the iterative, element-by-element descriptive style that GPT provides. As a result, further testing and “prompt “engineering” for these technical areas is needed and we may revisit this topic in the next blog post.

Conclusions and Tips

Overall, it appears that GPT-4 with Vision models can, if handled correctly, generate reasonable first drafts of detailed descriptions for many types of patent figures. While GPT output is not good enough to insert into applications outright without further review and editing, it can save you initial drafting time and provide you with a quick textual outline of the figure that helps you overcome “writer’s block.” It would be easier to edit and selectively add subject matter to such an outline than drafting the entire section from scratch.

Below are some “best practice” tips to help you get the most out of GPT vision for drafting figure descriptions:

  • Use clear figures without too much shading. Hand-drawn figures should work as well, provided they are sufficiently clear. Make sure that figure elements are sufficiently spaced apart and clearly labeled with part #s, preferably without using arrows that reach across other elements. GPT appears to have trouble with densely labeled figures and arrows drawn across other parts or otherwise obscured.
  • Provide pertinent detail about the figure in your prompt to GPT. You can use ClaimMaster to populate GPT prompts with the brief figure description pulled from the open Word document. This will help focus GPT on the correct technology when describing the figure.
  • Provide clear and descriptive part names in the GPT prompts. Again, this will help focus GPT on the correct technology, so it will generate higher-quality output. Make sure part names actually match their numbers in the figures.
  • Include additional “examples/context” from other applications or patents in your prompts to further control GPT output. You can learn how to store examples in your prompts in this help section. (see “Adding Examples” section towards the bottom)
  • Set “creativity/randomness” to low value or even 0 to reduce “hallucinations”
  • Extremely important: This cannot be stressed enough – always double-check all GPT output for factual correctness, part-by-part, sentence-by-sentence. In our testing, GPT could produce eloquent-sounding but subtly inaccurate descriptions for some parts in the figures. This typically happened when GPT was provided with unclear or very dense figures with many closely-located elements, so that it could not clearly match part names specified in the prompt to part #s shown in figures. Mismatched part names and numbers in figures also cause description inaccuracies. In summary, if you insert GPT output in your applications without careful proofreading, you risk finding hidden surprises in the published applications, such as it happened in this medical article (search for the article for the “I’m very sorry…” section).
  • Have fun with it – the new capability is very neat and could be a huge time saver, if applied correctly and carefully. If you are interested in trying this feature but have not yet set up a private OpenAI or Azure GPT account, contact us and we’ll provide you with a short-term access to the latest OpenAI GPT vision model for testing (complimentary).

For more information on GPT features in ClaimMaster, check out the following step-by-step tutorials: