Implementing OpenAI Deep Research API Support For Enhanced Workplans And Judgements

by StackCamp Team

Summary

This workplan details the steps required to integrate OpenAI's Deep Research models, specifically o3-deep-research and o4-mini-deep-research, into the yellhorn-mcp ecosystem. The core of the effort is updating the existing OpenAI integration from the Chat Completions API to the Responses API, which is designed for the more complex, agentic tasks these models perform. The main code change lands in yellhorn_mcp/server.py, where conditional logic will enable the web_search_preview and code_interpreter tools that the Deep Research models depend on. The workplan also covers new model definitions, updated pricing information, and testing-suite changes that keep existing functionality working while the new capabilities are integrated. Completing this work will let yellhorn-mcp produce more deeply researched workplans and judgements.

Implementation Steps

1. Configuration and Dependency Updates

The initial phase lays the groundwork for the integration: upgrading the project's dependencies, registering the new models and their pricing, and updating documentation to reflect the added capabilities. Getting these foundations right minimizes surprises in the implementation and testing phases that follow.

  • [ ] Update pyproject.toml:

    • Upgrade the openai dependency to a recent release that fully supports the Responses API and the Deep Research models; this workplan pins ~=1.93.0.
      # pyproject.toml
      
      [project]
      #...
      dependencies = [
          "mcp[cli]~=1.10.1",
          "google-genai~=1.24.0",
          "aiohttp~=3.12.13",
          "pydantic~=2.11.7",
          "openai~=1.93.0", # Update to a version that fully supports the Responses API
          "jedi~=0.19.2",
      ]
      #...
      

    Updating pyproject.toml is the critical first step because the openai package only gained the Responses API in relatively recent releases. Pinning ~=1.93.0 ensures the client exposes client.responses.create and the tool configuration that the Deep Research models rely on, so the server.py changes later in this workplan can be written against a current, documented SDK surface. A small runtime guard, sketched below, can make a stale installation fail fast.
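
    As a quick safety net, a runtime version check can fail fast when an older openai package is installed. This is an optional, illustrative sketch rather than a required change; it assumes the packaging library is available and uses the same 1.93.0 floor as the pin above.

      # Optional runtime guard (illustrative sketch, not a required change)
      import openai
      from packaging.version import Version

      if Version(openai.__version__) < Version("1.93.0"):
          raise RuntimeError(
              "Deep Research support expects openai>=1.93.0; "
              f"found {openai.__version__}"
          )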

  • [ ] Update yellhorn_mcp/server.py with new models and pricing:

    • Add o3-deep-research and o4-mini-deep-research to the MODEL_PRICING dictionary. Pricing for o3-deep-research is reported as $10/1M input and $40/1M output tokens. Pricing for the mini version aligns with o4-mini.
      # yellhorn_mcp/server.py
      
      MODEL_PRICING = {
          # ... existing Gemini models
          "gemini-2.5-flash-preview-05-20": { ... },
          # ... existing OpenAI models
          "gpt-4o": { ... },
          "gpt-4o-mini": { ... },
          "o4-mini": { ... },
          "o3": { ... },
          # Add new Deep Research Models
          "o3-deep-research": {
              "input": {"default": 10.00},
              "output": {"default": 40.00},
          },
          "o4-mini-deep-research": {
              "input": {"default": 1.10}, # Same as o4-mini
              "output": {"default": 4.40}, # Same as o4-mini
          },
      }
      

    The MODEL_PRICING dictionary in yellhorn_mcp/server.py is the system's single source of truth for cost estimation, so both new models must be registered there before they can be used. o3-deep-research is priced at $10 per 1 million input tokens and $40 per 1 million output tokens, while o4-mini-deep-research matches o4-mini at $1.10 and $4.40. Registering these rates keeps usage tracking and cost reporting accurate for the new models; the sketch below shows how the dictionary entries feed a cost estimate.
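
    To make the pricing structure concrete, the following sketch computes an estimated cost from the per-1M-token rates above. The estimate_cost helper is hypothetical (yellhorn-mcp's real cost code may differ), and it assumes the MODEL_PRICING dictionary shown earlier is in scope.

      def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
          """Estimate USD cost from the per-million-token rates in MODEL_PRICING."""
          pricing = MODEL_PRICING[model]  # uses the dictionary defined above
          input_cost = input_tokens / 1_000_000 * pricing["input"]["default"]
          output_cost = output_tokens / 1_000_000 * pricing["output"]["default"]
          return input_cost + output_cost

      # Example: 135,181 input and 2,533 output tokens on o3-deep-research:
      # 0.135181 * $10.00 + 0.002533 * $40.00 ≈ $1.45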

  • [ ] Update yellhorn_mcp/cli.py:

    • Add the new Deep Research models to the --model argument's help text to make them discoverable.
      # yellhorn_mcp/cli.py
      
      parser.add_argument(
          "--model",
          dest="model",
          default=os.getenv("YELLHORN_MCP_MODEL", "gemini-2.5-pro-preview-05-06"),
          help="Model to use (e.g., gemini-2.5-pro-preview-05-06, gpt-4o, o3, "
          "o3-deep-research, o4-mini-deep-research). Default: ...",
      )
      

    The command-line interface is the primary way users interact with yellhorn-mcp, and the --model help text is where they learn which models are available. Listing o3-deep-research and o4-mini-deep-research there makes the new capabilities discoverable at exactly the point where a model is chosen. It is a small change, but clear help text does a lot to drive adoption of the new models.

  • [ ] Update Documentation (README.md, docs/USAGE.md):

    • Add the new Deep Research models to the list of available OpenAI models in both documentation files.
    • Add a note explaining that these models use web_search_preview and code_interpreter tools for enhanced research capabilities.

    README.md and docs/USAGE.md are the primary references for users, so both must list the Deep Research models alongside the existing OpenAI options. The accompanying note about the web_search_preview and code_interpreter tools gives users the context to understand why these models behave differently from the standard chat models. Keeping the documentation current with the code is essential for adoption.

2. Implement Responses API Logic

Implementing the Responses API logic is the core of this integration effort. This phase adds a helper to detect Deep Research models, switches both async processing functions to the new endpoint, and wires in the tool configuration that enables web search and code interpretation. Getting this logic right is what unlocks the advanced capabilities of the new models.

  • [ ] Create a helper function in yellhorn_mcp/server.py:

    • Add a simple helper to identify Deep Research models, which will determine when to add special tools to the API call.
      # yellhorn_mcp/server.py
      
      def is_deep_research_model(model_name: str) -> bool:
          """Checks if the model is an OpenAI Deep Research model."""
          return "deep-research" in model_name
      
      # ...
      

    The is_deep_research_model helper centralizes the check that decides when the special tools are attached to an API call. Matching on the "deep-research" substring covers both current models (and any future variants that follow the same naming convention), and isolating the check in one function keeps the conditional logic in both async functions simple and consistent. A few quick sanity checks follow below.
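
    These assertions (illustrative, not part of the test suite) confirm the substring match behaves as intended for the model names in this workplan.

      assert is_deep_research_model("o3-deep-research")
      assert is_deep_research_model("o4-mini-deep-research")
      assert not is_deep_research_model("o4-mini")
      assert not is_deep_research_model("gpt-4o")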

  • [ ] Modify process_workplan_async in yellhorn_mcp/server.py:

    • Update the OpenAI API call to use the Responses API structure and conditionally include tools for Deep Research models. The Responses API uses client.responses.create instead of client.chat.completions.create.
      # yellhorn_mcp/server.py
      
      async def process_workplan_async(...):
          # ... inside the `if is_openai_model:` block
          if not openai_client:
              raise YellhornMCPError("OpenAI client not initialized. Is OPENAI_API_KEY set?")
      
          # Prepare parameters for the API call
          api_params = {
              "model": model,
              "input": prompt, # Responses API uses `input` instead of `messages`
              # store: false can be set to not persist the conversation state
          }
      
          if is_deep_research_model(model):
              await ctx.log(level="info", message=f"Enabling Deep Research tools for model {model}")
              api_params["tools"] = [
                  {"type": "web_search_preview"},
                  {"type": "code_interpreter"}
              ]
      
          # Call OpenAI Responses API
          response = await openai_client.responses.create(**api_params)
      
          # Extract content and usage from the new response format
           workplan_content = response.output_text  # SDK convenience property with the aggregated output text
          usage_metadata = response.usage
          # ... rest of the function
      

    To leverage the Responses API, process_workplan_async must switch from client.chat.completions.create to client.responses.create and pass the prompt via the input parameter rather than messages. Tools are attached conditionally via the is_deep_research_model helper so they are only sent for models that support them. Finally, the response is handled through the new format: the generated text is read from the response.output_text convenience property (not choices[0].message.content), and token usage comes from response.usage. A defensive extraction sketch follows below.
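
    The following sketch shows a defensive way to read the generated text. The extract_output_text name is hypothetical; output_text is the openai SDK's convenience property that aggregates the text segments of response.output, and the manual fallback reflects our reading of the response item shape, which should be verified against the installed SDK.

      def extract_output_text(response) -> str:
          """Return the full generated text from a Responses API result."""
          text = getattr(response, "output_text", None)
          if text:
              return text
          # Fallback: walk the structured output items directly.
          parts: list[str] = []
          for item in getattr(response, "output", []) or []:
              for content in getattr(item, "content", []) or []:
                  if getattr(content, "text", None):
                      parts.append(content.text)
          return "".join(parts)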

  • [ ] Modify process_judgement_async in yellhorn_mcp/server.py:

    • Apply the same API call modification as in process_workplan_async to the judgement generation logic.
      # yellhorn_mcp/server.py
      
      async def process_judgement_async(...):
          # ... inside the `if is_openai_model:` block
          if not openai_client:
              raise YellhornMCPError("OpenAI client not initialized. Is OPENAI_API_KEY set?")
      
          # Prepare parameters for the API call
          api_params = {
              "model": model,
              "input": prompt,
          }
      
          if is_deep_research_model(model):
              await ctx.log(level="info", message=f"Enabling Deep Research tools for model {model}")
              api_params["tools"] = [
                  {"type": "web_search_preview"},
                  {"type": "code_interpreter"}
              ]
      
          # Call OpenAI Responses API
          response = await openai_client.responses.create(**api_params)
      
          # Extract content and usage
           judgement_content = response.output_text
          usage_metadata = response.usage
          # ... rest of the function
      

    For consistency, process_judgement_async receives the same treatment as process_workplan_async: the call moves to client.responses.create, the prompt is passed via input, tools are attached conditionally through is_deep_research_model, and the judgement text and usage metadata are read from response.output_text and response.usage. Applying identical changes to both functions keeps the Deep Research integration uniform across workplan and judgement generation.

3. Update Testing Suite

The testing suite must be updated to match the switch to the Responses API and the introduction of the new models: existing mocks and assertions move from chat.completions.create to responses.create, and new tests cover the Deep Research tool configuration. A passing suite is the main guard against regressions during this migration.

  • [ ] Modify tests/test_openai.py:

    • Add a new fixture or update mock_openai_client to mock openai_client.responses.create.
    • The mock response object should now expose an output_text attribute for the content and a usage attribute for token metrics.
    • Add new tests for o3-deep-research and o4-mini-deep-research. These tests should assert that openai_client.responses.create is called with the tools parameter containing web_search_preview and code_interpreter.
    • Update existing OpenAI tests to call the mocked responses.create instead of chat.completions.create and to check the new input parameter instead of messages.

    The tests/test_openai.py updates fall into three groups: mocking openai_client.responses.create with a response object shaped like the Responses API result (output_text for content, usage for token metrics), adding tests that assert the Deep Research models are called with a tools parameter containing web_search_preview and code_interpreter, and migrating existing tests from chat.completions.create and messages to responses.create and input. A sketch of the fixture and a Deep Research test appears below.
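
    The sketch below is illustrative rather than final: the fixture shape and test body are assumptions, and the test exercises the mocking pattern directly instead of the real process_workplan_async, whose exact signature this workplan does not spell out. It assumes pytest-asyncio is available.

      # tests/test_openai.py (sketch)
      from unittest.mock import AsyncMock, MagicMock

      import pytest


      @pytest.fixture
      def mock_openai_client():
          client = MagicMock()
          mock_response = MagicMock()
          # Responses API shape: text on `output_text`, tokens on `usage`.
          mock_response.output_text = "Mock workplan content"
          mock_response.usage = MagicMock(input_tokens=1000, output_tokens=500)
          client.responses.create = AsyncMock(return_value=mock_response)
          return client


      @pytest.mark.asyncio
      async def test_deep_research_call_includes_tools(mock_openai_client):
          # Stand-in for the call process_workplan_async would make for a
          # Deep Research model (see the server.py changes above).
          api_params = {"model": "o3-deep-research", "input": "prompt"}
          if "deep-research" in api_params["model"]:
              api_params["tools"] = [
                  {"type": "web_search_preview"},
                  {"type": "code_interpreter", "container": {"type": "auto"}},
              ]
          response = await mock_openai_client.responses.create(**api_params)

          assert response.output_text == "Mock workplan content"
          _, kwargs = mock_openai_client.responses.create.call_args
          tool_types = {tool["type"] for tool in kwargs["tools"]}
          assert {"web_search_preview", "code_interpreter"} <= tool_types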

  • [ ] Update tests/test_async_flows_openai.py and tests/test_server.py:

    • Review and update any tests that mock or interact with the OpenAI client to reflect the switch to the Responses API (responses.create). This includes error handling and empty response tests.

    tests/test_async_flows_openai.py and tests/test_server.py also mock or interact with the OpenAI client, so every such test must be reviewed and pointed at responses.create, including the error-handling and empty-response cases whose behavior may differ under the new API. A small error-path sketch follows below.
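
    This fragment (illustrative; real test names and structure will differ) shows the minimal pattern for an error-path test once the mocks target responses.create.

      from unittest.mock import AsyncMock, MagicMock

      import pytest


      @pytest.mark.asyncio
      async def test_responses_api_error_propagates():
          client = MagicMock()
          client.responses.create = AsyncMock(
              side_effect=RuntimeError("OpenAI API error")
          )
          with pytest.raises(RuntimeError, match="OpenAI API error"):
              await client.responses.create(model="o3", input="prompt")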

Technical Details

The bullets below summarize the key technical mechanics of the migration.

  • API Migration: The primary change is the shift from OpenAI's chat.completions.create to responses.create. The responses endpoint is designed for more stateful, agentic workflows. While the current implementation is stateless, adopting this new API aligns with OpenAI's future direction and is necessary for tool use.
  • Tool Configuration: The Deep Research feature is enabled by passing a tools array in the API request. For o3-deep-research and o4-mini-deep-research, we must include {"type": "web_search_preview"} and {"type": "code_interpreter"}. This will be handled conditionally based on the model name.
  • Response Handling: The response object from the responses.create endpoint has a different structure. Text content is read from the response.output_text convenience property instead of response.choices[0].message.content, with the structured output items still available under response.output. The usage metadata object appears to retain a similar role for token counts, though the Responses API reports input_tokens and output_tokens rather than the Chat Completions field names.
  • Stateless Approach: The application will remain stateless. We will not use the previous_response_id feature of the Responses API, as each workplan or judgement is a self-contained task; store=False can additionally be passed so OpenAI does not persist the response, which the API otherwise does by default. A sketch of this call pattern follows below.
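
The following is a minimal sketch of the stateless call pattern under the assumptions above: it uses the openai SDK's AsyncOpenAI client, never passes previous_response_id, and sets store=False so no conversation state is kept server-side. Parameter support should be confirmed against the pinned SDK version.

      from openai import AsyncOpenAI


      async def run_stateless(prompt: str) -> str:
          client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
          response = await client.responses.create(
              model="o3-deep-research",
              input=prompt,
              store=False,  # do not persist response state server-side
              # previous_response_id is intentionally never passed: each
              # request stands alone.
              tools=[
                  {"type": "web_search_preview"},
                  {"type": "code_interpreter", "container": {"type": "auto"}},
              ],
          )
          return response.output_text

      # To run: import asyncio; asyncio.run(run_stateless("..."))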

Files to Modify

This section lists the files directly impacted by the integration, spanning configuration, server logic, the CLI, documentation, and tests. Knowing the full set up front makes the work easier to divide and review.

  • pyproject.toml (To update the openai dependency)
  • yellhorn_mcp/server.py (To add new models, pricing, and implement the new API logic)
  • yellhorn_mcp/cli.py (To update CLI help text)
  • README.md (For documentation updates)
  • docs/USAGE.md (For documentation updates)
  • tests/test_openai.py (To add tests for new models and API)
  • tests/test_async_flows_openai.py (To update existing tests for the new API)
  • tests/test_server.py (To update existing tests for the new API)

New Files to Create

This integration does not require the creation of any new files; all modifications land in existing files, which keeps the change set contained and the codebase organized.

  • None

References

The key references for this work are the OpenAI Deep Research API documentation and the current OpenAI integration points in yellhorn_mcp/server.py, which show how the Chat Completions calls are structured today. Keeping those sources at hand makes it easier to verify API shapes during implementation.


Completion Metrics

  • Model Used: gemini-2.5-pro-preview-05-06
  • Input Tokens: 135181
  • Output Tokens: 2533
  • Total Tokens: 141124
  • Estimated Cost: $0.1943