# Implementing OpenAI Deep Research API Support for Enhanced Workplans and Judgements

## Summary

This workplan details the steps required to integrate OpenAI's Deep Research models, `o3-deep-research` and `o4-mini-deep-research`, into the `yellhorn-mcp` ecosystem. The core of the effort is transitioning the existing OpenAI integration from the Chat Completions API to the Responses API, which is designed for the more complex, agentic tasks the Deep Research models perform. The main code change lands in `yellhorn_mcp/server.py`, where conditional logic will enable the `web_search_preview` and `code_interpreter` tools these models depend on. The workplan also covers new model definitions, pricing updates, and testing-suite changes that keep all existing functionality working while the new capabilities are added. Completing it will let `yellhorn-mcp` perform in-depth research and produce more comprehensive judgements using OpenAI's Deep Research models.
## Implementation Steps

### 1. Configuration and Dependency Updates

The first phase lays the groundwork: refreshing the project's dependencies, adding the new models and their pricing, and updating the documentation to reflect the added capabilities. Getting these foundational pieces right minimizes issues in the later implementation stages.
- [ ] Update `pyproject.toml`:
  - Upgrade the `openai` dependency to a version that supports the Responses API and its features. Based on recent documentation, a version greater than `1.23.6` is required.

    ```toml
    # pyproject.toml
    [project]
    # ...
    dependencies = [
        "mcp[cli]~=1.10.1",
        "google-genai~=1.24.0",
        "aiohttp~=3.12.13",
        "pydantic~=2.11.7",
        "openai~=1.93.0",  # Update to a version that fully supports the Responses API
        "jedi~=0.19.2",
    ]
    # ...
    ```

  `pyproject.toml` is the project's configuration hub, so upgrading the `openai` dependency there is the critical first step: only versions newer than `1.23.6` fully support the Responses API and the agentic tool use the Deep Research models rely on.
- [ ] Update `yellhorn_mcp/server.py` with new models and pricing:
  - Add `o3-deep-research` and `o4-mini-deep-research` to the `MODEL_PRICING` dictionary. Pricing for `o3-deep-research` is reported as $10/1M input and $40/1M output tokens; pricing for the mini version aligns with `o4-mini`.

    ```python
    # yellhorn_mcp/server.py
    MODEL_PRICING = {
        # ... existing Gemini models
        "gemini-2.5-flash-preview-05-20": { ... },
        # ... existing OpenAI models
        "gpt-4o": { ... },
        "gpt-4o-mini": { ... },
        "o4-mini": { ... },
        "o3": { ... },
        # New Deep Research models
        "o3-deep-research": {
            "input": {"default": 10.00},
            "output": {"default": 40.00},
        },
        "o4-mini-deep-research": {
            "input": {"default": 1.10},   # Same as o4-mini
            "output": {"default": 4.40},  # Same as o4-mini
        },
    }
    ```

  The `MODEL_PRICING` dictionary is the central repository of model pricing data used for cost estimation and resource management in `yellhorn-mcp`. Registering the Deep Research models here lets the system track their usage and associated costs accurately, keeping spending transparent for users.
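As a sanity check on the numbers above, the sketch below shows how entries of this shape translate token counts into a dollar estimate. The `estimate_cost` helper is hypothetical, not part of the current codebase; only the two new models and their per-1M-token prices are taken from this workplan.

```python
# Hypothetical helper (not in the current codebase) illustrating how
# MODEL_PRICING entries of the shape above turn token counts into dollars.
MODEL_PRICING = {
    "o3-deep-research": {"input": {"default": 10.00}, "output": {"default": 40.00}},
    "o4-mini-deep-research": {"input": {"default": 1.10}, "output": {"default": 4.40}},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost for one call; prices are quoted per 1M tokens."""
    pricing = MODEL_PRICING[model]
    input_cost = input_tokens / 1_000_000 * pricing["input"]["default"]
    output_cost = output_tokens / 1_000_000 * pricing["output"]["default"]
    return input_cost + output_cost
```

For example, an `o3-deep-research` call with 1M input tokens and 100k output tokens would be estimated at $10.00 + $4.00 = $14.00.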
- [ ] Update `yellhorn_mcp/cli.py`:
  - Add the new Deep Research models to the `--model` argument's help text to make them discoverable.

    ```python
    # yellhorn_mcp/cli.py
    parser.add_argument(
        "--model",
        dest="model",
        default=os.getenv("YELLHORN_MCP_MODEL", "gemini-2.5-pro-preview-05-06"),
        help="Model to use (e.g., gemini-2.5-pro-preview-05-06, gpt-4o, o3, "
        "o3-deep-research, o4-mini-deep-research). Default: ...",
    )
    ```

  The CLI is a primary means of interacting with `yellhorn-mcp`, and the `--model` help text is where users discover the available options. Listing `o3-deep-research` and `o4-mini-deep-research` there makes the new capabilities easy to find and adopt.
- [ ] Update documentation (`README.md`, `docs/USAGE.md`):
  - Add the new Deep Research models to the list of available OpenAI models in both documentation files.
  - Add a note explaining that these models use the `web_search_preview` and `code_interpreter` tools for enhanced research capabilities.

  `README.md` and `docs/USAGE.md` are the primary resources for users learning the system, so both should list `o3-deep-research` and `o4-mini-deep-research` and note that these models rely on the `web_search_preview` and `code_interpreter` tools. Keeping the documentation current is essential for adoption of the new capabilities.
### 2. Implement Responses API Logic

Implementing the Responses API logic is the core of this integration. It involves adding a helper function, modifying the existing processing logic, and ensuring the Deep Research models can use the new API's capabilities, such as web search and code interpretation, for complex research tasks.
- [ ] Create a helper function in `yellhorn_mcp/server.py`:
  - Add a simple helper to identify Deep Research models, which will determine when to add special tools to the API call.

    ```python
    # yellhorn_mcp/server.py
    def is_deep_research_model(model_name: str) -> bool:
        """Check whether the model is an OpenAI Deep Research model."""
        return "deep-research" in model_name
    ```

  Encapsulating this check in a dedicated function keeps the conditional tool-enablement logic readable and maintainable: the function acts as the single gatekeeper that decides when `web_search_preview` and `code_interpreter` are attached to an API call.
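To make the gating behaviour concrete, here is a minimal sketch of the helper driving tool selection. The `tools_for_model` function is hypothetical (the workplan inlines this logic in the async processors instead); the tool list mirrors the configuration used later in this plan.

```python
def is_deep_research_model(model_name: str) -> bool:
    """Check whether the model is an OpenAI Deep Research model."""
    return "deep-research" in model_name

def tools_for_model(model_name: str) -> list[dict]:
    """Hypothetical helper: extra tools to attach for a given model, if any."""
    if is_deep_research_model(model_name):
        # Deep Research models get web search and code interpretation
        return [{"type": "web_search_preview"}, {"type": "code_interpreter"}]
    # Other models get no extra tools
    return []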
- [ ] Modify `process_workplan_async` in `yellhorn_mcp/server.py`:
  - Update the OpenAI API call to use the Responses API structure and conditionally include tools for Deep Research models. The Responses API uses `client.responses.create` instead of `client.chat.completions.create`.

    ```python
    # yellhorn_mcp/server.py
    async def process_workplan_async(...):
        # ... inside the `if is_openai_model:` block
        if not openai_client:
            raise YellhornMCPError("OpenAI client not initialized. Is OPENAI_API_KEY set?")

        # Prepare parameters for the API call
        api_params = {
            "model": model,
            "input": prompt,  # The Responses API uses `input` instead of `messages`
            # `store: False` can be set so the conversation state is not persisted
        }

        if is_deep_research_model(model):
            await ctx.log(level="info", message=f"Enabling Deep Research tools for model {model}")
            api_params["tools"] = [
                {"type": "web_search_preview"},
                {"type": "code_interpreter", "container": {"type": "auto"}},
            ]

        # Call the OpenAI Responses API
        response = await openai_client.responses.create(**api_params)

        # Extract content and usage from the new response format
        workplan_content = response.output_text  # aggregated text of the response
        usage_metadata = response.usage
        # ... rest of the function
    ```

  This mirrors the Responses API's request shape: `input` replaces the `messages` parameter of the Chat Completions API, tools are attached only when `is_deep_research_model` matches, and the generated text is read from `response.output_text` rather than `response.choices[0].message.content`. Note that the `code_interpreter` tool takes a `container` configuration in the Responses API.
- [ ] Modify `process_judgement_async` in `yellhorn_mcp/server.py`:
  - Apply the same API call modification as in `process_workplan_async` to the judgement generation logic.

    ```python
    # yellhorn_mcp/server.py
    async def process_judgement_async(...):
        # ... inside the `if is_openai_model:` block
        if not openai_client:
            raise YellhornMCPError("OpenAI client not initialized. Is OPENAI_API_KEY set?")

        # Prepare parameters for the API call
        api_params = {
            "model": model,
            "input": prompt,
        }

        if is_deep_research_model(model):
            await ctx.log(level="info", message=f"Enabling Deep Research tools for model {model}")
            api_params["tools"] = [
                {"type": "web_search_preview"},
                {"type": "code_interpreter", "container": {"type": "auto"}},
            ]

        # Call the OpenAI Responses API
        response = await openai_client.responses.create(**api_params)

        # Extract content and usage
        judgement_content = response.output_text
        usage_metadata = response.usage
        # ... rest of the function
    ```

  Mirroring the changes made to `process_workplan_async` keeps the Deep Research integration consistent across workplan and judgement generation, giving both code paths the same request shape and response handling.
### 3. Update Testing Suite

A comprehensive testing suite is paramount to a successful integration. The suite must be updated to reflect the API switch and the new models: existing tests are modified, new tests are added specifically for the Deep Research models, and everything must pass with the new functionality in place.
- [ ] Modify `tests/test_openai.py`:
  - Add a new fixture or update `mock_openai_client` to mock `openai_client.responses.create`.
  - The mock response object should expose an `output_text` attribute for the content and a `usage` attribute for token metrics.
  - Add new tests for `o3-deep-research` and `o4-mini-deep-research`. These tests should assert that `openai_client.responses.create` is called with a `tools` parameter containing `web_search_preview` and `code_interpreter`.
  - Update existing OpenAI tests to call the mocked `responses.create` instead of `chat.completions.create` and to check the new `input` parameter instead of `messages`.

  These changes mirror the Responses API's shape in the test doubles, so the suite validates both the new Deep Research behaviour and the updated call signature used by the existing OpenAI tests.
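A sketch of what such an assertion could look like. The `call_openai` coroutine below is an illustrative stand-in for the production call path, not the project's actual code; the test relies on `AsyncMock.call_args` to capture the keyword arguments passed to `responses.create`.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

async def call_openai(client, model: str, prompt: str):
    """Stand-in for the production call path under test."""
    api_params = {"model": model, "input": prompt}
    if "deep-research" in model:
        api_params["tools"] = [
            {"type": "web_search_preview"},
            {"type": "code_interpreter"},
        ]
    return await client.responses.create(**api_params)

def test_deep_research_enables_tools():
    client = MagicMock()
    mock_response = MagicMock()
    mock_response.output_text = "mock workplan"
    client.responses.create = AsyncMock(return_value=mock_response)

    asyncio.run(call_openai(client, "o3-deep-research", "Write a workplan"))

    # Inspect the captured keyword arguments of the mocked call
    _, kwargs = client.responses.create.call_args
    assert kwargs["input"] == "Write a workplan"  # `input`, not `messages`
    assert {"type": "web_search_preview"} in kwargs["tools"]
```

The same pattern with a non-Deep-Research model name would assert that `"tools"` is absent from `kwargs`.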
- [ ] Update `tests/test_async_flows_openai.py` and `tests/test_server.py`:
  - Review and update any tests that mock or interact with the OpenAI client to reflect the switch to the Responses API (`responses.create`). This includes error handling and empty response tests.

  Reviewing these files ensures no test still targets `chat.completions.create`, keeping error-handling and empty-response coverage aligned with the new API's behaviour and catching regressions early.
## Technical Details
- API Migration: The primary change is the shift from OpenAI's `chat.completions.create` to `responses.create`. The `responses` endpoint is designed for more stateful, agentic workflows. While the current implementation is stateless, adopting this new API aligns with OpenAI's future direction and is necessary for tool use.
- Tool Configuration: The Deep Research feature is enabled by passing a `tools` array in the API request. For `o3-deep-research` and `o4-mini-deep-research`, we must include `{"type": "web_search_preview"}` and `{"type": "code_interpreter"}` (the latter with a `container` configuration). This will be handled conditionally based on the model name.
- Response Handling: The response object from the `responses.create` endpoint has a different structure. Text content is available via `response.output_text` instead of `response.choices[0].message.content`. The `usage` object still reports token counts, though the Responses API names the fields `input_tokens` and `output_tokens` rather than `prompt_tokens` and `completion_tokens`.
- Stateless Approach: The application will remain stateless. We will not use the `previous_response_id` feature of the Responses API, as each workplan or judgement is a self-contained task; `store: false` can be set in the API call so responses are not persisted.
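The response-handling difference can be sketched with stand-in objects. This is illustrative only: `extract_text` is a hypothetical helper rather than project code, and the `SimpleNamespace` objects merely imitate the two response shapes rather than using the real SDK classes.

```python
from types import SimpleNamespace

def extract_text(response) -> str:
    """Read generated text from either response shape."""
    if hasattr(response, "output_text"):          # Responses API shape
        return response.output_text
    return response.choices[0].message.content    # Chat Completions shape

# Stand-ins imitating the two API response structures
responses_style = SimpleNamespace(output_text="from responses.create")
chat_style = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="from chat.completions"))]
)
```

Since this migration replaces the Chat Completions path entirely, the production code would not need such a dual-shape helper; the sketch only highlights where the text moves.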
## Files to Modify

The following files are directly impacted by this integration:

- `pyproject.toml` (update the `openai` dependency)
- `yellhorn_mcp/server.py` (add new models, pricing, and the new API logic)
- `yellhorn_mcp/cli.py` (update CLI help text)
- `README.md` (documentation updates)
- `docs/USAGE.md` (documentation updates)
- `tests/test_openai.py` (add tests for the new models and API)
- `tests/test_async_flows_openai.py` (update existing tests for the new API)
- `tests/test_server.py` (update existing tests for the new API)
## New Files to Create

This integration does not require creating any new files; all modifications are made to existing files, which keeps the codebase organized and the change easy to review.

- None
## References

- OpenAI Deep Research API documentation: https://platform.openai.com/docs/guides/deep-research
- Current implementation locations: `yellhorn_mcp/server.py` (lines ~259, ~1049, ~1429, ~1988)
## Completion Metrics

- Model Used: `gemini-2.5-pro-preview-05-06`
- Input Tokens: 135181
- Output Tokens: 2533
- Total Tokens: 141124
- Estimated Cost: $0.1943