Updated Specification and Documentation to support Audio Modality. #93
Conversation
Thank you! This makes sense to me, and seems like a clean extension to the protocol.
@dsp-ant Any thoughts?
We probably want to rev the protocol version, since this would be a new …
Co-authored-by: Justin Spahr-Summers <[email protected]>
This seems reasonable to me. I am not accepting it yet, mostly so we don't accidentally merge this. We need to first rev the protocol and add ways to handle revisions in the current protocol.
@dsp-ant - I couldn't find the tool/script to generate new versions of the SDKs from the spec for testing - are they available?
We've now created a separate place for the draft version of the spec. Can you please move this there?
We use Claude to update the SDKs in response to spec changes, e.g. by giving it the current SDK interfaces and a diff of what changed in the schema.
This change supports discussion #88 and adds the Audio modality to the specification.
Motivation and Context
This would enable integration with models that support audio as input/output, such as `gpt-4o-audio-preview` (https://platform.openai.com/docs/guides/audio).
How Has This Been Tested?
This has been tested using the Inspector tool, with local type extensions:
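The original snippet is not reproduced above; the idea was a small local addition mirroring the existing `ImageContent` shape (base64 `data` plus `mimeType`). A minimal sketch under that assumption, with illustrative field comments rather than the final schema:

```typescript
// Minimal sketch of a local type extension for audio content.
// Assumes the same base64 data + mimeType shape used by ImageContent.
export interface AudioContent {
  type: "audio";
  /** Base64-encoded audio data. */
  data: string;
  /** MIME type of the audio, e.g. "audio/wav" or "audio/mpeg". */
  mimeType: string;
}
```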
The Inspector application was updated to render the Audio player for this type:
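The player can be wired up from a data URL; the following is an illustrative sketch rather than the Inspector's actual component (the component name and props are assumptions):

```tsx
import * as React from "react";

// Illustrative sketch only: renders base64-encoded audio content
// via a data URL and the browser's built-in <audio> element.
function AudioContentView({ data, mimeType }: { data: string; mimeType: string }) {
  const src = `data:${mimeType};base64,${data}`;
  return <audio controls src={src} />;
}
```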
The Server produced this JSON:
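The original output is not reproduced above; a representative result containing an audio item would look roughly like the following (written as a TypeScript literal so the placeholder can be commented; the payload is not real data):

```typescript
// Representative example only, not the actual server output from testing.
const exampleResult = {
  content: [
    {
      type: "audio",
      data: "UklGRiQAAABXQVZFZm10IBAAAAABAAEA...", // truncated placeholder, not real audio
      mimeType: "audio/wav",
    },
  ],
};
```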
I was unable to find the process for building the TypeScript SDK from the schema, hence the approach of extending the types locally.
Ultimately, I would like to integrate this into my chat application, which supports gpt-4o (and potentially newer models) with audio.
Breaking Changes
No. However:
Types of changes
Checklist
Additional context
I believe that adding the "Audio" type is appropriate, as it is congruent with the way the text and image modalities are typically handled.
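As a concrete illustration of that congruence, a client that already dispatches on text/image content can handle audio the same way. This is a sketch using assumed type shapes, not code from the PR:

```typescript
// Sketch: a content union shaped like the text and image items,
// with an audio variant added alongside them (shapes are assumptions).
type Content =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }
  | { type: "audio"; data: string; mimeType: string };

// Clients dispatch on the "type" discriminant exactly as they already do for text/image.
function describe(item: Content): string {
  switch (item.type) {
    case "text":
      return item.text;
    case "image":
      return `image (${item.mimeType})`;
    case "audio":
      return `audio (${item.mimeType})`;
  }
}
```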