PUBLISH /models
Analyze a continuous media stream with Hume models.
Supports the following Hume models:
- burst
- face
- facemesh
- language
- prosody
To learn more about what these models do and the science behind them, please check out the Hume platform help pages.
Note that this endpoint has a timeout of one minute. To maintain a connection for longer than a minute, you will need to implement custom reconnect logic in your application.
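The one-minute timeout can be handled with a simple reconnect loop. The sketch below is a minimal illustration in Python, assuming the third-party `websockets` package, an endpoint URL of `wss://api.hume.ai/v0/stream/models`, and an `X-Hume-Api-Key` header; all three are assumptions to verify against the official docs.

```python
import asyncio
import json

# Assumed endpoint URL and auth header -- verify against the official docs.
HUME_WS_URL = "wss://api.hume.ai/v0/stream/models"

def backoff_delays(retries, base=1.0, cap=30.0):
    """Exponential backoff schedule (seconds) between reconnect attempts."""
    return [min(cap, base * 2 ** i) for i in range(retries)]

async def stream_with_reconnect(payloads, api_key, retries=5):
    """Send payloads over the socket, reconnecting when the server
    closes the connection after its ~1 minute timeout."""
    import websockets  # third-party: pip install websockets

    pending = list(payloads)
    for delay in [0.0] + backoff_delays(retries):
        if not pending:
            return
        await asyncio.sleep(delay)
        try:
            async with websockets.connect(
                HUME_WS_URL,
                # named `additional_headers` in newer websockets releases
                extra_headers={"X-Hume-Api-Key": api_key},
            ) as ws:
                while pending:
                    await ws.send(json.dumps(pending[0]))
                    print(await ws.recv())  # one response per payload
                    pending.pop(0)          # only drop after a reply
        except websockets.exceptions.ConnectionClosed:
            continue  # timed out or dropped: retry with remaining payloads
```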
Models endpoint payload
Payload
-
data string
Raw bytes of media file. Should be base64 encoded and in a standard media format.
Recommended filetypes:
- audio: wav, mp3, mp4
- image: png, jpeg
- text: txt
- facemesh: json
Note: Streaming video files is not recommended. For lower latency, stream video by sending a single frame as an image in each payload.
Limits on data size:
- audio: 5000 milliseconds
- image: 3,000 x 3,000 pixels
- text: 10,000 characters
- facemesh: 10 faces
-
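Putting the limits above into practice, here is a minimal sketch of building a payload from a local media file; the helper name `make_payload` is illustrative, not part of the API.

```python
import base64

def make_payload(path, models=None):
    """Base64-encode a media file for the `data` field of a payload."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"data": encoded, "models": models or {"prosody": {}}}
```

The file itself must respect the limits above (e.g. at most 5000 milliseconds of audio per payload).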
models object
Configuration used to specify which models should be used and with what settings.
-
burst object
Configuration for the vocal burst emotion model.
Note: Model configuration is not currently available in streaming.
Please use the default configuration by passing an empty object: {}.
-
face object
Configuration for the facial expression emotion model.
Note: Using the reset_stream parameter does not have any effect on face identification. A single face identifier cache is maintained over a full session whether reset_stream is used or not.
-
facs object
Configuration for FACS predictions. If missing or null, no FACS predictions will be generated.
-
descriptions object
Configuration for Descriptions predictions. If missing or null, no Descriptions predictions will be generated.
-
identify_faces boolean
Whether to return identifiers for faces across frames. If true, unique identifiers will be assigned to face bounding boxes to differentiate different faces. If false, all faces will be tagged with an "unknown" ID.
Default value is false.
-
fps_pred number
Number of frames per second to process. Other frames will be omitted from the response.
Default value is 3.
-
prob_threshold number
Face detection probability threshold. Faces detected with a probability less than this threshold will be omitted from the response.
Default value is 3.
-
min_face_size number
Minimum bounding box side length in pixels to treat as a face. Faces detected with a bounding box side length in pixels less than this threshold will be omitted from the response.
Default value is 3.
-
-
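As a concrete illustration of the face attributes above, a configuration that enables FACS predictions and face identification might look like the following sketch (using the documented fields and defaults; omitting descriptions means no Descriptions predictions are generated):

```python
face_config = {
    "face": {
        "facs": {},              # request FACS predictions
        "identify_faces": True,  # assign unique IDs to faces across frames
        "fps_pred": 3,           # process 3 frames per second
        "prob_threshold": 3,     # documented default detection threshold
        "min_face_size": 3,      # documented default min bounding-box side
    }
}
```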
facemesh object
Configuration for the facemesh emotion model.
Note: Model configuration is not currently available in streaming.
Please use the default configuration by passing an empty object: {}.
-
language object
Configuration for the language emotion model.
-
sentiment object
Configuration for sentiment predictions. If missing or null, no sentiment predictions will be generated.
-
toxicity object
Configuration for toxicity predictions. If missing or null, no toxicity predictions will be generated.
-
granularity string
The granularity at which to generate predictions. Values are word, sentence, utterance, or passage. To get a single prediction for the entire text of your streaming payload, use passage. Default value is word.
-
-
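For example, a language configuration that requests sentiment at passage granularity (one prediction for the whole payload) could be written as the following sketch:

```python
language_config = {
    "language": {
        "sentiment": {},           # enable sentiment predictions
        "granularity": "passage",  # single prediction for the full text
        # "toxicity" omitted -> no toxicity predictions
    }
}
```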
prosody object
Configuration for the speech prosody emotion model.
Note: Model configuration is not currently available in streaming.
Please use the default configuration by passing an empty object: {}.
-
-
stream_window_ms number
Length in milliseconds of the streaming sliding window.
Extending the length of this window will prepend media context from past payloads into the current payload.
For example, if on the first payload you send 500ms of data and on the second payload you send an additional 500ms of data, a window of at least 1000ms will allow the model to process all 1000ms of stream data.
A window of 600ms would append the full 500ms of the second payload to the last 100ms of the first payload.
Note: This feature is currently only supported for audio data and audio models. For other file types and models this parameter will be ignored.
Minimum value is 500, maximum value is 10000. Default value is 5000.
-
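The sliding-window arithmetic above can be sketched as a small helper; the function name and signature are illustrative only:

```python
def context_carried_ms(window_ms, current_ms, past_ms):
    """Milliseconds of earlier-payload audio prepended to the current one:
    whatever of the window is left after the current payload, capped by
    how much past audio actually exists."""
    return max(0, min(past_ms, window_ms - current_ms))

# context_carried_ms(1000, 500, 500) -> 500 (all of the first payload)
# context_carried_ms(600, 500, 500)  -> 100 (last 100 ms of the first payload)
```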
reset_stream boolean
Whether to reset the streaming sliding window before processing the current payload.
If this parameter is set to true, then past context will be deleted before processing the current payload. Use reset_stream when one audio file is done being processed and you do not want context to leak across files.
Default value is false.
-
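For instance, when starting a new audio file in the same session, the first payload of the new file can clear the old context. This is a sketch; `<base64 audio>` is a placeholder for real encoded audio:

```python
first_payload_of_new_file = {
    "data": "<base64 audio>",  # placeholder for real base64-encoded audio
    "models": {"prosody": {}},
    "reset_stream": True,      # drop context from the previous file
}
```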
raw_text boolean
Set to true to enable the data parameter to be parsed as raw text rather than base64-encoded bytes.
This parameter is useful if you want to send text to be processed by the language model, but it cannot be used with other file types like audio, image, or video.
Default value is false.
-
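A text payload using raw_text, so data is plain text rather than base64, might look like this sketch:

```python
text_payload = {
    "data": "I am thrilled with how this turned out!",
    "models": {"language": {}},
    "raw_text": True,  # parse `data` as raw text, not base64 bytes
}
```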
job_details boolean
Set to true to get details about the job.
This parameter can be set in the same payload as data, or it can be set without data and models configuration to get the job details between payloads.
This parameter is useful to get the unique job ID.
Default value is false.
-
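Because job_details can be sent without data or models, a standalone payload works to fetch the job ID between media payloads (sketch):

```python
job_details_payload = {"job_details": True}  # no `data` or `models` needed
```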
payload_id string
Pass an arbitrary string as the payload ID and get it back at the top level of the socket response.
This can be useful if you have multiple requests running asynchronously and want to disambiguate responses as they are received.
{
"data": "string",
"models": {
"burst": {},
"face": {
"facs": {},
"descriptions": {},
"identify_faces": false,
"fps_pred": 3,
"prob_threshold": 3,
"min_face_size": 3
},
"facemesh": {},
"language": {
"sentiment": {},
"toxicity": {},
"granularity": "string"
},
"prosody": {}
},
"stream_window_ms": 5000,
"reset_stream": false,
"raw_text": false,
"job_details": false,
"payload_id": "string"
}