PUBLISH /models

Analyze a continuous media stream with Hume models.

Supports the following Hume models:

  • burst
  • face
  • facemesh
  • language
  • prosody

To learn more about what these models do and the science behind them, see the Hume platform help pages.

Note that this endpoint has a timeout of one minute. To maintain a connection for longer than a minute, you will need to implement custom reconnect logic in your application.
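
The reconnect logic mentioned above could be structured along the lines of the following minimal sketch. It assumes the one-minute timeout is measured from the moment the connection was opened, which the text does not state. `ReconnectingSession`, `connect`, and `margin_s` are hypothetical names, and `connect` stands in for whatever WebSocket client your application uses.

```python
import time

SESSION_TIMEOUT_S = 60.0  # the endpoint's documented one-minute timeout


class ReconnectingSession:
    """Reopens the connection shortly before the timeout would be reached.

    `connect` is a hypothetical zero-argument callable returning an object
    with `send(payload)` and `close()` methods.
    """

    def __init__(self, connect, margin_s=5.0, clock=time.monotonic):
        self._connect = connect
        self._margin_s = margin_s
        self._clock = clock
        self._conn = None
        self._opened_at = None

    def _expired(self):
        # Assumption: the timeout counts from when the connection was opened.
        age = self._clock() - self._opened_at
        return age >= SESSION_TIMEOUT_S - self._margin_s

    def send(self, payload):
        # Open on first use, or reopen when the connection is near its timeout.
        if self._conn is None or self._expired():
            if self._conn is not None:
                self._conn.close()
            self._conn = self._connect()
            self._opened_at = self._clock()
        return self._conn.send(payload)
```

Injecting the clock keeps the timing logic testable without waiting a full minute.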

Models endpoint payload

Payload

  • data string

    Raw bytes of the media file. The data should be base64 encoded and in a standard media format.

    Recommended filetypes:

    • audio: wav, mp3, mp4
    • image: png, jpeg
    • text: txt
    • facemesh: json

    Note: Streaming video files is not recommended. For better latency, stream video by sending a single frame image in each payload.

    Limits on data size:

    • audio: 5000 milliseconds
    • image: 3,000 x 3,000 pixels
    • text: 10,000 characters
    • facemesh: 10 faces
  • models object

    Configuration used to specify which models should be used and with what settings.

    • burst object

      Configuration for the vocal burst emotion model.

      Note: Model configuration is not currently available in streaming.

      Please use the default configuration by passing an empty object {}.

    • face object

      Configuration for the facial expression emotion model.

      Note: Using the reset_stream parameter does not have any effect on face identification. A single face identifier cache is maintained over a full session whether reset_stream is used or not.

      • facs object

        Configuration for FACS predictions. If missing or null, no FACS predictions will be generated.

      • descriptions object

        Configuration for Descriptions predictions. If missing or null, no Descriptions predictions will be generated.

      • identify_faces boolean

        Whether to return identifiers for faces across frames. If true, unique identifiers will be assigned to face bounding boxes to differentiate different faces. If false, all faces will be tagged with an "unknown" ID.

        Default value is false.

      • fps_pred number

        Number of frames per second to process. Other frames will be omitted from the response.

        Default value is 3.

      • prob_threshold number

        Face detection probability threshold. Faces detected with a probability less than this threshold will be omitted from the response.

        Default value is 3.

      • min_face_size number

        Minimum bounding box side length in pixels to treat as a face. Faces detected with a bounding box side length in pixels less than this threshold will be omitted from the response.

        Default value is 3.

    • facemesh object

      Configuration for the facemesh emotion model.

      Note: Model configuration is not currently available in streaming.

      Please use the default configuration by passing an empty object {}.

    • language object

      Configuration for the language emotion model.

      • sentiment object

        Configuration for sentiment predictions. If missing or null, no sentiment predictions will be generated.

      • toxicity object

        Configuration for toxicity predictions. If missing or null, no toxicity predictions will be generated.

      • granularity string

        The granularity at which to generate predictions. Values are word, sentence, utterance, or passage. To get a single prediction for the entire text of your streaming payload, use passage.

        Default value is word.

    • prosody object

      Configuration for the speech prosody emotion model.

      Note: Model configuration is not currently available in streaming.

      Please use the default configuration by passing an empty object {}.

  • stream_window_ms number

    Length in milliseconds of the streaming sliding window.

    Extending the length of this window will prepend media context from past payloads into the current payload.

    For example, if on the first payload you send 500ms of data and on the second payload you send an additional 500ms of data, a window of at least 1000ms will allow the model to process all 1000ms of stream data.

    A window of 600ms would append the full 500ms of the second payload to the last 100ms of the first payload.

    Note: This feature is currently only supported for audio data and audio models. For other file types and models this parameter will be ignored.

    Minimum value is 500, maximum value is 10000. Default value is 5000.

  • reset_stream boolean

    Whether to reset the streaming sliding window before processing the current payload.

    If this parameter is set to true then past context will be deleted before processing the current payload.

    Use reset_stream when one audio file is done being processed and you do not want context to leak across files.

    Default value is false.

  • raw_text boolean

    Set to true to enable the data parameter to be parsed as raw text rather than base64 encoded bytes.
    This parameter is useful if you want to send text to be processed by the language model, but it cannot be used with other file types like audio, image, or video.

    Default value is false.

  • job_details boolean

    Set to true to get details about the job.

    This parameter can be set in the same payload as data or it can be set without data and models configuration to get the job details between payloads.

    This parameter is useful to get the unique job ID.

    Default value is false.

  • payload_id string

    Pass an arbitrary string as the payload ID and get it back at the top level of the socket response.

    This can be useful if you have multiple requests running asynchronously and want to disambiguate responses as they are received.
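
The sliding-window arithmetic described for stream_window_ms and reset_stream can be illustrated with a small sketch. This is only a model of the documented examples, not the server's algorithm. `window_context_ms` is a hypothetical helper, and the behavior when a single payload is longer than the window is an assumption.

```python
MIN_WINDOW_MS = 500
MAX_WINDOW_MS = 10000


def window_context_ms(payload_lengths_ms, stream_window_ms=5000, reset_stream=False):
    """Return (carried_over_ms, current_ms) for the most recent audio payload.

    payload_lengths_ms lists the audio length of each payload sent so far,
    oldest first.
    """
    if not payload_lengths_ms:
        raise ValueError("at least one payload is required")
    if not MIN_WINDOW_MS <= stream_window_ms <= MAX_WINDOW_MS:
        raise ValueError("stream_window_ms must be between 500 and 10000")

    *past, current = payload_lengths_ms
    if reset_stream:
        # Past context is deleted before the current payload is processed.
        return 0, current

    # Assumption: the current payload is always kept whole, and past audio
    # fills whatever room is left in the window.
    room = max(stream_window_ms - current, 0)
    return min(sum(past), room), current
```

With two 500ms payloads, `window_context_ms([500, 500], stream_window_ms=1000)` gives `(500, 500)`, and a 600ms window gives `(100, 500)`, matching the two cases described in the text.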

Payload example
{
  "data": "string",
  "models": {
    "burst": {},
    "face": {
      "facs": {},
      "descriptions": {},
      "identify_faces": false,
      "fps_pred": 3,
      "prob_threshold": 3,
      "min_face_size": 3
    },
    "facemesh": {},
    "language": {
      "sentiment": {},
      "toxicity": {},
      "granularity": "string"
    },
    "prosody": {}
  },
  "stream_window_ms": 5000,
  "reset_stream": false,
  "raw_text": false,
  "job_details": false,
  "payload_id": "string"
}
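
As a rough illustration of how a client might assemble the payload shown above, the sketch below builds one payload from media bytes and one from raw text. The function names are hypothetical, and only fields documented in this section are used. The text-length check mirrors the 10,000 character limit, while other limits (audio length, image size) are not checked here.

```python
import base64
import json

MAX_TEXT_CHARS = 10_000
GRANULARITIES = {"word", "sentence", "utterance", "passage"}


def build_media_payload(media_bytes, models, payload_id=None):
    """Build a payload dict for audio, image, or facemesh bytes."""
    payload = {
        # The data field carries the file's raw bytes, base64 encoded.
        "data": base64.b64encode(media_bytes).decode("ascii"),
        "models": models,
    }
    if payload_id is not None:
        payload["payload_id"] = payload_id
    return payload


def build_text_payload(text, granularity="word", payload_id=None):
    """Build a payload dict that sends text unencoded via raw_text."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError("text exceeds the 10,000 character limit")
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity!r}")
    payload = {
        "data": text,
        "models": {"language": {"granularity": granularity}},
        "raw_text": True,
    }
    if payload_id is not None:
        payload["payload_id"] = payload_id
    return payload


def to_message(payload):
    """Serialize a payload dict to the JSON string sent over the socket."""
    return json.dumps(payload)
```

For instance, `to_message(build_text_payload("Hello there", granularity="passage", payload_id="msg-1"))` produces a JSON string whose payload_id can later be matched against the socket response.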