---
title: Universal Endpoint (Deprecated)
description: Route requests to any AI provider through a single AI Gateway endpoint with support for fallbacks and retries.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/ai-gateway/llms.txt  
> Use this file to discover all available pages before exploring further. 

[Skip to content](#%5Ftop) 

# Universal Endpoint (Deprecated)

Deprecated

The Universal Endpoint is deprecated. Use the [OpenAI-compatible endpoint](https://developers.cloudflare.com/ai-gateway/usage/chat-completion/) for new integrations, and [Dynamic Routing](https://developers.cloudflare.com/ai-gateway/features/dynamic-routing/) for fallbacks, retries, and conditional routing. The Universal Endpoint will continue to work for existing integrations.

The Universal Endpoint allows you to contact every provider through a single endpoint.

```
https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}
```

The payload expects an array of messages. Each message is an object with the following parameters:

* `provider`: the name of the provider you would like to direct this message to. Can be OpenAI, workers-ai, or any of our supported providers.
* `endpoint`: the pathname of the provider API you are trying to reach. For example, on OpenAI it can be `chat/completions`, and for Workers AI this might be [@cf/meta/llama-3.1-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/). Refer to the sections that are specific to [each provider](https://developers.cloudflare.com/ai-gateway/usage/providers/).
* `authorization`: the content of the Authorization HTTP Header that should be used when contacting this provider. This usually starts with `Token` or `Bearer`.
* `query`: the payload as the provider expects it in their official API.

## cURL example

Request

```
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \  --header 'Content-Type: application/json' \  --data '[  {    "provider": "workers-ai",    "endpoint": "@cf/meta/llama-3.1-8b-instruct",    "headers": {      "Authorization": "Bearer {cloudflare_token}",      "Content-Type": "application/json"    },    "query": {      "messages": [        {          "role": "system",          "content": "You are a friendly assistant"        },        {          "role": "user",          "content": "What is Cloudflare?"        }      ]    }  },  {    "provider": "openai",    "endpoint": "chat/completions",    "headers": {      "Authorization": "Bearer {open_ai_token}",      "Content-Type": "application/json"    },    "query": {      "model": "gpt-4o-mini",      "stream": true,      "messages": [        {          "role": "user",          "content": "What is Cloudflare?"        }      ]    }  }]'
```

The above will send a request to Workers AI Inference API. If it fails, it will proceed to OpenAI. You can add as many fallbacks as you need by adding another object in the array.

## Fallbacks

You can specify model or provider fallbacks to handle request failures and ensure reliability. The payload array defines the fallback sequence — if the first provider fails, the request falls to the next entry in the array. For more details, refer to [Fallbacks](https://developers.cloudflare.com/ai-gateway/configuration/fallbacks/).

By default, Cloudflare triggers your fallback if a model request returns an error. You can also configure [request timeouts](#request-timeouts) to trigger fallbacks when a provider takes too long to respond.

### Response header (`cf-aig-step`)

When using fallbacks, the response header `cf-aig-step` indicates which model successfully processed the request by returning the step number:

* `cf-aig-step:0` — The first (primary) model was used successfully.
* `cf-aig-step:1` — The request fell back to the second model.
* `cf-aig-step:2` — The request fell back to the third model.
* Subsequent steps — Each fallback increments the step number by 1.

## Request timeouts

A request timeout triggers a fallback if a provider takes too long to respond.

Configure the timeout by setting a `requestTimeout` property (in milliseconds) within the provider-specific `config` object. Each provider can have a different `requestTimeout` value.

The timeout is based on when the first part of the response comes back. As long as the first part of the response returns within the specified timeframe — such as when streaming a response — your gateway will wait for the response.

Request timeout example

```
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \  --header 'Content-Type: application/json' \  --data '[    {        "provider": "workers-ai",        "endpoint": "@cf/meta/llama-3.1-8b-instruct",        "headers": {            "Authorization": "Bearer {cloudflare_token}",            "Content-Type": "application/json"        },        "config": {            "requestTimeout": 1000        },        "query": {34 collapsed lines            "messages": [                {                    "role": "system",                    "content": "You are a friendly assistant"                },                {                    "role": "user",                    "content": "What is Cloudflare?"                }            ]        }    },    {        "provider": "workers-ai",        "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",        "headers": {            "Authorization": "Bearer {cloudflare_token}",            "Content-Type": "application/json"        },        "query": {            "messages": [                {                    "role": "system",                    "content": "You are a friendly assistant"                },                {                    "role": "user",                    "content": "What is Cloudflare?"                }            ]        },        "config": {            "requestTimeout": 3000        },    }]'
```

## Request retries

The Universal Endpoint supports automatic retries for failed requests, with a maximum of five retry attempts. Retries are attempted before triggering any configured fallbacks.

Configure the retry settings with the following properties in the provider-specific `config`:

TypeScript

```
config:{  maxAttempts?: number;  retryDelay?: number;  backoff?: "constant" | "linear" | "exponential";}
```

* `maxAttempts`: Maximum number of retry attempts (up to 5).
* `retryDelay`: Delay before retrying, in milliseconds (maximum of 5 seconds).
* `backoff`: Backoff method — `constant`, `linear`, or `exponential`.

On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes. Each provider can have different retry settings.

Request retry example

```
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \  --header 'Content-Type: application/json' \  --data '[    {        "provider": "workers-ai",        "endpoint": "@cf/meta/llama-3.1-8b-instruct",        "headers": {            "Authorization": "Bearer {cloudflare_token}",            "Content-Type": "application/json"        },        "config": {            "maxAttempts": 2,            "retryDelay": 1000,            "backoff": "constant"        },39 collapsed lines        "query": {            "messages": [                {                    "role": "system",                    "content": "You are a friendly assistant"                },                {                    "role": "user",                    "content": "What is Cloudflare?"                }            ]        }    },    {        "provider": "workers-ai",        "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",        "headers": {            "Authorization": "Bearer {cloudflare_token}",            "Content-Type": "application/json"        },        "query": {            "messages": [                {                    "role": "system",                    "content": "You are a friendly assistant"                },                {                    "role": "user",                    "content": "What is Cloudflare?"                }            ]        },        "config": {            "maxAttempts": 4,            "retryDelay": 1000,            "backoff": "exponential"        },    }]'
```

## WebSockets API beta

The Universal Endpoint can also be accessed via a [WebSockets API](https://developers.cloudflare.com/ai-gateway/usage/websockets-api/) which provides a single persistent connection, enabling continuous communication. This API supports all AI providers connected to AI Gateway, including those that do not natively support WebSockets.

### WebSockets example

JavaScript

```
import WebSocket from "ws";const ws = new WebSocket(  "wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/",  {    headers: {      "cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",    },  },);
ws.send(  JSON.stringify({    type: "universal.create",    request: {      eventId: "my-request",      provider: "workers-ai",      endpoint: "@cf/meta/llama-3.1-8b-instruct",      headers: {        Authorization: "Bearer WORKERS_AI_TOKEN",        "Content-Type": "application/json",      },      query: {        prompt: "tell me a joke",      },    },  }),);
ws.on("message", function incoming(message) {  console.log(message.toString());});
```

## Workers Binding example

* [  wrangler.jsonc ](#tab-panel-6888)
* [  wrangler.toml ](#tab-panel-6889)

JSONC

```
{  "ai": {    "binding": "AI",  },}
```

TOML

```
[ai]binding = "AI"
```

src/index.ts

```
type Env = {  AI: Ai;};
export default {  async fetch(request: Request, env: Env) {    return env.AI.gateway("my-gateway").run({      provider: "workers-ai",      endpoint: "@cf/meta/llama-3.1-8b-instruct",      headers: {        authorization: "Bearer my-api-token",      },      query: {        prompt: "tell me a joke",      },    });  },};
```

## Header configuration hierarchy

The Universal Endpoint allows you to set fallback models or providers and customize headers for each provider or request. You can configure headers at three levels:

1. **Provider level**: Headers specific to a particular provider.
2. **Request level**: Headers included in individual requests.
3. **Gateway settings**: Default headers configured in your gateway dashboard.

Since the same settings can be configured in multiple locations, AI Gateway applies a hierarchy to determine which configuration takes precedence:

* **Provider-level headers** override all other configurations.
* **Request-level headers** are used if no provider-level headers are set.
* **Gateway-level settings** are used only if no headers are configured at the provider or request levels.

This hierarchy ensures consistent behavior, prioritizing the most specific configurations. Use provider-level and request-level headers for fine-tuned control, and gateway settings for general defaults.

### Hierarchy example

This example demonstrates how headers set at different levels impact caching behavior:

* **Request-level header**: The `cf-aig-cache-ttl` is set to `3600` seconds, applying this caching duration to the request by default.
* **Provider-level header**: For the fallback provider (OpenAI), `cf-aig-cache-ttl` is explicitly set to `0` seconds, overriding the request-level header and disabling caching for responses when OpenAI is used as the provider.

This shows how provider-level headers take precedence over request-level headers, allowing for granular control of caching behavior.

Terminal window

```
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \  --header 'Content-Type: application/json' \  --header 'cf-aig-cache-ttl: 3600' \  --data '[    {      "provider": "workers-ai",      "endpoint": "@cf/meta/llama-3.1-8b-instruct",      "headers": {        "Authorization": "Bearer {cloudflare_token}",        "Content-Type": "application/json"      },      "query": {        "messages": [          {            "role": "system",            "content": "You are a friendly assistant"          },          {            "role": "user",            "content": "What is Cloudflare?"          }        ]      }    },    {      "provider": "openai",      "endpoint": "chat/completions",      "headers": {        "Authorization": "Bearer {open_ai_token}",        "Content-Type": "application/json",        "cf-aig-cache-ttl": "0"      },      "query": {        "model": "gpt-4o-mini",        "stream": true,        "messages": [          {            "role": "user",            "content": "What is Cloudflare?"          }        ]      }    }  ]'
```

```json
{"@context":"https://schema.org","@type":"TechArticle","@id":"https://developers.cloudflare.com/ai-gateway/usage/universal/#page","headline":"Universal Endpoint (Deprecated) · Cloudflare AI Gateway docs","description":"Route requests to any AI provider through a single AI Gateway endpoint with support for fallbacks and retries.","url":"https://developers.cloudflare.com/ai-gateway/usage/universal/","inLanguage":"en","image":"https://developers.cloudflare.com/dev-products-preview.png","dateModified":"2026-05-08","publisher":{"@type":"Organization","name":"Cloudflare","url":"https://www.cloudflare.com/"},"isPartOf":{"@type":"WebSite","@id":"https://developers.cloudflare.com/#website","name":"Cloudflare Docs","url":"https://developers.cloudflare.com/"}}
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/ai-gateway/","name":"AI Gateway"}},{"@type":"ListItem","position":3,"item":{"@id":"/ai-gateway/usage/","name":"Using AI Gateway"}},{"@type":"ListItem","position":4,"item":{"@id":"/ai-gateway/usage/universal/","name":"Universal Endpoint (Deprecated)"}}]}
```