---
title: llama-4-scout-17b-16e-instruct
description: Meta's Llama 4 Scout is a 17 billion parameter model with 16 experts that is natively multimodal. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.
image: https://developers.cloudflare.com/dev-products-preview.png
---

![Meta logo](https://developers.cloudflare.com/_astro/meta.BR4nfp35.svg) 

#  llama-4-scout-17b-16e-instruct 

Text Generation • Meta 

`@cf/meta/llama-4-scout-17b-16e-instruct` 

Meta's Llama 4 Scout is a 17 billion parameter model with 16 experts that is natively multimodal. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

| Model Info                                                                           |                                                                                      |
| ------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------ |
| Context Window[ ↗](https://developers.cloudflare.com/workers-ai/glossary/)           | 131,000 tokens                                                                       |
| Terms and License                                                                    | [link ↗](https://github.com/meta-llama/llama-models/blob/main/models/llama4/LICENSE) |
| Function calling [ ↗](https://developers.cloudflare.com/workers-ai/function-calling) | Yes                                                                                  |
| Vision                                                                               | Yes                                                                                  |
| Batch                                                                                | Yes                                                                                  |
| Unit Pricing                                                                         | $0.27 per M input tokens, $0.85 per M output tokens                                  |
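
As a rough illustration of the unit pricing above, the following sketch estimates the cost of a single request from its token counts. The function name `estimateCostUsd` is hypothetical, and the calculation assumes simple linear per-token billing with no rounding or free allocation; actual billing may differ.

```typescript
// Unit prices copied from the Model Info table (USD per million tokens).
const INPUT_USD_PER_M = 0.27;
const OUTPUT_USD_PER_M = 0.85;

// Hypothetical helper: linear cost estimate, not an official billing formula.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  if (inputTokens < 0 || outputTokens < 0) {
    throw new RangeError("token counts must be non-negative");
  }
  return (
    (inputTokens / 1e6) * INPUT_USD_PER_M +
    (outputTokens / 1e6) * OUTPUT_USD_PER_M
  );
}
```

Under these assumptions, a request with 2,000 input tokens and 500 output tokens would cost roughly $0.000965.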

## Playground

Try out this model with the Workers AI LLM Playground. It requires no setup or authentication and offers an instant way to preview and test a model directly in the browser.

[ Launch the LLM Playground ](https://playground.ai.cloudflare.com/?model=@cf/meta/llama-4-scout-17b-16e-instruct) 

## Usage

Worker (Streaming)

```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    // stream: true returns a stream of server-sent events.
    const stream = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```

TypeScript

```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const response = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", { messages });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

Python

```python
import os
import requests

ACCOUNT_ID = "your-account-id"
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

prompt = "Tell me all about PEP-8"
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-4-scout-17b-16e-instruct",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ]
    },
)
result = response.json()
print(result)
```

curl

```bash
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-4-scout-17b-16e-instruct \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'
```

OpenAI compatible endpoints 

Workers AI also supports OpenAI-compatible API endpoints for `/v1/chat/completions` and `/v1/embeddings`. For more details, refer to [Configurations](https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/).
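
As a minimal sketch of how a request to the `/v1/chat/completions` endpoint might be assembled, the helper below builds the URL, headers, and body without sending anything. The helper name `buildChatCompletionRequest` is hypothetical, and the base URL (`.../accounts/{account_id}/ai/v1`) and the OpenAI-style `model` and `messages` body fields are assumptions to verify against the configuration page linked above.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical helper: builds (but does not send) an OpenAI-style request.
// The URL layout is an assumption; check it against the linked configuration docs.
function buildChatCompletionRequest(
  accountId: string,
  apiToken: string,
  messages: ChatMessage[],
) {
  const url =
    "https://api.cloudflare.com/client/v4/accounts/" +
    accountId +
    "/ai/v1/chat/completions";
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: "Bearer " + apiToken,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "@cf/meta/llama-4-scout-17b-16e-instruct",
        messages,
      }),
    },
  };
}
```

The result can be passed to `fetch(url, init)`. Keeping construction separate from sending makes the request easy to inspect or log before any network call is made.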

## Parameters

Synchronous — Send a request and receive a complete response 

Input

| Parameter | Type | Constraints | Description |
| --- | --- | --- | --- |
| `prompt` | `string` | required, minLength: 1 | The input text prompt for the model to generate a response. |
| `guided_json` | `object` | | JSON schema that should be fulfilled for the response. |
| `response_format` | `object` | | |
| `raw` | `boolean` | default: false | If true, a chat template is not applied and you must adhere to the specific model's expected formatting. |
| `stream` | `boolean` | default: false | If true, the response will be streamed back incrementally using server-sent events (SSE). |
| `max_tokens` | `integer` | default: 256 | The maximum number of tokens to generate in the response. |
| `temperature` | `number` | default: 0.15, minimum: 0, maximum: 5 | Controls the randomness of the output; higher values produce more random results. |
| `top_p` | `number` | minimum: 0, maximum: 2 | Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. |
| `top_k` | `integer` | minimum: 1, maximum: 50 | Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. |
| `seed` | `integer` | minimum: 1, maximum: 9999999999 | Random seed for reproducibility of the generation. |
| `repetition_penalty` | `number` | minimum: 0, maximum: 2 | Penalty for repeated tokens; higher values discourage repetition. |
| `frequency_penalty` | `number` | minimum: 0, maximum: 2 | Decreases the likelihood of the model repeating the same lines verbatim. |
| `presence_penalty` | `number` | minimum: 0, maximum: 2 | Increases the likelihood of the model introducing new topics. |

Output

| Field | Type | Description |
| --- | --- | --- |
| `response` | `string` | The generated text response from the model. |
| `usage` | `object` | Usage statistics for the inference request. |
| `tool_calls` | `array` | An array of tool call requests made during the response generation. |
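
The numeric limits documented above can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical `validateInput` helper and a `SyncInput` type that covers only the scalar parameters listed here; it checks ranges only and is not the service's own validation logic.

```typescript
// Subset of the documented synchronous input (scalar fields only).
interface SyncInput {
  prompt: string;
  stream?: boolean;
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  top_k?: number;
  seed?: number;
  repetition_penalty?: number;
  frequency_penalty?: number;
  presence_penalty?: number;
}

// [parameter, minimum, maximum] as documented for this model.
const RANGES: Array<[keyof SyncInput, number, number]> = [
  ["temperature", 0, 5],
  ["top_p", 0, 2],
  ["top_k", 1, 50],
  ["seed", 1, 9999999999],
  ["repetition_penalty", 0, 2],
  ["frequency_penalty", 0, 2],
  ["presence_penalty", 0, 2],
];

// Hypothetical helper: returns a list of human-readable range violations.
function validateInput(input: SyncInput): string[] {
  const errors: string[] = [];
  if (input.prompt.length < 1) {
    errors.push("prompt must be at least 1 character");
  }
  for (const [name, min, max] of RANGES) {
    const value = input[name];
    if (value === undefined) continue; // optional parameter not set
    if (typeof value !== "number" || value < min || value > max) {
      errors.push(name + " must be between " + min + " and " + max);
    }
  }
  return errors;
}
```

An empty array means the input is within the documented ranges; for example, `validateInput({ prompt: "Hi", temperature: 0.15, top_k: 40 })` reports no violations.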

Streaming — Send a request with \`stream: true\` and receive server-sent events 

Input

| Parameter | Type | Constraints | Description |
| --- | --- | --- | --- |
| `prompt` | `string` | required, minLength: 1 | The input text prompt for the model to generate a response. |
| `guided_json` | `object` | | JSON schema that should be fulfilled for the response. |
| `response_format` | `object` | | |
| `raw` | `boolean` | default: false | If true, a chat template is not applied and you must adhere to the specific model's expected formatting. |
| `stream` | `boolean` | default: false | If true, the response will be streamed back incrementally using server-sent events (SSE). |
| `max_tokens` | `integer` | default: 256 | The maximum number of tokens to generate in the response. |
| `temperature` | `number` | default: 0.15, minimum: 0, maximum: 5 | Controls the randomness of the output; higher values produce more random results. |
| `top_p` | `number` | minimum: 0, maximum: 2 | Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. |
| `top_k` | `integer` | minimum: 1, maximum: 50 | Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. |
| `seed` | `integer` | minimum: 1, maximum: 9999999999 | Random seed for reproducibility of the generation. |
| `repetition_penalty` | `number` | minimum: 0, maximum: 2 | Penalty for repeated tokens; higher values discourage repetition. |
| `frequency_penalty` | `number` | minimum: 0, maximum: 2 | Decreases the likelihood of the model repeating the same lines verbatim. |
| `presence_penalty` | `number` | minimum: 0, maximum: 2 | Increases the likelihood of the model introducing new topics. |

Output

| Property | Value |
| --- | --- |
| type | `string` |
| contentType | `text/event-stream` |
| format | `binary` |
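
Because the streaming output is a `text/event-stream` body, a client has to split it into events and extract the generated text. The sketch below shows one way to do that on an already-received string. It assumes each event is a `data:` line carrying a JSON object with a `response` string and that the stream ends with `data: [DONE]`; that event shape is an assumption not stated in the schema above, and `collectStreamedText` is a hypothetical name.

```typescript
// Hypothetical helper: concatenates the text carried by SSE "data:" lines.
// Assumes events look like `data: {"response":"..."}` and end with `data: [DONE]`.
function collectStreamedText(sseText: string): string {
  let text = "";
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.indexOf("data:") !== 0) continue; // skip blank and non-data lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // assumed end-of-stream marker
    try {
      const event = JSON.parse(payload) as { response?: unknown };
      if (typeof event.response === "string") text += event.response;
    } catch (_err) {
      // Ignore payloads that are not valid JSON.
    }
  }
  return text;
}
```

This version works on a complete string for clarity. A consumer reading chunks from a live stream would also need to buffer partial lines across chunk boundaries before parsing them.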

Batch — Send multiple requests in a single API call 

Input

| Parameter | Type | Constraints | Description |
| --- | --- | --- | --- |
| `requests` | `array` | required | |

Output

| Field | Type | Description |
| --- | --- | --- |
| `response` | `string` | The generated text response from the model. |
| `usage` | `object` | Usage statistics for the inference request. |
| `tool_calls` | `array` | An array of tool call requests made during the response generation. |
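
A batch payload wraps several inputs in the required `requests` array. The sketch below builds such a payload from a list of prompts. It assumes each array element has the same shape as a synchronous input (here only `prompt` and `max_tokens`); the element schema is collapsed in the listing above, so treat that shape and the `buildBatchPayload` name as hypothetical and check the raw batch input schema.

```typescript
// Assumed element shape: a subset of the synchronous input parameters.
interface BatchItem {
  prompt: string;
  max_tokens?: number;
}

// Hypothetical helper: wraps prompts in the documented `requests` array.
function buildBatchPayload(
  prompts: string[],
  maxTokens?: number,
): { requests: BatchItem[] } {
  const requests = prompts
    .filter((prompt) => prompt.length >= 1) // prompt has minLength: 1
    .map((prompt) => {
      const item: BatchItem = { prompt };
      if (maxTokens !== undefined) item.max_tokens = maxTokens;
      return item;
    });
  return { requests };
}
```

Empty prompts are dropped rather than sent, since the documented minimum prompt length is 1.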

## API Schemas (Raw)

* Synchronous Input: [sync-input.json](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/sync-input.json)
* Synchronous Output: [sync-output.json](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/sync-output.json)
* Streaming Input: [streaming-input.json](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/streaming-input.json)
* Streaming Output: [streaming-output.json](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/streaming-output.json)
* Batch Input: [batch-input.json](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/batch-input.json)
* Batch Output: [batch-output.json](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/batch-output.json)
