
Request predictions with Claude models

You can use Anthropic's SDK or curl commands to send requests to the Vertex AI endpoint using the following model names:

For Claude Opus 4, use claude-opus-4@20250514
For Claude Sonnet 4, use claude-sonnet-4@20250514
For Claude 3.7 Sonnet, use claude-3-7-sonnet@20250219
For Claude 3.5 Sonnet v2, use claude-3-5-sonnet-v2@20241022
For Claude 3.5 Haiku, use claude-3-5-haiku@20241022
For Claude 3.5 Sonnet, use claude-3-5-sonnet@20240620
For Claude 3 Opus, use claude-3-opus@20240229
For Claude 3 Haiku, use claude-3-haiku@20240307

Anthropic Claude model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.
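As a quick guard against unpinned model names, the following minimal sketch checks that a model ID carries an @ version suffix. The helper name `is_pinned_model` and the exact pattern (lowercase name, then `@`, then an 8-digit date) are assumptions inferred from the model names listed above, not a rule published by Vertex AI.

```python
import re

# Assumed pattern: a lowercase model name, "@", then an 8-digit date stamp.
# This is inferred from the model names listed above, not an official rule.
_VERSIONED_MODEL = re.compile(r"[a-z0-9-]+@\d{8}")


def is_pinned_model(model_id: str) -> bool:
    """Return True if the model ID ends with an @YYYYMMDD version suffix."""
    return _VERSIONED_MODEL.fullmatch(model_id) is not None


print(is_pinned_model("claude-3-7-sonnet@20250219"))
print(is_pinned_model("claude-3-7-sonnet"))
```

A check like this can run before any request is sent, so a missing suffix fails fast in your own code.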

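Before you begin

To use Anthropic Claude models with Vertex AI, you must perform the following steps. The Vertex AI API (aiplatform.googleapis.com) must be enabled to use Vertex AI. If you already have a project with the Vertex AI API enabled, you can use that project instead of creating a new one.

Make sure that you have the permissions required to enable and use partner models. For more information, see Grant the required permissions.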

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Go to project selector

Make sure that billing is enabled for your Google Cloud project.

Enable the Vertex AI API.

Enable the API

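Go to one of the following Model Garden model cards, then click Enable:
Anthropic recommends that you enable 30-day logging of your prompt and completion activity to record any model misuse. To enable logging, see [Log requests and responses][logging].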

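Use the Anthropic SDK

You can make API requests to the Anthropic Claude models by using the Anthropic Claude SDK. To learn more, see the following:

Make a streaming call to a Claude model by using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to make a streaming call to a Claude model.

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

The following sample uses a regional endpoint. To use the global endpoint, see Specify the global endpoint.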

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
result = []

with client.messages.stream(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
        result.append(text)

# Example response:
# Here's a simple recipe for delicious banana bread:
# Ingredients:
# - 2-3 ripe bananas, mashed
# - 1/3 cup melted butter
# ...
# ...
# 8. Bake for 50-60 minutes, or until a toothpick inserted into the center comes out clean.
# 9. Let cool in the pan for a few minutes, then remove and cool completely on a wire rack.

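Make a unary call to a Claude model by using the Anthropic Vertex SDK

The following code sample uses the Anthropic Vertex SDK to make a unary call to a Claude model.

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

The following sample uses a regional endpoint. To use the global endpoint, see Specify the global endpoint.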

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

from anthropic import AnthropicVertex

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
)
print(message.model_dump_json(indent=2))
# Example response:
# {
#   "id": "msg_vrtx_0162rhgehxa9rvJM5BSVLZ9j",
#   "content": [
#     {
#       "text": "Here's a simple recipe for delicious banana bread:\n\nIngredients:\n- 2-3 ripe bananas...
#   ...

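Use a curl command

You can use a curl command to make a request to the Vertex AI endpoint. The curl command specifies which supported Claude model you want to use.

Anthropic Claude model versions must be used with a suffix that starts with an @ symbol (such as claude-3-7-sonnet@20250219 or claude-3-5-haiku@20241022) to guarantee consistent behavior.

The following topic shows you how to create a curl command and includes a sample curl command.

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

The following sample uses a regional endpoint. To use the global endpoint, see Specify the global endpoint.

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Anthropic Claude models. To use the global endpoint, see Specify the global endpoint.
MODEL: The name of the model that you want to use.
ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
STREAM: A boolean that specifies whether the response is streamed. Stream your response to reduce the latency that end users perceive. Set to true to stream the response and false to return the response all at once.
CONTENT: The content, such as text, of the user or assistant message.
MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.
TOP_P (optional): Top-P changes how the model selects tokens for output. Tokens are selected from the most probable (see top-K) to the least probable until the sum of their probabilities equals the top-P value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-P value is 0.5, the model selects either A or B as the next token by using temperature and excludes C as a candidate.
Specify a lower value for less random responses and a higher value for more random responses.
TOP_K (optional): Top-K changes how the model selects tokens for output. A top-K of 1 means that the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding). A top-K of 3 means that the next token is selected from among the three most probable tokens by using temperature.
For each token selection step, the top-K tokens with the highest probabilities are sampled. The tokens are then further filtered based on top-P, and the final token is selected by using temperature sampling.

Specify a lower value for less random responses and a higher value for more random responses.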
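TYPE: (Claude 3.7 Sonnet only) To enable extended thinking mode, specify enabled.
BUDGET_TOKENS: If you enable extended thinking, you must specify the number of tokens that the model can use for its internal reasoning as part of the output. A larger budget allows more thorough analysis of complex problems and can improve response quality. You must specify a value greater than or equal to 1024 but less than MAX_TOKENS.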
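The interaction between top-K and top-P described above can be illustrated with a small, self-contained sketch. It is not how the model is implemented. It only reproduces the filtering order from the parameter descriptions (top-K first, then top-P) on a toy probability table. The function name `filter_candidates` and the small floating-point tolerance are assumptions made for this illustration.

```python
from typing import Dict, List, Optional


def filter_candidates(
    probs: Dict[str, float],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> List[str]:
    """Return the tokens that remain candidates after top-K, then top-P."""
    # Rank tokens from most probable to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the K most probable tokens
    if top_p is None:
        return [token for token, _ in ranked]

    kept: List[str] = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        # Stop once the kept tokens reach the top-P mass (tolerance is an
        # assumption to avoid floating-point surprises).
        if cumulative >= top_p - 1e-9:
            break
    return kept


# The example from the TOP_P description: A=0.3, B=0.2, C=0.1, top-P=0.5.
print(filter_candidates({"A": 0.3, "B": 0.2, "C": 0.1}, top_p=0.5))
```

Temperature sampling would then pick one token from the returned candidates. That step is omitted here.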

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict
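If you build this URL in code rather than by hand, a sketch like the following may help. The helper name `build_endpoint` is hypothetical. Choosing between the `streamRawPredict` and `rawPredict` methods from a boolean flag is an assumption for this illustration, based on the two methods that appear in the examples on this page. It covers regional endpoints only.

```python
def build_endpoint(project_id: str, location: str, model: str, stream: bool = True) -> str:
    """Build a regional publisher-model URL in the format shown above."""
    # Assumption: pick the streaming or unary method from a flag.
    method = "streamRawPredict" if stream else "rawPredict"
    return (
        f"https://{location}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{location}"
        f"/publishers/anthropic/models/{model}:{method}"
    )


# "my-project" is a placeholder project ID.
print(build_endpoint("my-project", "us-east5", "claude-3-7-sonnet@20250219"))
```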

Request JSON body:

{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [
   {
    "role": "ROLE",
    "content": "CONTENT"
   }],
  "max_tokens": MAX_TOKENS,
  "stream": STREAM,
  "thinking": {
    "type": "TYPE",
    "budget_tokens": BUDGET_TOKENS
  }
}
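The request body above could also be assembled in Python, which makes it easy to enforce the BUDGET_TOKENS constraint (at least 1024 and less than MAX_TOKENS) before sending anything. This is a minimal sketch: the function name `build_request_body` is hypothetical, and the thinking type value `"enabled"` follows the Anthropic Messages API.

```python
import json
from typing import Optional


def build_request_body(
    content: str,
    max_tokens: int,
    stream: bool = False,
    role: str = "user",
    budget_tokens: Optional[int] = None,
) -> dict:
    """Assemble a request body with the fields shown above."""
    body = {
        "anthropic_version": "vertex-2023-10-16",
        "messages": [{"role": role, "content": content}],
        "max_tokens": max_tokens,
        "stream": stream,
    }
    if budget_tokens is not None:
        # Constraint from the parameter list: 1024 <= budget < max_tokens.
        if not 1024 <= budget_tokens < max_tokens:
            raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
        # "enabled" is the value the Anthropic Messages API accepts.
        body["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return body


print(json.dumps(build_request_body("Hello!", 2048, budget_tokens=1024), indent=2))
```

The printed JSON can be saved as request.json and used with the commands that follow. Omitting `budget_tokens` leaves the `thinking` field out entirely.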

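To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: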

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict"

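PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: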

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:streamRawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "id":"msg_012NDLxqh6LsztWCU7zTb14C",
  "type":"message",
  "role":"assistant",
  "content":[{
    "type":"text",
    "text":"Hello! Nice to meet you."
  }],
  "model":"claude-2.1",
  "stop_reason":"end_turn",
  "stop_sequence":null,
  "usage":{
    "input_tokens":11,
    "output_tokens":11
  }
}
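Once you have a response like the one above, the generated text lives in the `content` blocks of type `text`, and the token counts are under `usage`. The following sketch parses the sample response shown above. The helper name `extract_text` is hypothetical.

```python
import json

# The sample response from above, embedded as a string for illustration.
SAMPLE_RESPONSE = """
{
  "id": "msg_012NDLxqh6LsztWCU7zTb14C",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Hello! Nice to meet you."}],
  "model": "claude-2.1",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 11, "output_tokens": 11}
}
"""


def extract_text(response: dict) -> str:
    """Concatenate the text of every content block whose type is "text"."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


response = json.loads(SAMPLE_RESPONSE)
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
print(extract_text(response))  # Hello! Nice to meet you.
print(total_tokens)            # 22
```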

Example curl command

MODEL_ID="MODEL"
LOCATION="us-central1"
PROJECT_ID="PROJECT_ID"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/anthropic/models/${MODEL_ID}:streamRawPredict -d \
'{
  "anthropic_version": "vertex-2023-10-16",
  "messages": [{
    "role": "user",
    "content": "Hello!"
  }],
  "max_tokens": 50,
  "stream": true}'
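With "stream": true, the response arrives as a series of server-sent events rather than a single JSON object. The following sketch shows one way to stitch the text back together from such events. The sample lines are hand-written for illustration and follow the event shapes of Anthropic's streaming Messages format (`content_block_delta` events carrying `text_delta` payloads). They were not captured from a Vertex AI endpoint, and the helper name `collect_stream_text` is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the text deltas found in server-sent event lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta.get("text", ""))
    return "".join(parts)


# Hand-written sample events (an assumption, not real endpoint output).
SAMPLE_STREAM = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "!"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

print(collect_stream_text(SAMPLE_STREAM))
```

In practice the Anthropic Vertex SDK's `client.messages.stream(...)` helper shown earlier handles this parsing for you. A manual parser is mainly useful when you call the REST endpoint directly.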

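Tool use (function calling)

Anthropic Claude models support tools and function calling to enhance a model's capabilities. For more information, see the Tool use overview in the Anthropic documentation.

The following samples demonstrate how to use tools by using an SDK or a curl command. The samples search for restaurants near San Francisco that are open now.

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

The following sample uses a regional endpoint. To use the global endpoint, see Specify the global endpoint.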

# TODO(developer): Vertex AI SDK - uncomment below & run
# pip3 install --upgrade --user google-cloud-aiplatform
# gcloud auth application-default login
# pip3 install -U 'anthropic[vertex]'
from anthropic import AnthropicVertex

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"

client = AnthropicVertex(project_id=PROJECT_ID, region="us-east5")
message = client.messages.create(
    model="claude-3-5-sonnet-v2@20241022",
    max_tokens=1024,
    tools=[
        {
            "name": "text_search_places_api",
            "description": "returns information about a set of places based on a string",
            "input_schema": {
                "type": "object",
                "properties": {
                    "textQuery": {
                        "type": "string",
                        "description": "The text string on which to search",
                    },
                    "priceLevels": {
                        "type": "array",
                        "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]",
                    },
                    "openNow": {
                        "type": "boolean",
                        "description": "whether those places are open for business.",
                    },
                },
                "required": ["textQuery"],
            },
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "What are some affordable and good Italian restaurants open now in San Francisco??",
        }
    ],
)
print(message.model_dump_json(indent=2))
# Example response:
# {
#   "id": "msg_vrtx_018pk1ykbbxAYhyWUdP1bJoQ",
#   "content": [
#     {
#       "text": "To answer your question about affordable and good Italian restaurants
#       that are currently open in San Francisco....
# ...

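REST

The following sample uses a regional endpoint. To use the global endpoint, see Specify the global endpoint.

Before using any of the request data, make the following replacements:

LOCATION: A region that supports Anthropic Claude models. To use the global endpoint, see Specify the global endpoint.
MODEL: The name of the model to use.
ROLE: The role associated with a message. You can specify user or assistant. The first message must use the user role. Claude models operate with alternating user and assistant turns. If the final message uses the assistant role, the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
STREAM: A boolean that specifies whether the response is streamed. Stream your response to reduce the latency that end users perceive. Set to true to stream the response and false to return the response all at once.
CONTENT: The content, such as text, of the user or assistant message.
MAX_TOKENS: The maximum number of tokens that can be generated in the response. A token is approximately 3.5 characters. 100 tokens correspond to roughly 60-80 words.
Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL: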

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict

Request JSON body:

{
  "anthropic_version": "vertex-2023-10-16",
  "max_tokens": MAX_TOKENS,
  "stream": STREAM,
  "tools": [
    {
      "name": "text_search_places_api",
      "description": "Returns information about a set of places based on a string",
      "input_schema": {
        "type": "object",
        "properties": {
          "textQuery": {
            "type": "string",
            "description": "The text string on which to search"
          },
          "priceLevels": {
            "type": "array",
            "description": "Price levels to query places, value can be one of [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE, PRICE_LEVEL_EXPENSIVE, PRICE_LEVEL_VERY_EXPENSIVE]"
          },
          "openNow": {
            "type": "boolean",
            "description": "Describes whether a place is open for business at the time of the query."
          }
        },
        "required": ["textQuery"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "What are some affordable and good Italian restaurants that are open now in San Francisco??"
    }
  ]
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict"

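PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command: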

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/anthropic/models/MODEL:rawPredict" | Select-Object -Expand Content

You should receive a JSON response similar to the following.

Response

{
  "id": "msg_vrtx_01ErR7VMNQdnvDt3n7Nmc4ER",
  "type": "message",
  "role": "assistant",
  "model": "claude-3-opus-20240229",
  "content": [
    {
      "type": "text",
      "text": "\nTo find affordable and good Italian restaurants that are currently open in San Francisco, the text_search_places_api tool seems most relevant. \n\nThe required textQuery parameter can be inferred as \"Italian restaurants in San Francisco\", since the user specified Italian restaurants and the location of San Francisco.\n\nTwo optional parameters are also relevant:\nopenNow - this should be set to true, since the user specified they want restaurants open now\npriceLevels - to find affordable restaurants, this can be set to [PRICE_LEVEL_INEXPENSIVE, PRICE_LEVEL_MODERATE]\n\nWith the textQuery provided and the two optional parameters that can help narrow the results to match the user's criteria, we have enough information to make a good call to the text_search_places_api tool to try to answer the user's request.\n"
    },
    {
      "type": "tool_use",
      "id": "toolu_vrtx_01TAJCTkxe8HhRoaQ69N4ouP",
      "name": "text_search_places_api",
      "input": {
        "textQuery": "Italian restaurants in San Francisco",
        "openNow": true,
        "priceLevels": [
          "PRICE_LEVEL_INEXPENSIVE",
          "PRICE_LEVEL_MODERATE"
        ]
      }
    }
  ],
  "stop_reason": "tool_use",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 727,
    "output_tokens": 308
  }
}
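A response with "stop_reason": "tool_use" means the model is asking your code to run the tool and send the result back. The sketch below shows one way to do that step: find the `tool_use` blocks, call a local function, and build the follow-up user message with `tool_result` blocks as described in Anthropic's tool use documentation. The `search_places` function is a hypothetical stand-in that returns canned data (no real Places API is called), and the response is trimmed to the fields the sketch needs.

```python
import json

# Trimmed copy of the response above (the text block is shortened).
RESPONSE = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "..."},
        {
            "type": "tool_use",
            "id": "toolu_vrtx_01TAJCTkxe8HhRoaQ69N4ouP",
            "name": "text_search_places_api",
            "input": {
                "textQuery": "Italian restaurants in San Francisco",
                "openNow": True,
                "priceLevels": ["PRICE_LEVEL_INEXPENSIVE", "PRICE_LEVEL_MODERATE"],
            },
        },
    ],
    "stop_reason": "tool_use",
}


def search_places(textQuery, priceLevels=None, openNow=False):
    """Hypothetical stand-in for a real places search; returns canned data."""
    return [{"name": "Example Trattoria", "query": textQuery, "openNow": openNow}]


def build_tool_result_message(response: dict, handlers: dict) -> dict:
    """Run each requested tool and wrap the outputs in a user message."""
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue
        output = handlers[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the tool_use block's id
            "content": json.dumps(output),
        })
    return {"role": "user", "content": results}


follow_up = build_tool_result_message(RESPONSE, {"text_search_places_api": search_places})
print(json.dumps(follow_up, indent=2))
```

To continue the conversation, you would append the assistant's response and this follow-up message to `messages` and send the request again with the same `tools` list.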

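Use Vertex AI Studio

For some Anthropic Claude models, you can use Vertex AI Studio to quickly prototype and test generative AI models in the Google Cloud console. For example, you can use Vertex AI Studio to compare Claude model responses with those of other supported models, such as Google Gemini.

For more information, see Quickstart: Send text prompts to Gemini using Vertex AI Studio.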

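Anthropic Claude quotas and region availability

Claude models have regional quotas and, for models that support a global endpoint, a global quota. Quotas are specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

To maintain overall service performance and acceptable use, the maximum quotas might vary by account and, in some cases, access might be restricted. View your project's quotas on the Quotas & System Limits page in the Google Cloud console. You must also have the following quotas available:

Online prediction requests per base model per minute per region per base_model defines your QPM quota.
For TPM, there are three quota values that apply to particular models:
- For models that count input and output tokens together, Online prediction tokens per minute per base model per minute per region per base_model defines the model TPM quota.
- For models that count input and output tokens separately, online_prediction_input_tokens_per_minute_per_base_model defines the input TPM quota and online_prediction_output_tokens_per_minute_per_base_model defines the output TPM quota.
To see which models count input and output tokens separately, see Quotas by model and region.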
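The difference between a combined TPM quota and separate input and output TPM quotas can be made concrete with a small sketch. The `TpmQuota` class and `within_tpm` function are hypothetical names for this illustration. The quota numbers are the `us-east5` defaults for Claude 3.5 Sonnet v2 and Claude Sonnet 4 from the table on this page. The sketch ignores which kinds of input tokens (cached or uncached) a model counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TpmQuota:
    """Either a combined TPM limit, or separate input and output limits."""
    combined: Optional[int] = None
    input: Optional[int] = None
    output: Optional[int] = None


def within_tpm(quota: TpmQuota, input_tokens: int, output_tokens: int) -> bool:
    """Check one minute of token usage against a TPM quota."""
    if quota.combined is not None:
        # Input and output tokens are counted together.
        return input_tokens + output_tokens <= quota.combined
    # Input and output tokens are counted separately.
    return input_tokens <= quota.input and output_tokens <= quota.output


sonnet_35_v2 = TpmQuota(combined=540_000)             # us-east5 default
sonnet_4 = TpmQuota(input=280_000, output=20_000)     # us-east5 default

print(within_tpm(sonnet_35_v2, 500_000, 30_000))  # True: 530,000 <= 540,000
print(within_tpm(sonnet_4, 100_000, 25_000))      # False: output exceeds 20,000
```

With separate quotas, a workload that generates long outputs can hit the output limit well before the input limit, which a combined quota would not reveal.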

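Input tokens

The following list defines the input tokens that can count towards your input TPM quota. The input tokens that each model counts can vary. To see which input tokens a model counts, see Quotas by model and region.

Input tokens includes all input tokens, including cache read and cache write tokens.
Uncached input tokens includes only the input tokens that weren't read from a cache (that is, it excludes cache read tokens).
Cache write tokens includes the tokens that were used to create or update a cache.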

Quotas by model and region

The following table shows the default quotas and supported context length for each model in each region.

Model	Region	Quotas	Context length
Claude Opus 4
	`us-east5`	QPM: 25; Input TPM: 60,000 (uncached and cache write); Output TPM: 6,000	200,000
Claude Sonnet 4
	`us-east5`	QPM: 35; Input TPM: 280,000 (uncached and cache write); Output TPM: 20,000	200,000
	`europe-west1`	QPM: 25; Input TPM: 180,000 (uncached and cache write); Output TPM: 20,000	200,000
	`global`	QPM: 15; Input TPM: 130,000 (uncached and cache write); Output TPM: 10,000	200,000
Claude 3.7 Sonnet
	`us-east5`	QPM: 55; TPM: 500,000 (uncached input and output)	200,000
	`europe-west1`	QPM: 40; TPM: 300,000 (uncached input and output)	200,000
	`global`	QPM: 15; TPM: 130,000 (uncached input and output)	200,000
Claude 3.5 Sonnet v2
	`us-east5`	QPM: 90; TPM: 540,000 (input and output)	200,000
	`europe-west1`	QPM: 55; TPM: 330,000 (input and output)	200,000
	`global`	QPM: 25; TPM: 140,000 (input and output)	200,000
Claude 3.5 Haiku
	`us-east5`	QPM: 80; TPM: 350,000 (input and output)	200,000
Claude 3.5 Sonnet
	`us-east5`	QPM: 80; TPM: 350,000 (input and output)	200,000
	`europe-west1`	QPM: 130; TPM: 600,000 (input and output)	200,000
	`asia-southeast1`	QPM: 35; TPM: 150,000 (input and output)	200,000
Claude 3 Opus
	`us-east5`	QPM: 20; TPM: 105,000 (input and output)	200,000
Claude 3 Haiku
	`us-east5`	QPM: 245; TPM: 600,000 (input and output)	200,000
	`europe-west1`	QPM: 75; TPM: 181,000 (input and output)	200,000
	`asia-southeast1`	QPM: 70; TPM: 174,000 (input and output)	200,000

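If you want to increase any of your quotas for Generative AI on Vertex AI, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see Work with quotas.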
