Vercel AI Deployment Best Practices | 2025 Guide to AI App Performance and Cost Optimization

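Deploying an AI app on Vercel is easy.

Deploying it *well* means thinking about a lot more.

Performance, cost, security, reliability...

This article collects best practices for deploying AI apps so that yours is both fast and cost-efficient.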

Performance Optimization

Choose the Right Runtime

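Serverless Functions vs Edge Functions:

Item	Serverless	Edge
Cold start	200-500ms	Near zero
Execution time limit	10s-900s	30s
Node.js APIs	Full support	Partial support
Best for	Heavy computation	Simple AI calls

These figures are approximate: cold-start times and duration limits depend on your plan and change over time, so confirm them against Vercel's current documentation.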

Edge Functions are a good fit for:

Simple forwarding of AI requests
Latency-sensitive use cases
Users spread around the world

// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export const runtime = 'edge'; // Use the Edge Runtime

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-3.5-turbo'),
    messages,
  });

  return result.toDataStreamResponse();
}

Serverless is a good fit for:

Code that needs the full Node.js API
Complex data processing
Longer execution times
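If you stay on the Serverless (Node.js) runtime because a task runs long, Next.js lets each route declare its own duration limit through route segment config. A minimal sketch, assuming the App Router and a hypothetical route path; `60` is an illustrative value, and the maximum you can set depends on your Vercel plan:

```typescript
// app/api/report/route.ts (hypothetical route for a longer-running AI task)
export const runtime = 'nodejs'; // Full Node.js APIs (the default runtime)
export const maxDuration = 60;   // Seconds; must not exceed your plan's limit
```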

Use Streaming Responses

Always stream; don't make users wait for the complete response:

// ✅ Good: stream the response
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await streamText({
  model: openai('gpt-4-turbo'),
  messages,
});

return result.toDataStreamResponse();

// ❌ Bad: wait for the complete response
import { generateText } from 'ai';

const result = await generateText({
  model: openai('gpt-4-turbo'),
  messages,
});

// The user sees nothing until the whole answer has been generated
return Response.json({ content: result.text });

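Benefits of streaming:

Better user experience (text appears right away)
Lower risk of timeouts, since the client receives data early (function duration limits still apply to the whole stream)
Feels faster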
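The "feels faster" point can be made concrete with a little arithmetic. This sketch compares how long a user waits before seeing any text; the latency and throughput numbers are illustrative assumptions, not measurements:

```typescript
// Time until the user sees the first piece of text, in milliseconds.
// firstTokenMs: delay before the model emits its first token (assumed)
// tokensPerSecond: generation speed once output starts (assumed)
function timeToFirstVisibleTextMs(
  outputTokens: number,
  tokensPerSecond: number,
  firstTokenMs: number,
  streaming: boolean,
): number {
  if (streaming) {
    // With streaming, text appears as soon as the first token arrives
    return firstTokenMs;
  }
  // Without streaming, nothing is shown until the whole answer is generated
  return firstTokenMs + (outputTokens / tokensPerSecond) * 1000;
}

// Illustrative numbers: a 500-token answer at 50 tokens/s, 300 ms to first token
console.log(timeToFirstVisibleTextMs(500, 50, 300, true));  // 300
console.log(timeToFirstVisibleTextMs(500, 50, 300, false)); // 10300
```

Total generation time is the same in both cases; streaming only changes when the first text becomes visible.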

Limit Response Length

const result = await streamText({
  model: openai('gpt-4-turbo'),
  messages,
  maxTokens: 1000, // Cap the response length in tokens (renamed maxOutputTokens in AI SDK 5)
});

Why cap it:

Lower API cost
Faster responses
Fewer timeouts
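Capping output tokens also caps the worst-case cost of a single request. A sketch of that calculation; the per-million-token prices are placeholders, so substitute your provider's current pricing:

```typescript
// Prices in USD per 1M tokens (placeholder values, not real pricing)
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Cost of one request given its input and output token counts
function requestCostUSD(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

// Upper bound on cost when maxTokens caps the output length
function worstCaseCostUSD(inputTokens: number, maxTokens: number, p: Pricing): number {
  return requestCostUSD(inputTokens, maxTokens, p);
}

const pricing: Pricing = { inputPerMillion: 10, outputPerMillion: 30 };
console.log(worstCaseCostUSD(2_000, 1_000, pricing)); // 0.05
```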

Choose the Right Model

Model	Speed	Cost	Capability
GPT-3.5-turbo	Fast	Low	Moderate
GPT-4-turbo	Medium	High	Strong
Claude 3 Haiku	Very fast	Low	Moderate
Claude 3 Opus	Slow	Very high	Very strong

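Selection strategy:

Simple tasks (classification, extraction) → a fast, inexpensive model
Complex tasks (reasoning, creative writing) → a more capable model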

import { openai } from '@ai-sdk/openai';

// Pick a model based on the task type
function selectModel(taskType: string) {
  switch (taskType) {
    case 'simple':
      return openai('gpt-3.5-turbo');
    case 'complex':
      return openai('gpt-4-turbo');
    default:
      return openai('gpt-3.5-turbo');
  }
}

Cost Optimization

Monitor API Usage

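Log usage in your route handler. Token counts and total duration are only known once generation ends, so record them in the `onFinish` callback:

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const startTime = Date.now();
  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-4-turbo'),
    messages,
    // Runs after the stream has finished, so usage and duration are final
    onFinish({ usage }) {
      console.log({
        endpoint: '/api/chat',
        model: 'gpt-4-turbo',
        promptTokens: usage.promptTokens, // AI SDK 4 field names
        completionTokens: usage.completionTokens,
        duration: Date.now() - startTime,
        timestamp: new Date().toISOString(),
      });
    },
  });

  return result.toDataStreamResponse();
}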

Set a budget in the OpenAI dashboard:

Log in to the OpenAI Dashboard
Settings → Limits
Set a monthly budget cap
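A provider-side budget cap is a last line of defense; you can also stop spending earlier inside the app. Below is a minimal in-memory sketch. `DailyBudget` is a hypothetical helper, and because each serverless instance has its own memory, a real deployment would keep the running total in shared storage such as Redis:

```typescript
// Tracks estimated spend per UTC day and refuses requests once the limit is reached.
class DailyBudget {
  private readonly limitUSD: number;
  private day = '';
  private spentUSD = 0;

  constructor(limitUSD: number) {
    this.limitUSD = limitUSD;
  }

  // Reset the running total when the UTC date (YYYY-MM-DD) changes
  private rollOver(now: Date): void {
    const key = now.toISOString().slice(0, 10);
    if (key !== this.day) {
      this.day = key;
      this.spentUSD = 0;
    }
  }

  // Call before a request: is any budget left today?
  canSpend(now: Date = new Date()): boolean {
    this.rollOver(now);
    return this.spentUSD < this.limitUSD;
  }

  // Call after a request finishes, with its estimated cost in USD
  record(costUSD: number, now: Date = new Date()): void {
    this.rollOver(now);
    this.spentUSD += costUSD;
  }
}
```

In a route, you would check `canSpend()` before calling the model and call `record()` from `onFinish` with a cost estimated from the reported token usage.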

Implement Rate Limiting

import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '1 m'), // 10 requests per minute
  analytics: true,
});

export async function POST(req: Request) {
  // Rate-limit by IP (or by user ID once users are authenticated).
  // x-forwarded-for can hold a comma-separated list; the first entry is the client.
  const identifier =
    req.headers.get('x-forwarded-for')?.split(',')[0].trim() ?? 'anonymous';
  const { success, limit, remaining } = await ratelimit.limit(identifier);

  if (!success) {
    return Response.json(
      { error: 'Too many requests, please try again later' },
      {
        status: 429,
        headers: {
          'X-RateLimit-Limit': limit.toString(),
          'X-RateLimit-Remaining': remaining.toString(),
        },
      }
    );
  }

  // Continue processing...
}
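To see what a sliding-window limit does, here is an in-memory sliding-window log. It is a simpler, exact variant and not Upstash's actual implementation, which keeps its counters in Redis so every instance shares them. `SlidingWindowLimiter` is a hypothetical name, and an in-memory limiter only counts requests that reach the same instance:

```typescript
// Allows at most `limit` requests per identifier within any `windowMs` span.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Timestamps (ms) of accepted requests, per identifier
  private readonly hits = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(identifier: string, now: number = Date.now()): { success: boolean; remaining: number } {
    // Keep only requests still inside the window that ends at `now`
    const recent = (this.hits.get(identifier) ?? []).filter((t) => now - t < this.windowMs);

    if (recent.length >= this.limit) {
      this.hits.set(identifier, recent);
      return { success: false, remaining: 0 };
    }

    recent.push(now);
    this.hits.set(identifier, recent);
    return { success: true, remaining: this.limit - recent.length };
  }
}

// 10 requests per minute, matching Ratelimit.slidingWindow(10, '1 m') above
const limiter = new SlidingWindowLimiter(10, 60_000);
```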

Cache Common Responses

import { kv } from '@vercel/kv';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { createHash } from 'crypto';

// Hash the whole conversation into a short, fixed-length cache key
function hashMessages(messages: unknown[]) {
  return createHash('md5')
    .update(JSON.stringify(messages))
    .digest('hex');
}

export async function POST(req: Request) {
  const { messages } = await req.json();
  const cacheKey = `chat:${hashMessages(messages)}`;

  // Check the cache
  const cached = await kv.get<string>(cacheKey);
  if (cached) {
    return Response.json({ content: cached, cached: true });
  }

  // Call the model (non-streaming, so the full text can be cached)
  const result = await generateText({
    model: openai('gpt-4-turbo'),
    messages,
  });

  // Cache for 1 hour (`ex` is in seconds)
  await kv.set(cacheKey, result.text, { ex: 3600 });

  return Response.json({ content: result.text, cached: false });
}
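For local development without a KV store, the same get/set-with-expiry pattern can be sketched in memory. `TtlCache` is a hypothetical helper: its contents vanish on restart and are not shared between serverless instances, so it does not replace KV in production. Note also that the key above hashes the entire conversation, so only identical conversations hit the cache:

```typescript
// A tiny in-memory cache with per-entry expiry, mirroring kv.set(key, value, { ex })
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      // Expired: drop it so the next request calls the model again
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // ttlSeconds uses the same unit as the `ex` option above
  set(key: string, value: V, ttlSeconds: number, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + ttlSeconds * 1000 });
  }
}
```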

Use Shorter Prompts

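// ❌ Verbose prompt
const system = `
You are a highly professional customer service assistant with extensive experience and knowledge.
Your goal is to help users solve their problems.
Please answer in a friendly tone.
If you don't know the answer, honestly say that you don't know.
Please answer all questions in Traditional Chinese.
...(more content)
`;

// ✅ Concise prompt
const system = `You are a customer service assistant. Answer in Traditional Chinese, concisely and politely. If unsure, say you don't know.`;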
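The system prompt is sent with every request, so its length is multiplied by your traffic. A back-of-the-envelope sketch; the token counts, request volume, and price are illustrative assumptions:

```typescript
// Daily input-token cost saved by shortening the system prompt
function dailySavingsUSD(
  tokensSavedPerRequest: number,
  requestsPerDay: number,
  inputPricePerMillion: number, // USD per 1M input tokens (placeholder)
): number {
  return (tokensSavedPerRequest * requestsPerDay * inputPricePerMillion) / 1_000_000;
}

// e.g. trimming 100 tokens from the prompt, 10,000 requests/day, $10 per 1M tokens
console.log(dailySavingsUSD(100, 10_000, 10)); // 10
```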

Security Best Practices

Protect Your API Key

Never expose your API key in front-end code:

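// ❌ Dangerous: calling OpenAI directly from the browser
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.NEXT_PUBLIC_OPENAI_KEY}`, // Inlined into the client bundle, visible to anyone!
  },
});

// ✅ Safe: go through your own API route; the key stays on the server
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ messages }),
});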
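Next.js inlines every environment variable whose name starts with `NEXT_PUBLIC_` into the browser bundle, so secrets must never use that prefix. A small sketch of a check you could run at build time; the function name and the list of keywords are assumptions, not a Next.js feature:

```typescript
// Returns NEXT_PUBLIC_ variable names that look like they hold secrets
function findExposedSecrets(env: Record<string, string | undefined>): string[] {
  const secretHints = ['KEY', 'SECRET', 'TOKEN', 'PASSWORD'];
  return Object.keys(env).filter(
    (name) =>
      name.startsWith('NEXT_PUBLIC_') &&
      secretHints.some((hint) => name.toUpperCase().includes(hint)),
  );
}

// Flags NEXT_PUBLIC_OPENAI_KEY but not the server-only OPENAI_API_KEY
const exposed = findExposedSecrets({
  NEXT_PUBLIC_OPENAI_KEY: 'sk-placeholder',
  OPENAI_API_KEY: 'sk-placeholder',
  NEXT_PUBLIC_SITE_URL: 'https://example.com',
});
console.log(exposed); // [ 'NEXT_PUBLIC_OPENAI_KEY' ]
```

In a build script you might call `findExposedSecrets(process.env)` and fail the build if the list is non-empty.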

Validate Input

import { z } from 'zod';

const chatSchema = z.object({
  messages: z.array(z.object({
    // Consider removing 'system' so clients cannot inject their own system prompt
    role: z.enum(['user', 'assistant', 'system']),
    content: z.string().max(10000), // Limit message length
  })).max(50), // Limit the number of messages
});

export async function POST(req: Request) {
  const body = await req.json();

  // Validate the input
  const result = chatSchema.safeParse(body);
  if (!result.success) {
    return Response.json(
      { error: 'Invalid input format' },
      { status: 400 }
    );
  }

  const { messages } = result.data;
  // Continue processing...
}
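Per-message limits alone still allow a large payload: 50 messages × 10,000 characters is up to 500,000 characters per request. A sketch of an additional total-size check, written in plain TypeScript so it runs without zod; the 20,000-character budget is an arbitrary example. With zod, the same check could be attached to the schema with `.refine()`:

```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Rejects requests whose combined message content exceeds maxTotalChars
function withinTotalBudget(messages: ChatMessage[], maxTotalChars: number): boolean {
  const total = messages.reduce((sum, m) => sum + m.content.length, 0);
  return total <= maxTotalChars;
}

const MAX_TOTAL_CHARS = 20_000; // Example budget; tune it to your model's context window
console.log(withinTotalBudget([{ role: 'user', content: 'hello' }], MAX_TOTAL_CHARS)); // true
```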

Filter Inappropriate Content

import OpenAI from 'openai';

const openai = new OpenAI(); // Reads OPENAI_API_KEY from the environment

// Returns true if OpenAI's moderation endpoint flags the text
async function moderateContent(content: string) {
  const moderation = await openai.moderations.create({
    input: content,
  });

  return moderation.results[0].flagged;
}

export async function POST(req: Request) {
  const { messages } = await req.json();
  const lastMessage = messages[messages.length - 1];

  // Check the user's input
  if (await moderateContent(lastMessage.content)) {
    return Response.json(
      { error: 'Message contains inappropriate content' },
      { status: 400 }
    );
  }

  // Continue processing...
}
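The `flagged` field uses OpenAI's default thresholds. The moderation result also includes per-category scores (`category_scores`), so you can apply stricter or looser limits per category. A sketch, assuming you pass in `moderation.results[0].category_scores`; the category names in the example and the threshold values are illustrative:

```typescript
// Returns the categories whose score meets or exceeds a custom threshold.
// `scores` is typed as `object` so an SDK-typed scores object can be passed without a cast.
function exceededCategories(scores: object, thresholds: Record<string, number>): string[] {
  const values = scores as Record<string, unknown>;
  return Object.entries(thresholds)
    .filter(([category, limit]) => {
      const score = values[category];
      return typeof score === 'number' && score >= limit;
    })
    .map(([category]) => category);
}

// Example with made-up scores (0 to 1)
const hits = exceededCategories(
  { harassment: 0.62, violence: 0.05 },
  { harassment: 0.5, violence: 0.5 },
);
console.log(hits); // [ 'harassment' ]
```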

Implement Authentication

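import { auth } from '@/auth';
import { db } from '@/lib/db'; // Assumption: your own database client (e.g. Prisma); path is illustrative

export async function POST(req: Request) {
  // Check that the user is signed in
  const session = await auth();

  if (!session?.user?.id) {
    return Response.json(
      { error: 'Please sign in first' },
      { status: 401 }
    );
  }

  // Check the user's remaining credits
  const user = await db.user.findUnique({
    where: { id: session.user.id },
  });

  if (!user || user.credits <= 0) {
    return Response.json(
      { error: 'Out of credits' },
      { status: 403 }
    );
  }

  // Continue processing...
}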
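Checking `credits` and decrementing it in separate steps lets two concurrent requests both pass the check when only one credit is left. The fix is to check and decrement in a single step. The in-memory sketch below (`CreditStore` is hypothetical) shows the pattern; assuming Prisma, the database equivalent is a conditional `updateMany` that filters on `credits: { gt: 0 }`, decrements by 1, and treats a returned `count` of 0 as "out of credits":

```typescript
// Holds per-user credit balances and consumes them atomically.
class CreditStore {
  private readonly balances = new Map<string, number>();

  setBalance(userId: string, credits: number): void {
    this.balances.set(userId, credits);
  }

  // Check and decrement in one synchronous step. JavaScript runs this method
  // without interleaving, so two requests cannot both spend the last credit.
  tryConsume(userId: string): boolean {
    const current = this.balances.get(userId) ?? 0;
    if (current <= 0) return false;
    this.balances.set(userId, current - 1);
    return true;
  }
}
```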

Reliability and Error Handling

Handle API Errors

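import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  try {
    const { messages } = await req.json();

    const result = await streamText({
      model: openai('gpt-4-turbo'),
      messages,
    });

    return result.toDataStreamResponse();
  } catch (error) {
    console.error('AI API Error:', error);

    // In TypeScript `error` is `unknown`, so read `code` defensively.
    // Depending on the SDK version, provider errors may be wrapped in another
    // error type, and errors that occur mid-stream never reach this catch block.
    const code = (error as { code?: string } | null)?.code;

    // Classify the error
    if (code === 'insufficient_quota') {
      return Response.json(
        { error: 'AI service quota exhausted, please try again later' },
        { status: 503 }
      );
    }

    if (code === 'rate_limit_exceeded') {
      return Response.json(
        { error: 'Too many requests, please try again later' },
        { status: 429 }
      );
    }

    if (code === 'context_length_exceeded') {
      return Response.json(
        { error: 'Conversation too long, please start a new one' },
        { status: 400 }
      );
    }

    // Generic error
    return Response.json(
      { error: 'Something went wrong, please try again later' },
      { status: 500 }
    );
  }
}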
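The if-chain above can also be written as a lookup table, which keeps status codes and messages in one place and makes the mapping easy to test. A sketch; `mapError` is a hypothetical helper, and the error codes are the same ones used above:

```typescript
interface ErrorMapping {
  status: number;
  message: string;
}

// Known provider error codes and how they are reported to the client
const ERROR_MAP = new Map<string, ErrorMapping>([
  ['insufficient_quota', { status: 503, message: 'AI service quota exhausted, please try again later' }],
  ['rate_limit_exceeded', { status: 429, message: 'Too many requests, please try again later' }],
  ['context_length_exceeded', { status: 400, message: 'Conversation too long, please start a new one' }],
]);

const FALLBACK: ErrorMapping = { status: 500, message: 'Something went wrong, please try again later' };

// Reads `code` from an unknown error value without assuming its shape
function mapError(error: unknown): ErrorMapping {
  const code =
    typeof error === 'object' && error !== null && 'code' in error
      ? String((error as { code: unknown }).code)
      : undefined;
  return (code !== undefined ? ERROR_MAP.get(code) : undefined) ?? FALLBACK;
}

// In the route handler:
//   const { status, message } = mapError(error);
//   return Response.json({ error: message }, { status });
```

Using a `Map` rather than a plain object avoids accidental matches on inherited keys such as `constructor`.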

Implement Retries

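import { streamText, type CoreMessage } from 'ai';
import { openai } from '@ai-sdk/openai';

// Note: AI SDK calls already retry failed requests with exponential backoff
// (the maxRetries setting, default 2); a manual loop like this adds custom control.
async function callAIWithRetry(messages: CoreMessage[], retries = 3) {
  for (let i = 0; i < retries; i++) {
    try {
      return await streamText({
        model: openai('gpt-4-turbo'),
        messages,
      });
    } catch (error) {
      if (i === retries - 1) throw error;

      // Retryable error: wait 1 s, 2 s, ... (exponential backoff)
      if ((error as { code?: string } | null)?.code === 'rate_limit_exceeded') {
        const waitTime = Math.pow(2, i) * 1000;
        await new Promise(r => setTimeout(r, waitTime));
        continue;
      }

      // Non-retryable error
      throw error;
    }
  }
}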
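A common refinement of the fixed `2^i` delay is random jitter, so that many clients hitting the rate limit at the same moment don't all retry in lockstep. A sketch of a generic retry helper using "full jitter" (a random delay between 0 and the exponential cap); `retryWithBackoff` and its option names are hypothetical:

```typescript
interface RetryOptions {
  retries: number;                          // Total attempts, including the first
  baseDelayMs: number;                      // Delay cap for the first retry
  isRetryable: (error: unknown) => boolean; // Which errors are worth retrying
  random?: () => number;                    // Injectable for testing; defaults to Math.random
}

// "Full jitter": a random delay in [0, baseDelayMs * 2^attempt)
function backoffDelayMs(attempt: number, baseDelayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * baseDelayMs * 2 ** attempt);
}

async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up on the last attempt or on errors that retrying cannot fix
      if (attempt >= opts.retries - 1 || !opts.isRetryable(error)) throw error;
      const delay = backoffDelayMs(attempt, opts.baseDelayMs, opts.random);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```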

Set a Request Timeout

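const controller = new AbortController();
// Abort if generation (including streaming) runs longer than 25 seconds
const timeout = setTimeout(() => controller.abort(), 25000);

try {
  const result = await streamText({
    model: openai('gpt-4-turbo'),
    messages,
    abortSignal: controller.signal,
    // Clear the timer when the stream has finished, not when it starts
    onFinish: () => clearTimeout(timeout),
  });
  return result.toDataStreamResponse();
} catch (error) {
  clearTimeout(timeout);
  // Reached only if the abort happens before the response is returned;
  // an abort during streaming ends the stream instead.
  if ((error as { name?: string } | null)?.name === 'AbortError') {
    return Response.json(
      { error: 'Request timed out, please try again' },
      { status: 504 }
    );
  }
  throw error;
}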

Monitoring and Analytics

Log Key Metrics

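import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { logAnalytics } from '@/lib/analytics'; // Your own logging helper

export async function POST(req: Request) {
  const startTime = Date.now();
  // In production, take userId from the authenticated session, not the request body
  const { messages, userId } = await req.json();

  try {
    const result = await streamText({
      model: openai('gpt-4-turbo'),
      messages,
      // Log on success once the stream has finished, so token counts are final
      onFinish: async ({ usage }) => {
        await logAnalytics({
          event: 'ai_request_success',
          userId,
          model: 'gpt-4-turbo',
          promptTokens: usage.promptTokens,
          completionTokens: usage.completionTokens,
          duration: Date.now() - startTime,
        });
      },
    });

    return result.toDataStreamResponse();
  } catch (error) {
    // Log on failure
    await logAnalytics({
      event: 'ai_request_error',
      userId,
      error: (error as { code?: string } | null)?.code ?? 'unknown',
      duration: Date.now() - startTime,
    });

    throw error;
  }
}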
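Once events are logged, the numbers most worth watching are usually error rate and tail latency. A sketch that computes both from a batch of logged events; the `LoggedEvent` shape mirrors the `event` and `duration` fields logged above and is an assumption about how you store them. The percentile uses the nearest-rank method:

```typescript
interface LoggedEvent {
  event: 'ai_request_success' | 'ai_request_error';
  duration: number; // Milliseconds
}

function summarize(events: LoggedEvent[]): { errorRate: number; p95Ms: number } {
  if (events.length === 0) return { errorRate: 0, p95Ms: 0 };

  const errors = events.filter((e) => e.event === 'ai_request_error').length;

  // Nearest-rank 95th percentile; integer math avoids floating-point rounding in the rank
  const sorted = events.map((e) => e.duration).sort((a, b) => a - b);
  const rank = Math.ceil((95 * sorted.length) / 100); // 1-based rank
  return { errorRate: errors / events.length, p95Ms: sorted[rank - 1] };
}
```

The resulting `errorRate` is the kind of value the alert check in the next section compares against 0.1.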

Use Vercel Analytics

// app/layout.tsx
import { Analytics } from '@vercel/analytics/react';

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html>
      <body>
        {children}
        <Analytics />
      </body>
    </html>
  );
}

Set Up Alerts

Integrate an alerting service (e.g. PagerDuty or Slack):

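async function sendAlert(message: string) {
  const webhookUrl = process.env.SLACK_WEBHOOK_URL;
  if (!webhookUrl) {
    console.warn('SLACK_WEBHOOK_URL is not set; alert skipped:', message);
    return;
  }

  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `🚨 AI service alert: ${message}` }),
  });
}

// Alert when the error rate is too high
// (errorRate is assumed to be computed from your own logged metrics)
if (errorRate > 0.1) {
  await sendAlert('AI API error rate above 10%');
}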
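Without a cooldown, a sustained outage would post a Slack message on every check. A sketch of a guard that alerts at most once per cooldown period; `AlertThrottle`, the threshold, and the 15-minute cooldown are illustrative choices, and like other in-memory state each serverless instance keeps its own copy:

```typescript
class AlertThrottle {
  private lastAlertAt = -Infinity;
  private readonly cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // True if an alert should be sent now; remembers the time when it returns true
  shouldAlert(errorRate: number, threshold: number, now: number = Date.now()): boolean {
    if (errorRate <= threshold) return false;
    if (now - this.lastAlertAt < this.cooldownMs) return false;
    this.lastAlertAt = now;
    return true;
  }
}

const throttle = new AlertThrottle(15 * 60 * 1000); // At most one alert per 15 minutes
```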

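Deployment Checklist

Before deploying

[ ] API key stored in environment variables
[ ] Rate limiting implemented
[ ] Error handling implemented
[ ] Input validation implemented
[ ] Appropriate maxTokens set
[ ] Correct runtime chosen

After deploying

[ ] Streaming responses work
[ ] Error handling works
[ ] Monitoring is running
[ ] OpenAI budget cap set
[ ] Documentation updated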

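FAQ

Q1: What if AI responses are too slow?

Use the Edge Runtime
Use a faster model
Lower maxTokens
Check for network issues

Q2: What if costs exceed expectations?

Look for abnormal requests
Tighten rate limiting
Use a cheaper model
Shorten your prompts

Q3: How do I handle high concurrency?

Vercel scales automatically, but keep in mind:

The OpenAI API has its own rate limits
Consider processing requests through a queue
Enforce stricter rate limiting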
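A managed job queue is the robust option, but the core idea of capping how many model calls run at once can be sketched in a few lines. `createLimiter` is a hypothetical helper; it limits concurrency only within one instance, so with many serverless instances you still rely on provider-side limits and a shared rate limiter:

```typescript
// Runs at most `maxConcurrent` tasks at once; extra tasks wait in FIFO order.
function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  // Start the next queued task if a slot is free
  const next = (): void => {
    if (active >= maxConcurrent) return;
    const start = waiting.shift();
    if (start) start();
  };

  return function run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      waiting.push(() => {
        active++;
        // Promise.resolve().then(task) also catches synchronous throws from task
        Promise.resolve()
          .then(task)
          .then(
            (value) => {
              active--;
              resolve(value);
              next();
            },
            (error: unknown) => {
              active--;
              reject(error);
              next();
            },
          );
      });
      next();
    });
  };
}

// Example: allow at most 5 concurrent model calls in this instance
const limitAiCalls = createLimiter(5);
```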

