Building AI Applications with Claude 4.5 Opus
Comprehensive guide to building production AI applications using Claude 4.5 Opus, covering tool use, streaming, caching, and advanced prompting techniques.
Dibyank Padhy
Engineering Manager & Full Stack Developer
Claude 4.5 Opus represents a significant leap in AI capabilities, offering enhanced reasoning, better instruction following, and improved tool use. In this guide, we'll build production-ready AI applications leveraging these capabilities.
Setting Up the Claude SDK
First, let's set up a robust Claude client with proper error handling and retry logic:
import Anthropic from '@anthropic-ai/sdk';
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
// Use the SDK's own message param type for conversation state, so that
// response content blocks and tool results can be pushed straight back
// into the history without casting or hand-rolled interfaces.
type Message = Anthropic.MessageParam;
// Retry wrapper with exponential backoff
async function withRetry<T>(
fn: () => Promise<T>,
maxRetries = 3,
baseDelay = 1000
): Promise<T> {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await fn();
} catch (error: any) {
const isRetryable =
error.status === 429 || // Rate limit
error.status === 529 || // Overloaded
error.status >= 500; // Server error
if (!isRetryable || attempt === maxRetries) {
throw error;
}
const delay = baseDelay * Math.pow(2, attempt - 1);
console.log(`Attempt ${attempt} failed, retrying in ${delay}ms...`);
await new Promise(resolve => setTimeout(resolve, delay));
}
}
throw new Error('Retry failed');
}
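As a quick usage sketch, any SDK call can go through the wrapper; here it guards a one-off completion (the prompt text is purely illustrative):
// Example: guard a single completion request with the retry helper
const reply = await withRetry(() =>
  anthropic.messages.create({
    model: 'claude-opus-4-5-20251101',
    max_tokens: 1024,
    messages: [{ role: 'user', content: 'Summarize prompt caching in two sentences.' }],
  })
);
console.log(reply.content);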
Tool Use (Function Calling)
Claude 4.5 Opus excels at tool use. Here's how to implement a multi-tool system:
// Define tools
const tools: Anthropic.Tool[] = [
{
name: 'get_weather',
description: 'Get current weather for a location. Use this when the user asks about weather conditions.',
input_schema: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City and country, e.g., "London, UK"',
},
units: {
type: 'string',
enum: ['celsius', 'fahrenheit'],
description: 'Temperature units',
},
},
required: ['location'],
},
},
{
name: 'search_database',
description: 'Search the product database for items matching criteria.',
input_schema: {
type: 'object',
properties: {
query: { type: 'string', description: 'Search query' },
category: { type: 'string', description: 'Product category' },
max_price: { type: 'number', description: 'Maximum price filter' },
limit: { type: 'number', description: 'Max results to return' },
},
required: ['query'],
},
},
{
name: 'create_order',
description: 'Create a new order for a customer. Only use after confirming with user.',
input_schema: {
type: 'object',
properties: {
product_ids: {
type: 'array',
items: { type: 'string' },
description: 'Product IDs to order',
},
customer_id: { type: 'string', description: 'Customer ID' },
shipping_address: {
type: 'object',
properties: {
street: { type: 'string' },
city: { type: 'string' },
postal_code: { type: 'string' },
country: { type: 'string' },
},
required: ['street', 'city', 'postal_code', 'country'],
},
},
required: ['product_ids', 'customer_id', 'shipping_address'],
},
},
];
// Tool implementations
const toolImplementations: Record<string, (input: any) => Promise<any>> = {
get_weather: async ({ location, units = 'celsius' }) => {
  // Hypothetical weather API; substitute your own provider
  const response = await fetch(
    `https://api.weather.service/v1/current?location=${encodeURIComponent(location)}&units=${units}`
  );
  if (!response.ok) throw new Error(`Weather API returned ${response.status}`);
  return response.json();
},
search_database: async ({ query, category, max_price, limit = 10 }) => {
// Your database search logic
return db.products.search({ query, category, max_price, limit });
},
create_order: async ({ product_ids, customer_id, shipping_address }) => {
return orderService.create({ product_ids, customer_id, shipping_address });
},
};
// Agentic loop with tool execution
async function runAgent(userMessage: string, conversationHistory: Message[] = []) {
const messages: Message[] = [
...conversationHistory,
{ role: 'user', content: userMessage },
];
while (true) {
const response = await withRetry(() =>
anthropic.messages.create({
model: 'claude-opus-4-5-20251101',
max_tokens: 4096,
system: `You are a helpful shopping assistant. Use the available tools to help customers find products and place orders. Always confirm order details before creating an order.`,
tools,
messages,
})
);
// Check if we need to execute tools
const toolUseBlocks = response.content.filter(
(block): block is Anthropic.ToolUseBlock => block.type === 'tool_use'
);
if (toolUseBlocks.length === 0) {
  // No tool use - record the final assistant turn so callers can
  // continue the conversation, then return the text response
  messages.push({ role: 'assistant', content: response.content });
  const textBlock = response.content.find(
    (b): b is Anthropic.TextBlock => b.type === 'text'
  );
  return {
    response: textBlock?.text || '',
    messages,
  };
}
// Execute tools
const toolResults: Anthropic.ToolResultBlockParam[] = [];
for (const toolUse of toolUseBlocks) {
console.log(`Executing tool: ${toolUse.name}`, toolUse.input);
try {
const implementation = toolImplementations[toolUse.name];
if (!implementation) {
throw new Error(`Unknown tool: ${toolUse.name}`);
}
const result = await implementation(toolUse.input);
toolResults.push({
type: 'tool_result',
tool_use_id: toolUse.id,
content: JSON.stringify(result),
});
} catch (error: any) {
toolResults.push({
type: 'tool_result',
tool_use_id: toolUse.id,
content: JSON.stringify({ error: error.message }),
is_error: true,
});
}
}
// Add assistant response and tool results to messages
messages.push({ role: 'assistant', content: response.content });
messages.push({ role: 'user', content: toolResults });
}
}
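A minimal usage sketch for the loop above; carrying the returned messages forward keeps the tool history intact across turns (the queries are illustrative):
// First turn: the agent may call search_database under the hood
const first = await runAgent('Find wireless headphones under $100');
console.log(first.response);
// Follow-up turn reuses the accumulated conversation history
const second = await runAgent('Tell me more about the second result', first.messages);
console.log(second.response);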
Streaming Responses
For better UX, stream responses so text appears as it is generated:
async function* streamResponse(
userMessage: string,
systemPrompt: string
): AsyncGenerator<string> {
const stream = anthropic.messages.stream({
model: 'claude-opus-4-5-20251101',
max_tokens: 4096,
system: systemPrompt,
messages: [{ role: 'user', content: userMessage }],
});
for await (const event of stream) {
if (
event.type === 'content_block_delta' &&
event.delta.type === 'text_delta'
) {
yield event.delta.text;
}
}
// Get final message for metadata
const finalMessage = await stream.finalMessage();
console.log('Token usage:', finalMessage.usage);
}
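The SDK's stream helper also exposes an event-emitter style, which can be simpler when you don't need an async generator (a brief sketch; the prompt is illustrative):
// Alternative: subscribe to text deltas instead of iterating raw events
const textStream = anthropic.messages.stream({
  model: 'claude-opus-4-5-20251101',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'Explain exponential backoff briefly.' }],
});
textStream.on('text', (text) => process.stdout.write(text));
const final = await textStream.finalMessage();
console.log('\nStop reason:', final.stop_reason);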
// Express.js streaming endpoint
app.post('/api/chat/stream', async (req, res) => {
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
try {
const generator = streamResponse(req.body.message, req.body.systemPrompt);
for await (const chunk of generator) {
res.write(`data: ${JSON.stringify({ text: chunk })}\n\n`);
}
res.write(`data: [DONE]\n\n`);
} catch (error) {
res.write(`data: ${JSON.stringify({ error: 'Stream error' })}\n\n`);
} finally {
res.end();
}
});
// Client-side consumption
async function consumeStream(message: string) {
const response = await fetch('/api/chat/stream', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message }),
});
const reader = response.body?.getReader();
if (!reader) return;
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Buffer partial lines: an SSE event may be split across network chunks
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any trailing partial line for the next chunk
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      if (data === '[DONE]') return;
      const parsed = JSON.parse(data);
      if (parsed.text) {
        appendToUI(parsed.text); // app-specific render function
      }
    }
  }
}
}
Prompt Caching
Use prompt caching to cut latency and cost when requests share a large, stable prefix:
// Enable caching for system prompts and large context
async function chatWithCaching(
userMessage: string,
documentContext: string
) {
const response = await anthropic.messages.create({
model: 'claude-opus-4-5-20251101',
max_tokens: 4096,
system: [
{
type: 'text',
text: `You are an expert analyst. Use the following document to answer questions accurately.`,
},
{
type: 'text',
text: documentContext,
cache_control: { type: 'ephemeral' }, // Cache this block
},
],
messages: [{ role: 'user', content: userMessage }],
});
console.log('Cache metrics:', {
input_tokens: response.usage.input_tokens,
cache_creation_input_tokens: response.usage.cache_creation_input_tokens,
cache_read_input_tokens: response.usage.cache_read_input_tokens,
});
return response;
}
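On the first call, cache_creation_input_tokens should be populated; on later calls within the cache lifetime, cache_read_input_tokens should be instead. A cached block must also be byte-identical between requests and long enough to qualify, or every request silently pays full price. A small sketch for catching misses, assuming the SDK's Usage type exposes the cache fields logged above:
// Hypothetical helper: warn when a supposedly cached call missed the cache
function warnOnCacheMiss(usage: Anthropic.Usage) {
  const read = usage.cache_read_input_tokens ?? 0;
  const created = usage.cache_creation_input_tokens ?? 0;
  if (read === 0 && created === 0) {
    console.warn('No cache activity: block may be too short or not byte-identical');
  }
}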
// Multi-turn conversation with cached context
class CachedConversation {
private messages: Message[] = [];
private cachedContext: string;
constructor(context: string) {
this.cachedContext = context;
}
async chat(userMessage: string): Promise<string> {
this.messages.push({ role: 'user', content: userMessage });
const response = await anthropic.messages.create({
model: 'claude-opus-4-5-20251101',
max_tokens: 4096,
system: [
{
type: 'text',
text: 'You are a helpful assistant with access to the user\'s documents.',
},
{
type: 'text',
text: this.cachedContext,
cache_control: { type: 'ephemeral' },
},
],
messages: this.messages,
});
const assistantMessage = response.content[0].type === 'text'
? response.content[0].text
: '';
this.messages.push({ role: 'assistant', content: assistantMessage });
return assistantMessage;
}
}
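A brief usage sketch (largeDocumentText stands in for content loaded elsewhere): the first turn writes the cache, and later turns within the cache lifetime read from it:
// Usage: every turn resends the document, but only the first pays to process it
const conversation = new CachedConversation(largeDocumentText);
console.log(await conversation.chat('What are the key findings?'));
console.log(await conversation.chat('How do they compare to the previous report?'));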
Structured Output with JSON Mode
Extract structured data reliably by prompting for JSON output and validating the result:
interface ExtractedData {
entities: Array<{
name: string;
type: 'person' | 'organization' | 'location' | 'date';
confidence: number;
}>;
sentiment: 'positive' | 'negative' | 'neutral';
summary: string;
key_points: string[];
}
async function extractStructuredData(text: string): Promise<ExtractedData> {
const response = await anthropic.messages.create({
model: 'claude-opus-4-5-20251101',
max_tokens: 2048,
messages: [
{
role: 'user',
content: `Analyze the following text and extract structured information.
Text:
${text}
Respond with a JSON object containing:
- entities: array of {name, type, confidence} where type is person/organization/location/date
- sentiment: positive/negative/neutral
- summary: brief summary
- key_points: array of main points
Return only valid JSON, no markdown formatting.`,
},
],
});
const textContent = response.content.find(b => b.type === 'text');
if (!textContent || textContent.type !== 'text') {
throw new Error('No text response');
}
// Parse and validate
const data = JSON.parse(textContent.text);
// Type validation
if (!Array.isArray(data.entities) || !data.sentiment || !data.summary) {
throw new Error('Invalid response structure');
}
return data as ExtractedData;
}
// With Zod validation for type safety
import { z } from 'zod';
const ExtractedDataSchema = z.object({
entities: z.array(z.object({
name: z.string(),
type: z.enum(['person', 'organization', 'location', 'date']),
confidence: z.number().min(0).max(1),
})),
sentiment: z.enum(['positive', 'negative', 'neutral']),
summary: z.string(),
key_points: z.array(z.string()),
});
async function extractWithValidation(text: string) {
const response = await extractStructuredData(text);
return ExtractedDataSchema.parse(response);
}
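A usage sketch showing how validation failures surface (the sample text is illustrative):
// Zod throws a descriptive error if the model's JSON drifts from the schema
try {
  const data = await extractWithValidation(
    'Acme Corp announced record profits in Berlin on March 3rd, delighting investors.'
  );
  console.log(data.entities, data.sentiment);
} catch (err) {
  if (err instanceof z.ZodError) {
    console.error('Model output failed validation:', err.issues);
  } else {
    throw err;
  }
}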
Building a RAG Pipeline
Implement Retrieval-Augmented Generation for document Q&A:
import { OpenAI } from 'openai';
const openai = new OpenAI(); // For embeddings
interface Document {
id: string;
content: string;
metadata: Record<string, any>;
embedding?: number[];
}
// Generate embeddings
async function generateEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: text,
});
return response.data[0].embedding;
}
// Cosine similarity
function cosineSimilarity(a: number[], b: number[]): number {
const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
// RAG system
class RAGSystem {
private documents: Document[] = [];
async addDocument(content: string, metadata: Record<string, any> = {}) {
const embedding = await generateEmbedding(content);
this.documents.push({
id: crypto.randomUUID(),
content,
metadata,
embedding,
});
}
async query(question: string, topK = 3): Promise<string> {
// Get question embedding
const questionEmbedding = await generateEmbedding(question);
// Find most relevant documents
const scored = this.documents.map(doc => ({
doc,
score: cosineSimilarity(questionEmbedding, doc.embedding!),
}));
scored.sort((a, b) => b.score - a.score);
const relevant = scored.slice(0, topK);
// Build context
const context = relevant
.map((r, i) => `[Document ${i + 1}] (relevance: ${r.score.toFixed(2)})\n${r.doc.content}`)
.join('\n\n---\n\n');
// Query Claude with context
const response = await anthropic.messages.create({
model: 'claude-opus-4-5-20251101',
max_tokens: 2048,
system: [
{
type: 'text',
text: `You are a helpful assistant that answers questions based on the provided documents.
If the answer cannot be found in the documents, say so clearly.
Always cite which document(s) you're drawing information from.`,
},
{
type: 'text',
text: `Reference Documents:\n\n${context}`,
cache_control: { type: 'ephemeral' },
},
],
messages: [{ role: 'user', content: question }],
});
return response.content[0].type === 'text' ? response.content[0].text : '';
}
}
// Usage
const rag = new RAGSystem();
await rag.addDocument('AWS Lambda pricing is based on requests and duration...');
await rag.addDocument('CloudWatch can monitor Lambda functions with custom metrics...');
await rag.addDocument('Lambda layers allow sharing code between functions...');
const answer = await rag.query('How is Lambda pricing calculated?');
console.log(answer);
Conclusion
Claude 4.5 Opus provides powerful capabilities for building sophisticated AI applications. By implementing proper tool use, streaming, caching, and RAG patterns, you can create production-ready systems that are both performant and cost-effective.
Key takeaways:
- Use tools (function calling) to drive complex agentic workflows
- Implement streaming for better user experience
- Leverage prompt caching for repeated contexts
- Combine with RAG for knowledge-grounded responses