สรุปสำคัญ

Prompt คือ Control Plane ไม่ใช่แค่ Input Box — มันคือ interface ควบคุมพฤติกรรม AI ทั้งระบบ
Prompt Layering 3 ระดับ: Orchestration (จัดการงาน) → Runtime (ขณะทำงาน) → Model Interface (ติดต่อโมเดล)
ความต่างระหว่าง Prompt Engineering (ปรับแต่งข้อความ) กับ Harness Engineering (ออกแบบระบบควบคุม)

Harness Engineering ตอนที่ 2: Prompt คือ Control Plane (ไม่ใช่ Input Box)

คำโปรย: “Prompt กำหนดวิธีพูด, Harness กำหนดวิธีทำงาน” — เรียนรู้ว่าทำไม Prompt Engineering ถึงถึงจุดอิ่มตัว และ Harness Engineering คือคำตอบ

🎣 ส่วนนำ: ทำไม Prompt ถึงสำคัญกว่าที่คิด?

ลองนึกภาพว่าคุณมีรถยนต์คันหนึ่ง

แบบที่ 1: คุณบอก “ขับไปถึงที่หมาย” — รถจะพาคุณไปได้ แต่ถ้ามีเด็กวิ่งตัดหน้า? รถอาจจะเบรกไม่ทัน

แบบที่ 2: คุณบอก “ขับไปที่หมาย แต่ต้องระวังเด็กข้างทาง ขับช้ากว่า 50 กม./ชม. ห้ามแซง และต้องหยุดเติมน้ำมันทุก 200 กม.” — ผลลัพธ์จะต่างกันมาก

Prompt ก็เหมือนกัน

หลายคนมอง Prompt เป็นแค่ “กล่องใส่ข้อความ” ที่พิมพ์ๆ แล้วกดส่ง แต่ถ้ามองในมุมของ Harness Engineering — Prompt คือ Control Plane ที่ควบคุมพฤติกรรมของ AI ไม่ใช่แค่ input ที่ใส่เข้าไป

และนี่คือจุดที่หลายคนเข้าใจผิด

🤔 Prompt คืออะไร?

มุมมองเดิม vs มุมมองใหม่

มุมมองเดิม (Input Box):

“Prompt คือ ข้อความที่ใส่เข้าไปในกล่อง chat เพื่อบอก AI ให้ทำอะไรสักอย่าง”

มุมมองใหม่ (Control Plane):

“Prompt คือ interface สำหรับควบคุมพฤติกรรมของ AI — เหมือนพวงมาลัยที่ควบคุมทิศทาง ไม่ใช่แค่เชื้อเพลิงที่ใส่เข้าไป”

ทำไมต้องแยกให้ชัด?

เพราะถ้ามอง Prompt เป็นแค่ Input Box → คุณจะโฟกัสที่ “จะพิมพ์อะไรดี” แต่ถ้ามอง Prompt เป็น Control Plane → คุณจะโฟกัสที่ “จะ design ระบบอย่างไรให้ AI ทำงานถูกต้อง”

นี่คือความแตกต่างระหว่าง Prompt Engineering (ปรับแต่งข้อความ) กับ Harness Engineering (ออกแบบระบบควบคุม)

📊 ตัวอย่างเปรียบเทียบ: Input Box vs Control Plane

สถานการณ์	แบบ Input Box (เดิม)	แบบ Control Plane (ใหม่)
ระบบเคลมประกัน	“ตรวจสอบคำขอเคลมประกัน”	กำหนด workflow: ตรวจสอบเงื่อนไข → คำนวณค่าชดเชย → ตรวจสอบเอกสาร → ส่งข้อมูลให้คนอนุมัติ
เขียนโค้ด	“เขียน Python function”	“เขียน Python + เขียน test ด้วย + ห้าม commit ถ้า test fail”
วิเคราะห์ข้อมูล	“วิเคราะห์ข้อมูลนี้”	กำหนด: ใช้สถิติอะไร → รูปแบบการแสดงผล → ข้อจำกัดของข้อมูล
Customer Support	“ตอบลูกค้า”	กำหนด: โทนเสียง → SLA → Escalation path → Satisfaction survey
Content Creation	“เขียนบทความ”	กำหนด: Tone of voice → SEO keywords → Word count → Fact-checking process

เห็นไหม? Control Plane ไม่ได้แค่ “บอกว่าทำอะไร” แต่ “บอกว่าทำอย่างไร ด้วยเงื่อนไขอะไร”

🏗️ Prompt Layering: 3 ระดับของการควบคุม

ไม่ใช่ทุก Prompt อยู่ในระดับเดียวกัน การแบ่งชั้นของ Prompt ช่วยให้เราออกแบบระบบที่ซับซ้อนได้ดีขึ้น

ตารางเปรียบเทียบ 3 ระดับ

ระดับ	ชื่อ	หน้าที่	ตัวอย่าง	ความถี่ในการเปลี่ยน
1. Orchestration	ระดับจัดการงาน	กำหนดว่า “ต้องทำอะไรบ้าง เรียงลำดับอย่างไร”	Agent workflow, task decomposition	นานๆ ครั้ง
2. Runtime	ระดับขณะทำงาน	กำหนด “บริบท ข้อจำกัด เงื่อนไข” ขณะ AI ทำงาน	Context, constraints, validation rules	ปรับตามงาน
3. Model Interface	ระดับติดต่อโมเดล	กำหนด “รูปแบบการสื่อสารกับโมเดล”	Instructions, format, output structure	บ่อย (ปรับ prompt ทุกครั้ง)

อธิบายแบบง่ายๆ

Orchestration = ผู้จัดการโปรเจกต์ ที่บอกว่า “เรามี 5 ขั้นตอน ขั้น 1 ทำ A ขั้น 2 ทำ B…”
Runtime = หัวหน้างาน ที่บอกว่า “ตอนทำขั้นนี้ อย่าลืมเรื่องความปลอดภัยด้วย”
Model Interface = เลขาที่ ที่บอกว่า “เขียนรายงานในรูปแบบนี้…”

ทั้ง 3 ระดับทำงานร่วมกัน เหมือนโครงสร้างองค์กร — แต่ละชั้นมีหน้าที่ต่างกัน

ตัวอย่างโค้ด: Prompt Layering ในทางปฏิบัติ

 1# Layer 1: Orchestration (Foundation)
 2SYSTEM_PROMPT = """
 3You are a senior Python developer working on a FastAPI project.
 4You always write type-safe, well-documented code.
 5You follow TDD: write tests before implementation.
 6"""
 7
 8# Layer 2: Runtime (Context)
 9CONTEXT_PROMPT = """
10Current project structure:
11- /app/main.py - FastAPI entry point
12- /app/models/ - Pydantic models
13- /app/routers/ - API endpoints
14- /tests/ - Pytest test files
15
16Current task: Implement user authentication
17"""
18
19# Layer 3: Model Interface (Task)
20TASK_PROMPT = """
21Write a function to authenticate user by JWT token.
22
23Requirements:
24- Use Pydantic for validation
25- Return HTTPException on failure
26- Include unit tests
27- Follow existing code style
28"""
29
30# Combine all layers
31full_prompt = f"{SYSTEM_PROMPT}\n\n{CONTEXT_PROMPT}\n\n{TASK_PROMPT}"
32print(full_prompt)

📈 สถิติที่น่าสนใจ

ข้อมูลจากงานวิจัยชี้ว่า:

วิธี	ผลตอบแทน
Prompt Engineering แบบเดิม	ปรับปรุงได้ <3%
Harness-level changes (รวม Prompt Layering)	ปรับปรุงได้ 28-47%

นั่นหมายความว่า การเปลี่ยนแปลงที่ระดับ “ระบบ” (Harness) มีผลมากกว่าการเปลี่ยนแปลงที่ระดับ “ข้อความ” (Prompt) ถึง 10 เท่า!

และนี่คือเหตุผลที่เราต้องมอง Prompt เป็น Control Plane ไม่ใช่แค่ Input Box

📄 AGENTS.md: แผนที่ ไม่ใช่ Prompt ยาวๆ

อีกตัวอย่างที่ดีคือไฟล์ AGENTS.md ในโปรเจกต์ต่างๆ

หลายคนเขียน prompt ยาวเต็มไฟล์ แต่ AGENTS.md ที่ดีควรเป็น แผนที่ — บอกว่า:

Agent นี้ทำอะไร
ต้อง interact กับอะไรบ้าง
มีข้อจำกัดอะไร

ไม่ใช่ “script ที่ต้องอ่านทุกบรรทัด”

ตัวอย่าง: AGENTS.md ที่ดี

 1# Agent Role: Backend Developer
 2
 3## Responsibilities
 4- Implement API endpoints
 5- Write unit tests
 6- Update documentation
 7
 8## Constraints
 9- Must use type hints
10- Must achieve 90% test coverage
11- Cannot modify database schema without approval
12
13## Workflow
141. Read task from TASKS.md
152. Implement code
163. Run tests
174. Submit for review

นี่คือหลักการของ Prompt Layering — แบ่งให้ชัด ไม่ยัดทุกอย่างไว้ที่เดียว

🔍 จากทฤษฎีสู่ Reality Check: Claude Code vs Codex

ตอนนี้เราเข้าใจหลักการแล้ว มาดูตัวอย่างจริงกัน

Claude Code vs OpenAI Codex

ด้าน	Claude Code	OpenAI Codex
แนวทาง	Proactive Planner	Shell-first Surgeon
Workflow	สแกน repo ก่อนแล้ว plan	เริ่มจาก lean context
Memory	ใช้ CLAUDE.md เป็น long-term memory	ใช้ AGENTS.md เป็น map
Context Window	1M tokens	200K tokens
Token Usage	ใช้มากกว่า 3.2-4.2 เท่า	ใช้น้อยกว่า แต่ thorough น้อยกว่า
Agent Teams	Coordinated agents	Cloud sandbox per task
Isolation	Git worktree per agent	Cloud sandbox

ตัวอย่าง: Token Usage ต่างกันอย่างไร?

 1งาน: Implement user authentication
 2
 3Claude Code:
 4- Scan repo: 50K tokens
 5- Read CLAUDE.md: 10K tokens
 6- Plan: 5K tokens
 7- Implement: 30K tokens
 8- Test: 20K tokens
 9- Total: ~115K tokens
10
11Codex:
12- Read AGENTS.md: 5K tokens
13- Implement: 20K tokens
14- Test: 10K tokens
15- Total: ~35K tokens
16
17Ratio: Claude Code ใช้ token มากกว่า ~3.3 เท่า

คำถาม: แล้วควรเลือกอันไหน?

คำตอบ: ขึ้นอยู่กับงาน

Claude Code — เหมาะกับงานที่ซับซ้อน ต้องการ thorough plan
Codex — เหมาะกับงานเร็วๆ ไม่ซับซ้อนมาก

🛡️ Guardrails 3 ระดับ

Prompt ที่ดีต้องมี Guardrails — เหมือนรั้วที่ป้องกันไม่ให้ AI ทำผิด

ตาราง: Guardrails 3 ระดับ

ระดับ	ประเภท	ตัวอย่าง
Input	Content filtering, Schema validation, Rate limiting	ห้าม prompt injection, ต้องเป็น JSON, จำกัด 10 requests/min
Output	Format validation, Factual grounding, Safety classifiers	ต้องมี type hints, ต้องอ้างอิงแหล่งที่มา, ห้าม generate harmful content
Execution	Tool call approval, Resource limits, Deadlock detection	ต้องขออนุญาตก่อน rm -rf, จำกัด CPU 50%, ตรวจจับ infinite loop

ตัวอย่างโค้ด: Guardrails ในทางปฏิบัติ

 1class Guardrails:
 2    def validate_input(self, prompt: str) -> bool:
 3        # Input guardrails
 4        if len(prompt) > 10000:
 5            raise ValueError("Prompt too long")
 6        if "rm -rf" in prompt:
 7            raise ValueError("Dangerous command detected")
 8        return True
 9    
10    def validate_output(self, code: str) -> bool:
11        # Output guardrails
12        if not self.has_type_hints(code):
13            raise ValueError("Missing type hints")
14        if not self.has_docstrings(code):
15            raise ValueError("Missing docstrings")
16        return True
17    
18    def validate_execution(self, tool_call: dict) -> bool:
19        # Execution guardrails
20        if tool_call['name'] == 'file_write':
21            if not tool_call['path'].startswith('/safe/'):
22                raise ValueError("Unsafe path")
23        return True

🧠 Memory Systems 5 ประเภท

AI จำเป็นต้องมี Memory — แต่ไม่ใช่แค่ “จำได้ทุกเรื่อง” แต่ต้องจำอย่างมีระบบ

ตาราง: Memory Systems 5 ประเภท

ประเภท	หน้าที่	ตัวอย่าง
System Memory	ระบบพื้นฐาน	Rules, constraints, guardrails
Session Memory	ระหว่าง session	Conversation history, current task
Project Memory	โปรเจกต์ปัจจุบัน	CLAUDE.md, AGENTS.md, progress.md
User Memory	ความชอบผู้ใช้	Coding style, preferences, patterns
World Memory	ความรู้ทั่วไป	Documentation, APIs, best practices

ตัวอย่าง: การใช้งาน Memory ในทางปฏิบัติ

 1# CLAUDE.md (Project Memory)
 2
 3## Project Overview
 4- Name: FastAPI Auth System
 5- Version: 1.0.0
 6- Python: 3.11+
 7
 8## Coding Standards
 9- Type hints: Required
10- Test coverage: 90%+
11- Documentation: Google style
12
13## Current Progress
14- [x] User model
15- [x] Authentication endpoint
16- [ ] Authorization middleware
17- [ ] Unit tests

🔄 Retry Logic 5 ระดับ

เมื่อ AI ทำผิด — จะทำอย่างไร?

ตาราง: Retry Logic 5 ระดับ

ระดับ	ประเภท	เมื่อไหร่ใช้
1. Simple Retry	ลองใหม่เหมือนเดิม	Error ชั่วคราว (network timeout)
2. Reformulated Retry	ลองใหม่โดยปรับ prompt	Model เข้าใจผิด
3. Model Fallback	เปลี่ยนโมเดล	โมเดลปัจจุบันทำไม่ได้
4. Decomposition Retry	แยกงานย่อย	งานซับซ้อนเกินไป
5. Human Escalation	ให้คนทำ	AI ทำไม่ได้จริงๆ

ตัวอย่างโค้ด: Retry Logic ในทางปฏิบัติ

 1class RetryLogic:
 2    def execute_with_retry(self, task: str, max_retries: int = 5):
 3        for attempt in range(max_retries):
 4            try:
 5                # Level 1: Simple Retry
 6                result = self.model.execute(task)
 7                return result
 8            except TemporaryError:
 9                continue
10            except MisunderstandingError:
11                # Level 2: Reformulated Retry
12                task = self.reformulate(task)
13            except ModelCapabilityError:
14                # Level 3: Model Fallback
15                self.model = self.get_fallback_model()
16            except ComplexityError:
17                # Level 4: Decomposition Retry
18                subtasks = self.decompose(task)
19                results = [self.execute_with_retry(t) for t in subtasks]
20                return self.combine(results)
21        
22        # Level 5: Human Escalation
23        self.escalate_to_human(task)

👥 Sub-agent Isolation

เมื่อมีหลาย Agent — จะแยกกันอย่างไร?

ตาราง: Codex vs Claude

ด้าน	Codex	Claude
Isolation	Cloud sandbox per task	Git worktree per agent
Communication	ผ่าน API	อ่าน shared files ได้
State	Stateless per task	Stateful across session
Resource	แยกชัดเจน	Shared แต่มี limits

📝 สรุปตอนที่ 2

สิ่งที่ได้เรียนรู้:

✅ Prompt คือ Control Plane — ไม่ใช่ Input Box แต่เป็น interface ควบคุมพฤติกรรม

✅ Prompt Layering 3 ระดับ — Orchestration, Runtime, Model Interface

✅ สถิติ — Harness-level changes ได้ 28-47% improvement (vs <3% จาก prompt engineering)

✅ Claude Code vs Codex — 2 แนวคิดต่างกัน (Proactive Planner vs Shell-first Surgeon)

✅ Guardrails 3 ระดับ — Input, Output, Execution

✅ Memory Systems 5 ประเภท — System, Session, Project, User, World

✅ Retry Logic 5 ระดับ — Simple → Reformulated → Fallback → Decomposition → Human

✅ Sub-agent Isolation — Cloud Sandbox vs Git Worktree

💡 บทเรียนจากประสบการณ์เหน่ง

ช่วงแรก: ใช้ AI โดยไม่มี Harness

1❌ ใส่ prompt สั้นๆ แล้วดูว่าได้อะไร
2❌ "ช่วยเขียน Python script หน่อย"
3❌ "สรุปข้อความนี้ให้หน่อย"

ผลลัพธ์: ได้มาบ้าง ไม่ได้บ้าง AI บางทีเขียนโค้ดผิด ต้องมานั่งแก้ไขเองเยอะ

ปัจจุบัน: ใช้ Qwen (Alibaba) พร้อม Flow

1✅ กำหนด Orchestration → "งานนี้ต้องทำอะไรบ้าง"
2✅ กำหนด Runtime → "มีเงื่อนไขอะไรต้องระวัง"
3✅ กำหนด Model Interface → "output ต้องออกมาในรูปแบบไหน"

ผลลัพธ์: พอใจ 90% — AI ให้สิ่งที่ต้องการมากขึ้น แก้ไขน้อยลง

“Flow แล้ว” = มีขั้นตอนชัดเจน ไม่ใช่แค่ “ถามๆๆ ไปเรื่อย”

การเดินทาง: ไม่มี Harness → มี Flow → มี Guardrails

 1Stage 1: ไม่มี Harness
 2- ใช้ AI ตามใจ
 3- ผลลัพธ์ไม่แน่นอน
 4- เสียเวลาแก้ไขเยอะ
 5
 6Stage 2: มี Flow
 7- มีขั้นตอนชัดเจน
 8- ผลลัพธ์ดีขึ้น
 9- เสียเวลาน้อยลง
10
11Stage 3: มี Guardrails
12- มีระบบป้องกัน
13- ผลลัพธ์น่าเชื่อถือ
14- เสียเวลาน้อยมาก

🔄 ตอนต่อไป: Harness Components — ระบบอวัยวะของ AI

ตอนนี้เราเข้าใจแล้วว่า:

✅ Prompt คือ Control Plane
✅ Prompt Layering มี 3 ระดับ
✅ Guardrails, Memory, Retry Logic สำคัญอย่างไร

แล้ว Harness Components คืออะไร?

Harness มี “อวัยวะ” หลายอย่างที่ทำงานร่วมกัน:

Control Plane — Prompt ที่เราเพิ่งคุยกัน
Query Loop — หัวใจที่สูบฉีดงาน
Tools & Permissions — มือที่ทำงาน
Memory & Context — สมองที่จำ
Recovery Paths — ระบบภูมิคุ้มกัน

ในตอนต่อไป เราจะมาเจาะลึกแต่ละ “อวัยวะ” ว่าทำงานอย่างไร และทำไมต้องมี

ติดตามตอนต่อไปได้เลย 🚀

📚 อ้างอิง

แหล่งหลัก:

Harness Books 2 เล่ม โดย wquguru

Book 1: Claude Code Harness
Book 2: Claude Code vs Codex
Online Version: harness-books.agentway.dev

แหล่งเสริม:

Stanford HAI - Prompt Engineering Limitations (2025)
Morph LLM - Codex vs Claude Code Benchmarks
Anthropic - Constitutional AI
Google - Chain of Thought Prompting
OpenAI - Templatized Prompt Engineering