Agents, Planning, Evaluation and AI Index Reports 2025
Started writing again! :)
I recently compiled a personal reading list of standout AI articles and reports, now available as a single PDF. It includes pieces from Chip Huyen, McKinsey, and Anthropic, plus a great paper on LLM evaluation:
📘 Download the full reading list (PDF)
Key Takeaways:
• Agent Planning Failures: Chip Huyen’s article breaks down how agents fail at planning: calling tools that don’t exist, passing wrong parameters to valid tools, or solving the wrong task altogether. She also offers evaluation metrics and tool selection tips.
• Subjectivity in Evaluation: The EvalGen paper highlights how evaluation criteria shift as reviewers read more outputs. Evaluation is dynamic, so rubrics shouldn’t be fixed up front; they should be refined over time.
• AI Usage Trends: Anthropic’s Economic Index shows that AI is still used in a small fraction of tasks across professions. Learning and direct-use cases are growing while iterative task use is declining, likely because model quality is improving.
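
To make the first takeaway concrete, here is a minimal sketch of how the structural planning failures (invalid tools, wrong parameters) could be checked before a plan runs. Everything in it is an assumption for illustration: the `TOOLS` registry, the `Step` class, and the `validate_plan` and `valid_plan_rate` helpers are hypothetical names, not code from Chip Huyen’s article. It also can’t catch the third failure, solving the wrong task, because that needs a judgment about intent rather than a schema check.

```python
from dataclasses import dataclass, field

# Hypothetical tool registry: tool name -> set of required parameter names.
TOOLS = {
    "search_web": {"query"},
    "send_email": {"to", "subject", "body"},
}


@dataclass
class Step:
    """One planned tool call (illustrative structure, not a real agent API)."""
    tool: str
    params: dict = field(default_factory=dict)


def validate_plan(plan, tools=TOOLS):
    """Return (step_index, problem) tuples; an empty list means no structural errors."""
    problems = []
    for i, step in enumerate(plan):
        # Failure 1: the agent picked a tool that isn't in the registry.
        if step.tool not in tools:
            problems.append((i, f"invalid tool: {step.tool}"))
            continue
        # Failure 2: valid tool, but the parameter names don't match.
        expected = tools[step.tool]
        given = set(step.params)
        missing = expected - given
        unexpected = given - expected
        if missing:
            problems.append((i, f"missing parameters: {sorted(missing)}"))
        if unexpected:
            problems.append((i, f"unexpected parameters: {sorted(unexpected)}"))
    return problems


def valid_plan_rate(plans, tools=TOOLS):
    """Fraction of plans with no structural errors (0.0 for an empty list)."""
    if not plans:
        return 0.0
    valid = sum(1 for plan in plans if not validate_plan(plan, tools))
    return valid / len(plans)


if __name__ == "__main__":
    good = [Step("search_web", {"query": "AI index 2025"})]
    bad = [Step("browse", {}), Step("send_email", {"to": "a@example.com"})]
    print(validate_plan(bad))
    print(valid_plan_rate([good, bad]))
```

Tracking something like `valid_plan_rate` across many generated plans gives one simple number to watch when you change prompts, models, or the tool set.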