Development Featured SWE-bench: LLM Developer Metrics vs. Hype TL;DR: SWE-bench evaluates how well current LLMs complete common developer tasks and shows that, even with task-specific agents, today's LLMs succeed on at best ~12.5% of tasks. We need more benchmarks like this one. Real-world benchmarks like these are critical for engineering leaders to understand so we can
Unreal Field Notes: Real Unreal Landscapes Prologue One constant challenge of tinkering with a bunch of different topics is remembering what I read, where I found it, where I was in the process, and all the blind alleys that wasted an evening. A lot of tasks only need to be done once or twice for a
Unreal It's Not Me, It's You—or Screwing up With Unreal Engine Upon being told I was no longer needed at my job of 19 years, I did what any reasonable adult and parent of small children would do. I installed an Unreal development environment and wrote a blog post about it.