Meta Just Dropped Muse Spark and Honestly? The Benchmarks Are Weird

So Meta launched a new AI model called Muse Spark, and look — I know what you're thinking. Another model? Another acronym? Another CEO standing in front of a slide deck promising the moon?

But hear me out. This one's kind of interesting.

Muse Spark is the firstborn of Meta Superintelligence Labs. Yes, that's really the name of the team Meta assembled after sinking $14 billion into Scale AI nine months ago. The model is live now at meta.ai and is coming to Facebook, Instagram, and WhatsApp soon, which means your aunt's forwarded memes are about to be analyzed by something smarter than Clippy.

The standout feature is called "Contemplating mode," which is Meta's way of saying "we run a bunch of agents in parallel so the AI thinks harder before it says something stupid." Think of it like having a committee of AI interns tackle your problem simultaneously instead of one confused intern who just Googled everything.

The benchmarks? Chef's kiss. Muse Spark scores 42.8 on HealthBench Hard, while GPT-5.4 sits at 40.1 and Gemini 3.1 Pro wallows at 20.6. On agentic search, Muse hits 74.8. Not bad for a "small and fast" model built in nine months.

But here's the twist: Gemini 3.1 Pro still wins most categories overall, and Meta knows it. Meta's own blog post basically admits, "yeah, we have gaps in coding and long-horizon agentic tasks." Refreshing honesty or damage control? You decide.

Oh, and in a move that will make the open-source crowd weep, Muse Spark is closed: no weights, no architecture details, nothing. After Llama 4's rough reception, Meta apparently decided the next chapter of its AI story should be written behind locked doors.

Meta's stock jumped 9% on Wednesday, though, so investors seem to be buying the hype.

Fun detail: internally it was codenamed "Avocado." I don't know why, but I respect it.


Editor’s note: This article is an original rewrite and analysis based on reported developments.