PotentialX

Friday, Dec 12, 2025

OpenAI and Disney Character Licensing 🔗

OpenAI has a $1b deal with Disney to license characters. They’re saying it’s for Sora, but I’m guessing there’s another angle to this.

They’ve been working with Jony Ive on a new hardware-based AI product. They haven’t announced a form factor, but have said it will not have a screen.

I think they’re going to create AI “characters” that you’ll be able to buy, and that will embody the voice and personality of the character.

I expected to see them doing this with celebrities and sports and historical figures - buy an “Albert Einstein” figure that you can chat with, for example - but I didn’t expect to see Disney sign up for this. They’re putting a lot of trust in OpenAI.

Thursday, Dec 11, 2025

ChatGPT 5.2 Released 🔗

ChatGPT 5.2 was released today. Whether you see this as a response to other models quickly eclipsing 5.1, or just the natural progression, it was a pretty quick turnaround.

The knowledge cutoff is August 31, 2025, which means this model went from (presumably) starting training to shipping in about three months.

Sunday, Dec 7, 2025

VibeVoice 0.5b on macOS

Microsoft has a new TTS model in the VibeVoice series, VibeVoice-Realtime-0.5B. Here’s a quick guide on how to try it out on macOS.

Clone the vibevoice-community/VibeVoice git repo.

cd into the directory and run:

python3.13 -m venv .venv
source .venv/bin/activate
pip install -e .
export MODEL_PATH=microsoft/VibeVoice-Realtime-0.5B
MODEL_DEVICE=mps python -m uvicorn demo.web.app:app

If this all works, open http://localhost:8000 and try it out.

Tuesday, Dec 2, 2025

Apple VP of AI Retiring

Apple’s Senior VP of Machine Learning and AI Strategy, John Giannandrea, is retiring. He’s being replaced by Amar Subramanya, who most recently worked at Microsoft but spent 16 years at Google, where he worked on the Gemini assistant.

Apple is in a weird place with AI. They recognized early on the value of machine learning, and their strategy has been to bake it into the products to produce valuable features, without really identifying it as AI.

They have a great architecture for it. They’ve been building in AI acceleration hardware for years. The photos and camera pipeline specifically makes heavy use of machine learning. Other areas include Touch ID and Face ID, the rich handwriting recognition and math features in Notes, health features in the Apple Watch, even battery charging optimization.

The unified memory architecture that once seemed like an architectural mistake has accidentally turned out to be exactly what you need for big LLMs.

But generative AI caught them by surprise, and then culturally they didn’t know what to do with it.

Apple has always been a very “human” company, and features that replace human creativity with AI are just culturally not aligned with who they are.

They’ve been flailing for the last two years. The Siri effort keeps getting rebooted and retooled, and the main interface that people use for AI today - a chatbot - is completely missing.

They’re also paying for some bad architectural decisions with Siri. Best-in-class AI is not compatible with privacy. It’s just not. For the frontier models, you need hardware that’s bigger than what fits in a phone, so you have to either limit yourself to what you can run on a phone, or do it remotely.

Apple has tried to find a middle ground with Private Cloud Compute, a clever solution that runs the LLM in the cloud on essentially a beefy Mac that they trust with your sensitive information. Problem is, PCC is still fairly restricted compared to what top-tier LLMs run on, and until we actually get a new Siri, I’m not sure it’s being used for much of anything.

But Apple has potential.

I think Apple has more potential here than either Microsoft or Google does, because they own more of the stack. If you’re in the Apple ecosystem, you’re using an Apple desktop OS, an Apple phone OS, and Apple’s cloud services. Google doesn’t (really) have a desktop OS, and Microsoft doesn’t have a phone.

What Apple needs is a vision, and for someone to reset their culture on AI and privacy.

Microsoft has made culture shifts before. They famously missed the Internet, until Bill Gates redirected them. But they also missed mobile, and simply surrendered. I don’t know what Amar Subramanya took away from his time at Microsoft, but I’m guessing his short tenure means it didn’t click with him.

His much longer tenure at Google is more interesting. Google made the right decisions with AI. They invented the transformer architecture we’re all using now. After some of their own early flailing, they have a solid, integrated direction, much better than the Copilot strategy Microsoft seems to be taking. Both Copilot and Siri are brands more than products; Gemini is one textbox that you can use to do everything.

It’s not too late for Apple to bring all the pieces together into a coherent whole, but it’s both a technical and a cultural task inside Apple. I wish Amar Subramanya good luck in pulling it off.

Saturday, Nov 29, 2025

Z-Image-Turbo Released 🔗

Alibaba’s new open-source image generation model is a nice surprise. It’s the best open-source image generation model, and this is the small version of a bigger one they’re also promising to open source in the future.

And it’s small. 6b parameters. Runs nicely on macOS in ComfyUI.

Tuesday, Nov 25, 2025

FLUX.2 Model Released 🔗

Black Forest Labs introduces a new open-source image generation and image editing model, FLUX.2.

In a side-by-side comparison with Nano Banana 2 and FLUX.1 Kontext (their previous model), it doesn’t always win, but it’s a solid showing for an open-source model.

32 billion parameters; quantized to 8-bit that’s roughly 32GB of weights alone, so it might run on a 32GB VRAM / unified memory machine, with 24GB needing a more aggressive quant.
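To put that in numbers, here’s a quick back-of-the-envelope sketch of the weight memory at different quantization levels (weights only; activations and framework overhead add more on top):

```python
# Rough memory footprint of FLUX.2's 32B parameters at different
# quantization levels. Weights only - real usage needs extra headroom.
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gb(32, bits):.0f} GB")
# 16-bit: 64 GB, 8-bit: 32 GB, 4-bit: 16 GB
```

So 8-bit just barely needs a 32GB machine, and a 24GB card only works if you drop below 8 bits.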

Monday, Nov 24, 2025

Claude Opus 4.5 Released 🔗

Ending this release season we have Claude Opus 4.5, claiming to be the best coding model, a crown it only lost a few days ago.

I appreciate that Anthropic didn’t pre-announce anything, just “here it is, available today”.

Friday, Nov 21, 2025

Dramatic improvement in Time to First Token on M5 iPad 🔗

Apple’s unified memory architecture is great for LLMs (and starting to show up in the PC world), but Apple had an Achilles’ heel: the time it took to generate the first output token with a long prompt.

This was due to the lack of an accelerated matmul instruction, and that was allegedly addressed in the M5. I say allegedly, because the software support to expose it wasn’t there until now.

Thursday, Nov 20, 2025

Nano Banana Pro 🔗

This week of big AI releases continues with Nano Banana Pro, the sequel to Google’s Nano Banana.

The usual pattern in AI is that the big companies leapfrog each other - a company is only #1 until the next release. But with image generation, Google took first place by a solid margin, and now is pulling even farther ahead.

Wednesday, Nov 19, 2025

GPT 5.1 Codex Max 🔗

OpenAI has announced GPT 5.1 Codex Max, an update to their Codex coding product, “built for long-running, detailed work”.

They’re claiming a big jump on SWE-Bench Verified, past Anthropic’s Claude Sonnet 4.5, which has been the one to beat in coding performance. Looking forward to some real world usage.

Tuesday, Nov 18, 2025

Google Gemini 3 is Out 🔗

Gemini 3 seems like a significant release, and Google is deploying it across their whole product suite today.

I find it fascinating that it beats all the other frontier models in every benchmark, except SWE-Verified where it’s just barely behind Claude Sonnet 4.5.

Monday, Nov 17, 2025

Grok 4.1 is Out 🔗

Grok 4.1 dropped today. They’re claiming a big jump on the usual benchmarks, but it’s not always ahead of ChatGPT 5.1, and Gemini 3 is also right around the corner. Should be an interesting week.

Sunday, Nov 16, 2025

Maya1 TTS 🔗

Maya Research has released Maya1, a text to speech model with a new capability: You describe the voice you want using text.

“Create any voice you can imagine — a 20s British girl, an American guy, or a full-blown demon”

I updated the sample script to work on macOS, you can find that here.

Saturday, Nov 15, 2025

Detailed post on Apple M-series chips and their LLM performance. 🔗

The M5 is listed in one of the comments as 153GB/s. I don’t think this takes into account the improvement in time-to-first-token from the improved matmul that they added to the A19.
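Memory bandwidth is what bounds decode speed: every generated token has to stream all the (active) weights out of memory. A rough ceiling, taking that 153 GB/s figure at face value:

```python
# Upper bound on decode tokens/sec: bandwidth divided by the bytes of
# weights streamed per token. Ignores KV-cache reads and compute limits,
# so real-world numbers will be lower.
def max_tokens_per_sec(bandwidth_gb_s: float, params_billion: float, bits: int) -> float:
    weights_gb = params_billion * bits / 8  # GB read per generated token
    return bandwidth_gb_s / weights_gb

# An 8B model at 4-bit on 153 GB/s: ~38 tokens/sec ceiling.
print(max_tokens_per_sec(153, 8, 4))
```

That’s why bandwidth numbers in these chip comparisons map so directly onto generation speed, and why the matmul change matters for a different metric (prefill) entirely.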

The Best Local LLMs To Run On Every Mac 🔗

Good research, and a good list of models, sorted by the amount of memory you need to run them. A bit dated for the very large models, with so many good ones released in the latter part of this year, but for people with an average Mac it’s still relevant.

Reddit vs Claude Code

There have been a lot of posts in the r/ClaudeCode subreddit about essentially two things: usage limits burning down much faster than they should, and a noticeable drop in output quality.

I use Claude Code every day, and have been for months. And here’s the thing: I haven’t seen either of these problems, at all.

There are a few possible explanations.

One is that I’m not using it as intensely or for the kinds of work that are showing these problems. I don’t buy that, because I know my own usage isn’t “cutting edge” but it’s also not nothing. I think I do more with it than most people do.

Another explanation is that it’s simply a bug. Some users are getting routed to infrastructure that’s miscounting tokens, or is using lower-quality models. That’s possible, and Anthropic did offer that as an explanation for some issues, but it didn’t really cover a lot of what people were complaining about.

The last explanation is that these posts were not genuine. That someone was attacking public perception of Claude Code for some reason.

This seems likely to me.

One of the reasons is that these comments showed up around the same time, and then just stopped. Maybe Claude was super-broken for a while and they fixed it, but as I said, I just didn’t see that, and even while the negative posts were happening, there was the occasional post saying “huh? works fine here”.

I won’t speculate on who might have been behind this, simply because I don’t know the politics or the players well enough to be able to guess. It could have just been one guy with an n8n instance who was having a bad day.

I still use it every day. I just had Claude build me a posting interface for Hugo to make updating this site easier; it made a working interface on the first try, and personalized it to my workflow with a couple more prompts.

Tuesday, Jun 10, 2025

o3-pro and o3 price drop

Some OpenAI changes today: o3-pro is replacing o1-pro, and o3 is seeing an 80% price drop.

Thursday, Jun 5, 2025

Gemini 2.5 Pro 06-05 🔗

Less than a month after the last preview we have another update to Google’s Gemini Pro model.

You can try it out on AI Studio.

A significant new feature this time is audio-to-audio, similar to ChatGPT’s Advanced Voice mode, and AI Studio also lets you have a screen-sharing session with the model as you chat.

Thursday, May 29, 2025

FLUX.1 Kontext

FLUX.1 Kontext, Black Forest Labs’ new image generation and editing model, feels like a big leap forward in usability. Image generation is one thing, but the image is almost never exactly what you want. Being able to provide text instructions that do a good job of preserving the original image, while making the edits you asked for, is what makes it unique.

Wednesday, May 21, 2025

Google I/O 2025

I’m watching The Verge’s “Google I/O 2025 keynote in 32 minutes” and after about 10 minutes I’m already overwhelmed. So much new stuff.

Not all of it is available now - in fact most of it is “coming soon” - but there’s so much there.

One of the reasons I started this site is to give myself a place to take notes on the things I discover. I haven’t been doing a good job of that, but I will be writing more about the individual announcements.

Friday, Dec 6, 2024

ChatGPT Pro 🔗

OpenAI announces ChatGPT Pro for $199/month.

They weren’t really clear on how it’s actually better than the $20/month Plus subscription. You do get unlimited access, so if you really do depend on it for constant daily use, maybe it’s worth it.

$199/month is $2388/year, and you can buy a rig that can run Llama 3.3 locally on an unlimited basis for that.

Thursday, Nov 28, 2024

LLaMa Mesh in Blender 🔗

Integration of LLaMa Mesh (an AI 3D object generator) into Blender.

This is the 3D equivalent of integrating Stable Diffusion into image apps for features like generative fill or object removal. This specific implementation is pretty basic, but you can see the potential.

Thursday, Oct 24, 2024

Anthropic introduces computer use 🔗

Anthropic has an AI model that can use computers.

It’s pretty basic at this point. Gist of it is you run a VM and feed screenshots to the AI, and it outputs the actions to take in the VM.

This doesn’t seem like a big leap from the models that already have image comprehension, but there’s a precision to computer use that likely had to be trained in.
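The screenshot-in, action-out loop described above can be sketched in a few lines. To be clear, `capture_screenshot`, `ask_model`, and `apply_action` are hypothetical stand-ins for illustration, not Anthropic’s actual API:

```python
# Sketch of the computer-use loop: screenshot -> model -> action -> VM.
# All three helpers are hypothetical stubs, not a real API.

def capture_screenshot(vm) -> bytes:
    return vm["screen"]  # stand-in for grabbing a real VM framebuffer

def ask_model(screenshot: bytes, goal: str) -> dict:
    # A real implementation would send the image and goal to the model;
    # this stub just declares the task finished immediately.
    return {"type": "done"}

def apply_action(vm, action: dict) -> None:
    vm["actions"].append(action)  # stand-in for click/type/scroll

def run_agent(vm, goal: str, max_steps: int = 10) -> int:
    """Drive the VM until the model says it's done; return steps taken."""
    for step in range(max_steps):
        action = ask_model(capture_screenshot(vm), goal)
        if action["type"] == "done":
            return step
        apply_action(vm, action)
    return max_steps
```

The interesting training problem is inside `ask_model`: turning pixels into precise coordinates and keystrokes, which is the part that had to be trained in.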

This unlocks so many things.

Friday, Oct 18, 2024

What's new with Optimus? 🔗

Tesla’s Optimus robot is getting better at doing everyday tasks, as seen in this video.

It’s impressive, but I’m not sure why.

It seems like everything Optimus is doing has been done before. Disney has been building robots for decades. So has Boston Dynamics.

Is it the software?

If it is, why is Tesla ahead here? Shouldn’t the robot companies that have been working at this for years be way ahead?

I think it’s Elon’s willingness to invest in potential.

Friday, Sep 13, 2024

Is o1 a New Model?

OpenAI’s new o1 model, released yesterday in preview, is impressive. It seems to be able to do something new: plan out a multi-step approach, work through it in stages, and refine its own output before answering.

We’ve seen this kind of thing before, in tools like langchain, but they’ve never worked this well.

I’m curious whether o1 is really a single “new model”, or whether it’s a new set of tooling that runs a series of requests back through the model until it’s happy with the response.

You can do this yourself now. In ChatGPT, ask the model to make a plan, ask it to execute parts of the plan, feed the response back into it, and you can get it to home in on a very good response, but it’s the human that’s guiding it.
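That manual plan-execute-review loop can be automated with a small orchestrator. In this sketch, `call_model` is a stub standing in for any chat completion API; the prompt prefixes and canned replies are illustrative, not any real product’s protocol:

```python
# Minimal plan/execute/refine orchestrator. call_model is a stub that
# fakes model replies; a real version would call a chat API.

def call_model(prompt: str) -> str:
    if prompt.startswith("PLAN"):
        return "1. research\n2. draft\n3. review"
    if prompt.startswith("CRITIQUE"):
        return "OK"  # stub: accept the first draft
    return f"answer for: {prompt}"

def orchestrate(task: str, max_rounds: int = 3) -> str:
    """Plan, execute, then critique-and-revise until the model approves."""
    plan = call_model(f"PLAN: {task}")
    answer = call_model(f"Execute this plan for '{task}':\n{plan}")
    for _ in range(max_rounds):
        critique = call_model(f"CRITIQUE: {answer}")
        if critique == "OK":  # model is satisfied with its own answer
            break
        answer = call_model(f"Revise using this critique:\n{critique}")
    return answer
```

Whether o1 is a genuinely new model or roughly this loop with a fine-tuned model inside, the orchestration layer is where the magic would live.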

The real innovation here may be the tooling to orchestrate this process.