PotentialX

Thursday, Oct 24, 2024

Anthropic introduces computer use 🔗

Anthropic has an AI model that can use computers.

It’s pretty basic at this point. Gist of it is you run a VM and feed screenshots to the AI, and it outputs the actions to take in the VM.
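Something like this loop, I imagine. This isn’t Anthropic’s actual API; take_screenshot, ask_model, and apply_action are hypothetical stand-ins for the VM and model plumbing:

```python
import time

def take_screenshot(vm):
    """Capture the VM's current screen as image bytes (hypothetical stand-in)."""
    ...

def ask_model(screenshot, goal, history):
    """Send the screenshot and the goal to the model, get back an action dict
    like {"type": "click", "x": 120, "y": 340} or {"type": "done"} (stand-in)."""
    ...

def apply_action(vm, action):
    """Replay the model's chosen action (mouse move, click, keystrokes) in the VM (stand-in)."""
    ...

def run_task(vm, goal, max_steps=50):
    history = []
    for _ in range(max_steps):
        shot = take_screenshot(vm)
        action = ask_model(shot, goal, history)
        if action.get("type") == "done":
            break
        apply_action(vm, action)
        history.append(action)
        time.sleep(0.5)  # give the UI a moment to settle before the next screenshot
    return history
```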

This doesn’t seem like a big leap from the models that already have image comprehension, but there’s a precision to computer use that likely had to be trained in.

This unlocks so many things.

Friday, Oct 18, 2024

What's new with Optimus? 🔗

Tesla’s Optimus robot is getting better at doing everyday tasks, as seen in this video.

It’s impressive, but I’m not sure why.

It seems like everything Optimus is doing has been done before. Disney has been building robots for decades. So has Boston Dynamics.

Is it the software?

If it is, why is Tesla ahead here? Shouldn’t the robot companies that have been working at this for years be way ahead?

I think it’s Elon’s willingness to invest in potential.

Friday, Sep 13, 2024

Is o1 a New Model?

OpenAI’s new o1 model, released yesterday into preview, is impressive. It seems to have the ability to do some new things:

We’ve seen this kind of thing before, in tools like langchain, but they’ve never worked this well.

I’m curious whether o1 is really a single “new model”, or whether it’s a new layer of tooling that runs a series of requests back through the model until it’s happy with the response.

You can do this yourself now. In ChatGPT, ask the model to make a plan, ask it to execute parts of the plan, feed the responses back into it, and you can get it to home in on a very good answer. But in that case it’s the human guiding the process.

The real innovation here may be the tooling to orchestrate this process.
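Here’s roughly what I mean, as a sketch built on plain chat completions. This is my guess at the kind of loop involved, not how o1 actually works:

```python
# Plan -> execute -> critique -> revise, all driven by ordinary chat requests.
from openai import OpenAI

client = OpenAI()

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any chat model works for the sketch
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def solve(problem: str, max_rounds: int = 3) -> str:
    plan = chat(f"Make a step-by-step plan to solve:\n{problem}")
    answer = chat(f"Problem:\n{problem}\n\nPlan:\n{plan}\n\nCarry out the plan and give an answer.")
    for _ in range(max_rounds):
        critique = chat(f"Problem:\n{problem}\n\nProposed answer:\n{answer}\n\n"
                        "List any mistakes or gaps. Reply 'OK' if it looks right.")
        if critique.strip().upper().startswith("OK"):
            break  # the model is satisfied with its own answer
        answer = chat(f"Problem:\n{problem}\n\nAnswer:\n{answer}\n\nCritique:\n{critique}\n\n"
                      "Revise the answer to address the critique.")
    return answer
```

If o1 is doing something like this internally, the difference is that the model itself decides when the answer is good enough, rather than a human.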

Saturday, Sep 7, 2024

OpenAI o1 Model Preview 🔗

New drop from OpenAI.

We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.

But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.

It’s interesting how often we find parallels between how the human mind works and how LLMs work.

Walmart “100x more productive” with generative AI 🔗

Walmart is letting generative AI edit its product catalog.

We’ve used multiple LLMs to accurately create or improve over 850 million pieces of data in the catalog. Without the use of generative AI, this work would have required nearly 100-times the current head count to complete in the same amount of time.

Walmart has a lot of smart people and I’m sure they’re doing their best here, but it would be really easy to look at the AI “improvements” to a product listing and miss that it actually got something wrong. And there’s no way they’re checking them all by hand at that scale.

Would love to know more about how they’re anchoring the AI with some sort of ground truth about the items it’s describing.
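Purely as speculation, the anchoring could be as simple as refusing to auto-apply an edit that contradicts structured data you already trust (supplier feed, spec sheet). The field names below are made up:

```python
def conflicting_fields(suggested: dict, known_facts: dict) -> list[str]:
    """Return the fields where the AI-suggested edit contradicts trusted data."""
    conflicts = []
    for field, trusted in known_facts.items():
        if field in suggested and str(suggested[field]).strip().lower() != str(trusted).strip().lower():
            conflicts.append(field)
    return conflicts

suggested_edit = {"brand": "Acme", "capacity": "2 L", "color": "blue"}
known_facts = {"brand": "Acme", "capacity": "1.5 L"}

bad = conflicting_fields(suggested_edit, known_facts)
if bad:
    print("Route to human review, conflicting fields:", bad)  # -> ['capacity']
else:
    print("Auto-apply the edit")
```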

Friday, Sep 6, 2024

Studio AI Video Generation Tool 🔗

This is fun to play with. The quality of the videos I created wasn’t great, but it’s fast, free, and demos just how easy AI video generation is becoming.

A $2000/month LLM

I don’t want to get into the habit of reporting on rumours, so I’m not linking to an original story here. While there are no definitive sources, there’s been serious talk of a $2000/month subscription for a new OpenAI LLM.

To justify a price like this, a service would need to do more than chat with you. It would need to be a tool that you can delegate tasks to, and trust that they would get done. It would be an employee, not a chatbot.

I don’t think we’re far from this now. I think three things are missing:

These don’t seem particularly difficult, but in practice the various frameworks we’ve been using (langchain, for example) have fallen short of delivering a truly useful agent. They get lost, go off on tangents, and waste a lot of time and tokens getting nothing useful done.

But it seems to me like we’re not that far from useful agents. Not AGI, but agents that can be productive in limited domains.

Larger context is a big part of this. Running a task to completion takes many steps, and each request needs the output of all the previous steps. GPT-4o and Llama 3.1 are only 128k, Claude 3 is 200k, while Gemini Pro comes with a 2m context window.

1m is sufficient for some pretty long tasks, but consider the cost of running jobs that are approaching these limits. At $1+ per million tokens, an agent looping on tasks will be expensive to run. Hence the $2000/month cost.
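Some back-of-the-envelope numbers, all illustrative assumptions rather than anyone’s published pricing:

```python
# Rough cost of one long agent task that keeps re-reading its own work.
price_per_million_tokens = 1.00   # USD, a typical order of magnitude for input tokens
avg_context_tokens = 200_000      # each step re-reads most of the prior steps
steps_per_task = 100              # plan, browse, write, check, revise, ...

tokens_per_task = avg_context_tokens * steps_per_task              # 20 million tokens
cost_per_task = tokens_per_task / 1_000_000 * price_per_million_tokens

print(f"~{tokens_per_task / 1e6:.0f}M tokens, ~${cost_per_task:.0f} per task")  # ~20M tokens, ~$20 per task
```

At a few tasks like that per day, you’re already in the neighbourhood of $2000 a month.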

Will these agents be competent enough at the jobs we can assign them to justify that cost?

Thursday, Sep 5, 2024

Stable Image Ultra, SD3 Large and Stable Image Core on Amazon Bedrock

Stability AI has released new image generation models, with a new high-end model called Stable Image Ultra.

Three of Stability AI’s most advanced text-to-image models – Stable Image Ultra, Stable Diffusion 3 Large and Stable Image Core – are now live in Amazon Bedrock, providing high-speed, scalable, AI-powered visual content creation.

Stable Image Ultra is presented as ideal for “Ultra-realistic imagery for luxury brands and high-end campaigns”. With that description I expected it to be even more expensive than it is. Here is the on-demand per-image cost from their pricing page (US$):

Stable Image Core: $0.04
SD3 Large: $0.08
Stable Image Ultra: $0.14
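For reference, calling these from Bedrock looks roughly like this with boto3. invoke_model is real; the model ID and the request/response fields are my assumptions from memory, so check the Bedrock docs for the exact schema:

```python
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.invoke_model(
    modelId="stability.stable-image-ultra-v1:0",  # assumed ID for Stable Image Ultra
    body=json.dumps({"prompt": "product shot of a leather handbag, studio lighting"}),
)

payload = json.loads(response["body"].read())
image_b64 = payload["images"][0]  # assumed: base64-encoded image in an "images" list
with open("handbag.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```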

This may be Stability AI backing away from open source releases, but honestly that seemed like it had to happen at some point. We’re in a bubble where companies are spending billions of dollars training models and giving them away for free. That can’t last forever.

Qwen2-VL Vision Language Model released 🔗

Although there was some drama around the GitHub repo, the Qwen2-VL vision language model is available now.

After a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model families.

Open weights, Apache 2.0 license.

Wednesday, Sep 4, 2024

OpenAI co-founder Sutskever's new safety-focused AI startup SSI raises $1 billion 🔗

Seems like a lot of money for a company with 10 employees.

Australian government study claims AI is bad at summarizing information 🔗

An Australian study, conducted by Amazon, found:

Artificial intelligence is worse than humans in every way at summarising documents and might actually create additional work for people

I applaud the attempt to study this, but using Llama2-70B, a model that’s over a year old now, tells us little about the current state of the art.

Tuesday, Sep 3, 2024

OpenAI, Adobe, Microsoft support bill requiring watermarks on AI Content 🔗

Watermarking AI-generated content is a reasonable thing to be doing. I look at it like attribution, simply acknowledging that the image was created by AI.

The bill itself (AB 3211) is targeted at deepfakes, and also makes it illegal to produce tools to remove the watermark.

xAI Colossus Cluster is 100k H100 GPUs 🔗

I don’t plan to link to tweets very often but sometimes a tweet is the definitive source.

From Elon Musk on X:

This weekend, the @xAI team brought our Colossus 100k H100 training cluster online.

This cluster is used for training AI models, and it’s one of the largest in the world, if not the largest.

One H100, for reference, costs at least $10k, so 100,000 of them puts the hardware alone at over a billion dollars.

And they’re expecting it to double in 100 days.

AI is not happy to see you 🔗

I love this quote from Ted Chiang’s New Yorker article, “Why A.I. Isn’t Going to Make Art”:

It is very easy to get ChatGPT to emit a series of words such as ‘I am happy to see you.’ There are many things we don’t understand about how large language models work, but one thing we can be sure of is that ChatGPT is not happy to see you. A dog can communicate that it is happy to see you, and so can a prelinguistic child, even though both lack the capability to use words. ChatGPT feels nothing and desires nothing, and this lack of intention is why ChatGPT is not actually using language.

Whether “actually using language” is technically correct or not, we know there’s no sentiment, as we understand it, behind those generated words.

Generative Doom 🔗

The way I’ve been imagining the use of AI in game development is that it would create assets for a game, but there would be a traditional game engine running the game.

The generative game engine used for this Doom example, GameNGen, skips all that and goes straight to generating the output pixels. No 3D models, no renderer in the traditional sense. It’s quite a leap.

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU.
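In other words, the whole “engine” is a next-frame predictor conditioned on recent frames and player inputs. A sketch of that idea, not GameNGen’s actual code; next_frame_model stands in for their learned model, and the history length is my assumption:

```python
import numpy as np

def next_frame_model(frames: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the learned model: given the last N frames
    (N, H, W, 3) and the last N player actions, predict the next frame (H, W, 3)."""
    ...

def game_loop(frames, actions, read_input, show, steps=1000):
    frames, actions = list(frames), list(actions)
    for _ in range(steps):
        actions.append(read_input())          # keyboard/gamepad state this tick
        frame = next_frame_model(
            np.stack(frames[-32:]),            # short history window (assumed length)
            np.stack(actions[-32:]),
        )
        show(frame)                            # blit the predicted pixels to the screen
        frames.append(frame)                   # no 3D scene, no renderer, just pixels
```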

More from the authors at the GameNGen GitHub page.

First Post

School starts today. Seems like a good day to start something new.