PotentialX

Thursday, Oct 24, 2024

Anthropic introduces computer use 🔗

Anthropic has an AI model that can use computers.

It’s pretty basic at this point. Gist of it is you run a VM and feed screenshots to the AI, and it outputs the actions to take in the VM.
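Something like this loop, I imagine. This isn’t Anthropic’s actual API; take_screenshot, ask_model, and apply_action are hypothetical stand-ins for the VM and model plumbing:

```python
import time

def take_screenshot(vm):
    """Capture the VM's current screen as image bytes (hypothetical stand-in)."""
    ...

def ask_model(screenshot, goal, history):
    """Send the screenshot and the goal to the model, get back an action dict
    like {"type": "click", "x": 120, "y": 340} or {"type": "done"} (stand-in)."""
    ...

def apply_action(vm, action):
    """Replay the model's chosen action (mouse move, click, keystrokes) in the VM (stand-in)."""
    ...

def run_task(vm, goal, max_steps=50):
    history = []
    for _ in range(max_steps):
        shot = take_screenshot(vm)
        action = ask_model(shot, goal, history)
        if action.get("type") == "done":
            break
        apply_action(vm, action)
        history.append(action)
        time.sleep(0.5)  # give the UI a moment to settle before the next screenshot
    return history
```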

This doesn’t seem like a big leap from the models that already have image comprehension, but there’s a precision to computer use that likely had to be trained in.

This unlocks so many things.

Friday, Oct 18, 2024

What's new with Optimus? 🔗

Tesla’s Optimus robot is getting better at doing everyday tasks, as seen in this video.

It’s impressive, but I’m not sure why.

It seems like everything Optimus is doing has been done before. Disney has been building robots for decades. So has Boston Dynamics.

Is it the software?

If it is, why is Tesla ahead here? Shouldn’t the robot companies that have been working at this for years be way ahead?

I think it’s Elon’s willingness to invest in potential.

Friday, Sep 13, 2024

Is o1 a New Model?

OpenAI’s new o1 model, released yesterday into preview, is impressive. It seems to have the ability to do some new things:

We’ve seen this kind of thing before, in tools like langchain, but they’ve never worked this well.

I’m curious whether o1 is really a single “new model”, or whether it’s a new layer of tooling that runs a series of requests back through the model until it’s happy with the response.

You can do this yourself now. In ChatGPT, ask the model to make a plan, ask it to execute parts of the plan, feed the responses back into it, and you can get it to home in on a very good answer. But in that case it’s the human guiding the process.

The real innovation here may be the tooling to orchestrate this process.
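Here’s roughly what I mean, as a sketch built on plain chat completions. This is my guess at the kind of loop involved, not how o1 actually works:

```python
# Plan -> execute -> critique -> revise, all driven by ordinary chat requests.
from openai import OpenAI

client = OpenAI()

def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # any chat model works for the sketch
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def solve(problem: str, max_rounds: int = 3) -> str:
    plan = chat(f"Make a step-by-step plan to solve:\n{problem}")
    answer = chat(f"Problem:\n{problem}\n\nPlan:\n{plan}\n\nCarry out the plan and give an answer.")
    for _ in range(max_rounds):
        critique = chat(f"Problem:\n{problem}\n\nProposed answer:\n{answer}\n\n"
                        "List any mistakes or gaps. Reply 'OK' if it looks right.")
        if critique.strip().upper().startswith("OK"):
            break  # the model is satisfied with its own answer
        answer = chat(f"Problem:\n{problem}\n\nAnswer:\n{answer}\n\nCritique:\n{critique}\n\n"
                      "Revise the answer to address the critique.")
    return answer
```

If o1 is doing something like this internally, the difference is that the model itself decides when the answer is good enough, rather than a human.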

Saturday, Sep 7, 2024

OpenAI o1 Model Preview 🔗

New drop from OpenAI.

We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.

But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.

It’s interesting how often we find parallels between how the human mind works and how LLMs work.

Walmart “100x more productive” with generative AI 🔗

Walmart is letting generative AI edit its product catalog.

We’ve used multiple LLMs to accurately create or improve over 850 million pieces of data in the catalog. Without the use of generative AI, this work would have required nearly 100-times the current head count to complete in the same amount of time.

Walmart has a lot of smart people and I’m sure they’re doing their best here, but it would be really easy to look at the AI “improvements” to a product listing and miss that it actually got something wrong. And there’s no way they’re checking them all by hand at that scale.

Would love to know more about how they’re anchoring the AI with some sort of ground truth about the items it’s describing.
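Purely as speculation, the anchoring could be as simple as refusing to auto-apply an edit that contradicts structured data you already trust (supplier feed, spec sheet). The field names below are made up:

```python
def conflicting_fields(suggested: dict, known_facts: dict) -> list[str]:
    """Return the fields where the AI-suggested edit contradicts trusted data."""
    conflicts = []
    for field, trusted in known_facts.items():
        if field in suggested and str(suggested[field]).strip().lower() != str(trusted).strip().lower():
            conflicts.append(field)
    return conflicts

suggested_edit = {"brand": "Acme", "capacity": "2 L", "color": "blue"}
known_facts = {"brand": "Acme", "capacity": "1.5 L"}

bad = conflicting_fields(suggested_edit, known_facts)
if bad:
    print("Route to human review, conflicting fields:", bad)  # -> ['capacity']
else:
    print("Auto-apply the edit")
```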

Friday, Sep 6, 2024

Studio AI Video Generation Tool 🔗

This is fun to play with. The quality of the videos I created wasn’t great, but it’s fast, free, and demos just how easy AI video generation is becoming.

A $2000/month LLM

I don’t want to get into the habit of reporting on rumours, so I’m not linking to an original story here. While there are no definitive sources, there’s been serious talk of a $2000/month subscription for a new OpenAI LLM.

To justify a price like this, a service would need to do more than chat with you. It would need to be a tool that you can delegate tasks to, and trust that they would get done. It would be an employee, not a chatbot.

I don’t think we’re far from this now. I think three things are missing:

These don’t seem particularly difficult, but in practice the various frameworks we’ve been using (langchain, for example) have fallen short of delivering a truly useful agent. They get lost, go off on tangents, and waste a lot of time and tokens getting nothing useful done.

But it seems to me like we’re not that far from useful agents. Not AGI, but agents that can be productive in limited domains.

Larger context is a big part of this. Running a task to completion takes many steps, and each request needs the output of all the previous steps. GPT-4o and Llama 3.1 are only 128k, Claude 3 is 200k, while Gemini Pro comes with a 2m context window.

1m is sufficient for some pretty long tasks, but consider the cost of running jobs that are approaching these limits. At $1+ per million tokens, an agent looping on tasks will be expensive to run. Hence the $2000/month cost.
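Some back-of-the-envelope numbers, all illustrative assumptions rather than anyone’s published pricing:

```python
# Rough cost of one long agent task that keeps re-reading its own work.
price_per_million_tokens = 1.00   # USD, a typical order of magnitude for input tokens
avg_context_tokens = 200_000      # each step re-reads most of the prior steps
steps_per_task = 100              # plan, browse, write, check, revise, ...

tokens_per_task = avg_context_tokens * steps_per_task              # 20 million tokens
cost_per_task = tokens_per_task / 1_000_000 * price_per_million_tokens

print(f"~{tokens_per_task / 1e6:.0f}M tokens, ~${cost_per_task:.0f} per task")  # ~20M tokens, ~$20 per task
```

At a few tasks like that per day, you’re already in the neighbourhood of $2000 a month.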

Will these agents be competent enough at the jobs we can assign them to justify that cost?

Thursday, Sep 5, 2024

Stable Image Ultra, SD3 Large and Stable Image Core on Amazon Bedrock

Stability AI has released new image generation models, with a new high-end model called Stable Image Ultra.

Three of Stability AI’s most advanced text-to-image models – Stable Image Ultra, Stable Diffusion 3 Large and Stable Image Core – are now live in Amazon Bedrock, providing high-speed, scalable, AI-powered visual content creation.

Stable Image Ultra is presented as ideal for “Ultra-realistic imagery for luxury brands and high-end campaigns”. With that description I expected it to be even more expensive than it is. Here is the on-demand per-image cost from their pricing page (US$):

Stable Image Core: $0.04
SD3 Large: $0.08
Stable Image Ultra: $0.14
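For reference, calling these from Bedrock looks roughly like this with boto3. invoke_model is real; the model ID and the request/response fields are my assumptions from memory, so check the Bedrock docs for the exact schema:

```python
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock.invoke_model(
    modelId="stability.stable-image-ultra-v1:0",  # assumed ID for Stable Image Ultra
    body=json.dumps({"prompt": "product shot of a leather handbag, studio lighting"}),
)

payload = json.loads(response["body"].read())
image_b64 = payload["images"][0]  # assumed: base64-encoded image in an "images" list
with open("handbag.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```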

This may be Stability AI backing away from open source releases, but honestly that seemed like it had to happen at some point. We’re in a bubble where companies are spending billions of dollars training models and giving them away for free. That can’t last forever.

Qwen2-VL Vision Language Model released 🔗

Although there was some drama around the GitHub repo, the Qwen2-VL vision language model is available now.

After a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model families.

Open weights, Apache 2.0 license.

Wednesday, Sep 4, 2024

OpenAI co-founder Sutskever's new safety-focused AI startup SSI raises $1 billion 🔗

Seems like a lot of money for a company with 10 employees.

Australian government study claims AI is bad at summarizing information 🔗

An Australian study, conducted by Amazon, found:

Artificial intelligence is worse than humans in every way at summarising documents and might actually create additional work for people

I applaud the attempt to study this, but using Llama2-70B, a model that’s over a year old now, tells us little about the current state of the art.

Tuesday, Sep 3, 2024

OpenAI, Adobe, Microsoft support bill requiring watermarks on AI Content 🔗

Watermarking AI-generated content is a reasonable thing to be doing. I look at it like attribution, simply acknowledging that the image was created by AI.

The bill itself (AB 3211) is targeted at deepfakes, and also makes it illegal to produce tools to remove the watermark.

xAI Colossus Cluster is 100k H100 GPUs 🔗

I don’t plan to link to tweets very often but sometimes a tweet is the definitive source.

From Elon Musk on X:

This weekend, the @xAI team brought our Colossus 100k H100 training cluster online.

This cluster is used for training AI models, and it’s one of the largest in the world, if not the largest.

One H100, for reference, costs at least $10k, so 100,000 of them puts the hardware alone at over a billion dollars.

And they’re expecting it to double in 100 days.

AI is not happy to see you 🔗

I love this quote from Ted Chiang’s New Yorker article, “Why A.I. Isn’t Going to Make Art”:

It is very easy to get ChatGPT to emit a series of words such as ‘I am happy to see you.’ There are many things we don’t understand about how large language models work, but one thing we can be sure of is that ChatGPT is not happy to see you. A dog can communicate that it is happy to see you, and so can a prelinguistic child, even though both lack the capability to use words. ChatGPT feels nothing and desires nothing, and this lack of intention is why ChatGPT is not actually using language.

Whether “actually using language” is technically correct or not, we know there’s no sentiment, as we understand it, behind those generated words.

Generative Doom 🔗

The way I’ve been imagining the use of AI in game development is that it would create assets for a game, but there would be a traditional game engine running the game.

The generative game engine used for this Doom example, GameNGen, skips all that and goes straight to generating the output pixels. No 3D models, no renderer in the traditional sense. It’s quite a leap.

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU.
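In other words, the whole “engine” is a next-frame predictor conditioned on recent frames and player inputs. A sketch of that idea, not GameNGen’s actual code; next_frame_model stands in for their learned model, and the history length is my assumption:

```python
import numpy as np

def next_frame_model(frames: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the learned model: given the last N frames
    (N, H, W, 3) and the last N player actions, predict the next frame (H, W, 3)."""
    ...

def game_loop(frames, actions, read_input, show, steps=1000):
    frames, actions = list(frames), list(actions)
    for _ in range(steps):
        actions.append(read_input())          # keyboard/gamepad state this tick
        frame = next_frame_model(
            np.stack(frames[-32:]),            # short history window (assumed length)
            np.stack(actions[-32:]),
        )
        show(frame)                            # blit the predicted pixels to the screen
        frames.append(frame)                   # no 3D scene, no renderer, just pixels
```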

More from the authors at the GameNGen GitHub page.

First Post

School starts today. Seems like a good day to start something new.