Isn’t it just the training that costs a lot in comparison to running the models? So if they stopped training new models and just sold their current ones, they might be able to run at a profit from year to year, after a huge initial investment that may or may not ever be recovered.
Not really. The infrastructure to run language models is demanding in terms of hardware, consumes vast amounts of water (for cooling) and electricity, and requires frequent hardware renewal and added capacity.
My PC can run some LLMs
Ok, so all we need is your PC. Maybe you should contact Nvidia and tell them that.
This is the difference between running and training them. I can generate text output from an LLM faster than I can read it. But I am not training a new model any time soon.
Your PC is able to run comparatively small language models with a limited number of parameters (a few billion). You can even do that on a modern mobile phone. Some of the more advanced models use hundreds of billions or even trillions of parameters. It’s not just the training. It has more to do with model complexity.
You can also do weather forecasting on a PC, but the result will not be comparable to a forecast made with a supercomputer.
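To put rough numbers on the parameter counts mentioned above, here is a back-of-the-envelope sketch of how much memory is needed just to hold a model's weights. The parameter counts and the bytes-per-parameter values (2 bytes for 16-bit weights, 0.5 bytes for 4-bit quantization) are illustrative assumptions, not figures for any particular model, and the estimate ignores everything else a running model needs (activations, caches, serving many users at once). The function name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Illustrative sizes only; not the parameter counts of any specific model.
examples = {
    "a few billion (7e9)": 7e9,
    "hundreds of billions (4e11)": 4e11,
    "a trillion (1e12)": 1e12,
}

for label, n in examples.items():
    fp16 = weight_memory_gb(n, 2)    # 16-bit weights: 2 bytes per parameter
    int4 = weight_memory_gb(n, 0.5)  # 4-bit quantized: 0.5 bytes per parameter
    print(f"{label}: ~{fp16:.0f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Under these assumptions, the few-billion case fits in the memory of a consumer PC, while the larger cases need hundreds or thousands of gigabytes for the weights alone, which is why they run on clusters of accelerators rather than a single desktop.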
–Edit–
Typo fixes, because while even the most whimsy LLMs can write, apparently I cannot.