Update: I believe that some forms of AI vision are already built into some systems, and that limited forms of this even exist in quite simple and cheap home camera systems.
I wonder what the potential is for multimodal LLMs to be built into national CCTV systems (of course it’s coming). Multimodal surveillance.
Existing systems like GPT-4o require you to deliberately point a camera at a scene before the AI can interpret what’s happening. What happens when we build this into cameras that are looking at everything, all the time?
I can imagine many possibilities, both good and bad.
- Someone goes into cardiac arrest and the AI recognises it for what it is and dispatches paramedics immediately.
- Someone is mugged and the system alerts the police, and then tracks the perpetrator as they move away from the scene.
- The system identifies people who “don’t look like they belong here” and someone is sent around to investigate.
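The scenarios above all share the same loop: a camera frame goes to a multimodal model, the model returns a scene description, and the system decides whether that description warrants a response. Here is a minimal sketch of that triage step in Python. The model call itself is deliberately left out; `ALERT_KEYWORDS`, `triage`, and `monitor` are invented names for illustration, not any real system’s API.

```python
from typing import Optional

# Sketch of the triage step in a continuous-monitoring camera system.
# A real system would send each frame to a multimodal LLM and receive a
# natural-language scene description; this sketch only covers what happens
# to that description afterwards. The keyword table and action strings
# are illustrative assumptions, not a real deployment.

ALERT_KEYWORDS = {
    "cardiac arrest": "dispatch paramedics",
    "mugging": "alert police",
}

def triage(description: str) -> Optional[str]:
    """Map a model's scene description to a response action, if any."""
    lowered = description.lower()
    for keyword, action in ALERT_KEYWORDS.items():
        if keyword in lowered:
            return action
    return None  # nothing noteworthy; keep watching

def monitor(descriptions):
    """Run triage over a stream of per-frame descriptions, yielding alerts."""
    for frame_index, description in enumerate(descriptions):
        action = triage(description)
        if action is not None:
            yield frame_index, action
```

Even this toy version shows where the policy lives: whoever writes the keyword table decides what counts as an incident, which is exactly the question the rest of this post is about.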
Do we want more of this, or less? I’m on the fence. Yes, of course there are nightmare scenarios. But there is also the potential for an enormous amount of good. It all depends on who is in charge of the system.
Which means we need to have stable, accountable, and ethical systems of government.
Oh right. Now I can see where the plan falls apart. Once again, it’s not the technology that’s the problem. It’s the people in charge of the technology.