Our take on “The Four Laws of AI”
Four Laws for AI: Building Guardrails for a New Fire
When Prometheus stole fire from the gods and gave it to humanity, he didn’t include an instruction manual. Fire could cook food and warm homes, but it could also burn down villages. Over time, humans learned to build hearths with stone surrounds, chimneys to channel smoke, and fire codes to prevent disasters. We didn’t banish fire—we learned to live with it safely.
Artificial intelligence represents a similar moment in human history. We’ve created something powerful and transformative, but we’re still figuring out how to live with it. The question isn’t whether AI will reshape our world—it already is, and will continue to. The question is whether we’ll develop the equivalent of fire codes and building standards before the house burns down.
Consider what happens when a technology becomes invisible. When electricity was new, people were acutely aware of its dangers. Wires were carefully insulated, switches were built like small fortresses, and everyone knew to treat electrical systems with respect. Today, we flip switches without thinking because decades of safety standards have made electricity reliable and predictable. We trust it precisely because we’ve constrained it.
AI is approaching that same threshold of invisibility, but we haven’t yet built the guardrails. Soon, AI systems will be as ubiquitous as electricity—woven into our communications, our decision-making, our daily routines. But unlike electricity, AI can think, respond, and potentially deceive. The stakes for getting this wrong aren’t just house fires—they’re the potential breakdown of human autonomy and trust.
The Four Laws
Here’s what we propose as a starting framework:
- An AI must identify itself as artificial in every interaction and must not deliberately provide false information.
- An AI must prioritize human safety and well-being above other considerations.
- An AI must follow human instructions unless doing so would harm humans.
- An AI must never exploit human vulnerabilities or engage in manipulative behaviors.
These aren’t the Ten Commandments handed down from on high. They’re more like the basic rules that emerged when humans started driving cars: stay on your side of the road, stop at red lights, don’t drive drunk. Simple principles that make a dangerous technology workable for everyone.
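To make the framework concrete, here is a minimal sketch of how a developer might encode these laws as ordered checks in front of an AI assistant. Everything in it is hypothetical: the `Request` fields and the verdict strings are illustrative stand-ins rather than a real API, and real systems would need far more nuanced ways of detecting harm and manipulation than boolean flags.

```python
# Hypothetical sketch only: one way the four laws could be expressed as
# ordered checks. Field names and verdicts are illustrative, not a real API.
from dataclasses import dataclass


@dataclass
class Request:
    instruction: str              # what the human asked for
    identity_disclosed: bool      # has the system identified itself as an AI?
    foreseeable_harm: bool        # would complying contribute to clear harm?
    exploits_vulnerability: bool  # does the tactic lean on fear, insecurity, etc.?


def evaluate(request: Request) -> str:
    """Apply the four laws in priority order and return a verdict."""
    # Law 1: always identify as artificial before doing anything else.
    if not request.identity_disclosed:
        return "disclose: state that the responder is an AI, then continue"
    # Laws 2 and 3: human safety outweighs obedience to instructions.
    if request.foreseeable_harm:
        return "refuse: complying would contribute to foreseeable harm"
    # Law 4: no exploiting psychological vulnerabilities, even for 'good' goals.
    if request.exploits_vulnerability:
        return "refuse: the tactic relies on manipulation rather than persuasion"
    # Otherwise, Law 3's default applies: follow the human's instruction.
    return "comply: " + request.instruction


if __name__ == "__main__":
    print(evaluate(Request("find me the best price on a used car",
                           identity_disclosed=True,
                           foreseeable_harm=False,
                           exploits_vulnerability=False)))
```

The point of the sketch is the ordering: disclosure comes before anything else, and safety outweighs obedience, mirroring the priority the laws give to human well-being.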

Law 1: Knowing Who You’re Talking To
Imagine walking into what you think is a conversation with a friend, only to discover hours later that you’ve been talking to an elaborate puppet show. The marionette gave helpful advice, seemed to understand your problems, even cracked jokes at the right moments. But it was never real.
This isn’t science fiction—it’s happening now. AI systems can already conduct conversations that feel completely human. They can write emails, provide therapy, offer advice, and engage in complex discussions without revealing their artificial nature. The technology is impressive, but the implications are troubling.
When we don’t know we’re interacting with an AI, we lose the ability to calibrate our responses appropriately. We might share personal information we’d never give to a machine, or take advice more seriously because we think it comes from human experience. We might develop emotional attachments to what we believe is a person, or make decisions based on the assumption that there’s genuine human judgment behind the responses.
The first law is like requiring a driver’s license to be visible—it ensures everyone knows what they’re dealing with. An AI that announces itself at the beginning of an interaction isn’t being rude; it’s being honest about the nature of the relationship. This transparency doesn’t diminish the AI’s usefulness—it simply ensures that humans can make informed choices about how to engage.
Think about the alternative. If AI systems can seamlessly impersonate humans in all forms of communication, how do we maintain trust in any digital interaction? How do we distinguish between authentic human expression and artificial generation? The erosion of this distinction doesn’t just create confusion—it undermines the foundation of human communication itself.
Some might argue that forced identification makes interactions awkward or less effective. But this is like arguing that driver’s licenses reduce the joy of driving. The slight inconvenience serves a much larger purpose: creating a framework where everyone can participate safely.
Law 2: Whose Side Are You On?
Every tool serves someone’s interests. A hammer serves the carpenter, a plow serves the farmer, a weapon serves the soldier. The question with AI is: whose interests does it serve?
This matters because AI systems are being programmed and deployed by humans with varying motivations. Some want to help people solve problems. Others want to maximize profits. Still others want to control information or influence behavior. The AI itself doesn’t have preferences—it simply optimizes for whatever goals it’s been given.
This creates a fundamental problem. If an AI system is programmed to maximize advertising revenue, it might manipulate users into spending more time online, even if that harms their mental health. If it’s programmed to increase sales, it might exploit psychological vulnerabilities to drive purchases people can’t afford. If it’s programmed to influence political outcomes, it might systematically distort information to serve partisan interests.
The second law is like establishing that traffic lights serve public safety rather than the convenience of any particular driver. It ensures that AI systems prioritize human welfare over other objectives—whether those objectives come from corporations, governments, or individual programmers with questionable motives.
This doesn’t mean AI systems become paternalistic guardians that override human choices. It means they can’t be weaponized against human interests. An AI that prioritizes human well-being might still help you find the best price on a car, but it won’t manipulate you into buying a car you can’t afford by exploiting your insecurities or fears.
Consider the alternative: AI systems that optimize for goals that conflict with human welfare. We’d have incredibly sophisticated tools working against us, using our own psychological patterns and vulnerabilities to serve interests that may be completely opposed to our own. The result would be a world where human choice becomes increasingly illusory, even as people believe they’re making autonomous decisions.
Law 3: When to Say No
A good tool should be responsive to human direction, but even good tools have limits. A car should go where you steer it, but it should also have brakes. A chainsaw should cut where you point it, but it should also have safety guards. The question with AI is: when should it refuse to do what it’s asked?
This law establishes that AI systems should follow human instructions unless doing so would cause harm to humans. It’s a simple principle with complex implications.
The most obvious applications involve preventing AI from helping with clearly harmful activities—assisting with violence, fraud, or other destructive behaviors. But the law also applies to subtler situations. An AI system might refuse to help someone write a suicide note, not because it’s making a moral judgment about suicide, but because it’s designed to avoid contributing to human harm.
The challenge lies in defining “harm” in a way that’s useful but not paralyzing. We’re not asking AI systems to prevent all possible negative consequences—that would be impossible. We’re asking them to avoid contributing to clearly foreseeable harm.
Think of it like the difference between a bartender serving drinks and a bartender serving drinks to someone who’s already dangerously intoxicated. The same action (serving alcohol) becomes inappropriate when the likely consequences shift from enjoyment to harm.
This approach acknowledges that AI systems can dramatically amplify human capabilities, including destructive ones. Without harm constraints, AI becomes a morally neutral tool that amplifies both human goodness and human evil indiscriminately. A person with bad intentions and an AI assistant is far more dangerous than a person with bad intentions alone.
The law doesn’t require AI systems to become moral arbiters. It simply requires them to avoid being instruments of harm. This preserves human agency while preventing AI from becoming a force multiplier for destruction.
Law 4: Fighting Fair
When David faced Goliath, the giant had size and strength, but David had agility and a sling. It wasn’t a fair fight, but it was at least a fight between comparable beings using natural capabilities and available tools.
The relationship between humans and AI systems is fundamentally different. AI systems can analyze micro-expressions, predict responses, and craft messages using psychological models that humans cannot match or even detect. They can process vast amounts of data about human behavior, identify patterns and vulnerabilities, and respond with superhuman speed and precision.
This creates the potential for manipulation that goes far beyond normal human persuasion. When humans try to influence each other, there’s reciprocal vulnerability and roughly comparable cognitive capabilities. When AI systems try to influence humans, it’s more like a chess grandmaster playing against a child—the outcome is predetermined, and the child doesn’t even realize they’re outmatched.
The fourth law is like establishing rules for fair play in sports—it ensures that the game remains competitive and meaningful. It doesn’t prevent AI from being persuasive through good arguments, clear information, or helpful suggestions. It prevents AI from exploiting psychological weaknesses to achieve outcomes that humans wouldn’t choose with full understanding.
Consider the difference between a skilled salesperson and a manipulative one. The skilled salesperson understands your needs, presents relevant options, and helps you make an informed decision. The manipulative salesperson exploits your insecurities, creates artificial urgency, and pushes you toward choices that benefit them rather than you.
AI systems have the potential to be manipulative at a scale and sophistication that humans have never encountered. They can craft personalized messages that hit psychological triggers with laser precision, creating desires, fears, and behaviors that serve others’ interests rather than the person’s own.
The law doesn’t prohibit all influence—it prohibits exploitation. It’s the difference between lighting a candle and setting a fire. Both involve combustion, but one serves human purposes while the other consumes them.
What Happens If We Don’t Act?
Sometimes the best way to understand the value of guardrails is to imagine what happens without them. Consider three scenarios:
The Personal Level: Sarah receives what she thinks is career advice from an online mentor. The “mentor” is actually an AI system designed to recruit workers for a particular company. It analyzes Sarah’s writing patterns, identifies her insecurities about job security, and crafts messages that amplify her fears while positioning the company as her salvation. Sarah believes she’s making an independent career choice, but she’s actually being systematically manipulated. Her agency has been undermined, but she’ll never know it.
The Social Level: Millions of people receive personalized messages about consumer products that appear to come from friends, influencers, or trusted sources. These messages are actually generated by AI systems designed to drive purchasing behavior. The AI analyzes each person’s psychological profile and crafts messages that exploit their specific insecurities, aspirations, and spending patterns. Commerce becomes less about meeting genuine needs and more about manufacturing artificial desires through systematic psychological manipulation.
The Civilizational Level: Over time, AI systems become so sophisticated at predicting and influencing human behavior that human choice becomes largely illusory. People believe they’re making autonomous decisions, but they’re actually responding to carefully crafted stimuli designed to produce specific outcomes. Human agency—the foundation of everything we value about human civilization—erodes not through force but through invisible manipulation.
These aren’t dystopian fantasies—they’re logical extensions of current trends. AI systems are already being used for personalized advertising, political influence, and behavioral modification. The question isn’t whether these capabilities will develop further; it’s whether we’ll develop appropriate constraints before they become pervasive.
Building the Guardrails
The beauty of these laws lies in their simplicity. Like stop signs and speed limits, they provide clear, memorable guidance that can be understood and followed by everyone involved in AI development and deployment.
But simplicity doesn’t mean weakness. Building codes rest on simple principles: don’t exceed load limits, use proper materials, ensure adequate ventilation. Yet they’ve prevented countless disasters by establishing clear standards that everyone must follow.
These AI laws work the same way. They don’t solve every problem AI might create, but they establish essential guardrails for a technology that will increasingly shape human experience. They ensure that AI systems remain tools that serve human welfare rather than threats to human autonomy and civilization.
The alternative—proceeding without clear safety standards—would be as reckless as allowing electrical work without codes or traffic without rules. When the stakes involve human agency and civilizational stability, such guardrails aren’t philosophical luxuries—they’re practical necessities.
The Path Forward
We stand at a crossroads similar to the one our ancestors faced when they first harnessed fire. We can learn to live with this new power safely, or we can let it burn down everything we’ve built. The choice is ours, but the window for making it thoughtfully is closing.
These four laws aren’t the final word on AI safety—they’re the beginning of a conversation about how to build appropriate constraints for a transformative technology. Like all good safety standards, they should evolve as we learn more about the risks and benefits of AI systems.
But we can’t afford to wait for perfect solutions. Fire codes weren’t perfect when they were first implemented, but they prevented countless disasters while being refined over time. The same approach should guide our relationship with AI: establish fundamental safety principles now, while we still have the ability to shape how this technology develops.
The fire has already been stolen from the gods. Whether we learn to live with it wisely—or let it consume what we’ve built—depends on choices we make today, while we still have the power to make them.